mLab failures are non-graceful, need better error handling #4188

Closed
unsoluble opened this issue Jan 5, 2019 · 9 comments

Comments

@unsoluble
Contributor

When a user's free-tier mLab storage becomes full (and possibly on other mLab-related failures), the Nightscout frontend does not currently handle this gracefully — typical symptom is a repeated request for the user's API_SECRET.

We should investigate the error handling here, and ideally present an actionable suggestion to the user (along the lines of "it looks like your mLab is full, here's how to empty it"). Or at the very least fail more silently.
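As a rough illustration of the kind of handling being suggested, here is a hypothetical Express error-handler sketch. It is not Nightscout's actual code, and the pattern it matches on is an assumption, since the exact error mLab returns when over quota hasn't been captured in this issue yet:

```js
// Hypothetical sketch only -- not Nightscout's real middleware.
// Catches Mongo write errors that look like out-of-space failures and
// returns an actionable message instead of re-prompting for API_SECRET.
function mongoQuotaErrorHandler (err, req, res, next) {
  var message = (err && err.message) || '';
  // The exact wording of mLab's over-quota error is an assumption here;
  // adjust the pattern once a real log capture is available.
  if (/quota|storage|space/i.test(message)) {
    res.status(507).json({
      status: 507,
      message: 'Your Mongo database appears to be full. Delete old records with the admin tools, then compact the database.'
    });
  } else {
    next(err); // let all other errors follow the normal path
  }
}
```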

@unsoluble
Contributor Author

(I personally don't have any familiarity at all with the db interface code, or even where all the interconnected parts are located. Would be great to have some expert eyes on this to get the ball rolling at least.)

@PieterGit added this to the 0.12.0 milestone Jan 5, 2019
@PieterGit
Contributor

@unsoluble thanks for creating an issue. I don't have a full mLab db, so I can't test this.
For pointers in the code, I think we should check PR #4004 by @jpcunningh. I wonder whether the current behavior is still a repeated request for the user's API_SECRET.
Starting from 0.11-dev, the user should at least see more info in the console log (I hope).

@unsoluble
Contributor Author

Oh hey, I didn't notice #4004 there — maybe this is moot? I guess we'll see once 0.11 rolls out and gets used in the wild, but if I'm reading those changes correctly we might not have this problem any more.

@jpcunningh
Collaborator

A key need in making the code more resilient to database-full errors is obtaining the Nightscout console output from the moment it crashes because of a full database.

I know that when it happens, I'm focused on getting Nightscout operational again rather than troubleshooting. It's amazing how much we depend on something we didn't know anything about a year ago! If anybody is willing to help by providing log contents, it helps to have already gone through the steps to reach the console output before the event happens. On Heroku, you can view the console output by installing the free Papertrail add-on from the Heroku dashboard. You can also get to the output using the Heroku command line interface (CLI), though the CLI is more difficult to set up on a computer.
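For reference, the one-liner for the CLI route (assuming the Heroku CLI is already installed and logged in; replace the app name with your own):

```
heroku logs --tail --app <your-app-name>
```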

Since #4004, Nightscout hasn't crashed on us or asked for an API_SECRET when the database fills up, but that doesn't mean there aren't other paths that could cause a crash that we just haven't hit yet. 😄

On my list of things to do is creating a test application that fills up each collection individually, to test how Nightscout responds to the resulting database errors. My goal is to implement the items below over time.

  • keep Nightscout operational so the user can use the admin tools to free up space
  • generate a meaningful alarm so the user doesn't have to figure out on their own that the database is full
  • add the Mongo command to compress the database after the admin tools delete records (see the sketch below)
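For the last item, a minimal mongo shell sketch of what I have in mind. The collection name is just an example; on shared plans like mLab's free sandbox tier, compact may not be permitted, in which case repairDatabase() was the usual way to reclaim space:

```js
// Reclaim space after the admin tools delete records.
// compact rewrites a single collection in place:
db.runCommand({ compact: 'treatments' });

// or, where the plan allows it, repair the whole database:
db.repairDatabase();
```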

@sulkaharo
Member

After having debugged a few crashed instances, it looks like part of the problem is that most Nightscout users run an old release that also doesn't handle the Mongo connection breaking. So figuring out how to get users to update would also be good.

@jpcunningh
Collaborator

I successfully tested filling the entries collection until inserts failed due to insufficient space this evening. Nightscout handled it gracefully.

From what I can tell, devicestatus and entries seem to fail gracefully with the current dev branch. I still have to test treatments.

@PieterGit
Contributor

@danamlewis and I pointed @Dave9111 to some documentation on these problems. The Gitter discussion starts here: https://gitter.im/nightscout/cgm-remote-monitor?at=5c324b145ec8fe5a85100157
I think it would also help to improve the user documentation for this, so that we can refer people on Nightscout 0.11+ to a good documentation page. I created nightscout/documentation#2 for that, and I'm hoping that a new user like @Dave9111 (or somebody else) can help make documentation improvements for this.

@jpcunningh
Collaborator

I tested NS with a full treatments collection today. It failed gracefully with the current dev branch. For the entries, treatments, and devicestatus collections, I've tested a full database and used the admin tools to delete old data to free up space.

@zencuke

zencuke commented Nov 10, 2019

Regardless of the status of this issue, it might be user-friendly to add a task that detects how close to full the database is and issues a warning when the available space falls below a threshold, so the user can try to fix it before space runs out. A graceful failure is always valuable, especially to developers, but preventing the failure in the first place would be even better.

Maybe even two thresholds and two warnings: one a week away and the other 24 hours away. Or, if that is too hard to predict, maybe a 90% full warning followed by a 99% full warning.
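A minimal sketch of that check, assuming a mongo shell connection and a known plan quota. QUOTA_BYTES and both thresholds are assumptions, not existing Nightscout settings; mLab's free sandbox tier was roughly 500 MB:

```js
// Warn as the database approaches its plan quota.
var QUOTA_BYTES = 500 * 1024 * 1024;          // assumed plan limit
var used = db.stats().dataSize / QUOTA_BYTES; // fraction of quota in use

if (used > 0.99) {
  print('URGENT: database is ' + Math.round(used * 100) + '% full');
} else if (used > 0.90) {
  print('Warning: database is ' + Math.round(used * 100) + '% full');
}
```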
