mLab failures are non-graceful, need better error handling #4188

Closed
unsoluble opened this issue Jan 5, 2019 · 9 comments

Comments

@unsoluble
Contributor

When a user's free-tier mLab storage becomes full (and possibly on other mLab-related failures), the Nightscout frontend does not currently handle this gracefully — typical symptom is a repeated request for the user's API_SECRET.

We should investigate the error handling here, and ideally present an actionable suggestion to the user (along the lines of "it looks like your mLab is full, here's how to empty it"). Or at the very least fail more silently.
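As a rough illustration of the kind of handling being suggested, here is a hypothetical Express error-handler sketch. It is not Nightscout's actual code, and the pattern it matches on is an assumption, since the exact error mLab returns when over quota hasn't been captured in this issue yet:

```js
// Hypothetical sketch only -- not Nightscout's real middleware.
// Catches Mongo write errors that look like out-of-space failures and
// returns an actionable message instead of re-prompting for API_SECRET.
function mongoQuotaErrorHandler (err, req, res, next) {
  var message = (err && err.message) || '';
  // The exact wording of mLab's over-quota error is an assumption here;
  // adjust the pattern once a real log capture is available.
  if (/quota|storage|space/i.test(message)) {
    res.status(507).json({
      status: 507,
      message: 'Your Mongo database appears to be full. Delete old records with the admin tools, then compact the database.'
    });
  } else {
    next(err); // let all other errors follow the normal path
  }
}
```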

@unsoluble
Contributor Author

(I personally don't have any familiarity at all with the db interface code, or even where all the interconnected parts are located. Would be great to have some expert eyes on this to get the ball rolling at least.)

@PieterGit added this to the 0.12.0 milestone Jan 5, 2019
@PieterGit
Contributor

@unsoluble thanks for creating an issue. I don't have a full mLab db, so I can't test this.
For pointers in the code, I think we should check PR #4004 by @jpcunningh. I wonder whether the current behavior is still a repeated request for the user's API_SECRET.
Starting from 0.11-dev, the user should at least see more info in the console log (I hope).

@unsoluble
Contributor Author

Oh hey, I didn't notice #4004 there — maybe this is moot? I guess we'll see once 0.11 rolls out and gets used in the wild, but if I'm reading those changes correctly we might not have this problem any more.

@jpcunningh
Collaborator

A key need in making the code more resilient to database-full errors is obtaining the Nightscout console output from the moment it crashes because of a full database.

I know that when it happens, I'm focused on getting Nightscout operational again rather than troubleshooting. It's amazing how much we depend on something we didn't know anything about a year ago! If anybody is willing to help by providing log contents, it helps to have already gone through the steps to reach the console output before the event happens. On Heroku, you can view the console output by installing the free Papertrail add-on from the Heroku dashboard. You can also get to the output using the Heroku command line interface (CLI), though the CLI is more difficult to set up on a computer.
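For reference, the one-liner for the CLI route (assuming the Heroku CLI is already installed and logged in; replace the app name with your own):

```
heroku logs --tail --app <your-app-name>
```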

Since #4004, Nightscout hasn't crashed on us or asked for an API_SECRET when the database fills up, but that doesn't mean there aren't other paths that could cause a crash that we just haven't hit yet. 😄

On my list of things to do is creating a test application that fills up each collection individually, to test how Nightscout responds to the resulting database errors. My goal is to implement the items below over time.

  • keep Nightscout operational so the user can use the admin tools to free up space
  • generate a meaningful alarm so the user doesn't have to figure out on their own that the database is full
  • add the Mongo command to compress the database after the admin tools delete records (see the sketch below)
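For the last item, a minimal mongo shell sketch of what I have in mind. The collection name is just an example; on shared plans like mLab's free sandbox tier, compact may not be permitted, in which case repairDatabase() was the usual way to reclaim space:

```js
// Reclaim space after the admin tools delete records.
// compact rewrites a single collection in place:
db.runCommand({ compact: 'treatments' });

// or, where the plan allows it, repair the whole database:
db.repairDatabase();
```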

@sulkaharo
Member

After having debugged a few crashed instances, it looks like part of the problem is that most Nightscout users run an old release that also doesn't handle the Mongo connection breaking. So figuring out how to get users to update would also be good.

@jpcunningh
Collaborator

I successfully tested filling the entries collection until inserts failed due to insufficient space this evening. Nightscout handled it gracefully.

From what I can tell, devicestatus and entries seem to fail gracefully with the current dev branch. I still have to test treatments.

@PieterGit
Contributor

@danamlewis and I pointed @Dave9111 to some documentation on these problems. The Gitter discussion starts here: https://gitter.im/nightscout/cgm-remote-monitor?at=5c324b145ec8fe5a85100157
I think it would also help to improve the user documentation for this, so that we can refer people on Nightscout 0.11+ to a good documentation page. I created nightscout/documentation#2 for that, and I'm hoping that a new user like @Dave9111 (or somebody else) can help make documentation improvements for this.

@jpcunningh
Collaborator

I tested NS with a full treatments collection today. It failed gracefully with the current dev branch. For the entries, treatments, and devicestatus collections, I've tested a full database and used the admin tools to delete old data to free up space.

@zencuke

zencuke commented Nov 10, 2019

Regardless of the status of this issue, it might be user-friendly to add a task that detects how close to full the database is and issues a warning when the available space falls below a threshold, so the user can try to fix it before space runs out. A graceful failure is always valuable, especially to developers, but preventing the failure in the first place would be even better.

Maybe even two thresholds and two warnings: one a week away and the other 24 hours away. Or, if that is too hard to predict, maybe a 90% full warning followed by a 99% full warning.
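A minimal sketch of that check, assuming a mongo shell connection and a known plan quota. QUOTA_BYTES and both thresholds are assumptions, not existing Nightscout settings; mLab's free sandbox tier was roughly 500 MB:

```js
// Warn as the database approaches its plan quota.
var QUOTA_BYTES = 500 * 1024 * 1024;          // assumed plan limit
var used = db.stats().dataSize / QUOTA_BYTES; // fraction of quota in use

if (used > 0.99) {
  print('URGENT: database is ' + Math.round(used * 100) + '% full');
} else if (used > 0.90) {
  print('Warning: database is ' + Math.round(used * 100) + '% full');
}
```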
