-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Service state tests and fixes #22
base: master
Are you sure you want to change the base?
Conversation
It looks like in 829de29 Updater.updates_cached changed from returning the length of _cache.files() to just that list directly. The corresponding test was not updated though because it looks like it will be skipped unless it's being run as root, which is not the common configuration. My local github runner however runs it as root and was failing on this test.
Users of the Gpio class expect GpioError on exceptional conditions, but if the class didn't find a particular pin, it would raise a KeyError. This changes it to GpioError, so that users don't have to except both KeyError and GpioError.
There are no guarantees which thread __del__() is run in. It could potentially be run in the Service's thread, where it attempts to join() itself and RuntimeErrors, as captured by this exception I witnessed during testing: Exception ignored in: <function Service.__del__ at 0x719dbf61e8c0> Traceback (most recent call last): File "oresat-olaf/olaf/common/service.py", line 48, in __del__ self._thread.join() File "/usr/lib/python3.10/threading.py", line 1093, in join raise RuntimeError("cannot join current thread") RuntimeError: cannot join current thread Furthermore it doesn't really make sense to wait for the thread to have stopped in __del__(). We don't do anything with that information and it doesn't make sense to pause GC until the thread is stopped. Setting the _event seems enough, the thread can die at its leisure.
This takes the Service class through its lifecycle, starting, stopping, and causing errors. Some questions remain about the intended behavior. In most tests it derives a new class that overrides the specific behavior that the test requires, but black mangles the formatting of that so I turn it off for the class creation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I though I made a mermaid diagram for service state. I guess not.
- They should have different values. But
STOPPING
should always exist betweenRUNNING
andFAILED
/STOPPED
- If the child class has overrode it, it should be used to put the service (and any hardware in a nice state after an error, if possible) and then it should go to the FAILED state with the thread stopped.
- Umm, idk. state mostly just a flag anyways.
- Nope, once the service has reached
STOPPING
,STOPPED
, orFAILED
, it should remand inSTOPPED
orFAILED
. No restarts. Trying to keep things simple. - The service should handle it
- The service should handle it
- Same as 4
The exact meaning of "always exist" is perhaps a bit unclear to me. Because concurrency is weird it might make the most sense to talk about which hooks run under which state. Certainly
That sounds a whole lot like what the stuff in The changes then would look like:
def _loop(self):
self._status = ServiceState.RUNNING
while not self._event.is_set():
try:
self.on_loop()
except:
try:
self.on_loop_error(e)
except:
pass # really logger.exception()
self._event.set()
self._error = True
self._status = ServiceState.STOPPING
try:
self.on_stop()
except:
self._error = True
self._status = ServiceState.FAILED if self._error else ServiceState.STOPPED
def stop(self):
self._status = ServiceState.STOPPING
try:
self.on_stop_before()
except:
self._error = True
self._event.set()
self._thread.join() The join ensures A few additional ideas I had while writing this:
|
This fixes confusion between the two states: when printing it would always show STOPPING even explicitly set to FAILING, and when checking for FAILING it would be true when set to just STOPPING. This additionally changes the type from IntEnum to just Enum. IntEnums are intended for when the members should also be used interchangeably with normal ints. Arithmetic, ordering comparisons, and the like. It doesn't really make sense to use the int properties for ServiceState except as a way to distinguish them so we shouldn't advertise the option that it could be done. Finally @unique is a fun decorator to prevent the kind of error that occurred here. It ensures that each member has a unique value.
Yea, we could add a
It is kinda apart of the stopping process. That does make sense tho.
It should problaby go into
Yeah, that makes sense |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all looks good.
I'm am surprised those asserts in the updater unit-tests passed previously.
There's a handful of minor fixes in here, but the bulk of it is unit tests for
Service
. Before this gets merged though I'd like to pin down some intended behaviors ofService
, which will then need another couple commits on this branch.ServiceState.STOPPING
andServiceState.FAILED
supposed to have the same value (currently 3)? Trying to printService.state
will always showSTOPPING
, even if the state is explicitly set toFAILED
which confused me while debugging for a bit. On the other hand the only wayFAILED
is set is ifService.stop()
is explicitly called with_error
set. Until that point theService
will beSTOPPING
. On the third hand, the only user ofFAILED
(the watchdog thread in oresat-c3-software) seems to depend on the behavior thatFAILED
andSTOPPING
are the same. If the two states are made different values, the watchdog won't ever notice a service that raised an unhandled exception inon_loop()
.on_loop_error()
? The current behavior is to crash theService
thread but otherwise leave it in stateRUNNING
.Service
supposed to be in stateSTOPPING
untilService.stop()
is called from the outside? For example if inon_loop()
,service.cancel()
is called, the thread will stop but it will remain only inSTOPPING
, not transitioning toSTOPPED
until outside action is taken.start()
on aSTOPPING
service to restart it?start()
is called twice on the sameService
?stop()
is called twice on the sameService
?start()
is called afterstop()
?