hiffy: nicer ringbufs and counters#2531
Draft
hawkw wants to merge 3 commits into
Draft
Conversation
Issue #2528 describes a bug in Hiffy introduced by the change to make Hiffy callable over the management network directly, rather than through `udprpc` (PR #2466). The bug results in Hiffy calls via a debug probe timing out while the system is connected to a management network. As I described in [this comment][1], @jamesmunns and I determined that this occurs due to a combination of two factors: First, the top of the `main()` loop in `task-hiffy` always begins by unconditionally advancing the deadline by `sleep_ms` and calling `sys_set_timer()` with the new deadline. Second, the check for whether `HIFFY_KICK` was set by the probe is *conditional on whether the timer notification has fired*. If hiffy was only notified by the timer, this would be fine; the previous timeout has elapsed so we should advance the timer. However, if Hiffy has been notified by the `net` task's `SOCKET` notification, we will set the timer forwards, but we will _not_ check `HIFFY_KICK`, meaning any kick delivered by the debug probe is missed. While debugging, we noticed that a large number of seemingly spurious notifications were being posted to hiffy by the `net` task. When the management network is connected, these spurious notifications are posted more frequently than the timer, meaning that the timer *never* completes, and `HIFFY_KICK` is never checked. This commit fixes the bug by changing `task-hiffy`'s main loop in two main ways. First, we _always_ check the value of `HIFFY_KICK`, regardless of which notification woke us. This change alone is sufficient to fix the bug, as I demonstrated in [this comment][2]. This probably also reduces the latency of the `net` Hiffy RPC, as it kicks execution by _also_ setting `HIFFY_KICK`, and then waiting for the timer to fire to start execution. This way, we no longer need to wait for the timer. In addition, I have also changed the `main()` loop so that the timer is only reset at the end of the loop, if it has actually fired. This way, no matter how often the `net` task notifies us, the timer will always fire before being reset. I believe that either of these changes would be sufficient to fix the bug, but fixing both of them makes the behavior closer to what we would expect. To demonstrate that it works, here's me flashing `mb-1-sp` (the system on which the bug was first encountered) with an image from this branch, and calling Hiffy both via the probe (using `humility net ip`, which uses Hiffy), and then over the network. Note that they both work fine now. Yay! ```console eliza@jeeves ~ $ export HUMILITY_ARCHIVE="$HOME/build-cosmo-b-dev-image-default.hiffy-kickme.zip" eliza@jeeves ~ $ pfexec /staff/eliza/humility -t mb-1-sp flash humility: WARNING: archive in environment variable overriding archive in environment file humility: attached to archive humility: attaching with chip set to "STM32H753ZITx" humility: attached to 1fc9:0143:ISS1EG0OBG2HT via CMSIS-DAP humility: flash/archive mismatch; reflashing humility: skipping measurement token handoff humility: auxiliary flash data is already loaded in slot 0; skipping programming humility: done with auxiliary flash humility: flashing done eliza@jeeves ~ $ pfexec /staff/eliza/humility -t mb-1-sp net ip humility: WARNING: archive in environment variable overriding archive in environment file humility: attached to 1fc9:0143:ISS1EG0OBG2HT via CMSIS-DAP MAC address: a8:40:25:04:11:c5 IPv6 address: fe80::aa40:25ff:fe04:11c5 eliza@jeeves ~ $ /staff/eliza/humility --ip fe80::aa40:25ff:fe04:11c5%e1000g0 hiffy -c Sequencer.get_state humility: connecting to fe80::aa40:25ff:fe04:11c5%2 Sequencer.get_state() => A0PlusHP eliza@jeeves ~ $ ``` This fixes #2528. Note that the spurious notifications from the `net` task are *not* fixed by this change, but Hiffy can now behave directly despite them. We need to determine why we're getting so many notifications while the recv queue is empty as well, but this commit fixes the bug. I'll open a new ticket for investigating the spurious notifications. [1]: #2528 (comment) [2]: #2528 (comment) The root cause of the spurious notifications hasn't been determined yet; we need to investigate this separately. However, in the meantime
As @jamesmunns pointed out in #2530 (comment)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.