Describe the bug
As currently implemented, the rate limit system does two significant things:
- Causes rate-limit events without user action: it is trivial for the software and service to rate-limit itself via /fleet, background agents, or even an individual agent running multiple commands too quickly.
- Effectively obliterates work the user has paid for when a sub-agent was doing non-trivial or highly destructive work.
When a non-trivial sub-agent encounters a rate limit, the software appears to simply fail the agent/context, causing the parent agent to kick in, and often fail too.
In one case, a sub-agent had worked for nearly 2 hours on a single large file, only to corrupt it by making a one-line change with no trailing newline. It then decided to rewrite the file. It first tried a here-document command but messed up the escaping, so it tried again using the write-file tool, checking the file after every few lines.
Right about here I asked the idling parent agent how long it would take. It told me the sub-agent was preparing the final commit, and that's when we hit the rate limit. I waited the minute it suggested and asked the agent to resume, but the background task was gone, and the parent agent couldn't tell what the status was. So it began doing its own evaluation of where the worker had gotten to, spending nearly 30 minutes re-reading code, logs, etc., and hitting two compaction events.
At that point my only option was a rewind, but that undid everything the sub-agent had spent 2 hours working on, because there was no checkpoint at or near the rate-limit event.
While I recognize that MS Legal will wish to argue that users are solely paying for the generation of random tokens, your consolidation of all things AI into the Copilot branding means that ship has sailed, sunk, been discovered by Bob Ballard, looted, raised, sunk again on the way back to dock, and the Mk II is under construction by an Australian billionaire.
I can't even tell for sure if the agents see the rate limit message. In previous versions it seemed like they did, but the wording ([emphasis mine] "Sorry, you've hit a rate limit that restricts the number of Copilot model requests you can make within a specific time period. Please try again in 1 minute. Please review our Terms of Service (...)") actively seems to make GPT think the user is doing something nefarious.
Affected version
GitHub Copilot CLI 1.0.25.
Steps to reproduce the behavior
Use /fleet.
Give agent a task that may run multiple parallel sub-agents.
Give agent a task where the model predicts many micro-operations instead of individual larger ones (GPT 5.4 reading 200 small files individually with shell commands rather than using the read-file tool).
Agent triggers multiple sub-agents in parallel rather than serially.
Expected behavior
- Internal self-rate limiting to avoid hitting the limits in the first place.
- Automatic back-off/retry in the client with user-facing surfacing of the issue.
- Sub-agent resume/checkpoint.
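
To make the first two items concrete, here is a rough sketch of what I have in mind. It is not a claim about how Copilot CLI is built: `TokenBucket`, `RateLimited`, and `call_with_backoff` are hypothetical names I made up for illustration, and the rates and delays are placeholders. The idea is one limiter shared by the parent agent and all sub-agents, plus a retry loop that waits and tells the user instead of failing the agent.

```python
import random
import threading
import time


class RateLimited(Exception):
    """Hypothetical error raised when the service reports a rate limit."""

    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        self.retry_after = retry_after  # seconds suggested by the service, if any


class TokenBucket:
    """One bucket shared by the parent agent and every sub-agent (assumed design)."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic, sleep=time.sleep):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock
        self.sleep = sleep
        self.last = clock()
        self.lock = threading.Lock()

    def acquire(self):
        """Block until one request token is available."""
        while True:
            with self.lock:
                now = self.clock()
                # Refill in proportion to elapsed time, capped at capacity.
                self.tokens = min(self.capacity,
                                  self.tokens + (now - self.last) * self.rate)
                self.last = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
                wait = (1 - self.tokens) / self.rate
            # Sleep outside the lock so other agents are not blocked on it.
            self.sleep(wait)


def call_with_backoff(request, bucket, max_attempts=5, base_delay=1.0,
                      max_delay=60.0, sleep=time.sleep, on_wait=print):
    """Run request() under the shared bucket, retrying on RateLimited."""
    for attempt in range(1, max_attempts + 1):
        bucket.acquire()
        try:
            return request()
        except RateLimited as err:
            if attempt == max_attempts:
                raise  # out of retries; the caller decides what to do next
            delay = err.retry_after
            if delay is None:
                # Exponential back-off with jitter when no hint is given.
                delay = min(max_delay, base_delay * 2 ** (attempt - 1))
                delay += random.uniform(0, delay / 2)
            # Surface the wait to the user rather than failing silently.
            on_wait(f"Rate limited; retrying in {delay:.0f}s "
                    f"(attempt {attempt}/{max_attempts})")
            sleep(delay)
```

With something like this, every model request from any agent would go through `call_with_backoff(send_request, shared_bucket)`, so a burst from /fleet slows down instead of tripping the limit, and a limit that is hit anyway costs a wait rather than the sub-agent. Resume/checkpoint is a separate problem and is not covered by this sketch.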
Additional context
No response