fix: prevent intermittent digest-mismatch errors during archive updates#296
fix: prevent intermittent digest-mismatch errors during archive updates#296gruyaume wants to merge 7 commits into
Conversation
Signed-off-by: Guillaume Belanger <guillaume.belanger27@gmail.com>
|
Hey @gruyaume, thanks for your PR! Is this ready for review? |
Yes. If you find the code changes mostly up to you standards, I am happy to address minor issues. However if there are major issues and you have a completely different idea of the solution, please go ahead and implement it your way. I only want the issue fixed as soon as possible, I don't care about getting the attribution. |
Thanks! I explored another approach in parallel but I ended up circling back to yours so there is no point just having another PR for the sake of it. I will review yours then, because I expect only minor changes will be requested. |
Signed-off-by: Guillaume Belanger <guillaume.belanger27@gmail.com>
Signed-off-by: Guillaume Belanger <guillaume.belanger27@gmail.com>
|
|
||
| body := resp.Body | ||
| if strings.HasSuffix(path, ".gz") { | ||
| if flags&fetchGzip != 0 { |
There was a problem hiding this comment.
This is a behavioral change. Has anyone double checked that we don't have any other cases that were handled before and are not handled now?
There was a problem hiding this comment.
Yes I checked. We only had one call site with a path ending with .gz, in fetchIndex, which is reworked in this PR to use the fetchGzip flag.
| // are content-addressed and so are immune to the inconsistent view of | ||
| // InRelease vs Packages.gz that mirrors and CDNs can serve while a | ||
| // publication is propagating. See https://wiki.ubuntu.com/AptByHash. | ||
| gzPath := packagesPath + ".gz" |
There was a problem hiding this comment.
This whole function needs to have its variable names reviewed and polished, as it's becoming unreadable. We have digests, digest, gzPath, gzDigest, byHashPath, etc. These things all talk about their types and suffixes, and not about what they are. And they are not the same thing.
/cc @upils
| func (s *httpSuite) sawByHashRequest() bool { | ||
| for _, req := range s.requests { | ||
| if strings.Contains(req.URL.Path, "/by-hash/SHA256/") { | ||
| return true |
There was a problem hiding this comment.
If the archive only has the right file in the right place, the only way for us to get the right content is to use that file. I might be missing something, but it looks like at the moment all we are looking at is a request being made inside a directory, possibly in the wrong place considering we fallback.
| } | ||
| if r.AcquireByHash && itemPath != r.Path() && !skipByHash[itemPath] { | ||
| byHashPath := path.Join(prefix, "dists", r.Suite, path.Dir(itemPath), "by-hash", "SHA256", makeSha256(itemContent)) | ||
| content[byHashPath] = itemContent |
There was a problem hiding this comment.
This used to be a very clean and simple function, and it'd be nice to keep it that way. From these options, having a ByHash bool (not "Acquire", that's on the fetch side) seems in line with the type semantics. The other options look like custom hacks that require reading both the test and the implementation to be made sense of, so we need to think a bit about how to represent these while keeping that clean and simple aspect.
/cc @upils
There was a problem hiding this comment.
This function is now simpler. The remaining additional complexity is due to adding the byHash path.
Tests are now responsible for tweaking the responses to simulate a bad mirror.
Signed-off-by: Paul Mars <paul.mars@canonical.com>
Signed-off-by: Paul Mars <paul.mars@canonical.com>
|
@gruyaume would you mind giving me permissions on your branch in your fork so I can push commits to it? I do not have the "maintainer" role on the main repo so I do not have this permissions by default. I could create another branch in my fork but I would like to keep the history of this PR. |
Sure, I just added you. |
|
|
||
| func (index *ubuntuIndex) fetchIndex() error { | ||
| digests := index.release.Get("SHA256") | ||
| releaseDigests := index.release.Get("SHA256") |
There was a problem hiding this comment.
[Note to reviewer]: I did some renaming to hopefully clarify what these variables are.
| base string | ||
| request *http.Request | ||
| requests []*http.Request | ||
| requestResults []requestResult |
There was a problem hiding this comment.
[Note to reviewer]: This line is the only new one in this struct. It enables recording submitted requests and associated status. The additional tests were then reworked to properly check when the "by-hash" URL was used and what was the result.
I expect this additional capability will be useful in the future to test other fallback mechanisms.
Signed-off-by: Paul Mars <paul.mars@canonical.com>
Chisel intermittently fails with
expected digest X, got Ywhen Ubuntu's archive is mid-publication and InRelease is temporarily inconsistent withPackages.gzat the named path (a window we observed lasting at least 39 minutes against the same publication in real CI runs for Ella Core). Fetch the package index by content hash using apt'sAcquire-By-Hash, which Ubuntu archives have advertised since 16.04 (2016), so the URL itself encodes the expected bytes; falls back to the named path when an archive doesn't advertise the feature or a specific hash has been garbage-collected.Fixes #295
Reference