feat: Refactor repositories download contents by stevehipwell · Pull Request #4153 · google/go-github

stevehipwell · 2026-04-14T16:00:26Z

This PR refactors the behaviour of DownloadContents & DownloadContentsWithMeta with the former now being a direct passthrough to the latter as the only difference was the signature. The code has been refactored to use the API directly instead of via an unnecessary layer of indirection.

I've added an OpenAPI update to this PR as it proves that the updated code works against GitHub.

This change is required for #4151.

codecov · 2026-04-14T18:05:11Z

Codecov Report

❌ Patch coverage is 28.84615% with 37 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.68%. Comparing base (1d6a852) to head (52d75be).
⚠️ Report is 1 commits behind head on master.

Files with missing lines	Patch %	Lines
example/contents/main.go	0.00%	34 Missing ⚠️
github/repos_contents.go	83.33%	2 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #4153      +/-   ##
==========================================
- Coverage   93.83%   93.68%   -0.15%     
==========================================
  Files         209      210       +1     
  Lines       19685    19695      +10     
==========================================
- Hits        18472    18452      -20     
- Misses       1015     1047      +32     
+ Partials      198      196       -2

stevehipwell · 2026-04-15T16:49:45Z

@gmlewis can we get this merged?

gmlewis

I'm quite concerned about this PR because it appears to me that the behavior of following redirects has been deleted and there are many unit tests that have also simply been deleted without comment or explanation. One of the great things about unit tests is that when major refactors are performed like this one, if the unit tests are left alone we can easily detect regressions. As it is in this PR, however, where a major refactor happens and unit tests are also heavily refactored and/or deleted, it is hard to tell what is actually happening.

Can this be broken down into 3 PRs?

1. Update the openapi_operations.yaml file - I'll do that myself momentarily.
2. Refactor the download methods without modifying unit tests
3. Refactor and/or delete unit tests

stevehipwell · 2026-04-15T19:12:40Z

@gmlewis let me take a look, but the main problem here is that the tests appear to be tightly coupled to the implementation, with mocks designed to make the tests pass rather than to mirror the actual API. I'll add the deleted tests back, but the mocks will need to be refactored to add the schema-required download_url field to the content payload.

On a slight tangent, shouldn't the mock payloads be validated against the schema?
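
One lightweight way to approximate such a check, short of full schema validation, is to assert that a mock payload contains every required key. The sketch below is an assumption-laden illustration: `missingFields` is a hypothetical helper, and the required-field list passed in `main` is illustrative (only `download_url` is named as schema-required in this thread).

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// missingFields returns the required keys that are absent from a JSON object
// payload. A key whose value is null still counts as present.
func missingFields(payload []byte, required []string) ([]string, error) {
	var obj map[string]json.RawMessage
	if err := json.Unmarshal(payload, &obj); err != nil {
		return nil, err
	}
	var missing []string
	for _, key := range required {
		if _, ok := obj[key]; !ok {
			missing = append(missing, key)
		}
	}
	sort.Strings(missing)
	return missing, nil
}

func main() {
	// Illustrative required list; not taken from the real schema.
	required := []string{"type", "name", "download_url"}

	mock := []byte(`{"type":"file","name":"f","content":"aGVsbG8="}`)
	missing, err := missingFields(mock, required)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("missing:", missing)
}
```

A helper like this could run inside each test before the mock is served, so a payload that drifts from the schema fails loudly.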

gmlewis · 2026-04-15T19:15:39Z

Yes, they probably should. I don't remember when GitHub v3 API docs started sharing schemas for endpoints, but it is possible that these were written prior to that.

I think my biggest concern is following redirects because I remember a bunch of issues devoted solely to this topic, and to my shock and disappointment, I don't see any of the unit tests actually testing out following redirects and I could have sworn that it took a good deal of effort to get those unit tests to pass at one point. :-(

stevehipwell · 2026-04-16T10:05:14Z

@gmlewis there are no redirects in the removed tests. The old code pattern was just ignoring the presence of download_url on the file content response and making an unnecessary call to the same get content endpoint for the parent directory (if the content wasn't returned inline). I've just removed this unnecessary step; if the content isn't returned inline, we still use exactly the same pattern for fetching it from the download link.

FYI, the following example snippet errors with the current code but passes with the updated code: the last file requested is at an index greater than 1000 in its directory and is larger than 1 MB, so its content is not returned inline.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"os"

	"github.com/google/go-github/v84/github"
)

// downloadContents downloads the contents of a file in a repository and returns it as a byte slice.
func downloadContents(ctx context.Context, client *github.Client, owner, repo, path, ref string) ([]byte, error) {
	rc, _, err := client.Repositories.DownloadContents(ctx, owner, repo, path, &github.RepositoryContentGetOptions{Ref: ref})
	if err != nil {
		return nil, err
	}
	defer rc.Close()

	by, err := io.ReadAll(rc)
	if err != nil {
		return nil, err
	}

	fmt.Printf("Downloaded %v/%v/%v as %d bytes\n", owner, repo, path, len(by))
	return by, nil
}

func main() {
	client := github.NewClient(nil)

	t := []struct {
		owner string
		repo  string
		path  string
		ref   string
	}{
		{"google", "go-github", "README.md", "master"},
		{"github", "rest-api-description", "descriptions/api.github.com/api.github.com.2026-03-10.yaml", "main"},
		{"ScoopInstaller", "Main", "bucket/yq.json", "master"},
		{"stevehipwell", "scoop-main-bucket", "bucket/zzztest.bin", "test-content"},
	}

	for _, v := range t {
		if _, err := downloadContents(context.Background(), client, v.owner, v.repo, v.path, v.ref); err != nil {
			fmt.Printf("Error: %v\n", err)
			os.Exit(1)
		}
	}
}
```
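
The fallback described in this comment (use inline content when present, otherwise fetch from the download_url already on the file response, and reject non-file input early) can be sketched without any network access. This is not the code from the PR: `fileContent`, `fetchFunc`, and `resolveContent` are hypothetical names, and the struct is a trimmed-down assumption about the response shape.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"strings"
)

// fileContent is a hypothetical, trimmed-down stand-in for the object
// returned when requesting a single path from the contents endpoint.
type fileContent struct {
	Type        string
	Encoding    string
	Content     string // base64 payload; assumed empty when not returned inline
	DownloadURL string
}

// fetchFunc abstracts the HTTP GET so the sketch runs offline.
type fetchFunc func(url string) ([]byte, error)

// resolveContent returns the file bytes, preferring inline content and
// falling back to the download URL carried on the same response.
func resolveContent(fc fileContent, fetch fetchFunc) ([]byte, error) {
	if fc.Type != "file" {
		// Invalid input (for example a directory) fails before any fetch.
		return nil, fmt.Errorf("expected a file, got %q", fc.Type)
	}
	if fc.Content != "" && fc.Encoding == "base64" {
		// The payload may contain line breaks; strip them before decoding.
		return base64.StdEncoding.DecodeString(strings.ReplaceAll(fc.Content, "\n", ""))
	}
	if fc.DownloadURL == "" {
		return nil, errors.New("no inline content and no download URL")
	}
	return fetch(fc.DownloadURL)
}

func main() {
	// fakeFetch stands in for an HTTP client call.
	fakeFetch := func(url string) ([]byte, error) {
		return []byte("fetched from " + url), nil
	}

	inline := fileContent{Type: "file", Encoding: "base64", Content: "aGVs\nbG8="}
	large := fileContent{Type: "file", DownloadURL: "https://example.invalid/big.bin"}

	for _, fc := range []fileContent{inline, large, {Type: "dir"}} {
		b, err := resolveContent(fc, fakeFetch)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(string(b))
	}
}
```

The design point is that a single response carries everything needed, so no second listing call is required.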

stevehipwell · 2026-04-16T11:01:27Z

@gmlewis I've added back the removed tests and undone some of the cosmetic changes to make the diff clearer that none of the actual tests have changed (it's only the mocks). I haven't rebased to fix the conflict yet in case you want to look at anything first?

gmlewis · 2026-04-16T12:29:24Z

Thank you, @stevehipwell!
This looks great. Yes, let's proceed with this PR and get this in so you can continue to make progress on the context overhaul.

stevehipwell · 2026-04-16T12:43:28Z

@gmlewis I've rebased this and it should be good to go.

@alexandear I've updated the example to be closer to the other patterns and to have a valid comment.

gmlewis · 2026-04-16T13:22:27Z

```diff
+		return nil, fileContent, resp, err
 	}

-	for _, contents := range dirContents {
```

After closer inspection, the docs here:
https://docs.github.com/en/rest/repos/contents?apiVersion=2022-11-28#get-repository-content
say that contents from a repo directory can be downloaded with this endpoint.

Before, I said I was concerned about losing the functionality of following redirects, specifically in these lines 204-220. However, this code is not following redirects, it is downloading the contents of a directory.

Are we losing that capability in this PR?

I'm wondering why there are no unit tests that exercise the ability to download the contents from a repo directory?

I'm not sure that I follow your concern. The code in lines 204-220 is only triggered when a file is larger than 1 MB or the input is invalid (a directory, not a file). For files larger than 1 MB, the updated code uses the download link already returned instead of making an additional API call and iterating through all of the directory's files. For invalid input, the updated code errors early, whereas the old code runs all the way to the end before erroring.

I can add a test to show this behaviour? As there wasn't already a test and you asked for the tests to be aligned I didn't add one when I spotted that it was missing earlier.

I've added tests for calling a directory to show that it errors.

If I'm reading https://docs.github.com/en/rest/repos/contents?apiVersion=2022-11-28#get-repository-content correctly, the provided link can point to a repo directory and ALL the contents of that directory will be downloaded. Am I reading that wrong? Or are you saying that even though the docs claim this feature, it doesn't actually work?

I don't have time to investigate this myself at the moment, so any insight you can provide would help tremendously.

I've investigated: when calling the endpoint on a directory, any child directories we get back have an empty download link. If it is possible to download a whole directory, you probably need to use the raw content type.

Also I don't see that in the description, where are you seeing it?

Some context:

Add DownloadContentsWithMeta to receive RepositoryContent #1810

Improve DownloadContents and DownloadContentsWithMeta methods #3573

Also the unit tests that are modified in this PR show that a list could historically be returned which represented the names of items within a directory.

When I'm off my phone I'll look at the official docs again and quote the part that I'm concerned about.

The API will return a list if you ask for the parent dir contents; my point here is that it's unnecessary.

The first PR you link above just copies the download function and also returns the metadata. The second PR adds a check for the content in the initial API call.

AFAIK the content API has always returned the download URL for a file, so the dir call and loop have always been unnecessary. Remember both calls are going to the same API, and I can't believe that even GitHub would skip the download URL in the specific response and make you make a second call whose response is also limited.

OK, I'm trying to write an example that lists the contents in a directory, and I'm not getting it to work.

Here are the paragraphs that concern me:

Gets the contents of a file or directory in a repository. Specify the file path or directory with the path parameter. If you omit the path parameter, you will receive the contents of the repository's root directory.

application/vnd.github.object+json: Returns the contents in a consistent object format regardless of the content type. For example, instead of an array of objects for a directory, the response will be an object with an entries attribute containing the array of objects.

If the content is a directory, the response will be an array of objects, one object for each item in the directory. When listing the contents of a directory, submodules have their "type" specified as "file". Logically, the value should be "submodule". This behavior exists for backwards compatibility purposes. In the next major version of the API, the type will be returned as "submodule".
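
The two default response shapes quoted above (an object for a single item, an array for a directory) can be told apart before decoding. The following is a minimal sketch under stated assumptions: `entry` is a hypothetical subset of the item fields, `decodeContents` is an illustrative helper rather than go-github's implementation, and the sample payloads are made up.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// entry is a hypothetical subset of the fields on a content item.
type entry struct {
	Name        string `json:"name"`
	Type        string `json:"type"`
	DownloadURL string `json:"download_url"` // a JSON null leaves this empty
}

// decodeContents inspects the first non-space byte of the body to decide
// whether it is a single item (object) or a directory listing (array).
func decodeContents(body []byte) (item *entry, dir []entry, err error) {
	trimmed := bytes.TrimSpace(body)
	if len(trimmed) == 0 {
		return nil, nil, errors.New("empty response body")
	}
	switch trimmed[0] {
	case '[':
		if err := json.Unmarshal(trimmed, &dir); err != nil {
			return nil, nil, err
		}
		return nil, dir, nil
	case '{':
		item = new(entry)
		if err := json.Unmarshal(trimmed, item); err != nil {
			return nil, nil, err
		}
		return item, nil, nil
	default:
		return nil, nil, fmt.Errorf("unexpected JSON value starting with %q", trimmed[0])
	}
}

func main() {
	fileBody := []byte(`{"name":"README.md","type":"file","download_url":"https://example.invalid/README.md"}`)
	dirBody := []byte(`[{"name":"a.txt","type":"file"},{"name":"sub","type":"dir","download_url":null}]`)

	for _, body := range [][]byte{fileBody, dirBody} {
		item, dir, err := decodeContents(body)
		switch {
		case err != nil:
			fmt.Println("error:", err)
		case item != nil:
			fmt.Println("single item:", item.Name)
		default:
			fmt.Println("directory with", len(dir), "entries")
		}
	}
}
```

A caller that only wants a file can treat the array case as invalid input and return an error immediately.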

Before we rip out functionality that someone might miss, though, I would like another set of eyes on this.

@alexandear - what are your thoughts about ripping out the for loops that are being removed in this PR?
Will anyone miss them?

If I'm reading @stevehipwell's arguments correctly, he is saying that they never actually did anything, although we have a hint of proof that at one point they did something because he had to remove parts of the unit tests (that contained objects with arrays) to get tests to pass... so that is another one of my concerns.

@gmlewis I'm not saying they didn't do anything; I'm saying the implementation was inefficient and unnecessary. The mocks needed updating because they were implemented to make the tests pass.

So from first principles: the new mocks actually match the API schema, and the new code functions correctly and mirrors the behaviour of the old code. The only difference in functionality is that the new code doesn't fail when getting the content of a file that's larger than 1 MB and at an index greater than 1000 in its directory.

OK, thank you, @stevehipwell. Sounds good to me.
I know @alexandear already approved, but let's please just wait for one more confirmation before merging.
Thank you for your patience with me! I appreciate it.

alexandear approved these changes Apr 15, 2026

Comment thread tools/metadata/main_test.go Outdated

gmlewis requested changes Apr 15, 2026

Comment thread openapi_operations.yaml Outdated

gmlewis mentioned this pull request Apr 15, 2026

chore: Update openapi_operations.yaml #4157

Merged

alexandear reviewed Apr 16, 2026

Comment thread example/contents/main.go Outdated

stevehipwell added 6 commits April 16, 2026 13:40

feat: Refactor repositories download contents

e66fe34

Signed-off-by: Steve Hipwell <steve.hipwell@gmail.com>

fixup! feat: Refactor repositories download contents

55c3ae6

fixup! feat: Refactor repositories download contents

c0dca2a

fixup! feat: Refactor repositories download contents

ba15a1c

fixup! feat: Refactor repositories download contents

7b15366

Signed-off-by: Steve Hipwell <steve.hipwell@gmail.com>

fixup! feat: Refactor repositories download contents

d25a1ae

stevehipwell force-pushed the fix-download-contents branch from 19bc2aa to d25a1ae Compare April 16, 2026 12:41

gmlewis reviewed Apr 16, 2026

fixup! feat: Refactor repositories download contents

52d75be

Signed-off-by: Steve Hipwell <steve.hipwell@gmail.com>

stevehipwell force-pushed the fix-download-contents branch from 280f864 to 52d75be Compare April 16, 2026 15:21
