Skip to content

Commit f23ac77

Browse files
pks-tgitster
authored andcommitted
commit: avoid parsing non-commits in lookup_commit_reference_gently()
The function `lookup_commit_reference_gently()` can be used to look up a committish by object ID. As such, the function knows to peel for example tag objects so that we eventually end up with the commit. The function is used quite a lot throughout our tree. One such user is "shallow.c" via `assign_shallow_commits_to_refs()`. The intent of this function is to figure out whether a shallow push is missing any objects that are required to satisfy the ref updates, and if so, which of the ref updates is missing objects. This is done by painting the tree with `UNINTERESTING`. We start painting by calling `refs_for_each_ref()` so that we can mark all existing referenced objects as the boundary of objects that we already have, and which are supposed to be fully connected. The reference tips are then parsed via `lookup_commit_reference_gently()`, and the commit is then marked as uninteresting. But references may not necessarily point to a committish, and if a lot of them aren't then this step takes a lot of time. This is mostly due to the way that `lookup_commit_reference_gently()` is implemented: before we learn about the type of the object we already call `parse_object()` on the object ID. This has two consequences: - We parse all objects, including trees and blobs, even though we don't even need the contents of them. - More importantly though, `parse_object()` will cause us to check whether the object ID matches its contents. Combined this means that we deflate and hash every non-committish object, and that of course ends up being both CPU- and memory-intensive. Improve the logic so that we first use `peel_object()`. This function won't parse the object for us, and thus it allows us to learn about the object's type before we parse and return it. The following benchmark pushes a single object from a shallow clone into a repository that has 100,000 refs. These refs were created by listing all objects via `git rev-list(1) --objects --all` and creating refs for a subset of them, so lots of those refs will cover non-commit objects. Benchmark 1: git-receive-pack (rev = HEAD~) Time (mean ± σ): 62.571 s ± 0.413 s [User: 58.331 s, System: 4.053 s] Range (min … max): 62.191 s … 63.010 s 3 runs Benchmark 2: git-receive-pack (rev = HEAD) Time (mean ± σ): 38.339 s ± 0.192 s [User: 36.220 s, System: 1.992 s] Range (min … max): 38.176 s … 38.551 s 3 runs Summary git-receive-pack . </tmp/input (rev = HEAD) ran 1.63 ± 0.01 times faster than git-receive-pack . </tmp/input (rev = HEAD~) This leads to a sizeable speedup as we now skip reading and parsing non-commit objects. Before this change we spent around 40% of the time in `assign_shallow_commits_to_refs()`, after the change we only spend around 1.2% of the time in there. Almost the entire remainder of the time is spent in git-rev-list(1) to perform the connectivity checks. Despite the speedup though, this also leads to a massive reduction in allocations. Before: HEAP SUMMARY: in use at exit: 352,480,441 bytes in 97,185 blocks total heap usage: 2,793,820 allocs, 2,696,635 frees, 67,271,456,983 bytes allocated And after: HEAP SUMMARY: in use at exit: 17,524,978 bytes in 22,393 blocks total heap usage: 33,313 allocs, 10,920 frees, 407,774,251 bytes allocated Note that when all references refer to commits performance stays roughly the same, as expected. The following benchmark was executed with 600k commits: Benchmark 1: git-receive-pack (rev = HEAD~) Time (mean ± σ): 9.101 s ± 0.006 s [User: 8.800 s, System: 0.520 s] Range (min … max): 9.095 s … 9.106 s 3 runs Benchmark 2: git-receive-pack (rev = HEAD) Time (mean ± σ): 9.128 s ± 0.094 s [User: 8.820 s, System: 0.522 s] Range (min … max): 9.019 s … 9.188 s 3 runs Summary git-receive-pack (rev = HEAD~) ran 1.00 ± 0.01 times faster than git-receive-pack (rev = HEAD) This will be improved in the next commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
1 parent 67ad421 commit f23ac77

3 files changed

Lines changed: 50 additions & 10 deletions

File tree

commit.c

Lines changed: 27 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -43,13 +43,35 @@ const char *commit_type = "commit";
4343
struct commit *lookup_commit_reference_gently(struct repository *r,
4444
const struct object_id *oid, int quiet)
4545
{
46-
struct object *obj = deref_tag(r,
47-
parse_object(r, oid),
48-
NULL, 0);
46+
const struct object_id *maybe_peeled;
47+
struct object_id peeled_oid;
48+
struct object *object;
49+
enum object_type type;
4950

50-
if (!obj)
51+
switch (peel_object_ext(r, oid, &peeled_oid, 0, &type)) {
52+
case PEEL_NON_TAG:
53+
maybe_peeled = oid;
54+
break;
55+
case PEEL_PEELED:
56+
maybe_peeled = &peeled_oid;
57+
break;
58+
default:
5159
return NULL;
52-
return object_as_type(obj, OBJ_COMMIT, quiet);
60+
}
61+
62+
if (type != OBJ_COMMIT) {
63+
if (!quiet)
64+
error(_("object %s is a %s, not a %s"),
65+
oid_to_hex(oid), type_name(type),
66+
type_name(OBJ_COMMIT));
67+
return NULL;
68+
}
69+
70+
object = parse_object(r, maybe_peeled);
71+
if (!object)
72+
return NULL;
73+
74+
return object_as_type(object, OBJ_COMMIT, quiet);
5375
}
5476

5577
struct commit *lookup_commit_reference(struct repository *r, const struct object_id *oid)

object.c

Lines changed: 18 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -207,10 +207,11 @@ struct object *lookup_object_by_type(struct repository *r,
207207
}
208208
}
209209

210-
enum peel_status peel_object(struct repository *r,
211-
const struct object_id *name,
212-
struct object_id *oid,
213-
unsigned flags)
210+
enum peel_status peel_object_ext(struct repository *r,
211+
const struct object_id *name,
212+
struct object_id *oid,
213+
unsigned flags,
214+
enum object_type *typep)
214215
{
215216
struct object *o = lookup_unknown_object(r, name);
216217

@@ -220,8 +221,10 @@ enum peel_status peel_object(struct repository *r,
220221
return PEEL_INVALID;
221222
}
222223

223-
if (o->type != OBJ_TAG)
224+
if (o->type != OBJ_TAG) {
225+
*typep = o->type;
224226
return PEEL_NON_TAG;
227+
}
225228

226229
while (o && o->type == OBJ_TAG) {
227230
o = parse_object(r, &o->oid);
@@ -241,9 +244,19 @@ enum peel_status peel_object(struct repository *r,
241244
return PEEL_INVALID;
242245

243246
oidcpy(oid, &o->oid);
247+
*typep = o->type;
244248
return PEEL_PEELED;
245249
}
246250

251+
enum peel_status peel_object(struct repository *r,
252+
const struct object_id *name,
253+
struct object_id *oid,
254+
unsigned flags)
255+
{
256+
enum object_type dummy;
257+
return peel_object_ext(r, name, oid, flags, &dummy);
258+
}
259+
247260
struct object *parse_object_buffer(struct repository *r, const struct object_id *oid, enum object_type type, unsigned long size, void *buffer, int *eaten_p)
248261
{
249262
struct object *obj;

object.h

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -309,6 +309,11 @@ enum peel_status peel_object(struct repository *r,
309309
const struct object_id *name,
310310
struct object_id *oid,
311311
unsigned flags);
312+
enum peel_status peel_object_ext(struct repository *r,
313+
const struct object_id *name,
314+
struct object_id *oid,
315+
unsigned flags,
316+
enum object_type *typep);
312317

313318
struct object_list *object_list_insert(struct object *item,
314319
struct object_list **list_p);

0 commit comments

Comments
 (0)