-
Notifications
You must be signed in to change notification settings - Fork 165
Parser performance #273
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Parser performance #273
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this layer of caching even makes sense. It's just overhead.
I can't think of any reason not to just maintain a map of name->property_info from the schema.
I'm fuzzy on all the various event type schemas, but looking at this code, it seems like the property_info blobs are expected to be consistent. Just trying to think through whether we ever need any of the old fallback code if we pre-populate and cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes and no. These were just the smallest and quickest wins that I could port across.
Since a cache existed, I kept it and just addressed its bugged performance characteristics.
Though, in practice, the O(1) map lookup only seems to have gains for events with 12+ fields. (Though I see much beter gains in Rust because of UTF-8 - smaller keys to hash).
We could just delete it - and use the (fallback) hinted linear scan.
This is actually even faster for my benchmark which accesses properties in event order.
I have implementioned both approaches - and will open two PRs for you.
property_info has dynamic per-event information for events with more than one variable length field - whereas the name->index mapping is static per-schema.
I actually do a lot more work upfront in my library - I allocate a parser, calculate the offsets for all fixed location properties and cache this. Variable location property offsets are still calculated lazily. The lookup step just resets the parser.
So you could pre-populate and cache a partial property_info[] also, but that's a bigger lift.
I'm happy to provide you both pre-release access to my library if you are interested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PRs. Did you compare the two against each other? Just wondering whether you have a preference.
Re. lookup, yes the linear scans are SUPER fast and are almost always faster than the overhead of the hashing and node memory management from the maps.
I'm still trying to convince myself we need all that overhead in the parser to get the property_info. There's a lot more code in there than I would have expected taking a look with fresh eyes.
Hinting and/or simply tracking the last index between calls is probably more than enough and wouldn't need a cache at all.
After looking at them both, I really like the idea of having less than more. I can see the merits of 274 -- and it would line up most accurately with the current model. 275 seems better to me though, and I'm thinking it doesn't go far enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some performance numbers for both approaches are in #275.
The hinted linear scan wins best case, and loses worst case. And the loss scales with the number of fields. TL;DR - O(1) versus O(N) scaling
If you can improve the map performance in a non-API-breaking way I would do that...
Otherwise, due to its simplicity, I would lean towards the linear scan using the highest accessed hint - despite it having worse performance in (arguably perverse) scenarios. I always consume properties in roughly the event order as this feels natural to me. It feels reasonable to assume that other developers mostly do the same...and will hit the fast path...
My only worry is that the user needs to understand the implemention details if they want best performance though. Whereas the map always gives adequate performance.
re: not far enough
What else are you thinking?
In my new library I optimised the map's performance, and provided a second API.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to decide whether even the vector in the parser makes sense. I will take some time today and look more closely to see if it's even worth THAT allocation.
I like the idea of the hint as a way to allow callers to help mitigate non-scan lookups. It's a nice fallback that isn't too hard to explain and doesn't require any caching in the core layer. Well, aside from the schema caching which has a huge perf benefit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eager caching is definitely the winner of the three approaches for my all-fields-sequentially benchmarks.
It's less good for sparse/partial access though - especially for long events where only early properties are accessed.
Here are the times for accessing only the 2nd/4th/6th properties.
For an eager caching approach, this can potentially be mitigated by providing an (optional) max_eager_index limit to the constructor, and additional logic to handle lookups beyond the eager cache... at this point you are probably better off taking my approach of doing upfront parser pre-work and caching / resetting that.
In practice though #276 is probably good enough for most real world use cases?
Do we have any of the history on why the lazy approach was chosen initially?
I assume there was something about how the team was using it internally at the time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An earlier iteration of our ETW library parsing looked more like a map from name->functor to accept the data. That allowed a single, forward-only pass through the event. Only fields with a binding were actually extracted. When krabs was written, the original dev tried to keep that support for sparse access intact.
Krabs is nice at the consumer side, but the lazy, pull-through model comes with some major costs -- especially when using C# because of the thunk (hence the schema locator was added). I think #275 is probably good enough, especially with hinting plumbed through. If we really wanted to chase the dragon and squeeze out every last ns, a forward only, no-cache walk with binding layer would probably be the fastest, but it'd be a much more complicated change and would change the client API shape a lot.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at our internal usage patterns; in most cases we do not end up reading all the properties on the event, we only care about a handful of properties (< 40% of the total properties available).
The benchmarks suggest #275 is the fastest for this pattern. Even in cases where we end up reading all the properties #276 will only be significantly better if the number of properties on the event is large (no such internal usage exists today), for a smaller number of properties the performance looks nearly the same.
I'm leaning towards 275 as the preferred approach to go with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @vmurthysuhas. That's my take too. It might still be worth pulling in the hint parameter changes from 276. We can even hold off on touching the managed layer for now and add that later if we need to.
@jdu2600 do you want to update 275 or would you like me to take your changes and apply the hinted overloads as a new PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 hinted overloads added to #275
The performance is less to do with the number of properties accessed, and more to do with the maximum property index accessed. And also to do with whether any slow size calculations (esp uncounted strings) can be skipped.
A better sparse benchmark might be every third property.
This always accesses a property near the end so #276's eager eval wins.
But if only the middle property is accessed, then #275's lazy eval sometimes wins.
But this depends on the event layout - and where the fixed/variable length fields sit.