Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 4 additions & 3 deletions O365.Security.Native.ETW.Debug.nuspec
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
<package xmlns="http://schemas.microsoft.com/packaging/2011/08/nuspec.xsd">
<metadata>
<id>Microsoft.O365.Security.Native.ETW.Debug</id>
<version>4.4.7</version>
<version>4.4.8</version>
<title>Microsoft.O365.Security.Native.ETW Debug - managed wrappers for krabsetw</title>
<authors>Microsoft</authors>
<owners>Microsoft</owners>
Expand All @@ -12,8 +12,9 @@
<description>Microsoft.O365.Security.Native.ETW Debug is a managed wrapper around the krabsetw ETW library. This is the Debug build.</description>
<summary>Microsoft.O365.Security.Native.ETW Debug is a managed wrapper around the krabsetw ETW library. This is the Debug build.</summary>
<releaseNotes>
Version 4.4.7:
- Add process_start_key() to read ProcessStartKey from extended data
Version 4.4.8:
- Optimize parser property lookup with persistent name-to-index map
- Use std::wstring_view for parser property name parameters
</releaseNotes>
<copyright>© Microsoft Corporation. All rights reserved.</copyright>
<language />
Expand Down
7 changes: 4 additions & 3 deletions O365.Security.Native.ETW.nuspec
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
<package xmlns="http://schemas.microsoft.com/packaging/2011/08/nuspec.xsd">
<metadata>
<id>Microsoft.O365.Security.Native.ETW</id>
<version>4.4.7</version>
<version>4.4.8</version>
<title>Microsoft.O365.Security.Native.ETW - managed wrappers for krabsetw</title>
<authors>Microsoft</authors>
<owners>Microsoft</owners>
Expand All @@ -12,8 +12,9 @@
<description>Microsoft.O365.Security.Native.ETW is a managed wrapper around the krabsetw ETW library.</description>
<summary>Microsoft.O365.Security.Native.ETW is a managed wrapper around the krabsetw ETW library.</summary>
<releaseNotes>
Version 4.4.7:
- Add process_start_key() to read ProcessStartKey from extended data
Version 4.4.8:
- Optimize parser property lookup with persistent name-to-index map
- Use std::wstring_view for parser property name parameters
</releaseNotes>
<copyright>© Microsoft Corporation. All rights reserved.</copyright>
<language />
Expand Down
118 changes: 68 additions & 50 deletions krabs/krabs/parser.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -4,9 +4,9 @@
#pragma once

#include <cassert>
#include <deque>
#include <utility>
#include <stdexcept>
#include <string_view>
#include <vector>

#include <functional>

Expand Down Expand Up @@ -76,7 +76,7 @@ namespace krabs {
* </remarks>
*/
template <typename T>
bool try_parse(const std::wstring &name, T &out);
bool try_parse(std::wstring_view name, T &out);

/**
* <summary>
Expand All @@ -85,23 +85,24 @@ namespace krabs {
* </summary>
*/
template <typename T>
T parse(const std::wstring &name);
T parse(std::wstring_view name);

template <typename Adapter>
auto view_of(const std::wstring &name, Adapter &adapter) -> collection_view<typename Adapter::const_iterator>;
auto view_of(std::wstring_view name, Adapter &adapter) -> collection_view<typename Adapter::const_iterator>;

private:
property_info find_property(const std::wstring &name);
void cache_property(const wchar_t *name, property_info info);
property_info find_property(std::wstring_view name);
void cache_property(ULONG index, property_info info);

private:
const schema &schema_;
const BYTE *pEndBuffer_;
BYTE *pBufferIndex_;
ULONG lastPropertyIndex_;

// Maintain a mapping from property name to blob data index.
std::deque<std::pair<const wchar_t *, property_info>> propertyCache_;
// Persistent name to index map shared across all events of the same type.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this layer of caching even makes sense. It's just overhead.
I can't think of any reason not to just maintain a map of name->property_info from the schema.

I'm fuzzy on all the various event type schemas, but looking at this code, it seems like the property_info blobs are expected to be consistent. Just trying to think through whether we ever need any of the old fallback code if we pre-populate and cache.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes and no. These were just the smallest and quickest wins that I could port across.

Since a cache existed, I kept it and just addressed its bugged performance characteristics.
Though, in practice, the O(1) map lookup only seems to have gains for events with 12+ fields. (Though I see much beter gains in Rust because of UTF-8 - smaller keys to hash).

We could just delete it - and use the (fallback) hinted linear scan.
This is actually even faster for my benchmark which accesses properties in event order.

I have implementioned both approaches - and will open two PRs for you.


property_info has dynamic per-event information for events with more than one variable length field - whereas the name->index mapping is static per-schema.

I actually do a lot more work upfront in my library - I allocate a parser, calculate the offsets for all fixed location properties and cache this. Variable location property offsets are still calculated lazily. The lookup step just resets the parser.

So you could pre-populate and cache a partial property_info[] also, but that's a bigger lift.

I'm happy to provide you both pre-release access to my library if you are interested.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PRs. Did you compare the two against each other? Just wondering whether you have a preference.
Re. lookup, yes the linear scans are SUPER fast and are almost always faster than the overhead of the hashing and node memory management from the maps.

I'm still trying to convince myself we need all that overhead in the parser to get the property_info. There's a lot more code in there than I would have expected taking a look with fresh eyes.

Hinting and/or simply tracking the last index between calls is probably more than enough and wouldn't need a cache at all.

After looking at them both, I really like the idea of having less than more. I can see the merits of 274 -- and it would line up most accurately with the current model. 275 seems better to me though, and I'm thinking it doesn't go far enough.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some performance numbers for both approaches are in #275.
The hinted linear scan wins best case, and loses worst case. And the loss scales with the number of fields. TL;DR - O(1) versus O(N) scaling

If you can improve the map performance in a non-API-breaking way I would do that...

Otherwise, due to its simplicity, I would lean towards the linear scan using the highest accessed hint - despite it having worse performance in (arguably perverse) scenarios. I always consume properties in roughly the event order as this feels natural to me. It feels reasonable to assume that other developers mostly do the same...and will hit the fast path...

My only worry is that the user needs to understand the implemention details if they want best performance though. Whereas the map always gives adequate performance.

re: not far enough
What else are you thinking?


In my new library I optimised the map's performance, and provided a second API.

  • UTF-16LE -> UTF-8 keys
  • SipHash -> FxHash
// grab the parser from the cache - it's pre-warmed and reset for use
let mut parser = schema_cache.parser(event)?;

// convenience API - FxHashMap lookup keyed by UTF-8 property name
let process_id: u32 = parser.get("ProcessID")?;

// performance API - check hinted index first, then fall back to map lookup
let image_name: String = parser.get_hinted("ImageName", 10)?;

@kylereedmsft kylereedmsft Apr 3, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to decide whether even the vector in the parser makes sense. I will take some time today and look more closely to see if it's even worth THAT allocation.

I like the idea of the hint as a way to allow callers to help mitigate non-scan lookups. It's a nice fallback that isn't too hard to explain and doesn't require any caching in the core layer. Well, aside from the schema caching which has a huge perf benefit.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eager caching is definitely the winner of the three approaches for my all-fields-sequentially benchmarks.

Event Type 4.4.6 PR#274 PR#275 PR#276 PR#276 hinted
Process Start 2620 ns 1322 ns 824 ns 624 ns 465 ns
Thread Start 1269 ns 495 ns 280 ns 152 ns 158 ns
RPC Client Call 871 ns 549 ns 304 ns 187 ns 194 ns
File Create 653 ns 415 ns 256 ns 192 ns 175 ns
Registry CreateKey 532 ns 351 ns 232 ns 153 ns 152 ns

It's less good for sparse/partial access though - especially for long events where only early properties are accessed.
Here are the times for accessing only the 2nd/4th/6th properties.
For an eager caching approach, this can potentially be mitigated by providing an (optional) max_eager_index limit to the constructor, and additional logic to handle lookups beyond the eager cache... at this point you are probably better off taking my approach of doing upfront parser pre-work and caching / resetting that.

Event Props PR#274 PR#275 PR#276 PR#276 hinted PR#276 + max_eager(5) PR#276 + max_eager(5) + hinted
Process Start 16 233 ns 147 ns 272 ns 260 ns 102 ns 91 ns
RPC ClientCall 9 204 ns 158 ns 121 ns 114 ns 115 ns 103 ns
Thread Start 8 206 ns 140 ns 103 ns 88 ns 94 ns 81 ns
File Create 7 202 ns 137 ns 126 ns 107 ns 94 ns 80 ns
Registry CreateKey 6 237 ns 170 ns 129 ns 119 ns 129 ns 120 ns

In practice though #276 is probably good enough for most real world use cases?

Do we have any of the history on why the lazy approach was chosen initially?
I assume there was something about how the team was using it internally at the time?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An earlier iteration of our ETW library parsing looked more like a map from name->functor to accept the data. That allowed a single, forward-only pass through the event. Only fields with a binding were actually extracted. When krabs was written, the original dev tried to keep that support for sparse access intact.
Krabs is nice at the consumer side, but the lazy, pull-through model comes with some major costs -- especially when using C# because of the thunk (hence the schema locator was added). I think #275 is probably good enough, especially with hinting plumbed through. If we really wanted to chase the dragon and squeeze out every last ns, a forward only, no-cache walk with binding layer would probably be the fastest, but it'd be a much more complicated change and would change the client API shape a lot.

@vmurthysuhas vmurthysuhas Apr 9, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at our internal usage patterns; in most cases we do not end up reading all the properties on the event, we only care about a handful of properties (< 40% of the total properties available).
The benchmarks suggest #275 is the fastest for this pattern. Even in cases where we end up reading all the properties #276 will only be significantly better if the number of properties on the event is large (no such internal usage exists today), for a smaller number of properties the performance looks nearly the same.
I'm leaning towards 275 as the preferred approach to go with.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @vmurthysuhas. That's my take too. It might still be worth pulling in the hint parameter changes from 276. We can even hold off on touching the managed layer for now and add that later if we need to.
@jdu2600 do you want to update 275 or would you like me to take your changes and apply the hinted overloads as a new PR?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 hinted overloads added to #275

we only care about a handful of properties (< 40% of the total properties available).
The benchmarks suggest #275 is the fastest for this pattern. Even in cases where we end up reading all the properties #276 will only be significantly better if the number of properties on the event is large (no such internal usage exists today), for a smaller number of properties the performance looks nearly the same.

The performance is less to do with the number of properties accessed, and more to do with the maximum property index accessed. And also to do with whether any slow size calculations (esp uncounted strings) can be skipped.

A better sparse benchmark might be every third property.
This always accesses a property near the end so #276's eager eval wins.

Event Props #275 #275 reverse #275 hinted (+reverse ) #276 #276 reverse #276 hinted (+reverse )
Process 6 of 16 482.9 1105.4 498.3 385.4 779.2 307.3
Thread 4 of 10 199.9 309.6 183.7 117.0 240.0 94.4
RPC 3 of 9 166.7 220.3 155.1 121.0 193.4 99.2
File 3 of 7 208.3 237.4 197.5 162.5 213.5 134.1
Registry 2 of 6 111.3 120.4 109.4 88.3 104.9 81.4

But if only the middle property is accessed, then #275's lazy eval sometimes wins.
But this depends on the event layout - and where the fixed/variable length fields sit.

Event type #275 #275 hinted #276 #276 hinted
Process 184.4 145.2 277.8 234.1
Thread 130.7 127.0 87.1 68.7
File 97.6 93.0 98.4 85.5
Registry 98.8 93.3 82.3 70.3
RPC 139.5 127.5 120.3 98.9

const property_name_map *pPropertyNames_;
// Maintain a mapping from property index to blob data location.
std::vector<property_info> propertyCache_;
};

// Implementation
Expand All @@ -112,14 +113,16 @@ namespace krabs {
, pEndBuffer_((BYTE*)s.record_.UserData + s.record_.UserDataLength)
, pBufferIndex_((BYTE*)s.record_.UserData)
, lastPropertyIndex_(0)
, pPropertyNames_(s.pPropertyNames_)
, propertyCache_(s.pSchema_->PropertyCount)
{}

inline property_iterator parser::properties() const
{
return property_iterator(schema_);
}

inline property_info parser::find_property(const std::wstring &name)
inline property_info parser::find_property(std::wstring_view name)
{
// A schema contains a collection of properties that are keyed by name.
// These properties are stored in a blob of bytes that needs to be
Expand All @@ -129,15 +132,40 @@ namespace krabs {
// the contents within it. This is janky, so our strategy is to
// minimize this as much as possible via caching.

// The first step is to use our cache for the property to see if we've
// discovered it already.
for (auto &item : propertyCache_) {
if (name == item.first) {
return item.second;
const ULONG totalPropCount = schema_.pSchema_->PropertyCount;

// Resolve property name to index.
ULONG index = totalPropCount; // sentinel = not found
if (pPropertyNames_) {
// Fast path: use the persistent name to index map shared across
// all events of the same type.
auto it = pPropertyNames_->find(name);
if (it != pPropertyNames_->end()) {
index = it->second;
}
} else {
// Fallback: linear scan of property names in the schema.
for (ULONG i = 0; i < totalPropCount; ++i) {
auto &propInfo = schema_.pSchema_->EventPropertyInfoArray[i];
const wchar_t *pName = reinterpret_cast<const wchar_t*>(
reinterpret_cast<const BYTE*>(schema_.pSchema_) +
propInfo.NameOffset);
if (name == pName) {
index = i;
break;
}
}
}

const ULONG totalPropCount = schema_.pSchema_->PropertyCount;
if (index >= totalPropCount) {
return property_info();
}

// The first step is to use our cache for the property to see if we've
// discovered it already.
if (index < lastPropertyIndex_) {
return propertyCache_[index];
}

assert((pBufferIndex_ <= pEndBuffer_ && pBufferIndex_ >= schema_.record_.UserData) &&
"invariant: we should've already thrown for falling off the edge");
Expand All @@ -153,16 +181,13 @@ namespace krabs {
// to find it. While we're going through the blob to find it, we'll
// remember what we've seen to save time later.
//
// Question: Why don't we just populate the cache before looking up any
// properties and simplify our code (less state, etc)?
//
// Answer: Doing that introduces overhead in the case that only a
// subset of properties are needed. While this code is a bit
// more complicated, we introduce no additional performance
// overhead at runtime.
for (auto &i = lastPropertyIndex_; i < totalPropCount; ++i) {
// Note: The name-to-index map is built once per schema type (cheap
// metadata scan). But the blob walk below is lazy per-event -- we
// only walk forward to the requested index, avoiding overhead when
// only a subset of properties are needed.
while (lastPropertyIndex_ <= index) {

auto &currentPropInfo = schema_.pSchema_->EventPropertyInfoArray[i];
auto &currentPropInfo = schema_.pSchema_->EventPropertyInfoArray[lastPropertyIndex_];
const wchar_t *pName = reinterpret_cast<const wchar_t*>(
reinterpret_cast<const BYTE*>(schema_.pSchema_) +
currentPropInfo.NameOffset);
Expand All @@ -179,26 +204,19 @@ namespace krabs {
}

property_info propInfo(pBufferIndex_, currentPropInfo, propertyLength);
cache_property(pName, propInfo);
cache_property(lastPropertyIndex_, propInfo);

// advance the buffer index since we've already processed this property
pBufferIndex_ += propertyLength;

// The property was found, return it
if (name == pName) {
// advance the index since we've already processed this property
++i;
return propInfo;
}
lastPropertyIndex_++;
}

// property wasn't found, return an empty propInfo
return property_info();
return propertyCache_[index];
}

inline void parser::cache_property(const wchar_t *name, property_info propInfo)
inline void parser::cache_property(ULONG index, property_info info)
{
propertyCache_.push_front(std::make_pair(name, propInfo));
propertyCache_[index] = info;
}

inline void throw_if_property_not_found(const property_info &propInfo)
Expand Down Expand Up @@ -227,7 +245,7 @@ namespace krabs {
// ------------------------------------------------------------------------

template <typename T>
bool parser::try_parse(const std::wstring &name, T &out)
bool parser::try_parse(std::wstring_view name, T &out)
{
try {
out = parse<T>(name);
Expand All @@ -252,7 +270,7 @@ namespace krabs {
// ------------------------------------------------------------------------

template <typename T>
T parser::parse(const std::wstring &name)
T parser::parse(std::wstring_view name)
{
auto propInfo = find_property(name);
throw_if_property_not_found(propInfo);
Expand All @@ -268,7 +286,7 @@ namespace krabs {
}

template<>
inline bool parser::parse<bool>(const std::wstring& name)
inline bool parser::parse<bool>(std::wstring_view name)
{
auto propInfo = find_property(name);
throw_if_property_not_found(propInfo);
Expand All @@ -280,7 +298,7 @@ namespace krabs {
}

template <>
inline std::wstring parser::parse<std::wstring>(const std::wstring &name)
inline std::wstring parser::parse<std::wstring>(std::wstring_view name)
{
auto propInfo = find_property(name);
throw_if_property_not_found(propInfo);
Expand All @@ -294,7 +312,7 @@ namespace krabs {
}

template <>
inline std::string parser::parse<std::string>(const std::wstring &name)
inline std::string parser::parse<std::string>(std::wstring_view name)
{
auto propInfo = find_property(name);
throw_if_property_not_found(propInfo);
Expand All @@ -308,7 +326,7 @@ namespace krabs {
}

template<>
inline const counted_string* parser::parse<const counted_string*>(const std::wstring &name)
inline const counted_string* parser::parse<const counted_string*>(std::wstring_view name)
{
auto propInfo = find_property(name);
throw_if_property_not_found(propInfo);
Expand All @@ -319,7 +337,7 @@ namespace krabs {
}

template<>
inline binary parser::parse<binary>(const std::wstring &name)
inline binary parser::parse<binary>(std::wstring_view name)
{
auto propInfo = find_property(name);
throw_if_property_not_found(propInfo);
Expand All @@ -331,7 +349,7 @@ namespace krabs {

template<>
inline ip_address parser::parse<ip_address>(
const std::wstring &name)
std::wstring_view name)
{
auto propInfo = find_property(name);
throw_if_property_not_found(propInfo);
Expand All @@ -354,7 +372,7 @@ namespace krabs {

template<>
inline socket_address parser::parse<socket_address>(
const std::wstring &name)
std::wstring_view name)
{
auto propInfo = find_property(name);
throw_if_property_not_found(propInfo);
Expand All @@ -366,7 +384,7 @@ namespace krabs {

template<>
inline sid parser::parse<sid>(
const std::wstring& name)
std::wstring_view name)
{
auto propInfo = find_property(name);
throw_if_property_not_found(propInfo);
Expand Down Expand Up @@ -404,7 +422,7 @@ namespace krabs {
}

template<>
inline pointer parser::parse<pointer>(const std::wstring& name)
inline pointer parser::parse<pointer>(std::wstring_view name)
{
auto propInfo = find_property(name);
throw_if_property_not_found(propInfo);
Expand All @@ -418,7 +436,7 @@ namespace krabs {
// ------------------------------------------------------------------------

template <typename Adapter>
auto parser::view_of(const std::wstring &name, Adapter &adapter)
auto parser::view_of(std::wstring_view name, Adapter &adapter)
-> collection_view<typename Adapter::const_iterator>
{
auto propInfo = find_property(name);
Expand Down
4 changes: 4 additions & 0 deletions krabs/krabs/schema.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -308,6 +308,8 @@ namespace krabs {
private:
const EVENT_RECORD &record_;
const TRACE_EVENT_INFO *pSchema_;
// Persistent name to index map, owned by schema_locator. May be nullptr.
const property_name_map *pPropertyNames_;

private:
friend std::wstring event_name(const schema &);
Expand Down Expand Up @@ -337,11 +339,13 @@ namespace krabs {
inline schema::schema(const EVENT_RECORD &record, const krabs::schema_locator &schema_locator)
: record_(record)
, pSchema_(schema_locator.get_event_schema(record))
, pPropertyNames_(schema_locator.get_property_names(pSchema_))
{ }

inline schema::schema(const EVENT_RECORD &record, const PTRACE_EVENT_INFO pSchema)
: record_(record)
, pSchema_(pSchema)
, pPropertyNames_(nullptr)
{ }

inline bool schema::operator==(const schema &other) const
Expand Down
Loading
Loading