Skip to content

feat(entity): context-aware loaders + Membase graph-backed NER + per-request auth token#1361

Open
ywang1110 wants to merge 3 commits into
SciSharp:masterfrom
ywang1110:master
Open

feat(entity): context-aware loaders + Membase graph-backed NER + per-request auth token#1361
ywang1110 wants to merge 3 commits into
SciSharp:masterfrom
ywang1110:master

Conversation

@ywang1110
Copy link
Copy Markdown
Contributor

No description provided.

@qodo-code-review
Copy link
Copy Markdown
Contributor

Review Summary by Qodo

Context-aware entity loaders with Membase graph-backed NER and per-request auth

✨ Enhancement

Grey Divider

Walkthroughs

Description
• Add context-aware entity data loaders with runtime parameters support
• Implement Membase graph-backed NER loader for dynamic vocabulary/synonym loading
• Enable per-request authentication tokens for Membase API calls
• Extend entity analysis options with loader parameters configuration
Diagram
flowchart LR
  A["EntityAnalysisOptions"] -->|"LoaderParameters"| B["EntityDataLoadContext"]
  B -->|"Parameters dict"| C["IEntityDataLoader"]
  C -->|"context-aware methods"| D["MembaseNERDataLoader"]
  D -->|"graphId parameter"| E["IGraphDb"]
  F["MembaseAuthHandler"] -->|"per-request token"| G["IConversationStateService"]
  G -->|"TokenStateKey"| H["HTTP Request"]

Loading

Grey Divider

File Changes

1. src/Infrastructure/BotSharp.Abstraction/Entity/IEntityDataLoader.cs ✨ Enhancement +16/-0

Add context-aware loader method overloads

src/Infrastructure/BotSharp.Abstraction/Entity/IEntityDataLoader.cs


2. src/Infrastructure/BotSharp.Abstraction/Entity/Models/EntityAnalysisOptions.cs ✨ Enhancement +8/-0

Add LoaderParameters property for runtime configuration

src/Infrastructure/BotSharp.Abstraction/Entity/Models/EntityAnalysisOptions.cs


3. src/Infrastructure/BotSharp.Abstraction/Entity/Models/EntityDataLoadContext.cs ✨ Enhancement +16/-0

New context class for passing loader parameters

src/Infrastructure/BotSharp.Abstraction/Entity/Models/EntityDataLoadContext.cs


View more (6)
4. src/Plugins/BotSharp.Plugin.FuzzySharp/FuzzySharpPlugin.cs ⚙️ Configuration changes +2/-0

Register MembaseNERDataLoader in dependency injection

src/Plugins/BotSharp.Plugin.FuzzySharp/FuzzySharpPlugin.cs


5. src/Plugins/BotSharp.Plugin.FuzzySharp/Services/DataLoaders/MembaseNERDataLoader.cs ✨ Enhancement +237/-0

Implement graph-backed NER loader with caching

src/Plugins/BotSharp.Plugin.FuzzySharp/Services/DataLoaders/MembaseNERDataLoader.cs


6. src/Plugins/BotSharp.Plugin.FuzzySharp/Services/FuzzySharpEntityAnalyzer.cs ✨ Enhancement +27/-6

Pass loader context to vocabulary and synonym loading

src/Plugins/BotSharp.Plugin.FuzzySharp/Services/FuzzySharpEntityAnalyzer.cs


7. src/Plugins/BotSharp.Plugin.FuzzySharp/Settings/FuzzySharpSettings.cs ⚙️ Configuration changes +19/-0

Add Membase NER configuration settings classes

src/Plugins/BotSharp.Plugin.FuzzySharp/Settings/FuzzySharpSettings.cs


8. src/Plugins/BotSharp.Plugin.Membase/Handlers/MembaseAuthHandler.cs ✨ Enhancement +26/-2

Support per-request authentication token resolution

src/Plugins/BotSharp.Plugin.Membase/Handlers/MembaseAuthHandler.cs


9. src/Plugins/BotSharp.Plugin.Membase/MembasePlugin.cs ⚙️ Configuration changes +1/-0

Register HttpContextAccessor for token resolution

src/Plugins/BotSharp.Plugin.Membase/MembasePlugin.cs


Grey Divider

Qodo Logo

@qodo-code-review
Copy link
Copy Markdown
Contributor

qodo-code-review Bot commented Jun 1, 2026

Code Review by Qodo

🐞 Bugs (3) 📘 Rule violations (4)

Grey Divider


Action required

1. cached vocabulary returned directly 📘 Rule violation ⛨ Security
Description
LoadVocabularyByGraphIdAsync and LoadSynonymMappingByGraphIdAsync return cached dictionary
instances directly, allowing downstream mutation to alter shared cached state across requests. This
can lead to cross-request data leakage, nondeterministic entity results, and inconsistent synonym
resolution between requests/components.
Code

src/Plugins/BotSharp.Plugin.FuzzySharp/Services/DataLoaders/MembaseNERDataLoader.cs[R84-87]

Evidence
Rule 1 requires defensive copies when returning cached/shared mutable objects, but both methods
return the cached value as-is (e.g., if (cached != null) return cached;). In
LoadVocabularyByGraphIdAsync, the returned Dictionary<string, HashSet<string>> includes mutable
HashSet<string> values, so callers can mutate both the dictionary and its sets and those changes
persist in the shared cache; in LoadSynonymMappingByGraphIdAsync, even if the values are tuples,
the dictionary itself is still mutable and is likewise exposed to downstream mutation by being
returned directly from cache.

src/Plugins/BotSharp.Plugin.FuzzySharp/Services/DataLoaders/MembaseNERDataLoader.cs[84-87]
src/Plugins/BotSharp.Plugin.FuzzySharp/Services/DataLoaders/MembaseNERDataLoader.cs[163-166]
src/Plugins/BotSharp.Plugin.FuzzySharp/Services/DataLoaders/MembaseNERDataLoader.cs[171-174]
Best Practice: Learned patterns

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`LoadVocabularyByGraphIdAsync` and `LoadSynonymMappingByGraphIdAsync` currently return cached dictionaries by reference, which exposes shared mutable cached state to callers; downstream mutation can corrupt the cache and cause cross-request leakage, nondeterministic entity results, and inconsistent synonym resolution.

## Issue Context
Both methods read from `_cache` and return `cached` directly when present. For the vocabulary loader, the cached value is a `Dictionary<string, HashSet<string>>`, so a deep copy is needed (copy the dictionary and copy each `HashSet<string>`) when returning cached results (and ideally when storing into the cache as well) to prevent mutation of shared state. For the synonym mapping loader, even though the values are tuples, the dictionary remains mutable and should be cloned/copied before returning to prevent shared-state mutation.

## Fix Focus Areas
- src/Plugins/BotSharp.Plugin.FuzzySharp/Services/DataLoaders/MembaseNERDataLoader.cs[82-87]
- src/Plugins/BotSharp.Plugin.FuzzySharp/Services/DataLoaders/MembaseNERDataLoader.cs[163-166]
- src/Plugins/BotSharp.Plugin.FuzzySharp/Services/DataLoaders/MembaseNERDataLoader.cs[169-174]
- src/Plugins/BotSharp.Plugin.FuzzySharp/Services/DataLoaders/MembaseNERDataLoader.cs[198-200]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


2. Authorization token not validated 📘 Rule violation ☼ Reliability
Description
MembaseAuthHandler may send an Authorization header with an empty bearer token when neither
per-request state nor _settings.ApiKey is set. This hides configuration/client bugs and can lead
to unpredictable authentication failures against Membase.
Code

src/Plugins/BotSharp.Plugin.Membase/Handlers/MembaseAuthHandler.cs[R29-31]

Evidence
Rule 3 requires null/empty validation at system boundaries with predictable failure behavior. The
handler constructs the Authorization header from ResolveToken() without validating that the
resolved token is non-empty, and ResolveToken() can return the (possibly empty) configured API
key.

src/Plugins/BotSharp.Plugin.Membase/Handlers/MembaseAuthHandler.cs[29-31]
src/Plugins/BotSharp.Plugin.Membase/Handlers/MembaseAuthHandler.cs[41-55]
Best Practice: Learned patterns

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The outbound Membase request can be sent with an empty bearer token because the resolved token is not validated for null/empty.

## Issue Context
`ResolveToken()` falls back to `_settings.ApiKey` (which defaults to an empty string). `SendAsync` unconditionally sets `Authorization: Bearer {token}`.

## Fix Focus Areas
- src/Plugins/BotSharp.Plugin.Membase/Handlers/MembaseAuthHandler.cs[27-31]
- src/Plugins/BotSharp.Plugin.Membase/Handlers/MembaseAuthHandler.cs[41-55]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Remediation recommended

3. Missing graphId returns empty 📘 Rule violation ☼ Reliability
Description
When graphId is missing from EntityDataLoadContext, the loader returns an empty
vocabulary/synonym set instead of failing fast, which can silently disable NER for a request. This
can mask caller/config bugs and produce unpredictable downstream behavior.
Code

src/Plugins/BotSharp.Plugin.FuzzySharp/Services/DataLoaders/MembaseNERDataLoader.cs[R64-70]

Evidence
Rule 3 requires explicit validation with predictable failure behavior at boundaries to avoid silent
defaults. The loader treats missing graphId as a non-error and returns an empty dictionary, which
can lead to implicit 'no NER' behavior instead of an actionable failure.

src/Plugins/BotSharp.Plugin.FuzzySharp/Services/DataLoaders/MembaseNERDataLoader.cs[64-70]
src/Plugins/BotSharp.Plugin.FuzzySharp/Services/DataLoaders/MembaseNERDataLoader.cs[226-235]
Best Practice: Learned patterns

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The graph-backed loader silently returns empty results when required context (`graphId`) is missing.

## Issue Context
This loader documents `graphId` as required, but on missing/empty `graphId` it returns an empty dictionary. Prefer failing fast with a clear exception (or a well-defined error contract) so callers immediately see misconfiguration.

## Fix Focus Areas
- src/Plugins/BotSharp.Plugin.FuzzySharp/Services/DataLoaders/MembaseNERDataLoader.cs[64-70]
- src/Plugins/BotSharp.Plugin.FuzzySharp/Services/DataLoaders/MembaseNERDataLoader.cs[226-235]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


4. Interpolated logs in loader 📘 Rule violation ◔ Observability
Description
MembaseNERDataLoader uses string interpolation in logging calls instead of structured logging
templates. This reduces queryability/consistency and violates the structured logging requirement.
Code

src/Plugins/BotSharp.Plugin.FuzzySharp/Services/DataLoaders/MembaseNERDataLoader.cs[R93-94]

Evidence
Rule 4 requires structured logging and flags string interpolation/concatenation in log messages. The
loader logs warnings/errors using interpolated strings (including variables like graphId and
label) instead of template-based structured logging.

src/Plugins/BotSharp.Plugin.FuzzySharp/Services/DataLoaders/MembaseNERDataLoader.cs[93-94]
src/Plugins/BotSharp.Plugin.FuzzySharp/Services/DataLoaders/MembaseNERDataLoader.cs[159-160]
Best Practice: Learned patterns

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Logging uses interpolated strings rather than structured templates with named properties.

## Issue Context
The compliance requirement prefers structured logs (e.g., `LogWarning("... {GraphId}", graphId)`) for better queryability and consistency.

## Fix Focus Areas
- src/Plugins/BotSharp.Plugin.FuzzySharp/Services/DataLoaders/MembaseNERDataLoader.cs[93-94]
- src/Plugins/BotSharp.Plugin.FuzzySharp/Services/DataLoaders/MembaseNERDataLoader.cs[159-160]
- src/Plugins/BotSharp.Plugin.FuzzySharp/Services/DataLoaders/MembaseNERDataLoader.cs[198-204]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


5. GraphId cache key drift 🐞 Bug ☼ Reliability
Description
MembaseNERDataLoader constructs cache keys from the raw graphId, so different casing/whitespace
across calls will create distinct cache entries and can prevent InvalidateCacheAsync from removing
the active entry. This can lead to unnecessary reloads and stale vocabulary/synonym data persisting
longer than intended.
Code

src/Plugins/BotSharp.Plugin.FuzzySharp/Services/DataLoaders/MembaseNERDataLoader.cs[R53-55]

Evidence
Cache keys are derived directly from graphId, and invalidation removes keys derived the same way,
so any graphId representation changes across calls will affect cache hit/invalidate behavior.

src/Plugins/BotSharp.Plugin.FuzzySharp/Services/DataLoaders/MembaseNERDataLoader.cs[53-55]
src/Plugins/BotSharp.Plugin.FuzzySharp/Services/DataLoaders/MembaseNERDataLoader.cs[209-214]
src/Plugins/BotSharp.Plugin.FuzzySharp/Services/DataLoaders/MembaseNERDataLoader.cs[226-235]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`MembaseNERDataLoader` uses the caller-provided `graphId` verbatim in cache keys. If callers provide the same logical graph id with different casing/whitespace, caching and invalidation can diverge (multiple entries, invalidation misses).

### Issue Context
The loader reads `graphId` from `EntityDataLoadContext.Parameters` and uses it for cache key construction and cache invalidation.

### Fix Focus Areas
- src/Plugins/BotSharp.Plugin.FuzzySharp/Services/DataLoaders/MembaseNERDataLoader.cs[53-55]
- src/Plugins/BotSharp.Plugin.FuzzySharp/Services/DataLoaders/MembaseNERDataLoader.cs[226-235]
- src/Plugins/BotSharp.Plugin.FuzzySharp/Services/DataLoaders/MembaseNERDataLoader.cs[209-214]

### Implementation notes
- Introduce a small normalization helper for cache/invalidation keys (e.g., `NormalizeGraphIdKey(graphId) => graphId.Trim().ToLowerInvariant()`), and use it in `VocabKey`, `SynonymKey`, and `InvalidateCacheAsync`.
- Keep the original `graphId` (trimmed if desired) for `GraphQueryExecuteOptions.GraphId` if the underlying graph db treats ids as case-sensitive.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


View more (1)
6. Canonical synonym not trimmed 🐞 Bug ≡ Correctness
Description
MembaseNERDataLoader stores canonical_form without trimming, which can surface canonical strings
with leading/trailing whitespace and break overlap deduplication that relies on strict
canonical-form equality. This can produce duplicate or inconsistent entity outputs for the same
synonym.
Code

src/Plugins/BotSharp.Plugin.FuzzySharp/Services/DataLoaders/MembaseNERDataLoader.cs[R188-196]

Evidence
The loader trims term/table/column but not canonical_form, while the result processor
deduplicates overlaps by CanonicalForm exact equality—making whitespace differences observable and
behavior-changing.

src/Plugins/BotSharp.Plugin.FuzzySharp/Services/DataLoaders/MembaseNERDataLoader.cs[186-196]
src/Plugins/BotSharp.Plugin.FuzzySharp/Services/Processors/ResultProcessor.cs[50-55]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`canonical_form` values from the graph synonym query are persisted without normalization. Downstream deduplication compares canonical forms using strict string equality, so whitespace-only differences prevent correct deduping and may show incorrect canonical text.

### Issue Context
`term`, `table`, and `column` are trimmed, but `canonical_form` is not.

### Fix Focus Areas
- src/Plugins/BotSharp.Plugin.FuzzySharp/Services/DataLoaders/MembaseNERDataLoader.cs[188-196]
- src/Plugins/BotSharp.Plugin.FuzzySharp/Services/Processors/ResultProcessor.cs[50-53]

### Implementation notes
- Change the assignment to store `canonical.Trim()` (and consider also trimming/lowercasing the term consistently).
- Example: `var canonicalText = canonical.Trim(); result[termKey] = (dbPath, canonicalText);`

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Advisory comments

7. Repeated row scans in vocab 🐞 Bug ➹ Performance
Description
LoadVocabularyByGraphIdAsync scans the same queryResult.Values once per projected field, adding
avoidable overhead and increasing latency for large label result sets. This is especially costly on
cache misses, when vocabulary loads already do network I/O.
Code

src/Plugins/BotSharp.Plugin.FuzzySharp/Services/DataLoaders/MembaseNERDataLoader.cs[R141-155]

Evidence
The code nests a full scan of queryResult.Values inside a loop over validFields, so each
additional field re-iterates the entire row set.

src/Plugins/BotSharp.Plugin.FuzzySharp/Services/DataLoaders/MembaseNERDataLoader.cs[141-155]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
Vocabulary aggregation loops over fields and then loops over all rows for each field, repeatedly enumerating the full result set. This creates avoidable overhead (extra dictionary lookups and repeated row iteration).

### Issue Context
The query returns a row dictionary with all projected aliases per row, so values for all aliases can be extracted in a single pass over rows.

### Fix Focus Areas
- src/Plugins/BotSharp.Plugin.FuzzySharp/Services/DataLoaders/MembaseNERDataLoader.cs[141-155]

### Implementation notes
- Pre-create `Dictionary<string, HashSet<string>>` entries for each `sqlSource`.
- Iterate `foreach (var row in queryResult.Values)` once and inside it loop `foreach (var (alias, _, sqlSource) in validFields)` to add the value.
- Keep current caching behavior (this mainly improves cache-miss performance).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

Qodo Logo

Comment on lines +84 to +87
var key = VocabKey(graphId);
var cached = await _cache.GetAsync<Dictionary<string, HashSet<string>>>(key);
if (cached != null) return cached;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action required

1. cached vocabulary returned directly 📘 Rule violation ⛨ Security

LoadVocabularyByGraphIdAsync and LoadSynonymMappingByGraphIdAsync return cached dictionary
instances directly, allowing downstream mutation to alter shared cached state across requests. This
can lead to cross-request data leakage, nondeterministic entity results, and inconsistent synonym
resolution between requests/components.
Agent Prompt
## Issue description
`LoadVocabularyByGraphIdAsync` and `LoadSynonymMappingByGraphIdAsync` currently return cached dictionaries by reference, which exposes shared mutable cached state to callers; downstream mutation can corrupt the cache and cause cross-request leakage, nondeterministic entity results, and inconsistent synonym resolution.

## Issue Context
Both methods read from `_cache` and return `cached` directly when present. For the vocabulary loader, the cached value is a `Dictionary<string, HashSet<string>>`, so a deep copy is needed (copy the dictionary and copy each `HashSet<string>`) when returning cached results (and ideally when storing into the cache as well) to prevent mutation of shared state. For the synonym mapping loader, even though the values are tuples, the dictionary remains mutable and should be cloned/copied before returning to prevent shared-state mutation.

## Fix Focus Areas
- src/Plugins/BotSharp.Plugin.FuzzySharp/Services/DataLoaders/MembaseNERDataLoader.cs[82-87]
- src/Plugins/BotSharp.Plugin.FuzzySharp/Services/DataLoaders/MembaseNERDataLoader.cs[163-166]
- src/Plugins/BotSharp.Plugin.FuzzySharp/Services/DataLoaders/MembaseNERDataLoader.cs[169-174]
- src/Plugins/BotSharp.Plugin.FuzzySharp/Services/DataLoaders/MembaseNERDataLoader.cs[198-200]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

Comment on lines +29 to 31
var token = ResolveToken();
requestMessage.Headers.TryAddWithoutValidation("Authorization", $"Bearer {token}");
var response = await base.SendAsync(requestMessage, cancellationToken).ConfigureAwait(false);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action required

2. Authorization token not validated 📘 Rule violation ☼ Reliability

MembaseAuthHandler may send an Authorization header with an empty bearer token when neither
per-request state nor _settings.ApiKey is set. This hides configuration/client bugs and can lead
to unpredictable authentication failures against Membase.
Agent Prompt
## Issue description
The outbound Membase request can be sent with an empty bearer token because the resolved token is not validated for null/empty.

## Issue Context
`ResolveToken()` falls back to `_settings.ApiKey` (which defaults to an empty string). `SendAsync` unconditionally sets `Authorization: Bearer {token}`.

## Fix Focus Areas
- src/Plugins/BotSharp.Plugin.Membase/Handlers/MembaseAuthHandler.cs[27-31]
- src/Plugins/BotSharp.Plugin.Membase/Handlers/MembaseAuthHandler.cs[41-55]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant