Agent Salience

Deterministic salience, similarity, threshold, drift, and loop diagnostics for local agentic systems.

Agent Salience is a small stdlib-only Python package. It is designed to be embedded by tools such as project memory stores, context-economy tools, governors, and routers. It is not an agent, not an MCP server, not a vector database, and not an embedding model.

The package provides explainable primitives for questions like:

Is this text similar to that text?
Is this observation novel or drifting away from the baseline?
Is this action pattern becoming a loop?
Did this signal cross a fixed/adaptive/hysteresis threshold?
Can this local project corpus support IDF-aware scoring yet?
Can this text be represented by a compact deterministic signature for future indexing?

Status

Current version: 0.2.1

Runtime requirements:

Python 3.9+
Standard library only
No runtime dependencies

What Agent Salience provides

Unicode-aware lexical normalization
Sparse token-frequency maps
Cosine similarity
Jaccard similarity with empty/empty as 0.0 by design
Explainable weighted signal_score() breakdowns
Optional character n-gram similarity for morphological variants
Optional token-prefix overlap for visible word-family matches
Optional project-controlled alias-map expansion
Cold-start-aware local IDF profiles
Domain-specific IDF profile support
IDF-weighted Jaccard/Tanimoto support (idf_jaccard) for common-token suppression
Deterministic stable hash helper
Token shingle helpers
Bounded min-k shingle hashes
TextSignature for future MinHash/LSH preparation
Welford running statistics
EWMA statistics
Adaptive thresholds
Hysteresis thresholds
Trigger objects with evaluate() / observe() behavior
Drift and novelty scoring
Repeated-action loop diagnostics
Compact explanation helpers

Non-goals

Agent Salience is not:

an LLM agent
an MCP server
a memory store
a vector database
an embedding model
a token counter
a router
a governor
a language-specific stemmer
a stopword package

It provides deterministic signals. Callers own policy, persistence, routing, enforcement, and domain meaning.

Installation

From a local checkout:

pip install .

For editable development:

pip install -e .

For no-install local usage, add src/ to PYTHONPATH or sys.path:

import sys
from pathlib import Path

ROOT = Path(r"/path/to/agent-salience")
sys.path.insert(0, str(ROOT / "src"))

Quick start

from agent_salience import signal_score

score = signal_score(
    "MCP server entrypoint and tests",
    "Inspect server.py and test_server.py before editing the MCP entrypoint.",
)

print(score.to_dict())

signal_score() returns an explainable breakdown, not only a scalar:

{
    "cosine": 0.559,
    "jaccard": 0.364,
    "final": 0.500,
    "weights": {"cosine": 0.7, "jaccard": 0.3, ...},
    ...
}

Default scoring is deterministic lexical scoring:

final = 0.70 * cosine + 0.30 * jaccard

Optional components are explicit and off by default unless requested through weights/options.

Similarity basics

Jaccard

from agent_salience import jaccard_similarity

jaccard_similarity(["agent", "memory"], ["agent", "routing"])

jaccard_similarity([], []) returns 0.0 because, in search and memory salience, empty text is treated as no evidence, not as a perfect semantic match.

Cosine

from agent_salience import cosine_similarity, token_frequencies

left = token_frequencies("agent loop budget")
right = token_frequencies("agent budget discipline")

cosine_similarity(left, right)

Optional fuzzy lexical helpers

Character n-grams

Character n-gram similarity helps with visible morphological variants:

from agent_salience import char_ngram_similarity

char_ngram_similarity("validation", "validated")
char_ngram_similarity("configuration", "config")

This is language-agnostic. It does not solve conceptual synonyms.

Token-prefix overlap

Token-prefix overlap gives a small similarity signal when tokens share a visible prefix:

from agent_salience import normalize_text, token_prefix_overlap

left = normalize_text("configuration")
right = normalize_text("config")

token_prefix_overlap(left, right)

This is not stemming and not alias expansion. It is a deterministic word-family helper.

Project-controlled alias maps

Alias maps bridge local project vocabulary without using embeddings or hardcoded language stopwords.

from agent_salience import signal_score

aliases = {
    "test_failure": [
        "test failure",
        "broken validation",
        "debug failing test",
        "failed validation run",
    ],
}

score = signal_score(
    "test failure triage",
    "debug broken validation run",
    alias_map=aliases,
    weights={"cosine": 0.4, "jaccard": 0.3, "alias": 0.3},
)

Alias governance belongs to the caller, usually a coordinator/router policy layer. Agent Salience only applies the approved alias map it receives.

IDF support

IDF support is local, language-agnostic, and cold-start aware.

from agent_salience import build_idf_profile, signal_score

corpus = [
    "mnemo stores project memory",
    "thrift tracks token economy",
    "governor detects repeated loops",
]

profile = build_idf_profile(
    corpus,
    min_documents=3,
    min_unique_terms=5,
    min_total_tokens=6,
)

score = signal_score(
    "token economy",
    "context cost tracking",
    mode="auto",
    idf_profile=profile,
    weights={"cosine": 0.5, "jaccard": 0.2, "idf_cosine": 0.3},
)

print(score.idf_status, score.idf_used)

When the corpus is too small, IDF stays cold:

idf_status = "cold"
idf_used = false

In cold mode, scoring falls back to lexical components. IDF does not replace baseline lexical scoring; when enough local corpus exists, it adds optional idf_cosine and idf_jaccard components.

signal_score() can combine both IDF cosine and IDF-weighted Jaccard/Tanimoto:

score = signal_score(
    "MCP server entrypoint and tests",
    "Discuss apartment prices and kindergarten logistics",
    mode="auto",
    idf_profile=profile,
    weights={"idf_cosine": 0.55, "idf_jaccard": 0.35, "cosine": 0.05, "jaccard": 0.05},
)

idf_jaccard is useful when unrelated texts share only common terms. Frequent local-corpus terms contribute very little, while rare/domain-specific terms remain meaningful.

Domain-specific IDF

from agent_salience import build_domain_idf_profiles

records = [
    {"domain": "memory", "text": "mnemo recall context block"},
    {"domain": "memory", "text": "hippocampus durable entry"},
    {"domain": "economy", "text": "token budget context window"},
]

profiles = build_domain_idf_profiles(records)

Domain IDF is useful when different areas develop different common vocabulary.

Text signatures and future LSH preparation

Agent Salience 0.2.0 does not implement full MinHash/LSH.

It does provide stable signature primitives so callers can prepare storage schemas now and add more advanced indexing later:

from agent_salience import build_text_signature

signature = build_text_signature("Run tests before release handoff.")
print(signature.to_dict())

A signature includes:

content_hash
normalized_hash
token_count
unique_token_count
top_terms
shingle_hashes
signature_version
normalizer_version

This makes future MinHash/LSH a signature-version upgrade instead of a redesign.

Loop diagnostics

from agent_salience import ActionEvent, detect_repeated_target_loop

events = [
    ActionEvent(tool="file_window", target="server.py"),
    ActionEvent(tool="file_window", target="server.py"),
    ActionEvent(tool="file_window", target="server.py"),
    ActionEvent(tool="grep_text", target="server.py"),
]

decision = detect_repeated_target_loop(events, threshold=0.6, min_count=3)
print(decision.to_dict())

Loop decisions are diagnostics. Callers decide whether to warn, pause, stop, or ignore.

Drift and novelty

from agent_salience import drift_score, novelty_score

baseline = "Fix Mnemo consolidation and signature backfill."
current = "Discuss apartment prices and kindergarten logistics."

drift = drift_score(baseline, current)

Drift is a mission-alignment signal. It should feed policy; it should not be treated as an automatic stop button by itself.

Thresholds and triggers

from agent_salience import AdaptiveThreshold, SignalTrigger

threshold = AdaptiveThreshold(base=0.70, minimum=0.45, maximum=0.95)
trigger = SignalTrigger(
    name="loop-warning",
    pattern="repeated same tool and target",
    threshold=threshold,
    kind="loop",
)

decision = trigger.observe("same file_window target repeated repeatedly")
print(decision.to_dict())

Thresholds support serialization/roundtrip so callers can persist local adaptive state.

Integration model

Agent Salience is meant to be embedded by other local tools:

Caller	Typical use
Project memory	relevance, duplicate candidates, novelty, signature helpers
Context economy tool	repeated file-window/read patterns, cost-related similarity
Governor	loop/no-progress diagnostics, drift warnings
Router	task similarity, known-workflow matching, alias/IDF-assisted routing
Feedback writer	deciding which lessons are worth recording or promoting

Separation of responsibilities:

Agent Salience = deterministic signal primitives
Caller = storage, policy, routing, enforcement, approvals

Documentation

Development

Run from repository root:

python -m compileall -q .
python smoke_test.py
python -m unittest discover -s tests -p "test*.py"

Expected result for 0.2.0:

57 tests passed

Stability notes

0.2.0 keeps the original lexical default behavior and adds optional extension points. Full LSH, embeddings, persistent corpus stores, and caller-specific policies are intentionally out of scope.

License

MIT. See LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
.github/workflows		.github/workflows
docs		docs
examples		examples
src/agent_salience		src/agent_salience
tests		tests
.gitattributes		.gitattributes
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md
pyproject.toml		pyproject.toml
smoke_test.py		smoke_test.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Agent Salience

Status

What Agent Salience provides

Non-goals

Installation

Quick start

Similarity basics

Jaccard

Cosine

Optional fuzzy lexical helpers

Character n-grams

Token-prefix overlap

Project-controlled alias maps

IDF support

Domain-specific IDF

Text signatures and future LSH preparation

Loop diagnostics

Drift and novelty

Thresholds and triggers

Integration model

Documentation

Development

Stability notes

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Agent Salience

Status

What Agent Salience provides

Non-goals

Installation

Quick start

Similarity basics

Jaccard

Cosine

Optional fuzzy lexical helpers

Character n-grams

Token-prefix overlap

Project-controlled alias maps

IDF support

Domain-specific IDF

Text signatures and future LSH preparation

Loop diagnostics

Drift and novelty

Thresholds and triggers

Integration model

Documentation

Development

Stability notes

License

About

Topics

Resources

License

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages