Add agentic_search MCP tool backed by the Cosmos Retriever service#135

Open
aryan-410 wants to merge 8 commits into AzureCosmosDB:main from aryan-410:feat/agentic-search-cosmos-retriever

@aryan-410

Summary

Adds a 9th MCP tool, agentic_search, that runs a multi-turn retrieval agent over a Cosmos DB corpus and returns ranked, curated documents (hybrid vector + full-text RRF search, optional rerank, multi-turn read/prune).
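For readers unfamiliar with the fusion step named above, reciprocal rank fusion (RRF) merges several ranked lists by summing 1 / (k + rank) for each document. The sketch below is a generic illustration only, not the retriever's code: the function name `rrf_fuse`, the document IDs, and k = 60 (a commonly used constant) are assumptions, and the service may instead rely on fusion performed by the database.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists of document IDs with reciprocal rank fusion.

    Each list contributes 1 / (k + rank) to a document's score, where rank
    starts at 1. Ties are broken by document ID so the output is stable.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a vector search and a full-text search.
vector_hits = ["a", "b", "c"]
fulltext_hits = ["b", "d", "a"]
print(rrf_fuse([vector_hits, fulltext_hits]))  # ['b', 'a', 'd', 'c']
```

Documents that appear in both lists ("a" and "b") outrank documents that appear in only one, which is the property that makes RRF useful for hybrid search.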

Commits

  1. Vendor the Cosmos Retriever Python service: FastAPI service (POST /search, GET /health) wrapping CosmosRetriever. Pluggable inference backend: harmony_vllm (fine-tuned pat-jj/harness-1), openai_chat (any OpenAI-compatible chat model), or openai_responses (reasoning models such as gpt-5.4 on Azure AI Foundry). Includes tests for the server and agent loops.
  2. Add agentic_search MCP tool (.NET): AgenticSearchExecutor calls the service over HTTP (COSMOS_RETRIEVER_URL, COSMOS_RETRIEVER_TIMEOUT_S) and always returns parseable JSON (error envelope on failure). Wired into Program.cs, MCPProtocolController (tools/list + tools/call), MCPTestController, and McpToolRequestValidator. CosmosClientFactory excludes ManagedIdentityCredential (falls through to az login) and accepts the standard MCP _meta params field. Docs: docs/AGENTIC_SEARCH.md, README + CHANGELOG + .env.example.

Testing

  • .NET: dotnet build clean; executor unit tests pass.
  • Python: retriever test suites pass.
  • End-to-end verified through the MCP /mcp/http tools/call path against a live Cosmos corpus with gpt-5.4 (Azure AI Foundry) via the responses backend.

Note: the Cosmos Retriever Python service is vendored here so the tool is self-contained; happy to split it into a separate repo/submodule if maintainers prefer.

cosmos-dev added 2 commits June 26, 2026 21:57
FastAPI service (POST /search, GET /health) wrapping CosmosRetriever, which
runs a multi-turn retrieval agent over a Cosmos DB corpus. Pluggable inference
backend: harmony_vllm (fine-tuned pat-jj/harness-1), openai_chat (any
OpenAI-compatible chat model), or openai_responses (reasoning models such as
gpt-5.4 on Azure AI Foundry). Includes tests for the server and agent loops.

The .NET agentic_search tool calls this service over HTTP.
Add a 9th MCP tool, agentic_search, that runs the Cosmos Retriever agent over
a Cosmos DB corpus and returns ranked, curated documents.

- AgenticSearchExecutor: calls the cosmos-retriever service over HTTP
  (COSMOS_RETRIEVER_URL, COSMOS_RETRIEVER_TIMEOUT_S); always returns parseable
  JSON (error envelope on failure).
- Wire into Program.cs, MCPProtocolController (tools/list + tools/call),
  MCPTestController, and McpToolRequestValidator.
- CosmosClientFactory: exclude ManagedIdentityCredential (fall through to az
  login); accept the standard MCP _meta params field.
- Docs: docs/AGENTIC_SEARCH.md, README + CHANGELOG + .env.example.
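The "always returns parseable JSON" contract can be sketched as follows. The real executor is .NET; this Python version only illustrates the pattern. The request payload shape, the envelope keys (`error`, `message`), the default URL and timeout, and the injectable `post` parameter are all assumptions made for the example; only the two environment variable names and the POST /search route come from the PR.

```python
import json
import os
import urllib.request
from typing import Callable, Mapping, Optional


def _http_post(url: str, payload: dict, timeout_s: float) -> str:
    """POST a JSON payload and return the response body as text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return response.read().decode("utf-8")


def agentic_search(
    query: str,
    *,
    post: Optional[Callable[[str, dict, float], str]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Call the retriever and return a JSON string, even when the call fails."""
    env = os.environ if env is None else env
    # The defaults below are placeholders, not values taken from the PR.
    base_url = env.get("COSMOS_RETRIEVER_URL", "http://localhost:8000")
    timeout_s = float(env.get("COSMOS_RETRIEVER_TIMEOUT_S", "120"))
    post = _http_post if post is None else post
    try:
        body = post(f"{base_url.rstrip('/')}/search", {"query": query}, timeout_s)
        json.loads(body)  # reject non-JSON responses before returning them
        return body
    except Exception as exc:  # deliberately broad: the caller must get JSON
        return json.dumps({"error": type(exc).__name__, "message": str(exc)})
```

Injecting `post` keeps the sketch testable without a running service; a caller that passes nothing gets the real HTTP transport.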
self.corpus: CorpusConfig = self.settings.resolve_corpus(corpus_name)

self._enc: HarmonyEncoding = load_harmony_encoding(HarmonyEncodingName.HARMONY_GPT_OSS)
self._tiktoken = tiktoken.get_encoding("o200k_harmony")

Author

These encoding names should be configurable, in case alternatives are needed.
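A minimal sketch of what that could look like, assuming the names are read from the environment with the current hard-coded values as defaults. `EncodingSettings` and the two `RETRIEVER_*` variable names are hypothetical and do not appear in the PR.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Defaults mirror the values hard-coded in the snippet under review.
DEFAULT_HARMONY_ENCODING = "HARMONY_GPT_OSS"
DEFAULT_TIKTOKEN_ENCODING = "o200k_harmony"


@dataclass(frozen=True)
class EncodingSettings:
    """Names of the encodings to load (hypothetical settings object)."""

    harmony_encoding: str = DEFAULT_HARMONY_ENCODING
    tiktoken_encoding: str = DEFAULT_TIKTOKEN_ENCODING

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "EncodingSettings":
        """Read overrides from the environment, falling back to the defaults."""
        env = os.environ if env is None else env
        return cls(
            # Both variable names are assumptions made for this example.
            harmony_encoding=env.get("RETRIEVER_HARMONY_ENCODING", DEFAULT_HARMONY_ENCODING),
            tiktoken_encoding=env.get("RETRIEVER_TIKTOKEN_ENCODING", DEFAULT_TIKTOKEN_ENCODING),
        )


settings = EncodingSettings.from_env({"RETRIEVER_TIKTOKEN_ENCODING": "cl100k_base"})
print(settings.harmony_encoding, settings.tiktoken_encoding)
```

The constructor could then pass `settings.tiktoken_encoding` to `tiktoken.get_encoding(...)` and, assuming `HarmonyEncodingName` is a standard enum, look up the harmony encoding by name before calling `load_harmony_encoding(...)`.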

…rness

- Move VllmTokenCompleter + run_single_episode into inference/vllm_policy.py
- Delete inference/evaluate_harness1_vllm.py (eval/benchmark code)
- Repoint retriever.py to vllm_policy; update env_rl docstring
- Include pre-existing changes: pool_doc_ids trajectory pooling (openai_chat) and an optional baseten import (rerank)
The datagen/ package deletion was previously only staged, never committed, so
it still appeared in the PR. Actually remove it (search_dataset.py,
generate_sft_rl_splits.py, BrowseComp-Plus, README, __init__) along with the
unit tests folder, the datagen TYPE_CHECKING import in tasks.py, the stale
datagen comment in config.py, and the now-dangling pytest/respx dev deps and
pytest/ruff test config in pyproject.toml.
…lResult

Add a trajectory field to RetrievalResult populated by the harmony_vllm
backend: the search queries issued (search_history), per-turn tool calls
(turn_tools), programmatic per-turn status summaries (turn_summaries), and the
final per-doc importance tags from curation (curated_importance).
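The shape of such a field might look like the sketch below. Only the four trajectory field names come from the commit message; the element types, the `documents` field, the sample values, and making `trajectory` optional (since only the harmony_vllm backend populates it) are assumptions.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class Trajectory:
    """Record of what the agent did while retrieving (types are assumed)."""

    search_history: List[str] = field(default_factory=list)         # queries issued
    turn_tools: List[List[str]] = field(default_factory=list)       # tool calls per turn
    turn_summaries: List[str] = field(default_factory=list)         # status summary per turn
    curated_importance: Dict[str, str] = field(default_factory=dict)  # doc ID -> importance tag


@dataclass
class RetrievalResult:
    """Hypothetical result container; `documents` is a placeholder field."""

    documents: List[dict] = field(default_factory=list)
    # Left as None by backends that do not record a trajectory.
    trajectory: Optional[Trajectory] = None


result = RetrievalResult(
    documents=[{"id": "doc-1"}],
    trajectory=Trajectory(
        search_history=["vector index policy"],
        turn_tools=[["search"], ["read", "prune"]],
        turn_summaries=["1 search, 1 hit", "1 read, 0 pruned"],
        curated_importance={"doc-1": "high"},
    ),
)
print(asdict(result)["trajectory"]["search_history"])
```

Keeping the trajectory in a nested, optional object lets existing consumers of the result ignore it while debugging tools can serialize it with `asdict`.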