An autonomous, multi-provider Agentic RAG system built with LangGraph. This agent ingests institutional data, parses market sentiment, and uses live tools to synthesize investment strategies that strictly adhere to internal firm risk policies.
The system is designed as a cyclic state machine. Unlike linear RAG pipelines, this agent can backtrack and call multiple tools, and every draft must pass a Compliance Bouncer before any output is finalized.
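The cyclic control flow described above can be illustrated with a minimal plain-Python sketch. It deliberately does not use the LangGraph API, and the node names (`research`, `draft`, `bouncer`), the `AgentState` fields, and the `run` helper are all hypothetical; the actual graph lives in src/03_orchestrate.py and may be wired differently.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AgentState:
    """Hypothetical shared state passed between nodes."""
    question: str
    tool_calls: List[str] = field(default_factory=list)
    draft: str = ""
    approved: bool = False


def research(state: AgentState) -> str:
    # Stand-in for a tool call: record that the policy KB was consulted.
    state.tool_calls.append("internal_kb")
    return "draft"


def draft(state: AgentState) -> str:
    # Stand-in for the LLM drafting step.
    state.draft = f"Strategy for: {state.question}"
    return "bouncer"


def bouncer(state: AgentState) -> str:
    # Approve only if the policy KB was consulted; otherwise loop back.
    state.approved = "internal_kb" in state.tool_calls
    return "end" if state.approved else "research"


NODES: Dict[str, Callable[[AgentState], str]] = {
    "research": research,
    "draft": draft,
    "bouncer": bouncer,
}


def run(state: AgentState, start: str = "draft", max_steps: int = 10) -> List[str]:
    """Follow node transitions until 'end' or until max_steps is reached."""
    path: List[str] = []
    node = start
    for _ in range(max_steps):
        if node == "end":
            break
        path.append(node)
        node = NODES[node](state)
    return path
```

Starting at `draft` without any prior research makes the bouncer reject the first draft, so the run visits `draft`, `bouncer`, `research`, `draft`, `bouncer` before ending. That loop back to an earlier node is what a linear pipeline cannot express. The `max_steps` cap is an added safeguard against endless cycles, not something the source specifies.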
.
├── src/ # Core Pipeline Stages
│ ├── 00_fetch_substack.py # RSS parser for analyst newsletters
│ ├── 00_fetch.py # Institutional PDF downloader
│ ├── 01_ingest.py # Vector DB ETL (Chunking/Embedding)
│ ├── 02_filtered_query.py # CLI for testing the Metadata Firewall
│ ├── 02_query.py # Baseline semantic search test
│ └── 03_orchestrate.py # Main LangGraph & Bouncer orchestration
├── tools/ # Atomic Agent Tools
│ ├── internal_kb.py # ChromaDB policy search
│ ├── market.py # Live yfinance integration
│ └── news.py # Tavily API news integration
├── data/
│ ├── raw_documents/ # Source PDFs (World Bank 2026, etc)
│ └── processed_data/ # Cleaned Markdown sentiment files
├── tool_registry.py # LLM function schemas & mapping
├── vector_store/ # Persistent ChromaDB (SQLite + Vector bins)
├── pyproject.toml # Project manifest (uv managed)
├── uv.lock # Deterministic dependency lockfile
├── requirements.txt # Exported for legacy/pip environments
├── main.py # Project entry point
└── CONTRIBUTING.md # Developer guidelines
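As a rough illustration of what "LLM function schemas & mapping" in tool_registry.py could look like, the sketch below pairs a JSON-style schema with a name-to-callable map. The tool name `get_price`, its stub body, and the `dispatch` helper are assumptions made for this example and are not taken from the repository.

```python
from typing import Any, Callable, Dict, List


def get_price(ticker: str) -> Dict[str, Any]:
    # Stub standing in for a live market lookup (no network call here).
    return {"ticker": ticker.upper(), "price": None}


# Schemas describe each tool to the LLM (hypothetical shape).
TOOL_SCHEMAS: List[Dict[str, Any]] = [
    {
        "name": "get_price",
        "description": "Look up the latest price for a ticker symbol.",
        "parameters": {
            "type": "object",
            "properties": {"ticker": {"type": "string"}},
            "required": ["ticker"],
        },
    }
]

# The mapping resolves a tool name chosen by the LLM to a Python callable.
TOOL_MAP: Dict[str, Callable[..., Any]] = {"get_price": get_price}


def dispatch(name: str, arguments: Dict[str, Any]) -> Any:
    """Run the tool the model asked for, failing loudly on unknown names."""
    if name not in TOOL_MAP:
        raise KeyError(f"Unknown tool: {name}")
    return TOOL_MAP[name](**arguments)
```

Keeping schemas and callables side by side makes it harder for the two to drift apart when a tool is added or renamed.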
The agent is currently grounded in the following data tiers:
- Ground Truth (Priority 1):
  * Institutional reports from the World Bank
  * Government reports
  * Reports by various independent agencies
- Compliance (Firewall):
  * Internal Risk Framework
  * Add your policy to data/raw_documents/
- Market Sentiment (Priority 2):
  * Deep-dive newsletters from Substack
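One way to read the priority labels above is as a tie-breaking order for retrieved chunks. The sketch below assumes each chunk carries a hypothetical `tier` metadata value and a `distance` score; neither name is confirmed by the source, and compliance documents are left out because they act as a gate rather than a ranked source.

```python
from typing import Any, Dict, List

# Lower number means higher priority (labels taken from the tiers above).
TIER_PRIORITY: Dict[str, int] = {"ground_truth": 1, "market_sentiment": 2}


def rank_chunks(chunks: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Order chunks by tier first, then by vector distance within a tier."""
    return sorted(
        chunks,
        key=lambda c: (TIER_PRIORITY.get(c["tier"], 99), c["distance"]),
    )


chunks = [
    {"tier": "market_sentiment", "distance": 0.10, "text": "Newsletter take"},
    {"tier": "ground_truth", "distance": 0.40, "text": "World Bank figure"},
]
ranked = rank_chunks(chunks)
```

With this ordering, the ground-truth chunk comes first even though the newsletter chunk is closer in embedding space.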
This project uses uv for high-performance dependency management.
# Sync the environment and install dependencies
uv sync
Create a .env file or export your keys:
export ANTHROPIC_API_KEY='your_anthropic_key'
export OPENAI_API_KEY='your_openai_key'
export GEMINI_API_KEY='your_gemini_key' # not working at the moment
export TAVILY_API_KEY='your_tavily_key'
export GROQ_API_KEY='your_groq_key'
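Before launching the agent, it can help to confirm which of these keys are actually set. The helper below is a small sketch using only the standard library; `EXPECTED_KEYS` and `missing_keys` are names invented here, and whether every key is required for a given run is an assumption, since a single provider may be enough.

```python
import os
from typing import List, Mapping, Optional

# Key names copied from the export lines above.
EXPECTED_KEYS: List[str] = [
    "ANTHROPIC_API_KEY",
    "OPENAI_API_KEY",
    "GEMINI_API_KEY",
    "TAVILY_API_KEY",
    "GROQ_API_KEY",
]


def missing_keys(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the expected keys that are unset or empty in the environment."""
    source = os.environ if env is None else env
    return [key for key in EXPECTED_KEYS if not source.get(key)]


if __name__ == "__main__":
    absent = missing_keys()
    if absent:
        print("Missing keys:", ", ".join(absent))
    else:
        print("All expected keys are set.")
```

Note that this reads exported variables only; values kept in a .env file must be loaded into the environment first.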
Execute the scripts in numeric order:
# Ingest data into ChromaDB
uv run src/01_ingest.py
# Launch the Agentic Loop
uv run src/03_orchestrate.py
The Bouncer Node in 03_orchestrate.py implements a mandatory two-tier validation:
- Internal KB Check: Rejects drafts if the agent fails to query the Internal_Risk_Framework.
- Fact-Checking: Runs a specialized pass to identify common financial hallucinations before the final strategy is released.
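The two tiers can be sketched as a single gate function. This is an illustrative stand-in, not the code in 03_orchestrate.py: the `Verdict` type, the `internal_kb` tool name, and the regex list are all hypothetical, and the real fact-checking pass is likely an LLM call rather than pattern matching.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Verdict:
    approved: bool
    reason: str


# Placeholder patterns standing in for the fact-checking pass.
SUSPECT_PATTERNS: List[str] = [
    r"guaranteed returns?",
    r"risk[- ]free",
    r"cannot lose",
]


def flag_suspect_claims(draft: str) -> List[str]:
    """Return the patterns that match the draft, ignoring case."""
    return [p for p in SUSPECT_PATTERNS if re.search(p, draft, re.IGNORECASE)]


def bouncer_review(draft: str, tool_calls: List[str]) -> Verdict:
    # Tier 1: the draft is rejected if the policy KB was never queried.
    if "internal_kb" not in tool_calls:
        return Verdict(False, "Internal_Risk_Framework was not queried")
    # Tier 2: reject drafts containing claims the checker flags.
    flagged = flag_suspect_claims(draft)
    if flagged:
        return Verdict(False, f"suspect claims matched: {flagged}")
    return Verdict(True, "passed both tiers")
```

Running the cheap tool-call check first means the more expensive fact-checking pass is skipped for drafts that would be rejected anyway.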
Distributed under the GNU Affero General Public License. See LICENSE for more information.