Your personal AI assistant that lives in your pocket.
Multi-agent AI with browser automation, coding, voice input, persistent notepads, and smart scheduling—including optional genius-mode reasoning for complex tasks.
Pocket Assistant AI is an intelligent personal assistant that runs on Telegram. It remembers your conversations, learns your preferences, and can perform complex tasks like browsing the web, writing code, and managing your schedule.
Unlike simple chatbots, Pocket Assistant uses a multi-agent architecture with persistent memory and optional high-capability reasoning:
- Main Agent — Orchestrates conversations, routes tasks, and can use a genius model for complex reasoning when you ask for it
- Browser Agent — Automates web browsing with intelligent planning
- Coder Agent — Writes, edits, and manages code projects
- Notepads — Persistent notes and data logs so the assistant remembers context and outcomes across runs and scheduled tasks
These are real prompts you can copy, paste, and tweak to see what Pocket Assistant AI can do.
You: Remind me to call mom tomorrow at 3pm
Bot: Schedule created! I'll remind you tomorrow at 3:00 PM.
You: Every Monday at 9am, remind me about standup
Bot: Recurring schedule created for every Monday at 9:00 AM.
You: Every day at 9am analyze my portfolio and summarize, use genius mode
Bot: Schedule created with genius mode — I'll use the high-capability model for the analysis and keep notes in the schedule notepad.
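The recurring examples above correspond to cron expressions. As a rough illustration of that mapping, here is a minimal sketch; `toCronExpression`, `RecurringTime`, and `Weekday` are hypothetical names for this example and are not taken from the project source, which may parse schedules differently.

```typescript
// Hypothetical helper for illustration only; not part of the project source.
type Weekday = "sun" | "mon" | "tue" | "wed" | "thu" | "fri" | "sat";

const WEEKDAY_INDEX: Record<Weekday, number> = {
  sun: 0, mon: 1, tue: 2, wed: 3, thu: 4, fri: 5, sat: 6,
};

interface RecurringTime {
  hour: number;      // 0-23
  minute: number;    // 0-59
  weekday?: Weekday; // omit for "every day"
}

// Builds a standard five-field cron expression:
// minute hour day-of-month month day-of-week
function toCronExpression(t: RecurringTime): string {
  if (!Number.isInteger(t.hour) || t.hour < 0 || t.hour > 23) {
    throw new RangeError(`hour out of range: ${t.hour}`);
  }
  if (!Number.isInteger(t.minute) || t.minute < 0 || t.minute > 59) {
    throw new RangeError(`minute out of range: ${t.minute}`);
  }
  const dayOfWeek = t.weekday === undefined ? "*" : String(WEEKDAY_INDEX[t.weekday]);
  return `${t.minute} ${t.hour} * * ${dayOfWeek}`;
}

console.log(toCronExpression({ hour: 9, minute: 0, weekday: "mon" })); // "0 9 * * 1"
console.log(toCronExpression({ hour: 9, minute: 0 }));                 // "0 9 * * *"
```

"Every Monday at 9am" becomes `0 9 * * 1`, and "every day at 9am" becomes `0 9 * * *`.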
The assistant can create and update notepads to remember data across runs. Scheduled tasks get their own notepad automatically; you can also ask it to track ad-hoc data.
You: Track the BTC price in a notepad and tell me when it crosses 100k
Bot: I'll create a notepad and update it when we check the price...
[Later runs read the notepad and see history]
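To make the cross-run idea concrete, here is a minimal sketch of how a notepad with a data log could detect the 100k crossing. The `PriceNotepad` and `DataPoint` shapes and the `recordAndCheckCrossing` function are assumptions made for this example; the real notepad schema is not documented here.

```typescript
// Hypothetical shapes for illustration; the real notepad schema may differ.
interface DataPoint {
  timestamp: string; // ISO 8601
  value: number;
}

interface PriceNotepad {
  notes: string[];
  keyValues: Record<string, string>;
  dataLog: DataPoint[];
}

// Appends a reading and reports whether it crossed the threshold upward
// relative to the previous reading in the log.
function recordAndCheckCrossing(
  pad: PriceNotepad,
  point: DataPoint,
  threshold: number,
): boolean {
  const previous = pad.dataLog[pad.dataLog.length - 1];
  pad.dataLog.push(point);
  pad.keyValues["lastPrice"] = String(point.value);
  return previous !== undefined && previous.value < threshold && point.value >= threshold;
}

const btcPad: PriceNotepad = { notes: [], keyValues: {}, dataLog: [] };
// First run: no history yet, so nothing can have been crossed.
recordAndCheckCrossing(btcPad, { timestamp: "2025-01-01T09:00:00Z", value: 98500 }, 100000);
// Later run: the previous reading was below 100k and this one is above it.
recordAndCheckCrossing(btcPad, { timestamp: "2025-01-02T09:00:00Z", value: 100200 }, 100000);
```

Because the log persists between runs, each run only needs to compare the new reading with the last stored one.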
You: Go to Hacker News and tell me the top 3 stories
Bot: I'll browse Hacker News for you...
[Takes screenshots, extracts information]
Here are the top 3 stories: ...
You: Clone github.com/user/repo and add a README file
Bot: Starting coding task...
Using project folder: repo
Cloning repository...
Creating README.md...
Done! Created README.md with project description.
You: [Send a voice message saying "What's the weather like today?"]
Bot: Voice transcribed: "What's the weather like today?"
Processing...
[Bot responds to your question]
You: Transcribe this for me
You: [Send a voice message]
Bot: Transcription:
[Your spoken words as text]
You: Check if api.github.com is up
Bot: HTTP 200 OK
{"current_user_url":"https://api.github.com/user"...}
You can send these as-is to create powerful scheduled or one-off tasks. Customize times and URLs to your needs.
Daily crypto analysis (browser + genius model + notepad)
Create a schedule that runs every morning, opens CoinGecko, captures the page, and uses the genius model for professional-style analysis:
Every morning at 9:35:
Act as a senior and professional crypto trader:
1. Open https://www.coingecko.com/en/coins/bitcoin in the browser
2. Take a screenshot and extract all the data (chart situation, current price, and all other important data)
3. Use genius model to analyse it as a professional crypto trader
4. Give me a short message including comparison, signal and prediction
The bot will create a recurring schedule with genius mode, use the browser agent to capture the page, and the schedule notepad to keep context (e.g. previous price, trend) across runs.
- Powered by LLMs via OpenRouter (supports GPT-4, Claude, Gemini, DeepSeek, and more)
- Two-layer memory: short-term conversation history plus long-term semantic memory for important facts and preferences
- Semantic search enriches context by retrieving relevant past conversations and stored facts
- Genius model — optional high-capability model for complex reasoning (e.g. scheduled analysis); enable per task with "use genius mode"
- Learns your preferences through the "Soul" personalization system
- Plans and executes complex web tasks step by step
- Takes screenshots and extracts information from pages
- Handles dynamic content, forms, and multi-step workflows
- Built on standalone Playwright for reliable full-browser control
- Also supports Browser MCP (browsermcp.io), so you can run the same browser tools in MCP-compatible environments
- Clone repositories and work on code projects
- Read, write, and edit files with intelligent context
- Search code with grep, manage git branches
- Run commands and build scripts
- Real-time progress updates as it works
- Natural language: "Remind me tomorrow at 5pm"
- Recurring tasks with cron expressions
- Genius mode — optionally use a high-capability model (e.g. DeepSeek V3) for complex scheduled reasoning (analysis, trends, decisions)
- Automatic cleanup of old schedules
- Context-aware reminders
- Each schedule has a persistent notepad so the agent remembers context and outcomes across runs
- Cross-run memory — the AI maintains notepads (notes, key-value state, time-series data logs) that persist across conversations and scheduled runs
- Perfect for tracking metrics, decisions, and context (e.g. "last price", "trend", "what we decided")
- Used automatically by scheduled tasks; also available as tools (listNotepads, readNotepad, updateNotepad) for ad-hoc tracking
- Organized by category, with concise, scannable storage so the agent stays in context without clutter
- Send voice messages and the AI will transcribe and act on them
- Powered by Groq's Whisper large-v3-turbo model
- Say "transcribe this" before sending a voice message to get transcription only
- Fast and accurate speech-to-text conversion
- Make API calls directly from chat
- Support for all HTTP methods
- Custom headers and authentication
- Perfect for checking RSS feeds, APIs, webhooks
- Whitelist-based access control
- SSRF protection for HTTP requests
- Input sanitization against prompt injection
- Sandboxed code execution environment
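As an illustration of what SSRF protection for HTTP requests typically involves, the sketch below rejects URLs that point at loopback, private, or link-local addresses before any request is made. It is not the project's actual guard; `isPrivateIPv4` and `isUrlAllowed` are hypothetical names, and the sketch deliberately rejects all IPv6 literals rather than classifying them.

```typescript
// Hypothetical guard for illustration; the project's real checks are not shown in this README.
function isPrivateIPv4(host: string): boolean {
  const match = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (match === null) return false;
  const a = Number(match[1]);
  const b = Number(match[2]);
  return (
    a === 0 ||                           // "this network"
    a === 10 ||                          // 10.0.0.0/8
    a === 127 ||                         // loopback
    (a === 169 && b === 254) ||          // link-local, including cloud metadata endpoints
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168)             // 192.168.0.0/16
  );
}

function isUrlAllowed(rawUrl: string): boolean {
  let parsed: URL;
  try {
    parsed = new URL(rawUrl);
  } catch {
    return false; // not a valid absolute URL
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") return false;
  const host = parsed.hostname.toLowerCase();
  if (host === "localhost" || host.endsWith(".localhost")) return false;
  if (host.startsWith("[")) return false; // IPv6 literals are rejected wholesale in this sketch
  return !isPrivateIPv4(host);
}
```

A hostname check like this does not cover DNS names that resolve to internal addresses; a fuller guard would also validate the resolved IP at connection time.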
┌─────────────────────────────────────────────────────────────────┐
│ Telegram Bot │
│ (nestjs-telegraf) │
└─────────────────────────────┬───────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Messaging Abstraction │
│ (IMessagingService - extensible) │
└─────────────────────────────┬───────────────────────────────────┘
│
▼
┌───────────────────────────────────────────────────────────────────────────────┐
│ Main Agent │
│ (LangGraph ReAct) │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Tools │ │ Memory │ │ Soul │ │ Scheduler│ │ Notepad │ │
│ └────┬─────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
└───────┼───────────────────────────────────────────────────────────────────────┘
│
├─────────────────┬─────────────────┐
▼ ▼ ▼
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│ Browser Agent │ │ Coder Agent │ │ HTTP Client │
│ (Playwright) │ │ (Git + FS) │ │ (fetch) │
└───────────────┘ └───────────────┘ └───────────────┘
| Component | Description |
|---|---|
| Main Agent | LangGraph-based ReAct agent that handles conversations and routes to sub-agents |
| Browser Agent | Plans complex web tasks, executes them step-by-step with standalone Playwright or Browser MCP (browsermcp.io) |
| Coder Agent | Manages code projects with file operations, git, and command execution |
| Soul Service | Stores user preferences and personality settings |
| Memory Service | Two-layer memory: conversation history (with summarization) and long-term semantic memory via ChromaDB; vector search enriches context |
| Notepad Service | Persistent notepads per chat: notes, key-values, and time-series data logs so agents remember context and outcomes across runs (used by scheduler and tools) |
| Scheduler | Handles reminders and recurring tasks with cron support; optional genius model and per-job notepad for complex scheduled reasoning |
- Node.js 20+
- A Telegram bot token (from @BotFather)
- An OpenRouter API key (openrouter.ai)
# Clone the repository
git clone https://github.com/yourusername/pocket-assistant-ai.git
cd pocket-assistant-ai
# Install dependencies
npm install
# Copy environment template
cp .env.example .env
# Edit .env with your tokens
nano .env
Edit .env with your credentials:
TELEGRAM_BOT_TOKEN=your_telegram_bot_token
OPENROUTER_API_KEY=your_openrouter_api_key
Edit data/config.json (created on first run if missing) to add your Telegram user ID:
{
  "security": {
    "allowedUserIds": ["YOUR_USER_ID"]
  }
}
Don't know your user ID? Start the bot and send it any message; it will show your ID.
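For orientation, a whitelist check against this config could look like the sketch below. `SecurityConfig` mirrors the JSON above; `isUserAllowed` is a hypothetical helper, and the real access-control code may differ. Telegram delivers user IDs as numbers while the config stores strings, so the sketch compares them as strings.

```typescript
// Mirrors the "security" section of data/config.json shown above.
interface SecurityConfig {
  security: { allowedUserIds: string[] };
}

// Hypothetical helper: normalizes the incoming ID to a string before comparing.
function isUserAllowed(config: SecurityConfig, telegramUserId: number | string): boolean {
  return config.security.allowedUserIds.includes(String(telegramUserId));
}

const exampleConfig: SecurityConfig = { security: { allowedUserIds: ["123456789"] } };
console.log(isUserAllowed(exampleConfig, 123456789)); // true
console.log(isUserAllowed(exampleConfig, 42));        // false
```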
# Start the app (with hot reload)
npm start
The app listens on port 29111 (used for the optional REST API channel when ENABLE_API_CHANNEL=true).
| Script | Description |
|---|---|
| npm start | Start the app in watch mode (hot reload) |
| npm run build | Compile the NestJS app |
| npm run browser | Run the browser helper script (opens Playwright browser) |
| npm run memories:list | Print all long-term memories from ChromaDB (requires ChromaDB running) |
| npm run memories:reset | Delete all long-term memories in ChromaDB. Optional: npm run memories:reset -- --chat=CHAT_ID to reset one chat only |
Docker is used to run ChromaDB (vector memory) and Langfuse (LLM observability). Run the app locally with npm start.
ChromaDB is the vector database used for long-term memory: it stores facts and preferences the assistant learns (e.g. “your wife’s name is Khatere”) and powers semantic search so the bot can recall relevant context. Each Telegram chat has its own collection; embeddings are generated via OpenRouter’s text-embedding-3-small.
Start ChromaDB:
docker compose up chromadb -d
# API: http://localhost:8100
In .env, set CHROMA_HOST=http://localhost:8100 (this is the default). If ChromaDB is unavailable, long-term memory features (memorySave, memorySearch) are disabled.
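The default and the fallback behavior described above can be sketched as follows. `resolveChromaHost` and `availableTools` are hypothetical helpers written for this example; how the app actually probes ChromaDB and disables tools is not shown in this README.

```typescript
const DEFAULT_CHROMA_HOST = "http://localhost:8100";
const LONG_TERM_MEMORY_TOOLS = ["memorySave", "memorySearch"];

// Falls back to the documented default when CHROMA_HOST is unset or blank.
function resolveChromaHost(env: Record<string, string | undefined>): string {
  const configured = env["CHROMA_HOST"]?.trim();
  return configured ? configured : DEFAULT_CHROMA_HOST;
}

// Hypothetical: drops the long-term memory tools when ChromaDB cannot be reached.
function availableTools(allTools: string[], chromaReachable: boolean): string[] {
  if (chromaReachable) return allTools;
  return allTools.filter((toolName) => !LONG_TERM_MEMORY_TOOLS.includes(toolName));
}
```

With ChromaDB down, a tool list of memorySave, memorySearch, and readNotepad would shrink to readNotepad only, so the rest of the assistant keeps working.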
Inspect or reset memories (CLI):
# List all long-term memories (all chats)
npm run memories:list
# Reset all long-term memories
npm run memories:reset
# Reset long-term memory for one chat only
npm run memories:reset -- --chat=132995226
These scripts require ChromaDB to be running. They connect to ChromaDB only and do not start the Nest app.
docker compose up langfuse-web langfuse-db -d
# Dashboard: http://localhost:31111
Then in .env set LANGFUSE_HOST=http://localhost:31111 and add your keys from the Langfuse project settings.
docker compose up -d
docker build -t pocket-assistant-ai .
| Variable | Required | Description |
|---|---|---|
| TELEGRAM_BOT_TOKEN | Yes | Bot token from BotFather |
| OPENROUTER_API_KEY | Yes | API key from OpenRouter |
| GROQ_API_KEY | No | Groq API key for voice transcription |
| CHROMA_HOST | No | ChromaDB URL for vector memory (default: http://localhost:8100) |
| ZAPIER_MCP_TOKEN | No | Zapier MCP integration token |
| LANGFUSE_PUBLIC_KEY | No | Langfuse public key for observability |
| LANGFUSE_SECRET_KEY | No | Langfuse secret key |
| LANGFUSE_HOST | No | Langfuse host URL |
| ENABLE_API_CHANNEL | No | Enable REST API alongside Telegram |
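A startup check for the two required variables might look like this minimal sketch. `findMissingEnvVars` is a hypothetical helper, not part of the project; in a Node process you would pass `process.env` as the argument.

```typescript
// The two variables marked "Yes" in the table above.
const REQUIRED_ENV_VARS = ["TELEGRAM_BOT_TOKEN", "OPENROUTER_API_KEY"] as const;

// Returns the names of required variables that are unset or blank.
function findMissingEnvVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV_VARS.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === "";
  });
}

const missingVars = findMissingEnvVars({ TELEGRAM_BOT_TOKEN: "123:abc" });
console.log(missingVars); // [ 'OPENROUTER_API_KEY' ]
```

Failing fast with the list of missing names is usually easier to debug than a later authentication error from Telegram or OpenRouter.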
Configuration is loaded from data/config.json (created with defaults on first run). Example:
{
  "security": {
    "allowedUserIds": ["123456789"]
  },
  "model": "google/gemini-2.0-flash-001",
  "vision_model": "google/gemini-2.0-flash-001",
  "coder_model": "anthropic/claude-sonnet-4",
  "genius_model": "deepseek/deepseek-v3.2"
}
Choose any model from OpenRouter:
| Role | Purpose | Example |
|---|---|---|
| model | Main conversations and routing | google/gemini-2.0-flash-001, deepseek/deepseek-v3.2 |
| vision_model | Browser and image understanding | openai/gpt-4o-mini |
| coder_model | Code editing and analysis | anthropic/claude-sonnet-4, google/gemini-2.5-pro |
| genius_model | Complex reasoning (e.g. scheduled analysis, "genius mode") | deepseek/deepseek-v3.2 |
- Genius model is used when you enable "genius mode" on a schedule (e.g. "remind me to analyze trends every morning, use genius mode") for deeper reasoning without slowing everyday chat.
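One way to picture how the four model roles fit together is the sketch below. `ModelConfig` mirrors the keys in data/config.json; `AgentRole` and `selectModel` are assumptions made for this example, and the project's model factory may resolve models differently.

```typescript
// Mirrors the model keys in data/config.json.
interface ModelConfig {
  model: string;
  vision_model: string;
  coder_model: string;
  genius_model: string;
}

// Hypothetical role names for illustration.
type AgentRole = "main" | "browser" | "coder";

// Picks a model ID per role; genius mode only affects the main agent in this sketch.
function selectModel(config: ModelConfig, role: AgentRole, geniusMode = false): string {
  if (role === "coder") return config.coder_model;
  if (role === "browser") return config.vision_model;
  return geniusMode ? config.genius_model : config.model;
}

const modelConfig: ModelConfig = {
  model: "google/gemini-2.0-flash-001",
  vision_model: "google/gemini-2.0-flash-001",
  coder_model: "anthropic/claude-sonnet-4",
  genius_model: "deepseek/deepseek-v3.2",
};
console.log(selectModel(modelConfig, "main"));       // "google/gemini-2.0-flash-001"
console.log(selectModel(modelConfig, "main", true)); // "deepseek/deepseek-v3.2"
```

Keeping the genius model behind an explicit flag is what lets everyday chat stay on the faster default model.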
Pocket Assistant integrates with Langfuse for LLM observability:
- Trace all LLM calls with timing and token usage
- Debug agent reasoning and tool calls
- Monitor costs and performance
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_HOST=https://cloud.langfuse.com
To self-host instead, the included docker-compose.yml runs Langfuse (and its Postgres); the app itself still runs with npm start.
docker compose up -d
# Dashboard: http://localhost:31111
# In .env: LANGFUSE_HOST=http://localhost:31111
| Command | Description |
|---|---|
| /start | Start the bot / Begin setup |
| /help | Show help message |
| /clear | Clear conversation history |
| /tools | List available tools |
| /profile | View your profile settings |
| /schedules | View scheduled reminders |
| /resetprofile | Reset and redo setup |
pocket-assistant-ai/
├── src/
│ ├── agent/ # Main agent orchestration
│ ├── ai/ # AI helper services
│ ├── browser/ # Browser automation agent
│ ├── coder/ # Code assistant agent
│ ├── config/ # Configuration management
│ ├── logger/ # Logging and tracing
│ ├── memory/ # Conversation + long-term memory (ChromaDB vector store)
│ ├── messaging/ # Messaging abstraction layer
│ ├── model/ # Model factory (main, vision, coder, genius)
│ ├── notepad/ # Persistent notepads (notes, key-values, data logs) per chat
│ ├── prompts/ # Prompt templates (YAML)
│ ├── scheduler/ # Task scheduling (cron, genius mode, schedule notepads)
│ ├── soul/ # User personalization
│ ├── telegram/ # Telegram integration
│ ├── usage/ # Token usage tracking
│ └── utils/ # Utilities and sanitization
├── data/
│ ├── config.json # Application config (created on first run)
│ ├── prompts/ # YAML prompt files
│ └── {userId}/ # Per-user: memory, longterm-memory, schedules, notepads/, soul, etc.
├── docker-compose.yml # ChromaDB (8100) + Langfuse (31111)
└── Dockerfile # Production container
The messaging layer is abstracted via IMessagingService. To add a new channel (e.g., REST API, Discord):
- Create a new service implementing IMessagingService
- Register it in MessagingModule
- Set ENABLE_API_CHANNEL=true for multi-channel mode
See src/messaging/api-messaging.service.ts for an example.
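The general pattern behind a pluggable channel is sketched below. The interface here is an assumed, simplified shape named `MessagingChannelSketch`; it is not the real IMessagingService contract, which you should take from src/messaging. `InMemoryChannel` and `broadcast` are likewise hypothetical.

```typescript
// Assumed, simplified shape; the real IMessagingService lives in src/messaging.
interface MessagingChannelSketch {
  readonly channelName: string;
  sendMessage(chatId: string, text: string): Promise<void>;
}

// A stand-in channel that records messages instead of sending them anywhere.
class InMemoryChannel implements MessagingChannelSketch {
  readonly channelName = "in-memory";
  readonly sent: Array<{ chatId: string; text: string }> = [];

  async sendMessage(chatId: string, text: string): Promise<void> {
    this.sent.push({ chatId, text });
  }
}

// Fans one message out to every registered channel.
async function broadcast(
  channels: MessagingChannelSketch[],
  chatId: string,
  text: string,
): Promise<void> {
  await Promise.all(channels.map((channel) => channel.sendMessage(chatId, text)));
}
```

Because the agent only depends on the interface, a Discord or REST channel can be added without touching agent code.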
Edit src/agent/tools.service.ts:
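// Imports needed at the top of tools.service.ts:
import { tool } from '@langchain/core/tools';
import { z } from 'zod';
// Inside the tools service class:
private createMyNewTool(chatId: string) {
  return tool(
    async (input: { param: string }) => {
      // Your tool logic; chatId is in scope here if the tool needs the current chat
      return 'Result';
    },
    {
      name: 'myNewTool',
      description: 'What this tool does', // the model reads this when choosing a tool
      schema: z.object({
        param: z.string().describe('Parameter description'),
      }),
    },
  );
}
Prompts are stored in prompts/*.yaml and support hot-reload: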
# prompts/main-agent.yaml
base: |
  You are a helpful AI assistant...
capabilities:
  browser: |
    You can browse the web using executeBrowserTask...
- NestJS - Node.js framework
- LangChain - LLM framework
- LangGraph - Agent orchestration
- Telegraf - Telegram bot framework
- Playwright - Browser automation
- OpenRouter - LLM API gateway
- Langfuse - LLM observability



