Pocket Assistant AI

Your personal AI assistant that lives in your pocket.
Multi-agent AI with browser automation, coding, voice input, persistent notepads, and smart scheduling—including optional genius-mode reasoning for complex tasks.

Features • Architecture • Quick Start • Docker • Configuration • Observability

What is Pocket Assistant AI?

Pocket Assistant AI is an intelligent personal assistant that runs on Telegram. It remembers your conversations, learns your preferences, and can perform complex tasks like browsing the web, writing code, and managing your schedule.

Unlike simple chatbots, Pocket Assistant uses a multi-agent architecture with persistent memory and optional high-capability reasoning:

Main Agent — Orchestrates conversations, routes tasks, and can use a genius model for complex reasoning when you ask for it
Browser Agent — Automates web browsing with intelligent planning
Coder Agent — Writes, edits, and manages code projects
Notepads — Persistent notes and data logs so the assistant remembers context and outcomes across runs and scheduled tasks

Prompt Examples

These are real prompts you can copy, paste, and tweak to see what Pocket Assistant AI can do.

Scheduling

You: Remind me to call mom tomorrow at 3pm
Bot: Schedule created! I'll remind you tomorrow at 3:00 PM.

You: Every Monday at 9am, remind me about standup
Bot: Recurring schedule created for every Monday at 9:00 AM.

You: Every day at 9am analyze my portfolio and summarize, use genius mode
Bot: Schedule created with genius mode — I'll use the high-capability model for the analysis and keep notes in the schedule notepad.

Notepad (persistent context)

The assistant can create and update notepads to remember data across runs. Scheduled tasks get their own notepad automatically; you can also ask it to track ad-hoc data.

You: Track the BTC price in a notepad and tell me when it crosses 100k
Bot: I'll create a notepad and update it when we check the price...
     [Later runs read the notepad and see history]

Browser Tasks

You: Go to Hacker News and tell me the top 3 stories
Bot: I'll browse Hacker News for you...
     [Takes screenshots, extracts information]
     Here are the top 3 stories: ...

Coding

You: Clone github.com/user/repo and add a README file
Bot: Starting coding task...
     Using project folder: repo
     Cloning repository...
     Creating README.md...
     Done! Created README.md with project description.

Voice Input

You: [Send a voice message saying "What's the weather like today?"]
Bot: Voice transcribed: "What's the weather like today?"
     Processing...
     [Bot responds to your question]

You: Transcribe this for me
You: [Send a voice message]
Bot: Transcription:
     [Your spoken words as text]

API Calls

You: Check if api.github.com is up
Bot: HTTP 200 OK
     {"current_user_url":"https://api.github.com/user"...}

Example prompts (copy & paste)

You can send these as-is to create powerful scheduled or one-off tasks. Customize times and URLs to your needs.

Daily crypto analysis (browser + genius model + notepad)

Create a schedule that runs every morning, opens CoinGecko, captures the page, and uses the genius model for professional-style analysis:

Every morning at 9:35:

Act as a senior and professional crypto trader:

1. Open https://www.coingecko.com/en/coins/bitcoin in the browser
2. Take a screenshot and extract all the data (chart situation, current price, and all other important data)
3. Use genius model to analyse it as a professional crypto trader
4. Give me a short message including comparison, signal and prediction

The bot will create a recurring schedule with genius mode, use the browser agent to capture the page, and the schedule notepad to keep context (e.g. previous price, trend) across runs.

Features

Intelligent Conversations

Powered by LLMs via OpenRouter (supports GPT-4, Claude, Gemini, DeepSeek, and more)
Two-layer memory: short-term conversation history plus long-term semantic memory for important facts and preferences
Semantic search enriches context by retrieving relevant past conversations and stored facts
Genius model — optional high-capability model for complex reasoning (e.g. scheduled analysis); enable per task with "use genius mode"
Learns your preferences through the "Soul" personalization system
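
To illustrate how semantic search can pull relevant facts into context, here is a minimal sketch of top-k retrieval by cosine similarity. The `MemoryItem` type, the `topK` function, and the toy three-dimensional vectors are hypothetical; the project itself delegates storage and search to ChromaDB with real embeddings.

```typescript
// Hypothetical in-memory stand-in for a vector store; not the project's code.
interface MemoryItem {
  text: string;
  embedding: number[];
}

// Cosine similarity between two vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k stored items most similar to the query embedding.
function topK(query: number[], items: MemoryItem[], k: number): MemoryItem[] {
  return [...items]
    .sort((x, y) => cosine(query, y.embedding) - cosine(query, x.embedding))
    .slice(0, k);
}

// Toy data: real embeddings have hundreds of dimensions.
const memories: MemoryItem[] = [
  { text: "Prefers morning reminders", embedding: [1, 0, 0] },
  { text: "Tracks BTC price daily", embedding: [0, 1, 0] },
  { text: "Likes short summaries", embedding: [0.9, 0.1, 0] },
];

const recalled = topK([1, 0, 0], memories, 2).map((m) => m.text);
console.log(recalled);
```

The retrieved texts would then be prepended to the prompt alongside the short-term conversation history.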

Browser Automation

Plans and executes complex web tasks step by step
Takes screenshots and extracts information from pages
Handles dynamic content, forms, and multi-step workflows
Built on standalone Playwright for reliable full-browser control
Also supports Browser MCP (browsermcp.io) so you can run the same browser tools in MCP-compatible environments

Code Assistant

Clone repositories and work on code projects
Read, write, and edit files with intelligent context
Search code with grep, manage git branches
Run commands and build scripts
Real-time progress updates as it works

Smart Scheduling

Natural language: "Remind me tomorrow at 5pm"
Recurring tasks with cron expressions
Genius mode — optionally use a high-capability model (e.g. DeepSeek V3) for complex scheduled reasoning (analysis, trends, decisions)
Automatic cleanup of old schedules
Context-aware reminders
Each schedule has a persistent notepad so the agent remembers context and outcomes across runs
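
As a rough illustration of how recurring requests map to cron expressions, the sketch below builds standard five-field expressions for daily and weekly schedules. The helper names (`dailyCron`, `weeklyCron`) are hypothetical, and how the assistant actually converts natural language to cron is not described here.

```typescript
// Hypothetical helpers; the project's scheduler may build expressions differently.
type Weekday = "sun" | "mon" | "tue" | "wed" | "thu" | "fri" | "sat";

const WEEKDAY_INDEX: Record<Weekday, number> = {
  sun: 0, mon: 1, tue: 2, wed: 3, thu: 4, fri: 5, sat: 6,
};

// Reject times that are not valid 24-hour clock values.
function assertTime(hour: number, minute: number): void {
  if (!Number.isInteger(hour) || hour < 0 || hour > 23) {
    throw new RangeError(`hour out of range: ${hour}`);
  }
  if (!Number.isInteger(minute) || minute < 0 || minute > 59) {
    throw new RangeError(`minute out of range: ${minute}`);
  }
}

// Cron fields: minute hour day-of-month month day-of-week
function dailyCron(hour: number, minute = 0): string {
  assertTime(hour, minute);
  return `${minute} ${hour} * * *`;
}

function weeklyCron(day: Weekday, hour: number, minute = 0): string {
  assertTime(hour, minute);
  return `${minute} ${hour} * * ${WEEKDAY_INDEX[day]}`;
}

console.log(weeklyCron("mon", 9)); // "0 9 * * 1"  (every Monday at 9:00)
console.log(dailyCron(9, 35));     // "35 9 * * *" (every day at 9:35)
```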

Notepad — Persistent Agent Memory

Cross-run memory — the AI maintains notepads (notes, key-value state, time-series data logs) that persist across conversations and scheduled runs
Perfect for tracking metrics, decisions, and context (e.g. "last price", "trend", "what we decided")
Used automatically by scheduled tasks; also available as tools (listNotepads, readNotepad, updateNotepad) for ad-hoc tracking
Organize by category; scannable, concise storage so the agent stays in context without clutter
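
One way to picture a notepad is as a small record holding notes, key-value state, and a bounded time-series log. The sketch below is an assumption about the shape, not the project's actual schema: the `Notepad` interface, `recordValue`, `crossedAbove`, the 100-entry cap, and the sample prices are all made up for illustration.

```typescript
// Hypothetical notepad shape; field names are assumptions.
interface DataPoint {
  timestamp: string; // ISO 8601
  value: number;
}

interface Notepad {
  title: string;
  category: string;
  notes: string[];
  keyValues: Record<string, string>;
  dataLog: DataPoint[];
}

const MAX_LOG_ENTRIES = 100; // assumed cap so the log stays concise

// Store the latest value under a key and append it to the time-series log.
function recordValue(pad: Notepad, key: string, value: number, at: Date): Notepad {
  const dataLog = [...pad.dataLog, { timestamp: at.toISOString(), value }]
    .slice(-MAX_LOG_ENTRIES);
  return { ...pad, keyValues: { ...pad.keyValues, [key]: String(value) }, dataLog };
}

// True when the last two log entries straddle the threshold from below.
function crossedAbove(pad: Notepad, threshold: number): boolean {
  const n = pad.dataLog.length;
  if (n < 2) return false;
  return pad.dataLog[n - 2].value < threshold && pad.dataLog[n - 1].value >= threshold;
}

// Illustrative values only, not real market data.
let pad: Notepad = { title: "btc-price", category: "crypto", notes: [], keyValues: {}, dataLog: [] };
pad = recordValue(pad, "lastPrice", 98500, new Date("2025-01-01T09:00:00Z"));
pad = recordValue(pad, "lastPrice", 100200, new Date("2025-01-02T09:00:00Z"));
console.log(pad.keyValues["lastPrice"], crossedAbove(pad, 100000));
```

Keeping the previous value in the log is what lets a later scheduled run detect a crossing without re-reading the whole conversation.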

Voice Input

Send voice messages and the AI will transcribe and act on them
Powered by Groq's Whisper large-v3-turbo model
Say "transcribe this" before sending a voice message to get transcription only
Fast and accurate speech-to-text conversion

HTTP Requests

Make API calls directly from chat
Support for all HTTP methods
Custom headers and authentication
Useful for checking RSS feeds, APIs, and webhooks

Security

Whitelist-based access control
SSRF protection for HTTP requests
Input sanitization against prompt injection
Sandboxed code execution environment
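
To give a sense of what SSRF protection involves, the sketch below rejects URLs that point at loopback, private, or link-local addresses before a request is made. It is a simplified assumption, not the project's implementation: `isBlockedHost` and `isUrlAllowed` are hypothetical names, and a production check would also resolve DNS names and re-validate redirects.

```typescript
// Hypothetical pre-flight check; not the project's actual implementation.
function isBlockedHost(hostname: string): boolean {
  const host = hostname.toLowerCase();
  if (host === "localhost" || host.endsWith(".localhost")) return true;
  if (host === "[::1]" || host === "::1") return true; // IPv6 loopback

  const m = host.match(/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/);
  if (!m) return false; // not an IPv4 literal; DNS resolution is out of scope here

  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 0 ||                            // "this" network
    a === 10 ||                           // 10.0.0.0/8
    a === 127 ||                          // loopback
    (a === 169 && b === 254) ||           // link-local, incl. cloud metadata
    (a === 172 && b >= 16 && b <= 31) ||  // 172.16.0.0/12
    (a === 192 && b === 168)              // 192.168.0.0/16
  );
}

// Allow only http(s) URLs whose host is not on the blocked list.
function isUrlAllowed(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // unparseable input is rejected
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;
  return !isBlockedHost(url.hostname);
}

console.log(isUrlAllowed("https://api.github.com"));
console.log(isUrlAllowed("http://169.254.169.254/latest/meta-data"));
```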

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        Telegram Bot                             │
│                     (nestjs-telegraf)                           │
└─────────────────────────────┬───────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                    Messaging Abstraction                        │
│            (IMessagingService - extensible)                     │
└─────────────────────────────┬───────────────────────────────────┘
                              │
                              ▼
┌───────────────────────────────────────────────────────────────────────────────┐
│                        Main Agent                                             │
│                    (LangGraph ReAct)                                          │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐         │
│  │  Tools   │  │  Memory  │  │   Soul   │  │ Scheduler│  │ Notepad  │         │
│  └────┬─────┘  └──────────┘  └──────────┘  └──────────┘  └──────────┘         │
└───────┼───────────────────────────────────────────────────────────────────────┘
        │
        ├─────────────────┬─────────────────┐
        ▼                 ▼                 ▼
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│ Browser Agent │ │  Coder Agent  │ │  HTTP Client  │
│  (Playwright) │ │  (Git + FS)   │ │   (fetch)     │
└───────────────┘ └───────────────┘ └───────────────┘

Key Components

Component	Description
Main Agent	LangGraph-based ReAct agent that handles conversations and routes to sub-agents
Browser Agent	Plans complex web tasks, executes them step-by-step with standalone Playwright or Browser MCP (`browsermcp.io`)
Coder Agent	Manages code projects with file operations, git, and command execution
Soul Service	Stores user preferences and personality settings
Memory Service	Two-layer memory: conversation history (with summarization) and long-term semantic memory via ChromaDB; vector search enriches context
Notepad Service	Persistent notepads per chat: notes, key-values, and time-series data logs so agents remember context and outcomes across runs (used by scheduler and tools)
Scheduler	Handles reminders and recurring tasks with cron support; optional genius model and per-job notepad for complex scheduled reasoning

Quick Start

Prerequisites

Node.js 20+
A Telegram bot token (from @BotFather)
An OpenRouter API key (openrouter.ai)

Installation

# Clone the repository
git clone https://github.com/yourusername/pocket-assistant-ai.git
cd pocket-assistant-ai

# Install dependencies
npm install

# Copy environment template
cp .env.example .env

# Edit .env with your tokens
nano .env

Configuration

Edit .env with your credentials:

TELEGRAM_BOT_TOKEN=your_telegram_bot_token
OPENROUTER_API_KEY=your_openrouter_api_key

Edit data/config.json (created on first run if missing) to add your Telegram user ID:

{
  "security": {
    "allowedUserIds": ["YOUR_USER_ID"]
  }
}

Don't know your user ID? Start the bot and send it any message; it will reply with your ID.
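
The whitelist check itself can be very small. The sketch below assumes the config shape shown above and a hypothetical `isUserAllowed` helper; it treats an empty list as "deny everyone", which is an assumption about the intended behavior.

```typescript
// Mirrors the "security" block of data/config.json shown above.
interface AppConfig {
  security: { allowedUserIds: string[] };
}

// Telegram delivers user IDs as numbers, while the config stores strings,
// so normalize before comparing.
function isUserAllowed(config: AppConfig, userId: number | string): boolean {
  return config.security.allowedUserIds.includes(String(userId));
}

const sampleConfig: AppConfig = { security: { allowedUserIds: ["123456789"] } };
console.log(isUserAllowed(sampleConfig, 123456789)); // true
console.log(isUserAllowed(sampleConfig, 987654321)); // false
```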

Run

# Start the app (with hot reload)
npm start

The app listens on port 29111 (used for the optional REST API channel when ENABLE_API_CHANNEL=true).

Package scripts

Script	Description
`npm start`	Start the app with watch mode (hot reload)
`npm run build`	Compile the NestJS app
`npm run browser`	Run the browser helper script (opens Playwright browser)
`npm run memories:list`	Print all long-term memories from ChromaDB (requires ChromaDB running)
`npm run memories:reset`	Delete all long-term memories in ChromaDB. Optional: `npm run memories:reset -- --chat=CHAT_ID` to reset one chat only

Docker

Docker is used to run ChromaDB (vector memory) and Langfuse (LLM observability). Run the app locally with npm start.

ChromaDB (long-term memory)

ChromaDB is the vector database used for long-term memory: it stores facts and preferences the assistant learns (e.g. “your wife’s name is Khatere”) and powers semantic search so the bot can recall relevant context. Each Telegram chat has its own collection; embeddings are generated with the text-embedding-3-small model via OpenRouter.

Start ChromaDB:

docker compose up chromadb -d
# API: http://localhost:8100

In .env, set CHROMA_HOST=http://localhost:8100 (this is the default). If ChromaDB is unavailable, long-term memory features (memorySave, memorySearch) are disabled.

Inspect or reset memories (CLI):

# List all long-term memories (all chats)
npm run memories:list

# Reset all long-term memories
npm run memories:reset

# Reset long-term memory for one chat only
npm run memories:reset -- --chat=132995226

These scripts require ChromaDB to be running. They connect to ChromaDB only and do not start the Nest app.

Start Langfuse (optional)

docker compose up langfuse-web langfuse-db -d
# Dashboard: http://localhost:31111

Then in .env set LANGFUSE_HOST=http://localhost:31111 and add your keys from the Langfuse project settings.

Start all services

docker compose up -d

Build app image (optional)

docker build -t pocket-assistant-ai .

Environment Variables

Variable	Required	Description
`TELEGRAM_BOT_TOKEN`	Yes	Bot token from BotFather
`OPENROUTER_API_KEY`	Yes	API key from OpenRouter
`GROQ_API_KEY`	No	Groq API key for voice transcription
`CHROMA_HOST`	No	ChromaDB URL for vector memory (default: http://localhost:8100)
`ZAPIER_MCP_TOKEN`	No	Zapier MCP integration token
`LANGFUSE_PUBLIC_KEY`	No	Langfuse public key for observability
`LANGFUSE_SECRET_KEY`	No	Langfuse secret key
`LANGFUSE_HOST`	No	Langfuse host URL
`ENABLE_API_CHANNEL`	No	Enable REST API alongside Telegram
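
A startup check along the following lines can catch missing variables early. This is a sketch under assumptions: `resolveEnv` and the `ResolvedEnv` shape are hypothetical, and treating voice input as disabled when `GROQ_API_KEY` is absent is a guess at the behavior rather than something stated above.

```typescript
type Env = Record<string, string | undefined>;

// The two variables marked "Yes" in the table above.
const REQUIRED_VARS = ["TELEGRAM_BOT_TOKEN", "OPENROUTER_API_KEY"] as const;

interface ResolvedEnv {
  missing: string[];          // required variables that are unset or blank
  chromaHost: string;         // falls back to the documented default
  voiceEnabled: boolean;      // assumption: voice needs GROQ_API_KEY
  apiChannelEnabled: boolean; // true only for the literal string "true"
}

function resolveEnv(env: Env): ResolvedEnv {
  const missing = REQUIRED_VARS.filter((key) => !env[key]?.trim());
  return {
    missing: [...missing],
    chromaHost: env["CHROMA_HOST"]?.trim() || "http://localhost:8100",
    voiceEnabled: Boolean(env["GROQ_API_KEY"]?.trim()),
    apiChannelEnabled: env["ENABLE_API_CHANNEL"] === "true",
  };
}

const resolved = resolveEnv({ TELEGRAM_BOT_TOKEN: "123:abc" });
console.log(resolved);
```

In a Node.js process this would typically be called as `resolveEnv(process.env)`, refusing to start when `missing` is non-empty.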

Configuration

data/config.json

Configuration is loaded from data/config.json (created with defaults on first run). Example:

{
  "security": {
    "allowedUserIds": ["123456789"]
  },
  "model": "google/gemini-2.0-flash-001",
  "vision_model": "google/gemini-2.0-flash-001",
  "coder_model": "anthropic/claude-sonnet-4",
  "genius_model": "deepseek/deepseek-v3.2"
}

Model Selection

Choose any model from OpenRouter:

Role	Purpose	Example
`model`	Main conversations and routing	`google/gemini-2.0-flash-001`, `deepseek/deepseek-v3.2`
`vision_model`	Browser and image understanding	`openai/gpt-4o-mini`
`coder_model`	Code editing and analysis	`anthropic/claude-sonnet-4`, `google/gemini-2.5-pro`
`genius_model`	Complex reasoning (e.g. scheduled analysis, "genius mode")	`deepseek/deepseek-v3.2`

The genius model runs when you enable "genius mode" on a schedule (e.g. "remind me to analyze trends every morning, use genius mode"). This gives deeper reasoning for those tasks while everyday chat stays on the main model.
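
The role-to-model mapping can be sketched as a simple lookup. The helper names (`resolveModel`, `modelForSchedule`) are hypothetical, and falling back to the main `model` when a role is not configured is an assumption, not documented behavior.

```typescript
type ModelRole = "model" | "vision_model" | "coder_model" | "genius_model";

// The main model is mandatory; the other roles are optional in this sketch.
type ModelConfig = Partial<Record<ModelRole, string>> & { model: string };

// Pick the model for a role, falling back to the main model (assumed behavior).
function resolveModel(config: ModelConfig, role: ModelRole): string {
  return config[role] ?? config.model;
}

// Scheduled jobs use the genius model only when genius mode is enabled.
function modelForSchedule(config: ModelConfig, geniusMode: boolean): string {
  return resolveModel(config, geniusMode ? "genius_model" : "model");
}

const modelConfig: ModelConfig = {
  model: "google/gemini-2.0-flash-001",
  coder_model: "anthropic/claude-sonnet-4",
  genius_model: "deepseek/deepseek-v3.2",
};

console.log(modelForSchedule(modelConfig, true));    // "deepseek/deepseek-v3.2"
console.log(modelForSchedule(modelConfig, false));   // "google/gemini-2.0-flash-001"
console.log(resolveModel(modelConfig, "vision_model")); // falls back to the main model
```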

Observability

Langfuse Integration

Pocket Assistant integrates with Langfuse for LLM observability:

Trace all LLM calls with timing and token usage
Debug agent reasoning and tool calls
Monitor costs and performance

Using Langfuse Cloud

LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_HOST=https://cloud.langfuse.com

Self-Hosted Langfuse

The included docker-compose.yml runs Langfuse (with its Postgres database) alongside ChromaDB. Run the app itself with npm start.

docker compose up -d
# Dashboard: http://localhost:31111
# In .env: LANGFUSE_HOST=http://localhost:31111

Bot Commands

Command	Description
`/start`	Start the bot / Begin setup
`/help`	Show help message
`/clear`	Clear conversation history
`/tools`	List available tools
`/profile`	View your profile settings
`/schedules`	View scheduled reminders
`/resetprofile`	Reset and redo setup

Project Structure

pocket-assistant-ai/
├── src/
│   ├── agent/           # Main agent orchestration
│   ├── ai/              # AI helper services
│   ├── browser/         # Browser automation agent
│   ├── coder/           # Code assistant agent
│   ├── config/          # Configuration management
│   ├── logger/          # Logging and tracing
│   ├── memory/          # Conversation + long-term memory (ChromaDB vector store)
│   ├── messaging/       # Messaging abstraction layer
│   ├── model/           # Model factory (main, vision, coder, genius)
│   ├── notepad/         # Persistent notepads (notes, key-values, data logs) per chat
│   ├── prompts/         # Prompt templates (YAML)
│   ├── scheduler/       # Task scheduling (cron, genius mode, schedule notepads)
│   ├── soul/            # User personalization
│   ├── telegram/        # Telegram integration
│   ├── usage/           # Token usage tracking
│   └── utils/           # Utilities and sanitization
├── data/
│   ├── config.json      # Application config (created on first run)
│   ├── prompts/         # YAML prompt files
│   └── {userId}/        # Per-user: memory, longterm-memory, schedules, notepads/, soul, etc.
├── docker-compose.yml   # ChromaDB (8100) + Langfuse (31111)
└── Dockerfile           # Production container

Extending

Adding New Messaging Channels

The messaging layer is abstracted via IMessagingService. To add a new channel (e.g., REST API, Discord):

1. Create a new service implementing IMessagingService
2. Register it in MessagingModule
3. Set ENABLE_API_CHANNEL=true for multi-channel mode

See src/messaging/api-messaging.service.ts for an example.
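
For orientation only, a channel abstraction of this kind might look like the sketch below. The interface `MessagingChannelSketch` and its method names are assumptions made for illustration and are not the project's actual IMessagingService contract; consult src/messaging for the real one.

```typescript
// Hypothetical contract for illustration; NOT the real IMessagingService.
interface MessagingChannelSketch {
  readonly channelName: string;
  sendMessage(chatId: string, text: string): Promise<void>;
}

// A test-friendly channel that records outgoing messages instead of sending them.
class InMemoryChannel implements MessagingChannelSketch {
  readonly channelName = "in-memory";
  readonly outbox: Array<{ chatId: string; text: string }> = [];

  async sendMessage(chatId: string, text: string): Promise<void> {
    this.outbox.push({ chatId, text });
  }
}

// Fan a reply out to every registered channel.
async function broadcast(
  channels: MessagingChannelSketch[],
  chatId: string,
  text: string,
): Promise<void> {
  await Promise.all(channels.map((channel) => channel.sendMessage(chatId, text)));
}

const demoChannel = new InMemoryChannel();
void broadcast([demoChannel], "42", "Schedule created!");
console.log(demoChannel.outbox);
```

Because the agent only depends on the interface, a Discord or REST channel can be swapped in without touching the agent code.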

Adding New Tools

Edit src/agent/tools.service.ts:

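// At the top of src/agent/tools.service.ts (if not already imported)
import { tool } from '@langchain/core/tools';
import { z } from 'zod';

// Inside the tools service class: build a tool bound to one chat
private createMyNewTool(chatId: string) {
  return tool(
    async (input: { param: string }) => {
      // Your tool logic; chatId is available here for per-chat state
      return 'Result';
    },
    {
      name: 'myNewTool', // name the agent uses to call the tool
      description: 'What this tool does',
      schema: z.object({
        param: z.string().describe('Parameter description'),
      }),
    },
  );
}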

Custom Prompts

Prompts are stored in prompts/*.yaml and support hot-reload:

# prompts/main-agent.yaml
base: |
  You are a helpful AI assistant...

capabilities:
  browser: |
    You can browse the web using executeBrowserTask...
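
Once a prompt file like the one above has been parsed into an object, assembling the final system prompt can be as simple as joining the base text with the enabled capability sections. The sketch below assumes the YAML is already parsed; `PromptFile` and `buildSystemPrompt` are hypothetical names, and how the project actually composes prompts may differ.

```typescript
// Shape of the parsed YAML shown above (assumed to be parsed elsewhere).
interface PromptFile {
  base: string;
  capabilities: Record<string, string>;
}

// Join the base prompt with the sections for enabled capabilities,
// silently skipping names that have no matching section.
function buildSystemPrompt(prompts: PromptFile, enabled: string[]): string {
  const sections = enabled
    .filter((key) => Object.prototype.hasOwnProperty.call(prompts.capabilities, key))
    .map((key) => prompts.capabilities[key].trim());
  return [prompts.base.trim(), ...sections].join("\n\n");
}

const mainAgentPrompts: PromptFile = {
  base: "You are a helpful AI assistant...\n",
  capabilities: {
    browser: "You can browse the web using executeBrowserTask...\n",
  },
};

const systemPrompt = buildSystemPrompt(mainAgentPrompts, ["browser", "unknown"]);
console.log(systemPrompt);
```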

Tech Stack

NestJS - Node.js framework
LangChain - LLM framework
LangGraph - Agent orchestration
Telegraf - Telegram bot framework
Playwright - Browser automation
OpenRouter - LLM API gateway
Langfuse - LLM observability
