unifyai/unity

Unity

Open-source virtual teammates that take voice and video calls, and let you interrupt, redirect, or pause them mid-task without restarting.

Unity's three-layer architecture: a Fast Brain on a real-time voice/video call with the user, a Slow Brain (ConversationManager) that always stays present, and an Actor (background reasoner) that does the deep work. This extends the interaction-model / background-model pattern with a third supervisory tier.

Hop on a call with one. Send a follow-up text. Drop them a calendar invite. They remember who you are next time, what you talked about last week, and what they promised to do about it.

Most agents stop the moment you talk. They make you wait for a tool call to finish, then re-explain when you change your mind. Unity's teammates stay listening through everything (chat, voice, phone, video, screen-share) and treat your interjections, corrections, and questions as first-class inputs rather than interruptions to recover from. Whether the assistant is researching flights, drafting an email, or sitting on a live call with a vendor, you can ask "how's it going?", say "actually do X instead", or pause for ten minutes, all without losing context.

It's built around long-lived state, not one-shot conversations. Contacts, projects, files, knowledge, and follow-ups persist as queryable structure, so a teammate remembers who Sarah is, what the Henderson project is about, and what they committed to on your behalf last Wednesday, regardless of which channel you raised it on.

Start here: console.unify.ai (try a teammate in 60 seconds) • Overview • Quickstart • ARCHITECTURE.md


What this feels like

You          ▸  "Find me flights to Tokyo for next month."
Unity        ▸  (starts searching)
You          ▸  "Actually, also check trains to Osaka."
Unity        ▸  (adjusts the in-flight search; doesn't restart)
You          ▸  "Pause that, something urgent."
Unity        ▸  (freezes exactly where it is)
... five minutes later ...
You          ▸  "OK, resume. How's it going?"
Unity        ▸  (picks up where it left off, gives you a status update)

Unity        ▸  (on a live phone call with a vendor)
You          ▸  (in a side chat) "Don't agree to anything over $5k."
Unity        ▸  (the constraint reaches the call mid-conversation)

Unity        ▸  Three tasks running at once.
                  [0] research_flights   ██████████░░░  in progress
                  [1] draft_summary      ████████████░  in progress
                  [2] find_restaurants   ██░░░░░░░░░░  starting
                Each one independently inspectable, steerable, and pausable.

Highlights

🎙️ Takes calls like a person: Live voice, phone, and video calls, with screen-share and webcam frames streamed to the assistant in real time. Not a tool that initiates a call; a participant in the conversation.
✋ Interruptible mid-task: Every operation can be paused, resumed, redirected, or queried while it's running, including operations nested inside other operations, all the way down.
🧠 Plans in code, not tool-by-tool: Multi-step work becomes one coherent program with variables, loops, and control flow, instead of a noisy chain of one-tool-at-a-time decisions.
📞 One identity across every channel: Chat, SMS, email, phone, voice, and video all feed the same persistent memory. The assistant remembers who Sarah is whether she texted, called, or mailed you.
📚 Structured memory, not transcript soup: Contacts, knowledge, tasks, files, and procedures live in typed, queryable tables, distilled from your conversations every fifty messages.
⚙️ Learns reusable functions, not just markdown: After a successful trajectory, the assistant can save executable Python (with metadata and a venv), so the next session can compose it into a plan, not re-derive it.
🔀 Concurrent work, independently steerable: Multiple actions can run at once. Pause one, redirect another, ask a third for a status update, all without affecting the rest.
⏰ Schedules and triggers in plain English: "Every Monday at 9, summarize my unread emails" or "Ping me whenever Alice emails about invoices." Recurring jobs and event triggers are described in natural language and executed by the same agent loop, and they can graduate into stored functions after enough successful runs.
🔌 Local-first, fully open: Runtime, persistence backend, LLM client, and Python SDK are all open-source and run locally with one Docker command. Hosted backend optional.

Try one

There are two paths, depending on whether you want to meet a teammate or run the whole stack yourself.

🌐 Hosted (fastest)

The lowest-friction path is the hosted product at console.unify.ai. Sign in with Google, get matched with a teammate, and start chatting in about a minute. No install, no Docker, no API keys to manage. Voice, video, telephony, and integrations are all turn-key.

💻 Self-host (fully open)

Run the whole stack on your own machine. Runtime, persistence backend, LLM client, and Python SDK are all open-source; see Self-host below.

No signup required. The local installer auto-generates a synthetic API key for the bundled Orchestra and wires everything together. The only key you bring is one LLM provider key (OpenAI or Anthropic).


Self-host

By default, Unity's open-core install is fully local: the runtime, the LLM client, and the persistence backend (Orchestra, via Docker) all run on your machine. The hosted product at console.unify.ai is optional; Unity does not depend on it for any local feature.

Prerequisites:

  • Python 3.12+ (the installer will fetch it with uv if needed)
  • Docker (runs the local Orchestra backend)
  • PortAudio for audio support
    • macOS: brew install portaudio
    • Ubuntu/Debian: sudo apt-get install portaudio19-dev python3-dev
  • One LLM provider key (OpenAI or Anthropic are the simplest paths)

Install:

curl -fsSL https://raw.githubusercontent.com/unifyai/unity/main/scripts/install.sh | bash

The installer clones unity, unify, unillm, and orchestra as siblings under ~/.unity/, installs dependencies, creates a unity CLI shim in ~/.local/bin/, boots a local Orchestra in Docker, generates a local API key for the bundled Orchestra, and wires ORCHESTRA_URL and that auto-generated key into ~/.unity/unity/.env. No Unify account or external signup is required.

Add one model provider key to ~/.unity/unity/.env:

OPENAI_API_KEY=sk-...
# or
ANTHROPIC_API_KEY=...

Run the sandbox:

unity --project_name Sandbox --overwrite

At the configuration prompt:

Option  What it gives you
1       Top-level orchestration only; useful for isolating the conversation layer
2       The full runtime: orchestration + planning + simulated managers
3       Option 2 plus desktop/browser control through agent-service

If you're evaluating Unity as a runtime, start with option 2.

> msg Hey, can you help me organize my upcoming week?
> sms I need to reschedule my meeting with Sarah to Thursday
> email Project Update | Here are the Q3 numbers you asked for...

Other unity subcommands: unity setup, unity status, unity stop, unity restart, unity help.

Skip the local Orchestra (point at your own deployment)
curl -fsSL https://raw.githubusercontent.com/unifyai/unity/main/scripts/install.sh | bash -s -- --skip-setup

That leaves the code installed but doesn't spin up Orchestra. You'll need to point Unity at your own Orchestra deployment (or another team's shared one) via ORCHESTRA_URL and a matching API key in ~/.unity/unity/.env.

Manual install (no installer script)
git clone https://github.com/unifyai/unity.git      ~/.unity/unity
git clone https://github.com/unifyai/unify.git      ~/.unity/unify
git clone https://github.com/unifyai/unillm.git     ~/.unity/unillm
git clone https://github.com/unifyai/orchestra.git  ~/.unity/orchestra

cd ~/.unity/unity
curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync

cd ~/.unity/orchestra
poetry install
ORCHESTRA_INACTIVITY_TIMEOUT_SECONDS=0 scripts/local.sh start
# Copy the ORCHESTRA_URL and UNIFY_KEY it prints into ~/.unity/unity/.env

The installer copies .env.example to .env (intentionally minimal). For voice mode, live calls, hosted comms, LiveKit, Tavily, or visual caching, see .env.advanced.example and sandboxes/conversation_manager/README.md.
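
Both install paths end with the same handful of settings in ~/.unity/unity/.env, so a small check can catch a missing key before the first run. The following is a sketch under assumptions, not part of the unity CLI: parse_env and missing_settings are hypothetical helpers, the parser handles only plain KEY=VALUE lines, and the key name UNIFY_KEY is taken from the manual-install comment above.

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_settings(values):
    """Return the settings a local install appears to need but lacks."""
    # Assumed key names: ORCHESTRA_URL and UNIFY_KEY, per the install notes.
    missing = [k for k in ("ORCHESTRA_URL", "UNIFY_KEY") if not values.get(k)]
    # One model provider key is enough; either name satisfies the check.
    if not (values.get("OPENAI_API_KEY") or values.get("ANTHROPIC_API_KEY")):
        missing.append("OPENAI_API_KEY or ANTHROPIC_API_KEY")
    return missing
```

To use it, read the file with pathlib.Path.home() / ".unity" / "unity" / ".env" and pass its text to parse_env; an empty list from missing_settings suggests the minimal configuration is in place.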


How it works

Unity follows the interaction-model / background-model split recently articulated by Thinking Machines, implemented at the harness level, against any LLM you already use.

A persistent interaction loop (the ConversationManager) stays present with the user across every medium. When work needs deeper reasoning than the conversation can produce instantly, it dispatches a background reasoner (the Actor), which writes Python plans over a back office of typed state managers. Crucially, every operation in the system returns a live, steerable handle, and those handles nest. A correction the user makes in chat propagates down through the dispatched action, into whatever manager call is currently running.

flowchart TB
    classDef interaction fill:#fce7f3,stroke:#be185d,stroke-width:2px,color:#1f2937
    classDef actor fill:#bbf7d0,stroke:#15803d,stroke-width:2px,color:#1f2937
    classDef neutral fill:#f9fafb,stroke:#9ca3af,stroke-width:1px,color:#374151
    classDef accent fill:#1f2937,stroke:#000,stroke-width:1px,color:#fef3c7

    User(["User"]):::neutral
    Mediums["💬 chat  ·  📞 voice / phone  ·  🎥 video / screen-share  ·  ✉️ email · SMS"]:::neutral
    Broker["⚡ Event Broker"]:::accent
    CM["<b>ConversationManager</b> · interaction loop (always present)<br/>per-handle steering tools: pause · resume · interject · stop · ask"]:::interaction
    Actor["<b>Actor</b> · background reasoner<br/>writes Python that composes primitives.*"]:::actor
    BackOffice["<b>The Back Office</b> · typed state managers, English-language APIs<br/>Contacts · Knowledge · Tasks · Transcripts · Files · Images · Web · Secrets · ⚙️ Functions · 📖 Guidance"]:::neutral

    User ==> Mediums ==> Broker ==> CM
    CM ==>|"act(...)"| Actor
    Actor ==>|"primitives.*"| BackOffice

    BackOffice -.->|"SteerableToolHandle"| Actor
    Actor -.->|"SteerableToolHandle + notifications"| CM
    CM -.->|"streamed responses"| User

Solid arrows are dispatch flow. Dotted arrows are the steering bus: every level returns the same SteerableToolHandle type, so steering signals propagate down through the call stack while results and notifications propagate up.

Why this matters: nested steering in action

This is the demo no other framework can run. The user's mid-flight redirect doesn't abort the run, doesn't append a second prompt, and doesn't wait for the next tool boundary. It propagates through the live nested call stack as a typed signal.

sequenceDiagram
    autonumber
    actor User
    participant CM as ConversationManager
    participant Ax as Actor
    participant TM as TranscriptManager

    User->>CM: "find when Sarah last mentioned Berlin"
    CM->>Ax: act(prompt)
    activate Ax
    Ax-->>CM: handle_A (SteerableToolHandle)
    Note over CM: handle_A stored in<br/>in_flight_actions
    Ax->>TM: transcripts.ask(...)
    activate TM
    TM-->>Ax: handle_B (nested SteerableToolHandle)

    User->>CM: "actually include emails too"
    Note over CM: slow brain wakes,<br/>picks the steering tool<br/>for handle_A
    CM->>Ax: handle_A.interject("...also emails")
    Ax->>TM: handle_B.interject("...also emails")
    TM-->>Ax: refined results
    deactivate TM
    Ax-->>CM: notification (intermediate progress)
    CM-->>User: "scanning emails too..."
    Ax-->>CM: handle_A.result
    deactivate Ax
    CM-->>User: final answer

How does this compare to other open-source agents?

The clearest way to see what's distinctive about Unity is to draw the same diagram for adjacent projects, using the same visual language. Pink means a persistent supervising loop (only Unity has one).

OpenClaw: channel-first dispatcher + single Pi agent loop
flowchart TB
    classDef agent fill:#bbf7d0,stroke:#15803d,stroke-width:2px,color:#1f2937
    classDef neutral fill:#f9fafb,stroke:#9ca3af,stroke-width:1px,color:#374151
    classDef dispatch fill:#fed7aa,stroke:#c2410c,stroke-width:2px,color:#1f2937

    User(["User"]):::neutral
    Channels["💬 Telegram · Discord · Slack · SMS · Nodes (devices)"]:::neutral
    Gateway["<b>Gateway daemon</b> · dispatcher<br/>per-session lane (1 active run); steer = abort + redeliver"]:::dispatch
    PiAgent["<b>Pi embedded agent</b> · single tool-calling loop<br/>no supervising loop runs in parallel"]:::agent
    Tools["<b>Tools</b> · core + plugin + MCP bridge<br/>core (web · exec · sessions_spawn) · 📞 voice-call plugin (discrete actions: initiate · speak · end) · mcporter → MCP servers"]:::neutral
    State["<b>State</b> · local-first artefacts<br/>JSONL sessions · workspace files (📖 SKILL.md · SOUL.md · AGENTS.md) · memory plugin (one slot at a time)"]:::neutral

    User ==> Channels ==> Gateway
    Gateway ==>|"start / abort run"| PiAgent
    PiAgent ==> Tools
    PiAgent <==> State

OpenClaw is a local-first control plane with a wide channel matrix and a plugin marketplace. The Gateway dispatches runs but doesn't supervise them; voice is a plugin tool the agent invokes through discrete actions; steering is implemented as abort-and-redeliver. OpenClaw's VISION.md explicitly takes "no agent-hierarchy frameworks (manager-of-managers)" as a non-goal: a deliberate, principled bet in the opposite direction from Unity. If you want a personal-assistant product with broad channel coverage, OpenClaw is excellent. If you want a runtime built around mid-task steering and structured long-lived state, Unity is shaped differently.

Hermes Agent: many surfaces, one monolithic loop
flowchart TB
    classDef agent fill:#bbf7d0,stroke:#15803d,stroke-width:2px,color:#1f2937
    classDef neutral fill:#f9fafb,stroke:#9ca3af,stroke-width:1px,color:#374151
    classDef trigger fill:#fed7aa,stroke:#c2410c,stroke-width:2px,color:#1f2937

    User(["User"]):::neutral
    Cron["⏰ cron + webhooks (automation triggers)"]:::trigger
    Surfaces["💬 CLI · TUI · Gateway (Telegram · Discord · Slack · SMS) · ACP (IDE)"]:::neutral
    AIAgent["<b>AIAgent</b> · single ~12k-LOC sync tool-calling loop<br/>steer() = inject text into next tool result; interrupt() = thread-scoped abort flag"]:::agent
    Tools["<b>Tools</b><br/>native tools · execute_code (ephemeral Python against fixed RPC stubs) · TTS / voice_mode / SMS (no live phone call) · delegate_tool · MCP servers"]:::neutral
    State["<b>State</b><br/>SQLite sessions + FTS5 · MEMORY.md / USER.md workspace files · 📖 SKILL.md library · memory provider plugin (mem0 · honcho · ...)"]:::neutral

    User ==> Surfaces
    Cron ==> Surfaces
    Surfaces ==> AIAgent
    AIAgent ==> Tools
    AIAgent <==> State

Hermes pairs a single ~12k-LOC AIAgent loop with four surfaces (CLI, TUI, gateway, ACP), a deep markdown skills library, SQLite+FTS5 transcripts, and best-in-class cron / webhook automation. Steering is implemented as text injection into the next tool result; interrupt is a thread-scoped flag. Live telephony isn't in the repo: SMS is, and voice is local-only. If you want a polished personal-agent product with a wide messaging surface, broad model support, and mature automation triggers, Hermes is excellent. Unity is making a different bet on what the orchestration layer should look like.

A small bit of history. This architecture has been running in Unity since 2025, well ahead of the wider conversation about it. For the record:

  • SteerableToolHandle (the universal steering protocol): first commit September 23, 2025. That predates OpenClaw's first commit (Nov 24, 2025) and Hermes Agent's interrupt() (Feb 3, 2026) and steer() (Apr 18, 2026).
  • ConversationManager + dual-brain LiveKit voice: first commit November 12, 2025. That predates OpenClaw's voice-call plugin (Jan 11, 2026) by two months.
  • The two-tier interaction-loop / background-reasoner pattern as a whole: operational since November 2025. The Thinking Machines paper that articulated the same architecture was published May 11, 2026, six months later.

We're not claiming foresight; the convergence is just interesting if you find architectural archaeology fun. Repo dates verifiable in git log.


Under the hood

Steerable handles: the universal protocol

Every public manager method returns one. The same ask, interject, pause, resume, stop surface, regardless of whether you're talking to the top-level orchestrator or a deeply nested knowledge query.

handle = await actor.act("Research flights to Tokyo and draft an itinerary")

# Twenty seconds later, while it's still working:
await handle.interject("Also check train options from Tokyo to Osaka")

# Or if something urgent comes up:
await handle.pause()
# ... deal with the urgent thing ...
await handle.resume()

When the Actor calls primitives.contacts.ask(...), the ContactManager starts its own tool loop and returns its own handle, nested inside the Actor's handle, which is nested inside the ConversationManager's. Steering at any level propagates.
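
The nesting described here can be sketched in a few lines of asyncio. This is an illustration only: SketchHandle is a hypothetical stand-in, not Unity's SteerableToolHandle, and it models just the downward propagation of interject, pause, and resume through a three-level chain.

```python
import asyncio


class SketchHandle:
    """Hypothetical stand-in for a steerable handle; not Unity's real class."""

    def __init__(self, name, parent=None):
        self.name = name
        self.children = []
        self.interjections = []
        self._running = asyncio.Event()
        self._running.set()  # handles start in the running state
        if parent is not None:
            parent.children.append(self)

    async def interject(self, message):
        # Record the message here, then forward it to every nested handle.
        self.interjections.append(message)
        for child in self.children:
            await child.interject(message)

    async def pause(self):
        self._running.clear()
        for child in self.children:
            await child.pause()

    async def resume(self):
        self._running.set()
        for child in self.children:
            await child.resume()

    @property
    def paused(self):
        return not self._running.is_set()

    async def checkpoint(self):
        # A work loop would await this between steps so a pause takes effect.
        await self._running.wait()


async def demo():
    conversation = SketchHandle("conversation_manager")
    actor = SketchHandle("actor", parent=conversation)
    contacts = SketchHandle("contacts.ask", parent=actor)

    await conversation.interject("also include emails")
    await conversation.pause()
    paused_everywhere = actor.paused and contacts.paused
    await conversation.resume()
    return contacts.interjections, paused_everywhere, contacts.paused
```

Running asyncio.run(demo()) shows the interjection arriving at the innermost handle and the pause reaching every level. The real protocol also carries results and notifications back up, which this sketch leaves out.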

CodeAct: the Actor writes Python programs

Most agents emit one JSON tool call at a time and let the LLM stitch results together across turns. Unity's Actor writes a single Python program per turn over typed primitives.*:

contacts = await primitives.contacts.ask(
    "Who was involved in the Henderson project?"
)
for contact in contacts:
    history = await primitives.knowledge.ask(
        f"What was {contact} last working on?"
    )
    await primitives.contacts.update(
        f"Send {contact} a catch-up email referencing {history}"
    )

This runs in a sandboxed execution session. Variables, loops, real control flow. A contact lookup → knowledge retrieval → outbound communication becomes one coherent plan rather than three separate tool-selection turns, and the LLM can express intermediate computation directly instead of round-tripping through tool messages.
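
To make the "one program per turn" idea concrete, here is a minimal sketch of how a harness might execute a model-written plan against async primitives. Everything in it is assumed for illustration: run_plan, PLAN_SOURCE, and the stub primitives are hypothetical, and plain exec provides no isolation, so this is not how Unity's sandbox is built.

```python
import asyncio
import textwrap
from types import SimpleNamespace


async def fake_contacts_ask(question):
    # Hypothetical stub standing in for a manager call.
    return ["Sarah", "Tom"]


async def fake_knowledge_ask(question):
    return f"answer to: {question}"


PRIMITIVES = SimpleNamespace(
    contacts=SimpleNamespace(ask=fake_contacts_ask),
    knowledge=SimpleNamespace(ask=fake_knowledge_ask),
)

# A plan as the model might write it: one program with a loop and variables.
PLAN_SOURCE = """
contacts = await primitives.contacts.ask("Who was involved in the Henderson project?")
notes = {}
for contact in contacts:
    notes[contact] = await primitives.knowledge.ask(f"What was {contact} last working on?")
return notes
"""


async def run_plan(source, primitives):
    # Wrap the plan body in an async function so `await` and `return` are legal.
    wrapped = "async def __plan__(primitives):\n" + textwrap.indent(source.strip(), "    ")
    namespace = {}
    exec(wrapped, namespace)  # NOT a real sandbox: exec offers no isolation
    return await namespace["__plan__"](primitives)
```

Calling asyncio.run(run_plan(PLAN_SOURCE, PRIMITIVES)) returns the notes dictionary built inside the loop, with no tool-message round trip between the lookup and the follow-up queries.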

Dual-brain voice and video

Live calls run as two coordinated brains:

  • Slow brain: the ConversationManager. Sees the full picture: all conversations, in-flight actions, structured memory. Makes deliberate decisions. Runs in the main process.
  • Fast brain: a real-time voice agent on LiveKit, running as a separate subprocess. Sub-second latency. Handles turn-taking and direct conversation autonomously.

They communicate over IPC. When the slow brain wants to guide the conversation, it sends one of:

  • SPEAK: "say exactly this" (bypasses the fast brain's LLM)
  • NOTIFY: "here's some context, decide what to do with it"
  • BLOCK: nothing; the fast brain keeps going on its own

Screen-share frames and webcam frames stream to both brains simultaneously, so the fast brain can answer "can you see my screen?" without round-tripping, while the slow brain incorporates visual context into longer-running plans.
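
The three guidance signals can be pictured as a small message type plus a handler on the fast-brain side. This is a sketch under assumptions: Guidance, GuidanceMessage, and apply_guidance are hypothetical names, and the actual IPC transport and message schema are not shown in this README.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Guidance(Enum):
    SPEAK = "speak"    # say exactly this, bypassing the fast brain's LLM
    NOTIFY = "notify"  # context only; the fast brain decides what to do
    BLOCK = "block"    # nothing is sent on; the fast brain carries on


@dataclass
class GuidanceMessage:
    kind: Guidance
    text: str = ""


def apply_guidance(msg: GuidanceMessage, context: list) -> Optional[str]:
    """Return text to speak verbatim, or None when nothing is forced."""
    if msg.kind is Guidance.SPEAK:
        return msg.text
    if msg.kind is Guidance.NOTIFY:
        context.append(msg.text)  # made available to the fast brain's next turn
    return None
```

The design point is that only SPEAK overrides the fast brain's own output; NOTIFY enriches its context and BLOCK leaves it untouched.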

Functions and Guidance: a dual library

Unity maintains two persistent libraries that the Actor draws from on every session:

  • FunctionManager: executable Python (with metadata and a venv) that the Actor composes into plans.
  • GuidanceManager: procedural how-to prose: SOPs, software walkthroughs, multi-step strategies.

After a successful trajectory, a proactive reviewer loop (store_skills) can extract both: code worth keeping, and the procedural narrative for using it. The next session consults both before reaching for raw tools, by design.

Schedules and triggers, described in plain English

Recurring and triggered work isn't configured with cron expressions or webhook YAML. It's described to the agent in natural language and stored as a Task with schedule and repeat (for cadences) or trigger (for event matches). When the time arrives or the trigger fires, a contained Actor run wakes up, reads the task's description, and figures out how to do it.

That same task can graduate over time. After enough successful description-driven runs, the storage-review loop can persist the trajectory as a stored function, at which point the recurring task runs in a hidden, headless lane against that function rather than re-planning from scratch each time. So "summarize my unread emails every Monday at 9" starts out as a paragraph the agent interprets, and gradually becomes an entrypoint it just calls.
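
A rough model of that lifecycle, assuming a simple success counter: the field names schedule, repeat, and trigger come from the text, while SketchTask, GRADUATION_THRESHOLD, and the method names are hypothetical, and the threshold value is invented because the README does not state one.

```python
from dataclasses import dataclass
from typing import Optional

GRADUATION_THRESHOLD = 3  # assumed value; the README does not give a number


@dataclass
class SketchTask:
    description: str
    schedule: Optional[str] = None   # e.g. "Monday 09:00"
    repeat: Optional[str] = None     # e.g. "weekly"
    trigger: Optional[str] = None    # e.g. "email from Alice about invoices"
    successful_runs: int = 0
    stored_function: Optional[str] = None

    def record_success(self, function_name):
        # After enough successful runs, remember a function to call directly.
        self.successful_runs += 1
        if self.stored_function is None and self.successful_runs >= GRADUATION_THRESHOLD:
            self.stored_function = function_name

    def execution_mode(self):
        if self.stored_function:
            return "call_stored_function"
        return "interpret_description"
```

A task created with schedule="Monday 09:00" and repeat="weekly" reports interpret_description until its third recorded success, then switches to call_stored_function.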

Memory consolidation

Every fifty messages, the MemoryManager runs a background extraction pass over the new transcript window. It distills:

  • Contact profiles: who people are, their roles, relationships
  • Per-contact summaries: what you've been discussing, sentiment, themes
  • Response policies: how each person prefers to be communicated with
  • Domain knowledge: project details, preferences, long-term facts
  • Tasks: things you committed to, deadlines, follow-ups

These end up in typed, queryable tables, not freeform transcript summaries.
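
The windowing behaviour can be sketched as a counter that hands each full batch to an extraction callback. SketchConsolidator is a hypothetical name and the callback stands in for the LLM extraction pass; only the fifty-message window comes from the text.

```python
CONSOLIDATION_WINDOW = 50  # from the text: a pass runs every fifty messages


class SketchConsolidator:
    def __init__(self, extract, window=CONSOLIDATION_WINDOW):
        self.extract = extract   # callable that distills a list of messages
        self.window = window
        self.pending = []        # messages not yet consolidated
        self.passes = []         # results of completed extraction passes

    def on_message(self, message):
        self.pending.append(message)
        if len(self.pending) >= self.window:
            # Hand over the full window and start a fresh one.
            batch, self.pending = self.pending, []
            self.passes.append(self.extract(batch))
```

Feeding 120 messages through a consolidator whose callback is len produces two passes of 50 and leaves 20 messages pending for the next window.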

Concurrent steerable actions

┌─ In-Flight Actions ─────────────────────────────────┐
│                                                     │
│  [0] research_flights  ██████████░░░  In progress   │
│      → ask, interject, stop, pause                  │
│                                                     │
│  [1] draft_summary     ████████████░  In progress   │
│      → ask, interject, stop, pause                  │
│                                                     │
│  [2] find_restaurants   ██░░░░░░░░░░  Starting      │
│      → ask, interject, stop, pause                  │
│                                                     │
└─────────────────────────────────────────────────────┘

Each action gets its own dynamically-generated steering tools attached to the slow brain's tool surface. You can inspect, interject into, pause, resume, or stop one action without affecting the others.
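
Independent steering of concurrent work can be illustrated with plain asyncio tasks, each gated by its own event. This is a sketch, not Unity's implementation: SketchAction and the in_flight dictionary are hypothetical stand-ins for the in-flight action registry.

```python
import asyncio


class SketchAction:
    def __init__(self, name, steps):
        self.name = name
        self.steps = steps
        self.done_steps = 0
        self._running = asyncio.Event()
        self._running.set()

    def pause(self):
        self._running.clear()

    def resume(self):
        self._running.set()

    async def run(self):
        for _ in range(self.steps):
            await self._running.wait()   # blocks here while paused
            await asyncio.sleep(0)       # stand-in for one unit of real work
            self.done_steps += 1
        return self.name


async def demo():
    in_flight = {
        0: SketchAction("research_flights", 5),
        1: SketchAction("draft_summary", 5),
    }
    in_flight[0].pause()                 # pause action 0 before it does any work
    tasks = {i: asyncio.create_task(a.run()) for i, a in in_flight.items()}
    await tasks[1]                       # action 1 finishes on its own
    progress_while_paused = in_flight[0].done_steps
    in_flight[0].resume()
    await tasks[0]
    return progress_while_paused, in_flight[0].done_steps, in_flight[1].done_steps
```

Because each action waits on its own event, pausing action 0 leaves action 1 free to finish; action 0 completes only after resume() is called.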


Architecture

For the full architectural breakdown (async tool loop internals, event bus, primitive registry, hosted deployment SPI), see ARCHITECTURE.md. At a glance:

ConversationManager (interaction loop, event-driven scheduling)
    │
    │   Slow Brain ◄── IPC ──► Fast Brain (real-time voice + video, LiveKit)
    │
    ▼
CodeActActor (generates Python plans, calls primitives.* APIs)
    │
    ▼
State Managers (each runs its own async LLM tool loop)
    │
    ├── ContactManager        - people and relationships
    ├── KnowledgeManager      - domain facts, structured knowledge
    ├── TaskScheduler         - durable tasks, schedules, triggers, execution with live handles
    ├── TranscriptManager     - conversation history and search
    ├── GuidanceManager       - procedures, SOPs, how-to knowledge
    ├── FileManager           - file parsing and registry
    ├── ImageManager          - image storage, vision queries
    ├── FunctionManager       - user-defined functions, primitives registry
    ├── WebSearcher           - web research orchestration
    ├── SecretManager         - encrypted secret storage
    ├── BlacklistManager      - blocked contact details
    └── DataManager           - low-level data operations
    │
    ├── EventBus              - typed pub/sub backbone (Pydantic events)
    └── MemoryManager         - offline consolidation every 50 messages

How a request flows

  1. A user message arrives on any medium. The slow brain renders a full state snapshot and makes a single-shot tool decision.
  2. It starts an action via actor.act(...) and gets back a SteerableToolHandle, registered in in_flight_actions.
  3. The Actor generates a Python plan calling typed primitives. Each primitive dispatches to a manager running its own LLM tool loop, returning its own steerable handle.
  4. Meanwhile, the slow brain can start more work, steer existing work, or guide the fast brain during voice/video calls.
  5. The MemoryManager observes message events and periodically distills conversations into structured knowledge.
  6. The EventBus carries typed events with hierarchy labels aligned to tool-loop lineage, making everything observable.
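
Step 6 can be sketched as a small typed pub/sub bus whose events carry a lineage label. The names below (SketchEvent, ToolCallStarted, SketchEventBus) are hypothetical, and the sketch uses stdlib dataclasses to stay dependency-free, whereas the architecture notes describe the real events as Pydantic models.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchEvent:
    hierarchy: tuple   # lineage labels, outermost loop first
    payload: str


@dataclass(frozen=True)
class ToolCallStarted(SketchEvent):
    pass


class SketchEventBus:
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, callback):
        self._subscribers[event_type].append(callback)

    def publish(self, event):
        # Deliver to subscribers of the event's class and of any base class.
        for cls in type(event).__mro__:
            for callback in self._subscribers.get(cls, []):
                callback(event)
```

A subscriber registered for SketchEvent sees every event, while one registered for ToolCallStarted sees only that type. The hierarchy tuple, for example ("ConversationManager", "Actor", "TranscriptManager"), lets an observer tell which nested loop emitted the event.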

The runtime stack

Unity is one of four MIT-licensed repos that make up the runtime. The installer wires them together for the local install; you can also use any of them independently.

Repo          Role
unity (this)  The agent runtime: managers, tool loops, CodeAct, voice, orchestration
orchestra     Persistence backend: FastAPI + Postgres + pgvector. The installer spins it up locally in Docker
unify         Python SDK: the client Unity uses to talk to Orchestra
unillm        LLM access layer: OpenAI, Anthropic, or any compatible endpoint

Running the tests

Tests exercise the real system (steerable handles, CodeAct, manager composition, nested tool loops) against simulated backends with cached LLM responses:

uv sync --all-groups
source .venv/bin/activate

tests/parallel_run.sh tests/                    # everything
tests/parallel_run.sh tests/actor/              # one module
tests/parallel_run.sh tests/contact_manager/    # another

See tests/README.md for the full philosophy: responses are cached, not mocked. Delete the cache and you're re-evaluating against live models.


Where to start reading

File                                                      What's there
unity/common/async_tool_loop.py                           SteerableToolHandle: the protocol everything returns
unity/common/_async_tool/loop.py                          The async tool loop engine: nesting, steering, context propagation
unity/actor/code_act_actor.py                             CodeAct: plan generation, sandbox, primitives
unity/conversation_manager/conversation_manager.py        Dual-brain orchestration, debouncing, in-flight actions
unity/conversation_manager/domains/brain_action_tools.py  How the brain starts, steers, and tracks concurrent work
unity/conversation_manager/domains/call_manager.py        LiveKit subprocess + voice/video event wiring
unity/function_manager/primitives/registry.py             How primitives are assembled into the typed API surface
unity/events/event_bus.py                                 Typed event backbone
unity/memory_manager/memory_manager.py                    Offline consolidation pipeline

Project structure

unity/
├── unity/
│   ├── actor/                    # CodeActActor
│   ├── conversation_manager/     # Dual-brain orchestration
│   │   └── domains/              # Brain tools, action tracking, rendering
│   ├── common/
│   │   ├── async_tool_loop.py    # SteerableToolHandle
│   │   └── _async_tool/          # Tool loop internals
│   ├── contact_manager/
│   ├── knowledge_manager/
│   ├── task_scheduler/
│   ├── transcript_manager/
│   ├── guidance_manager/
│   ├── memory_manager/
│   ├── function_manager/
│   ├── file_manager/
│   ├── image_manager/
│   ├── web_searcher/
│   ├── secret_manager/
│   ├── events/
│   └── manager_registry.py
├── sandboxes/                    # Interactive playgrounds
│   └── conversation_manager/     # Full ConversationManager sandbox (start here)
├── tests/
├── agent-service/                # Node.js desktop/browser automation
└── deploy/                       # Dockerfile, Cloud Build, virtual desktop

License

MIT. See LICENSE.

Built by the team at Unify.
