Documentation

Get started with Aura

Aura gives your AI agent persistent memory — no embeddings, no vector database, no LLM calls for memory operations. Pure Rust, sub-millisecond recall.

Quickstart

Install

terminal
pip install aura-memory

Requires Python 3.9+. Works on Windows, macOS, Linux. No system dependencies.

Your first memory in 30 seconds

hello_memory.py
from aura import Aura

brain = Aura("./my_brain")

# Store knowledge
brain.store("Deploy to staging first. Never push straight to prod.")
brain.store("Our database is PostgreSQL. Auth uses JWT, 15-minute tokens.")
brain.store("User prefers concise answers without bullet points.")

# Recall relevant context — <1ms, no API call
results = brain.recall("deployment rules", limit=3)
for r in results:
    print(r.content)

# Output:
# Deploy to staging first. Never push straight to prod.

The brain persists to disk. The next time you run the script, it remembers everything.

Add memory to any LLM

with_claude.py
from aura import Aura
import anthropic

brain = Aura("./brain")
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def chat(message: str) -> str:
    # Pull relevant memories before each message
    memories = brain.recall(message, limit=5)
    context = "\n".join(f"- {m.content}" for m in memories)

    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=1024,
        system=f"You are a helpful assistant.\n\nWhat you remember:\n{context}",
        messages=[{"role": "user", "content": message}]
    )

    brain.store(f"User asked: {message}")
    return response.content[0].text

# Teach the agent
brain.store("The user is building a FastAPI service with PostgreSQL.")
brain.store("The user prefers async endpoints.")

print(chat("What database should I use?"))
# → Answers with PostgreSQL from memory

Core Concepts

What is Aura?

Aura is a cognitive memory layer that runs alongside your AI agent. It stores what the agent learns, organizes it automatically, and surfaces the most relevant context when the agent needs to answer a question.

Unlike a vector database, Aura doesn't just store and retrieve. It forms beliefs from accumulated evidence, discovers causal patterns over time, and generates advisory hints — all without any LLM calls.

The brain

Everything starts with Aura("./path"). This creates or opens a brain at the given directory. The brain stores everything to disk — it persists between script runs, server restarts, and deployments.

from aura import Aura

brain = Aura("./my_agent_brain")
# Creates: my_agent_brain/brain.cog, beliefs.cog, etc.

Store and Recall

Two operations you'll use constantly:

# Store — write something to memory
brain.store("content", tags=["optional", "tags"])

# Recall — find relevant memories for a query
results = brain.recall("your query", limit=5)
for r in results:
    print(r.content)      # the stored text
    print(r.confidence)   # how reliable is this memory
    print(r.tags)         # associated tags

How memory becomes knowledge

Aura automatically organizes stored records into higher layers during maintenance cycles:

Records          Raw stored facts — what you wrote with store()
Beliefs          Groups of related records weighted by evidence and confidence
Concepts         Abstractions discovered across stable beliefs
Causal Patterns  Cause-effect relationships found in the memory graph
Policy Hints     Advisory guidance derived from patterns — "prefer staging deploys"

Run brain.run_maintenance() to trigger a cycle. In production, use the background daemon.
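If you're curious what the background daemon amounts to, here is a minimal sketch of a periodic maintenance loop using only the standard library. The wiring is hypothetical: `run_maintenance` stands in for `brain.run_maintenance`, and the interval is an arbitrary choice (a stub callable is used below so the snippet runs on its own).

```python
import threading
import time

def start_maintenance_daemon(run_maintenance, interval_s=300.0):
    """Call run_maintenance() every interval_s seconds on a daemon thread."""
    stop = threading.Event()

    def loop():
        # Event.wait returns False on timeout, True once stop.set() is called
        while not stop.wait(interval_s):
            run_maintenance()

    threading.Thread(target=loop, daemon=True).start()
    return stop  # call stop.set() to shut the loop down

# Stub standing in for brain.run_maintenance:
cycles = []
stop = start_maintenance_daemon(lambda: cycles.append(time.time()), interval_s=0.02)
time.sleep(0.25)
stop.set()
print(len(cycles) >= 2)  # → True
```

In real use you would pass `brain.run_maintenance` directly and keep the returned event around to stop the loop on shutdown.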

Integrations

Aura works with any LLM framework. The pattern is always the same: recall before the LLM call, store after.
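That recall-before/store-after loop can be factored into one helper that works with any provider. This is a sketch, not part of Aura's API: `brain_recall`, `brain_store`, and `llm_call` are whatever callables your setup provides, and stubs stand in for them below so the snippet is self-contained.

```python
def chat_with_memory(brain_recall, brain_store, llm_call, message, limit=5):
    """Generic memory loop: recall relevant context, call the LLM, store the turn."""
    memories = brain_recall(message, limit)
    context = "\n".join(f"- {m}" for m in memories)
    reply = llm_call(f"What you remember:\n{context}\n\nUser: {message}")
    brain_store(f"User asked: {message}")
    return reply

# Stubs standing in for brain.recall / brain.store / the provider call:
notes = ["We use PostgreSQL, not MySQL"]
stored = []
reply = chat_with_memory(
    lambda query, n: notes[:n],               # recall
    stored.append,                            # store
    lambda prompt: f"echo: {prompt.splitlines()[-1]}",  # fake LLM
    "What database should I use?",
)
print(reply)
# → echo: User: What database should I use?
```

With a real brain you would pass `brain.recall` and `brain.store` (adapting for the `Record.content` field) and your provider's completion call.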

Claude (Anthropic)

claude_agent.py
from aura import Aura
import anthropic

brain = Aura("./brain")
client = anthropic.Anthropic()

memories = brain.recall(user_message, limit=5)
context = "\n".join(f"- {m.content}" for m in memories)

response = client.messages.create(
    model="claude-haiku-4-5-20251001",
    max_tokens=1024,
    system=f"Assistant with memory:\n{context}",
    messages=[{"role": "user", "content": user_message}]
)

Gemini (Google)

gemini_agent.py
from aura import Aura
import google.generativeai as genai

brain = Aura("./brain")
genai.configure(api_key="YOUR_KEY")
model = genai.GenerativeModel("gemini-2.5-flash-lite")

memories = brain.recall(user_message, limit=5)
context = "\n".join(f"- {m.content}" for m in memories)
prompt = f"Memory:\n{context}\n\nUser: {user_message}"

response = model.generate_content(prompt)
brain.store(f"User asked: {user_message}")

Ollama (local, no API key)

ollama_agent.py
from aura import Aura
import requests

brain = Aura("./brain")

memories = brain.recall(user_message, limit=5)
context = "\n".join(f"- {m.content}" for m in memories)

r = requests.post("http://localhost:11434/api/generate", json={
    "model": "gemma3n:e4b",
    "prompt": f"Memory:\n{context}\n\nUser: {user_message}",
    "stream": False
})
print(r.json()["response"])  # non-streaming responses carry the text here
brain.store(f"User asked: {user_message}")

CrewAI

crewai_agent.py
from aura import Aura
from crewai.tools import tool

brain = Aura("./brain")

@tool("remember")
def remember(content: str) -> str:
    """Store important information in long-term memory."""
    brain.store(content)
    return "Stored."

@tool("recall")
def recall_memory(query: str) -> str:
    """Search long-term memory for relevant information."""
    results = brain.recall(query, limit=5)
    return "\n".join(r.content for r in results) or "Nothing found."

Memory Layers

When you store something, you can specify how long it should persist. Aura uses four levels — memories decay naturally and are promoted based on how often they're accessed.

Level            Lifespan  Use for
Level.Working    Hours     Current session context, temporary notes
Level.Decisions  Days      Decisions made, tasks in progress
Level.Domain     Weeks     Project knowledge, team preferences
Level.Identity   Months+   Core user traits, permanent rules

from aura import Aura, Level

brain = Aura("./brain")

# Persists for months — core identity
brain.store("User is a senior backend engineer", level=Level.Identity)

# Persists for weeks — project knowledge
brain.store("We use PostgreSQL, not MySQL", level=Level.Domain)

# Persists for days — active work
brain.store("Working on auth module this week", level=Level.Decisions)

# Default — current session
brain.store("User asked about deployment just now")

API Reference

Core methods

brain.store(content, level?, tags?, namespace?)

Store a memory. Returns the created Record.

brain.recall(query, limit?, tags?, namespace?)

Recall relevant memories. Returns list of Records sorted by relevance.

brain.run_maintenance()

Run a maintenance cycle. Forms beliefs, concepts, causal patterns, policy hints from stored records.

brain.get_surfaced_policy_hints(limit?)

Get advisory hints derived from accumulated memory patterns.

brain.get_metacognitive_context()

Query the brain's own knowledge state — confidence, freshness, conflicts.

brain.recall_with_epistemic_context(query, limit?)

Recall records plus their epistemic state in one call.

Record fields

record = brain.recall("something")[0]

record.id           # unique record ID
record.content      # stored text
record.level        # memory level (Working/Decisions/Domain/Identity)
record.tags         # list of tags
record.confidence   # reliability score 0.0–1.0
record.namespace    # isolation namespace
record.created_at   # unix timestamp

MCP server

Aura includes a native MCP server — works with Claude Desktop, Cursor, VS Code, and any MCP client.

# Start MCP server
python -m aura.mcp --brain ./my_brain

# Or via CLI
aura-mcp --brain ./my_brain --port 8765
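To wire the server into an MCP client such as Claude Desktop, an entry along these lines in the client's MCP configuration (e.g. `claude_desktop_config.json`) should work — the `"aura"` key is an arbitrary name, and the brain path is whatever you pass on the command line:

```json
{
  "mcpServers": {
    "aura": {
      "command": "python",
      "args": ["-m", "aura.mcp", "--brain", "./my_brain"]
    }
  }
}
```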