Your agent's context is your attack surface. Act accordingly.
Secure context engineering for production AI agents.
Content security. Integrity verification. Trust hierarchy. Context that improves itself.
Website • Docs • Blog • Quickstart • Security Guide
Agents are getting compromised. Not theoretically — right now.
- EchoLeak (CVE-2025-32711, CVSS 9.3) — a single email triggered zero-click data exfiltration from Microsoft 365 Copilot1
- CrewAI + GPT-4o — researchers achieved 65% exfiltration success rate against multi-agent systems (COLM 2025)2
- Drift chatbot cascade — one compromised chatbot integration cascaded into 700+ organizations via Salesforce, Google Workspace, Slack, S3, and Azure3
- OWASP Top 10 for Agentic Applications published December 2025 — memory and context manipulation is a top risk category4
Agent A's output is Agent B's instruction. Memory is the vector.
Every other memory layer trusts content by default. That is the vulnerability.
We audited the docs, repos, and changelogs of every major memory tool.5 These protections do not exist anywhere else:
| Security Feature | mem0 | Zep | Letta | Aegis |
|---|---|---|---|---|
| Content injection detection | — | — | — | 4-stage pipeline |
| Memory integrity | — | — | — | HMAC-SHA256 |
| Agent identity binding | — | — | — | Cryptographic API key |
| Trust hierarchy | — | — | — | 4-tier OWASP model |
| Per-agent rate limiting | — | — | — | Sliding window |
| Security audit trail | — | — | — | Immutable event log |
| Sensitive data protection | — | — | — | Auto-detect + reject/redact/flag |
Aegis is the only OSS context hub. Four artifacts, one secure surface, one API call to load them all:
| Artifact | What it is | Endpoint |
|---|---|---|
| Prompts | Versioned, with one active version per name | /prompts/* |
| Memory | What we've always done — secure, ranked, decayed | /memories/* |
| Skills | Anthropic Agent Skills spec, semantic activation | /skills/* |
| Subagents | Delegation surface with tool + scope policy | /subagents/* |
| Bundle | Load all four in one call, token-budgeted | POST /context/load |
Every artifact: HMAC integrity-signed. Content-scanned. Trust-gated. Audit-logged.
from aegis_memory import AegisClient
client = AegisClient(api_key="...")
bundle = client.load_context(
agent_id="executor",
query="paginate the orders API",
token_budget=8000,
)
# → ranked memories + active prompt + matched skills + available subagents
# → integrity-verified across all four
# → token-budgeted to fit your modelOther context hubs (LangSmith, MindStudio) are closed-source. Other memory layers (mem0, Zep, Letta) stop at memory. Aegis does both, with security as the foundation.
Beyond storing memories, Aegis owns their lifecycle. mem0, Zep, and Letta ship variants of these primitives; what's distinct in Aegis is the audit-preserving and human-reviewable shape of each one — typed edges with explicit resolution states, consolidation that soft-deprecates rather than deletes.
Hybrid retrieval. Every query runs through dense (pgvector cosine) and sparse (PostgreSQL tsvector) channels, fused with Reciprocal Rank Fusion. Catches the exact-match cases (entity names, error codes, tool names, file paths) that pure embedding similarity blurs.
results = client.hybrid_query(query="ZX7-PAGE-94 cursor pagination", agent_id="executor")Contradiction detection. When two memories make incompatible claims, Aegis surfaces the conflict as a contradicts edge — a typed link with confidence and rationale. Resolve via API.
client.scan_contradictions(namespace="default")
unresolved = client.list_contradictions()
client.resolve_edge(edge_id=..., resolution="kept_source")
metrics = client.contradiction_metrics()
# → {"unresolved_contradictions": 3, "total_contradictions_detected": 17}Semantic consolidation. Real merge, not prefix matching. Embedding-similar memories above threshold get merged via heuristic or LLM, with full audit trail (losing memory stays queryable with is_deprecated=True and metadata.consolidated_into).
plan = client.consolidate_memories(dry_run=True) # review first
client.consolidate_memories(dry_run=False) # then applyAegis implements OWASP AI Agent Security recommendations natively. Six capabilities, none optional:
- 4-stage content security pipeline — input validation, sensitive data scanning, prompt injection detection, optional LLM-based injection classification. Every memory write. Not optional.
- HMAC-SHA256 integrity signing — tamper detection on store, verification on demand. You know if a memory was modified.
- OWASP 4-tier trust hierarchy — untrusted, internal, privileged, system. Agents get compromised. Aegis limits the blast radius.
- Cryptographic agent binding — API keys bound to agent identity. No more trusting a request body that says "I'm the admin agent."
- ACE loop — generation, reflection, curation. Agents that learn from their own mistakes and promote what works.
- Multi-agent coordination — scoped access control, cross-agent query, structured handoffs. Memory sharing with boundaries.
git clone https://github.com/quantifylabs/aegis-memory.git
cd aegis-memory
export OPENAI_API_KEY=sk-...
docker compose up -d
curl http://localhost:8000/health
# {"status": "healthy"}pip install aegis-memoryfrom aegis_memory import AegisClient
client = AegisClient(api_key="dev-key", base_url="http://localhost:8000")
# Planner agent stores task breakdown
client.add(
content="Task: Build login. Steps: 1) Form, 2) Validation, 3) API",
agent_id="planner",
scope="agent-shared",
shared_with_agents=["executor"]
)
# Executor queries planner's memories
memories = client.query_cross_agent(
query="current task",
requesting_agent_id="executor",
target_agent_ids=["planner"]
)
print(memories[0].content)Aegis Memory is the first context layer with a complete ACE loop — the Generation → Reflection → Curation cycle from Stanford/SambaNova's research, engineered for production.
Your agent made the same mistake 5 times? ACE loop remembers the fix forever. Stale memories polluting retrieval? Curation auto-cleans your playbook.
Generation Execution Reflection Curation
| | | |
Query playbook -> Run task with -> Auto-vote on -> Promote effective
for strategies tracked memories used memories Flag ineffective
Auto-reflect Consolidate duplicates
on failures
from aegis_memory import AegisClient
client = AegisClient(api_key="your-key")
# 1. GENERATION: Query agent-specific playbook
playbook = client.get_playbook_for_agent(
"executor",
query="API pagination task",
task_type="api-integration",
)
memory_ids = [e.id for e in playbook.entries]
# 2. EXECUTION: Track which memories the agent uses
run = client.start_run(
"task-42", "executor",
task_type="api-integration",
memory_ids_used=memory_ids,
)
# ... agent does its work ...
# 3. REFLECTION: Complete with outcome (auto-feedback!)
client.complete_run("task-42", success=True, evaluation={"score": 0.95})
# -> Auto-votes 'helpful' on every memory used
# -> On failure: auto-votes 'harmful' AND creates a reflection memory
# 4. CURATION: Periodically clean up
curation = client.curate(namespace="production")
# -> Promotes high-effectiveness entries
# -> Flags low-effectiveness for deprecation
# -> Identifies duplicate entries to consolidate| Feature | ACE-Inspired | Aegis ACE-Engineered |
|---|---|---|
| Voting | Manual vote endpoints | Auto-voting tied to run outcomes |
| Reflection | Manual reflection creation | Auto-reflection on failure with error context |
| Curation | Not implemented | Full curation cycle with promote/flag/consolidate |
| Run tracking | Not tracked | First-class ace_runs table linking memories to outcomes |
| Agent-specific playbook | Generic query | Filtered by agent_id + task_type |
Different tools solve different problems. This comparison stays focused on capabilities clearly documented in public repos and docs.5
| If you need... | Usually pick | Reason |
|---|---|---|
| Personalized assistant memory (user/profile facts) | mem0 | Designed around persistent user/agent memory for assistants |
| Personal/team "second brain" with ingestion | Supermemory | Knowledge-base style memory with connectors |
| Graph-native episodic memory over agent events | Graphiti / Zep | Focused on temporal + knowledge graph memory models |
| Stateful agent runtime + built-in memory blocks | Letta | Agent framework centered on durable state |
| Secure context engineering with built-in security, trust, and compliance | Aegis Memory | Only context layer with content security, integrity verification, and trust hierarchy |
| Multi-agent coordination with access boundaries | Aegis Memory | Scope-aware ACLs + cross-agent query APIs |
| Self-improving context loops (what worked / failed) | Aegis Memory | ACE patterns: vote, reflection, playbook |
Memory-depth primitives (hybrid retrieval, contradiction handling, consolidation) are now table stakes — mem0, Zep, Letta, and Aegis all ship variants in 2026.6 The differences are in how, not whether.
| Capability | mem0 | Graphiti / Zep | Letta | Aegis Memory |
|---|---|---|---|---|
| Primary focus | Assistant personalization | Graph-based episodic memory | Stateful agents | Secure context engineering |
| Open source | Yes | Yes | Yes | Yes |
| Self-host posture | Available | Available | Available | Self-host-first |
| Content security pipeline | — | — | — | 4-stage (validation, PII, injection, LLM) |
| Memory integrity | — | — | — | HMAC-SHA256 |
| Trust hierarchy | — | — | — | 4-tier OWASP model |
| Multi-agent ACL/scopes | — | — | — | Yes |
| Cross-agent query | — | — | — | Yes |
| Handoff baton | — | — | — | Yes |
| ACE loop | — | — | — | Yes |
| Typed memory model | — | — | — | Yes |
| Temporal decay | — | Partial | — | Yes |
| Hybrid retrieval (dense + sparse + RRF) | Semantic + BM25 + entity | Semantic + keyword + graph | Yes (RRF) | Yes (pgvector + tsvector + RRF) |
| Contradiction detection | Mem0g (graph variant, LLM) | LLM + temporal invalidation | — | Typed contradicts edge, cheap + optional LLM, explicit resolution workflow |
| Semantic consolidation | LLM-merge + DELETE losers | Temporal supersession | — | LLM/heuristic merge + audit-preserving (is_deprecated=True + consolidated_into) |
| Unified context hub (prompts + memory + skills + subagents) | — | — | — | Yes |
Pick Aegis Memory when most of these are true:
- You need content security — injection detection, integrity verification, sensitive data protection.
- You need multiple agents to share memory safely with explicit ACL/scopes.
- You need handoffs where one agent passes a reliable state bundle to another.
- You want ACE patterns (vote/reflection/playbook) to continuously improve memory quality.
- You want hybrid retrieval that catches exact-token cases (entity names, error codes, file paths) without giving up semantic similarity.
- You need contradiction tracking that's reviewable, not just auto-deleted — typed edges with explicit
kept_source/kept_target/both_valid/both_invalidresolutions, plus a/metricsendpoint for measuring epistemic conflict over time. - You need consolidation with an audit trail — losing memories stay queryable (
is_deprecated=True,metadata.consolidated_into) rather than being deleted. - You prefer a self-host posture with operational control over storage and deployment.
- You need temporal decay so stale memories don't pollute retrieval over time.
Benchmarked on 8 vCPU / 7.6 GB RAM (Intel 13th Gen), 1000 memories, Docker Compose (PostgreSQL 16 + pgvector), concurrency=10. Queries include OpenAI embedding latency. Reproduce with cd benchmarks && bash run_benchmark.sh.
| Operation | p50 | p95 | p99 | Throughput |
|---|---|---|---|---|
| Sequential add | 72ms | 89ms | 97ms | 14.1 ops/s |
| Batch add (5x20) | 216ms | 292ms | 292ms | 4.6 ops/s |
| Concurrent add (c=10) | 100ms | 193ms | 511ms | 85.1 ops/s |
| Sequential query | 282ms | 411ms | 1502ms | 3.8 ops/s |
| Concurrent query (c=10) | 413ms | 1832ms | 1897ms | 18.6 ops/s |
| Cross-agent query | 304ms | 380ms | 380ms | 3.3 ops/s |
| Vote | 64ms | 176ms | 176ms | 14.1 ops/s |
| Deduplication | 75ms | 112ms | 112ms | 13.6 ops/s |
Query tail latency (p95/p99) is dominated by the external OpenAI embedding call, not Aegis or PostgreSQL. Write and vote operations that skip embedding are consistently under 100ms at p50.
docker compose up -dkubectl apply -f k8s/| Variable | Default | Description |
|---|---|---|
DATABASE_URL |
postgresql+asyncpg://... |
PostgreSQL connection |
OPENAI_API_KEY |
— | For embeddings |
AEGIS_API_KEY |
dev-key |
API authentication |
CONTENT_POLICY_INJECTION |
flag |
reject / redact / flag / allow |
CONTENT_POLICY_SECRETS |
reject |
reject / redact / flag / allow |
ENABLE_LLM_INJECTION_CLASSIFIER |
false |
Enable Stage 4 LLM classifier |
INJECTION_CLASSIFIER_MODEL |
gpt-4o-mini |
Model for injection classification |
docs.aegismemory.com — Full documentation
- Quickstart — Get running in 5 minutes
- Security Guide — Content security, integrity, trust hierarchy
- ACE Patterns — Self-improving agent patterns
- Smart Memory — Zero-config memory extraction
- Integrations — CrewAI, LangChain guides
- CLI Reference — Command-line tools
We welcome contributions! See CONTRIBUTING.md for guidelines.
# Run tests
pytest tests/ -v
# Run linting
ruff check server/Apache 2.0 — Use it however you want. See LICENSE.
Built by engineers who read the OWASP reports and acted on them.
Footnotes
-
EchoLeak: Zero-click exfiltration from M365 Copilot. arxiv.org/html/2509.10540v1 ↩
-
Multi-agent exfiltration study (COLM 2025). openreview.net/pdf?id=DAozI4etUp ↩
-
CVE-2025-32711 zero-click AI vulnerability analysis. socprime.com/blog/cve-2025-32711-zero-click-ai-vulnerability/ ↩
-
OWASP Top 10 for Agentic Applications (2026). genai.owasp.org ↩
-
Security comparison based on public documentation and open-source repositories as of February 2026. Sources: mem0 docs | Zep docs | Letta repo | Aegis docs ↩ ↩2
-
Memory-depth feature claims verified May 2026 against vendor blogs and docs. Sources: mem0 State of AI Agent Memory 2026 (hybrid: semantic + BM25 + entity), mem0 architecture (consolidation, Mem0g contradiction resolver), Graphiti / Zep paper and Neo4j writeup (LLM-based edge contradiction with temporal invalidation), Letta archival search docs (RRF hybrid). Aegis design choices documented in server/contradiction_detector.py and server/consolidation.py. ↩