Hermon is an AI Agent OS and inference gateway written in Rust. It routes requests to multiple LLM backends (Ollama, OpenAI, Anthropic, llama.cpp) through a unified API, with built-in authentication, token economics, session management, and audit logging.
- Multi-backend routing -- Ollama, OpenAI, Anthropic, and llama.cpp HTTP server adapters
- Three API surfaces -- Hermon native, OpenAI-compatible (`/v1/`), and Ollama-compatible (`/ollama/api/`)
- Token economics -- Tiered entitlements (Free/Standard/Admin/Custom), sliding-window rate limiting, quota enforcement, usage tracking
- Identity & auth -- JWT (HS512, 15-min access + 30-day refresh tokens), API keys (`hrmn_sk_{env}_{key}`), Argon2id password hashing
- Session management -- Conversation-attached sessions with auto-expiry and background cleanup
- Audit logging -- Append-only event log with SHA-256 hash chain integrity verification
- Streaming -- SSE streaming on all API surfaces (OpenAI `data:` format, Ollama NDJSON)
- Rust 2024 edition -- Workspace of 31 crates, async throughout via Tokio
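The `hrmn_sk_{env}_{key}` key format above lends itself to prefix-based lookup: everything before the final segment is non-secret and can index the hashed keys. As a rough illustration only (the `ApiKey` struct and `parse_api_key` helper here are hypothetical, not Hermon's actual types):

```rust
// Illustrative parser for the `hrmn_sk_{env}_{key}` API-key format.
// The struct and helper below are a sketch, not Hermon's real types.
#[derive(Debug, PartialEq)]
struct ApiKey {
    env: String,    // e.g. "live" or "test" -- non-secret, usable for lookup
    secret: String, // the random key material (only its hash would be stored)
}

fn parse_api_key(raw: &str) -> Option<ApiKey> {
    // Expect exactly: "hrmn_sk_" + env + "_" + secret
    let rest = raw.strip_prefix("hrmn_sk_")?;
    let (env, secret) = rest.split_once('_')?;
    if env.is_empty() || secret.is_empty() {
        return None;
    }
    Some(ApiKey { env: env.to_string(), secret: secret.to_string() })
}

fn main() {
    let key = parse_api_key("hrmn_sk_live_abc123").expect("valid key");
    assert_eq!(key.env, "live");
    assert_eq!(key.secret, "abc123");
    assert!(parse_api_key("sk-not-hermon").is_none());
    println!("ok");
}
```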
- Rust 1.85+ (2024 edition)
- Ollama running locally (default `http://localhost:11434`)
```bash
# Build
cargo build --release

# Run with Ollama backend (auto-detected)
./target/release/hermond

# Or with explicit config
OLLAMA_URL=http://localhost:11434 \
HERMON_ADMIN_PASSWORD=your-secure-password \
./target/release/hermond --port 11488
```

```bash
# OpenAI
OPENAI_API_KEY=sk-... ./target/release/hermond

# Anthropic
ANTHROPIC_API_KEY=sk-ant-... ./target/release/hermond

# llama.cpp server
LLAMACPP_URL=http://localhost:8080 ./target/release/hermond

# All backends simultaneously
OLLAMA_URL=http://localhost:11434 \
OPENAI_API_KEY=sk-... \
ANTHROPIC_API_KEY=sk-ant-... \
LLAMACPP_URL=http://localhost:8080 \
./target/release/hermond
```

```bash
# Login (returns JWT access + refresh tokens)
curl -X POST http://localhost:11488/api/auth/login \
  -H 'Content-Type: application/json' \
  -d '{"username":"admin","password":"your-password"}'

# Use the access_token in subsequent requests
TOKEN="eyJ..."
```

Any OpenAI SDK or client works out of the box:
```bash
# Chat completion
curl http://localhost:11488/v1/chat/completions \
  -H "Authorization: Bearer $TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "llama3.2:3b",
    "messages": [{"role":"user","content":"Hello!"}],
    "stream": false
  }'

# List models
curl http://localhost:11488/v1/models \
  -H "Authorization: Bearer $TOKEN"
```

```bash
# Chat
curl http://localhost:11488/ollama/api/chat \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "llama3.2:3b",
    "messages": [{"role":"user","content":"Hello!"}],
    "stream": false
  }'

# List models
curl http://localhost:11488/ollama/api/tags
```

```bash
# Chat
curl http://localhost:11488/api/chat \
  -H "Authorization: Bearer $TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{
    "model_id": "llama3.2:3b",
    "messages": [{"role":"user","parts":[{"Text":{"text":"Hello!"}}]}]
  }'

# Session management
curl -X POST http://localhost:11488/api/sessions \
  -H "Authorization: Bearer $TOKEN"

# Usage & quotas
curl http://localhost:11488/api/usage \
  -H "Authorization: Bearer $TOKEN"

# Health check
curl http://localhost:11488/api/health
```

```
              hermond (server binary)
                        |
        +---------------+---------------+
        |               |               |
     /api/*           /v1/*       /ollama/api/*
  (Native API)   (OpenAI-compat)  (Ollama-compat)
        |               |               |
        +---------------+---------------+
                        |
        +---------------+---------------+
        |                               |
  Identity/Auth                   Entitlements
 (JWT, API keys)            (rate limits, quotas)
        |                               |
        +---------------+---------------+
                        |
             InferenceBackend trait
                        |
     +---------+--------+--------+---------+
     |         |                 |         |
   Ollama    OpenAI         Anthropic  llama.cpp
```
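The routing layer in the diagram dispatches every API surface through a single backend abstraction. A deliberately simplified, synchronous sketch of that shape (Hermon's real `InferenceBackend` trait is async via Tokio and richer; the `Router`, `MockOllama`, and method names here are illustrative):

```rust
// Simplified, synchronous sketch of the backend abstraction the router
// dispatches through. Everything except the trait's role is illustrative.
trait InferenceBackend {
    fn name(&self) -> &str;
    fn complete(&self, model: &str, prompt: &str) -> Result<String, String>;
}

// A stand-in backend that just echoes its input.
struct MockOllama;

impl InferenceBackend for MockOllama {
    fn name(&self) -> &str { "ollama" }
    fn complete(&self, model: &str, prompt: &str) -> Result<String, String> {
        Ok(format!("[{model}] echo: {prompt}"))
    }
}

// The router holds trait objects and selects one; the real router is
// capability-aware, while this sketch matches on backend name only.
struct Router {
    backends: Vec<Box<dyn InferenceBackend>>,
}

impl Router {
    fn route(&self, backend: &str) -> Option<&dyn InferenceBackend> {
        self.backends.iter().find(|b| b.name() == backend).map(|b| b.as_ref())
    }
}

fn main() {
    let router = Router { backends: vec![Box::new(MockOllama)] };
    let backend = router.route("ollama").expect("backend registered");
    let out = backend.complete("llama3.2:3b", "Hello!").unwrap();
    assert!(out.contains("echo: Hello!"));
    assert!(router.route("openai").is_none()); // not registered in this sketch
    println!("ok");
}
```

Trait objects (`Box<dyn InferenceBackend>`) keep the surfaces decoupled from any particular provider, which is what lets one gateway front Ollama, OpenAI, Anthropic, and llama.cpp behind the same API.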
```
hermon/
  crates/
    hermon-types/                  # Foundation types (messages, events, IDs)
    hermon-config/                 # Configuration loading
    hermon-inference/              # InferenceBackend trait + types
    hermon-backend-ollama/         # Ollama HTTP adapter
    hermon-backend-openai/         # OpenAI API adapter
    hermon-backend-anthropic/      # Anthropic Messages API adapter
    hermon-backend-llamacpp-http/  # llama.cpp server adapter
    hermon-router/                 # Capability-aware backend routing
    hermon-identity/               # Auth: JWT, API keys, passwords
    hermon-authz/                  # Authorization & access control
    hermon-entitlements/           # Token economics, rate limits, quotas
    hermon-audit/                  # Append-only audit log with hash chain
    hermon-session/                # Session lifecycle management
    hermon-api-native/             # Hermon native REST API
    hermon-api-openai/             # OpenAI-compatible API projection
    hermon-api-ollama/             # Ollama-compatible API projection
    hermon-core/                   # Facade re-export crate
    hermon-agents/                 # Agent runtime (planned)
    hermon-tools/                  # Tool execution framework (planned)
    hermon-tasks/                  # Task management (planned)
    hermon-context/                # Context/memory management (planned)
    hermon-governance/             # Policy & permissions (planned)
    hermon-checkpoint/             # Workspace snapshots (planned)
    hermon-extension/              # Skills, hooks, plugins (planned)
    hermon-storage/                # Conversation persistence (planned)
    hermon-surface/                # API surface definitions (planned)
  services/
    hermond/                       # Main server daemon (port 11488)
    hermon-cli/                    # CLI tool
  sdks/
    hermon-sdk-rs/                 # Rust client SDK
    hermon-sdk-ffi/                # C FFI bindings
    hermon-sdk-ts/                 # TypeScript SDK codegen
```
| Environment Variable | Default | Description |
|---|---|---|
| `HERMON_HOST` | `0.0.0.0` | Bind address |
| `HERMON_PORT` | `11488` | Bind port |
| `OLLAMA_URL` | `http://localhost:11434` | Ollama server URL |
| `OPENAI_API_KEY` | (none) | OpenAI API key (enables OpenAI backend) |
| `OPENAI_BASE_URL` | `https://api.openai.com` | OpenAI base URL |
| `ANTHROPIC_API_KEY` | (none) | Anthropic API key (enables Anthropic backend) |
| `ANTHROPIC_BASE_URL` | `https://api.anthropic.com` | Anthropic base URL |
| `LLAMACPP_URL` | (none) | llama.cpp server URL (enables llama.cpp backend) |
| `HERMON_ADMIN_USER` | `admin` | Bootstrap admin username |
| `HERMON_ADMIN_PASSWORD` | `hermon-admin` | Bootstrap admin password (change in production) |
| `HERMON_SESSION_TTL` | `3600` | Session time-to-live in seconds |
| `HERMON_MAX_SESSIONS` | `10` | Max concurrent sessions per user |
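Resolving these variables amounts to reading the environment with the documented fallbacks. A minimal sketch, assuming a hypothetical `Config` struct (Hermon's actual `hermon-config` crate may structure this differently):

```rust
use std::env;

// Illustrative loader for a few of the variables above; the `Config`
// struct and `env_or` helper are a sketch, not Hermon's real types.
#[derive(Debug)]
struct Config {
    host: String,
    port: u16,
    session_ttl_secs: u64,
}

// Read an environment variable, falling back to the documented default.
fn env_or(key: &str, default: &str) -> String {
    env::var(key).unwrap_or_else(|_| default.to_string())
}

fn load_config() -> Config {
    Config {
        host: env_or("HERMON_HOST", "0.0.0.0"),
        port: env_or("HERMON_PORT", "11488").parse().unwrap_or(11488),
        session_ttl_secs: env_or("HERMON_SESSION_TTL", "3600").parse().unwrap_or(3600),
    }
}

fn main() {
    let cfg = load_config();
    println!("would bind {}:{} (session TTL {}s)", cfg.host, cfg.port, cfg.session_ttl_secs);
}
```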
Hermon implements tiered access control modeled after Claude and ChatGPT:
| Tier | Requests/min | Requests/day | Tokens/min | Tokens/day |
|---|---|---|---|---|
| Free | 20 | 2,000 | 40,000 | 500,000 |
| Standard | 60 | 10,000 | 200,000 | 5,000,000 |
| Admin | 200 | 100,000 | 1,000,000 | 50,000,000 |
Rate limiting uses sliding window counters. Usage is tracked per-user with full audit trails.
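A sliding-window counter can be pictured as a deque of request timestamps: a request is allowed only while fewer than the tier's limit fall inside the window. The code below is an illustrative toy (integer-second timestamps, one user), parameterized here with the Free tier's 20 requests/min; a production limiter would use monotonic clocks and per-user state:

```rust
use std::collections::VecDeque;

// Toy sliding-window rate limiter: keeps timestamps (in seconds) of
// recent requests and admits a new one only if fewer than `limit`
// remain inside the window.
struct SlidingWindow {
    window_secs: u64,
    limit: usize,
    hits: VecDeque<u64>,
}

impl SlidingWindow {
    fn new(window_secs: u64, limit: usize) -> Self {
        Self { window_secs, limit, hits: VecDeque::new() }
    }

    fn allow(&mut self, now: u64) -> bool {
        // Evict timestamps that have slid out of the window.
        while let Some(&t) = self.hits.front() {
            if now - t >= self.window_secs {
                self.hits.pop_front();
            } else {
                break;
            }
        }
        if self.hits.len() < self.limit {
            self.hits.push_back(now);
            true
        } else {
            false
        }
    }
}

fn main() {
    // Free tier: 20 requests per 60-second window.
    let mut rl = SlidingWindow::new(60, 20);
    for s in 0..20 {
        assert!(rl.allow(s)); // the first 20 requests pass
    }
    assert!(!rl.allow(20)); // the 21st inside the window is rejected
    assert!(rl.allow(61));  // the earliest hit (t=0) has expired by t=61
    println!("ok");
}
```

Unlike a fixed-window counter, this never admits a burst of 2x the limit across a window boundary, at the cost of storing one timestamp per recent request.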
- Passwords hashed with Argon2id (passwords shorter than 8 characters are rejected)
- JWT tokens signed with HS512 (HMAC-SHA-512)
- API keys use HMAC-SHA256 hashing with prefix-based lookup
- Audit log uses SHA-256 hash chain for tamper detection
- CORS middleware enabled
- All auth endpoints require valid tokens or API keys
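The hash-chain idea behind the audit log fits in a few lines: each entry's hash covers both its payload and the previous entry's hash, so altering any earlier event invalidates everything after it. The sketch below substitutes std's non-cryptographic `DefaultHasher` for SHA-256 purely to stay dependency-free, and the event strings are illustrative:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Link one entry into the chain: hash(prev_hash, payload).
// `DefaultHasher` stands in for SHA-256 here to avoid external crates;
// it is NOT cryptographically secure and only illustrates the chaining.
fn link(prev: u64, payload: &str) -> u64 {
    let mut h = DefaultHasher::new();
    prev.hash(&mut h);
    payload.hash(&mut h);
    h.finish()
}

// Build the full chain from a genesis value of 0.
fn build_chain(events: &[&str]) -> Vec<u64> {
    let mut hashes = Vec::new();
    let mut prev = 0u64;
    for e in events {
        prev = link(prev, e);
        hashes.push(prev);
    }
    hashes
}

// Verification recomputes the chain and compares it to the stored hashes.
fn verify(events: &[&str], hashes: &[u64]) -> bool {
    build_chain(events) == hashes
}

fn main() {
    let events = ["login:admin", "chat:request", "chat:response"];
    let chain = build_chain(&events);
    assert!(verify(&events, &chain));

    // Tampering with an earlier event breaks every downstream hash.
    let tampered = ["login:mallory", "chat:request", "chat:response"];
    assert!(!verify(&tampered, &chain));
    println!("chain ok");
}
```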
```bash
# Run all tests
cargo test

# Build debug
cargo build

# Run with tracing
RUST_LOG=debug cargo run -p hermond

# Check for warnings
cargo clippy --workspace
```

Apache-2.0