Hermon is an AI Agent OS and inference gateway written in Rust. It routes requests to multiple LLM backends (Ollama, OpenAI, Anthropic, llama.cpp) through a unified API, with built-in authentication, token economics, session management, and audit logging.
- Multi-backend routing -- Ollama, OpenAI, Anthropic, and llama.cpp HTTP server adapters
- Three API surfaces -- Hermon native, OpenAI-compatible (`/v1/`), and Ollama-compatible (`/ollama/api/`)
- Token economics -- Tiered entitlements (Free/Standard/Admin/Custom), sliding-window rate limiting, quota enforcement, usage tracking
- Identity & auth -- JWT (HS512, 15-min access + 30-day refresh tokens), API keys (`hrmn_sk_{env}_{key}`), Argon2id password hashing
- Session management -- Conversation-attached sessions with auto-expiry and background cleanup
- Audit logging -- Append-only event log with SHA-256 hash chain integrity verification
- Streaming -- SSE streaming on all API surfaces (OpenAI `data:` format, Ollama NDJSON)
- Rust 2024 edition -- Workspace of 31 crates, async throughout via Tokio
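The `hrmn_sk_{env}_{key}` key format above lends itself to prefix-based lookup: everything before the final segment is non-secret and can index the hashed keys. As a rough illustration only (the `ApiKey` struct and `parse_api_key` helper here are hypothetical, not Hermon's actual types):

```rust
// Illustrative parser for the `hrmn_sk_{env}_{key}` API-key format.
// The struct and helper below are a sketch, not Hermon's real types.
#[derive(Debug, PartialEq)]
struct ApiKey {
    env: String,    // e.g. "live" or "test" -- non-secret, usable for lookup
    secret: String, // the random key material (only its hash would be stored)
}

fn parse_api_key(raw: &str) -> Option<ApiKey> {
    // Expect exactly: "hrmn_sk_" + env + "_" + secret
    let rest = raw.strip_prefix("hrmn_sk_")?;
    let (env, secret) = rest.split_once('_')?;
    if env.is_empty() || secret.is_empty() {
        return None;
    }
    Some(ApiKey { env: env.to_string(), secret: secret.to_string() })
}

fn main() {
    let key = parse_api_key("hrmn_sk_live_abc123").expect("valid key");
    assert_eq!(key.env, "live");
    assert_eq!(key.secret, "abc123");
    assert!(parse_api_key("sk-not-hermon").is_none());
    println!("ok");
}
```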
- Rust 1.85+ (2024 edition)
- Ollama running locally (default `http://localhost:11434`)
```bash
# Build
cargo build --release

# Run with Ollama backend (auto-detected)
./target/release/hermond

# Or with explicit config
OLLAMA_URL=http://localhost:11434 \
HERMON_ADMIN_PASSWORD=your-secure-password \
./target/release/hermond --port 11488
```

```bash
# OpenAI
OPENAI_API_KEY=sk-... ./target/release/hermond

# Anthropic
ANTHROPIC_API_KEY=sk-ant-... ./target/release/hermond

# llama.cpp server
LLAMACPP_URL=http://localhost:8080 ./target/release/hermond

# All backends simultaneously
OLLAMA_URL=http://localhost:11434 \
OPENAI_API_KEY=sk-... \
ANTHROPIC_API_KEY=sk-ant-... \
LLAMACPP_URL=http://localhost:8080 \
./target/release/hermond
```

```bash
# Login (returns JWT access + refresh tokens)
curl -X POST http://localhost:11488/api/auth/login \
  -H 'Content-Type: application/json' \
  -d '{"username":"admin","password":"your-password"}'

# Use the access_token in subsequent requests
TOKEN="eyJ..."
```

Any OpenAI SDK or client works out of the box:
```bash
# Chat completion
curl http://localhost:11488/v1/chat/completions \
  -H "Authorization: Bearer $TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "llama3.2:3b",
    "messages": [{"role":"user","content":"Hello!"}],
    "stream": false
  }'

# List models
curl http://localhost:11488/v1/models \
  -H "Authorization: Bearer $TOKEN"
```

```bash
# Chat
curl http://localhost:11488/ollama/api/chat \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "llama3.2:3b",
    "messages": [{"role":"user","content":"Hello!"}],
    "stream": false
  }'

# List models
curl http://localhost:11488/ollama/api/tags
```

```bash
# Chat
curl http://localhost:11488/api/chat \
  -H "Authorization: Bearer $TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{
    "model_id": "llama3.2:3b",
    "messages": [{"role":"user","parts":[{"Text":{"text":"Hello!"}}]}]
  }'

# Session management
curl -X POST http://localhost:11488/api/sessions \
  -H "Authorization: Bearer $TOKEN"

# Usage & quotas
curl http://localhost:11488/api/usage \
  -H "Authorization: Bearer $TOKEN"

# Health check
curl http://localhost:11488/api/health
```

```
              hermond (server binary)
                        |
        +---------------+---------------+
        |               |               |
     /api/*           /v1/*       /ollama/api/*
  (Native API)   (OpenAI-compat)  (Ollama-compat)
        |               |               |
        +---------------+---------------+
                        |
        +---------------+---------------+
        |                               |
  Identity/Auth                   Entitlements
 (JWT, API keys)            (rate limits, quotas)
        |                               |
        +---------------+---------------+
                        |
             InferenceBackend trait
                        |
     +---------+--------+--------+---------+
     |         |                 |         |
   Ollama    OpenAI         Anthropic  llama.cpp
```
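The routing layer in the diagram dispatches every API surface through a single backend abstraction. A deliberately simplified, synchronous sketch of that shape (Hermon's real `InferenceBackend` trait is async via Tokio and richer; the `Router`, `MockOllama`, and method names here are illustrative):

```rust
// Simplified, synchronous sketch of the backend abstraction the router
// dispatches through. Everything except the trait's role is illustrative.
trait InferenceBackend {
    fn name(&self) -> &str;
    fn complete(&self, model: &str, prompt: &str) -> Result<String, String>;
}

// A stand-in backend that just echoes its input.
struct MockOllama;

impl InferenceBackend for MockOllama {
    fn name(&self) -> &str { "ollama" }
    fn complete(&self, model: &str, prompt: &str) -> Result<String, String> {
        Ok(format!("[{model}] echo: {prompt}"))
    }
}

// The router holds trait objects and selects one; the real router is
// capability-aware, while this sketch matches on backend name only.
struct Router {
    backends: Vec<Box<dyn InferenceBackend>>,
}

impl Router {
    fn route(&self, backend: &str) -> Option<&dyn InferenceBackend> {
        self.backends.iter().find(|b| b.name() == backend).map(|b| b.as_ref())
    }
}

fn main() {
    let router = Router { backends: vec![Box::new(MockOllama)] };
    let backend = router.route("ollama").expect("backend registered");
    let out = backend.complete("llama3.2:3b", "Hello!").unwrap();
    assert!(out.contains("echo: Hello!"));
    assert!(router.route("openai").is_none()); // not registered in this sketch
    println!("ok");
}
```

Trait objects (`Box<dyn InferenceBackend>`) keep the surfaces decoupled from any particular provider, which is what lets one gateway front Ollama, OpenAI, Anthropic, and llama.cpp behind the same API.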
```
hermon/
  crates/
    hermon-types/                  # Foundation types (messages, events, IDs)
    hermon-config/                 # Configuration loading
    hermon-inference/              # InferenceBackend trait + types
    hermon-backend-ollama/         # Ollama HTTP adapter
    hermon-backend-openai/         # OpenAI API adapter
    hermon-backend-anthropic/      # Anthropic Messages API adapter
    hermon-backend-llamacpp-http/  # llama.cpp server adapter
    hermon-router/                 # Capability-aware backend routing
    hermon-identity/               # Auth: JWT, API keys, passwords
    hermon-authz/                  # Authorization & access control
    hermon-entitlements/           # Token economics, rate limits, quotas
    hermon-audit/                  # Append-only audit log with hash chain
    hermon-session/                # Session lifecycle management
    hermon-api-native/             # Hermon native REST API
    hermon-api-openai/             # OpenAI-compatible API projection
    hermon-api-ollama/             # Ollama-compatible API projection
    hermon-core/                   # Facade re-export crate
    hermon-agents/                 # Agent runtime (planned)
    hermon-tools/                  # Tool execution framework (planned)
    hermon-tasks/                  # Task management (planned)
    hermon-context/                # Context/memory management (planned)
    hermon-governance/             # Policy & permissions (planned)
    hermon-checkpoint/             # Workspace snapshots (planned)
    hermon-extension/              # Skills, hooks, plugins (planned)
    hermon-storage/                # Conversation persistence (planned)
    hermon-surface/                # API surface definitions (planned)
  services/
    hermond/                       # Main server daemon (port 11488)
    hermon-cli/                    # CLI tool
  sdks/
    hermon-sdk-rs/                 # Rust client SDK
    hermon-sdk-ffi/                # C FFI bindings
    hermon-sdk-ts/                 # TypeScript SDK codegen
```
| Environment Variable | Default | Description |
|---|---|---|
| `HERMON_HOST` | `0.0.0.0` | Bind address |
| `HERMON_PORT` | `11488` | Bind port |
| `OLLAMA_URL` | `http://localhost:11434` | Ollama server URL |
| `OPENAI_API_KEY` | (none) | OpenAI API key (enables OpenAI backend) |
| `OPENAI_BASE_URL` | `https://api.openai.com` | OpenAI base URL |
| `ANTHROPIC_API_KEY` | (none) | Anthropic API key (enables Anthropic backend) |
| `ANTHROPIC_BASE_URL` | `https://api.anthropic.com` | Anthropic base URL |
| `LLAMACPP_URL` | (none) | llama.cpp server URL (enables llama.cpp backend) |
| `HERMON_ADMIN_USER` | `admin` | Bootstrap admin username |
| `HERMON_ADMIN_PASSWORD` | `hermon-admin` | Bootstrap admin password (change in production) |
| `HERMON_SESSION_TTL` | `3600` | Session time-to-live in seconds |
| `HERMON_MAX_SESSIONS` | `10` | Max concurrent sessions per user |
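Resolving these variables amounts to reading the environment with the documented fallbacks. A minimal sketch, assuming a hypothetical `Config` struct (Hermon's actual `hermon-config` crate may structure this differently):

```rust
use std::env;

// Illustrative loader for a few of the variables above; the `Config`
// struct and `env_or` helper are a sketch, not Hermon's real types.
#[derive(Debug)]
struct Config {
    host: String,
    port: u16,
    session_ttl_secs: u64,
}

// Read an environment variable, falling back to the documented default.
fn env_or(key: &str, default: &str) -> String {
    env::var(key).unwrap_or_else(|_| default.to_string())
}

fn load_config() -> Config {
    Config {
        host: env_or("HERMON_HOST", "0.0.0.0"),
        port: env_or("HERMON_PORT", "11488").parse().unwrap_or(11488),
        session_ttl_secs: env_or("HERMON_SESSION_TTL", "3600").parse().unwrap_or(3600),
    }
}

fn main() {
    let cfg = load_config();
    println!("would bind {}:{} (session TTL {}s)", cfg.host, cfg.port, cfg.session_ttl_secs);
}
```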
Hermon implements tiered access control modeled after Claude and ChatGPT:
| Tier | Requests/min | Requests/day | Tokens/min | Tokens/day |
|---|---|---|---|---|
| Free | 20 | 2,000 | 40,000 | 500,000 |
| Standard | 60 | 10,000 | 200,000 | 5,000,000 |
| Admin | 200 | 100,000 | 1,000,000 | 50,000,000 |
Rate limiting uses sliding window counters. Usage is tracked per-user with full audit trails.
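A sliding-window counter can be pictured as a deque of request timestamps: a request is allowed only while fewer than the tier's limit fall inside the window. The code below is an illustrative toy (integer-second timestamps, one user), parameterized here with the Free tier's 20 requests/min; a production limiter would use monotonic clocks and per-user state:

```rust
use std::collections::VecDeque;

// Toy sliding-window rate limiter: keeps timestamps (in seconds) of
// recent requests and admits a new one only if fewer than `limit`
// remain inside the window.
struct SlidingWindow {
    window_secs: u64,
    limit: usize,
    hits: VecDeque<u64>,
}

impl SlidingWindow {
    fn new(window_secs: u64, limit: usize) -> Self {
        Self { window_secs, limit, hits: VecDeque::new() }
    }

    fn allow(&mut self, now: u64) -> bool {
        // Evict timestamps that have slid out of the window.
        while let Some(&t) = self.hits.front() {
            if now - t >= self.window_secs {
                self.hits.pop_front();
            } else {
                break;
            }
        }
        if self.hits.len() < self.limit {
            self.hits.push_back(now);
            true
        } else {
            false
        }
    }
}

fn main() {
    // Free tier: 20 requests per 60-second window.
    let mut rl = SlidingWindow::new(60, 20);
    for s in 0..20 {
        assert!(rl.allow(s)); // the first 20 requests pass
    }
    assert!(!rl.allow(20)); // the 21st inside the window is rejected
    assert!(rl.allow(61));  // the earliest hit (t=0) has expired by t=61
    println!("ok");
}
```

Unlike a fixed-window counter, this never admits a burst of 2x the limit across a window boundary, at the cost of storing one timestamp per recent request.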
- Passwords hashed with Argon2id (passwords shorter than 8 characters are rejected)
- JWT tokens signed with HS512 (HMAC-SHA-512)
- API keys use HMAC-SHA256 hashing with prefix-based lookup
- Audit log uses SHA-256 hash chain for tamper detection
- CORS middleware enabled
- All auth endpoints require valid tokens or API keys
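The hash-chain idea behind the audit log fits in a few lines: each entry's hash covers both its payload and the previous entry's hash, so altering any earlier event invalidates everything after it. The sketch below substitutes std's non-cryptographic `DefaultHasher` for SHA-256 purely to stay dependency-free, and the event strings are illustrative:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Link one entry into the chain: hash(prev_hash, payload).
// `DefaultHasher` stands in for SHA-256 here to avoid external crates;
// it is NOT cryptographically secure and only illustrates the chaining.
fn link(prev: u64, payload: &str) -> u64 {
    let mut h = DefaultHasher::new();
    prev.hash(&mut h);
    payload.hash(&mut h);
    h.finish()
}

// Build the full chain from a genesis value of 0.
fn build_chain(events: &[&str]) -> Vec<u64> {
    let mut hashes = Vec::new();
    let mut prev = 0u64;
    for e in events {
        prev = link(prev, e);
        hashes.push(prev);
    }
    hashes
}

// Verification recomputes the chain and compares it to the stored hashes.
fn verify(events: &[&str], hashes: &[u64]) -> bool {
    build_chain(events) == hashes
}

fn main() {
    let events = ["login:admin", "chat:request", "chat:response"];
    let chain = build_chain(&events);
    assert!(verify(&events, &chain));

    // Tampering with an earlier event breaks every downstream hash.
    let tampered = ["login:mallory", "chat:request", "chat:response"];
    assert!(!verify(&tampered, &chain));
    println!("chain ok");
}
```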
```bash
# Run all tests
cargo test

# Build debug
cargo build

# Run with tracing
RUST_LOG=debug cargo run -p hermond

# Check for warnings
cargo clippy --workspace
```

Apache-2.0