hermonai/hermon-v01

Hermon

Hermon is an AI Agent OS and inference gateway written in Rust. It routes requests to multiple LLM backends (Ollama, OpenAI, Anthropic, llama.cpp) through a unified API, with built-in authentication, token economics, session management, and audit logging.

Features

  • Multi-backend routing -- Ollama, OpenAI, Anthropic, and llama.cpp HTTP server adapters
  • Three API surfaces -- Hermon native, OpenAI-compatible (/v1/), and Ollama-compatible (/ollama/api/)
  • Token economics -- Tiered entitlements (Free/Standard/Admin/Custom), sliding-window rate limiting, quota enforcement, usage tracking
  • Identity & auth -- JWT (HS512, 15-min access + 30-day refresh tokens), API keys (hrmn_sk_{env}_{key}), Argon2id password hashing
  • Session management -- Conversation-attached sessions with auto-expiry and background cleanup
  • Audit logging -- Append-only event log with SHA-256 hash chain integrity verification
  • Streaming -- SSE streaming on all API surfaces (OpenAI data: format, Ollama NDJSON)
  • Rust 2024 edition -- Workspace of 31 crates, async throughout via Tokio
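The API-key scheme listed above (`hrmn_sk_{env}_{key}` format, HMAC-SHA256 hashing, prefix-based lookup) can be sketched as follows. This is an illustrative model, not Hermon's actual code; the secret and helper names are assumptions:

```python
import hashlib
import hmac
import secrets

# Server-side secret used to hash keys; illustrative value only
SERVER_SECRET = b"server-side-hmac-secret"

def mint_api_key(env: str) -> str:
    # Format from the feature list: hrmn_sk_{env}_{key}
    return f"hrmn_sk_{env}_{secrets.token_urlsafe(24)}"

def key_prefix(key: str) -> str:
    # Prefix-based lookup: "hrmn_sk_live_abc..." -> "hrmn_sk_live"
    scheme, sk, env, _secret = key.split("_", 3)
    return f"{scheme}_{sk}_{env}"

def key_digest(key: str) -> str:
    # Store only the HMAC-SHA256 digest, never the raw key
    return hmac.new(SERVER_SECRET, key.encode(), hashlib.sha256).hexdigest()

def verify_key(presented: str, stored_digest: str) -> bool:
    # Constant-time comparison against the stored digest
    return hmac.compare_digest(key_digest(presented), stored_digest)
```

The prefix lets the server index stored keys without ever persisting the raw secret.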

Quick Start

Prerequisites

  • Rust 1.85+ (2024 edition)
  • Ollama running locally (default http://localhost:11434)

Build and Run

# Build
cargo build --release

# Run with Ollama backend (auto-detected)
./target/release/hermond

# Or with explicit config
OLLAMA_URL=http://localhost:11434 \
HERMON_ADMIN_PASSWORD=your-secure-password \
./target/release/hermond --port 11488

Connect Additional Backends

# OpenAI
OPENAI_API_KEY=sk-... ./target/release/hermond

# Anthropic
ANTHROPIC_API_KEY=sk-ant-... ./target/release/hermond

# llama.cpp server
LLAMACPP_URL=http://localhost:8080 ./target/release/hermond

# All backends simultaneously
OLLAMA_URL=http://localhost:11434 \
OPENAI_API_KEY=sk-... \
ANTHROPIC_API_KEY=sk-ant-... \
LLAMACPP_URL=http://localhost:8080 \
./target/release/hermond

API Usage

Authentication

# Login (returns JWT access + refresh tokens)
curl -X POST http://localhost:11488/api/auth/login \
  -H 'Content-Type: application/json' \
  -d '{"username":"admin","password":"your-password"}'

# Use the access_token in subsequent requests
TOKEN="eyJ..."
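The token scheme described above (HS512 signatures, 15-minute access and 30-day refresh lifetimes) can be illustrated with a minimal stdlib-only sketch. The secret and claim names here are assumptions, not Hermon's implementation:

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"jwt-signing-secret"  # illustrative; real key management differs

def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_hs512(claims: dict, ttl_secs: int) -> str:
    header = {"alg": "HS512", "typ": "JWT"}
    now = int(time.time())
    payload = {**claims, "iat": now, "exp": now + ttl_secs}
    signing_input = (
        _b64url(json.dumps(header).encode())
        + "."
        + _b64url(json.dumps(payload).encode())
    )
    sig = hmac.new(SECRET, signing_input.encode(), hashlib.sha512).digest()
    return f"{signing_input}.{_b64url(sig)}"

def verify_hs512(token: str) -> dict:
    signing_input, _, sig = token.rpartition(".")
    expected = _b64url(hmac.new(SECRET, signing_input.encode(), hashlib.sha512).digest())
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature")
    payload_b64 = signing_input.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(padded))
    if payload["exp"] < time.time():
        raise ValueError("token expired")
    return payload

# 15-minute access token and 30-day refresh token, per the Features list
access = sign_hs512({"sub": "admin", "kind": "access"}, 15 * 60)
refresh = sign_hs512({"sub": "admin", "kind": "refresh"}, 30 * 24 * 3600)
```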

OpenAI-Compatible API

Any OpenAI SDK or client works out of the box:

# Chat completion
curl http://localhost:11488/v1/chat/completions \
  -H "Authorization: Bearer $TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "llama3.2:3b",
    "messages": [{"role":"user","content":"Hello!"}],
    "stream": false
  }'

# List models
curl http://localhost:11488/v1/models \
  -H "Authorization: Bearer $TOKEN"

Ollama-Compatible API

# Chat
curl http://localhost:11488/ollama/api/chat \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "llama3.2:3b",
    "messages": [{"role":"user","content":"Hello!"}],
    "stream": false
  }'

# List models
curl http://localhost:11488/ollama/api/tags

Native API

# Chat
curl http://localhost:11488/api/chat \
  -H "Authorization: Bearer $TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{
    "model_id": "llama3.2:3b",
    "messages": [{"role":"user","parts":[{"Text":{"text":"Hello!"}}]}]
  }'

# Session management
curl -X POST http://localhost:11488/api/sessions \
  -H "Authorization: Bearer $TOKEN"

# Usage & quotas
curl http://localhost:11488/api/usage \
  -H "Authorization: Bearer $TOKEN"

# Health check
curl http://localhost:11488/api/health

Architecture

                          hermond (server binary)
                               |
               +---------------+---------------+
               |               |               |
          /api/*          /v1/*         /ollama/api/*
       (Native API)   (OpenAI-compat)  (Ollama-compat)
               |               |               |
               +-------+-------+-------+-------+
                       |               |
                 Identity/Auth    Entitlements
                 (JWT, API keys)  (rate limits, quotas)
                       |               |
                       +-------+-------+
                               |
                    InferenceBackend trait
                               |
              +--------+-------+-------+--------+
              |        |               |        |
           Ollama   OpenAI        Anthropic  llama.cpp
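The routing layer in the diagram can be reduced to a toy model: backends advertise the models they serve, and the router picks the first match. This is a sketch of the idea, not the hermon-router crate:

```python
from dataclasses import dataclass

@dataclass
class Backend:
    """Toy stand-in for an InferenceBackend implementation."""
    name: str
    models: set

    def chat(self, model: str, prompt: str) -> str:
        return f"[{self.name}] reply to {prompt!r}"

class Router:
    """Pick the first registered backend that serves the requested model."""
    def __init__(self, backends):
        self.backends = backends

    def route(self, model: str):
        for backend in self.backends:
            if model in backend.models:
                return backend
        raise LookupError(f"no backend serves model {model!r}")

router = Router([
    Backend("ollama", {"llama3.2:3b"}),
    Backend("openai", {"gpt-4o-mini"}),
])
```

All three API surfaces converge on the same routing step, which is why a model served by any backend is reachable from any surface.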

Workspace Structure

hermon/
  crates/
    hermon-types/           # Foundation types (messages, events, IDs)
    hermon-config/          # Configuration loading
    hermon-inference/       # InferenceBackend trait + types
    hermon-backend-ollama/  # Ollama HTTP adapter
    hermon-backend-openai/  # OpenAI API adapter
    hermon-backend-anthropic/ # Anthropic Messages API adapter
    hermon-backend-llamacpp-http/ # llama.cpp server adapter
    hermon-router/          # Capability-aware backend routing
    hermon-identity/        # Auth: JWT, API keys, passwords
    hermon-authz/           # Authorization & access control
    hermon-entitlements/    # Token economics, rate limits, quotas
    hermon-audit/           # Append-only audit log with hash chain
    hermon-session/         # Session lifecycle management
    hermon-api-native/      # Hermon native REST API
    hermon-api-openai/      # OpenAI-compatible API projection
    hermon-api-ollama/      # Ollama-compatible API projection
    hermon-core/            # Facade re-export crate
    hermon-agents/          # Agent runtime (planned)
    hermon-tools/           # Tool execution framework (planned)
    hermon-tasks/           # Task management (planned)
    hermon-context/         # Context/memory management (planned)
    hermon-governance/      # Policy & permissions (planned)
    hermon-checkpoint/      # Workspace snapshots (planned)
    hermon-extension/       # Skills, hooks, plugins (planned)
    hermon-storage/         # Conversation persistence (planned)
    hermon-surface/         # API surface definitions (planned)
  services/
    hermond/                # Main server daemon (port 11488)
    hermon-cli/             # CLI tool
  sdks/
    hermon-sdk-rs/          # Rust client SDK
    hermon-sdk-ffi/         # C FFI bindings
    hermon-sdk-ts/          # TypeScript SDK codegen

Configuration

Environment Variable    Default                    Description
HERMON_HOST             0.0.0.0                    Bind address
HERMON_PORT             11488                      Bind port
OLLAMA_URL              http://localhost:11434     Ollama server URL
OPENAI_API_KEY          (none)                     OpenAI API key (enables OpenAI backend)
OPENAI_BASE_URL         https://api.openai.com     OpenAI base URL
ANTHROPIC_API_KEY       (none)                     Anthropic API key (enables Anthropic backend)
ANTHROPIC_BASE_URL      https://api.anthropic.com  Anthropic base URL
LLAMACPP_URL            (none)                     llama.cpp server URL (enables llama.cpp backend)
HERMON_ADMIN_USER       admin                      Bootstrap admin username
HERMON_ADMIN_PASSWORD   hermon-admin               Bootstrap admin password (change in production)
HERMON_SESSION_TTL      3600                       Session time-to-live in seconds
HERMON_MAX_SESSIONS     10                         Max concurrent sessions per user
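The resolution rule implied by the table (environment variable wins, otherwise the documented default; setting a key or URL is what enables the optional backends) can be sketched as follows. The helper names are illustrative, not Hermon's config API:

```python
import os

# Defaults copied from the table above (subset)
DEFAULTS = {
    "HERMON_HOST": "0.0.0.0",
    "HERMON_PORT": "11488",
    "OLLAMA_URL": "http://localhost:11434",
    "HERMON_SESSION_TTL": "3600",
    "HERMON_MAX_SESSIONS": "10",
}

def setting(name: str) -> str:
    # Environment variable wins; otherwise fall back to the documented default
    return os.environ.get(name, DEFAULTS[name])

def backend_enabled(var: str) -> bool:
    # OPENAI_API_KEY, ANTHROPIC_API_KEY, and LLAMACPP_URL default to (none):
    # setting the variable is what enables that backend
    return bool(os.environ.get(var))
```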

Token Economics

Hermon implements tiered access control, modeled on the usage tiers of Claude and ChatGPT:

Tier      Requests/min   Requests/day   Tokens/min   Tokens/day
Free      20             2,000          40,000       500,000
Standard  60             10,000         200,000      5,000,000
Admin     200            100,000        1,000,000    50,000,000

Rate limiting uses sliding window counters. Usage is tracked per-user with full audit trails.
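The sliding-window scheme can be sketched as follows. This is an illustrative model, not the hermon-entitlements implementation:

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Allow at most `limit` events in any trailing `window_secs` seconds."""

    def __init__(self, limit: int, window_secs: float):
        self.limit = limit
        self.window_secs = window_secs
        self.events = deque()  # timestamps of accepted events

    def allow(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        # Drop events that fell out of the trailing window
        while self.events and self.events[0] <= now - self.window_secs:
            self.events.popleft()
        if len(self.events) >= self.limit:
            return False
        self.events.append(now)
        return True

# Free tier: 20 requests/min
limiter = SlidingWindowLimiter(limit=20, window_secs=60.0)
```

Unlike fixed windows, the trailing window prevents a burst straddling a window boundary from doubling the effective limit.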

Security

  • Passwords hashed with Argon2id; passwords shorter than 8 characters are rejected
  • JWT tokens signed with HS512 (HMAC-SHA-512)
  • API keys stored as HMAC-SHA256 hashes with prefix-based lookup
  • Audit log uses a SHA-256 hash chain for tamper detection
  • CORS middleware enabled
  • All authenticated endpoints require a valid JWT or API key
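The hash-chain idea behind the audit log can be illustrated with a short sketch (not the hermon-audit crate itself): each entry's hash covers the previous entry's hash plus the event body, so editing any past event breaks every hash after it.

```python
import hashlib
import json

GENESIS = "0" * 64  # sentinel "previous hash" for the first entry

def append(log: list, event: dict) -> None:
    prev = log[-1]["hash"] if log else GENESIS
    body = json.dumps(event, sort_keys=True)
    digest = hashlib.sha256((prev + body).encode()).hexdigest()
    log.append({"event": event, "prev": prev, "hash": digest})

def verify(log: list) -> bool:
    # Recompute the chain from the genesis value; any tampering breaks a link
    prev = GENESIS
    for entry in log:
        body = json.dumps(entry["event"], sort_keys=True)
        if entry["prev"] != prev:
            return False
        if entry["hash"] != hashlib.sha256((prev + body).encode()).hexdigest():
            return False
        prev = entry["hash"]
    return True
```

Because the log is append-only, verification is a single linear pass and needs no external trust anchor beyond the genesis value.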

Development

# Run all tests
cargo test

# Build debug
cargo build

# Run with tracing
RUST_LOG=debug cargo run -p hermond

# Check for warnings
cargo clippy --workspace

License

Apache-2.0
