Dexter RL

An RL environment for training AI music coaches. Gemini watches you play guitar, analyzing your hands via camera and your audio via microphone, then coaches you in real time. Each coaching decision, and the student's resulting improvement, is captured as an RL trajectory for training better coaches.

Built on Prime Intellect verifiers, drop-in compatible with prime-rl.

Built at YC x DeepMind Hackathon 2026.

The Problem

Teaching music is deeply interactive — a great coach adapts to what they see and hear in real time. But there's no RL environment that captures the coaching loop: observe the student, give a tip, see if they improve, repeat. Dexter RL builds that loop end-to-end.

Architecture

┌─────────────────────────────────────────────────────────────┐
│                     Expo App (Student)                       │
│                                                             │
│   📷 Camera ──┐    🎸 Tab Notation    ┌── 💬 AI Coach Feed │
│   🎤 Mic ─────┤                       │                    │
│               ▼                       │                    │
│        ┌──────────────┐     ┌─────────┴──────┐             │
│        │ Audio        │     │ Coaching Feed   │             │
│        │ Analysis     │     │ (text tips)     │             │
│        └──────┬───────┘     └────────▲───────┘             │
│               │                      │                      │
└───────────────┼──────────────────────┼──────────────────────┘
                │                      │
                ▼                      │
        ┌───────────────┐    ┌─────────┴──────────┐
        │ Gemini 2.5    │    │                     │
        │ Multimodal    │───▶│  Coaching Response  │
        │ (image+text)  │    │  (2-3 sentences)    │
        └───────────────┘    └─────────┬───────────┘
                                       │
                              ┌────────▼────────┐
                              │   RL Trajectory  │
                              │   Server         │
                              │                  │
                              │ ┌──────────────┐ │
                              │ │ Composite    │ │
                              │ │ Score (0-10) │ │
                              │ ├──────────────┤ │
                              │ │ Reward =     │ │
                              │ │ Δ score      │ │
                              │ ├──────────────┤ │
                              │ │ Trajectory   │ │
                              │ │ JSONL        │ │
                              │ └──────────────┘ │
                              └──────────────────┘

Three-Layer Design

1. Expo App — The student-facing interface. Searches for any song via Songsterr, displays tab notation bar-by-bar, captures camera frames of the student's hands, and runs real-time audio analysis (pitch detection, timing, amplitude). Camera and microphone work simultaneously.

2. Gemini Multimodal Coaching — Every ~5 seconds, the app captures a JPEG frame from the camera and sends it alongside audio metrics and bar context (song, section, expected chords) to Gemini 2.5 Flash. Gemini sees the student's hands and hears their playing data, returning specific coaching tips ("arch your ring finger more on D to clear the high E string").

3. RL Trajectory Server — Each coaching interaction is logged as a trajectory step. The server computes a composite score from three weighted metrics (pitch accuracy 40%, timing 35%, finger position 25%) and calculates the reward as the score improvement from the previous step. Trajectories are saved as JSONL for training.
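The scoring described in the third layer can be sketched in a few lines of Python. This is a minimal illustration, not the server's actual code: the function names are hypothetical, the metrics are assumed to lie in the range 0 to 1, and the division by 10 in the reward is an assumption inferred from the sample trajectory (where a first-step score of 6.83 yields a reward of 0.683).

```python
# Weights taken from the README: pitch accuracy 40%, timing 35%, finger position 25%.
WEIGHTS = {"pitchAccuracy": 0.40, "timing": 0.35, "fingerPosition": 0.25}


def composite_score(metrics: dict) -> float:
    """Weighted sum of the three metrics (each assumed in [0, 1]), scaled to 0-10."""
    return 10.0 * sum(WEIGHTS[name] * metrics[name] for name in WEIGHTS)


def step_reward(score: float, previous_score: float = 0.0) -> float:
    """Score improvement over the previous step.

    Assumption: the delta is divided by 10 so it lands in [-1, 1], and the
    first step is compared against a baseline of 0.
    """
    return (score - previous_score) / 10.0


if __name__ == "__main__":
    sample = {"pitchAccuracy": 0.71, "timing": 0.67, "fingerPosition": 0.66}
    score = composite_score(sample)
    print(f"score={score:.3f} reward={step_reward(score):.4f}")
```

With the metrics from the sample trajectory, the weighted sum works out to roughly 6.83, which matches the `composite_score` field shown in the data format section.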

Live Trajectory Capture

When a student practices, the terminal shows RL trajectories building in real time:

============================================================
  DEXTER RL — Live Trajectory Capture
  Song: Smoke on the Water by Deep Purple
============================================================

  RL | Step   1 [###########################.............] 6.8/10 (reward: +0.68)
                 tip: "Smooth out the transition to the Bb5 chord"
  RL | Step   2 [#################################.......] 8.3/10 (reward: +0.14)
                 tip: "Keep the palm muting tight with your picking hand"
  RL | Step   3 [################################........] 8.1/10 (reward: -0.01)
                 tip: "Focus on the stretch for the Db5 power chord"
  RL | Step   4 [################################........] 8.1/10 (reward: +0.03)
                 tip: "Ensure your index finger creates a clean bar on fret 1"

============================================================
  SESSION COMPLETE — RL Trajectory Summary
  Steps:  4
  Score:  6.8 → 8.1
  Reward: 0.84
  Saved to: trajectories/latest.jsonl
============================================================
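The per-step lines in the output above follow a regular layout, which the sketch below reproduces. It is an assumption-based reconstruction from the sample output, not the project's logging code: `format_step` is a hypothetical name, and the 40-character bar width and the truncating (rather than rounding) fill are inferred from the four sample lines.

```python
BAR_WIDTH = 40  # assumption: the sample bars are 40 characters wide


def format_step(step: int, score: float, reward: float) -> str:
    """Render one trajectory step as a progress-bar line for the terminal."""
    # Fill proportionally to score/10, truncating toward zero and clamping to the bar.
    filled = max(0, min(BAR_WIDTH, int(score * BAR_WIDTH / 10)))
    bar = "#" * filled + "." * (BAR_WIDTH - filled)
    return f"  RL | Step {step:>3} [{bar}] {score:.1f}/10 (reward: {reward:+.2f})"


if __name__ == "__main__":
    for step, score, reward in [(1, 6.8, 0.68), (2, 8.3, 0.14), (3, 8.1, -0.01)]:
        print(format_step(step, score, reward))
```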

Trajectory Data Format

Each session produces a trajectory in JSONL (pretty-printed here for readability, with the steps array abbreviated to its first entry):

{
  "song": "Smoke on the Water by Deep Purple",
  "steps": [
    {
      "step": 1,
      "timestamp": 6.6,
      "bar_context": "Bar 1-4: G5 - Bb5 - C5 | Main Riff",
      "coaching_tip": "Smooth out the transition to the Bb5 chord and relax your fretting hand",
      "metrics": {
        "pitchAccuracy": 0.71,
        "timing": 0.67,
        "fingerPosition": 0.66
      },
      "composite_score": 6.83,
      "reward": 0.683
    }
  ],
  "total_reward": 0.806,
  "final_score": 8.07,
  "num_steps": 6,
  "duration_s": 112.5
}
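A short sketch of reading these records back for inspection or training. It assumes the file stores one JSON object per line, as the JSONL format implies, and uses only the field names shown in the sample above; the helper names `load_trajectories` and `summarize` are hypothetical.

```python
import json
from typing import Iterable, List


def load_trajectories(lines: Iterable[str]) -> List[dict]:
    """Parse JSONL: one trajectory object per non-empty line.

    Accepts any iterable of strings, including an open file handle.
    """
    return [json.loads(line) for line in lines if line.strip()]


def summarize(trajectory: dict) -> dict:
    """Collect the headline numbers of a trajectory for a quick sanity check."""
    steps = trajectory.get("steps", [])
    return {
        "song": trajectory.get("song"),
        "logged_steps": len(steps),
        "sum_of_step_rewards": sum(step["reward"] for step in steps),
        "total_reward": trajectory.get("total_reward"),
        "final_score": trajectory.get("final_score"),
    }


if __name__ == "__main__":
    # Path taken from the "Saved to" line of the live capture output.
    with open("trajectories/latest.jsonl", encoding="utf-8") as handle:
        for trajectory in load_trajectories(handle):
            print(summarize(trajectory))
```

Comparing `sum_of_step_rewards` against `total_reward` is a cheap way to spot truncated or partially written sessions.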

As a Verifiers Environment

Dexter RL implements verifiers.MultiTurnEnv for direct use with prime-rl:

from dexter_rl import load_environment

env = load_environment(max_turns=10)

Or in a prime-rl config:

[[env]]
id = "dexter/music-coach"
args = { max_turns = 10 }

Episode structure:

| Component | Description |
| --- | --- |
| State | Guitar tabs + score history from Gemini evaluations |
| Action | A single coaching tip (LLM text output) |
| Observation | Gemini's multimodal evaluation of the student's attempt |
| Reward | Score improvement normalized to [-1, 1] |
| Termination | Score >= 8/10 or max turns reached |

The environment scrapes tabs from Ultimate Guitar via Browserbase, sends coaching tips to the student through a WebSocket bridge, and uses Gemini's multimodal API to evaluate each attempt (audio → pitch accuracy, timing, rhythm → score 1-10).
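The episode structure in the table above can be sketched as a plain-Python loop. This is not the `MusicCoachEnv` implementation and does not use the verifiers API: `coach` and `evaluate` are hypothetical stand-ins for the policy LLM and the Gemini evaluator, and the division by 10, the clipping, and the starting baseline of 0 are assumptions about how the reward is normalized to [-1, 1].

```python
from typing import Callable, Dict, List

SCORE_TARGET = 8.0  # termination threshold from the table (score >= 8/10)


def clip(value: float, low: float = -1.0, high: float = 1.0) -> float:
    """Clamp a value to the closed interval [low, high]."""
    return max(low, min(high, value))


def run_episode(
    coach: Callable[[List[Dict]], str],   # hypothetical: history -> coaching tip
    evaluate: Callable[[str], float],     # hypothetical: tip -> score on a 0-10 scale
    max_turns: int = 10,
    initial_score: float = 0.0,           # assumption: baseline before the first attempt
) -> List[Dict]:
    """Run one coaching episode: tip, evaluation, reward, repeat."""
    history: List[Dict] = []
    previous = initial_score
    for turn in range(1, max_turns + 1):
        tip = coach(history)                      # action
        score = evaluate(tip)                     # observation
        reward = clip((score - previous) / 10.0)  # improvement, normalized
        history.append({"turn": turn, "tip": tip, "score": score, "reward": reward})
        previous = score
        if score >= SCORE_TARGET:                 # early termination
            break
    return history


if __name__ == "__main__":
    scripted_scores = iter([6.8, 8.3, 9.0])
    steps = run_episode(lambda h: f"tip {len(h) + 1}", lambda tip: next(scripted_scores))
    print(len(steps), [round(s["reward"], 2) for s in steps])
```

In the scripted run the episode ends on the second turn, because the score reaches the threshold before `max_turns` is exhausted.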

Module Reference

| Module | Purpose |
| --- | --- |
| `dexter_rl/env.py` | `MusicCoachEnv`: verifiers multi-turn environment |
| `dexter_rl/evaluator.py` | Gemini multimodal scoring (audio → 1-10 score) |
| `dexter_rl/tabs.py` | Browserbase + Playwright tab scraping |
| `dexter_rl/bridge.py` | WebSocket bridge to Expo app |
| `dexter_rl/rewards.py` | Improvement reward (Δ score) + absolute score |
| `dexter_agent/token_server.py` | Dev server with RL trajectory endpoints |
| `dexter_agent/agent.py` | LiveKit agent for real-time Gemini coaching |
| `dexter_app/` | Expo React Native app (student interface) |

Quick Start

# Install RL environment (run from the repository root)
uv sync

# Install agent dependencies (the subshell keeps you in the repository root)
(cd dexter_agent && uv venv && uv pip install -r requirements.txt)

# Install app dependencies
(cd dexter_app && npm install)

# Set environment variables
export GOOGLE_API_KEY="your-key"
export BROWSERBASE_API_KEY="your-key"
export BROWSERBASE_PROJECT_ID="your-project-id"

# Start the trajectory server
(cd dexter_agent && .venv/bin/python token_server.py)

# Start the app (separate terminal, from the repository root)
(cd dexter_app && npx expo start --web)

# Run the full RL environment (from the repository root)
uv run python scripts/demo.py

Tech Stack

RL Framework: Prime Intellect verifiers — multi-turn environment with rubric-based rewards
AI Coaching: Google Gemini 2.5 Flash — multimodal (camera frames + audio data + text context)
Student App: Expo / React Native — real-time audio analysis, camera capture, tab notation
Tab Scraping: Browserbase + Playwright — headless browser scraping from Ultimate Guitar
Audio Analysis: Web Audio API — pitch detection (autocorrelation), amplitude, spectral clarity
Data Format: JSONL trajectories compatible with HuggingFace datasets
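
The autocorrelation pitch detection named in the audio analysis entry can be illustrated with a short sketch. The app itself runs on the Web Audio API, so this Python version only demonstrates the idea and is not the project's code; the function names, the 70 Hz lower bound, and the 8 kHz test signal are all assumptions chosen for the example.

```python
import math
from typing import List, Optional


def autocorrelation(samples: List[float], lag: int) -> float:
    """Unnormalized autocorrelation of the signal with itself shifted by `lag`."""
    return sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))


def detect_pitch(samples: List[float], sample_rate: int, fmin: float = 70.0) -> Optional[float]:
    """Estimate the fundamental frequency in Hz, or None if no pitch is found."""
    max_lag = min(int(sample_rate / fmin), len(samples) - 1)
    corr = [autocorrelation(samples, lag) for lag in range(max_lag + 1)]
    if corr[0] <= 0:
        return None  # silent input has no energy at lag 0
    # Walk down the slope of the trivial peak at lag 0 until the curve rises again.
    lag = 1
    while lag < max_lag and corr[lag] < corr[lag - 1]:
        lag += 1
    if lag >= max_lag:
        return None
    # The strongest remaining peak corresponds to one period of the signal.
    best = max(range(lag, max_lag + 1), key=lambda k: corr[k])
    if corr[best] <= 0:
        return None
    return sample_rate / best


if __name__ == "__main__":
    rate = 8000
    # Synthetic 110 Hz tone (the open A string) instead of real microphone input.
    tone = [math.sin(2 * math.pi * 110 * i / rate) for i in range(2048)]
    print(detect_pitch(tone, rate))
```

Because the lag is an integer number of samples, the estimate is quantized; real implementations usually interpolate around the peak to refine it.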
