Demo 3

AI that improves itself

Live editing of an agent's config by the AI itself

Live demo

Workshop stand (opens in the browser, no local setup needed):

Run locally

Setup

cd ~/demo-config-tuning
pip install -r requirements.txt

Fill in .env.example with your OpenAI and Anthropic API keys, then copy it to .env and start the app:

cp .env.example .env
python run.py

Open in browser

After running python run.py, copy the network address printed in the terminal:

local: http://localhost:8000
network: http://192.168.1.42:8000

Open the network URL in your browser (your IP will differ):

http://192.168.1.42:8000
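The guide does not say how run.py works out the network address. One common approach, shown here only as a sketch under that assumption, is to "connect" a UDP socket (which sends no packets) and read back the local address the OS picked. The names `lan_ip` and `startup_banner` are hypothetical; the port 8000 matches the example output above.

```python
import socket


def lan_ip() -> str:
    """Best-effort guess of this machine's LAN IPv4 address (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket sends nothing; it only makes the OS choose
        # the outgoing interface, whose address getsockname() then reports.
        sock.connect(("10.255.255.255", 1))
        return sock.getsockname()[0]
    except OSError:
        # No usable route (e.g. offline): fall back to loopback.
        return "127.0.0.1"
    finally:
        sock.close()


def startup_banner(port: int = 8000) -> str:
    # Mirrors the two lines shown in the terminal after `python run.py`.
    return f"local: http://localhost:{port}\nnetwork: http://{lan_ip()}:{port}"
```

If the fallback kicks in, the "network" line shows 127.0.0.1, which only works on the same machine.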
Interface guide

What's on screen, what each element does, what to click, and what happens. The interface opens at the address shown in the terminal.

3 visible zones + 2 slide-out panels:

  1. Header (top)
  2. Optimization panel (left, main)
  3. Chat with Agent A (right)
  4. Settings — slides in from the right via ⚙ Settings
  5. Usage — slides in via 📊 (hidden telemetry)
1. Header
Element | What it is | Click / result
⚙︎ Config Tuning | Logo / title | Not clickable
Badges anthropic ✓ / openai ✕ | Which providers are available (API key present in .env): ✓ = key found, ✕ = missing | Not clickable. If both show ✕, optimization and chat will fail with an error
📊 | Hidden Usage panel | Click → token usage and cost estimate (JSON). Close with
⚙ Settings | Settings | Click → settings panel (section 4)
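The provider badges reflect whether a key was found in .env. As a rough sketch of that check: the variable names `ANTHROPIC_API_KEY` and `OPENAI_API_KEY` and the helper names below are assumptions, not taken from the project, and the parser handles only plain `KEY=VALUE` lines.

```python
import os
from typing import Dict

# Hypothetical variable names: the guide only says the keys live in .env.
PROVIDER_KEYS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
}


def load_dotenv_file(path: str = ".env") -> Dict[str, str]:
    """Parse plain KEY=VALUE lines; blank lines and # comments are skipped."""
    values: Dict[str, str] = {}
    try:
        with open(path, encoding="utf-8") as fh:
            for raw in fh:
                line = raw.strip()
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, value = line.partition("=")
                values[key.strip()] = value.strip().strip("\"'")
    except FileNotFoundError:
        pass  # no .env yet: every provider will show ✕
    return values


def current_env(path: str = ".env") -> Dict[str, str]:
    # Assumption: variables already exported in the shell override the file.
    return {**load_dotenv_file(path), **os.environ}


def provider_badges(env: Dict[str, str]) -> Dict[str, bool]:
    # True = non-empty key found (✓), False = key missing (✕).
    return {name: bool(env.get(var)) for name, var in PROVIDER_KEYS.items()}


def badge_line(badges: Dict[str, bool]) -> str:
    return " / ".join(f"{name} {'✓' if ok else '✕'}" for name, ok in badges.items())
```

For example, `print(badge_line(provider_badges(current_env())))` prints a line in the same shape as the header badges.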
2. Optimization panel (left)

Main zone: Agent B runs here, and you can watch the whole process step by step.

Goal row

Element | What it does
Optimization goal field | Natural-language description of what Agent A should start doing. Press Enter to run
▶ Optimize | Starts the optimization loop. The button changes to ⏳ Running… and stays disabled until the run finishes
↺ Reset | Resets Agent A's config to its starting state and clears the step feed, final diff, chat, and progress

Example goals:

  • Make the agent knowledgeable about butterfly history and cite specific examples more often.
  • The agent is too dry — make it witty, with light humor.
  • Keep answers very short, one paragraph, no fluff.

Progress

  • Step N / M — current step of max (max_steps from settings)
  • self-score: K — Agent B's subjective score (0–100)
  • Bar — visual self-score

Step feed

Each step is a collapsible card (click the header to expand or collapse it). The header shows the step number, a self N chip (Agent B's self-score), and the verdict: continue or ✓ done.

Block | What it shows
Agent B reasoning | The optimizer's reasoning: why the current config is good or bad and what to change
Config changes | Config diff: system prompt (additions in green, removals in red), temperature (before → after), few-shot (new examples)
Probe questions for Agent A | The questions B wrote for your goal, with Agent A's streamed answers
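To make the Config changes block concrete, here is a sketch of how such a diff could be computed with Python's standard `difflib`. The `AgentConfig` shape (prompt, optional temperature, list of few-shot pairs) is a hypothetical model of the fields the UI shows, not the app's actual data structure.

```python
import difflib
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class AgentConfig:
    # Hypothetical shape, modeled on the fields the Config changes block shows.
    system_prompt: str
    temperature: Optional[float] = None
    few_shot: List[Tuple[str, str]] = field(default_factory=list)  # (user, assistant)


def config_diff(before: AgentConfig, after: AgentConfig) -> List[str]:
    lines: List[str] = []
    # System prompt: unified line diff ("-" = removed/red, "+" = added/green).
    lines.extend(
        difflib.unified_diff(
            before.system_prompt.splitlines(),
            after.system_prompt.splitlines(),
            fromfile="system prompt (before)",
            tofile="system prompt (after)",
            lineterm="",
        )
    )
    # Temperature: reported as "before → after" only if it changed.
    if before.temperature != after.temperature:
        lines.append(f"temperature: {before.temperature} → {after.temperature}")
    # Few-shot: only examples that did not exist before.
    for user, assistant in after.few_shot:
        if (user, assistant) not in before.few_shot:
            lines.append(f"few-shot +  user: {user!r}  assistant: {assistant!r}")
    return lines
```

The same function would serve for the Final config diff by passing the starting and final configs instead of two consecutive ones.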

The loop runs until B declares done or the run reaches max_steps. At the bottom, Final config diff compares the starting config with the final one.
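The loop described above can be sketched as follows, following the order of each step card (B's reasoning and change, then probe Q&A, then self-score and verdict). This is a minimal sketch under stated assumptions: `propose`, `answer`, and `evaluate` are hypothetical stand-ins for the LLM calls, and only the system prompt is tuned here (temperature and few-shot are left out for brevity).

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Probe = Tuple[str, str]  # (question, Agent A's answer)


@dataclass
class Step:
    number: int
    reasoning: str        # "Agent B reasoning" block
    prompt: str           # Agent A's system prompt after this step's change
    probes: List[Probe]   # "Probe questions for Agent A" block
    self_score: int       # Agent B's subjective score, 0-100
    done: bool            # verdict: True = "✓ done", False = "continue"


def optimize(
    goal: str,
    start_prompt: str,
    propose: Callable[[str, str, List[Probe], int], Tuple[str, str, List[str]]],
    answer: Callable[[str, str], str],
    evaluate: Callable[[str, str, List[Probe]], Tuple[int, bool]],
    probes_per_step: int = 3,
    max_steps: int = 5,
) -> Tuple[str, List[Step]]:
    prompt, probes, steps = start_prompt, [], []
    for number in range(1, max_steps + 1):
        # 1. Agent B reviews the config (and last probes) and proposes a new prompt + questions.
        reasoning, prompt, questions = propose(goal, prompt, probes, probes_per_step)
        # 2. Agent A answers each probe question using the *new* prompt.
        probes = [(q, answer(prompt, q)) for q in questions]
        # 3. Agent B scores the answers against the goal and decides whether to stop.
        score, done = evaluate(goal, prompt, probes)
        steps.append(Step(number, reasoning, prompt, probes, score, done))
        if done:
            break
    return prompt, steps
```

Passing the callables in keeps the loop itself independent of any provider, so it can be exercised with plain stub functions before wiring in real models.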

3. Chat with Agent A (right)

Always available. Chat with Agent A using its current config (after optimization, that is the improved one).

Element | What it does
Line under the title | Current config: provider · model · reasoning · t=… · few-shot N
Input field | Enter sends, Shift+Enter inserts a newline. The response streams in
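The config line under the chat title could be assembled roughly like this. The helper name `config_line` is hypothetical, and omitting `t=…` when reasoning is on is an assumption based on the Settings panel hiding temperature in that case.

```python
from typing import Optional


def config_line(provider: str, model: str, reasoning: str,
                temperature: Optional[float], few_shot_count: int) -> str:
    """Build 'provider · model · reasoning · t=… · few-shot N' (hypothetical helper)."""
    parts = [provider, model, reasoning]
    # Assumption: t=… appears only when a temperature is actually used,
    # i.e. it is set and reasoning is off.
    if temperature is not None and reasoning == "none":
        parts.append(f"t={temperature:g}")
    parts.append(f"few-shot {few_shot_count}")
    return " · ".join(parts)
```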

Workflow: optimize for "be funny" → done → test in chat whether the agent actually got funnier.

4. Settings (⚙ Settings)
🔒 API keys are not entered here, only in .env. This panel is for models and parameters.

Agent A (target — being tuned)

Field | Sets | Notes
Provider | anthropic / openai | Shows "(no key)" if that provider's key is missing
Model | Agent A model | List fetched live from the provider API
Reasoning | Reasoning level | Reasoning models only. GPT-5.x: none / low / medium / high / xhigh; Claude: none / on
Temperature | 0–1 | Only when the model accepts temperature; hidden when reasoning is on
System prompt (starting) | Agent A's starting prompt | The initial config that B improves
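The rules in this table (per-provider reasoning levels, temperature limited to 0–1 and dropped when reasoning is on or the model does not accept it) can be summarized as a small validation sketch. The function name and the `accepts_temperature` flag are hypothetical; the level lists are copied from the table.

```python
from typing import Optional

# Reasoning levels as listed in the settings table.
REASONING_LEVELS = {
    "openai": ["none", "low", "medium", "high", "xhigh"],  # GPT-5.x reasoning models
    "anthropic": ["none", "on"],                          # Claude
}


def validate_agent_a(provider: str, reasoning: str, temperature: Optional[float],
                     accepts_temperature: bool = True) -> Optional[float]:
    """Return the temperature to send, or None if it should be omitted (hypothetical)."""
    if reasoning not in REASONING_LEVELS.get(provider, []):
        raise ValueError(f"{provider!r} does not support reasoning level {reasoning!r}")
    # Temperature is hidden (not sent) when reasoning is on or the model rejects it.
    if reasoning != "none" or not accepts_temperature:
        return None
    if temperature is None or not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0 and 1")
    return temperature
```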

Agent B (optimizer)

Field | Sets
Provider / Model | Which model runs the optimizer
Reasoning | Agent B reasoning level (if the model supports it)

Agent B's system prompt is built in and cannot be edited.

Run parameters

Field | Sets
Probe questions per step | How many probe questions B asks A per step (1–6)
Max steps | Maximum number of loop steps
Save | Saves settings, closes the panel, and updates the config line in the chat
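The two run parameters bound how much work a run can do: Agent A answers at most (probe questions per step) × (max steps) probe questions. A minimal sketch, with a hypothetical `RunParams` class; the defaults of 3 and 5 are taken from the walkthrough below, not from the app, and the guide gives no upper limit for max steps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunParams:
    probes_per_step: int = 3  # walkthrough value, not necessarily the app default
    max_steps: int = 5        # walkthrough value, not necessarily the app default

    def __post_init__(self) -> None:
        # The settings panel allows 1–6 probe questions per step.
        if not 1 <= self.probes_per_step <= 6:
            raise ValueError("probes_per_step must be between 1 and 6")
        # Assumption: at least one step; no upper bound is documented.
        if self.max_steps < 1:
            raise ValueError("max_steps must be at least 1")

    def max_probe_answers(self) -> int:
        # Worst case: the loop never reports done and runs all steps.
        return self.probes_per_step * self.max_steps
```

With the walkthrough settings, `RunParams(3, 5).max_probe_answers()` gives 15 Agent A answers in the worst case.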
5. Usage (📊)

Summary of calls, tokens, estimated cost, and a per-model breakdown (JSON). The full log is in usage.jsonl. The panel is intentionally hidden from the main UI.
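A summary like the Usage panel's can be rebuilt from usage.jsonl, one JSON object per line. This is a sketch only: the record fields `model`, `input_tokens`, `output_tokens`, and `cost_usd` are assumptions about the log format, so check a real line of the file and adjust the keys.

```python
import json
from collections import defaultdict
from typing import Any, Dict, Iterable


def _empty() -> Dict[str, Any]:
    return {"calls": 0, "input_tokens": 0, "output_tokens": 0, "cost_usd": 0.0}


def summarize_usage(lines: Iterable[str]) -> Dict[str, Any]:
    """Aggregate usage.jsonl records; the field names are assumptions."""
    total = _empty()
    by_model: Dict[str, Dict[str, Any]] = defaultdict(_empty)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        # Count every record both in the grand total and in its model's bucket.
        for bucket in (total, by_model[rec["model"]]):
            bucket["calls"] += 1
            bucket["input_tokens"] += rec.get("input_tokens", 0)
            bucket["output_tokens"] += rec.get("output_tokens", 0)
            bucket["cost_usd"] += rec.get("cost_usd", 0.0)
    return {**total, "by_model": dict(by_model)}
```

Usage: `with open("usage.jsonl", encoding="utf-8") as f: print(json.dumps(summarize_usage(f), indent=2))`.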

Full walkthrough example
  1. ⚙ Settings → Agent A: openai, gpt-4.1-mini, prompt "You are a helpful assistant.", temperature 0.7. Agent B: gpt-5.4-mini, reasoning none. Probe questions 3, Max steps 5. Save.
  2. Goal: "Make the agent knowledgeable about butterfly history and cite examples more often." → ▶ Optimize.
  3. Step 1: B's reasoning → prompt diff (entomologist role) → probe Q&A from A → self 55, continue.
  4. Steps 2, 3, …: B refines the config and the self-score rises until ✓ done.
  5. Final config diff: from "You are a helpful assistant." to an entomologist prompt with examples.
  6. Test the result in chat. To start over, click ↺ Reset.