AI that improves itself

One AI agent (Agent B) edits another agent's (Agent A's) config live.

Live demo

Workshop stand — open in browser, no local setup needed:

https://config-tuning-demo.infiano.app/

Run locally

Setup

cd ~/demo-config-tuning
pip install -r requirements.txt

Fill in .env.example — add your OpenAI and Anthropic API keys, then:

cp .env.example .env
python run.py

Open in browser

After python run.py, copy the network address from the terminal:

local: http://localhost:8000
network: http://192.168.1.42:8000

Open the network URL in your browser (your IP will differ):

http://192.168.1.42:8000

Interface guide

What's on screen, what each element does, what to click, and what happens. The interface opens at the address printed in the terminal.

3 visible zones + 2 slide-out panels:

Header (top)
Optimization panel (left, main)
Chat with Agent A (right)
Settings — slides in from the right via ⚙ Settings
Usage — slides in via 📊 (hidden telemetry)

1. Header

Element	What it is	Click / result
`⚙︎ Config Tuning`	Logo / title	Not clickable
Badges `anthropic ✓ / openai ✕`	Which providers are available (key in `.env`). `✓` = key found, `✕` = missing	Not clickable. If both show `✕`, optimization and chat will fail
`📊`	Hidden Usage panel	Click → token usage and cost estimate (JSON). Close with `✕`
`⚙ Settings`	Settings	Click → settings panel (section 4)
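
The provider badges depend on which keys are present in `.env`. A minimal sketch of such a check follows. The variable names `ANTHROPIC_API_KEY` and `OPENAI_API_KEY` are assumptions (see `.env.example` for the names the app actually reads), and the sketch assumes the contents of `.env` have already been loaded into the process environment.

```python
import os

# Assumed variable names; .env.example lists the names the app actually uses.
PROVIDER_KEYS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
}


def provider_badges(env=None):
    """Return {provider: bool}; True when a non-empty key is set for that provider."""
    env = os.environ if env is None else env
    return {name: bool(env.get(var, "").strip()) for name, var in PROVIDER_KEYS.items()}


def format_badges(badges):
    """Render the result in the header style, e.g. 'anthropic ✓ / openai ✕'."""
    return " / ".join(f"{name} {'✓' if ok else '✕'}" for name, ok in badges.items())


if __name__ == "__main__":
    print(format_badges(provider_badges()))
```

If the output shows `✕` for both providers, fix `.env` before starting an optimization run.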

2. Optimization panel (left)

Main zone: Agent B runs here and you see the full process.

Goal row

Element	What it does
Optimization goal field	Natural-language task: what Agent A should start doing. Enter = run
▶ Optimize	Starts the optimization loop. The button changes to `⏳ Running…` and stays disabled until the run finishes
↺ Reset	Resets Agent A config to starting state and clears step feed, final diff, chat, and progress

Example goals:

Make the agent knowledgeable about butterfly history and cite specific examples more often.
The agent is too dry — make it witty, with light humor.
Keep answers very short, one paragraph, no fluff.

Progress

Step N / M — current step of max (max_steps from settings)
self-score: K — Agent B's subjective score (0–100)
Bar — visual self-score

Step feed

Each step is a collapsible card (click header to expand/collapse). Header shows: step number, self N chip, verdict continue / ✓ done.

Block	What it shows
Agent B reasoning	Optimizer reasoning: why config is good/bad and what to change
Config changes	Diff: System prompt (green/red), temperature (before → after), Few-shot (new examples)
Probe questions for Agent A	Questions B invented for your goal, and Agent A's streamed answers
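
The Config changes block shows a before/after comparison of the config. The sketch below illustrates one way to compute such a comparison with the standard library `difflib`. It is not the app's implementation, and the helper names `prompt_diff` and `field_change` are hypothetical.

```python
import difflib


def prompt_diff(before: str, after: str) -> list[str]:
    """Line-level diff of two system prompts.

    Lines starting with '- ' were removed (red in the UI),
    lines starting with '+ ' were added (green in the UI).
    """
    diff = difflib.ndiff(before.splitlines(), after.splitlines())
    # Drop unchanged lines ('  ') and ndiff's intraline hint lines ('? ').
    return [line for line in diff if line.startswith(("- ", "+ "))]


def field_change(name, before, after):
    """Describe a scalar change such as temperature, or return None if unchanged."""
    if before == after:
        return None
    return f"{name}: {before} → {after}"


if __name__ == "__main__":
    for line in prompt_diff(
        "You are a helpful assistant.",
        "You are an entomologist.\nCite specific examples.",
    ):
        print(line)
    print(field_change("temperature", 0.7, 0.4))
```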

The loop runs until Agent B returns the verdict done or the step count reaches max_steps. Below the step feed, Final config diff compares the starting config with the final one.
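
The stopping rule can be sketched as a plain loop. Everything here is hypothetical: the `AgentConfig` and `Step` types, the `propose` / `probe` / `judge` callables, and the fake implementations stand in for real model calls and only illustrate the control flow (propose a config, probe Agent A, score, stop on done or at max_steps).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AgentConfig:
    system_prompt: str
    temperature: float = 0.7


@dataclass
class Step:
    number: int
    config: AgentConfig
    answers: list
    self_score: int  # optimizer's own 0-100 estimate
    done: bool       # True when the verdict is "done"


def optimize(goal, start, propose, probe, judge, max_steps=5):
    """Run optimizer steps until the verdict is done or max_steps is reached."""
    config, history = start, []
    for number in range(1, max_steps + 1):
        config = propose(goal, config, history)     # Agent B edits the config
        answers = probe(config)                     # Agent A answers probe questions
        score, done = judge(goal, config, answers)  # Agent B scores the result
        history.append(Step(number, config, answers, score, done))
        if done:
            break
    return config, history


# Fake stand-ins for model calls, so the sketch runs offline.
def fake_propose(goal, config, history):
    return replace(config, system_prompt=config.system_prompt + " Cite specific examples.")


def fake_probe(config):
    return [f"(answer from a prompt of {len(config.system_prompt)} characters)"]


def fake_judge(goal, config, answers):
    score = min(100, 25 * config.system_prompt.count("Cite"))
    return score, score >= 75


if __name__ == "__main__":
    start = AgentConfig("You are a helpful assistant.")
    final, history = optimize("cite examples", start, fake_propose, fake_probe, fake_judge)
    for step in history:
        print(step.number, step.self_score, "done" if step.done else "continue")
```

With a lower `max_steps`, the same loop stops early with the last verdict still at continue, which matches the "hits max_steps" case.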

3. Chat with Agent A (right)

Always available. Talk to Agent A on its current config (after optimization — the improved one).

Element	What it does
Line under title	Current config: `provider · model · reasoning · t=… · few-shot N`
Input + →	Enter — send, Shift+Enter — newline. Response streams
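
As an illustration of the config line format, here is a small formatter. The function name `config_line` and the choice to omit `t=` when no temperature applies are assumptions, not the app's actual rendering code.

```python
def config_line(provider, model, reasoning="none", temperature=None, few_shot=0):
    """Build a status line like 'provider · model · reasoning · t=… · few-shot N'."""
    parts = [provider, model, reasoning]
    if temperature is not None:  # assumed: omitted when the model takes no temperature
        parts.append(f"t={temperature}")
    parts.append(f"few-shot {few_shot}")
    return " · ".join(parts)


if __name__ == "__main__":
    print(config_line("openai", "gpt-4.1-mini", "none", 0.7, 0))
```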

Workflow: optimize for "be funny" → done → test in chat whether the agent actually got funnier.

4. Settings (⚙ Settings)

🔒 API keys are not entered here — only in .env. Panel is for models and parameters.

Agent A (target — being tuned)

Field	Sets	Notes
Provider	anthropic / openai	`(no key)` if key missing
Model	Agent A model	List fetched live from provider API
Reasoning	Reasoning level	Reasoning models only. GPT-5.x — `none/low/medium/high/xhigh`; Claude — `none/on`
Temperature	0–1	Only when model accepts temperature; hidden when reasoning is on
System prompt (starting)	Agent A starting prompt	Initial config that B improves
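
The rules in the notes column can be expressed as a small consistency check. This is a sketch under assumptions: it keys the reasoning levels by provider (the table ties them to GPT-5.x and Claude models specifically), and `validate_agent_settings` is a hypothetical helper, not part of the app.

```python
# Reasoning levels from the table above, keyed by provider as a simplification.
REASONING_LEVELS = {
    "openai": ("none", "low", "medium", "high", "xhigh"),
    "anthropic": ("none", "on"),
}


def validate_agent_settings(provider, reasoning="none", temperature=None):
    """Return a list of problems; an empty list means the settings are consistent."""
    problems = []
    levels = REASONING_LEVELS.get(provider)
    if levels is None:
        problems.append(f"unknown provider: {provider!r}")
    elif reasoning not in levels:
        problems.append(f"reasoning {reasoning!r} is not one of {'/'.join(levels)}")
    if temperature is not None:
        if reasoning != "none":
            # The Temperature field is hidden when reasoning is on.
            problems.append("temperature is not used when reasoning is on")
        elif not 0.0 <= temperature <= 1.0:
            problems.append("temperature must be between 0 and 1")
    return problems


if __name__ == "__main__":
    print(validate_agent_settings("openai", "none", 0.7))   # consistent
    print(validate_agent_settings("anthropic", "on", 0.7))  # temperature conflict
```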

Agent B (optimizer)

Field	Sets
Provider / Model	Who runs the optimizer
Reasoning	Agent B reasoning level (if model supports it)

Agent B system prompt is not editable — it's built in.

Run parameters

Field	Sets
Probe questions per step	How many probe questions B asks A per step (1–6)
Max steps	Maximum loop steps
Save	Saves settings, closes panel, updates config line in chat
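
The two run parameters together bound how many probe answers Agent A produces in one run, which helps when estimating cost. A minimal sketch, with a hypothetical helper name; the real run may finish earlier if Agent B reports done.

```python
def probe_call_budget(probes_per_step: int, max_steps: int) -> int:
    """Upper bound on Agent A probe answers in one run."""
    if not 1 <= probes_per_step <= 6:
        raise ValueError("probes_per_step must be between 1 and 6")
    if max_steps < 1:
        raise ValueError("max_steps must be at least 1")
    return probes_per_step * max_steps


if __name__ == "__main__":
    # 3 probe questions per step, at most 5 steps
    print(probe_call_budget(3, 5))
```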

5. Usage (📊)

Summary: calls, tokens, cost estimate, breakdown by model (JSON). Full log in usage.jsonl. Intentionally hidden from main UI.
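
To inspect `usage.jsonl` outside the UI, a per-model summary can be computed with a few lines of Python. The field names `model`, `input_tokens`, and `output_tokens` are assumptions about the log format; adjust them to match the records actually written by the app.

```python
import json
from collections import defaultdict
from pathlib import Path


def summarize_usage(lines):
    """Aggregate JSONL usage records into per-model call and token totals."""
    totals = defaultdict(lambda: {"calls": 0, "input_tokens": 0, "output_tokens": 0})
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        row = totals[record.get("model", "unknown")]  # field names are assumed
        row["calls"] += 1
        row["input_tokens"] += record.get("input_tokens", 0)
        row["output_tokens"] += record.get("output_tokens", 0)
    return dict(totals)


if __name__ == "__main__":
    path = Path("usage.jsonl")
    if path.exists():
        with path.open(encoding="utf-8") as f:
            print(json.dumps(summarize_usage(f), indent=2))
```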

Full walkthrough example

⚙ Settings → Agent A: openai, gpt-4.1-mini, prompt "You are a helpful assistant.", temperature 0.7. Agent B: gpt-5.4-mini, reasoning none. Probe questions per step 3, Max steps 5. Save.
Goal: "Make the agent knowledgeable about butterfly history and cite examples more often." → ▶ Optimize.
Step 1: B reasoning → prompt diff (entomologist role) → probe Q&A from A → self 55, continue.
Steps 2, 3, …: B refines the config and the self-score rises → ✓ done.
Final config diff: from "You are a helpful assistant." to an entomologist prompt with examples.
Test the result in chat. To start over, click ↺ Reset.