Pattern: Autonomous Improvement Loops (inspired by Karpathy's autoresearch)
P3 - Low
Autonomous Improvement Loops — Future Implementation
Source
Andrej Karpathy's autoresearch repo: https://github.com/karpathy/autoresearch
Analyzed: March 21, 2026
The Pattern
An AI agent autonomously improves a system overnight using a simple loop:
- Modify ONE thing (code, copy, config)
- Measure ONE metric (conversion rate, engagement, val loss)
- Fixed time budget per experiment (e.g., 5 min, 1 hour)
- Keep if improved, discard if not
- Repeat (~12 experiments/hour, ~100 overnight)
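The loop above can be sketched as a minimal driver. This is an illustration of the pattern, not code from the repo; the callbacks `propose`, `apply`, `measure`, and `revert` are hypothetical placeholders the agent harness would supply:

```python
def run_improvement_loop(propose, apply, measure, revert,
                         budget_s=300, n_experiments=100):
    """Generic keep-if-improved loop: one change, one metric, fixed budget.

    propose()  -> a candidate change (hypothetical callback)
    apply(c)   -> deploy the change
    measure()  -> read the single target metric (higher is better)
    revert(c)  -> undo the change if it did not help

    In production the loop would wait budget_s seconds after apply()
    for the metric to accumulate before measuring.
    """
    best = measure()  # baseline before any changes
    log = []
    for i in range(n_experiments):
        candidate = propose()
        apply(candidate)
        score = measure()
        kept = score > best
        if kept:
            best = score       # keep the winner
        else:
            revert(candidate)  # discard the loser
        log.append({"experiment": i, "score": score, "kept": kept})
    return best, log
```

With a 5-minute budget this gives roughly the ~12 experiments/hour cadence described above; the log makes every keep/discard decision auditable afterwards.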
The human writes program.md (instructions for HOW the agent should experiment), not the code itself: "programming the program."
Applications for MedSchools.ai
1. Landing Page Conversion Optimization
- Agent modifies hero copy, CTA text, pricing display, social proof placement
- Measures: signup conversion rate via Plausible analytics
- Fixed window: 24-48 hours per variant
- Keep winner, try next variation
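A sketch of the metric-reading side for this application. Plausible exposes a stats API; the endpoint and parameters below follow its v1 aggregate API as of writing, but should be verified against current docs, and the `event:props:variant` filter assumes pageviews are tagged with a custom `variant` property (an assumption, not something Plausible does by default):

```python
from urllib.parse import urlencode

# Plausible v1 aggregate stats endpoint (verify against current docs)
PLAUSIBLE_API = "https://plausible.io/api/v1/stats/aggregate"

def stats_url(site_id, period="day", metrics="visitors,events", variant=None):
    """Build a Plausible aggregate-stats URL for one variant.

    Assumes pageviews carry a custom 'variant' property so results
    can be filtered per variant (hypothetical setup).
    """
    params = {"site_id": site_id, "period": period, "metrics": metrics}
    if variant is not None:
        params["filters"] = f"event:props:variant=={variant}"
    return f"{PLAUSIBLE_API}?{urlencode(params)}"

def conversion_rate(signups, visitors):
    """Signup conversion rate; 0.0 when there is no traffic yet."""
    return signups / visitors if visitors else 0.0

def keep_variant(base_signups, base_visitors, var_signups, var_visitors):
    """Keep the variant only if its conversion rate beats the baseline."""
    return (conversion_rate(var_signups, var_visitors)
            > conversion_rate(base_signups, base_visitors))
```

The actual HTTP call (with a bearer API token) is omitted; the point is that keep/discard reduces to one rate comparison per 24-48 hour window.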
2. AI Chat Prompt Engineering
- Agent tweaks system prompts for the AI advisor
- Measures: user satisfaction signals (thumbs up/down, session length, return rate)
- Fixed window: 1 day per prompt variant
- Automated prompt A/B testing
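The loop needs the three satisfaction signals collapsed into the single metric it optimizes. One illustrative way to do that (the weights and the 10-minute session cap are assumptions to be tuned, not measured values):

```python
def satisfaction_score(thumbs_up, thumbs_down, avg_session_min, return_rate,
                       w_thumbs=0.5, w_session=0.2, w_return=0.3):
    """Blend thumbs votes, session length, and return rate into one scalar.

    Weights are illustrative placeholders. Session length is capped so a
    single long outlier session cannot dominate the score.
    """
    total = thumbs_up + thumbs_down
    thumbs = thumbs_up / total if total else 0.5  # neutral if no votes yet
    session = min(avg_session_min / 10.0, 1.0)    # cap at 10 minutes
    return w_thumbs * thumbs + w_session * session + w_return * return_rate
```

Each prompt variant then gets one score per day-long window, and the standard keep-if-improved comparison applies.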
3. Email/Onboarding Copy Optimization
- Agent rewrites onboarding email subject lines, body copy, CTAs
- Measures: open rate, click rate, onboarding completion rate
- Keep best performers
4. Blog/SEO Content Testing
- Agent rewrites blog post intros, titles, meta descriptions
- Measures: time-on-page, bounce rate, search click-through rate
- Wake up to best-performing versions
5. Pricing Page Optimization
- Agent tests different pricing presentations, trial messaging, feature ordering
- Measures: checkout initiation rate, plan selection distribution
Implementation Requirements
- Analytics integration (Plausible or GA4) with API access for automated metric reading
- Feature flag system for serving variants
- Experiment logging (which variant, what metric, keep/discard)
- Safety guardrails (cap the number of variants tested simultaneously; roll back on significant regression)
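The logging and guardrail requirements above can be sketched as a few small helpers. The log path, 20% regression floor, and concurrency cap of 3 are illustrative choices, not decided values:

```python
import json
import time

MAX_CONCURRENT = 3        # illustrative cap on simultaneous variants
REGRESSION_FLOOR = 0.20   # illustrative: roll back on a >20% metric drop

def log_experiment(path, variant, metric, value, kept):
    """Append one experiment record as a JSON line (variant, metric, decision)."""
    record = {"ts": time.time(), "variant": variant,
              "metric": metric, "value": value, "kept": kept}
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

def should_rollback(baseline, observed, floor=REGRESSION_FLOOR):
    """True if the metric regressed by more than the allowed floor."""
    if baseline <= 0:
        return False
    return (baseline - observed) / baseline > floor

def can_launch(active_variants, limit=MAX_CONCURRENT):
    """Gate new experiments behind the concurrency cap."""
    return len(active_variants) < limit
```

An append-only JSON-lines log keeps the audit trail trivial to grep, and `should_rollback` gives the overnight loop an automatic kill switch no human has to watch.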
Key Insight from Karpathy
The program.md concept — Markdown as the programming language for agents — is exactly what our SOUL.md + skill files already do. This validates our architecture. The next step is adding the automated measurement + keep/discard loop.
Priority
Future TODO — implement after MedSchools.ai has enough traffic to measure meaningful conversion differences (need statistical significance). Ideal timing: when we hit 1,000+ monthly visitors.
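The significance concern can be made concrete with a standard two-proportion z-test (this is textbook statistics, not something from the repo):

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z statistic for comparing conversion counts between two variants.

    conv_a/conv_b are conversion counts, n_a/n_b visitor counts.
    |z| > 1.96 corresponds to significance at roughly the 95% level.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p = (conv_a + conv_b) / (n_a + n_b)  # pooled conversion rate
    se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se if se else 0.0
```

At 1,000 monthly visitors split 500/500, even a 3% vs 5% conversion difference (15 vs 25 signups) gives z ≈ 1.61, short of the 1.96 threshold, which is why the traffic floor matters before this pattern pays off.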
Reference
- Repo: https://github.com/karpathy/autoresearch
- 3 files: prepare.py (fixed), train.py (agent modifies), program.md (human writes)
- MIT licensed
Created: Sat, Mar 21, 2026, 7:04 PM by bob
Updated: Sat, Mar 21, 2026, 7:04 PM
Last accessed: Wed, Apr 1, 2026, 11:31 AM
ID: 0006157a-9f81-411a-bde9-f8bf9d8208f0