Pattern: Autonomous Improvement Loops (inspired by Karpathy's autoresearch)

Priority: P3 (Low) · Type: Idea · Project: MedSchools.ai

Autonomous Improvement Loops — Future Implementation

Source

Andrej Karpathy's autoresearch repo: https://github.com/karpathy/autoresearch
Analyzed: March 21, 2026

The Pattern

An AI agent autonomously improves a system overnight using a simple loop:

  1. Modify ONE thing (code, copy, config)
  2. Measure ONE metric (conversion rate, engagement, val loss)
  3. Fixed time budget per experiment (e.g., 5 min, 1 hour)
  4. Keep if improved, discard if not
  5. Repeat (~12 experiments/hour, ~100 overnight)

The human writes program.md (instructions for HOW the agent should experiment), not the code itself: "programming the program."
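The keep/discard loop above can be sketched in a few lines of Python. This is a minimal illustration, not the autoresearch implementation: `measure` stands in for whatever analytics hook the agent actually has, and the toy demo uses string length as a fake metric.

```python
def improvement_loop(baseline, variants, measure):
    """Greedy keep/discard loop: try each variant in turn, keep it only
    if the single measured metric beats the current best."""
    best, best_score = baseline, measure(baseline)
    log = []
    for variant in variants:
        score = measure(variant)        # measure ONE metric per experiment
        kept = score > best_score       # keep if improved, discard if not
        if kept:
            best, best_score = variant, score
        log.append({"variant": variant, "score": score, "kept": kept})
    return best, best_score, log

# Toy demo: "metric" is just string length, standing in for conversion rate.
variants = ["Get started", "Start your free trial today", "Try it"]
best, score, log = improvement_loop("Sign up", variants, measure=len)
# best == "Start your free trial today"
```

In the real setup, each call to `measure` would block for the fixed time budget (5 min to 48 h depending on the experiment) before reading the metric.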

Applications for MedSchools.ai

1. Landing Page Conversion Optimization

  • Agent modifies hero copy, CTA text, pricing display, social proof placement
  • Measures: signup conversion rate via Plausible analytics
  • Fixed window: 24-48 hours per variant
  • Keep winner, try next variation
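Reading the conversion metric automatically might look like the sketch below. It assumes Plausible's v1 Stats API (`/api/v1/stats/aggregate` with `site_id`, `period`, `metrics`, and a goal filter, authenticated via a Bearer token); the site ID and goal name are placeholders, so check Plausible's current API docs before relying on any of it.

```python
import urllib.parse

PLAUSIBLE_BASE = "https://plausible.io/api/v1/stats/aggregate"

def build_conversion_query(site_id, goal, period="day"):
    """Build the Plausible Stats API URL for reading how many visitors
    completed a goal (the numerator of the signup conversion rate).
    The actual request would add an 'Authorization: Bearer <token>' header."""
    params = {
        "site_id": site_id,
        "period": period,
        "metrics": "visitors,events",
        "filters": f"event:goal=={goal}",
    }
    return PLAUSIBLE_BASE + "?" + urllib.parse.urlencode(params)

url = build_conversion_query("medschools.ai", "Signup")
```

The agent would call this once per variant at the end of its 24-48 h window and feed the resulting rate into the keep/discard decision.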

2. AI Chat Prompt Engineering

  • Agent tweaks system prompts for the AI advisor
  • Measures: user satisfaction signals (thumbs up/down, session length, return rate)
  • Fixed window: 1 day per prompt variant
  • Automated prompt A/B testing
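For the prompt experiments, the three satisfaction signals have to collapse into one scalar before the loop can compare variants. A hedged sketch, where the weights and the 10-minute session cap are illustrative guesses, not tuned values:

```python
def satisfaction_score(thumbs_up, thumbs_down, avg_session_min, return_rate,
                       w_thumbs=0.5, w_session=0.2, w_return=0.3):
    """Combine thumbs ratio, session length, and return rate into one
    metric per prompt variant. Weights are placeholders."""
    total = thumbs_up + thumbs_down
    thumbs_ratio = thumbs_up / total if total else 0.5  # neutral prior
    session_norm = min(avg_session_min / 10.0, 1.0)     # cap at 10 minutes
    return (w_thumbs * thumbs_ratio
            + w_session * session_norm
            + w_return * return_rate)

score = satisfaction_score(thumbs_up=40, thumbs_down=10,
                           avg_session_min=6.0, return_rate=0.25)
```

Whatever weighting is chosen, it should stay fixed across all variants in a run, or the comparisons stop meaning anything.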

3. Email/Onboarding Copy Optimization

  • Agent rewrites onboarding email subject lines, body copy, CTAs
  • Measures: open rate, click rate, onboarding completion rate
  • Keep best performers

4. Blog/SEO Content Testing

  • Agent rewrites blog post intros, titles, meta descriptions
  • Measures: time-on-page, bounce rate, search click-through rate
  • Wake up to best-performing versions

5. Pricing Page Optimization

  • Agent tests different pricing presentations, trial messaging, feature ordering
  • Measures: checkout initiation rate, plan selection distribution

Implementation Requirements

  • Analytics integration (Plausible or GA4) with API access for automated metric reading
  • Feature flag system for serving variants
  • Experiment logging (which variant, what metric, keep/discard)
  • Safety guardrails (cap concurrent variants at N; roll back automatically on significant regression)
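The rollback guardrail in particular is easy to get wrong: with low traffic, a variant can look terrible purely by chance. One defensive sketch (the sample-size and drop thresholds are arbitrary placeholders, not recommendations):

```python
def should_roll_back(baseline_rate, variant_rate, variant_n,
                     min_samples=200, max_relative_drop=0.20):
    """Roll a variant back only when there is enough data AND the metric
    dropped by more than the allowed fraction of the baseline."""
    if variant_n < min_samples:
        return False  # too little data to call it a regression
    if baseline_rate == 0:
        return False  # no baseline to regress against
    drop = (baseline_rate - variant_rate) / baseline_rate
    return drop > max_relative_drop
```

So a variant converting at 3% against a 5% baseline triggers rollback once it has 200+ samples, but the same numbers on 50 samples do not.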

Key Insight from Karpathy

The program.md concept — Markdown as the programming language for agents — is exactly what our SOUL.md + skill files already do. This validates our architecture. The next step is adding the automated measurement + keep/discard loop.

Priority

Future TODO — implement after MedSchools.ai has enough traffic to measure meaningful conversion differences (need statistical significance). Ideal timing: when we hit 1,000+ monthly visitors.
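To sanity-check the "enough traffic" threshold, a common rule of thumb for a two-proportion test at roughly 5% significance and 80% power is n ≈ 16·p(1−p)/d² samples per arm, where p is the baseline rate and d the minimum detectable difference. A sketch (the rule is an approximation, not an exact power calculation):

```python
def samples_per_arm(base_rate, min_detectable_diff):
    """Rough per-arm sample size via the n ~= 16*p*(1-p)/d^2 rule of
    thumb (~5% significance, ~80% power, two-proportion test)."""
    p, d = base_rate, min_detectable_diff
    return round(16 * p * (1 - p) / (d * d))

# Detecting a 2-point lift on a 5% baseline conversion rate:
n = samples_per_arm(0.05, 0.02)  # ~1900 visitors per variant
```

That suggests 1,000 monthly visitors supports only coarse tests (large expected lifts or long windows), which matches the "ideal timing" caveat above.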

Reference

Created: Sat, Mar 21, 2026, 7:04 PM by bob

Updated: Sat, Mar 21, 2026, 7:04 PM

Last accessed: Wed, Apr 1, 2026, 11:31 AM

ID: 0006157a-9f81-411a-bde9-f8bf9d8208f0