Pattern: Autonomous Improvement Loops (inspired by Karpathy's autoresearch)

Priority: P3 (Low) · Type: Idea · Project: MedSchools.ai

Autonomous Improvement Loops — Future Implementation

Source

Andrej Karpathy's autoresearch repo: https://github.com/karpathy/autoresearch
Analyzed: March 21, 2026

The Pattern

An AI agent autonomously improves a system overnight using a simple loop:

  1. Modify ONE thing (code, copy, config)
  2. Measure ONE metric (conversion rate, engagement, val loss)
  3. Fixed time budget per experiment (e.g., 5 min, 1 hour)
  4. Keep if improved, discard if not
  5. Repeat (~12 experiments/hour, ~100 overnight)

The human writes program.md (instructions for HOW the agent should experiment), not the code itself: "programming the program."
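The keep/discard loop above can be sketched in a few lines of Python. This is a minimal illustration, not the autoresearch implementation: `measure` stands in for whatever analytics hook the agent actually has, and the toy demo uses string length as a fake metric.

```python
def improvement_loop(baseline, variants, measure):
    """Greedy keep/discard loop: try each variant in turn, keep it only
    if the single measured metric beats the current best."""
    best, best_score = baseline, measure(baseline)
    log = []
    for variant in variants:
        score = measure(variant)        # measure ONE metric per experiment
        kept = score > best_score       # keep if improved, discard if not
        if kept:
            best, best_score = variant, score
        log.append({"variant": variant, "score": score, "kept": kept})
    return best, best_score, log

# Toy demo: "metric" is just string length, standing in for conversion rate.
variants = ["Get started", "Start your free trial today", "Try it"]
best, score, log = improvement_loop("Sign up", variants, measure=len)
# best == "Start your free trial today"
```

In the real setup, each call to `measure` would block for the fixed time budget (5 min to 48 h depending on the experiment) before reading the metric.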

Applications for MedSchools.ai

1. Landing Page Conversion Optimization

  • Agent modifies hero copy, CTA text, pricing display, social proof placement
  • Measures: signup conversion rate via Plausible analytics
  • Fixed window: 24-48 hours per variant
  • Keep winner, try next variation
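Reading the conversion metric automatically might look like the sketch below. It assumes Plausible's v1 Stats API (`/api/v1/stats/aggregate` with `site_id`, `period`, `metrics`, and a goal filter, authenticated via a Bearer token); the site ID and goal name are placeholders, so check Plausible's current API docs before relying on any of it.

```python
import urllib.parse

PLAUSIBLE_BASE = "https://plausible.io/api/v1/stats/aggregate"

def build_conversion_query(site_id, goal, period="day"):
    """Build the Plausible Stats API URL for reading how many visitors
    completed a goal (the numerator of the signup conversion rate).
    The actual request would add an 'Authorization: Bearer <token>' header."""
    params = {
        "site_id": site_id,
        "period": period,
        "metrics": "visitors,events",
        "filters": f"event:goal=={goal}",
    }
    return PLAUSIBLE_BASE + "?" + urllib.parse.urlencode(params)

url = build_conversion_query("medschools.ai", "Signup")
```

The agent would call this once per variant at the end of its 24-48 h window and feed the resulting rate into the keep/discard decision.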

2. AI Chat Prompt Engineering

  • Agent tweaks system prompts for the AI advisor
  • Measures: user satisfaction signals (thumbs up/down, session length, return rate)
  • Fixed window: 1 day per prompt variant
  • Automated prompt A/B testing
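For the prompt experiments, the three satisfaction signals have to collapse into one scalar before the loop can compare variants. A hedged sketch, where the weights and the 10-minute session cap are illustrative guesses, not tuned values:

```python
def satisfaction_score(thumbs_up, thumbs_down, avg_session_min, return_rate,
                       w_thumbs=0.5, w_session=0.2, w_return=0.3):
    """Combine thumbs ratio, session length, and return rate into one
    metric per prompt variant. Weights are placeholders."""
    total = thumbs_up + thumbs_down
    thumbs_ratio = thumbs_up / total if total else 0.5  # neutral prior
    session_norm = min(avg_session_min / 10.0, 1.0)     # cap at 10 minutes
    return (w_thumbs * thumbs_ratio
            + w_session * session_norm
            + w_return * return_rate)

score = satisfaction_score(thumbs_up=40, thumbs_down=10,
                           avg_session_min=6.0, return_rate=0.25)
```

Whatever weighting is chosen, it should stay fixed across all variants in a run, or the comparisons stop meaning anything.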

3. Email/Onboarding Copy Optimization

  • Agent rewrites onboarding email subject lines, body copy, CTAs
  • Measures: open rate, click rate, onboarding completion rate
  • Keep best performers

4. Blog/SEO Content Testing

  • Agent rewrites blog post intros, titles, meta descriptions
  • Measures: time-on-page, bounce rate, search click-through rate
  • Wake up to best-performing versions

5. Pricing Page Optimization

  • Agent tests different pricing presentations, trial messaging, feature ordering
  • Measures: checkout initiation rate, plan selection distribution

Implementation Requirements

  • Analytics integration (Plausible or GA4) with API access for automated metric reading
  • Feature flag system for serving variants
  • Experiment logging (which variant, what metric, keep/discard)
  • Safety guardrails (cap concurrent variants at N; roll back automatically on significant regression)
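The rollback guardrail in particular is easy to get wrong: with low traffic, a variant can look terrible purely by chance. One defensive sketch (the sample-size and drop thresholds are arbitrary placeholders, not recommendations):

```python
def should_roll_back(baseline_rate, variant_rate, variant_n,
                     min_samples=200, max_relative_drop=0.20):
    """Roll a variant back only when there is enough data AND the metric
    dropped by more than the allowed fraction of the baseline."""
    if variant_n < min_samples:
        return False  # too little data to call it a regression
    if baseline_rate == 0:
        return False  # no baseline to regress against
    drop = (baseline_rate - variant_rate) / baseline_rate
    return drop > max_relative_drop
```

So a variant converting at 3% against a 5% baseline triggers rollback once it has 200+ samples, but the same numbers on 50 samples do not.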

Key Insight from Karpathy

The program.md concept — Markdown as the programming language for agents — is exactly what our SOUL.md + skill files already do. This validates our architecture. The next step is adding the automated measurement + keep/discard loop.

Priority

Future TODO — implement after MedSchools.ai has enough traffic to measure meaningful conversion differences (need statistical significance). Ideal timing: when we hit 1,000+ monthly visitors.
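To sanity-check the "enough traffic" threshold, a common rule of thumb for a two-proportion test at roughly 5% significance and 80% power is n ≈ 16·p(1−p)/d² samples per arm, where p is the baseline rate and d the minimum detectable difference. A sketch (the rule is an approximation, not an exact power calculation):

```python
def samples_per_arm(base_rate, min_detectable_diff):
    """Rough per-arm sample size via the n ~= 16*p*(1-p)/d^2 rule of
    thumb (~5% significance, ~80% power, two-proportion test)."""
    p, d = base_rate, min_detectable_diff
    return round(16 * p * (1 - p) / (d * d))

# Detecting a 2-point lift on a 5% baseline conversion rate:
n = samples_per_arm(0.05, 0.02)  # ~1900 visitors per variant
```

That suggests 1,000 monthly visitors supports only coarse tests (large expected lifts or long windows), which matches the "ideal timing" caveat above.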

Reference

Created: Sat, Mar 21, 2026, 7:04 PM by bob

Updated: Sat, Mar 21, 2026, 7:04 PM

Last accessed: Wed, Apr 1, 2026, 11:31 AM

ID: 0006157a-9f81-411a-bde9-f8bf9d8208f0