MedSchools.ai Database Utilization Audit

Audit Date: February 11, 2026
Conducted by: Atlas Reeves (Backend Dev)
Database: Supabase (ohkdiwblocbrcfrhxeyg.supabase.co)
Codebase: ~/clawd/projects/medschools_ai/apps/medschools-ai/

Executive Summary

Key Findings

📊 Database Scale:

104 total tables across 12 schemas
70 application tables (excluding system/auth/storage)
~1.1M+ total medical school records (applicant/matriculant data)
~10K RAG content chunks for semantic search
~2.8K scraped pages with analysis artifacts

🎯 Utilization Overview:

23% Well-Utilized (16 tables) - Core features depend on them
27% Underutilized (19 tables) - Has data but rarely queried
29% Unused (20 tables) - Empty or no code references
21% System/Internal (15 tables) - Auth, storage, migrations

💡 Top Opportunities:

Scraper data (2,821 analyzed pages) - Enormous SEO & content potential, barely used
Student life data (172 schools) - Rich lifestyle info, never displayed
Curriculum details (172 schools) - Research programs, rotations not shown
MAT data breakdown - Application/interview numbers available but not exposed

⚠️ Cleanup Candidates:

15 empty tables in med_schools schema (0 rows)
5 empty tables in extensions schema (orphaned from migration?)
3 empty ETL tables in RAG schema (unused pipeline)
2 empty tables in user_data (draft_comments, ps_feedback)

Schema-by-Schema Breakdown

1. `med_schools` Schema - Medical School Data

Purpose: Core medical school information, statistics, and application data.

Table	Rows	Code Refs	Status	Notes
`medical_school`	173	188	🟢 Well-Utilized	Primary school table, heavily used
`med_school_mcat_scores`	4,376	3	🟡 Underutilized	MCAT data by year/type, minimal usage
`med_school_gpa_scores`	1,750	3	🟡 Underutilized	GPA data, minimal usage
`med_school_mat_data`	1,376	7	🟡 Underutilized	Applied/interviewed/matriculated breakdown
`med_school_mcat`	876	6	🟡 Underutilized	MCAT summary by school
`med_school_gpa`	876	6	🟡 Underutilized	GPA summary by school
`secondary_essay_questions`	801	5	🟢 Well-Utilized	Used in secondaries feature
`secondary_essay_answers`	1,103	1	🟢 Well-Utilized	User-generated secondary answers
`med_school_financial`	278	1	🔴 Underutilized	Tuition/costs - only 1 reference!
`med_school_application`	278	2	🟡 Underutilized	Application requirements
`med_school_overview`	173	0	🔴 Unused	No code references
`med_school_curriculum`	172	0	🔴 Unused	Rich curriculum data not displayed
`med_school_stud_life`	172	0	🔴 Unused	Student life details not used
`med_school_location`	172	0	🔴 Unused	Location details available but not queried
`med_school_information`	172	2	🟡 Underutilized	Minimal usage
`med_school_interview`	12	4	🟢 Well-Utilized	Interview format/questions
Empty Tables (0 rows):			🔴 Cleanup
`med_school_acceptance`	0	0	🔴 Unused	Empty
`med_school_assessments`	0	0	🔴 Unused	Empty
`med_school_campuses`	0	0	🔴 Unused	Empty
`med_school_courseworks`	0	0	🔴 Unused	Empty
`med_school_mat_demo`	0	0	🔴 Unused	Empty demographics table
`med_school_misc`	0	0	🔴 Unused	Empty
`med_school_programs`	0	0	🔴 Unused	Empty
`med_school_race_ethnicity`	0	0	🔴 Unused	Empty
`med_school_selection`	0	0	🔴 Unused	Empty
`med_school_specialty`	0	0	🔴 Unused	Empty
`med_school_staffs`	0	0	🔴 Unused	Empty
`med_school_timeline`	0	0	🔴 Unused	Empty

Key Insights:

Main table is heavily used (188 references), but supporting tables are underutilized
15 tables are completely empty - candidates for cleanup
Rich data exists but isn't exposed: curriculum, student life, detailed location info
MAT data has 1,376 rows showing applied/interviewed/matriculated breakdowns by program type (MATRIC, MDPHD, APPLIED, etc.) - could power "What are my chances?" features

2. `content` Schema - Blog & Educational Content

Purpose: CMS for blog posts, guides, rankings, FAQs, and glossary.

Table	Rows	Code Refs	Status	Notes
`faqs`	100	38	🟢 Well-Utilized	FAQ feature heavily used
`glossary_terms`	115	6	🟢 Well-Utilized	Medical terminology glossary
`state_pages`	59	0	🔴 Unused	State-specific pages not referenced
`rankings`	10	45	🟢 Well-Utilized	School rankings feature
`blog_posts`	12	24	🟢 Well-Utilized	Blog content
`authors`	6	16	🟢 Well-Utilized	Blog authors
`guides`	3	40	🟢 Well-Utilized	Educational guides
`blog_post_schools`	0	0	🔴 Unused	Empty junction table

Key Insights:

Content features are well-utilized overall
State pages (59 rows) exist but have zero code references - SEO opportunity?
blog_post_schools junction table is empty but defined

3. `user_data` Schema - User Activity & Profiles

Purpose: User profiles, school lists, activities, essays, application tracking.

Table	Rows	Code Refs	Status	Notes
`activities`	31	184	🟢 Well-Utilized	Extracurricular tracking, heavily used
`school_list`	78	56	🟢 Well-Utilized	User school lists
`user_profiles`	9	73	🟢 Well-Utilized	Core user data
`essay_drafts`	47	43	🟢 Well-Utilized	Essay writing feature
`application_tracking`	34	14	🟢 Well-Utilized	Application status tracking
`experience_ideas`	4	5	🟢 Well-Utilized	Activity suggestions
`draft_comments`	0	0	🔴 Unused	Empty
`draft_shares`	0	0	🔴 Unused	Empty
`ps_feedback`	0	0	🔴 Unused	Empty
`relationships`	0	0	🔴 Unused	Empty

Key Insights:

Core user features are highly utilized (activities, essays, school lists)
4 empty tables suggest features planned but not implemented
Small user base (9 profiles) - early stage product

4. `rag` Schema - RAG/Vector Search

Purpose: Semantic search using embeddings, RAG pipeline.

Table	Rows	Code Refs	Status	Notes
`content_chunks`	9,662	4	🟢 Well-Utilized	Vector embeddings for RAG
`schools`	10	641	🟢 Well-Utilized	School data for RAG (note: 641 refs!)
`data_sources`	1	0	🟡 Underutilized	Metadata for content sources
`etl_chunks`	0	0	🔴 Unused	Empty ETL table
`etl_runs`	0	0	🔴 Unused	Empty ETL table
`query_cache`	0	0	🔴 Unused	Empty cache table

Key Insights:

RAG system is active with 9,662 content chunks
rag.schools has 641 code references (vs 188 for med_schools.medical_school) - interesting duplication
ETL pipeline tables are unused - suggests manual/different ingestion process
Only 10 schools in RAG system vs 173 in main DB - expansion opportunity

5. `scraper` Schema - Web Scraping & Analysis

Purpose: Automated scraping of school websites, AI analysis of pages.

Table	Rows	Code Refs	Status	Notes
`scraped_pages`	2,821	0	🔴 MAJOR UNUSED	Scraped content, NO usage!
`page_analysis_artifacts`	2,821	1	🔴 MAJOR UNUSED	AI-analyzed pages, 1 ref only
`synthesis_artifacts`	274	0	🔴 Unused	Topic syntheses, no usage
`link_queue`	115	0	🔴 Unused	Scraper queue
`school_topic_profiles`	0	0	🔴 Unused	Empty
`scraped_images`	0	0	🔴 Unused	Empty

Key Insights:

🚨 HUGE OPPORTUNITY: 2,821 scraped & AI-analyzed pages sitting unused
Analysis includes: main_topic, key_points, content_summary, worth_referencing, essay_positioning_strategy
Scraped between June-July 2025 (relatively fresh)
274 synthesis artifacts (topical summaries across schools) - could power comparison features
Covers 34 unique schools (per synthesis_artifacts)
Zero utilization in app despite massive investment in scraping/analysis

6. `ai` Schema - AI/Chat Features

Purpose: AI matching, prompt templates.

Table	Rows	Code Refs	Status	Notes
`prompt_template`	2	4	🟢 Well-Utilized	AI prompt templates
`match_results`	0	1	🟡 Underutilized	Empty but has code reference

Key Insights:

Small schema, prompt templates in use
match_results table exists but empty - planned feature?

7. `public` Schema - Shared/Public Tables

Purpose: App config, audit logs, contact forms, interview feature.

Table	Rows	Code Refs	Status	Notes
`interview_sessions`	21	5	🟢 Well-Utilized	Interview practice sessions
`interview_questions`	10	0	🔴 Unused	Questions available but not referenced
`audit_log`	6	4	🟢 Well-Utilized	System audit trail
`app_config`	1	3	🟢 Well-Utilized	App configuration
`contact_messages`	0	1	🟡 Underutilized	Contact form (empty)

Key Insights:

Interview feature is active (21 sessions, 2 users)
interview_questions has data but no code references - needs investigation

8. `extensions` Schema - Extended User Data?

Purpose: Unclear - appears to be duplicate/orphaned user data.

Table	Rows	Status	Notes
`school_list`	1	🔴 Orphaned	Duplicate of user_data.school_list?
`user_profiles`	0	🔴 Orphaned	Duplicate of user_data.user_profiles?
`gpa`	0	🔴 Orphaned	Empty
`mcat_scores`	0	🔴 Orphaned	Empty
`relationships`	0	🔴 Orphaned	Empty

Key Insights:

🚨 CLEANUP CANDIDATE: Entire schema appears orphaned
No code references to extensions.* in codebase
Possible leftover from schema migration or abandoned feature

9. System Schemas (Auth, Storage, Realtime, Vault)

Status: ⚪ System/Internal - Managed by Supabase, not application code.

auth.* - 22 tables for authentication (Supabase managed)
storage.* - 9 tables for file storage (Supabase managed)
realtime.* - 3 tables for subscriptions (Supabase managed)
vault.* - 1 table for secrets (Supabase managed)
drizzle.* - 1 migration table
supabase_migrations.* - 1 migration table

Utilization Matrix

By Category

Category	Count	Examples
🟢 Well-Utilized	16	`medical_school`, `activities`, `blog_posts`, `faqs`, `content_chunks`
🟡 Underutilized	19	`med_school_mcat_scores`, `med_school_curriculum`, `med_school_financial`
🔴 Unused	20	`scraped_pages`, `med_school_overview`, `state_pages`, entire `extensions` schema
⚪ System/Internal	49	`auth.`, `storage.`, `realtime.`, `vault.`

By Schema (Application Tables Only)

Schema	Total	Well-Utilized	Underutilized	Unused
`med_schools`	28	3	9	16
`content`	8	6	0	2
`user_data`	10	6	0	4
`rag`	6	2	1	3
`scraper`	6	0	0	6
`ai`	2	1	1	0
`public`	5	3	1	1
`extensions`	5	0	0	5

Top Opportunities

1. 🚀 Scraper Data Goldmine (Priority: CRITICAL)

The Opportunity:

2,821 AI-analyzed pages from 34 schools with zero utilization
Each page has: main_topic, key_points, full_content_summary, worth_referencing (very_high/high/medium/low), importance_for_application, essay_positioning_strategy
274 synthesis artifacts - topical summaries across schools

Potential Features:

School Deep Dives: Display analyzed content on school profile pages
Essay Positioning: Use essay_positioning_strategy to help students frame their essays
Topic Comparisons: Use synthesis artifacts to compare schools on specific topics
SEO Content: Generate blog posts from analyzed pages
Smart Search: "Show me schools with strong community service programs" using analyzed topics
Application Intel: Surface importance_for_application insights

Example Data Found:

School: California Northstate (ID 25827)
Topic: "Clínica Tepati: Student-Run Clinic for Underserved Latinx Community"
Worth Referencing: very_high
Summary: "Clínica Tepati represents a significant opportunity for medical students..."

Impact: HIGH - Massive investment already made in scraping/analysis, just needs UI

2. 🏥 Student Life Data (Priority: HIGH)

The Opportunity:

172 schools with detailed student life data: diversity, housing, campus culture, wellness programs, transportation, support systems
Zero code references - completely unused

Potential Features:

"What's it like?" section on school profiles
Lifestyle filters: "Find schools with on-campus housing" or "Strong wellness programs"
Culture insights: Display campus culture descriptions
Comparison tool: Compare student life across schools

Impact: MEDIUM-HIGH - Helps students beyond just stats, differentiates us from competitors

3. 📚 Curriculum Details (Priority: HIGH)

The Opportunity:

172 schools with curriculum data: research programs, clinical rotations, curricular highlights, global experiences, primary care programs
Zero code references

Potential Features:

Academic Deep Dive: Display curriculum highlights, research opportunities
Program Filters: "Schools with required research thesis" or "Global health programs"
Rotation Info: Show clinical rotation details
Academic System: Display grading systems, lecture attendance policies

Example Data:

School: UAB (ID 26341)
Curricular Highlights: "Four-year program emphasizes fundamentals underlying clinical practice..."
Research: "UAB medical students have options for engaging in research..."

Impact: MEDIUM-HIGH - Important for students choosing schools

4. 💰 Financial Data Exposure (Priority: MEDIUM)

The Opportunity:

278 rows of financial data (tuition, costs)
Only 1 code reference - barely used

Potential Features:

Cost Calculator: Estimate total cost of attendance
Financial Comparison: Compare tuition across schools
Financial Aid Intel: Surface aid information

Impact: MEDIUM - Important factor for applicants

5. 📊 MAT Data Breakdown (Priority: MEDIUM)

The Opportunity:

1,376 rows of matriculant data by type: MATRIC, MDPHD, APPLIED, INTERVIEWED, BACMD, EDP, DEFERRED, EAP
7 code references - minimal use

Data Available:

Applications by school (987,317 total across all schools!)
Interviews by school (98,650 total)
Matriculants by school (25,497 total)
MD/PhD acceptances (747 total)
Early decision/early assurance programs

Potential Features:

"What are my chances?" calculator: Show acceptance rates by program type
Interview likelihood: Based on historical interview numbers
Program insights: Display MD/PhD, BA/MD availability

Impact: MEDIUM - Competitive advantage for application planning

6. 🎯 State Pages for SEO (Priority: LOW-MEDIUM)

The Opportunity:

59 state pages exist with content
Zero code references - not published

Potential Features:

State landing pages: "Medical Schools in California"
SEO juice: Rank for "[state] medical schools" searches
Local insights: State-specific application tips

Impact: LOW-MEDIUM - SEO benefit, low implementation cost

7. 🔍 RAG Expansion (Priority: MEDIUM)

The Opportunity:

Only 10 schools in RAG system vs 173 in main database
9,662 content chunks suggest system works well

Potential Features:

Expand to all 173 schools for comprehensive semantic search
Integrate scraper analysis into RAG system (2,821 analyzed pages!)
Cross-school search: "Which schools emphasize primary care?"

Impact: MEDIUM - Better search/discovery for users

Cleanup Recommendations

High Priority

Drop extensions schema entirely (5 empty/orphaned tables)
- No code references
- Appears to be leftover from migration
- Action: Verify no dependencies, then DROP SCHEMA extensions CASCADE
Consolidate or drop empty med_schools tables (15 tables, 0 rows)
- Tables like med_school_acceptance, med_school_assessments, med_school_campuses, etc.
- Either populate them or remove them
- Action: Confirm these aren't used in imports/ETL, then drop
Drop unused RAG ETL tables (3 tables)
- etl_chunks, etl_runs, query_cache all empty
- Action: DROP if not part of future pipeline

Medium Priority

Archive or utilize scraper data
- Don't delete! This is valuable, just unused
- Action: Prioritize building features that use this data (see Opportunities #1)
Investigate interview_questions table
- Has 10 rows but no code references
- Action: Check if this should be used or is redundant
Clean up empty user_data tables
- draft_comments, draft_shares, ps_feedback, relationships all empty
- Action: Decide if these are planned features or can be dropped

Low Priority

Consolidate rag.schools vs med_schools.medical_school
- 641 code references to rag.schools vs 188 to medical_school
- Only 10 schools in RAG vs 173 in main DB
- Action: Investigate data model, consider consolidation or expansion

Schema Design Concerns

1. Data Duplication

rag.schools (10 rows) vs med_schools.medical_school (173 rows)
extensions.school_list vs user_data.school_list
Recommendation: Normalize or document the distinction

2. Naming Inconsistency

Most med_schools tables use medical_school_id FK
Some schemas use different conventions
Recommendation: Establish naming convention guide

3. Empty Table Proliferation

25 completely empty tables across the database
Suggests features planned but not built
Recommendation: Either populate or remove to reduce schema noise

4. No Timestamps on Key Tables

Some med_schools tables lack created_at/updated_at
Hard to track data freshness
Recommendation: Add audit timestamps to all data tables

5. Scraper Data Isolation

Valuable scraper data is completely isolated from main app
No foreign key relationships to med_schools tables (wait, there is via medical_school_id!)
Recommendation: Integrate scraper analysis into school profiles

Recommended Next Steps

Immediate (This Week)

✅ Review this audit with stakeholders
🧹 Clean up extensions schema (5 empty tables, no refs)
📋 Prioritize scraper data features - biggest opportunity

Short Term (Next 2 Weeks)

🏗️ Build "Student Life" tab on school profiles (172 schools of data ready)
🏗️ Build "Curriculum" tab on school profiles (172 schools of data ready)
🧹 Drop 15 empty med_schools tables after verification

Medium Term (Next Month)

🚀 Launch scraper content features (see Opportunities #1)
💰 Enhance financial comparison tools
📊 Build "What are my chances?" calculator using MAT data
🎯 Publish state pages for SEO

Long Term (Next Quarter)

🔍 Expand RAG to all 173 schools (currently only 10)
🔗 Integrate scraper analysis into RAG system
📐 Normalize schema (consolidate duplicates, establish conventions)
🏗️ Build synthesis-based school comparison features

Data Quality Notes

Positive

✅ Core tables (medical_school, activities, blog_posts) are well-maintained
✅ RAG system has good volume (9,662 chunks)
✅ Scraper data is fresh (June-July 2025)
✅ User features are actively used (essays, school lists, tracking)

Concerns

⚠️ Only 9 user profiles - early stage product
⚠️ MAT data last updated: unknown (check freshness)
⚠️ Only 10 schools in RAG vs 173 total
⚠️ No last_updated timestamps on many tables

Appendix: Code Reference Methodology

Search performed:

cd ~/clawd/projects/medschools_ai/apps/medschools-ai
grep -r "<table_name>" --include="*.ts" --include="*.tsx" --include="*.svelte" \
  --exclude-dir=node_modules --exclude-dir=.svelte-kit --exclude-dir=dist | wc -l

Files searched: 545 TypeScript/Svelte files

Note: Reference counts include:

Direct SQL queries
Supabase client calls (.from('table_name'))
Type definitions
Comments mentioning table names

Limitation: Does not capture:

Server-side API calls outside this codebase
Database triggers/functions using tables
Manual SQL queries run by developers

Summary Statistics

Metric	Value
Total Tables	104
Application Tables	70
Well-Utilized Tables	16 (23%)
Underutilized Tables	19 (27%)
Unused Tables	20 (29%)
System Tables	49 (47%)
Empty Tables	25
Tables with >1K rows	9
Largest Table	med_school_mcat_scores (4,376)
Most Referenced Table	rag.schools (641 refs)
Total Med School Records	173 schools
Total Content Chunks	9,662
Total Scraped Pages	2,821
Total User Profiles	9

End of Audit

For questions or clarifications, contact Atlas Reeves (backend-dev@widerwings.com)

Database Utilization Audit

MedSchools.ai Database Utilization Audit

Executive Summary

Key Findings

Schema-by-Schema Breakdown

1. med_schools Schema - Medical School Data

2. content Schema - Blog & Educational Content

3. user_data Schema - User Activity & Profiles

4. rag Schema - RAG/Vector Search

5. scraper Schema - Web Scraping & Analysis

6. ai Schema - AI/Chat Features

7. public Schema - Shared/Public Tables

8. extensions Schema - Extended User Data?

9. System Schemas (Auth, Storage, Realtime, Vault)

Utilization Matrix

By Category

By Schema (Application Tables Only)

Top Opportunities

1. 🚀 Scraper Data Goldmine (Priority: CRITICAL)

2. 🏥 Student Life Data (Priority: HIGH)

3. 📚 Curriculum Details (Priority: HIGH)

4. 💰 Financial Data Exposure (Priority: MEDIUM)

5. 📊 MAT Data Breakdown (Priority: MEDIUM)

6. 🎯 State Pages for SEO (Priority: LOW-MEDIUM)

7. 🔍 RAG Expansion (Priority: MEDIUM)

Cleanup Recommendations

High Priority

Medium Priority

Low Priority

Schema Design Concerns

1. Data Duplication

2. Naming Inconsistency

3. Empty Table Proliferation

4. No Timestamps on Key Tables

5. Scraper Data Isolation

Recommended Next Steps

Immediate (This Week)

Short Term (Next 2 Weeks)

Medium Term (Next Month)

Long Term (Next Quarter)

Data Quality Notes

Positive

Concerns

Appendix: Code Reference Methodology

Summary Statistics

1. `med_schools` Schema - Medical School Data

2. `content` Schema - Blog & Educational Content

3. `user_data` Schema - User Activity & Profiles

4. `rag` Schema - RAG/Vector Search

5. `scraper` Schema - Web Scraping & Analysis

6. `ai` Schema - AI/Chat Features

7. `public` Schema - Shared/Public Tables

8. `extensions` Schema - Extended User Data?