Database Utilization Audit
P2 - MediumMedSchools.ai Database Utilization Audit
Audit Date: February 11, 2026
Conducted by: Atlas Reeves (Backend Dev)
Database: Supabase (ohkdiwblocbrcfrhxeyg.supabase.co)
Codebase: ~/clawd/projects/medschools_ai/apps/medschools-ai/
Executive Summary
Key Findings
π Database Scale:
- 104 total tables across 12 schemas
- 70 application tables (excluding system/auth/storage)
- ~1.1M+ total medical school records (applicant/matriculant data)
- ~10K RAG content chunks for semantic search
- ~2.8K scraped pages with analysis artifacts
π― Utilization Overview:
- 23% Well-Utilized (16 tables) - Core features depend on them
- 27% Underutilized (19 tables) - Has data but rarely queried
- 29% Unused (20 tables) - Empty or no code references
- 21% System/Internal (15 tables) - Auth, storage, migrations
π‘ Top Opportunities:
- Scraper data (2,821 analyzed pages) - Enormous SEO & content potential, barely used
- Student life data (172 schools) - Rich lifestyle info, never displayed
- Curriculum details (172 schools) - Research programs, rotations not shown
- MAT data breakdown - Application/interview numbers available but not exposed
β οΈ Cleanup Candidates:
- 15 empty tables in med_schools schema (0 rows)
- 5 empty tables in extensions schema (orphaned from migration?)
- 3 empty ETL tables in RAG schema (unused pipeline)
- 2 empty tables in user_data (draft_comments, ps_feedback)
Schema-by-Schema Breakdown
1. med_schools Schema - Medical School Data
Purpose: Core medical school information, statistics, and application data.
| Table | Rows | Code Refs | Status | Notes |
|---|---|---|---|---|
medical_school |
173 | 188 | π’ Well-Utilized | Primary school table, heavily used |
med_school_mcat_scores |
4,376 | 3 | π‘ Underutilized | MCAT data by year/type, minimal usage |
med_school_gpa_scores |
1,750 | 3 | π‘ Underutilized | GPA data, minimal usage |
med_school_mat_data |
1,376 | 7 | π‘ Underutilized | Applied/interviewed/matriculated breakdown |
med_school_mcat |
876 | 6 | π‘ Underutilized | MCAT summary by school |
med_school_gpa |
876 | 6 | π‘ Underutilized | GPA summary by school |
secondary_essay_questions |
801 | 5 | π’ Well-Utilized | Used in secondaries feature |
secondary_essay_answers |
1,103 | 1 | π’ Well-Utilized | User-generated secondary answers |
med_school_financial |
278 | 1 | π΄ Underutilized | Tuition/costs - only 1 reference! |
med_school_application |
278 | 2 | π‘ Underutilized | Application requirements |
med_school_overview |
173 | 0 | π΄ Unused | No code references |
med_school_curriculum |
172 | 0 | π΄ Unused | Rich curriculum data not displayed |
med_school_stud_life |
172 | 0 | π΄ Unused | Student life details not used |
med_school_location |
172 | 0 | π΄ Unused | Location details available but not queried |
med_school_information |
172 | 2 | π‘ Underutilized | Minimal usage |
med_school_interview |
12 | 4 | π’ Well-Utilized | Interview format/questions |
| Empty Tables (0 rows): | π΄ Cleanup | |||
med_school_acceptance |
0 | 0 | π΄ Unused | Empty |
med_school_assessments |
0 | 0 | π΄ Unused | Empty |
med_school_campuses |
0 | 0 | π΄ Unused | Empty |
med_school_courseworks |
0 | 0 | π΄ Unused | Empty |
med_school_mat_demo |
0 | 0 | π΄ Unused | Empty demographics table |
med_school_misc |
0 | 0 | π΄ Unused | Empty |
med_school_programs |
0 | 0 | π΄ Unused | Empty |
med_school_race_ethnicity |
0 | 0 | π΄ Unused | Empty |
med_school_selection |
0 | 0 | π΄ Unused | Empty |
med_school_specialty |
0 | 0 | π΄ Unused | Empty |
med_school_staffs |
0 | 0 | π΄ Unused | Empty |
med_school_timeline |
0 | 0 | π΄ Unused | Empty |
Key Insights:
- Main table is heavily used (188 references), but supporting tables are underutilized
- 15 tables are completely empty - candidates for cleanup
- Rich data exists but isn't exposed: curriculum, student life, detailed location info
- MAT data has 1,376 rows showing applied/interviewed/matriculated breakdowns by program type (MATRIC, MDPHD, APPLIED, etc.) - could power "What are my chances?" features
2. content Schema - Blog & Educational Content
Purpose: CMS for blog posts, guides, rankings, FAQs, and glossary.
| Table | Rows | Code Refs | Status | Notes |
|---|---|---|---|---|
faqs |
100 | 38 | π’ Well-Utilized | FAQ feature heavily used |
glossary_terms |
115 | 6 | π’ Well-Utilized | Medical terminology glossary |
state_pages |
59 | 0 | π΄ Unused | State-specific pages not referenced |
rankings |
10 | 45 | π’ Well-Utilized | School rankings feature |
blog_posts |
12 | 24 | π’ Well-Utilized | Blog content |
authors |
6 | 16 | π’ Well-Utilized | Blog authors |
guides |
3 | 40 | π’ Well-Utilized | Educational guides |
blog_post_schools |
0 | 0 | π΄ Unused | Empty junction table |
Key Insights:
- Content features are well-utilized overall
- State pages (59 rows) exist but have zero code references - SEO opportunity?
blog_post_schoolsjunction table is empty but defined
3. user_data Schema - User Activity & Profiles
Purpose: User profiles, school lists, activities, essays, application tracking.
| Table | Rows | Code Refs | Status | Notes |
|---|---|---|---|---|
activities |
31 | 184 | π’ Well-Utilized | Extracurricular tracking, heavily used |
school_list |
78 | 56 | π’ Well-Utilized | User school lists |
user_profiles |
9 | 73 | π’ Well-Utilized | Core user data |
essay_drafts |
47 | 43 | π’ Well-Utilized | Essay writing feature |
application_tracking |
34 | 14 | π’ Well-Utilized | Application status tracking |
experience_ideas |
4 | 5 | π’ Well-Utilized | Activity suggestions |
draft_comments |
0 | 0 | π΄ Unused | Empty |
draft_shares |
0 | 0 | π΄ Unused | Empty |
ps_feedback |
0 | 0 | π΄ Unused | Empty |
relationships |
0 | 0 | π΄ Unused | Empty |
Key Insights:
- Core user features are highly utilized (activities, essays, school lists)
- 4 empty tables suggest features planned but not implemented
- Small user base (9 profiles) - early stage product
4. rag Schema - RAG/Vector Search
Purpose: Semantic search using embeddings, RAG pipeline.
| Table | Rows | Code Refs | Status | Notes |
|---|---|---|---|---|
content_chunks |
9,662 | 4 | π’ Well-Utilized | Vector embeddings for RAG |
schools |
10 | 641 | π’ Well-Utilized | School data for RAG (note: 641 refs!) |
data_sources |
1 | 0 | π‘ Underutilized | Metadata for content sources |
etl_chunks |
0 | 0 | π΄ Unused | Empty ETL table |
etl_runs |
0 | 0 | π΄ Unused | Empty ETL table |
query_cache |
0 | 0 | π΄ Unused | Empty cache table |
Key Insights:
- RAG system is active with 9,662 content chunks
rag.schoolshas 641 code references (vs 188 formed_schools.medical_school) - interesting duplication- ETL pipeline tables are unused - suggests manual/different ingestion process
- Only 10 schools in RAG system vs 173 in main DB - expansion opportunity
5. scraper Schema - Web Scraping & Analysis
Purpose: Automated scraping of school websites, AI analysis of pages.
| Table | Rows | Code Refs | Status | Notes |
|---|---|---|---|---|
scraped_pages |
2,821 | 0 | π΄ MAJOR UNUSED | Scraped content, NO usage! |
page_analysis_artifacts |
2,821 | 1 | π΄ MAJOR UNUSED | AI-analyzed pages, 1 ref only |
synthesis_artifacts |
274 | 0 | π΄ Unused | Topic syntheses, no usage |
link_queue |
115 | 0 | π΄ Unused | Scraper queue |
school_topic_profiles |
0 | 0 | π΄ Unused | Empty |
scraped_images |
0 | 0 | π΄ Unused | Empty |
Key Insights:
- π¨ HUGE OPPORTUNITY: 2,821 scraped & AI-analyzed pages sitting unused
- Analysis includes: main_topic, key_points, content_summary, worth_referencing, essay_positioning_strategy
- Scraped between June-July 2025 (relatively fresh)
- 274 synthesis artifacts (topical summaries across schools) - could power comparison features
- Covers 34 unique schools (per synthesis_artifacts)
- Zero utilization in app despite massive investment in scraping/analysis
6. ai Schema - AI/Chat Features
Purpose: AI matching, prompt templates.
| Table | Rows | Code Refs | Status | Notes |
|---|---|---|---|---|
prompt_template |
2 | 4 | π’ Well-Utilized | AI prompt templates |
match_results |
0 | 1 | π‘ Underutilized | Empty but has code reference |
Key Insights:
- Small schema, prompt templates in use
match_resultstable exists but empty - planned feature?
7. public Schema - Shared/Public Tables
Purpose: App config, audit logs, contact forms, interview feature.
| Table | Rows | Code Refs | Status | Notes |
|---|---|---|---|---|
interview_sessions |
21 | 5 | π’ Well-Utilized | Interview practice sessions |
interview_questions |
10 | 0 | π΄ Unused | Questions available but not referenced |
audit_log |
6 | 4 | π’ Well-Utilized | System audit trail |
app_config |
1 | 3 | π’ Well-Utilized | App configuration |
contact_messages |
0 | 1 | π‘ Underutilized | Contact form (empty) |
Key Insights:
- Interview feature is active (21 sessions, 2 users)
interview_questionshas data but no code references - needs investigation
8. extensions Schema - Extended User Data?
Purpose: Unclear - appears to be duplicate/orphaned user data.
| Table | Rows | Code Refs | Status | Notes |
|---|---|---|---|---|
school_list |
1 | 0 | π΄ Orphaned | Duplicate of user_data.school_list? |
user_profiles |
0 | 0 | π΄ Orphaned | Duplicate of user_data.user_profiles? |
gpa |
0 | 0 | π΄ Orphaned | Empty |
mcat_scores |
0 | 0 | π΄ Orphaned | Empty |
relationships |
0 | 0 | π΄ Orphaned | Empty |
Key Insights:
- π¨ CLEANUP CANDIDATE: Entire schema appears orphaned
- No code references to
extensions.*in codebase - Possible leftover from schema migration or abandoned feature
9. System Schemas (Auth, Storage, Realtime, Vault)
Status: βͺ System/Internal - Managed by Supabase, not application code.
auth.*- 22 tables for authentication (Supabase managed)storage.*- 9 tables for file storage (Supabase managed)realtime.*- 3 tables for subscriptions (Supabase managed)vault.*- 1 table for secrets (Supabase managed)drizzle.*- 1 migration tablesupabase_migrations.*- 1 migration table
Utilization Matrix
By Category
| Category | Count | Examples |
|---|---|---|
| π’ Well-Utilized | 16 | medical_school, activities, blog_posts, faqs, content_chunks |
| π‘ Underutilized | 19 | med_school_mcat_scores, med_school_curriculum, med_school_financial |
| π΄ Unused | 20 | scraped_pages, med_school_overview, state_pages, entire extensions schema |
| βͺ System/Internal | 49 | auth.*, storage.*, realtime.*, vault.* |
By Schema (Application Tables Only)
| Schema | Total | Well-Utilized | Underutilized | Unused |
|---|---|---|---|---|
med_schools |
28 | 3 | 9 | 16 |
content |
8 | 6 | 0 | 2 |
user_data |
10 | 6 | 0 | 4 |
rag |
6 | 2 | 1 | 3 |
scraper |
6 | 0 | 0 | 6 |
ai |
2 | 1 | 1 | 0 |
public |
5 | 3 | 1 | 1 |
extensions |
5 | 0 | 0 | 5 |
Top Opportunities
1. π Scraper Data Goldmine (Priority: CRITICAL)
The Opportunity:
- 2,821 AI-analyzed pages from 34 schools with zero utilization
- Each page has: main_topic, key_points, full_content_summary, worth_referencing (very_high/high/medium/low), importance_for_application, essay_positioning_strategy
- 274 synthesis artifacts - topical summaries across schools
Potential Features:
- School Deep Dives: Display analyzed content on school profile pages
- Essay Positioning: Use
essay_positioning_strategyto help students frame their essays - Topic Comparisons: Use synthesis artifacts to compare schools on specific topics
- SEO Content: Generate blog posts from analyzed pages
- Smart Search: "Show me schools with strong community service programs" using analyzed topics
- Application Intel: Surface
importance_for_applicationinsights
Example Data Found:
School: California Northstate (ID 25827)
Topic: "ClΓnica Tepati: Student-Run Clinic for Underserved Latinx Community"
Worth Referencing: very_high
Summary: "ClΓnica Tepati represents a significant opportunity for medical students..."
Impact: HIGH - Massive investment already made in scraping/analysis, just needs UI
2. π₯ Student Life Data (Priority: HIGH)
The Opportunity:
- 172 schools with detailed student life data: diversity, housing, campus culture, wellness programs, transportation, support systems
- Zero code references - completely unused
Potential Features:
- "What's it like?" section on school profiles
- Lifestyle filters: "Find schools with on-campus housing" or "Strong wellness programs"
- Culture insights: Display campus culture descriptions
- Comparison tool: Compare student life across schools
Impact: MEDIUM-HIGH - Helps students beyond just stats, differentiates us from competitors
3. π Curriculum Details (Priority: HIGH)
The Opportunity:
- 172 schools with curriculum data: research programs, clinical rotations, curricular highlights, global experiences, primary care programs
- Zero code references
Potential Features:
- Academic Deep Dive: Display curriculum highlights, research opportunities
- Program Filters: "Schools with required research thesis" or "Global health programs"
- Rotation Info: Show clinical rotation details
- Academic System: Display grading systems, lecture attendance policies
Example Data:
School: UAB (ID 26341)
Curricular Highlights: "Four-year program emphasizes fundamentals underlying clinical practice..."
Research: "UAB medical students have options for engaging in research..."
Impact: MEDIUM-HIGH - Important for students choosing schools
4. π° Financial Data Exposure (Priority: MEDIUM)
The Opportunity:
- 278 rows of financial data (tuition, costs)
- Only 1 code reference - barely used
Potential Features:
- Cost Calculator: Estimate total cost of attendance
- Financial Comparison: Compare tuition across schools
- Financial Aid Intel: Surface aid information
Impact: MEDIUM - Important factor for applicants
5. π MAT Data Breakdown (Priority: MEDIUM)
The Opportunity:
- 1,376 rows of matriculant data by type: MATRIC, MDPHD, APPLIED, INTERVIEWED, BACMD, EDP, DEFERRED, EAP
- 7 code references - minimal use
Data Available:
- Applications by school (987,317 total across all schools!)
- Interviews by school (98,650 total)
- Matriculants by school (25,497 total)
- MD/PhD acceptances (747 total)
- Early decision/early assurance programs
Potential Features:
- "What are my chances?" calculator: Show acceptance rates by program type
- Interview likelihood: Based on historical interview numbers
- Program insights: Display MD/PhD, BA/MD availability
Impact: MEDIUM - Competitive advantage for application planning
6. π― State Pages for SEO (Priority: LOW-MEDIUM)
The Opportunity:
- 59 state pages exist with content
- Zero code references - not published
Potential Features:
- State landing pages: "Medical Schools in California"
- SEO juice: Rank for "[state] medical schools" searches
- Local insights: State-specific application tips
Impact: LOW-MEDIUM - SEO benefit, low implementation cost
7. π RAG Expansion (Priority: MEDIUM)
The Opportunity:
- Only 10 schools in RAG system vs 173 in main database
- 9,662 content chunks suggest system works well
Potential Features:
- Expand to all 173 schools for comprehensive semantic search
- Integrate scraper analysis into RAG system (2,821 analyzed pages!)
- Cross-school search: "Which schools emphasize primary care?"
Impact: MEDIUM - Better search/discovery for users
Cleanup Recommendations
High Priority
Drop
extensionsschema entirely (5 empty/orphaned tables)- No code references
- Appears to be leftover from migration
- Action: Verify no dependencies, then DROP SCHEMA extensions CASCADE
Consolidate or drop empty med_schools tables (15 tables, 0 rows)
- Tables like
med_school_acceptance,med_school_assessments,med_school_campuses, etc. - Either populate them or remove them
- Action: Confirm these aren't used in imports/ETL, then drop
- Tables like
Drop unused RAG ETL tables (3 tables)
etl_chunks,etl_runs,query_cacheall empty- Action: DROP if not part of future pipeline
Medium Priority
Archive or utilize scraper data
- Don't delete! This is valuable, just unused
- Action: Prioritize building features that use this data (see Opportunities #1)
Investigate
interview_questionstable- Has 10 rows but no code references
- Action: Check if this should be used or is redundant
Clean up empty user_data tables
draft_comments,draft_shares,ps_feedback,relationshipsall empty- Action: Decide if these are planned features or can be dropped
Low Priority
- Consolidate rag.schools vs med_schools.medical_school
- 641 code references to
rag.schoolsvs 188 tomedical_school - Only 10 schools in RAG vs 173 in main DB
- Action: Investigate data model, consider consolidation or expansion
- 641 code references to
Schema Design Concerns
1. Data Duplication
rag.schools(10 rows) vsmed_schools.medical_school(173 rows)extensions.school_listvsuser_data.school_list- Recommendation: Normalize or document the distinction
2. Naming Inconsistency
- Most med_schools tables use
medical_school_idFK - Some schemas use different conventions
- Recommendation: Establish naming convention guide
3. Empty Table Proliferation
- 25 completely empty tables across the database
- Suggests features planned but not built
- Recommendation: Either populate or remove to reduce schema noise
4. No Timestamps on Key Tables
- Some med_schools tables lack
created_at/updated_at - Hard to track data freshness
- Recommendation: Add audit timestamps to all data tables
5. Scraper Data Isolation
- Valuable scraper data is completely isolated from main app
- No foreign key relationships to med_schools tables (wait, there is via medical_school_id!)
- Recommendation: Integrate scraper analysis into school profiles
Recommended Next Steps
Immediate (This Week)
- β Review this audit with stakeholders
- π§Ή Clean up
extensionsschema (5 empty tables, no refs) - π Prioritize scraper data features - biggest opportunity
Short Term (Next 2 Weeks)
- ποΈ Build "Student Life" tab on school profiles (172 schools of data ready)
- ποΈ Build "Curriculum" tab on school profiles (172 schools of data ready)
- π§Ή Drop 15 empty med_schools tables after verification
Medium Term (Next Month)
- π Launch scraper content features (see Opportunities #1)
- π° Enhance financial comparison tools
- π Build "What are my chances?" calculator using MAT data
- π― Publish state pages for SEO
Long Term (Next Quarter)
- π Expand RAG to all 173 schools (currently only 10)
- π Integrate scraper analysis into RAG system
- π Normalize schema (consolidate duplicates, establish conventions)
- ποΈ Build synthesis-based school comparison features
Data Quality Notes
Positive
- β Core tables (medical_school, activities, blog_posts) are well-maintained
- β RAG system has good volume (9,662 chunks)
- β Scraper data is fresh (June-July 2025)
- β User features are actively used (essays, school lists, tracking)
Concerns
- β οΈ Only 9 user profiles - early stage product
- β οΈ MAT data last updated: unknown (check freshness)
- β οΈ Only 10 schools in RAG vs 173 total
- β οΈ No last_updated timestamps on many tables
Appendix: Code Reference Methodology
Search performed:
cd ~/clawd/projects/medschools_ai/apps/medschools-ai
grep -r "<table_name>" --include="*.ts" --include="*.tsx" --include="*.svelte" \
--exclude-dir=node_modules --exclude-dir=.svelte-kit --exclude-dir=dist | wc -l
Files searched: 545 TypeScript/Svelte files
Note: Reference counts include:
- Direct SQL queries
- Supabase client calls (
.from('table_name')) - Type definitions
- Comments mentioning table names
Limitation: Does not capture:
- Server-side API calls outside this codebase
- Database triggers/functions using tables
- Manual SQL queries run by developers
Summary Statistics
| Metric | Value |
|---|---|
| Total Tables | 104 |
| Application Tables | 70 |
| Well-Utilized Tables | 16 (23%) |
| Underutilized Tables | 19 (27%) |
| Unused Tables | 20 (29%) |
| System Tables | 49 (47%) |
| Empty Tables | 25 |
| Tables with >1K rows | 9 |
| Largest Table | med_school_mcat_scores (4,376) |
| Most Referenced Table | rag.schools (641 refs) |
| Total Med School Records | 173 schools |
| Total Content Chunks | 9,662 |
| Total Scraped Pages | 2,821 |
| Total User Profiles | 9 |
End of Audit
For questions or clarifications, contact Atlas Reeves (backend-dev@widerwings.com)
Created: Wed, Mar 4, 2026, 11:01 PM by bob
Updated: Wed, Mar 4, 2026, 11:01 PM
Last accessed: Sat, Mar 28, 2026, 3:02 AM
ID: 8999ba2f-1110-4c16-a505-5a143a585fb8