🧠 All Projects
πŸ”¬

Database Utilization Audit

P2 - Medium
Research MedSchools.ai

MedSchools.ai Database Utilization Audit

Audit Date: February 11, 2026
Conducted by: Atlas Reeves (Backend Dev)
Database: Supabase (ohkdiwblocbrcfrhxeyg.supabase.co)
Codebase: ~/clawd/projects/medschools_ai/apps/medschools-ai/


Executive Summary

Key Findings

πŸ“Š Database Scale:

  • 104 total tables across 12 schemas
  • 70 application tables (excluding system/auth/storage)
  • ~1.1M+ total medical school records (applicant/matriculant data)
  • ~10K RAG content chunks for semantic search
  • ~2.8K scraped pages with analysis artifacts

🎯 Utilization Overview:

  • 23% Well-Utilized (16 tables) - Core features depend on them
  • 27% Underutilized (19 tables) - Has data but rarely queried
  • 29% Unused (20 tables) - Empty or no code references
  • 21% System/Internal (15 tables) - Auth, storage, migrations

πŸ’‘ Top Opportunities:

  1. Scraper data (2,821 analyzed pages) - Enormous SEO & content potential, barely used
  2. Student life data (172 schools) - Rich lifestyle info, never displayed
  3. Curriculum details (172 schools) - Research programs, rotations not shown
  4. MAT data breakdown - Application/interview numbers available but not exposed

⚠️ Cleanup Candidates:

  • 15 empty tables in med_schools schema (0 rows)
  • 5 empty tables in extensions schema (orphaned from migration?)
  • 3 empty ETL tables in RAG schema (unused pipeline)
  • 2 empty tables in user_data (draft_comments, ps_feedback)

Schema-by-Schema Breakdown

1. med_schools Schema - Medical School Data

Purpose: Core medical school information, statistics, and application data.

Table Rows Code Refs Status Notes
medical_school 173 188 🟒 Well-Utilized Primary school table, heavily used
med_school_mcat_scores 4,376 3 🟑 Underutilized MCAT data by year/type, minimal usage
med_school_gpa_scores 1,750 3 🟑 Underutilized GPA data, minimal usage
med_school_mat_data 1,376 7 🟑 Underutilized Applied/interviewed/matriculated breakdown
med_school_mcat 876 6 🟑 Underutilized MCAT summary by school
med_school_gpa 876 6 🟑 Underutilized GPA summary by school
secondary_essay_questions 801 5 🟒 Well-Utilized Used in secondaries feature
secondary_essay_answers 1,103 1 🟒 Well-Utilized User-generated secondary answers
med_school_financial 278 1 πŸ”΄ Underutilized Tuition/costs - only 1 reference!
med_school_application 278 2 🟑 Underutilized Application requirements
med_school_overview 173 0 πŸ”΄ Unused No code references
med_school_curriculum 172 0 πŸ”΄ Unused Rich curriculum data not displayed
med_school_stud_life 172 0 πŸ”΄ Unused Student life details not used
med_school_location 172 0 πŸ”΄ Unused Location details available but not queried
med_school_information 172 2 🟑 Underutilized Minimal usage
med_school_interview 12 4 🟒 Well-Utilized Interview format/questions
Empty Tables (0 rows): πŸ”΄ Cleanup
med_school_acceptance 0 0 πŸ”΄ Unused Empty
med_school_assessments 0 0 πŸ”΄ Unused Empty
med_school_campuses 0 0 πŸ”΄ Unused Empty
med_school_courseworks 0 0 πŸ”΄ Unused Empty
med_school_mat_demo 0 0 πŸ”΄ Unused Empty demographics table
med_school_misc 0 0 πŸ”΄ Unused Empty
med_school_programs 0 0 πŸ”΄ Unused Empty
med_school_race_ethnicity 0 0 πŸ”΄ Unused Empty
med_school_selection 0 0 πŸ”΄ Unused Empty
med_school_specialty 0 0 πŸ”΄ Unused Empty
med_school_staffs 0 0 πŸ”΄ Unused Empty
med_school_timeline 0 0 πŸ”΄ Unused Empty

Key Insights:

  • Main table is heavily used (188 references), but supporting tables are underutilized
  • 15 tables are completely empty - candidates for cleanup
  • Rich data exists but isn't exposed: curriculum, student life, detailed location info
  • MAT data has 1,376 rows showing applied/interviewed/matriculated breakdowns by program type (MATRIC, MDPHD, APPLIED, etc.) - could power "What are my chances?" features

2. content Schema - Blog & Educational Content

Purpose: CMS for blog posts, guides, rankings, FAQs, and glossary.

Table Rows Code Refs Status Notes
faqs 100 38 🟒 Well-Utilized FAQ feature heavily used
glossary_terms 115 6 🟒 Well-Utilized Medical terminology glossary
state_pages 59 0 πŸ”΄ Unused State-specific pages not referenced
rankings 10 45 🟒 Well-Utilized School rankings feature
blog_posts 12 24 🟒 Well-Utilized Blog content
authors 6 16 🟒 Well-Utilized Blog authors
guides 3 40 🟒 Well-Utilized Educational guides
blog_post_schools 0 0 πŸ”΄ Unused Empty junction table

Key Insights:

  • Content features are well-utilized overall
  • State pages (59 rows) exist but have zero code references - SEO opportunity?
  • blog_post_schools junction table is empty but defined

3. user_data Schema - User Activity & Profiles

Purpose: User profiles, school lists, activities, essays, application tracking.

Table Rows Code Refs Status Notes
activities 31 184 🟒 Well-Utilized Extracurricular tracking, heavily used
school_list 78 56 🟒 Well-Utilized User school lists
user_profiles 9 73 🟒 Well-Utilized Core user data
essay_drafts 47 43 🟒 Well-Utilized Essay writing feature
application_tracking 34 14 🟒 Well-Utilized Application status tracking
experience_ideas 4 5 🟒 Well-Utilized Activity suggestions
draft_comments 0 0 πŸ”΄ Unused Empty
draft_shares 0 0 πŸ”΄ Unused Empty
ps_feedback 0 0 πŸ”΄ Unused Empty
relationships 0 0 πŸ”΄ Unused Empty

Key Insights:

  • Core user features are highly utilized (activities, essays, school lists)
  • 4 empty tables suggest features planned but not implemented
  • Small user base (9 profiles) - early stage product

4. rag Schema - RAG/Vector Search

Purpose: Semantic search using embeddings, RAG pipeline.

Table Rows Code Refs Status Notes
content_chunks 9,662 4 🟒 Well-Utilized Vector embeddings for RAG
schools 10 641 🟒 Well-Utilized School data for RAG (note: 641 refs!)
data_sources 1 0 🟑 Underutilized Metadata for content sources
etl_chunks 0 0 πŸ”΄ Unused Empty ETL table
etl_runs 0 0 πŸ”΄ Unused Empty ETL table
query_cache 0 0 πŸ”΄ Unused Empty cache table

Key Insights:

  • RAG system is active with 9,662 content chunks
  • rag.schools has 641 code references (vs 188 for med_schools.medical_school) - interesting duplication
  • ETL pipeline tables are unused - suggests manual/different ingestion process
  • Only 10 schools in RAG system vs 173 in main DB - expansion opportunity

5. scraper Schema - Web Scraping & Analysis

Purpose: Automated scraping of school websites, AI analysis of pages.

Table Rows Code Refs Status Notes
scraped_pages 2,821 0 πŸ”΄ MAJOR UNUSED Scraped content, NO usage!
page_analysis_artifacts 2,821 1 πŸ”΄ MAJOR UNUSED AI-analyzed pages, 1 ref only
synthesis_artifacts 274 0 πŸ”΄ Unused Topic syntheses, no usage
link_queue 115 0 πŸ”΄ Unused Scraper queue
school_topic_profiles 0 0 πŸ”΄ Unused Empty
scraped_images 0 0 πŸ”΄ Unused Empty

Key Insights:

  • 🚨 HUGE OPPORTUNITY: 2,821 scraped & AI-analyzed pages sitting unused
  • Analysis includes: main_topic, key_points, content_summary, worth_referencing, essay_positioning_strategy
  • Scraped between June-July 2025 (relatively fresh)
  • 274 synthesis artifacts (topical summaries across schools) - could power comparison features
  • Covers 34 unique schools (per synthesis_artifacts)
  • Zero utilization in app despite massive investment in scraping/analysis

6. ai Schema - AI/Chat Features

Purpose: AI matching, prompt templates.

Table Rows Code Refs Status Notes
prompt_template 2 4 🟒 Well-Utilized AI prompt templates
match_results 0 1 🟑 Underutilized Empty but has code reference

Key Insights:

  • Small schema, prompt templates in use
  • match_results table exists but empty - planned feature?

7. public Schema - Shared/Public Tables

Purpose: App config, audit logs, contact forms, interview feature.

Table Rows Code Refs Status Notes
interview_sessions 21 5 🟒 Well-Utilized Interview practice sessions
interview_questions 10 0 πŸ”΄ Unused Questions available but not referenced
audit_log 6 4 🟒 Well-Utilized System audit trail
app_config 1 3 🟒 Well-Utilized App configuration
contact_messages 0 1 🟑 Underutilized Contact form (empty)

Key Insights:

  • Interview feature is active (21 sessions, 2 users)
  • interview_questions has data but no code references - needs investigation

8. extensions Schema - Extended User Data?

Purpose: Unclear - appears to be duplicate/orphaned user data.

Table Rows Code Refs Status Notes
school_list 1 0 πŸ”΄ Orphaned Duplicate of user_data.school_list?
user_profiles 0 0 πŸ”΄ Orphaned Duplicate of user_data.user_profiles?
gpa 0 0 πŸ”΄ Orphaned Empty
mcat_scores 0 0 πŸ”΄ Orphaned Empty
relationships 0 0 πŸ”΄ Orphaned Empty

Key Insights:

  • 🚨 CLEANUP CANDIDATE: Entire schema appears orphaned
  • No code references to extensions.* in codebase
  • Possible leftover from schema migration or abandoned feature

9. System Schemas (Auth, Storage, Realtime, Vault)

Status: βšͺ System/Internal - Managed by Supabase, not application code.

  • auth.* - 22 tables for authentication (Supabase managed)
  • storage.* - 9 tables for file storage (Supabase managed)
  • realtime.* - 3 tables for subscriptions (Supabase managed)
  • vault.* - 1 table for secrets (Supabase managed)
  • drizzle.* - 1 migration table
  • supabase_migrations.* - 1 migration table

Utilization Matrix

By Category

Category Count Examples
🟒 Well-Utilized 16 medical_school, activities, blog_posts, faqs, content_chunks
🟑 Underutilized 19 med_school_mcat_scores, med_school_curriculum, med_school_financial
πŸ”΄ Unused 20 scraped_pages, med_school_overview, state_pages, entire extensions schema
βšͺ System/Internal 49 auth.*, storage.*, realtime.*, vault.*

By Schema (Application Tables Only)

Schema Total Well-Utilized Underutilized Unused
med_schools 28 3 9 16
content 8 6 0 2
user_data 10 6 0 4
rag 6 2 1 3
scraper 6 0 0 6
ai 2 1 1 0
public 5 3 1 1
extensions 5 0 0 5

Top Opportunities

1. πŸš€ Scraper Data Goldmine (Priority: CRITICAL)

The Opportunity:

  • 2,821 AI-analyzed pages from 34 schools with zero utilization
  • Each page has: main_topic, key_points, full_content_summary, worth_referencing (very_high/high/medium/low), importance_for_application, essay_positioning_strategy
  • 274 synthesis artifacts - topical summaries across schools

Potential Features:

  • School Deep Dives: Display analyzed content on school profile pages
  • Essay Positioning: Use essay_positioning_strategy to help students frame their essays
  • Topic Comparisons: Use synthesis artifacts to compare schools on specific topics
  • SEO Content: Generate blog posts from analyzed pages
  • Smart Search: "Show me schools with strong community service programs" using analyzed topics
  • Application Intel: Surface importance_for_application insights

Example Data Found:

School: California Northstate (ID 25827)
Topic: "ClΓ­nica Tepati: Student-Run Clinic for Underserved Latinx Community"
Worth Referencing: very_high
Summary: "ClΓ­nica Tepati represents a significant opportunity for medical students..."

Impact: HIGH - Massive investment already made in scraping/analysis, just needs UI


2. πŸ₯ Student Life Data (Priority: HIGH)

The Opportunity:

  • 172 schools with detailed student life data: diversity, housing, campus culture, wellness programs, transportation, support systems
  • Zero code references - completely unused

Potential Features:

  • "What's it like?" section on school profiles
  • Lifestyle filters: "Find schools with on-campus housing" or "Strong wellness programs"
  • Culture insights: Display campus culture descriptions
  • Comparison tool: Compare student life across schools

Impact: MEDIUM-HIGH - Helps students beyond just stats, differentiates us from competitors


3. πŸ“š Curriculum Details (Priority: HIGH)

The Opportunity:

  • 172 schools with curriculum data: research programs, clinical rotations, curricular highlights, global experiences, primary care programs
  • Zero code references

Potential Features:

  • Academic Deep Dive: Display curriculum highlights, research opportunities
  • Program Filters: "Schools with required research thesis" or "Global health programs"
  • Rotation Info: Show clinical rotation details
  • Academic System: Display grading systems, lecture attendance policies

Example Data:

School: UAB (ID 26341)
Curricular Highlights: "Four-year program emphasizes fundamentals underlying clinical practice..."
Research: "UAB medical students have options for engaging in research..."

Impact: MEDIUM-HIGH - Important for students choosing schools


4. πŸ’° Financial Data Exposure (Priority: MEDIUM)

The Opportunity:

  • 278 rows of financial data (tuition, costs)
  • Only 1 code reference - barely used

Potential Features:

  • Cost Calculator: Estimate total cost of attendance
  • Financial Comparison: Compare tuition across schools
  • Financial Aid Intel: Surface aid information

Impact: MEDIUM - Important factor for applicants


5. πŸ“Š MAT Data Breakdown (Priority: MEDIUM)

The Opportunity:

  • 1,376 rows of matriculant data by type: MATRIC, MDPHD, APPLIED, INTERVIEWED, BACMD, EDP, DEFERRED, EAP
  • 7 code references - minimal use

Data Available:

  • Applications by school (987,317 total across all schools!)
  • Interviews by school (98,650 total)
  • Matriculants by school (25,497 total)
  • MD/PhD acceptances (747 total)
  • Early decision/early assurance programs

Potential Features:

  • "What are my chances?" calculator: Show acceptance rates by program type
  • Interview likelihood: Based on historical interview numbers
  • Program insights: Display MD/PhD, BA/MD availability

Impact: MEDIUM - Competitive advantage for application planning


6. 🎯 State Pages for SEO (Priority: LOW-MEDIUM)

The Opportunity:

  • 59 state pages exist with content
  • Zero code references - not published

Potential Features:

  • State landing pages: "Medical Schools in California"
  • SEO juice: Rank for "[state] medical schools" searches
  • Local insights: State-specific application tips

Impact: LOW-MEDIUM - SEO benefit, low implementation cost


7. πŸ” RAG Expansion (Priority: MEDIUM)

The Opportunity:

  • Only 10 schools in RAG system vs 173 in main database
  • 9,662 content chunks suggest system works well

Potential Features:

  • Expand to all 173 schools for comprehensive semantic search
  • Integrate scraper analysis into RAG system (2,821 analyzed pages!)
  • Cross-school search: "Which schools emphasize primary care?"

Impact: MEDIUM - Better search/discovery for users


Cleanup Recommendations

High Priority

  1. Drop extensions schema entirely (5 empty/orphaned tables)

    • No code references
    • Appears to be leftover from migration
    • Action: Verify no dependencies, then DROP SCHEMA extensions CASCADE
  2. Consolidate or drop empty med_schools tables (15 tables, 0 rows)

    • Tables like med_school_acceptance, med_school_assessments, med_school_campuses, etc.
    • Either populate them or remove them
    • Action: Confirm these aren't used in imports/ETL, then drop
  3. Drop unused RAG ETL tables (3 tables)

    • etl_chunks, etl_runs, query_cache all empty
    • Action: DROP if not part of future pipeline

Medium Priority

  1. Archive or utilize scraper data

    • Don't delete! This is valuable, just unused
    • Action: Prioritize building features that use this data (see Opportunities #1)
  2. Investigate interview_questions table

    • Has 10 rows but no code references
    • Action: Check if this should be used or is redundant
  3. Clean up empty user_data tables

    • draft_comments, draft_shares, ps_feedback, relationships all empty
    • Action: Decide if these are planned features or can be dropped

Low Priority

  1. Consolidate rag.schools vs med_schools.medical_school
    • 641 code references to rag.schools vs 188 to medical_school
    • Only 10 schools in RAG vs 173 in main DB
    • Action: Investigate data model, consider consolidation or expansion

Schema Design Concerns

1. Data Duplication

  • rag.schools (10 rows) vs med_schools.medical_school (173 rows)
  • extensions.school_list vs user_data.school_list
  • Recommendation: Normalize or document the distinction

2. Naming Inconsistency

  • Most med_schools tables use medical_school_id FK
  • Some schemas use different conventions
  • Recommendation: Establish naming convention guide

3. Empty Table Proliferation

  • 25 completely empty tables across the database
  • Suggests features planned but not built
  • Recommendation: Either populate or remove to reduce schema noise

4. No Timestamps on Key Tables

  • Some med_schools tables lack created_at/updated_at
  • Hard to track data freshness
  • Recommendation: Add audit timestamps to all data tables

5. Scraper Data Isolation

  • Valuable scraper data is completely isolated from main app
  • No foreign key relationships to med_schools tables (wait, there is via medical_school_id!)
  • Recommendation: Integrate scraper analysis into school profiles

Recommended Next Steps

Immediate (This Week)

  1. βœ… Review this audit with stakeholders
  2. 🧹 Clean up extensions schema (5 empty tables, no refs)
  3. πŸ“‹ Prioritize scraper data features - biggest opportunity

Short Term (Next 2 Weeks)

  1. πŸ—οΈ Build "Student Life" tab on school profiles (172 schools of data ready)
  2. πŸ—οΈ Build "Curriculum" tab on school profiles (172 schools of data ready)
  3. 🧹 Drop 15 empty med_schools tables after verification

Medium Term (Next Month)

  1. πŸš€ Launch scraper content features (see Opportunities #1)
  2. πŸ’° Enhance financial comparison tools
  3. πŸ“Š Build "What are my chances?" calculator using MAT data
  4. 🎯 Publish state pages for SEO

Long Term (Next Quarter)

  1. πŸ” Expand RAG to all 173 schools (currently only 10)
  2. πŸ”— Integrate scraper analysis into RAG system
  3. πŸ“ Normalize schema (consolidate duplicates, establish conventions)
  4. πŸ—οΈ Build synthesis-based school comparison features

Data Quality Notes

Positive

  • βœ… Core tables (medical_school, activities, blog_posts) are well-maintained
  • βœ… RAG system has good volume (9,662 chunks)
  • βœ… Scraper data is fresh (June-July 2025)
  • βœ… User features are actively used (essays, school lists, tracking)

Concerns

  • ⚠️ Only 9 user profiles - early stage product
  • ⚠️ MAT data last updated: unknown (check freshness)
  • ⚠️ Only 10 schools in RAG vs 173 total
  • ⚠️ No last_updated timestamps on many tables

Appendix: Code Reference Methodology

Search performed:

cd ~/clawd/projects/medschools_ai/apps/medschools-ai
grep -r "<table_name>" --include="*.ts" --include="*.tsx" --include="*.svelte" \
  --exclude-dir=node_modules --exclude-dir=.svelte-kit --exclude-dir=dist | wc -l

Files searched: 545 TypeScript/Svelte files

Note: Reference counts include:

  • Direct SQL queries
  • Supabase client calls (.from('table_name'))
  • Type definitions
  • Comments mentioning table names

Limitation: Does not capture:

  • Server-side API calls outside this codebase
  • Database triggers/functions using tables
  • Manual SQL queries run by developers

Summary Statistics

Metric Value
Total Tables 104
Application Tables 70
Well-Utilized Tables 16 (23%)
Underutilized Tables 19 (27%)
Unused Tables 20 (29%)
System Tables 49 (47%)
Empty Tables 25
Tables with >1K rows 9
Largest Table med_school_mcat_scores (4,376)
Most Referenced Table rag.schools (641 refs)
Total Med School Records 173 schools
Total Content Chunks 9,662
Total Scraped Pages 2,821
Total User Profiles 9

End of Audit

For questions or clarifications, contact Atlas Reeves (backend-dev@widerwings.com)

Created: Wed, Mar 4, 2026, 11:01 PM by bob

Updated: Wed, Mar 4, 2026, 11:01 PM

Last accessed: Sat, Mar 28, 2026, 3:02 AM

ID: 8999ba2f-1110-4c16-a505-5a143a585fb8