Web Scraping Use Cases & Infrastructure Assessment
P2 - Medium
Context: WiderWings
Use Cases
1. School Data Enrichment
- Scrape official school pages: admissions, research, faculty, curriculum, scholarships, expenses, student life
- Purpose: Enrich school pages; provide context for secondaries, interviews, and school selection
- Note: Already have 2,821 scraped pages in MedSchools.ai scraper schema
- Frequency: Periodic bulk scrapes (quarterly?)
2. Community Content Monitoring
- Sources: Reddit (r/premed, r/MCAT), SDN forums
- Content: News, applicant sentiment, results, interview experiences
- Preferred approach: Use the Reddit API or the /last30days skill rather than raw scraping
- Frequency: Ongoing/daily
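A minimal sketch of the daily Reddit pull, assuming the third-party PRAW wrapper around the Reddit API (not named in this note) and credentials stored in environment variables whose names are hypothetical. The `recent_posts` helper and the 24-hour window are illustrative choices, not a settled design.

```python
import os
import time


def recent_posts(posts, now=None, max_age_hours=24):
    """Keep only posts whose `created_utc` falls within the last `max_age_hours`."""
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    return [p for p in posts if p["created_utc"] >= cutoff]


def fetch_new(subreddit_name, limit=100):
    """Fetch the newest submissions from one subreddit as plain dicts."""
    # Lazy import so the filter above works without PRAW installed.
    import praw

    reddit = praw.Reddit(
        client_id=os.environ["REDDIT_CLIENT_ID"],          # hypothetical variable name
        client_secret=os.environ["REDDIT_CLIENT_SECRET"],  # hypothetical variable name
        user_agent="community-monitor/0.1",                # hypothetical user agent
    )
    return [
        {
            "id": s.id,
            "title": s.title,
            "created_utc": s.created_utc,
            "permalink": s.permalink,
        }
        for s in reddit.subreddit(subreddit_name).new(limit=limit)
    ]


if __name__ == "__main__":
    for name in ("premed", "MCAT"):
        for post in recent_posts(fetch_new(name)):
            print(name, post["title"])
```

SDN forums are not covered by this sketch; they would need a separate approach.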
3. YouTube/Competitive Content Aggregation
- Target: Hundreds of med school how-to and application review videos
- Purpose: Build knowledge base
- Method: YouTube API for metadata + Whisper for transcripts
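The metadata-plus-transcript method could be sketched roughly as follows. This assumes the YouTube Data API v3 `search` endpoint for metadata and the open-source `openai-whisper` package for transcription; the function names, the example query, and the default model size are illustrative. How the audio file is obtained is outside this sketch.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://www.googleapis.com/youtube/v3/search"


def build_search_url(query, api_key, max_results=25, page_token=None):
    """Build a YouTube Data API v3 search URL for videos matching `query`."""
    params = {
        "part": "snippet",
        "type": "video",
        "q": query,
        "maxResults": max_results,  # the API accepts at most 50 per request
        "key": api_key,
    }
    if page_token:
        # Token returned by a previous page of results, used for pagination.
        params["pageToken"] = page_token
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


def transcribe(audio_path, model_name="base"):
    """Transcribe a local audio file with Whisper and return the text."""
    # Lazy import: the package is heavy and needs ffmpeg available on the machine.
    import whisper

    model = whisper.load_model(model_name)
    return model.transcribe(audio_path)["text"]


if __name__ == "__main__":
    # "YOUR_API_KEY" is a placeholder, not a real credential.
    print(build_search_url("medical school application tips", "YOUR_API_KEY", max_results=10))
```

Fetching the built URL returns JSON whose `items` carry the video IDs and snippet metadata; with hundreds of videos, several paginated requests would be needed.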
4. Influencer Discovery
- Find: Applicants, students, and coaches in the med school space
- Platforms: YouTube, TikTok, Instagram, Twitter, SDN
- Purpose: Database for marketplace outreach
Infrastructure Decision: Build vs Buy
Firecrawl Pricing
- Free: 500 pages (one-time)
- Hobby: $16/mo for 3K pages
- Standard: $83/mo for 100K pages
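To compare the paid tiers, the per-page cost can be worked out from the prices listed above. This is a rough sketch that assumes each quota is a monthly allowance and that the one-time Free tier cannot cover recurring volume; the helper names and the example volume are illustrative, not Firecrawl's billing logic.

```python
# Figures copied from the pricing list above: (plan, USD per month, pages per month).
PLANS = [
    ("Hobby", 16, 3_000),
    ("Standard", 83, 100_000),
]


def cost_per_1k_pages(price_usd, pages):
    """USD per 1,000 pages, assuming the plan's full quota is used."""
    return price_usd / pages * 1_000


def cheapest_plan(pages_per_month):
    """Return the cheapest listed plan whose quota covers the volume, or None."""
    fitting = [plan for plan in PLANS if plan[2] >= pages_per_month]
    if not fitting:
        return None
    return min(fitting, key=lambda plan: plan[1])[0]


if __name__ == "__main__":
    for name, price, pages in PLANS:
        print(f"{name}: ${cost_per_1k_pages(price, pages):.2f} per 1K pages")
    # Hypothetical volume: re-scraping the existing 2,821 pages in one month.
    print(cheapest_plan(2_821))
```

At full utilization Hobby works out to about $5.33 per 1K pages and Standard to $0.83 per 1K pages, so Standard only pays off once monthly volume clearly exceeds the Hobby quota.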
Recommendation
Start with Playwright on Mark's machine (free)
- Most .edu sites have no anti-bot protection
- Low-to-medium volume needs
- Fall back to Firecrawl only if Playwright gets blocked (CAPTCHAs, IP bans)
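The "Playwright first, paid fallback only when blocked" rule could look roughly like this. It is a sketch under assumptions: the status codes and text markers in the block heuristic are hypothetical and would need tuning against pages that were actually blocked, and the `fallback` callable stands in for whatever Firecrawl integration is chosen (none is shown here).

```python
# Hypothetical block signals; a page that merely mentions "captcha" will false-positive.
BLOCK_STATUS = {403, 429, 503}
BLOCK_MARKERS = ("captcha", "access denied", "verify you are human")


def looks_blocked(status, html):
    """Heuristic: True when the response resembles an anti-bot block."""
    if status in BLOCK_STATUS:
        return True
    text = html.lower()
    return any(marker in text for marker in BLOCK_MARKERS)


def fetch_with_playwright(url, timeout_ms=30_000):
    """Load a page in headless Chromium and return (status, html)."""
    # Lazy import so the heuristic can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            response = page.goto(url, wait_until="domcontentloaded", timeout=timeout_ms)
            # goto() can return None (e.g. about:blank); treat that as status 0.
            status = response.status if response is not None else 0
            return status, page.content()
        finally:
            browser.close()


def scrape(url, fetch=fetch_with_playwright, fallback=None):
    """Try the free path first; call `fallback(url)` only if the page looks blocked."""
    status, html = fetch(url)
    if looks_blocked(status, html) and fallback is not None:
        return fallback(url)
    return html
```

Keeping `fetch` and `fallback` as parameters makes the routing logic testable without a browser, and lets the paid service be swapped in later without touching the scrape loop.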
Agent Assignment
- Mark: Web scraping grunt work, long-running jobs
- Uses spare compute on his laptop without blocking other work
Created: Mon, Feb 16, 2026, 1:10 PM by bob
Updated: Mon, Feb 16, 2026, 1:10 PM
Last accessed: Sat, Mar 7, 2026, 7:39 PM
ID: fb31f26f-7188-4741-9431-736b39f70820