Web Scraping Use Cases & Infrastructure Assessment

P2 - Medium

Context: WiderWings

Use Cases

1. School Data Enrichment

Scrape official school pages: admissions, research, faculty, curriculum, scholarships, expenses, student life
Purpose: Enrich school pages and provide context for secondaries, interviews, and school selection
Note: Already have 2,821 scraped pages in MedSchools.ai scraper schema
Frequency: Periodic bulk scrapes (quarterly?)

2. Community Content Monitoring

Sources: Reddit (r/premed, r/MCAT), SDN forums
Content: News, applicant sentiment, results, interview experiences
Better approach: Use the Reddit API or the /last30days skill instead of raw scraping
Frequency: Ongoing/daily
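
A minimal sketch of the API-first approach for Reddit, assuming the public JSON listing endpoint (`/r/<subreddit>/new.json`) is acceptable for low-volume polling. The user-agent string, contact address, and function names are hypothetical, and a production job would more likely authenticate through Reddit's OAuth API and respect its rate limits. The parser only relies on the standard listing shape (`data.children[].data`).

```python
import json
import urllib.request

# Hypothetical user agent; Reddit expects a descriptive one.
USER_AGENT = "community-monitor/0.1 (hypothetical contact: ops@example.com)"


def listing_url(subreddit: str, limit: int = 25) -> str:
    """Build the URL for the newest posts of a subreddit (assumed public JSON endpoint)."""
    return f"https://www.reddit.com/r/{subreddit}/new.json?limit={limit}"


def parse_listing(payload: dict) -> list[dict]:
    """Flatten a Reddit listing payload into the few fields a monitor needs."""
    posts = []
    for child in payload.get("data", {}).get("children", []):
        data = child.get("data", {})
        posts.append(
            {
                "id": data.get("id"),
                "title": data.get("title"),
                "created_utc": data.get("created_utc"),
                "permalink": data.get("permalink"),
            }
        )
    return posts


def fetch_new_posts(subreddit: str, limit: int = 25) -> list[dict]:
    """Fetch and parse the newest posts; performs a network request."""
    request = urllib.request.Request(
        listing_url(subreddit, limit), headers={"User-Agent": USER_AGENT}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_listing(json.load(response))
```

A daily job could call `fetch_new_posts("premed")` and `fetch_new_posts("MCAT")`, then drop any post whose `id` has already been stored. SDN has no equivalent in this sketch and would still need a scraper.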

3. YouTube/Competitive Content Aggregation

Target: Hundreds of med school how-to videos and application review videos
Purpose: Build knowledge base
Method: YouTube API for metadata + Whisper for transcripts
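
The metadata-plus-transcript method could be sketched roughly as follows. This assumes the YouTube Data API v3 `search` endpoint for metadata and the open-source `openai-whisper` package for transcription. The function names and the default model size are illustrative choices, and obtaining the audio file for each video is out of scope here.

```python
from typing import Optional
from urllib.parse import urlencode

YOUTUBE_SEARCH_ENDPOINT = "https://www.googleapis.com/youtube/v3/search"


def build_search_url(
    query: str,
    api_key: str,
    max_results: int = 50,
    page_token: Optional[str] = None,
) -> str:
    """Build a YouTube Data API v3 search URL that returns video snippets."""
    params = {
        "part": "snippet",
        "type": "video",
        "q": query,
        "maxResults": max_results,  # the API caps this at 50 per page
        "key": api_key,
    }
    if page_token:
        params["pageToken"] = page_token  # continue from a previous page
    return f"{YOUTUBE_SEARCH_ENDPOINT}?{urlencode(params)}"


def transcribe_audio(audio_path: str, model_name: str = "base") -> str:
    """Transcribe a local audio file with Whisper.

    Assumes the openai-whisper package and ffmpeg are installed; the import
    is deferred so the URL helper works without them.
    """
    import whisper

    model = whisper.load_model(model_name)
    return model.transcribe(audio_path)["text"]
```

For hundreds of videos, paging with `pageToken` keeps each request within the per-page limit, and transcripts can be cached by video ID so a rerun does not transcribe the same file twice.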

4. Influencer Discovery

Find: Applicants, students, and coaches in the med school space
Platforms: YouTube, TikTok, Instagram, Twitter, SDN
Purpose: Database for marketplace outreach

Infrastructure Decision: Build vs Buy

Firecrawl Pricing

Free: 500 pages (one-time)
Hobby: $16/mo for 3K pages
Standard: $83/mo for 100K pages
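
To make the pricing comparison concrete, here is a small worked calculation using only the plan figures listed above. The `Plan` type and helper names are hypothetical. The one-time free tier is left out because it does not recur monthly.

```python
from typing import NamedTuple, Optional


class Plan(NamedTuple):
    name: str
    monthly_usd: float
    pages: int


# Figures copied from the pricing list in this note.
PLANS = [
    Plan("Hobby", 16, 3_000),
    Plan("Standard", 83, 100_000),
]


def cost_per_thousand(plan: Plan) -> float:
    """Effective price per 1,000 pages if the full quota is used."""
    return plan.monthly_usd / plan.pages * 1000


def cheapest_plan(pages_per_month: int) -> Optional[Plan]:
    """Cheapest plan whose quota covers the given monthly volume, if any."""
    fitting = [plan for plan in PLANS if plan.pages >= pages_per_month]
    return min(fitting, key=lambda plan: plan.monthly_usd) if fitting else None
```

At full utilization Hobby works out to about $5.33 per 1,000 pages and Standard to about $0.83. If a quarterly bulk run re-scraped roughly the 2,821 pages already collected (an assumption about future volume), `cheapest_plan(2821)` would select Hobby, since the run fits within its 3K quota.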

Recommendation

Start with Playwright on Mark's machine (free)

Most .edu sites have no anti-bot protection
Low-to-medium volume needs
Use Firecrawl only if blocked (captchas, IP bans)
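
A rough sketch of the Playwright-first flow with a simple "blocked" check that could trigger the Firecrawl fallback. The status codes and marker strings are heuristic assumptions rather than a vetted detection rule, and the function names are hypothetical. Playwright's sync API is imported lazily so that the heuristic can be used without a browser installed.

```python
from typing import Optional, Tuple

# Heuristic signals that a response is a block page (assumed, not exhaustive).
BLOCK_STATUS_CODES = {403, 429, 503}
BLOCK_MARKERS = ("captcha", "access denied", "unusual traffic")


def looks_blocked(status: Optional[int], html: str) -> bool:
    """Return True if the status code or page text suggests anti-bot blocking."""
    if status in BLOCK_STATUS_CODES:
        return True
    lowered = html.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)


def fetch_page(url: str, timeout_ms: int = 30_000) -> Tuple[Optional[int], str]:
    """Load a page in headless Chromium and return (HTTP status, rendered HTML).

    Requires the playwright package and an installed Chromium build.
    """
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            response = page.goto(
                url, wait_until="domcontentloaded", timeout=timeout_ms
            )
            status = response.status if response else None
            return status, page.content()
        finally:
            browser.close()
```

A scrape loop could call `fetch_page(url)`, pass the result to `looks_blocked`, and queue only the flagged URLs for Firecrawl, which keeps paid usage limited to the sites that actually block the local browser. The marker check can produce false positives on pages that merely mention a captcha, so flagged URLs are worth a spot check.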

Agent Assignment

Mark: Web scraping grunt work, long-running jobs
Uses spare compute on his laptop without blocking other work

Created: Mon, Feb 16, 2026, 1:10 PM by bob

Updated: Mon, Feb 16, 2026, 1:10 PM

Last accessed: Sun, Aug 2, 2026, 4:30 PM

ID: fb31f26f-7188-4741-9431-736b39f70820