
Web Scraping Use Cases & Infrastructure Assessment

Priority: P2 - Medium
Context: WiderWings

Use Cases

1. School Data Enrichment

  • Scrape official school pages: admissions, research, faculty, curriculum, scholarships, expenses, student life
  • Purpose: Enrich school pages; provide context for secondaries, interviews, and school selection
  • Note: Already have 2,821 scraped pages in MedSchools.ai scraper schema
  • Frequency: Periodic bulk scrapes (quarterly?)

2. Community Content Monitoring

  • Sources: Reddit (r/premed, r/MCAT), SDN forums
  • Content: News, applicant sentiment, results, interview experiences
  • Preferred approach: Reddit API or the /last30days skill rather than raw scraping
  • Frequency: Ongoing/daily
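
A minimal sketch of the daily filtering step, assuming posts arrive as a standard Reddit listing payload (data.children[].data with title, created_utc, permalink). Fetching and OAuth are left out; the function name, the 30-day default, and the sample payload are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def recent_posts(listing, now, days=30):
    """Return (title, permalink) pairs for posts newer than `days` days."""
    cutoff = (now - timedelta(days=days)).timestamp()
    posts = []
    for child in listing.get("data", {}).get("children", []):
        post = child.get("data", {})
        # created_utc is a Unix timestamp in seconds.
        if post.get("created_utc", 0) >= cutoff:
            posts.append((post.get("title", ""), post.get("permalink", "")))
    return posts


# Hypothetical sample payload, shaped like a Reddit listing response.
now = datetime(2026, 2, 16, tzinfo=timezone.utc)
sample = {
    "kind": "Listing",
    "data": {
        "children": [
            {"kind": "t3", "data": {
                "title": "Interview invite thread",
                "permalink": "/r/premed/comments/abc/",
                "created_utc": (now - timedelta(days=2)).timestamp(),
            }},
            {"kind": "t3", "data": {
                "title": "Old results post",
                "permalink": "/r/premed/comments/xyz/",
                "created_utc": (now - timedelta(days=90)).timestamp(),
            }},
        ]
    },
}

fresh = recent_posts(sample, now)
```

Passing `now` in explicitly keeps the filter deterministic, which makes a daily job easier to test and re-run.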

3. YouTube/Competitive Content Aggregation

  • Target: Hundreds of med school how-to and application review videos
  • Purpose: Build knowledge base
  • Method: YouTube API for metadata + Whisper for transcripts
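
A rough sketch of the metadata + transcript method, assuming the YouTube Data API v3 search endpoint and the openai-whisper package. The helper names, the "base" model choice, and the sample response are hypothetical; obtaining the audio file is out of scope here.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://www.googleapis.com/youtube/v3/search"


def build_search_url(query, api_key, max_results=50):
    """Build a YouTube Data API v3 search URL for videos matching `query`."""
    params = {
        "part": "snippet",
        "q": query,
        "type": "video",
        "maxResults": max_results,  # the search endpoint caps this at 50
        "key": api_key,
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


def extract_videos(response):
    """Pull video ID, title, and channel out of a search response dict."""
    videos = []
    for item in response.get("items", []):
        video_id = item.get("id", {}).get("videoId")
        if video_id:
            snippet = item.get("snippet", {})
            videos.append({
                "video_id": video_id,
                "title": snippet.get("title", ""),
                "channel": snippet.get("channelTitle", ""),
            })
    return videos


def transcribe(audio_path, model_name="base"):
    """Transcribe a local audio file with Whisper (assumes openai-whisper is installed)."""
    import whisper  # imported lazily so the metadata helpers work without it

    model = whisper.load_model(model_name)
    return model.transcribe(audio_path)["text"]


# Hypothetical sample response, shaped like a search.list result.
sample_response = {
    "items": [
        {"id": {"kind": "youtube#video", "videoId": "abc123"},
         "snippet": {"title": "How I got into med school", "channelTitle": "Example Channel"}},
    ]
}
url = build_search_url("med school interview", "API_KEY_PLACEHOLDER")
videos = extract_videos(sample_response)
```

Metadata and transcription are kept separate so the cheap API pass can run first and Whisper only runs on videos worth keeping.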

4. Influencer Discovery

  • Find: Applicants, students, and coaches in the med school space
  • Platforms: YouTube, TikTok, Instagram, Twitter, SDN
  • Purpose: Database for marketplace outreach

Infrastructure Decision: Build vs Buy

Firecrawl Pricing

  • Free: 500 pages (one-time)
  • Hobby: $16/mo for 3K pages
  • Standard: $83/mo for 100K pages
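
For a quick comparison, the tiers above can be turned into per-page costs. This is a sketch only: it assumes the quotas are monthly and that a bulk scrape lands within one billing month. The 2,821 figure is the existing page count from the School Data Enrichment use case; the function names are hypothetical.

```python
# Paid tiers from the list above: name -> (USD per month, pages included).
PLANS = {
    "Hobby": (16, 3_000),
    "Standard": (83, 100_000),
}


def cost_per_page(price, pages):
    """Effective USD per page if the whole quota is used."""
    return price / pages


def cheapest_plan(pages_per_month):
    """Cheapest paid tier whose quota covers the volume, or None if none does."""
    candidates = [
        (price, name)
        for name, (price, pages) in PLANS.items()
        if pages >= pages_per_month
    ]
    return min(candidates)[1] if candidates else None


hobby_rate = cost_per_page(*PLANS["Hobby"])        # about $0.0053 per page
standard_rate = cost_per_page(*PLANS["Standard"])  # about $0.0008 per page
rescrape_plan = cheapest_plan(2_821)               # re-scrape of the existing pages
```

Under these assumptions, a one-month re-scrape of the existing 2,821 pages fits within the Hobby quota.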

Recommendation

Start with Playwright on Mark's machine (free)

  • Most .edu sites have no anti-bot protection
  • Low-to-medium volume needs
  • Use Firecrawl only if blocked (captchas, IP bans)
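
A minimal sketch of the "Playwright first, escalate only if blocked" flow, assuming Playwright for Python with Chromium installed. The block heuristics (status codes and marker strings) and the function names are hypothetical and would need tuning against real sites; the Firecrawl fallback itself is not shown.

```python
BLOCK_STATUSES = {403, 429}
BLOCK_MARKERS = ("captcha", "access denied", "unusual traffic")


def looks_blocked(status, html):
    """Heuristic: does this response look like an anti-bot block page?"""
    if status in BLOCK_STATUSES:
        return True
    lowered = html.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)


def fetch_page(url, timeout_ms=30_000):
    """Fetch one page with headless Chromium; returns (status, html, blocked)."""
    # Imported lazily so looks_blocked can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            response = page.goto(url, wait_until="domcontentloaded", timeout=timeout_ms)
            status = response.status if response else None
            html = page.content()
        finally:
            browser.close()
    return status, html, looks_blocked(status, html)


# Pure-function checks on hypothetical responses.
blocked_by_status = looks_blocked(403, "")
blocked_by_captcha = looks_blocked(200, "<html>Please complete the CAPTCHA</html>")
normal_page = looks_blocked(200, "<h1>Admissions</h1>")
```

URLs where `blocked` comes back True can be queued for a Firecrawl pass, so paid credits are spent only on pages that actually need them. The marker check is deliberately crude and may flag pages that merely embed a captcha widget on a form.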

Agent Assignment

  • Mark: Web scraping grunt work, long-running jobs
  • Uses spare compute on his laptop without blocking other work

Created: Mon, Feb 16, 2026, 1:10 PM by bob

Updated: Mon, Feb 16, 2026, 1:10 PM

Last accessed: Sat, Mar 7, 2026, 7:39 PM

ID: fb31f26f-7188-4741-9431-736b39f70820