MassTimes Church Directory Scraper

Overview

A Python scraper that extracts church information and mass times from masstimes.org. Because the site has no site-wide search, the scraper iterates through US ZIP codes and queries each one instead, runs those queries in parallel for speed, and stores the results as JSON files synced to Google Drive. Outcome: a complete, structured dataset of US churches and mass times that powers the client's downstream directory product.
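The ZIP-code workaround and the parallel fan-out can be sketched roughly as follows. This is a minimal illustration, not the project's code: `scrape_zip_codes` and the `fetch_churches` callable are hypothetical names. The callable stands in for whatever request-and-parse step (Requests plus BeautifulSoup) queries masstimes.org for one ZIP code, and the `"id"` de-duplication key is an assumption about the record shape. A thread pool is one reasonable way to parallelize network-bound requests; the source does not say which mechanism the project used.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable


def scrape_zip_codes(
    zip_codes: Iterable[str],
    fetch_churches: Callable[[str], list[dict]],
    max_workers: int = 8,
) -> tuple[dict[str, dict], list[str]]:
    """Run one location query per ZIP code in parallel and merge the results.

    `fetch_churches` is a hypothetical callable returning the churches listed
    for one ZIP code. Neighboring ZIP codes return overlapping results, so
    churches are de-duplicated on an assumed unique "id" field.
    """
    churches: dict[str, dict] = {}
    failed: list[str] = []
    # Threads suit this I/O-bound work: each worker mostly waits on the network.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch_churches, z): z for z in zip_codes}
        # Results are merged here, in the main thread, so no lock is needed.
        for future in as_completed(futures):
            try:
                results = future.result()
            except Exception:  # record the ZIP code so it can be retried later
                failed.append(futures[future])
                continue
            for church in results:
                churches.setdefault(church["id"], church)  # keep first copy seen
    return churches, sorted(failed)
```

Passing a stub fetcher such as `lambda z: [{"id": z}]` lets the merge and failure handling be exercised offline; ZIP codes whose query raised come back as a sorted list for a later retry pass.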

Architecture & Pipeline

flowchart LR
    n0["US ZIP Codes<br/>Search substitute"]
    n1["masstimes.org<br/>Source site"]
    n2["Parallel Scrapers<br/>Python · Requests"]
    n3["Rotating Proxies<br/>Rate-limit safety"]
    n4["JSON per Church<br/>Mass times + metadata"]
    n5["Google Drive Sync<br/>Final dataset"]
    n0 --> n1
    n1 --> n2
    n2 --> n3
    n3 --> n4
    n4 --> n5
    classDef step0 fill:#f1f5f9,stroke:#64748b,color:#1e293b,stroke-width:2px,rx:10,ry:10;
    classDef step1 fill:#ecfeff,stroke:#06b6d4,color:#1e293b,stroke-width:2px,rx:10,ry:10;
    classDef step2 fill:#f0fdfa,stroke:#0d9488,color:#1e293b,stroke-width:2px,rx:10,ry:10;
    classDef step3 fill:#ecfdf5,stroke:#10b981,color:#1e293b,stroke-width:2px,rx:10,ry:10;
    classDef step4 fill:#fffbeb,stroke:#f59e0b,color:#1e293b,stroke-width:2px,rx:10,ry:10;
    class n0 step0;
    class n1 step1;
    class n2 step2;
    class n3 step2;
    class n4 step3;
    class n5 step4;

End-to-end flow derived from this project's scope and tech stack.
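For the "JSON per Church" stage, writing one file per church might look like the minimal sketch below. The field names (`id`, `name`, `mass_times`), the file-naming scheme, and the helper names `church_filename` and `write_church_json` are hypothetical. The write-to-temporary-file-then-rename step is an assumed precaution, so that a Google Drive sync client is less likely to upload a half-written file.

```python
import json
import re
from pathlib import Path


def church_filename(church: dict) -> str:
    """Build a filesystem-safe name from the (assumed) "id" and "name" fields."""
    slug = re.sub(r"[^a-z0-9]+", "-", church["name"].lower()).strip("-")
    return f"{church['id']}-{slug}.json"


def write_church_json(church: dict, out_dir: Path) -> Path:
    """Write one church record to its own JSON file inside out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / church_filename(church)
    tmp = path.with_name(path.name + ".tmp")
    # ensure_ascii=False keeps accented parish names readable in the file.
    tmp.write_text(json.dumps(church, ensure_ascii=False, indent=2), encoding="utf-8")
    # Rename into place only after the write finishes, so a sync client
    # watching out_dir should not pick up a partially written file.
    tmp.replace(path)
    return path
```

For example, a record `{"id": "12345", "name": "St. Mary's Church", ...}` is written to `12345-st-mary-s-church.json`, so re-running the scraper overwrites the same file instead of creating duplicates.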

Key Features

  • ZIP-code-driven queries that stand in for the site's missing search
  • Parallel processing for fast nationwide coverage
  • JSON output with Google Drive sync
  • Rotating proxies to avoid rate limiting
  • Scheduled runs from an Ubuntu server

Tech Stack: Python, Requests, BeautifulSoup, Linux
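Proxy rotation with Requests can be sketched as a round-robin pool. This illustration rests on assumptions: the source does not describe the rotation strategy, the proxy URLs are placeholders, and `ProxyRotator` and `fetch_with_rotation` are hypothetical names. The mapping returned by `next_proxies()` follows the format Requests accepts for its `proxies=` argument, and a lock keeps the rotation consistent when parallel workers share one pool.

```python
import itertools
import threading
from typing import Any, Callable


class ProxyRotator:
    """Thread-safe round-robin pool of proxy URLs (hypothetical helper)."""

    def __init__(self, proxy_urls: list[str]) -> None:
        if not proxy_urls:
            raise ValueError("at least one proxy URL is required")
        self._cycle = itertools.cycle(proxy_urls)
        self._lock = threading.Lock()  # parallel workers share one pool

    def next_proxies(self) -> dict[str, str]:
        """Return the next proxy as a Requests-style `proxies` mapping."""
        with self._lock:
            url = next(self._cycle)
        # Send both plain-HTTP and HTTPS traffic through the same proxy.
        return {"http": url, "https": url}


def fetch_with_rotation(
    get: Callable[..., Any],
    url: str,
    rotator: ProxyRotator,
    attempts: int = 3,
    timeout: float = 15.0,
) -> Any:
    """Call get(url, proxies=..., timeout=...), moving to the next proxy on HTTP 429.

    Returns the last response; if every attempt was rate-limited, that is a 429
    response the caller can inspect or raise on.
    """
    response = None
    for _ in range(attempts):
        response = get(url, proxies=rotator.next_proxies(), timeout=timeout)
        if response.status_code != 429:  # 429 = Too Many Requests
            break
    return response
```

With Requests installed, `fetch_with_rotation(session.get, url, rotator)` would pass a `requests.Session`'s `get` method, which accepts `proxies` and `timeout` keyword arguments; the caller can still call `raise_for_status()` on whatever response comes back. Keeping `get` as a parameter also lets the retry logic be tested without network access.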