MassTimes Church Directory Scraper

Overview

A Python scraper that extracts church information and mass times from masstimes.org. Because the site has no site-wide search, the scraper iterates through US ZIP codes instead, runs the lookups in parallel for speed, and stores the results as JSON files synced to Google Drive. Outcome: a complete, structured dataset of US churches and mass times that powers the client's downstream directory product.
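
The ZIP-code iteration and parallel execution described above could look roughly like the sketch below. It is an illustration under assumptions, not the project's actual code: `scrape_zip_codes`, the `fetch` callable, and the `id` field used for de-duplication are hypothetical names, and `fake_fetch` stands in for the real HTTP lookup so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, List

Church = Dict[str, object]


def scrape_zip_codes(
    zip_codes: Iterable[str],
    fetch: Callable[[str], List[Church]],
    max_workers: int = 8,
) -> Dict[str, Church]:
    """Run fetch(zip_code) in a thread pool and merge results by church id."""
    results: Dict[str, Church] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its ZIP code so failures can be traced.
        futures = {pool.submit(fetch, z): z for z in zip_codes}
        for future in as_completed(futures):
            try:
                churches = future.result()
            except Exception:
                # This sketch skips a failed ZIP; a real run would log and retry.
                continue
            for church in churches:
                # Neighbouring ZIP codes can return the same church,
                # so keep only the first copy seen for each id (assumed field).
                results.setdefault(str(church["id"]), church)
    return results


# Offline stand-in for the real lookup against the source site.
_FAKE_DATA = {
    "10001": [{"id": "a", "name": "St. A"}],
    "10002": [{"id": "a", "name": "St. A"}, {"id": "b", "name": "St. B"}],
}


def fake_fetch(zip_code: str) -> List[Church]:
    if zip_code not in _FAKE_DATA:
        raise LookupError(f"no data for {zip_code}")
    return _FAKE_DATA[zip_code]


merged = scrape_zip_codes(["10001", "10002", "99999"], fake_fetch)
```

Threads suit this workload because each lookup spends most of its time waiting on the network. Passing `fetch` in as a parameter keeps the orchestration testable without touching the live site.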

Architecture & Pipeline

flowchart LR
    n0["US ZIP Codes
Search substitute"]
    n1["masstimes.org
Source site"]
    n2["Parallel Scrapers
Python · Requests"]
    n3["Rotating Proxies
Rate-limit safety"]
    n4["JSON per Church
Mass times + metadata"]
    n5["Google Drive Sync
Final dataset"]
    n0 --> n1
    n1 --> n2
    n2 --> n3
    n3 --> n4
    n4 --> n5
classDef step0 fill:#f1f5f9,stroke:#64748b,color:#1e293b,stroke-width:2px,rx:10,ry:10;
classDef step1 fill:#ecfeff,stroke:#06b6d4,color:#1e293b,stroke-width:2px,rx:10,ry:10;
classDef step2 fill:#f0fdfa,stroke:#0d9488,color:#1e293b,stroke-width:2px,rx:10,ry:10;
classDef step3 fill:#ecfdf5,stroke:#10b981,color:#1e293b,stroke-width:2px,rx:10,ry:10;
classDef step4 fill:#fffbeb,stroke:#f59e0b,color:#1e293b,stroke-width:2px,rx:10,ry:10;
    class n0 step0;
    class n1 step1;
    class n2 step2;
    class n3 step2;
    class n4 step3;
    class n5 step4;

End-to-end flow derived from this project's scope and tech stack.
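
The "Rotating Proxies" step in the flow above can be sketched as a small round-robin helper. This is a minimal illustration, assuming a fixed list of proxy URLs; the `ProxyRotator` class and the example proxy addresses are hypothetical. The returned dictionary uses the scheme-to-URL mapping that the Requests library accepts for its `proxies` argument.

```python
import itertools
import threading
from typing import Dict, Sequence


class ProxyRotator:
    """Hand out proxies in round-robin order, safely across worker threads."""

    def __init__(self, proxy_urls: Sequence[str]) -> None:
        if not proxy_urls:
            raise ValueError("at least one proxy URL is required")
        self._cycle = itertools.cycle(proxy_urls)
        self._lock = threading.Lock()

    def next_proxies(self) -> Dict[str, str]:
        # The lock makes the rotation order well defined when several
        # parallel scrapers ask for a proxy at the same time.
        with self._lock:
            url = next(self._cycle)
        # Route both plain and TLS traffic through the same proxy.
        return {"http": url, "https": url}


# Placeholder addresses, for illustration only.
rotator = ProxyRotator(["http://p1.example:8080", "http://p2.example:8080"])
first = rotator.next_proxies()
second = rotator.next_proxies()
third = rotator.next_proxies()
```

Each worker would then pass the result to its request, for example `requests.get(url, proxies=rotator.next_proxies(), timeout=10)`, so that consecutive requests leave through different addresses.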

Key Features

ZIP-code-driven search to work around the missing site-wide search
Parallel processing for fast nationwide coverage
JSON output with Google Drive sync
Rotating proxies to avoid rate limiting
Scheduled runs from an Ubuntu server
Tech Stack: Python, Requests, BeautifulSoup, Linux
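
The "JSON output with Google Drive sync" feature can be illustrated with one file per church written into a local folder that a sync client watches. The sketch below is an assumption about how that might work, not the project's confirmed implementation: `save_church`, the output directory, and the record fields (`id`, `name`, `mass_times`) are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, Union


def save_church(church: Dict[str, object], out_dir: Union[str, Path]) -> Path:
    """Write one church record to <out_dir>/<id>.json and return the path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{church['id']}.json"
    # Write to a temporary name first, then rename, so a sync client
    # watching the folder never picks up a half-written file.
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(
        json.dumps(church, indent=2, ensure_ascii=False),
        encoding="utf-8",
    )
    tmp.replace(path)
    return path
```

A call such as `save_church({"id": "abc", "name": "St. Mary", "mass_times": [{"day": "Sunday", "time": "09:00"}]}, "output/churches")` would produce `output/churches/abc.json`. Re-running the scraper overwrites the same file, which keeps scheduled runs idempotent.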
