Charleston Diocese Directory Scraper

Overview

A Python scraper that collects church information, Mass times, and clergy assignments from directory.charlestondiocese.org. It fetches pages in parallel to work through the large directory, and it downloads church images and renames them to the client's naming convention. Outcome: a clean, ready-to-use dataset and image library for the client's diocesan directory product.
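The parallel collection step might look roughly like the sketch below. It is not the project's actual code: the function name `scrape_in_parallel`, the `fetch` parameter, and the worker count are assumptions made for illustration. `fetch` stands in for whatever downloads a page (in this stack, presumably a thin wrapper around Requests), which keeps the sketch runnable without network access.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def scrape_in_parallel(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    max_workers: int = 8,  # assumed default, tune to what the site tolerates
) -> Dict[str, str]:
    """Fetch every URL concurrently and return {url: page_html}.

    URLs whose fetch raises are skipped, so one bad page does not stop the run.
    """
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its URL so results can be keyed correctly.
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # network errors, timeouts, bad responses
                print(f"skipped {url}: {exc}")
    return results
```

Threads suit this kind of job because the work is I/O-bound: most of the time is spent waiting on HTTP responses rather than computing.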

Architecture & Pipeline

flowchart LR
    n0["directory.charlestondiocese.org
Source directory"]
    n1["Parallel Scrapers
Python · BeautifulSoup"]
    n2["Extract Profiles
Mass times · clergy assignments"]
    n3["Image Pipeline
Client naming convention"]
    n4["JSON Output
Structured records"]
    n5["Google Drive Sync
Delivery"]
    n0 --> n1
    n1 --> n2
    n2 --> n3
    n3 --> n4
    n4 --> n5
classDef step0 fill:#f1f5f9,stroke:#64748b,color:#1e293b,stroke-width:2px,rx:10,ry:10;
classDef step1 fill:#ecfeff,stroke:#06b6d4,color:#1e293b,stroke-width:2px,rx:10,ry:10;
classDef step2 fill:#f0fdfa,stroke:#0d9488,color:#1e293b,stroke-width:2px,rx:10,ry:10;
classDef step3 fill:#ecfdf5,stroke:#10b981,color:#1e293b,stroke-width:2px,rx:10,ry:10;
classDef step4 fill:#fffbeb,stroke:#f59e0b,color:#1e293b,stroke-width:2px,rx:10,ry:10;
    class n0 step0;
    class n1 step1;
    class n2 step2;
    class n3 step2;
    class n4 step3;
    class n5 step4;

End-to-end flow derived from this project's scope and tech stack.
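To make the "Extract Profiles" and "JSON Output" stages concrete, here is a minimal sketch of how one church profile could be modeled and serialized. The source does not give the real schema, so the class `ChurchRecord` and every field name in it are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class ChurchRecord:
    # Hypothetical fields; the actual output schema is not documented here.
    name: str
    mass_times: List[str] = field(default_factory=list)
    clergy: List[str] = field(default_factory=list)
    image_file: str = ""


def records_to_json(records: List[ChurchRecord]) -> str:
    """Serialize records to a JSON array, keeping non-ASCII characters readable."""
    return json.dumps([asdict(r) for r in records], indent=2, ensure_ascii=False)
```

A file written from `records_to_json(...)` is the kind of artifact the final stage would then sync to Google Drive.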

Key Features

Full extraction of church profiles and mass schedules
Image extraction with client-specific naming convention
JSON output synced to Google Drive
Rotating proxies for stable, parallel collection

Tech Stack: Python, Requests, BeautifulSoup
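Two of the features above, rotating proxies and convention-based image names, can be sketched in a few lines. Both helpers are illustrative assumptions: the proxy addresses are supplied by the caller, and the naming pattern (`slug_NN.ext`) is invented here because the client's actual convention is not described.

```python
import itertools
import re
from typing import Dict, Iterator, List


def proxy_cycle(proxies: List[str]) -> Iterator[Dict[str, str]]:
    """Yield proxy mappings round-robin, in the dict shape Requests takes for `proxies=`."""
    for proxy in itertools.cycle(proxies):
        yield {"http": proxy, "https": proxy}


def image_filename(church_name: str, index: int, ext: str = "jpg") -> str:
    """Build a file name from the church name (hypothetical pattern: slug_NN.ext)."""
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", church_name.lower()).strip("-")
    return f"{slug}_{index:02d}.{ext}"
```

Each request would take the next mapping from `proxy_cycle(...)` and pass it as `requests.get(url, proxies=mapping)`, spreading traffic evenly across the pool.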
