Amazon Price Comparison Scraper

Overview

A Python-based scraper that monitors selected product categories on Amazon and compares prices against the client's other sources. The script runs every four hours, exports a clean comparison spreadsheet, and delivers it to the client by email or shared cloud storage.

Outcome: Replaced manual price tracking with a hands-off pipeline that produces fresh, decision-ready price reports six times a day.
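The comparison step described above can be sketched as follows. This is a minimal illustration using only the standard library, not the project's actual code: the `PriceRow` class, the `compare_prices` function, and the sample product IDs and prices are all hypothetical, and it assumes both sources can be keyed by a shared product ID.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class PriceRow:
    """One line of the comparison report (hypothetical structure)."""
    product_id: str
    amazon_price: Decimal
    other_price: Decimal

    @property
    def difference(self) -> Decimal:
        # Positive means Amazon is more expensive than the other source.
        return self.amazon_price - self.other_price


def compare_prices(amazon: dict[str, Decimal],
                   other: dict[str, Decimal]) -> list[PriceRow]:
    """Pair up prices for product IDs that appear in both sources."""
    shared_ids = sorted(amazon.keys() & other.keys())
    return [PriceRow(pid, amazon[pid], other[pid]) for pid in shared_ids]


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    amazon = {"A1": Decimal("19.99"), "A2": Decimal("5.00")}
    other = {"A1": Decimal("17.49"), "A3": Decimal("9.99")}
    for row in compare_prices(amazon, other):
        print(row.product_id, row.difference)
```

Prices are held as `Decimal` rather than `float` so that differences are exact to the cent. In a Pandas-based pipeline the same pairing could be expressed as a merge on the product ID column.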

Architecture & Pipeline

flowchart LR
    n0["Scheduler (4 h)<br/>Recurring runs"]
    n1["Amazon<br/>Source listings"]
    n2["Selenium Scrape<br/>Names · IDs · prices · images"]
    n3["Compare Prices<br/>Vs client sources"]
    n4["JSON / Excel + Images<br/>Structured output"]
    n5["Email / Drive Delivery<br/>Client report"]
    n0 --> n1
    n1 --> n2
    n2 --> n3
    n3 --> n4
    n4 --> n5
    classDef step0 fill:#f1f5f9,stroke:#64748b,color:#1e293b,stroke-width:2px,rx:10,ry:10;
    classDef step1 fill:#ecfeff,stroke:#06b6d4,color:#1e293b,stroke-width:2px,rx:10,ry:10;
    classDef step2 fill:#f0fdfa,stroke:#0d9488,color:#1e293b,stroke-width:2px,rx:10,ry:10;
    classDef step3 fill:#ecfdf5,stroke:#10b981,color:#1e293b,stroke-width:2px,rx:10,ry:10;
    classDef step4 fill:#fffbeb,stroke:#f59e0b,color:#1e293b,stroke-width:2px,rx:10,ry:10;
    class n0 step0;
    class n1 step1;
    class n2 step2;
    class n3 step2;
    class n4 step3;
    class n5 step4;

End-to-end flow derived from this project's scope and tech stack.
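The recurring four-hour schedule and the stage-by-stage flow can be sketched as below. This is an assumption-laden outline rather than the deployed scheduler: `run_times_for_day`, `run_pipeline_once`, and the logger name `price_scraper` are hypothetical, and each stage is assumed to be a plain callable that receives the previous stage's result.

```python
import logging
from datetime import datetime, timedelta

RUN_INTERVAL = timedelta(hours=4)  # interval taken from the project description

logger = logging.getLogger("price_scraper")  # hypothetical logger name


def run_times_for_day(day_start: datetime,
                      interval: timedelta = RUN_INTERVAL) -> list[datetime]:
    """Return every scheduled start time in the 24 hours after day_start."""
    times = []
    current = day_start
    end = day_start + timedelta(days=1)
    while current < end:
        times.append(current)
        current += interval
    return times


def run_pipeline_once(stages) -> bool:
    """Run the stages in order, feeding each one the previous result.

    A failing stage is logged and stops this run, so one bad run does not
    take down the scheduler that calls this function.
    """
    data = None
    for stage in stages:
        name = getattr(stage, "__name__", repr(stage))
        try:
            data = stage(data)
        except Exception:
            logger.exception("Stage %s failed", name)
            return False
        logger.info("Stage %s finished", name)
    return True
```

With a four-hour interval, `run_times_for_day` yields six start times per day, which matches the six daily reports mentioned in the overview. In practice the stage list would hold the scrape, compare, export, and delivery steps in that order.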

Key Features

  • Scheduled execution every four hours with full logging
  • Structured output in JSON and Excel for easy downstream use
  • Image extraction with PNG/JPG export
  • Automatic delivery via Google Drive and email
  • Deployed on an Ubuntu remote desktop server for reliability
  • Tech Stack: Python, Selenium, BeautifulSoup, Pandas, MongoDB, Linux
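As a rough illustration of the structured-output and email-delivery features listed above, the sketch below serializes comparison rows to JSON and wraps them in an email with the report attached. It uses only the standard library and stops short of sending anything. The function names, the subject line, the default filename `price_report.json`, and the example addresses are assumptions, not details from the project.

```python
import json
from email.message import EmailMessage


def build_report_json(rows: list[dict]) -> str:
    """Serialize comparison rows to pretty-printed JSON text."""
    return json.dumps(rows, indent=2, ensure_ascii=False)


def build_report_email(sender: str, recipient: str, report_json: str,
                       filename: str = "price_report.json") -> EmailMessage:
    """Build (but do not send) an email carrying the report as an attachment."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = "Price comparison report"  # hypothetical subject
    message.set_content("The latest price comparison report is attached.")
    # Bytes attachments need an explicit MIME type.
    message.add_attachment(
        report_json.encode("utf-8"),
        maintype="application",
        subtype="json",
        filename=filename,
    )
    return message


if __name__ == "__main__":
    # Made-up sample row, for illustration only.
    report = build_report_json(
        [{"product_id": "A1", "amazon_price": "19.99", "other_price": "17.49"}]
    )
    email = build_report_email("bot@example.com", "client@example.com", report)
    print(email["Subject"])
```

The resulting `EmailMessage` could then be handed to `smtplib.SMTP.send_message` with whatever mail server and credentials the deployment uses. An Excel export and the Google Drive upload would need third-party libraries and are left out of this sketch.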