Process Status Scrape

Year

2024

Tech & Technique

Python, Selenium, Pandas, Docker, AWS, Google Sheets

Description

A web scraper built for Aprovcon, a property regularization company. It collects the current status of construction processes from government websites and saves every status to a Google Sheets spreadsheet.

Key Features:

🤖 Automated Status Tracking: Automatically retrieves the current status of construction and regularization processes from official government sources.
🏛️ Government Data Integration: Extracts process information from several government websites, each with its own page structure and access rules.
🔄 Up-to-date Statuses: Re-collects statuses on every run, so the spreadsheet reflects the latest information the government sources have published.
📊 Google Sheets Integration: Stores and organizes all collected status data directly into a centralized Google Sheets spreadsheet for easy access and analysis by the Aprovcon team.
✅ Streamlined Regularization Workflow: Automates status tracking that was previously done by hand, so the Aprovcon team spends less time checking each process.
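
The collection loop behind these features can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `StatusRecord`, `collect_statuses`, and the two stand-in fetchers are hypothetical names, and a real fetcher would drive Selenium against the relevant site instead of returning a fixed string. The sketch assumes that a failure on one site should be recorded and not stop the run.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, List, Tuple


@dataclass
class StatusRecord:
    """One scraped status (hypothetical structure for illustration)."""
    process_id: str
    source: str
    status: str
    checked_at: str  # ISO 8601 timestamp in UTC


def collect_statuses(
    processes: List[Tuple[str, str]],
    fetchers: Dict[str, Callable[[str], str]],
) -> List[StatusRecord]:
    """Fetch the status of each (process_id, source) pair.

    A failure on one site is stored as an error status so the
    remaining processes are still checked.
    """
    records = []
    for process_id, source in processes:
        checked_at = datetime.now(timezone.utc).isoformat(timespec="seconds")
        try:
            status = fetchers[source](process_id)
        except Exception as exc:  # broad on purpose: sites fail in many ways
            status = f"ERROR: {exc}"
        records.append(StatusRecord(process_id, source, status, checked_at))
    return records


# Stand-in fetchers (assumptions): real ones would load the page in a browser.
def fetch_city_hall(process_id: str) -> str:
    return "Under review"


def fetch_registry(process_id: str) -> str:
    raise TimeoutError("page did not load")


if __name__ == "__main__":
    fetchers = {"city_hall": fetch_city_hall, "registry": fetch_registry}
    for record in collect_statuses(
        [("P-001", "city_hall"), ("P-002", "registry")], fetchers
    ):
        print(record)
```

Keeping one fetcher per source makes it possible to add or repair a single government site without touching the rest of the run.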

My Role

Web Scraper Developer

🤖 Automated Web Scraping: Designed, developed, and deployed a web scraper that automates the collection of construction process statuses from multiple government websites for Aprovcon.
🛡️ CAPTCHA Evasion: Implemented techniques to get past several different CAPTCHA systems so that data retrieval runs without interruption.
🔓 Persistent Data Access: Worked around access restrictions and dynamically loaded content on government platforms by identifying specific behaviors of each system and relying on them to gather information consistently.
📊 Data Integration & Reporting: Automated the parsing, structuring, and saving of all collected statuses into a Google Sheets spreadsheet, giving Aprovcon an up-to-date, accessible source for tracking property regularization processes.
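
As an illustration of the structuring step in the last point, the sketch below merges freshly scraped statuses into the rows of a sheet, updating a process that is already listed and appending one that is new. It is a sketch under assumptions, not the project's implementation: the column layout in `HEADER`, the function `upsert_rows`, and the sample values are all hypothetical, and the step that actually reads from and writes to Google Sheets is left out.

```python
from typing import Dict, List

# Assumed column layout; the real spreadsheet may use different columns.
HEADER = ["process_id", "source", "status", "checked_at"]


def upsert_rows(
    sheet_rows: List[List[str]], records: List[Dict[str, str]]
) -> List[List[str]]:
    """Return new sheet contents with each record written to its process row.

    sheet_rows is the current sheet with the header row first (or an empty
    list for a blank sheet). Existing processes are updated in place and
    new ones are appended, so repeated runs do not create duplicate rows.
    """
    # Copy the data rows so the caller's list is left untouched.
    body = [list(row) for row in sheet_rows[1:]]
    # Map process_id -> position in body so updates keep the row order.
    index = {row[0]: position for position, row in enumerate(body)}
    for record in records:
        new_row = [record.get(column, "") for column in HEADER]
        position = index.get(record["process_id"])
        if position is None:
            index[record["process_id"]] = len(body)
            body.append(new_row)
        else:
            body[position] = new_row
    return [list(HEADER)] + body


if __name__ == "__main__":
    current = [
        HEADER,
        ["P-001", "city_hall", "Under review", "2024-05-01T10:00:00+00:00"],
    ]
    scraped = [
        {"process_id": "P-001", "source": "city_hall",
         "status": "Approved", "checked_at": "2024-05-02T10:00:00+00:00"},
        {"process_id": "P-002", "source": "registry",
         "status": "Filed", "checked_at": "2024-05-02T10:00:00+00:00"},
    ]
    for row in upsert_rows(current, scraped):
        print(row)
```

Keying rows on the process identifier keeps one row per process, which suits a spreadsheet that people read directly.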