This is the codebase for a web crawler I built to extract residential rental listings in Singapore.
The crawler currently extracts the following fields for each listing (a sketch of the corresponding Scrapy Item fields follows the list):
- Property Name
- Property Type
- Address
- Number of Beds
- Number of Bathrooms
- Rental
- Rental per Square Foot
- Completion Year
- Total Units in the Project
- Building Tenure (Freehold or Leasehold)
- Amenities
- Key Details (rules set by the landlord, etc.)
- Nearest MRT station and distance
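The class below is a minimal sketch of what the Scrapy Item in sgpropbot/items.py might look like for these fields. The class and field names are illustrative assumptions, not the project's actual definitions.

```python
import scrapy


class PropertyItem(scrapy.Item):
    # Hypothetical field names -- check sgpropbot/items.py for the real ones.
    property_name = scrapy.Field()
    property_type = scrapy.Field()
    address = scrapy.Field()
    beds = scrapy.Field()
    bathrooms = scrapy.Field()
    rental = scrapy.Field()
    rental_psf = scrapy.Field()
    completion_year = scrapy.Field()
    total_units = scrapy.Field()
    tenure = scrapy.Field()
    amenities = scrapy.Field()
    key_details = scrapy.Field()
    nearest_mrt = scrapy.Field()
    mrt_distance = scrapy.Field()
```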
Prerequisites:
- Basic understanding of HTML structure
- Python
- Basic understanding of shell commands
This scraper was built with the Scrapy framework on Python 3.7. You need to install Scrapy in order to run it.
Pip:
pip install Scrapy
Conda:
conda install -c conda-forge scrapy
Otherwise, you can download the requirements.txt file and use one of the following commands to install the listed packages (the second falls back to pip when a package is not available through conda):
$ while read requirement; do conda install --yes $requirement; done < requirements.txt
$ while read requirement; do conda install --yes $requirement || pip install $requirement; done < requirements.txt
The project is structured as follows:
.
├── properties_data_2019-07-12T16-19-53.json
├── README.md
├── scrapy.cfg
└── sgpropbot
├── __init__.py
├── items.py
├── middlewares.py
├── pipelines.py
├── __pycache__
├── settings.py
└── spiders
├── __init__.py
├── properties.py
├── __pycache__
├── run.log
└── scrapy_shell_test_linkextractor.txt
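As context for the layout above, pipelines.py is where scraped items can be post-processed before export. The snippet below is only a hedged illustration of such a pipeline, not the project's actual code; it assumes the item has a rental field holding a raw price string.

```python
import re


class CleanRentalPipeline:
    """Hypothetical pipeline: normalise the raw rental price into an integer."""

    def process_item(self, item, spider):
        rental = item.get("rental")
        if rental:
            # Keep only the digits, e.g. "S$ 4,500 /mo" -> 4500
            digits = re.sub(r"[^\d]", "", str(rental))
            item["rental"] = int(digits) if digits else None
        return item

# To activate a pipeline, register it in settings.py, e.g.:
# ITEM_PIPELINES = {"sgpropbot.pipelines.CleanRentalPipeline": 300}
```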
To run the crawler, navigate to the sgpropbot/spiders folder and run one of the following commands:
scrapy crawl properties
or
scrapy runspider properties.py
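Alternatively, the spider can be launched from a Python script using Scrapy's CrawlerProcess. The snippet below is a minimal sketch and assumes it is run from the project root so that the project settings and the properties spider can be found.

```python
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

# Load the sgpropbot project settings (run this from the project root,
# i.e. the folder that contains scrapy.cfg).
process = CrawlerProcess(get_project_settings())

# "properties" is the spider name used by `scrapy crawl properties`.
process.crawl("properties")
process.start()  # blocks until the crawl has finished
```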
The output is written to a JSON file by default; this can easily be changed by amending settings.py. A sample of the output, properties_data_2019-07-12T16-19-53.json, is included in this repository.
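As an illustration, a Scrapy 1.x-style feed export configuration in settings.py could look like the lines below (for example, to switch the output to CSV). Treat this as a hedged sketch; the setting names and values in the project's settings.py may differ.

```python
# settings.py (sketch) -- Scrapy 1.x feed export options.
FEED_FORMAT = "csv"                        # e.g. "json", "jsonlines", "csv", "xml"
FEED_URI = "properties_data_%(time)s.csv"  # %(time)s is expanded by Scrapy at run time
```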
In follow-up projects, I will use the scraped data to:
- Provide a walkthrough of how to export the data into a database such as MySQL or PostgreSQL
- Conduct an exploratory data analysis
- Build an interactive dashboard
- Build a rental prediction model
The most essential part of building a good scraper is a solid understanding of the website's layout, so that you can extract the right items. The most effective way to do this is through selectors (CSS or XPath), and for extracting specific pieces of text you also have to be familiar with regular expressions.
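As a quick illustration, the snippet below combines a CSS selector, an XPath selector, and a regular expression using Scrapy's Selector API. The HTML is made up for the example and is not taken from the actual target site.

```python
from scrapy.selector import Selector

html = """
<div class="listing">
  <h2 class="name">Example Towers</h2>
  <span class="price">S$ 4,500 /mo</span>
</div>
"""

sel = Selector(text=html)

# CSS selector: extract the property name.
name = sel.css("h2.name::text").get()

# XPath selector: the same element with different syntax.
name_xpath = sel.xpath('//h2[@class="name"]/text()').get()

# Regex on top of a selector: pull just the number out of the price string.
price = sel.css("span.price::text").re_first(r"[\d,]+")

print(name, name_xpath, price)  # Example Towers Example Towers 4,500
```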