Web Scraping and EDA in Python 3 using Requests, BeautifulSoup, Pandas, Matplotlib, Seaborn

This Jupyter notebook documents web scraping and exploratory data analysis (EDA) using Python 3.

The process is as follows:

Use Python's requests and bs4 libraries to scrape a web page.
Load the scraped data into a pandas DataFrame.
Do some basic exploratory data analysis on the DataFrame.
Create some basic visualizations.
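The first two steps can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the HTML snippet, the table id `demo`, the helper name `table_to_dataframe`, and the town names and numbers are all made-up placeholders, and the markup of a real page will differ.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Placeholder HTML standing in for a scraped page. The names and numbers
# are invented for illustration and are not real population figures.
SAMPLE_HTML = """
<table id="demo">
  <thead><tr><th>Name</th><th>Population</th></tr></thead>
  <tbody>
    <tr><td>Town A</td><td>12,300</td></tr>
    <tr><td>Town B</td><td>4,560</td></tr>
  </tbody>
</table>
"""


def table_to_dataframe(html, table_id):
    """Parse one HTML table (hypothetical helper) into a DataFrame of strings."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:
        raise ValueError(f"no table with id={table_id!r}")
    # Column names come from the header cells.
    headers = [th.get_text(strip=True) for th in table.find("thead").find_all("th")]
    # Each body row becomes a list of cell texts.
    rows = [
        [td.get_text(strip=True) for td in tr.find_all("td")]
        for tr in table.find("tbody").find_all("tr")
    ]
    return pd.DataFrame(rows, columns=headers)


df = table_to_dataframe(SAMPLE_HTML, "demo")
# Scraped numbers arrive as text; drop thousands separators before converting.
df["Population"] = df["Population"].str.replace(",", "", regex=False).astype(int)
print(df)
```

For a live page, the `html` argument would typically be the `.text` of a `requests.get(url, timeout=10)` response, and the table id would have to be read from that page's source.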

For the purposes of this article, I scraped New Zealand population data, as of June 30, 2018, from the City Population website (http://citypopulation.de).

Source:

Thomas Brinkhoff: City Population, http://www.citypopulation.de

Please note that a website may have policies and rules governing the use of its data. Before you scrape a site, read its data usage policies.

Data use policy: http://citypopulation.de/termsofuse.html (DATA -> Population Data)
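Alongside reading the terms of use, a scraper can also check a site's robots.txt file. The sketch below uses the standard library's `urllib.robotparser` for that. It complements the written policy and does not replace it. The robots.txt content, the `demo-bot` user agent, the `example.com` URLs, and the helper name `is_allowed` are made-up placeholders.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content for illustration. A real check would load the
# site's own file with parser.set_url(...) followed by parser.read().
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(SAMPLE_ROBOTS, "demo-bot", "http://example.com/en/page/"))
print(is_allowed(SAMPLE_ROBOTS, "demo-bot", "http://example.com/private/x"))
```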

The data extracted in this Jupyter notebook is for Oceania -> NEW ZEALAND: http://citypopulation.de/en/newzealand/

I am only scraping data for the North and South Islands (the Chatham Islands are excluded):
North Island: http://citypopulation.de/en/newzealand/northisland/
South Island: http://citypopulation.de/en/newzealand/southisland/
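Because the two islands live on separate pages, each page would yield its own table, and the tables then need to be stacked into one DataFrame. One possible way to do that is sketched below. The helper name `combine_islands`, the `Island` column, and the sample rows are assumptions for illustration, not the notebook's actual code or real figures.

```python
import pandas as pd


def combine_islands(frames):
    """Stack per-island tables and record which island each row came from.

    `frames` maps an island label to the DataFrame scraped from its page.
    """
    labelled = [df.assign(Island=island) for island, df in frames.items()]
    return pd.concat(labelled, ignore_index=True)


# Placeholder frames standing in for the two scraped tables (made-up values).
frames = {
    "North Island": pd.DataFrame({"Name": ["Town A"], "Population": [12300]}),
    "South Island": pd.DataFrame(
        {"Name": ["Town B", "Town C"], "Population": [4560, 789]}
    ),
}

combined = combine_islands(frames)
print(combined)
```

Keeping the island as a column, rather than as two separate DataFrames, makes later grouping and plotting by island straightforward.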

Final note

Please note that this Jupyter notebook is meant as a simple demonstration of how useful the requests and BeautifulSoup (bs4) libraries are for scraping website data.

Feel free to do your own exploration and analysis of the scraped data.
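As a possible starting point for that exploration, the sketch below computes a few summary statistics with pandas and draws a simple bar chart with Matplotlib. The DataFrame contents, the column names, and the helper name `plot_largest` are placeholders chosen for illustration. They are not real population figures, and the scraped table may use different columns.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder data with made-up names and values.
df = pd.DataFrame(
    {
        "Name": ["Town A", "Town B", "Town C", "Town D"],
        "Island": ["North Island", "North Island", "South Island", "South Island"],
        "Population": [12300, 4560, 789, 2500],
    }
)

# Basic EDA: summary statistics and a per-island total.
summary = df["Population"].describe()
totals = df.groupby("Island")["Population"].sum()
print(summary)
print(totals)


def plot_largest(data, n=3):
    """Draw a horizontal bar chart of the n most populous places."""
    top = data.nlargest(n, "Population")
    fig, ax = plt.subplots(figsize=(6, 3))
    ax.barh(top["Name"], top["Population"])
    ax.invert_yaxis()  # put the largest place at the top
    ax.set_xlabel("Population")
    ax.set_title(f"{n} largest places (placeholder data)")
    fig.tight_layout()
    return ax


ax = plot_largest(df)
```

In a notebook the figure is displayed inline. In a script, call `plt.show()` or `ax.figure.savefig(...)` afterwards.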

I hope this work gives readers a basic understanding of web scraping and the intuition behind the analysis: in simple terms, going end to end from scraping data to analyzing it.

py404/Web-Scraping-and-EDA-in-Python3
