Basics on Web Scraping 🔎

This project/tutorial introduces the basics of web scraping, a technique for retrieving data from websites. It is especially handy when no ready-made dataset is available or when the data cannot be obtained through an API.

The example shows how to obtain data from different websites with a basic knowledge of HTML, some Python, and a few Python packages.

The notebook explains the technique step by step using an entertainment example. However, the technique applies to any case where (extra) data needs to be collected.

A blog post with the same content, in a more compact presentation, is also available.

🔧 Tools

requests: HTTP library for Python that allows us to send HTTP requests in a simple way.
Beautiful Soup: Python library for pulling data out of HTML and XML files. It lets us easily access the information we want to retrieve.

Some familiarity with lists, list comprehensions, and string methods is also very helpful.
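
As a rough illustration of how these tools fit together, here is a minimal sketch (not taken from the notebook). The function names fetch_html and extract_titles, the sample HTML, and the "title" class are all hypothetical and chosen only for this example. The parsing step runs on an inline HTML string so it works without network access.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its HTML (hypothetical helper)."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    return response.text


def extract_titles(html: str) -> list[str]:
    """Return the cleaned text of every <h2 class="title"> element."""
    soup = BeautifulSoup(html, "html.parser")
    headings = soup.find_all("h2", class_="title")
    # List comprehension + string method: strip surrounding whitespace.
    return [heading.get_text().strip() for heading in headings]


# Made-up HTML standing in for a downloaded page.
SAMPLE_HTML = """
<html><body>
  <h2 class="title">  First film </h2>
  <h2 class="other">Not a title</h2>
  <h2 class="title">Second film</h2>
</body></html>
"""

titles = extract_titles(SAMPLE_HTML)
print(titles)
```

For a real page, the same parsing function could be fed the result of fetch_html(url), assuming the site permits scraping and the tag and class names are adapted to its actual HTML.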

💻 Install requirements

Install the requirements with pip install -r requirements.txt.
- Make sure you use Python 3.
- You may want to use a virtual environment for this.

