We’re music-to-scrape, a fictitious music streaming service with a real website and API. We’re built for educational purposes, so you can use us to learn web scraping!
Head over to https://music-to-scrape.org to view our live website, or directly check out our API documentation.
The easiest way to run our project is using Docker.
- Install Docker and clone this repository.
- Open the terminal at the repository's root directory and run `docker compose build`, followed by `docker compose up`.
- Wait a bit for the website and API to launch. If the process breaks, you likely haven't allocated enough memory (e.g., the build takes about 6 GB of memory).
- Once Docker has launched, you can access the website and API locally at these addresses:
  - API: `http://localhost:8080` (where `localhost` is typically `127.0.0.1`)
  - Front end: `http://localhost:8000`
- Press Ctrl + C in the terminal to quit.
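To confirm that both services came up, a small check along the following lines can help. This is a minimal sketch that assumes the default local addresses listed above; the names `SERVICES`, `is_up`, and `check_services` are made up for illustration and are not part of the project.

```python
import urllib.error
import urllib.request

# Default local addresses from the list above (adjust if you changed the ports).
SERVICES = {
    "api": "http://localhost:8080",
    "front end": "http://localhost:8000",
}


def is_up(url: str, timeout: float = 3.0) -> bool:
    """Return True if a server at `url` answers an HTTP request at all."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, just with an error status (e.g., 404).
        return True
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, DNS failure, or a malformed URL.
        return False


def check_services(services: dict[str, str] = SERVICES) -> dict[str, bool]:
    """Map each service name to whether its address is reachable."""
    return {name: is_up(url) for name, url in services.items()}


# Usage (with the containers running):
#   print(check_services())
```

If a service shows up as unreachable shortly after `docker compose up`, it may simply still be starting, so it is worth retrying after a few seconds.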
If you're running this project publicly, it's worthwhile configuring HTTPS access on your server. Follow the notes here.
TLDR:
- If you have already built the image, it's enough to start it with `docker compose up -d`.
- If you want to rebuild, use `docker compose build` first.
- Unsure whether a Docker container with the site is already running? Check with `docker ps`; stop unnecessary containers using `docker stop CONTAINERID`.
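The `docker ps` check can also be scripted. The sketch below assumes the `docker` CLI is on your `PATH`; the function names are hypothetical, and the image and container names in the usage comment are invented examples, not the names this project actually uses.

```python
import subprocess


def parse_docker_ps(output: str) -> list[dict[str, str]]:
    """Parse lines of the form '<id> <image> <names>' into dictionaries."""
    containers = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        container_id, image, name = parts
        containers.append({"id": container_id, "image": image, "name": name})
    return containers


def running_containers() -> list[dict[str, str]]:
    """Ask Docker for the running containers (ID, image, names)."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.ID}} {{.Image}} {{.Names}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_docker_ps(result.stdout)


# Usage:
#   for container in running_containers():
#       print(container["id"], container["image"], container["name"])
#   # then stop one with: docker stop <id>
```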
- Clone this repository
- Ensure you have R installed, and run `simulate.R` in `src/simulate` to generate the fictitious data.
- Install the required Python packages:
pip install fastapi
pip install fastapi_utils
pip install sqlalchemy
pip install pydantic
pip install uvicorn
pip install gunicorn
pip install flask
pip install flask_sqlalchemy
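Before starting the servers, you may want to verify that the packages above are importable. The following is a minimal sketch; it assumes each package's import name matches the name used in the `pip install` lines, and `missing_packages` is a hypothetical helper, not something shipped with the project.

```python
import importlib.util

# Import names, assumed to match the pip package names listed above.
REQUIRED = [
    "fastapi",
    "fastapi_utils",
    "sqlalchemy",
    "pydantic",
    "uvicorn",
    "gunicorn",
    "flask",
    "flask_sqlalchemy",
]


def missing_packages(names: list[str] = REQUIRED) -> list[str]:
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Usage:
#   missing = missing_packages()
#   if missing:
#       print("Still to install:", ", ".join(missing))
```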
- Open terminal
- Go to the `sql_app` folder inside the repository
- Run the following command: `uvicorn main:app --port 8080`
- If you want to close the FastAPI connection, press Ctrl + C in the terminal.
- If you want to check the documentation, go to the following address once uvicorn has started: `http://127.0.0.1:8080/docs`. `uvicorn` will show you which link is used when running the application.
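Besides the interactive `/docs` page, FastAPI by default also serves the machine-readable schema at `/openapi.json`, which is handy for discovering endpoints from a script. The sketch below assumes that default has not been changed in this app and that the API runs on port 8080 as above; `fetch_openapi` and `list_endpoints` are illustrative names.

```python
import json
import urllib.request

HTTP_METHODS = {"get", "post", "put", "delete", "patch", "options", "head", "trace"}


def fetch_openapi(base_url: str = "http://127.0.0.1:8080", timeout: float = 5.0) -> dict:
    """Download the OpenAPI schema (assumes FastAPI's default /openapi.json)."""
    with urllib.request.urlopen(f"{base_url}/openapi.json", timeout=timeout) as response:
        return json.load(response)


def list_endpoints(schema: dict) -> list[tuple[str, str]]:
    """Return sorted (METHOD, path) pairs found in an OpenAPI schema."""
    endpoints = []
    for path, operations in schema.get("paths", {}).items():
        for method in operations:
            if method in HTTP_METHODS:  # ignore keys such as "parameters"
                endpoints.append((method.upper(), path))
    return sorted(endpoints)


# Usage (with uvicorn running):
#   for method, path in list_endpoints(fetch_openapi()):
#       print(method, path)
```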
- Open terminal
- Go to the `flask_app` folder inside the repository
- Run the following command: `gunicorn app:app --bind 127.0.0.1:8000`
- If you want to close the Flask connection, press Ctrl + C in the terminal.
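With the front end running locally, a first scraping exercise could be to collect the links on a page. The following is a rough sketch using only the standard library; it makes no assumptions about the site's actual markup beyond ordinary `<a href="...">` tags, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str = "http://127.0.0.1:8000") -> list[str]:
    """Return all link targets in `html` as absolute URLs, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


# Usage (with gunicorn running):
#   import urllib.request
#   with urllib.request.urlopen("http://127.0.0.1:8000") as response:
#       print(extract_links(response.read().decode("utf-8")))
```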
- Open the `simulate.R` file within the `src/simulate` folder
- Make your adjustments
- Run the complete file top-down, and the databases will be updated.
- Raw data with song names used in simulating the data: https://github.com/mahkaila/songnames
- Featured artist on the landing page: A trumpet player from Caracas, Venezuela. Credits to Ana Maria Arevalo Gosen for providing the photo.
- Album art by dicebear.com, thumbs & shapes libraries
- Avatars by dicebear.com, avataaars library