Code for parsing 2k+ HTML pages of UofT work study job postings (Aug-Sept 2024); see releases for resulting database files (.db, .sql, .json, .csv, .xlsx)

sadmanca/uoft-work-study-jobs-2024

UofT Work Study Jobs Database (2024-2025 Fall/Winter)

This repository contains code that parses 2000+ HTML pages of work study job postings (August to September 2024) at the University of Toronto (UofT) into a single SQLite database (.db) file.

Releases

See the releases page for the latest SQLite database file (.db) of work study job postings. Other formats (.json, .sql, .csv, .xlsx) are also available for download.
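
Since this README does not document the database schema, a reasonable first step after downloading the .db file is to list its tables and preview a few rows. The following is a minimal sketch using only Python's standard sqlite3 module; the helper names list_tables and preview_table are hypothetical and not part of this repository, and no table or column names are assumed.

```python
import sqlite3


def list_tables(db_path):
    """Return the names of all tables in the SQLite file at db_path."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


def preview_table(db_path, table, limit=5):
    """Return (column_names, rows) for the first `limit` rows of `table`."""
    conn = sqlite3.connect(db_path)
    try:
        # The table name is quoted because it cannot be bound as a parameter;
        # pass only names returned by list_tables().
        cursor = conn.execute(f'SELECT * FROM "{table}" LIMIT ?', (limit,))
        columns = [description[0] for description in cursor.description]
        return columns, cursor.fetchall()
    finally:
        conn.close()
```

For example, list_tables("jobs.db") returns the table names in a downloaded file (the filename jobs.db is a placeholder; use the name of the file from the release), and preview_table("jobs.db", name) shows the columns and first rows of one of them.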

File Structure

  • parse_folder_to_db.py: main Python script, which recursively parses the HTML pages in a folder.
  • requirements.txt: Python dependencies required by the script.

Getting Started

To get started with this project, clone the repository and install the necessary Python dependencies.

git clone https://github.com/sadmanca/uoft-work-study-jobs-2024.git
cd uoft-work-study-jobs-2024
pip install -r requirements.txt

Usage

To parse the HTML pages and store the data in a database, run the parse_folder_to_db.py script, passing the folder that contains the HTML files with --directory.

python parse_folder_to_db.py --directory <<DIR_WITH_HTML_FILES>>
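
To illustrate the general idea of walking a folder recursively and writing one database row per HTML page, here is a minimal sketch built only on the Python standard library. It is not the code in parse_folder_to_db.py: the TitleParser class, the parse_folder function, the pages table, and the choice to extract only the page title are all assumptions made for illustration, and the actual script may extract different fields and rely on the libraries listed in requirements.txt.

```python
import sqlite3
from html.parser import HTMLParser
from pathlib import Path


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element (illustrative only)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        return "".join(self._parts).strip()


def parse_folder(directory, db_path):
    """Recursively parse every .html file under directory into db_path.

    Returns the number of files processed. The `pages` table is hypothetical.
    """
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS pages (path TEXT PRIMARY KEY, title TEXT)"
        )
        count = 0
        # rglob descends into all subfolders of `directory`.
        for html_file in sorted(Path(directory).rglob("*.html")):
            parser = TitleParser()
            parser.feed(html_file.read_text(encoding="utf-8", errors="replace"))
            parser.close()
            conn.execute(
                "INSERT OR REPLACE INTO pages (path, title) VALUES (?, ?)",
                (str(html_file), parser.title),
            )
            count += 1
        conn.commit()
    finally:
        conn.close()
    return count
```

Using the file path as the primary key with INSERT OR REPLACE keeps the sketch safe to re-run on the same folder without creating duplicate rows.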

License

This project is licensed under the MIT License. See the LICENSE file for more details.
