A tiny GitHub searcher built on the GitHub API. No authentication is required. Nothing fancy: it searches for GitHub repositories that match a set of keywords and writes the results to a CSV file. A simple parser and filterer are included.
- Clone the repository 💻
git clone https://github.com/wdika/tiny-github-searcher.git
- Install dependencies ⚙️
pip install -r requirements.txt
- Run the searcher 🏃
This will take a long time, since it searches for all the repositories that match the given keywords. It also pauses for 5 minutes whenever it gets a 403 error, to wait out the GitHub API rate limit. Please be patient.
python src/searcher.py {keywords}
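The pause-on-403 behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in `src/searcher.py`: the function name `fetch_with_pause`, the `max_attempts` limit and the injectable `fetch` and `sleep` callables are all assumptions made so the example is self-contained.

```python
import time
from typing import Callable, Tuple


def fetch_with_pause(
    fetch: Callable[[], Tuple[int, dict]],
    pause_seconds: float = 300,  # 5 minutes, as described above
    max_attempts: int = 5,       # hypothetical safety limit
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call `fetch` until it stops answering with HTTP 403.

    `fetch` is a hypothetical callable that performs one API request
    and returns a (status_code, json_body) pair.
    """
    for _ in range(max_attempts):
        status, body = fetch()
        if status == 403:
            # Rate limited: wait, then try the same request again.
            sleep(pause_seconds)
            continue
        if status != 200:
            raise RuntimeError(f"unexpected HTTP status {status}")
        return body
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

Passing `sleep` as a parameter keeps the waiting logic easy to try out without actually pausing for 5 minutes.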
- Enjoy the results! 😃
- Clean the results 🔍
This will clean the results file by removing duplicates and dead repositories (those that return a 404 error when accessed). This will take a while, since it tries to access every repository that was found. Please be patient.
python src/cleaner.py results/github_repositories_overall.csv
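As a rough sketch of the cleaning step, the snippet below removes duplicate URLs and drops those that answer with a 404. It is an illustration under assumptions, not the contents of `src/cleaner.py`: the helper names `is_alive` and `clean_urls`, the use of a `HEAD` request, and the URL normalisation rule are all hypothetical.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List


def is_alive(url: str, timeout: float = 10.0) -> bool:
    """Return False only when the URL answers with HTTP 404."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except urllib.error.HTTPError as err:
        return err.code != 404
    except urllib.error.URLError:
        # Network failure is not a 404, so the repository is kept (assumption).
        return True


def clean_urls(urls: Iterable[str], check: Callable[[str], bool] = is_alive) -> List[str]:
    """Drop duplicates (keeping first occurrences) and dead repositories."""
    seen = set()
    kept = []
    for url in urls:
        # Assumed normalisation: ignore case, whitespace and a trailing slash.
        key = url.strip().rstrip("/").lower()
        if not key or key in seen:
            continue
        seen.add(key)
        if check(url):
            kept.append(url.strip())
    return kept
```

Because each distinct URL is checked once over the network, the run time grows with the number of repositories found, which matches the note above about being patient.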
- Parse the results 📝
This will parse the results file, extracting the About, Stars, Forks, Language and Keywords fields from every repository. This will take a while, since it will try to access all the repositories that were found. Please be patient.
python src/parser.py results/github_repositories_overall_clean.csv
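One way to picture the extraction is as a mapping from a repository record to the five columns named above. The sketch below assumes the record is the JSON returned by the GitHub REST API for a repository (keys `description`, `stargazers_count`, `forks_count`, `language`, `topics`). Whether `src/parser.py` uses that endpoint, and whether Keywords corresponds to repository topics, are assumptions. The function name `extract_fields` is hypothetical.

```python
def extract_fields(repo: dict) -> dict:
    """Map one repository record to the About/Stars/Forks/Language/Keywords columns."""
    return {
        "About": repo.get("description") or "",
        "Stars": repo.get("stargazers_count", 0),
        "Forks": repo.get("forks_count", 0),
        "Language": repo.get("language") or "",
        # Assumption: the Keywords column holds the repository topics.
        "Keywords": ", ".join(repo.get("topics") or []),
    }


# Made-up record, for illustration only.
example = {
    "description": "An example repository",
    "stargazers_count": 3,
    "forks_count": 1,
    "language": "Python",
    "topics": ["search", "csv"],
}
row = extract_fields(example)
```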
- Filter the results 📌
This will filter the parsed results file, removing empty values, NaNs, invalid characters, etc. from the About, Stars, Forks, Language and Keywords fields.
python src/filterer.py results/github_repositories_overall_analytical.csv
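The filtering could look something like the sketch below. It assumes one possible reading of the step: NaN-like values become empty strings, non-printable characters are stripped, and rows left with no information in any of the five fields are dropped. The names `clean_value`, `filter_rows` and `NAN_LIKE` are hypothetical, and `src/filterer.py` may apply different rules.

```python
FIELDS = ("About", "Stars", "Forks", "Language", "Keywords")
NAN_LIKE = {"", "nan", "none", "null"}  # assumed spellings of "missing"


def clean_value(value) -> str:
    """Strip non-printable characters and turn NaN-like values into ''."""
    text = "" if value is None else str(value)
    text = "".join(ch for ch in text if ch.isprintable()).strip()
    return "" if text.lower() in NAN_LIKE else text


def filter_rows(rows):
    """Clean the five fields of each row and drop rows that end up empty."""
    cleaned = []
    for row in rows:
        new_row = dict(row)
        for field in FIELDS:
            new_row[field] = clean_value(row.get(field))
        if any(new_row[field] for field in FIELDS):
            cleaned.append(new_row)
    return cleaned
```

The rows can come from `csv.DictReader` and be written back with `csv.DictWriter`, so the sketch needs nothing beyond the standard library.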
- That's it! You can now use the results/github_repositories_overall_analytical_filtered.csv file to do whatever you want.
./scripts/search_github_repositories.sh
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
If you have any questions or suggestions, please do not hesitate to contact me at dimitriskarkalousos@gmail.com