agusticonesagago/Github-crawler

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage

About The Project

This project crawls the GitHub search engine and returns the URLs of the results on the first search page. When the search type is repositories, it also returns the owner of each repository and the languages it uses.

Built With

This project uses the following language:

  • Python 3

Getting Started

This section describes what you need to set up the project locally. To get a local copy up and running, follow the steps below.

Prerequisites

You need Python 3 and the packages listed in requirements.txt:

  • pip
    pip install requests~=2.27.1
    pip install beautifulsoup4~=4.11.1
    pip install pytest~=7.1.2

Installation

  1. Install Python 3
  2. Clone the repo
  3. Install the Python packages
    pip install -r requirements.txt

Usage

To run the project, enter the search parameters in the input_data file. The output is printed to the terminal; a future version could write it to an output file instead, depending on the context of use.
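
As a rough illustration of how such an input might become a search request, the sketch below parses the JSON, builds a GitHub search URL from the keywords and type, and picks one proxy at random. The helper names `build_search_url` and `pick_proxy` are hypothetical and not taken from this repository, and the URL form `https://github.com/search?q=...&type=...` is an assumption about what the crawler requests.

```python
import json
import random
from urllib.parse import urlencode

# Hypothetical helpers for illustration; the project's real names may differ.
SEARCH_ENDPOINT = "https://github.com/search"


def build_search_url(keywords, search_type):
    """Join the keywords into one query string and attach the result type."""
    query = urlencode({"q": " ".join(keywords), "type": search_type})
    return f"{SEARCH_ENDPOINT}?{query}"


def pick_proxy(proxies, rng=random):
    """Choose one proxy at random, in the mapping format that requests accepts."""
    address = rng.choice(proxies)
    return {"http": f"http://{address}", "https": f"http://{address}"}


# Sample input with the same fields as the input_data file.
raw_input = """
{
  "keywords": ["openstack", "nova", "css"],
  "proxies": ["194.126.37.94:8080", "13.78.125.167:8080"],
  "type": "Repositories"
}
"""

config = json.loads(raw_input)
search_url = build_search_url(config["keywords"], config["type"])
proxy = pick_proxy(config["proxies"])
print(search_url)
```

The resulting mapping could then be passed to requests, for example `requests.get(search_url, proxies=proxy, timeout=10)`.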

Example 1

  • Input
{
  "keywords": [
    "openstack",
    "nova",
    "css"
  ],
  "proxies": [
    "194.126.37.94:8080",
    "13.78.125.167:8080"
  ],
  "type": "Repositories"
}
  • Output
[
  {
    "url": "https://github.com/atuldjadhav/DropBox-Cloud-Storage",
    "extra": {
      "owner": "atuldjadhav",
      "language_stats": {
        "CSS": 52,
        "JavaScript": 47.2,
        "HTML": 0.8
      }
    }
  }
]
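
A minimal sketch of how one output entry of this shape could be assembled is shown here. It assumes the owner can be read from the first path segment of the repository URL; the project may instead scrape it from the page. The helper names `owner_from_url` and `make_result` are hypothetical.

```python
import json
from urllib.parse import urlparse


def owner_from_url(repo_url):
    """Return the first path segment of a repository URL (assumed to be the owner)."""
    segments = urlparse(repo_url).path.strip("/").split("/")
    return segments[0]


def make_result(repo_url, language_stats=None):
    """Build one output entry; 'extra' is added only when language data is given."""
    result = {"url": repo_url}
    if language_stats is not None:
        result["extra"] = {
            "owner": owner_from_url(repo_url),
            "language_stats": language_stats,
        }
    return result


entry = make_result(
    "https://github.com/atuldjadhav/DropBox-Cloud-Storage",
    {"CSS": 52, "JavaScript": 47.2, "HTML": 0.8},
)
print(json.dumps([entry], indent=2))
```

Calling `make_result` with only a URL yields an entry without the `extra` field, matching the shape of non-repository results.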

Example 2

  • Input
{
  "keywords": [
    "python",
    "django-rest-framework",
    "jwt"
  ],
  "proxies": [
    "194.126.37.94:8080",
    "13.78.125.167:8080"
  ],
  "type": "Issues"
}
  • Output
[
  {
    "url": "https://github.com/GetBlimp/django-rest-framework-jwt"
  },
  {
    "url": "https://github.com/lock8/django-rest-framework-jwt-refresh-token"
  },
  {
    "url": "https://github.com/City-of-Helsinki/tunnistamo"
  },
  {
    "url": "https://github.com/chessbr/rest-jwt-permission"
  },
  {
    "url": "https://github.com/rishabhiitbhu/djangular"
  },
  {
    "url": "https://github.com/vaibhavkollipara/ChatroomApi"
  }
]

For more examples and information, please refer to the Documentation.
