Volley Stats

Command-line tool to scrape volleyball statistics from Data Project Web Competition websites.

Volley Stats exports, in CSV format, the statistics of volleyball matches and competitions organized by entities that use Data Project WCM. It collects the statistics of individual matches and the match lists of competitions, and it can automatically retrieve the statistics of every match in a competition matches list.

It also documents the URL structure of Web Competition websites, which makes it easier to find the identifiers (mID, ID, PID), and lists the acronyms of the main entities that use Data Project Management.

This tool is not affiliated with Genius Sports Italy.

Installation

Requirement

  • Python 3.8+
pip install volleystats

Documentation

Extracted Data

  • Competition

    • Competition ID
    • Home Team
    • Guest Team
    • Home Points
    • Guest Points
    • Date
    • Stadium
  • Match

    • Match ID
    • Match Date
    • Home Team
    • Guest Team
    • Coach
    • Stadium
    • Total Points
    • Break Points
    • Win-Lost
    • Total Serves
    • Serve Errors
    • Serve Points
    • Total Receptions
    • Reception Errors
    • Positive Pass Percentage (Pos%)
    • Excellent/Perfect Pass Percentage (Exc.%)
    • Total Attacks
    • Attack Errors
    • Blocked Attack
    • Attack Points (Exc.)
    • Attack Points Percentage (Exc.%)
    • Block Points

Usage

volleystats [--help] --fed FED (--match MATCH | --comp COMP | --batch CSV_FILE_PATH) [--pid PID] [--log]
  • --fed, -f: Federation Acronym (required)
  • --match, -m: Match ID of a single match whose statistics are scraped (required unless --comp or --batch is provided)
  • --comp, -c: Competition ID whose list of matches is scraped (required unless --match or --batch is provided)
  • --pid, -p: PID of the competition (optional; used only together with --comp)
  • --batch, -b: Path to a CSV file with Match IDs, i.e., the output of Competition Matches (required unless --match or --comp is provided)
  • --log, -l: Show log messages while scraping
  • --help, -h: Show help message
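The rules above (a required federation acronym, exactly one of --match, --comp or --batch, and --pid only together with --comp) map naturally onto Python's argparse. The following is a minimal sketch for illustration, not volleystats' own source; the function names and the explicit --pid check are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Illustrative parser mirroring the documented flags; not volleystats' own code.
    parser = argparse.ArgumentParser(prog="volleystats")
    parser.add_argument("--fed", "-f", required=True, help="Federation acronym")
    # Exactly one of --match, --comp or --batch must be given.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--match", "-m", help="Match ID of a single match")
    group.add_argument("--comp", "-c", help="Competition ID")
    group.add_argument("--batch", "-b", help="Path to a Competition Matches CSV file")
    parser.add_argument("--pid", "-p", help="Competition PID (only with --comp)")
    parser.add_argument("--log", "-l", action="store_true", help="Show log messages")
    return parser


def parse(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # argparse cannot express "--pid requires --comp" directly, so check it here.
    if args.pid is not None and args.comp is None:
        parser.error("--pid can only be used together with --comp")
    return args
```

argparse has no built-in way to make one option depend on another, which is why the --pid rule is checked after parsing; every invalid combination exits with status 2.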

Match

volleystats --fed FED --match MATCH

Examples

  • Brazilian Volleyball Confederation

  • Lithuanian Volleyball Federation

Competition Matches

volleystats --fed FED --comp COMP

Example

Competition Matches with PID

In some competitions, the PID distinguishes between stages of a season, such as the regular season and the playoffs. In those cases, you need to pass the PID to obtain the statistics of each stage separately.

volleystats --fed FED --comp COMP --pid PID

Examples

Matches via Competition Matches file

volleystats --fed FED --batch CSV_FILE_PATH
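Batch mode automates what you could otherwise do by hand: run the Match command once for each match ID in a Competition Matches file. The sketch below only illustrates that idea by turning such a file into one `volleystats --fed FED --match ID` command per row. The function name is hypothetical, and the column name `Match ID` is an assumption, so check the header of your own file.

```python
import csv
from pathlib import Path


def match_commands(fed, csv_path, id_column="Match ID"):
    """Build one single-match volleystats command per row of a Competition Matches CSV.

    The default column name "Match ID" is an assumption; pass the actual
    header used in your file if it differs.
    """
    commands = []
    # "utf-8-sig" also accepts files that start with a byte order mark.
    with Path(csv_path).open(newline="", encoding="utf-8-sig") as f:
        for row in csv.DictReader(f):
            match_id = (row.get(id_column) or "").strip()
            if match_id:  # skip rows without a match ID
                commands.append(["volleystats", "--fed", fed, "--match", match_id])
    return commands
```

Each command can be run with `subprocess.run`, although `--batch` is the simpler option in practice.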

Example

  • Brazilian Volleyball Confederation
    • Data Project website: https://cbv-web.dataproject.com/MatchStatistics.aspx?mID=ID
    • Federation Acronym: CBV
    • CSV file path (output of the Competition Matches): data/cbv-18-2022-2023-competition-matches.csv
    • Command: $ volleystats --fed cbv --batch data/cbv-18-2022-2023-competition-matches.csv
    • Output files:
      data/cbv-1623-22-10-28-guest-baruerivolleyballclub.csv
      data/cbv-1623-22-10-28-home-fluminense.csv
      data/cbv-1618-2022-11-01-guest-energis8sãocaetano.csv
      data/cbv-1618-2022-11-01-home-esporteclubepinheiros.csv
      data/cbv-1619-2022-11-01-guest-abelmodavolei.csv
      data/cbv-1619-2022-11-01-home-gerdauminas.csv
      ...
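The output files listed above follow a recognizable pattern: federation acronym, match ID, match date, `home` or `guest`, and the team name in lowercase without spaces. Note that the date appears both as `YY-MM-DD` and `YYYY-MM-DD` in these examples. The sketch below is inferred only from these file names, not from the tool's code, and the function name is hypothetical; it splits such a name into its parts, which can help when grouping the files of a batch run.

```python
import re
from typing import Optional

# Pattern inferred from the example file names above; it is an assumption,
# not a documented guarantee of volleystats.
FILENAME_RE = re.compile(
    r"^(?P<fed>[^-]+)-(?P<match_id>\d+)-(?P<date>\d{2}(?:\d{2})?-\d{2}-\d{2})"
    r"-(?P<side>home|guest)-(?P<team>.+)\.csv$"
)


def parse_output_name(name: str) -> Optional[dict]:
    """Split a match output file name into its parts, or return None if it does not match."""
    m = FILENAME_RE.match(name)
    return m.groupdict() if m else None
```

For example, parsing every name in `data/` and grouping the results by `match_id` pairs the home and guest files of the same match; the Competition Matches file itself does not match the pattern and yields `None`.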
      

Help

volleystats --help

Log

volleystats --fed FED (--match MATCH | --comp COMP | --batch CSV_FILE_PATH) --log

Output messages

                    .
                    |`.
                    |  `.
                    |-_  `.
                    |  -_  `._
____________________|____-_ _|_______________,
',                         -_|                ',
  ',                         |                  ',
    ',                       |                    ',
      ',_____________________|______________________',

volleystats: started
volleystats: data/cbv-1623-22-10-28-home-fluminense.csv file was created
volleystats: data/cbv-1623-22-10-28-guest-baruerivolleyballclub.csv file was created
volleystats: finished

Data Project Web Competition URLs structure

  • Hostname: <Fed_Acronym>-web.dataproject.com

  • Pathnames and search parameters:

    • /MainHome

    • /History?ID=<Fed_ID>

    • /CompetitionHome?ID=<Category_ID> (e.g., Women, Men, Pro or Youth)

    • /CompetitionMatches?ID=<Competition_ID>&PID=<PID> (the PID can identify, e.g., the regular season or the playoffs)

    • /MatchStatistics?mID=<Match_ID>&ID=<Competition_ID>
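Because the hostname and query parameters follow this pattern, the identifiers can be assembled into, or read back from, a URL with the standard library. This is a minimal sketch: the function names are hypothetical, and the `.aspx` suffix is taken from the CBV example URL in the batch section, since the paths in this list omit it.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def match_statistics_url(fed: str, match_id: int, competition_id: int) -> str:
    """Build a MatchStatistics URL from the documented hostname and parameters."""
    host = f"{fed.lower()}-web.dataproject.com"
    query = urlencode({"mID": match_id, "ID": competition_id})
    # The ".aspx" suffix follows the CBV example URL; adjust it if a site differs.
    return f"https://{host}/MatchStatistics.aspx?{query}"


def ids_from_url(url: str) -> dict:
    """Extract the federation acronym and the query parameters (mID, ID, PID) from a URL."""
    parsed = urlparse(url)
    # The hostname has the form <fed_acronym>-web.dataproject.com.
    fed = parsed.hostname.split("-web.", 1)[0]
    params = {key: values[0] for key, values in parse_qs(parsed.query).items()}
    return {"fed": fed, **params}
```

Reading the IDs back from a URL copied from the browser is a quick way to find the values to pass to --match, --comp and --pid.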

Federation, Confederation and League Acronyms

European Volleyball

South American Volleyball

Troubleshooting

Match files collected from batch file

In some cases, the tool may produce empty files, usually named <fed_acronym>-<match_id>-guest_stats.csv and <fed_acronym>-<match_id>-home_stats.csv. This happens when a match is hidden in the competition listing, for example because it was canceled or entered incorrectly. The match is not displayed on the page, but it remains in the HTML, so the tool picks it up and writes an empty file. In such cases, you can safely ignore or delete these files.

Some match statistics are only available as PDF files, which cannot be scraped.
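To spot such files after a batch run, you can scan the output directory. The sketch below assumes the default `data/` directory seen in the examples and treats a file as empty when it has at most one non-blank line (nothing, or only a header row); that definition is an assumption about what the tool writes for hidden matches, and the function name is hypothetical.

```python
from pathlib import Path


def find_empty_outputs(directory="data", pattern="*.csv"):
    """Return CSV files that contain at most one non-blank line (no data rows).

    Treating a header-only file as "empty" is an assumption about what the
    tool writes for hidden matches.
    """
    empty = []
    for path in sorted(Path(directory).glob(pattern)):
        with path.open(encoding="utf-8", errors="replace") as f:
            non_blank = sum(1 for line in f if line.strip())
        if non_blank <= 1:
            empty.append(path)
    return empty
```

The function only lists candidates; review the returned paths and remove them with `Path.unlink()` once you have confirmed they belong to hidden matches.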

Development

$ git clone git@github.com:claromes/volleystats.git

$ cd volleystats

$ pip install -r requirements.txt

$ pip install --editable .

Author

Claromes