Skip to content

Latest commit

 

History

History
227 lines (181 loc) · 7.51 KB

README.md

File metadata and controls

227 lines (181 loc) · 7.51 KB

PyBibTextTools

  • SpringerCsv2Bib
  • GetAbstract
  • BibFilesMerge

Dependencies

Run this command to install the dependencies:

  • pip install -r requirements.txt

SpringerCsv2Bib

Convert Springer CSV file to Bibtext file

Run

foo@bar:~$ python SpringerCsv2Bib.py -h
usage: SpringerCsv2Bib.py [-h] -c CSVFILENAME -b BIBFILENAME

optional arguments:
  -h, --help            show this help message and exit
  -c CSVFILENAME, --csvFileName CSVFILENAME
                        CSV file name
  -b BIBFILENAME, --bibFileName BIBFILENAME
                        BibText file name

Example

foo@bar:~$ python SpringerCsv2Bib.py -c "Springer.csv" -b "Springer.bib"

File founded: Springer.csv
Processed: 590
Removid without author: 5
Total Final: 585
Saved file: Springer.bib

GetAbstract

This tools get abstract on digital library, its a Magver function, not official.

Obs 1: ACM need use limit parameter, because ACM blocks if you get many abstracts same time.

Obs 2: In my case, I use proxy, because I access in my house by proxy of my university.

Run

foo@bar:~$ python GetAbstract.py -h
usage: GetAbstract.py [-h] -d {springer,acm,ieee} -f BIBFILENAME [-p PROXY]
                      [-l LIMIT]

optional arguments:
  -h, --help            show this help message and exit
  -d {springer,acm,ieee}, --database {springer,acm,ieee}
                        select database
  -f BIBFILENAME, --bibFileName BIBFILENAME
                        Springer bibFile name
  -p PROXY, --proxy PROXY
                        internet proxy, ex:
                        https://john:password123@palmas.pucrs.br:4001
  -l LIMIT, --limit LIMIT
                        abstract load limit

Example

foo@bar:~$ python GetAbstract.py -d acm -f "ACM.bib" -l 10 -p https://peter:mypass@palmas.pucrs.br:4001

Had Abstract: 85
Url errors: 0
Loaded Abstract: 10
Total Entries: 585
Limit to process: 10
Processed: 95
Left: 490

or

foo@bar:~$ python GetAbstract.py -d springer -f "Springer.bib"

Had Abstract: 85
Url errors: 0
Loaded Abstract: 10
Total Entries: 585
Limit to process: 10
Processed: 95
Left: 490

BibFilesMerge

Merge BibTex files and:

  • remove duplicate entries
  • in some cases merge information before removing duplicates
  • remove entries that not have:
    • author or
    • title or
    • year or
    • journal name or conference name

This tool has been tested with these digital library files:

  • ACM Digital Library
  • IEEE Xplore
  • Scopus
  • SpringerLink
  • ScienceDirect - ElsevierWeb of Science
  • Web of Science (thanks @dineiar for this)

Run

foo@bar:~$ python BibFilesMerge.py -h
usage: BibFilesMerge.py [-h] -p FOLDERPATH [-f [FILELIST [FILELIST ...]]]
                        [-o FILENAMEOUT] [-e [EXCLUDE [EXCLUDE ...]]] [-l]

optional arguments:
  -h, --help            show this help message and exit
  -p FOLDERPATH, --folderPath FOLDERPATH
                        Bib files folder path
  -f [FILELIST [FILELIST ...]], --fileList [FILELIST [FILELIST ...]]
                        bib file name list, e.g. -f IEEE.bib ACM.bib
                        science.bib Springer.bib
  -o FILENAMEOUT, --fileNameOut FILENAMEOUT
                        File name of merged file
  -e [EXCLUDE [EXCLUDE ...]], --exclude [EXCLUDE [EXCLUDE ...]]
                        bib with entries to be removed from others, e.g. -e
                        FirstExecution.bib SecondExecution.bib
  -l, --logProcess      Log processing to CSV files

Example

foo@bar:~$ python BibFilesMerge.py -p output/ -o 2019-2.bib -f 2019-2/ScienceDirect1.bib 2019-2/ScienceDirect2.bib 2019-2/Scopus.bib -e 2018/ACM.bib 2018/IEEE.bib 2018/ScienceDirect.bib 2018/SCOPUS.bib 2019/ACM.bib 2019/IEEE.bib 2019/ScienceDirect1.bib 2019/ScienceDirect2.bib 2019/SCOPUS.bib -l

--folderPath     output/
--fileNameOut    2019-2.bib
--fileList       ['2019-2/ScienceDirect1.bib', '2019-2/ScienceDirect2.bib', '2019-2/Scopus.bib']
--exclude        ['2018/ACM.bib', '2018/IEEE.bib', '2018/ScienceDirect.bib', '2018/SCOPUS.bib', '2019/ACM.bib', '2019/IEEE.bib', '2019/ScienceDirect1.bib', '2019/ScienceDirect2.bib', '2019/SCOPUS.bib']
--logProcess     True

2019-2/ScienceDirect1.bib
2019-2/ScienceDirect2.bib                                  
2019-2/Scopus.bib                                   
                                                       
Total:                   798
No Author:               0
No Year:                 0
No Publisher:            0
Duplicates:              31
Merged:                  25
Excluded from bib:       537
Final:                   230
without Abstract:        0 {'2019-2/ScienceDirect1.bib': 0, '2019-2/ScienceDirect2.bib': 0, '2019-2/Scopus.bib': 0}

The two CSV files created on output folder by the -l switch are:

  • BibFilesMerge_removed.csv, with columns cause, source, key, doi, author, year, title and publish
  • cause is one of: no author, no year, no journal, duplicate of next or duplicate of prev
  • BibFilesMerge_final.csv, with columns key, doi, author, year, title, publish and abstract

Load Bib File Error

Sometimes errors occur while reading the bib file. In this case, note at the end of the error line of the bib file. Then edit the bib file and adjust the error. For example:

foo@bar:~$ python BibFilesMerge.py -p results -f IEEE.bib ACM.bib science.bib Springer.bib -o MyFile.bib

--folderPath     results
--fileNameOut    MyFile.bib
--fileList       ['IEEE.bib', 'ACM.bib', 'science.bib', 'Springer.bib']
--exclude        None
--logProcess     False

IEEE.bib
ACM.bib
science.bib
Traceback (most recent call last):
  File "BibFilesMerge.py", line 146, in <module>
    run(args["folderPath"], args["fileList"], args["fileNameOut"])
  File "BibFilesMerge.py", line 63, in run
    bibData = parse_file(os.path.join(folderPath,bibFileName))
  File "./pybtex\pybtex\database\__init__.py", line 865, in parse_file
    return parser.parse_file(file)
  File "./pybtex\pybtex\database\input\__init__.py", line 54, in parse_file
    self.parse_stream(f)
  File "./pybtex\pybtex\database\input\bibtex.py", line 410, in parse_stream
    return self.parse_string(text)
  File "./pybtex\pybtex\database\input\bibtex.py", line 397, in parse_string
    for entry in entry_iterator:
  File "./pybtex\pybtex\database\input\bibtex.py", line 191, in parse_bibliography
    self.handle_error(error)
  File "./pybtex\pybtex\database\input\bibtex.py", line 383, in handle_error
    report_error(error)
  File "./pybtex\pybtex\errors.py", line 78, in report_error
    raise exception
  File "./pybtex\pybtex\database\input\bibtex.py", line 189, in parse_bibliography
    yield tuple(self.parse_command())
  File "./pybtex\pybtex\database\input\bibtex.py", line 222, in parse_command
    self.handle_error(error)
  File "./pybtex\pybtex\database\input\bibtex.py", line 383, in handle_error
    report_error(error)
  File "./pybtex\pybtex\errors.py", line 78, in report_error
    raise exception
  File "./pybtex\pybtex\database\input\bibtex.py", line 220, in parse_command
    self.required([body_end])
  File "./pybtex\pybtex\scanner.py", line 120, in required
    raise TokenRequired(description, self)
pybtex.scanner.TokenRequired: syntax error in line 2264: '}' expected

Bib file Content in line 2264:

2264 note = "Special issue on Assistive Computer Vision and Robotics - "Assistive Solutions for Mobility, Communication and HMI" ",

just fix it to:

2264 note = "Special issue on Assistive Computer Vision and Robotics - Assistive Solutions for Mobility, Communication and HMI",