GitHub - seguimiento-politico/manifestos-converter: Python script to convert any party manifesto (PDF) into a structured YAML file

Requirements

This script uses the Adobe pdfservices API to extract the PDF text into a structured JSON file.
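As a rough illustration of what "structured JSON" means here: the Adobe extraction output is typically a JSON document with an "elements" list whose entries carry a "Path" (for example "//Document/H1") and a "Text" field. The sketch below groups such elements into sections by heading. The sample data and the helper name group_by_heading are made up for illustration, and this is not necessarily how the scripts in this repository process the file.

```python
import json
import re

# Hypothetical sample shaped like an extraction result; real files carry more fields.
SAMPLE = json.loads("""
{"elements": [
  {"Path": "//Document/H1", "Text": "Health"},
  {"Path": "//Document/P", "Text": "Build more hospitals."},
  {"Path": "//Document/P[2]", "Text": "Hire more nurses."},
  {"Path": "//Document/H1[2]", "Text": "Education"},
  {"Path": "//Document/P[3]", "Text": "Reduce class sizes."}
]}
""")

# A path ending in /H1, /H2, ... (optionally with an index like [2]) is treated as a heading.
HEADING = re.compile(r"/H\d(\[\d+\])?$")


def group_by_heading(data):
    """Return a list of {"title", "paragraphs"} dicts, one per heading."""
    sections = []
    for element in data.get("elements", []):
        text = element.get("Text", "").strip()
        if not text:
            continue  # skip elements without text (figures, empty nodes)
        if HEADING.search(element.get("Path", "")):
            sections.append({"title": text, "paragraphs": []})
        elif sections:
            # Text that appears before the first heading is dropped in this sketch.
            sections[-1]["paragraphs"].append(text)
    return sections


sections = group_by_heading(SAMPLE)
```

A nested list of this kind maps naturally onto YAML, which is one plausible reason to pass through JSON before producing the final file.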

Before using the script, install "pdfservices-sdk" by running "pip install -r requirements.txt" in a terminal.
You also need to obtain the API credentials and copy the resulting file "pdfservices-api-credentials.json" into the root folder.

Copy all the PDFs to be converted into the "Inputs" folder.
Run "python main.py".
The resulting YAML files are written to the "Output" folder. The "Intermediate" folder holds the JSON files created by the Adobe pdfservices_sdk.
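The folder flow described above can be sketched as follows. This is only an illustration of the layout, not the code in main.py: the function name plan_conversions is hypothetical, and the assumption that each JSON and YAML file reuses the PDF's base name is mine.

```python
from pathlib import Path


def plan_conversions(root):
    """Map each PDF in Inputs to its assumed JSON and YAML destinations."""
    root = Path(root)
    plan = []
    # Only *.pdf files are considered; other files in Inputs are ignored.
    for pdf in sorted((root / "Inputs").glob("*.pdf")):
        plan.append({
            "pdf": pdf,
            "json": root / "Intermediate" / (pdf.stem + ".json"),  # assumed naming
            "yaml": root / "Output" / (pdf.stem + ".yaml"),        # assumed naming
        })
    return plan
```

Calling plan_conversions(".") from the repository root returns one entry per PDF, which makes it easy to check which inputs do not yet have a matching file in "Output".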

If you get an error like "OSError: [Errno 18] Invalid cross-device link: ..." after running the script, you may be able to fix it by following these steps:

Open /usr/local/lib/python3.9/dist-packages/pdfservices_sdk-2.3.0-py3.9.egg/adobe/pdfservices/operation/internal/io/file_ref_impl.py (the exact path depends on your Python installation).
Look for "os.rename(self._file_path, abs_path)".
Replace it with two statements: "shutil.copy(self._file_path, abs_path)" followed by "os.remove(self._file_path)". Note that os.remove takes a single path.
At line 15, add "import shutil".
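The reason for this workaround is that os.rename cannot move a file between two different filesystems (for example, from a temporary directory on one mount to a project folder on another), while copying and then deleting can. A minimal standalone sketch of the same idea, which only falls back to copying when the rename fails for that specific reason, might look like this. The function name move_file is hypothetical and this is not the SDK's own code.

```python
import errno
import os
import shutil


def move_file(src, dst):
    """Rename src to dst, falling back to copy + delete across filesystems."""
    try:
        os.rename(src, dst)
    except OSError as exc:
        if exc.errno != errno.EXDEV:  # EXDEV is the "Invalid cross-device link" error
            raise
        shutil.copy(src, dst)  # copy the data onto the other filesystem
        os.remove(src)         # then delete the original
```

The standard library's shutil.move applies a similar fallback, so it is another option to consider in place of the two-statement replacement.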

Repository files

.vscode
__pycache__
.gitignore
LICENSE
extract_text_from_pdf_exception_sample.py
extract_txt_with_styling_info_from_pdf.py
json2yaml.py
main.py
pdf2json.py
pdf2text.py
pdf2yaml.py
readme.md
requirements.txt
transform_from_json_data_to_yaml_manifest.py