This script uses the Adobe pdfservices API to extract the PDF text into a structured JSON file.
- Before using it, "pdfservices-sdk" must be installed by typing "pip install -r requirements.txt" in a terminal
- You also need to get the API credentials and copy the resulting file "pdfservices-api-credentials.json" into the root folder
- Copy all the PDF files to be converted into the "Inputs" folder
- Execute "python main.py"
- The resulting YAML files will be written to the "Output" folder. The "Intermediate" folder holds the JSON files created by the Adobe pdfservices_sdk
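
Before executing "python main.py", it can help to confirm that the setup steps above were completed. The following is only a minimal sketch: the helper name "check_setup" is hypothetical and is not part of this script or of the Adobe SDK. It only checks for the credentials file and the "Inputs" folder named above.

```python
from pathlib import Path


def check_setup(root="."):
    """Return a list of setup problems found under root (empty if none).

    Hypothetical helper, not part of main.py or the Adobe SDK.
    """
    root = Path(root)
    problems = []

    # The credentials file is expected in the root folder.
    if not (root / "pdfservices-api-credentials.json").is_file():
        problems.append("missing pdfservices-api-credentials.json")

    # The PDFs to convert are expected in the "Inputs" folder.
    inputs = root / "Inputs"
    if not inputs.is_dir():
        problems.append('missing "Inputs" folder')
    elif not any(p.suffix.lower() == ".pdf" for p in inputs.iterdir()):
        problems.append('no PDF files in "Inputs"')

    return problems


if __name__ == "__main__":
    for problem in check_setup():
        print("Setup problem:", problem)
```

An empty list means the credentials file and at least one input PDF were found.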
If, after executing, you get an error like this one: "OSError: [Errno 18] Invalid cross-device link: ..." you may fix it by following these steps:
- open /usr/local/lib/python3.9/dist-packages/pdfservices_sdk-2.3.0-py3.9.egg/adobe/pdfservices/operation/internal/io/file_ref_impl.py
- look for "os.rename(self._file_path, abs_path)"
- replace it with the two lines "shutil.copy(self._file_path, abs_path)" and "os.remove(self._file_path)" (note that "os.remove" takes only the path of the file to delete)
- at line 15, add "import shutil"
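
The error occurs because "os.rename" cannot move a file from one filesystem to another (for example, from a temporary directory to a mounted volume). The idea behind the fix above can be sketched as a standalone function. This is an illustration only, not the SDK's code: the function name "move_across_devices" is hypothetical, and it keeps the fast rename whenever that works.

```python
import errno
import os
import shutil


def move_across_devices(src, dst):
    """Move src to dst, falling back to copy + delete across filesystems.

    Hypothetical helper illustrating the fix; not part of pdfservices_sdk.
    """
    try:
        # Fast path: works when src and dst are on the same filesystem.
        os.rename(src, dst)
    except OSError as exc:
        # Errno 18 (EXDEV) is the "Invalid cross-device link" error.
        if exc.errno != errno.EXDEV:
            raise
        shutil.copy(src, dst)
        os.remove(src)
```

The standard library function "shutil.move" applies a similar fallback on its own, so it may be used as a single-line replacement instead.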