Merge pull request #11 from ChristophWenk/feature/cleanup
Conda environment
ChristophWenk authored Jul 10, 2022
2 parents f226eba + c685aa4 commit f6143e3
Showing 4 changed files with 46 additions and 14 deletions.
10 changes: 8 additions & 2 deletions Readme.md
@@ -24,10 +24,16 @@ General functionality should look like this:
- pypdf2 2.4.2
- dateparser 1.1.1

Other Python or package versions might work but have not been tested.
Other Python or package versions might work but have not been tested.

A conda environment configuration is provided in
the `environment.yml` file. You can set it up with `conda env create -f environment.yml`. Activate it with
`conda activate PDFSorter`.

## Document Type Configuration
New document types can be added by creating new configuration files. The process is described below.
Place the files in the directory defined in `settings.config_files_dir`.
The default is `'../resources/config_files'`.
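
To make the lookup concrete, here is a minimal sketch of how the files in the configuration directory could be enumerated. It is not taken from the project: the helper name `list_config_files` and the `.json` file names are hypothetical, and a temporary directory stands in for `settings.config_files_dir`.

```python
import tempfile
from pathlib import Path


def list_config_files(config_files_dir):
    """Return the regular files in the configuration directory, sorted by name."""
    directory = Path(config_files_dir)
    if not directory.is_dir():
        raise NotADirectoryError(f"Configuration directory not found: {directory}")
    return sorted(entry for entry in directory.iterdir() if entry.is_file())


# A temporary directory stands in for settings.config_files_dir (assumption).
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "b_config.json").write_text("{}")  # hypothetical file name
    (Path(tmp) / "a_config.json").write_text("{}")  # hypothetical file name
    names = [path.name for path in list_config_files(tmp)]

print(names)
```

Raising early when the directory is missing surfaces a misconfigured path before any PDF is processed.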
### Configuration File Name
The configuration file name must follow the scheme below. Both the [Company] and the [Document Type] values must
appear in the PDF text content. This is only used to select the correct configuration file for the PDF in
@@ -78,7 +84,7 @@ the configuration file and may include property keys generated from the regex pa
"target_directory": "F:\\Dokumente\\Rechnungen\\Helsana\\Leistungsabrechnungen",
"file_name_format": "{company_name}_{date}_{document_type}_{document_id}.pdf",
"document_id": "ABCD",
"date": "2022-01-01"
"date": "2022-12-31"
}
```
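
As a rough illustration of how `file_name_format` could be expanded, the sketch below fills the placeholders with Python's built-in `str.format`. This is an assumption about the mechanism, not a copy of the project's code. The helper name `build_file_name` and the `company_name` and `document_type` values are hypothetical.

```python
def build_file_name(file_name_format, properties):
    """Fill the {placeholders} in file_name_format from a dict of properties."""
    try:
        return file_name_format.format(**properties)
    except KeyError as missing:
        # A placeholder in the format string has no matching property.
        raise ValueError(f"No value for placeholder {missing} in file name format") from None


properties = {
    "company_name": "Helsana",               # hypothetical value
    "document_type": "Leistungsabrechnung",  # hypothetical value
    "document_id": "ABCD",
    "date": "2022-12-31",
}
file_name = build_file_name(
    "{company_name}_{date}_{document_type}_{document_id}.pdf", properties
)
print(file_name)  # Helsana_2022-12-31_Leistungsabrechnung_ABCD.pdf
```

Converting the `KeyError` into a `ValueError` with a descriptive message makes it easier to tell which placeholder could not be filled.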

29 changes: 29 additions & 0 deletions environment.yml
@@ -0,0 +1,29 @@
name: PDFSorter
channels:
  - defaults
dependencies:
  - bzip2=1.0.8=he774522_0
  - ca-certificates=2022.4.26=haa95532_0
  - certifi=2022.6.15=py310haa95532_0
  - dateparser=1.1.1=pyhd3eb1b0_0
  - libffi=3.4.2=hd77b12b_4
  - openssl=1.1.1p=h2bbff1b_0
  - pip=21.2.4=py310haa95532_0
  - python=3.10.4=hbb2ffb3_0
  - python-dateutil=2.8.2=pyhd3eb1b0_0
  - pytz=2022.1=py310haa95532_0
  - regex=2021.8.3=py310h2bbff1b_0
  - setuptools=61.2.0=py310haa95532_0
  - six=1.16.0=pyhd3eb1b0_1
  - sqlite=3.38.5=h2bbff1b_0
  - tk=8.6.12=h2bbff1b_0
  - tzdata=2022a=hda174b7_0
  - tzlocal=2.1=py310haa95532_0
  - vc=14.2=h21ff451_1
  - vs2015_runtime=14.27.29016=h5e58377_2
  - wheel=0.37.1=pyhd3eb1b0_0
  - wincertstore=0.2=py310haa95532_2
  - xz=5.2.5=h8cc25b3_1
  - zlib=1.2.12=h8cc25b3_2
  - pip:
    - pypdf2==2.4.2
19 changes: 8 additions & 11 deletions pdf_sorter/main.py
@@ -60,10 +60,9 @@ def get_attr_from_regex(config, regex, file_name, not_processed_list, pdf_text):


def process_files(path, config_file_path):
    logger.info("\n"
                "##################################\n"
                "# Starting new PDF Sort Run #\n"
                "##################################")
    logger.info("##################################")
    logger.info("# Starting new PDF Sort Run #")
    logger.info("##################################")
    if settings.dry_run is True:
        logger.warning("Dry Run active: Running in preview mode. No files will be renamed or moved.")

@@ -118,17 +117,15 @@ def process_files(path, config_file_path):
continue

    logger.info('==============================================================================================')
    logger.info("\n"
                "##################################\n"
                "# PDF Sort Run completed #\n"
                "##################################")
    logger.info("##################################")
    logger.info("# PDF Sort Run completed #")
    logger.info("##################################")

    if not_processed_list:
        logger.warning("The following PDF files could not be processed or have just been partially processed:")
        output_list = ""
        for file_name in not_processed_list:
            output_list += "\n" + file_name
        logger.warning("The following PDF files could not be processed or have just been partially processed:" +
                       output_list)
            logger.warning(file_name)


# Main Function
2 changes: 1 addition & 1 deletion pdf_sorter/settings.py
@@ -2,7 +2,7 @@
import logging

# Do (False) or do not (True) rename files and move them
dry_run = True
dry_run = False

# Folder that contains the PDFs to process
pdf_files_dir = 'F:/Downloads/02_pdf_sorter'
