A data labeling tool built with Flask, created for the Teknofest 2024 Turkish Natural Language Processing Competition.
- Installation
- Usage
- Features
- License
- Libraries
- Recommendation
- Screenshots
- Folder Structure
- Clone the repository and change into its directory
$ git clone https://github.com/HEZARTECH/hezartech-data-preprocessing-app
# or
$ gh repo clone HEZARTECH/hezartech-data-preprocessing-app
# and
$ cd hezartech-data-preprocessing-app
- Install the required dependencies
# Note: you can try `pip3` if `pip` is not working for you.
$ pip install -r requirements.txt
- Set up the environment variables
# Note: you can try `python3` if `python` is not working for you.
$ python scripts/make_dotenv.py # please just press enter for all inputs
Note: we have a guest account for the database:
Email: misafir@hezartech.com
Password: deneme123
P.S.: You will see this when make_dotenv.py finishes. ;)
If you want to create a new user instead, simply run:
$ python utils.py # It'll call the method from scripts/make_user.py
- Run the application
$ python app.py
- If you want to use it as a desktop app:
$ python desktop.py
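The environment step above produces a `.env` file. As a rough illustration of what such a file holds and how it gets read, here is a minimal standard-library sketch. Python-Dotenv is listed among the libraries and is presumably what the app relies on (its `load_dotenv()` does this job); the `parse_dotenv` helper and the variable names in the sample are hypothetical, and the real keys are whatever `scripts/make_dotenv.py` writes.

```python
def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Hypothetical variable names, for illustration only.
SAMPLE = """
# example only
SECRET_KEY="change-me"
MONGO_URI=mongodb://localhost:27017
"""

config = parse_dotenv(SAMPLE)
print(config["MONGO_URI"])
```

Keeping these settings in `.env` rather than in the source means credentials stay out of version control.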
- Upload a dataset
- Clean the dataset
- Label the dataset
- Export the labeled dataset
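The clean and export steps can be pictured with a small standard-library sketch. Pandas and OpenPyXL are listed among the libraries and are presumably what the app uses for this; the version below swaps in `csv` and `json` so it runs without extra installs. The cleaning rules (trim whitespace, drop empty rows, drop exact duplicates) and the single `text` column are assumptions for illustration, not the app's actual behavior.

```python
import csv
import io
import json


def clean_rows(rows):
    """Trim whitespace, drop rows with no content, and drop exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        trimmed = {key: (value or "").strip() for key, value in row.items()}
        if not any(trimmed.values()):
            continue  # every cell was empty after trimming
        fingerprint = tuple(trimmed.items())
        if fingerprint in seen:
            continue  # same content as an earlier row
        seen.add(fingerprint)
        cleaned.append(trimmed)
    return cleaned


# Hypothetical upload with a single "text" column.
RAW_CSV = "text\n  Internet is slow  \n   \nInternet is slow\nRouter keeps restarting\n"

rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
cleaned = clean_rows(rows)

# Export the cleaned rows as JSON, keeping non-ASCII (e.g. Turkish) characters readable.
exported = json.dumps(cleaned, ensure_ascii=False, indent=2)
print(exported)
```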
- Upload a data file
- Clean the data
- Label the dataset with an NER-based labeling editor.
- Export either the cleaned-only dataset or the cleaned and labeled dataset.
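To make the NER-based labeling feature more concrete, here is a sketch of one common way to store entity labels: as character-offset spans over the text. This is an assumption about the general technique, not a description of the editor's actual storage format; the `Span` class, the `add_span` helper, and the `ORG` / `PRODUCT` tags are all hypothetical.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Span:
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # hypothetical tag name


def add_span(text, spans, start, end, label):
    """Add a labeled span, rejecting out-of-range or overlapping selections."""
    if not (0 <= start < end <= len(text)):
        raise ValueError("span is outside the text")
    for other in spans:
        if start < other.end and other.start < end:
            raise ValueError("span overlaps an existing label")
    spans.append(Span(start, end, label))
    spans.sort(key=lambda s: s.start)  # keep spans in reading order


text = "Turknet modem keeps disconnecting"
spans = []
add_span(text, spans, 8, 13, "PRODUCT")
add_span(text, spans, 0, 7, "ORG")

# One labeled record, ready to be written out as JSON.
record = {"text": text, "entities": [asdict(s) for s in spans]}
print(json.dumps(record, ensure_ascii=False))
```

Storing offsets instead of copied substrings keeps the original text untouched and lets the same text carry several labels.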
This project is licensed under the Apache 2.0 License. See the LICENSE file for details.
If you want to look at the important files, here is a list:
./scripts/make_dotenv.py
./scripts/make_user.py
./app.py
./utils.py
./desktop.py
If you want to try the system with an example data file, you can use example_upload_file/TurknetDestek.csv.
- Flask
- Flask-Cors
- Werkzeug
- MongoDB
- Requests
- PyMongo
- PyWebView
- Pandas
- OpenPyXL
- Python-Dotenv
- Bcrypt
- PwInput
.
├── app.py # Main app file
├── desktop.py # Webview desktop app
├── LICENSE.md # Our app's license
├── README.md # This file
├── requirements.txt # For installing the required libraries
├── responses.py # For `app.py`
├── screenshots # For `README.md`
│   ├── clean_dataset.png
│   ├── export_dataset.png
│   ├── homepage.png
│   ├── label_dataset.png
│   ├── label_dataset_editor.png
│   ├── login.png
│   └── upload_dataset.png
├── scripts # For setup
│   └── make_dotenv.py
├── static # For `app.py`
│   ├── css
│   │   └── login.css
│   ├── HEZARTECH.png
│   ├── HEZARTECH_HORIZONTAL.png
│   ├── js
│   │   └── label_editor.js
│   ├── robots.txt
│   └── uploads # Example uploads
│       ├── 20240622172322_sikayet_var.csv
│       └── clean_data
│           ├── 20240622172322_sikayet_var.csv
│           ├── 20240622172322_sikayet_var.json
│           └── 20240622172322_sikayet_var.xlsx
├── templates # For `app.py`
│   ├── base.html
│   ├── clean_dataset.html
│   ├── export_dataset.html
│   ├── index.html
│   ├── label_dataset.html
│   ├── label_dataset_editor.html
│   ├── login.html
│   └── upload_dataset.html
├── utils.py # For `app.py`
└── wsgi.py # Alternative to `app.py`: imports the `app` variable from `app.py` and runs the server.