
A data preprocessing app for creating a dataset for the Teknofest Turkish Natural Language Processing Competition.



HEZARTECH/hezartech-data-preprocessing-app


Hezartech Data Labeling Tool

A data labeling tool built with Flask, created for the Teknofest 2024 Turkish Natural Language Processing Competition.

Table of Contents

  • Installation
  • Usage
  • Features
  • License
  • Recommendation
  • Libraries
  • Screenshots
  • Folder Structure

Installation

  1. Clone the repository and change into its directory
$ git clone https://github.com/HEZARTECH/hezartech-data-preprocessing-app
# or
$ gh repo clone HEZARTECH/hezartech-data-preprocessing-app
# and then
$ cd hezartech-data-preprocessing-app
  2. Install the required dependencies
# Note: you can try `pip3` if `pip` is not working for you.
$ pip install -r requirements.txt
  3. Set up the environment variables
# Note: you can try `python3` if `python` is not working for you.
$ python scripts/make_dotenv.py # just press Enter at every prompt

Note: there is a guest account for the database:

Email: misafir@hezartech.com
Password: deneme123

P.S.: These credentials are also shown when make_dotenv.py finishes. ;)

If you want to create a new user instead, simply run:

$ python utils.py # calls the user-creation method from scripts/make_user.py
  4. Run the application
$ python app.py
  5. To run it as a desktop app instead:
$ python desktop.py
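
The instruction to "just press Enter at every prompt" in step 3 implies that each prompt in the setup script falls back to a default value. The script's source isn't reproduced in this README, so the following is only a minimal sketch of that prompt-with-default pattern. The keys `SECRET_KEY` and `DATABASE_URL`, their defaults, and the helper names are hypothetical, not what `scripts/make_dotenv.py` actually writes.

```python
from collections.abc import Callable, Mapping

# Hypothetical keys and defaults for illustration only;
# the real scripts/make_dotenv.py may ask for different values.
DEFAULTS = {
    "SECRET_KEY": "change-me",
    "DATABASE_URL": "sqlite:///app.db",
}


def collect_settings(
    defaults: Mapping[str, str],
    ask: Callable[[str], str] = input,
) -> dict[str, str]:
    """Prompt for each key; an empty answer (just pressing Enter) keeps the default."""
    settings = {}
    for key, default in defaults.items():
        answer = ask(f"{key} [{default}]: ").strip()
        settings[key] = answer or default
    return settings


def render_dotenv(settings: Mapping[str, str]) -> str:
    """Serialize settings as KEY=value lines, one per line, in .env style."""
    return "".join(f"{key}={value}\n" for key, value in settings.items())
```

To write the file, you would call something like `Path(".env").write_text(render_dotenv(collect_settings(DEFAULTS)), encoding="utf-8")`. Passing `ask` as a parameter keeps the prompting logic testable without a terminal.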

Usage

  1. Upload a dataset

  2. Clean the dataset

  3. Label the dataset

  4. Export the labeled dataset
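
The README doesn't say which rules the cleaning step applies. As a hedged illustration only, the sketch below assumes a common baseline: collapse whitespace, then drop empty and duplicate rows. It then exports the result as CSV and JSON. The example `clean_data` folder in the Folder Structure section holds .csv, .json and .xlsx versions of the same file. This sketch covers only the two standard-library formats; XLSX would need a library such as openpyxl. The `text` column name and all three functions are hypothetical.

```python
import csv
import io
import json
import re

WHITESPACE = re.compile(r"\s+")


def clean_rows(rows: list[dict[str, str]], column: str = "text") -> list[dict[str, str]]:
    """Collapse whitespace in `column`, then drop empty and duplicate rows.

    Assumed rules and a hypothetical default column name, for illustration;
    the app's real cleaning step may differ.
    """
    seen: set[str] = set()
    cleaned = []
    for row in rows:
        value = WHITESPACE.sub(" ", row.get(column, "")).strip()
        if not value or value in seen:
            continue  # skip blank rows and repeats of an already-kept value
        seen.add(value)
        cleaned.append({**row, column: value})  # keep any other columns as-is
    return cleaned


def to_csv(rows: list[dict[str, str]]) -> str:
    """Serialize rows as CSV text, using the first row's keys as the header."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows: list[dict[str, str]]) -> str:
    """Serialize rows as a JSON array; ensure_ascii=False keeps Turkish letters readable."""
    return json.dumps(rows, ensure_ascii=False, indent=2)
```

Deduplication happens after whitespace normalization, so `"  Merhaba   dünya "` and `"Merhaba dünya"` count as the same row.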

Features

  • Upload a data file
  • Clean the data
  • Label the dataset with an NER-based labeling editor.
  • Export the cleaned-only or the cleaned-and-labeled dataset.
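
NER labeling editors commonly store annotations as character-offset spans and export them as token-level tags. This README doesn't document the editor's export format, so the sketch below rests on several assumptions: whitespace tokenization, end-exclusive `(start, end, label)` spans, and the BIO tagging scheme. The `spans_to_bio` function and the `PER`/`LOC` labels are hypothetical.

```python
import re

# Hypothetical span format: (start, end, label) with an exclusive end offset.
Span = tuple[int, int, str]


def spans_to_bio(text: str, spans: list[Span]) -> list[tuple[str, str]]:
    """Convert character-offset entity spans to (token, BIO tag) pairs.

    Hypothetical helper, not the editor's documented export. Tokens are
    whitespace-separated. The first token inside a span gets B-<label>,
    later tokens in the same span get I-<label>, and all others get O.
    """
    pairs = []
    previous = None  # index of the span that contained the previous token
    for match in re.finditer(r"\S+", text):
        # Find the span (if any) that fully contains this token.
        current = next(
            (i for i, (start, end, _) in enumerate(spans)
             if start <= match.start() and match.end() <= end),
            None,
        )
        if current is None:
            tag = "O"
        else:
            label = spans[current][2]
            tag = ("I-" if current == previous else "B-") + label
        pairs.append((match.group(), tag))
        previous = current
    return pairs
```

For `"Mustafa Kemal Samsun şehrine gitti"` with spans `(0, 13, "PER")` and `(14, 20, "LOC")`, this yields B-PER, I-PER, B-LOC, O, O. The function tracks span indices rather than labels, so two adjacent spans with the same label each start with their own B- tag. Tokens that only partly overlap a span are tagged O, which is one of several reasonable policies.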

License

This project is licensed under the Apache 2.0 License. See the LICENSE file for details.

Recommendation

If you want to look at the most important files, here is a list:

./scripts/make_dotenv.py
./scripts/make_user.py
./app.py
./utils.py
./desktop.py

To try the system with an example data file, use example_upload_file/TurknetDestek.csv.

Libraries

Screenshots

Login Screen

Index / Homepage Screen

Upload Dataset

Clean Dataset

Label Dataset

Label Dataset Editor

Export Dataset

Folder Structure

.
├── app.py # Main app file
├── desktop.py # Webview desktop app
├── LICENSE.md # The app's license
├── README.md # This file
├── requirements.txt # Required libraries (for `pip install -r`)
├── responses.py # For `app.py`
├── screenshots # For `README.md`
│  ├── clean_dataset.png
│  ├── export_dataset.png
│  ├── homepage.png
│  ├── label_dataset.png
│  ├── label_dataset_editor.png
│  ├── login.png
│  └── upload_dataset.png
├── scripts # For setup
│  ├── make_dotenv.py
│  └── make_user.py
├── static # For `app.py`
│  ├── css
│  │  └── login.css
│  ├── HEZARTECH.png
│  ├── HEZARTECH_HORIZONTAL.png
│  ├── js
│  │  └── label_editor.js
│  ├── robots.txt
│  └── uploads # Example uploads
│     ├── 20240622172322_sikayet_var.csv
│     └── clean_data
│        ├── 20240622172322_sikayet_var.csv
│        ├── 20240622172322_sikayet_var.json
│        └── 20240622172322_sikayet_var.xlsx
├── templates # For `app.py`
│  ├── base.html
│  ├── clean_dataset.html
│  ├── export_dataset.html
│  ├── index.html
│  ├── label_dataset.html
│  ├── label_dataset_editor.html
│  ├── login.html
│  └── upload_dataset.html
├── utils.py # For `app.py`
└── wsgi.py # Alternative to `app.py`: imports the `app` variable from `app.py` and runs the server.
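
The example uploads above share a `YYYYMMDDHHMMSS_` prefix (`20240622172322_sikayet_var.csv`), which suggests that uploaded files are renamed with their upload time. Here is a minimal sketch of that naming scheme. It assumes the prefix is a local timestamp followed by an underscore. `timestamped_name` and `parse_upload_time` are hypothetical helpers inferred from the file names, not the app's confirmed code.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional

STAMP_FORMAT = "%Y%m%d%H%M%S"  # assumed from names like 20240622172322_sikayet_var.csv


def timestamped_name(original: str, now: Optional[datetime] = None) -> str:
    """Prefix a file name with its upload time (hypothetical helper)."""
    stamp = (now or datetime.now()).strftime(STAMP_FORMAT)
    # Keep only the final path component so "../x.csv" cannot escape the upload folder.
    return f"{stamp}_{Path(original).name}"


def parse_upload_time(stored: str) -> datetime:
    """Recover the upload time from a name produced by timestamped_name."""
    return datetime.strptime(stored.split("_", 1)[0], STAMP_FORMAT)
```

The timestamp is fixed-width, so stored names sort chronologically. The `clean_data` folder also reuses the same stamped base name for its .csv, .json and .xlsx outputs.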