This is a React app for the Data Masking Platform. It provides a user interface to interact with a backend service that masks sensitive data in text.
- Enter text to be processed and masked.
- Submit the text to the backend service for processing.
- Display the masked output text.
- Handle error cases gracefully.

To install and run the app locally:
- Clone the repository: `git clone https://github.com/your/repository.git`
- Navigate to the project directory: `cd project-directory`
- Install the dependencies: `npm install`
- Start the development server: `npm start`
- Access the app in your browser at http://localhost:3000.

To use the app:
- Enter the text you want to process in the input field.
- Click the "Submit" button to send the text to the backend service for processing.
- The processed and masked output will be displayed below the input field.

The app is configured to send requests to the backend service at http://127.0.0.1:5000/process_text. If your backend service is running on a different URL, you can modify the endpoint in the `handleSubmit` function of the `App` component.
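
To check that a backend is reachable at the configured URL without going through the React UI, a small Python client along these lines may help. This is only a sketch: the JSON field name `text`, the JSON response, and the helper names `build_request` and `process_text` are assumptions, not something this README specifies.

```python
import json
import urllib.request

# Endpoint quoted in this README; change it if your backend runs elsewhere.
ENDPOINT = "http://127.0.0.1:5000/process_text"


def build_request(text, endpoint=ENDPOINT):
    """Build a JSON POST request. The "text" field name is an assumption."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def process_text(text, endpoint=ENDPOINT, timeout=10):
    """Send the text to the backend and decode the response as JSON (assumed)."""
    request = build_request(text, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `process_text("some text")` requires the backend to be running; `build_request` alone can be used to inspect what would be sent.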

Technologies used:
- React: JavaScript library for building user interfaces.
- Axios: Promise-based HTTP client for making API requests.

Contributions are welcome! If you have any suggestions, bug reports, or feature requests, please open an issue or submit a pull request.

This project is licensed under the MIT License.

Here's an explanation of the NER model code:
- The code begins by importing the necessary libraries: `csv` for reading training data from a CSV file, `spacy` for natural language processing, `random` for shuffling the training data, and `Example` from `spacy.training.example` for creating training examples.
- The function `offsets_to_biluo_tags` converts the entity offsets to a list of BILUO (begin, in, last, unit, out) tags. It takes a spaCy `doc` object and a list of `entities` as input and returns a list of tags.
- The function `train_ner_model` trains a named entity recognition (NER) model using the provided training data. It takes `training_data` as input, which is a list of tuples containing the full text, masked text, entity spans, and other information.
  - It initializes a blank English pipeline using `spacy.blank("en")`.
  - It adds the NER component to the pipeline of the model.
  - It extracts the unique entity labels from the training data and adds them as labels in the NER component.
  - It prepares the training data in spaCy format by converting the entity spans to the required format.
  - It trains the NER model for a specified number of iterations.
  - Finally, it returns the trained NER model.
- The code reads the training data from a CSV file named `data.csv` and stores it in the `training_data` list. The CSV file should have columns for full text, masked text, entity spans, PII (Personally Identifiable Information) entities, and other entities.
- The `train_ner_model` function is called with the `training_data` to obtain the trained NER model.
- The trained NER model is saved to disk using the `to_disk` method, and it is stored in a directory named `ner_model`.
- The code tests the NER model on a sample text by creating a spaCy `doc` object using the `ner_model` and the sample text.
- The `masked_text` variable is initialized with the sample text. Then, for each entity (`ent`) in `doc.ents`, the corresponding entity text is replaced with the string "{{MASKED}}".
- Finally, the `masked_text` is printed, which contains the sample text with the identified entities replaced by "{{MASKED}}".
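
The tagging scheme behind `offsets_to_biluo_tags` can be illustrated without spaCy. The sketch below is a simplified stand-in, not the project's function: it splits on whitespace instead of using a spaCy `doc`, and the name `biluo_tags` is hypothetical. Each token gets `B-` (begin), `I-` (in), `L-` (last), `U-` (unit, a single-token entity), or `O` (outside).

```python
def biluo_tags(text, entities):
    """Tag whitespace-separated tokens with BILUO labels.

    entities is a list of (start_char, end_char, label) with an exclusive end.
    """
    # Record the character offsets of every whitespace-separated token.
    tokens = []
    pos = 0
    for word in text.split():
        start = text.index(word, pos)
        end = start + len(word)
        tokens.append((start, end))
        pos = end

    tags = ["O"] * len(tokens)
    for ent_start, ent_end, label in entities:
        # Indices of the tokens that lie fully inside the entity span.
        inside = [
            i for i, (s, e) in enumerate(tokens)
            if s >= ent_start and e <= ent_end
        ]
        if not inside:
            continue  # the span does not line up with any token
        if len(inside) == 1:
            tags[inside[0]] = f"U-{label}"
        else:
            tags[inside[0]] = f"B-{label}"
            for i in inside[1:-1]:
                tags[i] = f"I-{label}"
            tags[inside[-1]] = f"L-{label}"
    return tags


print(biluo_tags("John Smith lives in Paris", [(0, 10, "PERSON"), (20, 25, "GPE")]))
# prints ['B-PERSON', 'L-PERSON', 'O', 'O', 'U-GPE']
```

spaCy v3 also provides a function of the same name in `spacy.training`, which works on a tokenized `doc`. This README does not say whether the project uses that function or defines its own.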
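
The data-loading and preparation steps in the walkthrough might look roughly like the following. This is a sketch under several assumptions: the CSV has a header row, the first three columns are full text, masked text, and entity spans, and the spans are stored as a Python-style literal such as `[(0, 10, 'PERSON')]`. The function names `load_training_data` and `to_spacy_format` are hypothetical.

```python
import ast
import csv
import io


def load_training_data(csv_file):
    """Read rows into (full_text, masked_text, spans) tuples.

    Assumes a header row and a span column holding a Python-style literal.
    """
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row
    data = []
    for row in reader:
        full_text, masked_text, raw_spans = row[0], row[1], row[2]
        data.append((full_text, masked_text, ast.literal_eval(raw_spans)))
    return data


def to_spacy_format(training_data):
    """Return (examples, labels).

    Each example is (text, {"entities": [(start, end, label), ...]}),
    the dictionary layout that spaCy's Example.from_dict accepts.
    """
    examples = []
    labels = set()
    for full_text, _masked_text, spans in training_data:
        entities = [(start, end, label) for start, end, label in spans]
        labels.update(label for _, _, label in entities)
        examples.append((full_text, {"entities": entities}))
    return examples, sorted(labels)


# In-memory stand-in for data.csv, so the sketch runs without a file.
sample_csv = io.StringIO(
    "full_text,masked_text,entity_spans\n"
    "John Smith lives in Paris,{{MASKED}} lives in {{MASKED}},"
    "\"[(0, 10, 'PERSON'), (20, 25, 'GPE')]\"\n"
)
examples, labels = to_spacy_format(load_training_data(sample_csv))
print(labels)    # prints ['GPE', 'PERSON']
print(examples)
```

With a real file, `load_training_data(open("data.csv", newline="", encoding="utf-8"))` would take the place of the in-memory sample, provided the file matches the assumed layout.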
This code trains a NER model using the provided training data and demonstrates its usage by masking the entities in a sample text.
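
The masking step can be sketched in plain Python. In the first function, a list of entity strings stands in for the `ent.text` values from `doc.ents`, and each one is replaced as the walkthrough describes. The second function is an alternative rather than the project's method: it replaces character spans (the kind spaCy exposes as `ent.start_char` and `ent.end_char`), so only the detected occurrence is masked. Both function names are hypothetical.

```python
MASK = "{{MASKED}}"


def mask_by_text(text, entity_texts, mask=MASK):
    """Replace every occurrence of each entity string with the mask."""
    masked = text
    for ent_text in entity_texts:
        masked = masked.replace(ent_text, mask)
    return masked


def mask_by_span(text, spans, mask=MASK):
    """Replace (start, end) character spans with the mask.

    Spans are processed right to left so that earlier offsets stay valid
    after each substitution changes the string length.
    """
    masked = text
    for start, end in sorted(spans, reverse=True):
        masked = masked[:start] + mask + masked[end:]
    return masked


sample = "John Smith lives in Paris"
print(mask_by_text(sample, ["John Smith", "Paris"]))
print(mask_by_span(sample, [(0, 10), (20, 25)]))
# both lines print: {{MASKED}} lives in {{MASKED}}
```

The string-replacement version also masks any other occurrence of the same text elsewhere in the input, which may or may not be desirable. The span version touches only the positions it is given.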