Fix docs and package fixes #95

Merged 7 commits on Aug 7, 2023

6 changes: 6 additions & 0 deletions CHANGES.rst
@@ -1,3 +1,9 @@
Version 1.1.1
-------------
- Fix minor bugs in bias analysis.
- Improve fonts and minor details in bias analysis plots.


Version 1.1.0
-------------
- Add bias detection and analysis feature (based on sentiment analysis)
31 changes: 17 additions & 14 deletions README.rst
@@ -11,7 +11,7 @@ Wordview (Work In Progress)
.. image:: https://img.shields.io/pypi/dm/wordview
:alt: PyPI - Downloads

Wordview is a Python package for Exploratory Data Analysis (EDA) and Feature Extraction for text.
Wordview is a Python package for Exploratory Data Analysis (EDA) of text.
Wordview's Python API is open-source and available under the `MIT
license <https://en.wikipedia.org/wiki/MIT_License>`__.

@@ -51,24 +51,25 @@ Wordview calculates several statistics for labels in labeled datasets whether th
See `label analysis documentation pages <./docs/source/labels.rst>`__ for usage and examples.


Feature Extraction
###################

Wordview has various functionalities for feature extraction from text, including Multiword Expressions (MWEs), clusters, anomalies and
outliers, and more. See the following sections as well as the linked documentation page in each section for details.

Multiword Expressions
*********************

Extraction & Analysis of Multiword Expressions
**********************************************
Multiword Expressions (MWEs) are phrases that can be treated as a single
semantic unit, e.g. *swimming pool* and *climate change*. MWEs have
applications in areas including parsing, language models,
language generation, terminology extraction, and topic models. Wordview can extract different types of MWEs from text.
See `MWEs documentation page <./docs/source/mwes.rst>`__ for usage and examples.

Anomalies and Outliers
**********************

Bias Analysis
**************
In Natural Language Processing (NLP), downstream models are only as unbiased and fair as the data on which they are trained.
The Wordview Bias Analysis module is designed to help check that underlying training datasets are free of explicit negative biases related to categories such as gender, race, and religion.
By identifying and rectifying these biases, Wordview aims to support more inclusive, fair, and unbiased NLP applications, leading to better user experiences and more equitable technology.
See the `bias analysis documentation page <./docs/source/bias.rst>`__ for usage and examples.


Analysis of Anomalies and Outliers
**********************************
Anomalies and outliers have wide applications in Machine Learning. While in
some cases, you can capture them and remove them from the data to improve the
performance of a downstream ML model, in other cases, they become the data points
@@ -78,8 +79,10 @@ Wordview offers several anomaly and outlier detection functions.
See `anomalies documentation page <./docs/source/anomalies.rst>`__ for usage and examples.


Clusters
*********


Cluster Analysis
****************
Clustering can be used to identify different groups of documents with similar information, in an unsupervised fashion.
Despite its ability to provide valuable insights into your data, clustering does not require labeled data. See
`wordview`'s `clustering documentation page <./docs/source/clustering.rst>`__ for usage and examples.
2,001 changes: 0 additions & 2,001 deletions data/IMDB_Dataset_sample.csv

This file was deleted.

64 changes: 64 additions & 0 deletions data/mwes.json
@@ -0,0 +1,64 @@
{
"LVC": {
"SHOOT the binding": 26.024772726811,
"achieve this elusive": 24.700867756741808,
"manipulate the wildlife": 24.439810226089847,
"offset the darker": 24.024772726811,
"remove the bindings": 24.024772726811,
"Wish that Anthony": 23.898535506871717,
"Add some French": 23.501618538160958,
"grab a beer": 22.824678372319397,
"steal the 42": 22.50121077075399,
"invoke the spirit": 22.11788213120248
},
"NC2": {
"gordon willis": 20.73998574816305,
"Smoking Barrels": 20.73998574816305,
"sadahiv amrapurkar": 20.73998574816305,
"nihilism nothingness": 20.73998574816305,
"tomato sauce": 20.73998574816305,
"Picket Fences": 20.73998574816305,
"deja vu": 19.73998574816305,
"cargo bay": 19.73998574816305,
"zoo souvenir": 19.155023247441893,
"cake frosting": 19.155023247441893
},
"NC3": {},
"ANC2": {
"bite-sized chunks": 20.73998574816305,
"lizardly snouts": 20.73998574816305,
"behind-the-scenes featurette": 20.73998574816305,
"hidebound conservatives": 20.73998574816305,
"judicious pruning": 20.73998574816305,
"substantial gauge": 19.73998574816305,
"haggish airheads": 19.73998574816305,
"global warming": 19.73998574816305,
"Ukrainian flags": 19.155023247441893,
"well-lit sights": 19.155023247441893
},
"ANC3": {},
"VPC": {
"upside down": 12.673896557705278,
"Stay away": 12.489687330256716,
"put together.": 11.615864436333862,
"sit through": 10.932923610488164,
"ratchet up": 10.82859376031959,
"shoot'em up": 10.82859376031959,
"rip off": 10.719204186026548,
"hunt down": 10.673896557705278,
"screw up": 10.413556261040748,
"scorch out": 10.403479188352796
},
"NP": {
"every penny": 12.779983816094969,
"THE END": 12.067560406191555,
"A JOKE": 11.785789437776176,
"A LOT": 11.048823843609968,
"Either way": 11.033489730101863,
"An absolute": 10.717617935134596,
"half hour": 10.647669057572031,
"no qualms": 10.468522720258676,
"every cliche": 10.458055721207607,
"another user": 10.368209103825127
}
}
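
The file above maps each MWE type (``LVC``, ``NC2``, ``VPC``, ...) to a dictionary of expressions and numeric scores. As a minimal sketch of how such a file could be inspected, the snippet below ranks the expressions within each type. It uses only the standard library and a small excerpt copied from the file. The helper ``top_mwes`` is hypothetical and not part of Wordview, and treating a higher number as a stronger candidate is an assumption, since the diff does not say which measure produced the scores.

```python
import json

# Excerpt in the same shape as data/mwes.json: MWE type -> {expression: score}.
# In practice the text could come from open("data/mwes.json").read() instead.
SAMPLE = """
{
  "NC2": {"tomato sauce": 20.73998574816305, "deja vu": 19.73998574816305},
  "NC3": {},
  "VPC": {
    "sit through": 10.932923610488164,
    "rip off": 10.719204186026548,
    "hunt down": 10.673896557705278
  }
}
"""


def top_mwes(mwes, k=2):
    """Hypothetical helper: return the k highest-scoring expressions per MWE type.

    Assumes a higher score means a stronger candidate, which the source
    file does not state explicitly.
    """
    result = {}
    for mwe_type, scored in mwes.items():
        # Sort (expression, score) pairs by score, largest first.
        ranked = sorted(scored.items(), key=lambda item: item[1], reverse=True)
        result[mwe_type] = [expression for expression, _ in ranked[:k]]
    return result


mwes = json.loads(SAMPLE)
top = top_mwes(mwes)
for mwe_type, expressions in top.items():
    print(mwe_type, expressions)
```

Empty types such as ``NC3`` and ``ANC3`` simply yield an empty list, so the same loop works without special-casing them.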