nimbus-transformer

It's like Nimbus, but it uses a transformer language model.

Written in a Functional Programming style.

Documentation

Getting Started

1. Install Pipenv

Works with macOS, Linux, Windows.

2. Set up virtual environment

pipenv install

This will create a virtual environment with the required:

Python 3.6.8
all the [packages] listed in the Pipfile

3. Open virtual environment

pipenv shell

4. Verify your python version

$ python --version
Python 3.6.8

Usage

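from ntfp.ntfp import get_context, transformer

question = "what is Dr. Foaad Khosmood email?"
# get_context returns three values; only the third (the context) is needed here
_, _, context = get_context(question)
# transformer returns the answer plus a second value, ignored here
answer, _ = transformer(question, context)
print("answer: ", answer)
# prints: answer:  foaad@ calpoly.edu.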

Demo

$ python main.py
To use data.metrics please install scikit-learn. See https://scikit-learn.org/stable/index.html


question: what is Dr. Foaad Khosmood email?
len(context):  911
Converting examples to features: 100%|██| 1/1 [00:00<00:00, 95.61it/s]



answer:  foaad@ calpoly.edu.
appended new row to data.csv

How it works

Assumptions

"Context" is limited to Cal Poly, so expect non-Cal-Poly "Questions" to fail
"Answer" is expected to exist publicly on the web, such that Google can access it.

Pipeline

User asks a Question through a web application.
Scrape Google for Context, limited to 10 URL results.
Store the Context in a database.
Transform ( Question, Context ) >> Answer
Reply with the Answer.
Mark the answer as good or bad to learn from later.
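
The pipeline above can be read as a composition of small functions. The following is a minimal sketch under that assumption, not the project's actual code: `run_pipeline`, `fake_fetch`, `fake_store`, and `fake_answer` are hypothetical names, and the stand-in functions return canned values so the example runs without network access or a model.

```python
from typing import Callable, List, Tuple

# Hypothetical type aliases; the real project may model these differently.
Question = str
Context = str
Answer = str


def run_pipeline(
    question: Question,
    fetch_context: Callable[[Question], Context],
    store: Callable[[Question, Context], None],
    answer_fn: Callable[[Question, Context], Answer],
) -> Answer:
    """Run one question through scrape -> store -> transform -> reply."""
    context = fetch_context(question)    # gather Context for the Question
    store(question, context)             # persist the Context
    return answer_fn(question, context)  # transform into an Answer to reply with


# Stand-in implementations (assumptions) so the sketch runs offline.
saved: List[Tuple[Question, Context]] = []


def fake_fetch(question: Question) -> Context:
    return "Cal Poly is located in San Luis Obispo, California."


def fake_store(question: Question, context: Context) -> None:
    saved.append((question, context))


def fake_answer(question: Question, context: Context) -> Answer:
    # Placeholder for the transformer step; returns a canned answer.
    return "San Luis Obispo, California."


reply = run_pipeline("where is Cal Poly?", fake_fetch, fake_store, fake_answer)
print(reply)
```

Passing each stage in as a function keeps the stages independently testable, which fits the functional style the project describes.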

TODO

a simple web UI with an input box and a section for answers
- if bad-answer then offer user a toggle: isItAnyOf(ans1,ans2..)
- if user does not choose a toggle then mark as possibly-answerable
- set up a nice UI for verification team to complete task.
database code for
- INSERT Context/Question/Answer/timestamp/good-bad-answer
- UPDATE good-bad-answer
test performance
- avoid test generation by code because the test itself should not depend on subject-under-test.
- measure precision & recall of this system
make improvements to assumptions
consider git rev-parse HEAD to get latest commit hash to associate with data.
consider learning new facts from TrustedUser
- e.g. Dr. Khosmood is a TrustedUser and can offer the system either:
  - URL
    - e.g. a published google doc containing a professor's syllabus.
    - e.g. a professor's personal website
  - UserContext
    - e.g. the plain-text of a professor's syllabus.
    - either provided through real-time chat client
    - or provided through a simple input box
    - also consider ChatContext
  - (Question, Answer) mappings
  - so, when any User asks a previously mapped question, then the correct answer can be returned
  - or, when the most relevant UserContext is found for the given question, a reasonable answer can still be returned.
question/answer data augmentation
- remember augmentations need grammar check by human
- try Question-Paraphrasing
- also try style-transformations
  - "PHRASE REPLACEMENT TRANSFORM" (Khosmood, pg. 118)
    - I wanted to be with you alone
      - => I desired to be with you only.
    - class phraseXform
      - update it to latest technologies: SpaCy! BabelNet?
    - similar to /r/IncreasinglyVerbose
    - I teach at Cal Poly
      - => I teach at a university in California
        
        (replace Cal Poly with definition)
      - => I impart skills or knowledge to students at a university in California
        
        (replace teach with definition and append students)
      - => I impart skills or knowledge to students at an establishment where a seat of higher learning is housed in California
        
        (replace university with definition)
      - => I impart skills or knowledge to students at an establishment where a seat of higher learning is housed in San Luis Obispo, California
        
        (apply knowledge of city location of Cal Poly)
    - "Translation-Tours" (Khosmood, pg. 141)
      - "Translation tour with Spanish, French, German" (Khosmood, pg. 141)
        
        I teach at Cal Poly
        
        => Enseño en Cal Poly (English => Spanish)
        
        => J'enseigne à Cal Poly (Spanish => French)
        
        => Ich unterrichte an der Cal Poly (French => German)
        
        => I teach at Cal Poly (German => English)
        
        I teach at Cal Poly.
        
        => Doy clases en Cal Poly. (English => Spanish)
        
        => Ich unterrichte an der Cal Poly.
        
        => I teach at Cal Poly.
      - Alternative Translation Tours
        
        I teach at Cal Poly
        
        => እኔ በካሊ ፖሊ አስተምራለሁ ፡፡ (English => Amharic)
        
        => I teach by Kali Poly. (Amharic => English)
chart useful metrics
- e.g. average confidence score of the transformer over time (or over code changes); requires logging the commit hash
- e.g. lexical similarity (fuzz ratio) of question to context over time (or over code changes); requires logging the commit hash
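
Two of the TODO items above (using `git rev-parse HEAD` and charting lexical similarity per commit) could be combined roughly as follows. This is a sketch under assumptions: the function names are hypothetical, and `difflib.SequenceMatcher` from the standard library is swapped in as a stand-in for whatever fuzz-ratio library the project ends up using.

```python
import subprocess
from difflib import SequenceMatcher


def current_commit_hash():
    """Return the HEAD commit hash, or "unknown" if git is unavailable."""
    try:
        out = subprocess.check_output(
            ["git", "rev-parse", "HEAD"], stderr=subprocess.DEVNULL
        )
        return out.decode("utf-8").strip()
    except (subprocess.CalledProcessError, OSError):
        return "unknown"


def similarity_ratio(question, context):
    """Case-insensitive similarity in [0, 100]; a stand-in for a fuzz ratio."""
    matcher = SequenceMatcher(None, question.lower(), context.lower())
    return round(100 * matcher.ratio())


def metric_row(question, context):
    """Bundle a metric with the commit that produced it (hypothetical schema)."""
    return {
        "commit": current_commit_hash(),
        "similarity": similarity_ratio(question, context),
    }


row = metric_row("where is Cal Poly?", "Where is Cal Poly?")
print(row["similarity"])
```

Storing the commit hash next to each metric is what makes it possible to chart a metric over code changes rather than only over time.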

What is `data.csv`?

data.csv is a temporary "database" for appending question samples with the generated meta-data and final answer of this system.

Keeping track of this data will help with measuring the model's performance and making improvements based on performance metrics.
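
Appending a sample to a CSV file like this can be sketched with the standard `csv` module. The column names in `FIELDS` and the `append_row` helper are assumptions for illustration; the actual `data.csv` schema may differ. The demo writes to a temporary file so the real `data.csv` is untouched.

```python
import csv
import os
import tempfile
from datetime import datetime, timezone

# Hypothetical column names; the actual data.csv schema may differ.
FIELDS = ["timestamp", "question", "answer", "good_answer"]


def append_row(path, question, answer, good_answer=None):
    """Append one sample to a CSV file, writing the header if the file is new."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "question": question,
            "answer": answer,
            # Left empty until someone marks the answer good or bad.
            "good_answer": "" if good_answer is None else good_answer,
        })


# Demo against a temporary file.
demo_path = os.path.join(tempfile.mkdtemp(), "data_demo.csv")
append_row(demo_path, "what is Dr. Foaad Khosmood email?", "foaad@ calpoly.edu.")
append_row(demo_path, "where is Cal Poly?", "San Luis Obispo", good_answer=True)

with open(demo_path, newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))
print(len(rows))
```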

Resources

huggingface/transformers
explosion/spaCy
Technical Talk: Using spaCy with Bert | Hugging Face Transformers | Matthew Honnibal
example google search
pip install google source code
pip install google tutorial
- read this in more depth for alternative google search api options
pip install google geeksforgeeks
pip install beautifulsoup4
