Like Nimbus, but uses a transformer language model.
Written in a functional programming style.
Works with macOS, Linux, Windows.
pipenv install
This will create a virtual environment with the required:
- Python 3.6.8
- all the [packages] listed in the Pipfile
pipenv shell
$ python --version
Python 3.6.8
from ntfp.ntfp import get_context, transformer
question = "what is Dr. Foaad Khosmood email?"
_, _, context = get_context(question)
answer, _ = transformer(question, context)
print("answer: ", answer)
>>> answer: foaad@ calpoly.edu.
$ python main.py
To use data.metrics please install scikit-learn. See https://scikit-learn.org/stable/index.html
question: what is Dr. Foaad Khosmood email?
len(context): 911
Converting examples to features: 100%|██| 1/1 [00:00<00:00, 95.61it/s]
answer: foaad@ calpoly.edu.
appended new row to data.csv
Assumptions
- "Context" is limited to Cal Poly, so expect non-Cal-Poly "Questions" to fail
- "Answer" is expected to exist publicly on the web, so that Google can access it.
Pipeline
- User asks Question to a web application.
- Scrape Google for Context, limited to 10 URL results.
- Store Context into database.
- Transform (Question, Context) >> Answer
- Reply with Answer
- Mark good/bad answer to learn from later.
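The pipeline steps above can be sketched as one function that takes each stage as an argument. This is only an illustration: `answer_question`, `fake_scrape`, `fake_transform`, and the callable signatures are hypothetical names made up for this sketch, not the project's actual API, and the stand-ins avoid any network or model access.

```python
from typing import Callable, List, Tuple

# Hypothetical signatures for this sketch; the real project may differ.
Scraper = Callable[[str, int], List[str]]               # (question, limit) -> page texts
Transformer = Callable[[str, str], Tuple[str, float]]   # (question, context) -> (answer, score)
Store = Callable[[str, str], None]                      # (question, context) -> None


def answer_question(question: str, scrape: Scraper, transform: Transformer,
                    store: Store, limit: int = 10) -> str:
    """Run one pass of the pipeline: scrape, store, transform, reply."""
    pages = scrape(question, limit)[:limit]   # Context from at most `limit` results
    context = "\n".join(pages)
    store(question, context)                  # keep the Context for later review
    answer, _score = transform(question, context)
    return answer


# Stand-ins so the sketch runs offline (assumptions, not real scraping or a real model).
def fake_scrape(question: str, limit: int) -> List[str]:
    return ["Dr. Example can be reached at example@calpoly.edu."]


def fake_transform(question: str, context: str) -> Tuple[str, float]:
    # Pretend the model picked the last token of the context as the answer.
    return context.split()[-1], 0.5


saved = []
reply = answer_question("what is Dr. Example's email?", fake_scrape,
                        fake_transform, lambda q, c: saved.append((q, c)))
print("answer:", reply)
```

Passing the stages in as functions keeps each one replaceable and easy to test on its own, which fits the functional style of the project.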
- a simple web UI with an input box and a section for answers
- if bad-answer then offer user a toggle: isItAnyOf(ans1,ans2..)
- if user does not choose a toggle then mark as possibly-answerable
- set up a nice UI for verification team to complete task.
- database code for
- test performance
- avoid test generation by code because the test itself should not depend on subject-under-test.
- measure precision & recall of this system
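One possible way to compute precision and recall from hand-labeled samples is sketched below. The definitions are an assumption for this sketch: precision is good answers over all answers given, and recall is good answers over all answerable questions. The record layout `(answerable, answered, marked_good)` is hypothetical.

```python
from typing import List, Tuple

# Each record: (answerable, answered, marked_good). Hypothetical layout for this sketch.
Record = Tuple[bool, bool, bool]


def precision_recall(records: List[Record]) -> Tuple[float, float]:
    """Return (precision, recall) under the definitions stated above."""
    good = sum(1 for _, was_answered, is_good in records if was_answered and is_good)
    given = sum(1 for _, was_answered, _ in records if was_answered)
    possible = sum(1 for can_answer, _, _ in records if can_answer)
    precision = good / given if given else 0.0
    recall = good / possible if possible else 0.0
    return precision, recall


records = [
    (True, True, True),    # answered correctly
    (True, True, True),    # answered correctly
    (True, True, False),   # answered, but marked bad
    (True, False, False),  # answerable, but the system gave no answer
]
precision, recall = precision_recall(records)
print(precision, recall)
```

The labels should come from human verification rather than from the system itself, in line with the note above about tests not depending on the subject under test.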
- make improvements to assumptions
- consider git rev-parse HEAD to get latest commit hash to associate with data.
- consider learning new facts from TrustedUser
  - e.g. Dr. Khosmood is a TrustedUser and can offer the system either:
    - URL
      - e.g. a published Google Doc containing a professor's syllabus.
      - e.g. a professor's personal website
    - UserContext
      - e.g. the plain-text of a professor's syllabus.
      - either provided through real-time chat client
      - or provided through a simple input box
  - also consider ChatContext
    - (Question, Answer) mappings
    - so, when any User asks a previously mapped question, then the correct answer can be returned
    - or, when the most relevant UserContext is found for the given question, a reasonable answer can still be returned.
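The (Question, Answer) mapping idea could look roughly like the sketch below, which returns a stored answer when a new question is close enough to a previously mapped one. `normalize`, `lookup_answer`, the 0.9 cutoff, and the sample entry are all assumptions for illustration; matching uses the standard library's `difflib` rather than anything the project is known to use.

```python
import difflib
from typing import Dict, Optional


def normalize(question: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(question.lower().split())


def lookup_answer(question: str, chat_context: Dict[str, str],
                  cutoff: float = 0.9) -> Optional[str]:
    """Return the stored answer for the closest mapped question, or None.

    `chat_context` maps normalized questions to answers.
    """
    matches = difflib.get_close_matches(
        normalize(question), list(chat_context), n=1, cutoff=cutoff)
    return chat_context[matches[0]] if matches else None


# Placeholder entry, not real data.
chat_context = {normalize("Where is Dr. Example's office?"): "PLACEHOLDER ANSWER"}

hit = lookup_answer("where is dr. example's office", chat_context)
miss = lookup_answer("what time is it?", chat_context)
print(hit, miss)
```

When the lookup returns None, the system would fall back to the regular Context search and transformer step.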
- question/answer data augmentation
  - remember augmentations need grammar check by human
  - try Question-Paraphrasing
  - also try style-transformations
    - "PHRASE REPLACEMENT TRANSFORM" (Khosmood, pg. 118)
      - I wanted to be with you alone
        => I desired to be with you only.
      - class phraseXform
        - update it to latest technologies: spaCy! BabelNet?
      - similar to /r/IncreasinglyVerbose
        - I teach at Cal Poly
          => I teach at a university in California
             (replace Cal Poly with definition)
          => I impart skills or knowledge to students at a university in California
             (replace teach with definition and append students)
          => I impart skills or knowledge to students at an establishment where a seat of higher learning is housed in California
             (replace university with definition)
          => I impart skills or knowledge to students at an establishment where a seat of higher learning is housed in San Luis Obispo, California
             (apply knowledge of city location of Cal Poly)
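A toy version of the increasingly-verbose chain above, using a hand-written list of replacements. This is not the phraseXform class or the method from the cited thesis; `replace_phrase`, `verbose_chain`, and the replacement table are hypothetical, and a real version would pull definitions from a lexical resource instead.

```python
import re
from typing import List, Tuple


def replace_phrase(sentence: str, phrase: str, replacement: str) -> str:
    """Replace whole-word occurrences of `phrase` in `sentence`."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    # A function is used as the replacement so backslashes are never interpreted.
    return re.sub(pattern, lambda _match: replacement, sentence)


def verbose_chain(sentence: str, steps: List[Tuple[str, str]]) -> List[str]:
    """Apply each (phrase, replacement) step in order, keeping every version."""
    chain = [sentence]
    for phrase, replacement in steps:
        chain.append(replace_phrase(chain[-1], phrase, replacement))
    return chain


# Hand-written replacements taken from the example above.
steps = [
    ("Cal Poly", "a university in California"),
    ("teach", "impart skills or knowledge to students"),
]
chain = verbose_chain("I teach at Cal Poly", steps)
for version in chain:
    print(version)
```

Every generated sentence would still need the human grammar check mentioned above before being used as training data.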
    - "Translation-Tours" (Khosmood, pg. 141)
      - "Translation tour with Spanish, French, German" (Khosmood, pg. 141)
        - I teach at Cal Poly
          => Enseño en Cal Poly (English => Spanish)
          => J'enseigne à Cal Poly (Spanish => French)
          => Ich unterrichte an der Cal Poly (French => German)
          => I teach at Cal Poly (German => English)
        - I teach at Cal Poly.
          => Doy clases en Cal Poly. (English => Spanish)
          => Ich unterrichte an der Cal Poly.
          => I teach at Cal Poly.
      - Alternative Translation Tours
        - I teach at Cal Poly
          => እኔ በካሊ ፖሊ አስተምራለሁ ፡፡ (English => Amharic)
          => I teach by Kali Poly. (Amharic => English)
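A translation tour can be treated as a fold over a list of languages, with the translator passed in as a function. In this sketch `translation_tour`, `phrasebook_translate`, and the language codes are hypothetical; the stand-in translator is a lookup table built from the sentences above, so it runs without any translation service.

```python
from typing import Callable, List, Tuple

# (text, source_language, target_language) -> translated text
Translate = Callable[[str, str, str], str]


def translation_tour(text: str, languages: List[str],
                     translate: Translate) -> List[Tuple[str, str]]:
    """Translate `text` through each language in order, recording every hop."""
    hops = [(languages[0], text)]
    for source, target in zip(languages, languages[1:]):
        text = translate(text, source, target)
        hops.append((target, text))
    return hops


# Stand-in translator: a phrasebook holding only the sentences from the tour above.
PHRASEBOOK = {
    ("en", "es"): {"I teach at Cal Poly": "Enseño en Cal Poly"},
    ("es", "fr"): {"Enseño en Cal Poly": "J'enseigne à Cal Poly"},
    ("fr", "de"): {"J'enseigne à Cal Poly": "Ich unterrichte an der Cal Poly"},
    ("de", "en"): {"Ich unterrichte an der Cal Poly": "I teach at Cal Poly"},
}


def phrasebook_translate(text: str, source: str, target: str) -> str:
    # Raises KeyError for anything outside the phrasebook.
    return PHRASEBOOK[(source, target)][text]


hops = translation_tour("I teach at Cal Poly", ["en", "es", "fr", "de", "en"],
                        phrasebook_translate)
for language, sentence in hops:
    print(language, sentence)
```

Swapping `phrasebook_translate` for a real translation function would leave `translation_tour` unchanged, and a tour that ends on a different sentence than it started with is a candidate paraphrase.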
- chart useful metrics
  - e.g. average confidence score of transformer over time (or over code changes); need to log commit hash
  - e.g. lexical similarity (fuzz ratio) of question to context over time (or over code changes); need to log commit hash
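A rough sketch of the two metrics above, grouped by commit hash so they can be charted over code changes. `similarity` uses the standard library's `difflib.SequenceMatcher` as a stand-in for a fuzz ratio, which is an assumption and may not match the numbers a dedicated fuzzy-matching package gives. `average_by_commit` and the sample rows are hypothetical.

```python
import difflib
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def similarity(question: str, context: str) -> int:
    """Lexical similarity of two strings on a 0-100 scale."""
    matcher = difflib.SequenceMatcher(None, question.lower(), context.lower())
    return int(round(100 * matcher.ratio()))


def average_by_commit(rows: List[Tuple[str, float]]) -> Dict[str, float]:
    """Average a metric per commit hash. Each row is (commit_hash, value)."""
    grouped = defaultdict(list)
    for commit, value in rows:
        grouped[commit].append(value)
    return {commit: mean(values) for commit, values in grouped.items()}


# Placeholder rows, not measured results.
rows = [("commit-a", 1.0), ("commit-a", 3.0), ("commit-b", 5.0)]
averages = average_by_commit(rows)
print(averages)
print(similarity("abc", "abc"), similarity("abc", "xyz"))
```

The per-commit averages can then be plotted in commit order with any charting tool.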
data.csv is a temporary "database" for appending question samples with the generated metadata and final answer of this system.
Keeping track of this data will help with measuring the model's performance and making improvements based on performance metrics.
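Appending a sample together with the current commit hash might look like the sketch below. The column names in `FIELDS` and the helpers `current_commit` and `append_row` are assumptions for illustration; the actual layout of data.csv is not documented here. The demo writes to a temporary directory so it does not touch the real file.

```python
import csv
import os
import subprocess
import tempfile
from datetime import datetime, timezone

# Assumed columns; the real data.csv may use different ones.
FIELDS = ["timestamp", "commit", "question", "context_length", "answer"]


def current_commit() -> str:
    """Return the hash from `git rev-parse HEAD`, or "unknown" outside a repo."""
    try:
        output = subprocess.check_output(["git", "rev-parse", "HEAD"],
                                         stderr=subprocess.DEVNULL)
        return output.decode().strip()
    except (subprocess.CalledProcessError, OSError):
        return "unknown"


def append_row(path: str, question: str, context: str, answer: str) -> dict:
    """Append one sample to the CSV at `path`, writing a header if the file is new."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    row = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "commit": current_commit(),
        "question": question,
        "context_length": len(context),
        "answer": answer,
    }
    with open(path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(row)
    return row


with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "data.csv")
    append_row(demo_path, "q1", "some context", "a1")
    append_row(demo_path, "q2", "more context here", "a2")
    with open(demo_path, newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle))
print(len(rows), rows[0]["context_length"])
```

Storing the commit hash in every row makes it possible to compare metrics before and after a code change.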
- huggingface/transformers
- explosion/spaCy
- Technical Talk: Using spaCy with Bert | Hugging Face Transformers | Matthew Honnibal
- example google search
  - pip install google (source code)
  - pip install google (tutorial)
    - read this in more depth for alternative google search api options
  - pip install google (geeksforgeeks)
  - pip install beautifulsoup4