The ML repository for the problem statement Digital Alpha SaaS Analyzer at Inter IIT Tech Meet 10.0. The deployed version of the website can be found here. The report can be found here.
- dict-sentiment.ipynb -> For sentiment analysis (6 classes, lexicon-based)
  - Input: The input is specified in the `temp_text` variable.
  - Output: The input is passed to the `get_class_counter` function, which returns the sentiment dictionary containing the results.
- finbert_inference.ipynb -> For sentiment analysis (3 classes, transformer-based)
  - Input: The input is specified in the `temp_text` variable.
  - Output: The input is passed to the `get_output` function, which returns the sentiment dictionary.
- mdna_extractor.ipynb -> For extracting contents (section-wise)
  - Input: The inputs to the function are `filing_url` and `section_name`, where the names have their usual meanings.
  - Output: The output is obtained from the `get_section` function, which returns the desired section text.
- find_company_trends_using_lda.py -> For extracting the latest trending topics relevant to the company
  - Input: The inputs are the company title and the number of tweets to extract.
  - Output: A list of the top keywords relevant to the company.
- extract_metrics_from_fillings.ipynb -> For extracting metrics from the filings
  - Input: The inputs are:
    - `api_key` - for accessing the filings using sec-api
    - `url` - URL of the filing
    - `metric` - name of the metric in lowercase
    - `val_type` - metric data type, one of ['PERCENT', 'MONEY', 'NUMBER', 'RATIO']
    - `k` - window size for metric search, default = 6
    - `relevant_sections` - list of sections to search for the metric
  - Output: The value of the metric extracted from the filing, stored in the `correct_value` variable.
- extract_tables.ipynb -> For extracting tables from the filings
  - Input: The inputs are `api_key` for accessing the filings using sec-api, the `url` of the filing, and the `section`.
  - Output: The tables extracted from the filing, stored in the `tables` variable.
- qna_on_tables.ipynb -> For question answering on the tables
  - Input: The inputs are `table` and `ques` (a list of questions).
  - Output: The answers to the questions, based on the table.
- theme-vocab-builder.ipynb -> To build vocabulary for various sectors
  - Input: Any important data file related to various sectors.
  - Output: The vocabulary file for various sectors.
- exposure-calc.ipynb -> To calculate the exposure of a company to various sectors
  - Input: The inputs are:
    - `filing.txt` - SEC filing of a company
    - `theme.txt` - vocabulary file for a specific sector
  - Output: The similarity score with respect to the vocabulary of a specific sector.
- generate_questions_answers.ipynb -> To generate questions and answers from the given text
  - Input: The only input is the `text`.
  - Output: The generated questions and answers, stored in the dictionary `qna_dict`.
- summarize_text.ipynb -> To summarize the given text
  - Input: The only input is the `text`.
  - Output: The summary of the text, stored in the variable `summary`.
- 10Q_parser.ipynb -> For extracting contents (section-wise)
  - Input: The inputs to the function are the link to the filing and the section number.
  - Output: The output is obtained from the `parse_10q_filing` function, which returns the desired section text.
- find_metric.ipynb -> Complete pipeline for extracting metrics from the filings of a company
  - Input: The inputs are:
    - `api_key` - for accessing the filings using sec-api
    - `company_cik` - CIK of the company
    - `metric` - name of the metric in lowercase
    - `val_type` - metric data type, one of ['PERCENT', 'MONEY', 'NUMBER', 'RATIO']
    - `k` - window size for metric search, default = 6
  - Output: The `value` of the metric extracted from the filings.
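
The `k` parameter of the metric notebooks is described only as a window size for metric search. As a rough illustration of what such a window could mean, the sketch below looks for a numeric-looking token within `k` tokens after a mention of the metric name. The function `find_metric_value`, the regex `VALUE_RE`, and the whitespace tokenization are all assumptions made for this example; they are not taken from the notebooks, whose actual search logic may differ.

```python
import re

# Hypothetical pattern: optional "$", digits with optional commas/decimals, optional "%".
VALUE_RE = re.compile(r"^\$?\d[\d,]*(?:\.\d+)?%?$")


def find_metric_value(text, metric, k=6):
    """Return the first numeric-looking token within k tokens after the metric name.

    Illustrative only: this is an assumed interpretation of the window size,
    not the repository's implementation.
    """
    tokens = text.lower().split()
    metric_tokens = metric.lower().split()
    n = len(metric_tokens)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == metric_tokens:
            # Scan only the k tokens that follow the metric mention.
            for tok in tokens[i + n:i + n + k]:
                cleaned = tok.strip(".,;:()")
                if VALUE_RE.match(cleaned):
                    return cleaned
    return None


sample = "Our net revenue retention rate was approximately 120% for the quarter."
result = find_metric_value(sample, "net revenue retention rate", k=6)
print(result)
```

In this sketch a smaller `k` makes the search stricter: with `k=1` the same sentence yields no value, because the number sits three tokens after the metric name.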