more.hack4_solution

This is our solution for the VTB MORE.Tech 4.0 hackathon, data track. We provide an API that extracts trends from news and builds digests for specific roles, such as business owners and accountants (the set of roles can be extended).

We collected about 29k news articles from rbk.business, gazeta.ru, forbes.ru, and kommersant.ru.

Task 1. Extracting trends. By definition, a trend is a topic that was not popular some time ago but whose popularity has been growing recently (not to be confused with topics that are always popular). Our idea was therefore the following:

First, we group the news by fixed time periods (e.g. a month, a week, a day).
Within each group we compute an embedding for every news item, cluster the embeddings, and take the largest clusters and their centers.
We then compare the cluster centers across periods. If a cluster of "new" news has no cluster of "old" news near it, it is a trend.
We extract the news items that match the centers of the trend clusters.
We extract keywords from these news items.
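
The comparison step can be illustrated with a minimal sketch. It assumes the cluster centers for the "old" and "new" periods are already computed, and that "near" means cosine similarity above a threshold. The function name `find_trend_centers`, the threshold of 0.8, and the toy vectors are hypothetical and are not taken from the repository.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_trend_centers(new_centers, old_centers, threshold=0.8):
    """Return indices of new-period centers that have no old-period center nearby.

    The threshold is an assumed tuning parameter, not a value from the project.
    """
    trends = []
    for i, new_c in enumerate(new_centers):
        has_old_neighbor = any(
            cosine_similarity(new_c, old_c) >= threshold for old_c in old_centers
        )
        if not has_old_neighbor:
            trends.append(i)
    return trends


# Toy 3-dimensional centers standing in for real embeddings.
old_centers = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
new_centers = [[0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]

trend_indices = find_trend_centers(new_centers, old_centers)
print(trend_indices)  # [1]: only the second new center has no old neighbor
```

In this toy example the first new center is almost identical to an old one, so it is treated as an already popular topic; the second has no old counterpart and is flagged as a trend.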

Task 2. Making digests. Our idea:

We parse news from general sources that are not specific to the target professions.
We build a list of words and phrases that are expected to interest each target role.
We extract keywords from the news and compute their embeddings.
For each role word we select the 30 closest keywords (the number can be changed).
We rank news by relevance: a news item scores higher the more of its keywords are close to the list of role words.
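
The last two steps can be sketched as follows, assuming embeddings for role words and keywords are already available and that closeness is measured by cosine similarity. The function names `closest_keywords` and `rank_news`, the toy vectors, and `top_k=2` (instead of 30) are hypothetical and chosen only to keep the example small.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def closest_keywords(role_vectors, keyword_vectors, top_k=30):
    """Collect, for every role word, the top_k keywords most similar to it."""
    close = set()
    for role_vec in role_vectors.values():
        ranked = sorted(
            keyword_vectors,
            key=lambda kw: cosine_similarity(role_vec, keyword_vectors[kw]),
            reverse=True,
        )
        close.update(ranked[:top_k])
    return close


def rank_news(news_keywords, close):
    """Score each news item by how many of its keywords are in the close set."""
    scores = {
        news_id: sum(1 for kw in kws if kw in close)
        for news_id, kws in news_keywords.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy 2-dimensional vectors standing in for real embeddings.
role_vectors = {"tax": [1.0, 0.0]}
keyword_vectors = {"vat": [0.9, 0.1], "audit": [0.8, 0.3], "football": [0.0, 1.0]}
news_keywords = {"n1": ["vat", "audit"], "n2": ["football"], "n3": ["audit", "football"]}

close = closest_keywords(role_vectors, keyword_vectors, top_k=2)
digest = rank_news(news_keywords, close)
print(digest)  # [('n1', 2), ('n3', 1), ('n2', 0)]
```

The top items of the ranked list would form the digest for the role; how many items to keep is left open here.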

We used Scrapy for parsing, 'rubert-tiny' for embeddings and keyword extraction, AgglomerativeClustering for clustering, and FastAPI for the API.

