The rapid growth in the volume of available information has driven the emergence of many IT domains, including automatic summarization (AS). AS systems extract the most relevant information from large amounts of data.
In this repository, we work on an AS system for multi-document summarization of scientific articles. Such summaries could help organize the large volume of scientific output and provide digests of the work published on a given subject.
In our system, we experiment with features maximization (FM) in this context. This statistical method, initially designed for machine learning, provides a language-agnostic and non-parametric approach to AS. We integrate it with the traditional structures of AS systems and with a graph-based model that exploits the spreading activation algorithm.
In sum, this thesis introduces a new approach to AS based on FM and aims to evaluate the performance of this method in producing extractive summaries, whether generic or query-focused.
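To make the graph-based component mentioned above more concrete, here is a minimal sketch of spreading activation over a sentence graph whose edges are weighted by cosine similarity. It is an illustration under stated assumptions, not the implementation used in this repository: the function names (`cosine`, `build_graph`, `spread_activation`), the bag-of-words sentence vectors, the similarity `threshold`, the `decay` factor, and the fixed number of `steps` are all hypothetical choices. The FM weighting itself is not shown.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_graph(sentences, threshold=0.1):
    """Link sentences whose cosine similarity reaches the threshold.

    Assumption: plain lowercase whitespace tokens stand in for whatever
    features the real system extracts.
    """
    vectors = [Counter(s.lower().split()) for s in sentences]
    graph = {i: {} for i in range(len(sentences))}
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            weight = cosine(vectors[i], vectors[j])
            if weight >= threshold:
                graph[i][j] = weight
                graph[j][i] = weight
    return graph


def spread_activation(graph, seeds, decay=0.5, steps=3):
    """Propagate activation from seed nodes along weighted edges.

    At each step every node sends activation * edge weight * decay to its
    neighbours; activation accumulates and is capped at 1.0.
    """
    activation = {node: 0.0 for node in graph}
    activation.update(seeds)
    for _ in range(steps):
        incoming = {node: 0.0 for node in graph}
        for node, neighbours in graph.items():
            for other, weight in neighbours.items():
                incoming[other] += activation[node] * weight * decay
        activation = {
            node: min(1.0, activation[node] + incoming[node]) for node in graph
        }
    return activation


if __name__ == "__main__":
    sentences = [
        "feature maximization selects salient terms",
        "salient terms guide sentence selection",
        "graphs connect similar sentences",
    ]
    graph = build_graph(sentences)
    # Seed the first sentence, e.g. because it matches a user query.
    scores = spread_activation(graph, seeds={0: 1.0})
    for index in sorted(scores, key=scores.get, reverse=True):
        print(f"{scores[index]:.2f}  {sentences[index]}")
```

In a query-focused setting, the seeds could be the sentences that best match the query, and the highest-activation sentences would be candidates for the extractive summary. For a generic summary, one plausible variant is to seed every sentence with an initial relevance score instead.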
Keywords: Automatic Summarization, Retrieval Systems, Features Maximization, Cosine Similarity, Graph-Based Model, Spreading Activation.