An enormous amount of text is available, and it grows every day. Consider the internet, with its web pages, news stories, status updates, blogs, and more. Because this data is unstructured, the best we can do to navigate it is to search and skim the results. Much of this text needs to be condensed into concise summaries that highlight the key points, so that we can navigate it more effectively and decide whether the longer documents actually contain the information we need.
We need automatic text summarization methods for the following reasons:
- Summaries reduce reading time.
- When researching documents, summaries make the selection process easier.
- Automatic summarization improves the effectiveness of indexing.
- Automatic summarization algorithms can be less biased than human summarizers.
- Personalized summaries are useful in question-answering systems because they provide information tailored to the user.
- Using automatic or semi-automatic summarization systems enables commercial abstract services to increase the number of texts they are able to process.
Text summarization is used across a wide range of industries and applications.
These include:
- Creating chapters for YouTube videos or educational online courses via video editing platforms.
- Summarizing and sharing key parts of corporate meetings to reduce the need for mass attendance.
- Automatically identifying key parts of calls and flagging sections for follow-up via revenue intelligence platforms.
- Summarizing large analytical documents to ease readability and understanding.
- Segmenting podcasts and automatically providing a Table of Contents for listeners.
I'm currently working on summarizing chat context so that an agent can quickly understand the earlier conversation. I'm curious to see how deep learning models perform when applied to existing datasets. News articles are written with good grammar and vocabulary, which makes them easier to understand.
The dataset consists of 4515 examples and contains Author_name, Headlines, Url of Article, Short text, Complete Article. The summarized news articles were extracted from Inshorts, and the articles were scraped only from news outlets such as the Hindu, Indian Times and the Guardian. The time period ranges from February to August 2017.
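As a rough illustration of the record layout described above, here is a minimal Python sketch that reads such a file into typed records. The column names (`author_name`, `headline`, `url`, `short_text`, `complete_article`) and the `NewsExample` class are assumptions made for this example, and treating the short text as the summary is also an assumption; check the header row of the actual file before relying on any of them. The sample rows are made-up placeholders, not data from the dataset.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class NewsExample:
    """One row of the dataset (field names are hypothetical)."""
    author: str
    headline: str
    url: str
    short_text: str   # assumed to be the short summary
    full_text: str    # assumed to be the complete article


def load_examples(fileobj) -> list[NewsExample]:
    """Read CSV rows, skipping any row without both a summary and an article."""
    examples = []
    for row in csv.DictReader(fileobj):
        short_text = (row.get("short_text") or "").strip()
        full_text = (row.get("complete_article") or "").strip()
        if not short_text or not full_text:
            continue  # an incomplete pair is not usable for summarization
        examples.append(
            NewsExample(
                author=(row.get("author_name") or "").strip(),
                headline=(row.get("headline") or "").strip(),
                url=(row.get("url") or "").strip(),
                short_text=short_text,
                full_text=full_text,
            )
        )
    return examples


# Made-up placeholder rows; the second one has no article body.
sample = io.StringIO(
    "author_name,headline,url,short_text,complete_article\n"
    "A. Writer,Example headline,https://example.com/a,Short summary.,Full article text goes here.\n"
    "B. Writer,No body,https://example.com/b,Summary only.,\n"
)
examples = load_examples(sample)
print(len(examples), examples[0].headline)
```

With a real file, the same function could be called on `open(path, newline="", encoding=...)`, where both the path and the encoding depend on how the file was exported.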
I would like to thank the authors of Inshorts for their fabulous work.
Tasks this dataset is suited to:
- Generating short descriptions (headlines) from text (news articles).
- Summarizing large amounts of information so that it can be represented in a compressed space.
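To make the idea of condensing an article concrete, here is a minimal sketch of a word-frequency extractive baseline in Python. This is a simple swapped-in technique for illustration, not one of the deep learning models mentioned earlier and not a method used to build the dataset. The function names, the tiny stopword list, and the naive sentence splitter are all assumptions of this example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "are",
             "on", "for", "that", "it", "was", "with", "as"}


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def summarize(text: str, max_sentences: int = 1) -> str:
    """Keep the sentences whose words are most frequent in the whole text."""
    sentences = split_sentences(text)
    freq = Counter(tokenize(text))

    def score(sentence: str) -> float:
        tokens = tokenize(sentence)
        if not tokens:
            return 0.0
        # Average frequency, so long sentences are not favoured automatically.
        return sum(freq[t] for t in tokens) / len(tokens)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])  # restore original order
    return " ".join(sentences[i] for i in chosen)


def compression_ratio(article: str, summary: str) -> float:
    """Summary length divided by article length, measured in words."""
    return len(summary.split()) / max(len(article.split()), 1)


article = ("The monsoon arrived in Kerala. "
           "Monsoon rain flooded Kerala roads. "
           "Officials met on Tuesday.")
summary = summarize(article, max_sentences=1)
print(summary)
print(round(compression_ratio(article, summary), 2))
```

A baseline like this gives a reference point: a learned model that cannot beat it on the same articles is probably not learning much. The compression ratio is one simple way to quantify how much shorter a summary is than its source.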
I couldn't find any open-source datasets when I was working on the summarization task. I suspect others are in the same position, and I hope this dataset will be helpful to them.