Extended Sentence Similarity Based on Word Relations for Document Summarization  
Author Heechan Kim

Co-Author(s) Soowon Lee

Abstract Automatic text summarization condenses input documents into their key content so that readers can understand the documents with little effort. Approaches to automatic text summarization are largely divided into two categories. Abstractive summarization generates new, human-like natural-language sentences to form a summary, whereas extractive summarization selects salient sentences from the input. Extractive summarization is one of the most actively studied problems in natural language processing research. A representative extractive summarization method is TextRank, in which the sentences in a document are represented as a graph and the similarity between sentences is calculated from the frequencies of co-occurring words. This similarity measure has a drawback: it does not sufficiently consider the semantic similarity between words in a sentence. To overcome this drawback, we propose a similarity measure between words that defines co-occurrence relations for all word pairs in a sentence. Further, we propose a novel sentence vector function that applies these word co-occurrence relations when calculating the similarity between sentences. Experiments revealed that the proposed method was more accurate than TextRank.
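To make the TextRank baseline concrete, the sketch below implements the word-overlap sentence similarity from the original TextRank formulation, sim(S_i, S_j) = |{w : w in S_i and w in S_j}| / (log|S_i| + log|S_j|), and then ranks sentences with the weighted PageRank-style update used by TextRank. This is only the baseline the abstract compares against, not the proposed word co-occurrence measure or sentence vector function, whose details are not given here. The regex tokenizer (lowercase, no stemming or stop-word removal), the damping factor d = 0.85, the fixed 50 iterations, and the helper names `tokenize`, `textrank_similarity`, `textrank`, and `summarize` are all assumptions made for illustration.

```python
import math
import re


def tokenize(sentence):
    # Assumption: lowercase alphanumeric tokens; no stemming or stop-word removal.
    return re.findall(r"[a-z0-9]+", sentence.lower())


def textrank_similarity(s1, s2):
    """Shared distinct words, normalized by the log lengths of both sentences."""
    w1, w2 = tokenize(s1), tokenize(s2)
    if not w1 or not w2:
        return 0.0
    overlap = len(set(w1) & set(w2))
    denom = math.log(len(w1)) + math.log(len(w2))
    # Two one-word sentences give log(1) + log(1) = 0; treat as no similarity.
    if overlap == 0 or denom <= 0:
        return 0.0
    return overlap / denom


def textrank(sentences, d=0.85, iterations=50):
    """Weighted PageRank over the sentence-similarity graph (hypothetical helper)."""
    n = len(sentences)
    weights = [[textrank_similarity(a, b) if i != j else 0.0
                for j, b in enumerate(sentences)]
               for i, a in enumerate(sentences)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        # WS(V_i) = (1 - d) + d * sum_j (w_ji / sum_k w_jk) * WS(V_j)
        scores = [
            (1 - d) + d * sum(weights[j][i] / out_sum[j] * scores[j]
                              for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    return scores


def summarize(sentences, k=2):
    """Pick the k highest-scoring sentences, returned in document order."""
    scores = textrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    doc = [
        "Graph methods rank sentences.",
        "TextRank builds a sentence graph.",
        "Cats sleep all day.",
        "Sentences with shared words get high rank in the graph.",
    ]
    print(summarize(doc, k=2))
```

Because this similarity counts only exact word matches, two sentences that use different but related words (for example, "sentence" and "sentences" above) contribute nothing to each other's score; this is the limitation the abstract describes.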

Keywords Text mining, Document summarization, Graph based similarities

Article #: DSBFI19-83

Proceedings of ISSAT International Conference on Data Science in Business, Finance and Industry
July 3-5, 2019 - Da Nang, Vietnam