NILC-WISE - Web Interface for Summary Evaluation - an online and easy to use interface for running ROUGE (Lin, 2004) for evaluating summaries

Summarization extension to Google Chrome - an extension for on-line news summarization, based on the RSumm system

OpCluster-PT - as described in the MSc Dissertation of Vargas (2017), a new computational method based on semantic relations and linguistic rules to automatically detect fine-grained opinions in User-Generated Content (UGC)

Models for summary coherence evaluation - a set of implemented models for summary coherence evaluation, following several approaches, from traditional entity grids to discourse grids. See the PhD thesis of Marcio de Souza Dias for more information.

RC-4 multi-document summarizer - based on the best RST & CST-based summarization strategy proposed by Cardoso (2014)

RCT-4 multi-document summarizer - based on the best RST & CST & subtopics-based summarization strategy proposed by Cardoso (2014). It differs from the RC-4 summarizer by including subtopic segmentation and treatment.

Text-summary alignment - tool that includes a set of methods for aligning texts and their multi-document summaries, as developed by Agostini et al. (2014)

TextTiling for Portuguese - topical segmentation tool adapted to news texts in Brazilian Portuguese, based on the work of Hearst (1997)

ViSum - a visualization system for multi-document summarization (described by Lima, 2013)

Lemmatizer for Portuguese - based on the MXPOST part-of-speech tagger and UNITEX dictionaries for Portuguese, this tool produces the lemmas of the words of a text stored in a plain text file. The source code is also provided. For more details, see the readme.pdf file or contact Erick G. Maziero (the developer of the system).

NCLEANER trained model for Portuguese - a trained model to be used with NCleaner (Evert, 2008) for cleaning web pages in Portuguese. The model was trained with 184 texts from several online sources, such as Terra, UOL, BBC, Exame, Estadão, IG, R7, Zero Hora, G1, JB Online, and O Globo, among others.

CSTTool - a semi-automatic editing tool for annotating texts according to the Cross-document Structure Theory (see Aleixo and Pardo, 2008)

Newshead - an on-line tool for searching and clustering related news

RSTeval - a tool for discourse parsing evaluation, following the evaluation method of Marcu (2000) - the tool compares RST trees (automatically or manually produced) and reports precision and recall figures (see Maziero and Pardo, 2009)

Syntax-based text segmentation tool - a tool for detecting elementary discourse units in texts - it uses the PALAVRAS parser (Bick, 2000) to analyze the input text and then applies syntactic segmentation rules

RST Toolkit - utility programs for processing RST files, offering several facilities for both computational and linguistic purposes

Sentence ordering program - a program for ordering sentences in a multi-document summary, given the source texts (see Lima and Pardo, 2012)

CSTSumm - a multi-document summarizer based on CST information (see README.txt in the rar file) (see Castro Jorge, 2010)

RSumm - a multi-document summarizer based on the relationship maps proposed by Salton et al. (1997) (see Ribaldo et al., 2012 and Ribaldo, 2013)

DiZer 2.0 - an on-line RST discourse parser, which is easily adaptable and portable to different text types/genres and languages (see Maziero et al., 2011)

CSTParser - a state-of-the-art CST discourse parser for Portuguese, using both symbolic and machine learning techniques (see Maziero, 2012)
--> Its stand-alone (offline) version (with some adaptations in relation to the online version) is also freely available for use

NASP (see NASP++ below) - a tool for aiding in word sense annotation of nouns in Portuguese, using Princeton WordNet as the sense repository

NASP++ - an improved version of NASP (see above), with more facilities (e.g., the underlying generation of ontologies for the annotated words) and adapted to other parts of speech

MulSEN - a multilingual version of NASP (see above)

Corpora and related resources

CSTNews-Update - a new arrangement of CSTNews texts for training and testing update summarization methods for Portuguese

Corpora for sentence compression - two corpora composed of long (original) sentences and their compressed versions for Portuguese

Corpus of automatic multi-document summaries with linguistic errors - a corpus of automatic multi-document summaries (for the texts of the CSTNews corpus) produced by 4 different summarizers with varied performances, manually annotated with linguistic errors

OpiSums-PT - a corpus of (extractive and abstractive) opinion summaries (170, in total) for reviews of books (13 reviews) and electronic products (4 reviews), written in Brazilian Portuguese

Aspect ontologies - groups of (hierarchically organized) opinion aspects for supporting opinion mining tasks, including the domains of smartphones, digital cameras and books, in OWL format

CSTNews interface - on-line browsing interface to the CSTNews corpus

CSTNews - a corpus with 50 clusters of news texts - in Portuguese - along with their multi-document summaries, as well as several discourse and semantic annotations (see Aleixo and Pardo, 2008; Cardoso et al., 2011)


More resources


NILC - Interinstitutional Center for Computational Linguistics