Related projects

sustento - Generation of Linguistic Knowledge for Multi-document Automatic Summarization (coordinator: Ariani Di Felippo, DL-UFSCar)

sustento is a two-year research project that aims to generate knowledge in support of more linguistically motivated strategies for multi-document automatic summarization of Brazilian Portuguese texts. Specifically, the project focuses on three correlated tasks: (i) linguistic characterization of multi-document summaries and their manual production, since multi-document summarization has so far been based only on clues about human summarization; (ii) corpus-based studies of multi-document phenomena (redundancy, contradiction and complementarity); and (iii) representation of semantic-conceptual knowledge and construction of resources and tools, since there are no methods based on this level of knowledge for multi-document summarization of Brazilian Portuguese texts.

TermiNet - Instantiation and Application of a Methodology for the Development of "Terminological Wordnets" in Brazilian Portuguese (coordinator: Ariani Di Felippo, DL-UFSCar)

Due to the increasing need to process specialized texts, domain-specific (or terminological) lexical databases have been built in many languages, especially in wordnet format. Despite the existence of a reasonable number of terminological wordnets in many languages, there is no clear and generic methodology for building them. For Brazilian Portuguese (BP), moreover, there is no domain-specific lexical database in the wordnet model. Consequently, we propose: (i) to instantiate a generic NLP methodology for developing terminological wordnets, and (ii) to apply it to build a terminological wordnet in BP. This methodology distinguishes itself by reconciling the linguistic and computational facets of NLP research. Besides the benefits to the NLP domain, terminological wordnets may also contribute to the development of terminological/terminographic products, since the organization of lexical-conceptual knowledge is an essential step in building such products.

PorSimples - Simplification of Portuguese Text for Digital Inclusion and Accessibility (coordinator: Sandra M. Aluísio, ICMC-USP)

In the PorSimples project, we propose the development of technology to facilitate access to information for functionally illiterate (FI) people and potentially for people with other cognitive disabilities (e.g. aphasia or dyslexia). This technology will be made available through two systems aimed at distinct users: an authoring system to help authors produce simplified texts targeting FI readers, and a simplification system to allow FI readers to read Web content. The latter explores the tasks of summarization and simplification as well as text presentation schemes, which should highlight the associations among the main ideas of the text, the named entities, semantic roles and lexical elaboration.

PLN-BR - Tools and Resources for Information Retrieval from Textual Bases in Brazilian Portuguese (coordinator: Maria das Graças V. Nunes, ICMC-USP)

This project aimed at the creation of an interinstitutional space for interaction and exchange of research practices in Computational Linguistics, for the investigation and development of information representation and retrieval tasks in Brazilian Portuguese.

ProCaCoSa - Coreference Chains Processing for Automatic Summarization of Portuguese texts (coordinator: Lucia H. M. Rino, DC-UFSCar)

This project aims at analyzing and solving summarization problems caused by unresolved coreferences in content selection and structuring during summary production. The general purpose is to use information about the coreference chains in the source text to produce better summaries.

EXPLOSA - EXPLOration of several methods for Automatic Summarization (coordinator: Lucia H. M. Rino, DC-UFSCar)

Fundamental and experimental approaches are tackled by means of a variety of small projects under the EXPLOSA scenario. The former are pursued through discourse-driven text generation; the latter, through extraction-based automatic summarization (AS) methods.

NILC - Interinstitutional Center for Computational Linguistics