HLTCentral - the gateway to speech and language technology opportunities  

Selecting Information From Text - SIFT

Project Summary

In future, there will be an ever-increasing reliance on full text documents in electronic form. However, as the amount of available material increases, the task of finding information becomes more and more difficult.

In technical documentation, a common difficulty is pinpointing the location of vital facts relating to a specific task. SIFT aimed to help users of instruction manuals for PC software.

Every organisation processes documents in electronic form. These must be organised, stored, retrieved and possibly summarised or translated. The best information retrieval systems currently available are based on keywords. However, it is widely agreed that this paradigm cannot be refined further. Similarly, current approaches to both summarisation and machine-assisted translation are usually based on keywords and string processing. The SIFT project advocates an alternative approach based on meanings.

The project aimed to build a demonstration of an intelligent help system for online computer software manuals, based on two key ideas: the Vector Space Model of information retrieval, and the use of distributed patterns to capture the meaning of textual information. The final prototype accepted a user's natural-language query about the software and returned a list of pointers into the manual texts, indicating where passages answering the query might be found. The pointers were arranged in descending order of relevance to the query, allowing the user to investigate the most promising parts of the text first. The project also served to demonstrate the usefulness of distributed patterns in practical Natural Language Processing systems and their compatibility with existing work on lexical databases and robust lexical parsing.
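
The ranking step of a Vector Space Model retrieval system can be illustrated with a minimal sketch. This is not the SIFT implementation: SIFT represented text with distributed semantic patterns, whereas the sketch below substitutes plain term-frequency vectors for simplicity. The function names and the sample passages are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def to_vector(text):
    """Map text to a sparse term-frequency vector (lowercased, split on whitespace).

    Assumption: SIFT used semantic patterns here, not raw word counts.
    """
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_passages(query, passages):
    """Return (passage_id, score) pairs in descending order of relevance.

    `passages` maps a hypothetical passage identifier (a pointer into the
    manual) to the passage text. Ties are broken by identifier.
    """
    q = to_vector(query)
    scored = [(cosine(q, to_vector(text)), pid) for pid, text in passages.items()]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [(pid, score) for score, pid in scored]


# Hypothetical manual passages, keyed by pointer.
manual = {
    "p1": "save a file to disk",
    "p2": "print a document",
    "p3": "change the font size",
}
ranking = rank_passages("how do I save a file", manual)
```

The first entry of `ranking` is the passage sharing the most vocabulary with the query, which mirrors how the prototype let users inspect the most promising parts of the manual first.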

At the centre of the project was the idea of representing words and utterances using distributed patterns. These were created automatically from concept ontologies and capture meanings in a simple and efficient manner.
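
The summary does not say how the patterns were derived from the ontologies, so the following is only one plausible illustration, under stated assumptions: a word's pattern marks the word's concept and all of its ancestors in a hypernym hierarchy, and an utterance's pattern is the sum of its word patterns. The toy ontology, the function names, and the overlap measure are all hypothetical.

```python
from collections import Counter

# Hypothetical toy ontology: each concept maps to its parent (None for the root).
TOY_ONTOLOGY = {
    "entity": None,
    "artifact": "entity",
    "document": "artifact",
    "file": "document",
    "manual": "document",
    "device": "artifact",
    "printer": "device",
}


def word_pattern(word, ontology=TOY_ONTOLOGY):
    """Pattern for a word: 1 for its own concept and every ancestor concept.

    Words missing from the ontology get an empty pattern (an assumption).
    """
    pattern = Counter()
    concept = word if word in ontology else None
    while concept is not None:
        pattern[concept] = 1
        concept = ontology[concept]
    return pattern


def utterance_pattern(words, ontology=TOY_ONTOLOGY):
    """Pattern for an utterance: element-wise sum of its word patterns."""
    total = Counter()
    for w in words:
        total.update(word_pattern(w, ontology))
    return total


def overlap(a, b):
    """Shared activation between two patterns (sum of element-wise minima)."""
    return sum(min(a[c], b[c]) for c in a if c in b)


file_p = word_pattern("file")
manual_p = word_pattern("manual")
printer_p = word_pattern("printer")
```

Under this scheme, "file" and "manual" share more active concepts than "file" and "printer", so related words end up with similar patterns even when they share no characters.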

Three ontologies were used. The first was the Princeton WordNet. The second was derived from the Longman Dictionary of Contemporary English by the University of Amsterdam. The third was created manually and handled domain-specific terminology.

Parsing was carried out by a version of PLAIN, developed at the University of Heidelberg. The retrieval prototypes were created using World Wide Web protocols.

Two main prototypes were built. SIFT-1 used syntactic category information and semantic patterns to capture utterance meanings. SIFT-2 exploited semantic case relations obtained via robust parsing.

Last updated: 15.01.2001