HLT Projects Database: SIFT / Summary

Selecting Information From Text - SIFT

Project Summary

In future, there will be an ever-increasing reliance on full text documents in electronic form. However, as the amount of available material increases, the task of finding information becomes more and more difficult.

In technical documentation, a common difficulty is pinpointing the vital facts that relate to a specific task. SIFT aims to help users of instruction manuals for PC software.

Every organisation processes documents in electronic form. These must be organised, stored, retrieved and possibly summarised or translated. The best information retrieval systems currently available are based on keywords. However, it is widely agreed that this paradigm cannot be refined further. Similarly, current approaches to both summarisation and machine-assisted translation are usually based on keywords and string processing. The SIFT project advocates an alternative approach based on meanings.

The project aimed to construct a demonstration intelligent help system for online computer software manuals based on two key ideas: the Vector Space Model of information retrieval on the one hand and the use of distributed patterns to capture the meaning of textual information on the other. The final prototype accepted a user's query in natural language concerning the software and returned a list of pointers into the manual texts indicating where passages answering the query might be found. These were arranged in descending order of relevance to the query, allowing the user to investigate the most promising parts of the text first. The project also served to demonstrate the usefulness of distributed patterns in practical Natural Language Processing systems and their compatibility with existing work on lexical databases and robust lexical parsing.
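The ranking step described above can be illustrated with a minimal Vector Space Model sketch. This is not the SIFT implementation: it assumes plain term-frequency vectors and cosine similarity, whereas SIFT combined the model with distributed meaning patterns. The function names and the sample passages are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_passages(query, passages):
    """Return (passage_index, score) pairs in descending order of relevance."""
    q = term_vector(query)
    scored = [(cosine(q, term_vector(p)), i) for i, p in enumerate(passages)]
    # Sort by score descending; ties keep the original passage order.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [(i, score) for score, i in scored]


# Hypothetical manual passages, invented for illustration only.
PASSAGES = [
    "to print a document choose print from the file menu",
    "to save a document choose save from the file menu",
    "the toolbar can be hidden from the view menu",
]

if __name__ == "__main__":
    for index, score in rank_passages("how do I print a document", PASSAGES):
        print(index, round(score, 3))
```

The returned indices play the role of the pointers into the manual text: the user would inspect the highest-scoring passage first.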

At the centre of the project was the idea of representing words and utterances as distributed patterns. These were created automatically from concept ontologies and captured meanings in a simple and efficient manner.
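The summary does not say how the patterns were derived, so the following is only a toy sketch of one plausible scheme: a word's pattern marks the word's concept and all of its ancestors in an ontology, and an utterance pattern is the sum of its word patterns. The ontology, the encoding, and every name below are assumptions made for illustration.

```python
# Hypothetical toy ontology: each concept maps to its parent (None marks the root).
PARENT = {
    "entity": None,
    "artifact": "entity",
    "action": "entity",
    "document": "artifact",
    "file": "artifact",
    "print": "action",
    "save": "action",
}
CONCEPTS = sorted(PARENT)  # fixed ordering of pattern positions


def ancestors(concept):
    """The concept itself followed by its chain of parents up to the root."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def word_pattern(word):
    """Binary pattern with a 1 at the word's concept and each of its ancestors."""
    active = set(ancestors(word))
    return [1 if c in active else 0 for c in CONCEPTS]


def utterance_pattern(words):
    """Sum of the patterns of all words found in the ontology (assumed scheme)."""
    total = [0] * len(CONCEPTS)
    for w in words:
        if w in PARENT:
            for i, bit in enumerate(word_pattern(w)):
                total[i] += bit
    return total


def overlap(a, b):
    """Dot product of two patterns: a crude measure of shared meaning."""
    return sum(x * y for x, y in zip(a, b))
```

Under this scheme, "document" and "file" share two active positions (artifact and entity), while "document" and "print" share only the root, so related words receive more similar patterns.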

Three ontologies were used. The first was the Princeton WordNet. The second was derived from the Longman Dictionary of Contemporary English by the University of Amsterdam. The third was created manually and handled domain-specific terminology.

Parsing was carried out by a version of PLAIN, developed at the University of Heidelberg. The retrieval prototypes were created using World Wide Web protocols.

Two main prototypes were involved. SIFT-1 used syntactic category information and semantic patterns to capture utterance meanings. SIFT-2 exploited semantic case relations obtained via robust parsing.

Last updated: 15.01.2001
