
Summary of Benchmarking HLT progress in Europe

The EUROMAP Study

Rose Lockwood and Andrew Joscelyne

Copenhagen 2003

The EUROMAP Language Technologies project is supported by the European Commission through the HOPE contract under the IST programme.

© EUROMAP Language Technologies, Center for Sprogteknologi

ISBN 87-90708-10-5


Preface

Human Language Technologies (HLT) enable humans to communicate with computers and to use computers in a more natural way and in their own language, i.e. to participate in the information society in a totally natural way. HLT is particularly important for Europe as no other advanced economic area enjoys a similar cultural and linguistic diversity. The need and ability to use multiple languages in everyday life is an increasingly familiar aspect of business, leisure, government and civil society in the EU and the Candidate Countries. Indeed, being able to do business in several languages has become a commercial necessity.

The EUROMAP Language Technologies project has investigated the state-of-the-art of HLT research and take-up in Europe, as well as the background for the present situation in each country. Building on data collections of research centres, suppliers and national research policies, and on market analyses, the European countries have been compared in a benchmark analysis. This analysis shows, for example, that the significant and steady investment made by the authorities in Germany, the UK and the Netherlands has paid off - these countries are the European 'Leaders' in HLT. The situation in other countries is described as well, and suggestions are made for future development.

The report concludes that a visible presence for European HLT activities should be established, and that it should have a strong relationship to the European Research Area. The goal should be to have a set of robust, stable, multilingual HLT modules, capable of being embedded into emerging IST application environments. A Language Technology Agency should be established to supervise and monitor the transition from national HLT efforts to a truly European technology level of language parity. Infrastructural funds for the provision of language resources and basic language technology modules for all languages should be made available and should be monitored by the Language Technology Agency.

It has been extremely interesting to work with these matters and to see in which way different European countries have tackled the challenge of language in the information society, and what the consequences are. We do hope that the data and the analysis provided, as well as the recommendations, will be taken up by policy makers at the national and European level.

Finally, while acknowledging the support of the European Commission, we should stress that this report is the result of the EUROMAP project, and opinions herein do not necessarily reflect the opinion of the European Commission.

Bente Maegaard
Co-ordinator for EUROMAP Language Technologies


Table of Contents

Preface
Table of Contents
What is HLT?
Why is HLT important for Europe?
Background to the EUROMAP Study
State-of-the-Art: the technologies
The HLT market
The HLT research base
Major conclusions
Recommendations
Abbreviations


What is HLT?

The term Human Language Technologies (HLT) covers the group of software components, tools, techniques and applications that process natural, human language. HLT comprises two broad areas: speech processing and natural language processing (or NLP). Speech technology replicates the human ability to hear and utter spoken language, for example in speech recognition applications. NLP technology models the human capacity to comprehend and process the content of human language - i.e. to understand and transform written text. Automatic translation for example is a common application of NLP. The combination of speech and NLP provides powerful technology for improving the interaction between humans and machines, and between humans using machines.

In the last ten years, HLT has grown from a highly specialised, theoretical research topic to a core technology of the Information Society. By contemporary standards - in an environment where cycles of technology innovation have been reduced to months rather than years - the advance of HLT may seem slow, if not glacial. This impression is misleading. Basic research in the field has been underway for 50 years or more. After decades of research, the conditions for exploiting HLT began to emerge in the 1990s, and advances in the use of HLT have grown steadily ever since.

HLT thrives in the conditions that support the information revolution - high levels of relatively affordable computing capacity, and virtually universal connectivity. For decades, the complexity and computational demands of HLT were a barrier to development, and this barrier in turn limited the scope of research. The happy convergence of information society technologies has both provided the infrastructure that can support HLT, and driven the need for exactly the kinds of products and services that HLT can support.


Why is HLT important for Europe?

HLT plays a unique role in the European Union, due to the unusual cultural conditions that pertain to Europe, both socially and economically. There is no other advanced economic area that enjoys the cultural and linguistic diversity of Europe. The 11 official languages of the EU will grow to more than 20 as the next round of Candidate Countries join the Union. There are dozens of additional languages in common use in the Union, including regional languages (such as Catalan and Basque in Spain), non-official national languages (such as Welsh in the UK), and immigrant languages (such as Urdu in the United Kingdom, Maghrebi Arabic in France and Turkish in Germany).

The ability - and willingness - to use multiple languages in everyday life is an increasingly familiar aspect of business, leisure, government and civil society in the EU. This reflects the aspirations of European citizens to integrate, alongside their deeply held respect for locale. Europeans have become increasingly aware that active support for linguistic diversity protects the rights of all citizens to maintain their own languages - not to the exclusion of others, but as part of the common cultural assets of the Union.

The information revolution, therefore, brings particular challenges to the EU. In an increasingly dense information environment - for citizens and consumers, governments and businesses - language transparency becomes vital. If all citizens across the expanded Union are to participate fully in the information society, the products and services of that society must be available in all their languages. If Europe is to operate successfully as a single market, and if the goals of the eEurope vision are to be achieved, those products and services must be delivered cross-lingually, making it as easy to move across languages as it is across borders.

From another point of view, the language challenge in Europe will almost certainly prove to be an advantage for the EU. As the EUROMAP Study confirms, many of the products and services of the information society will be built on core HLT components. The importance of HLT goes well beyond the obvious, and penetrates into the deepest layers of the Internet and the web, where the ability to process the components of language - coding knowledge and intelligence into the information infrastructure - will be the basis for next-generation technology.

The EU has already established its credentials as the most advanced research location in the HLT field. The very difficulty of developing HLT for many languages gives European researchers and technology developers a natural advantage in one of the most crucial technologies for the next generation of information and communication technology. As a consequence, commitment to the future of HLT in Europe is perhaps most important for the contribution it will make to the strength of the European ICT sector. A study by Booz-Allen & Hamilton, The Competitiveness of Europe's ICT Markets: The Crisis Amid the Growth (presented at the Ministerial Conference in March, 2000) documents the challenges to Europe's global competitive position in several key segments of ICT, including software. A recent study by The Conference Board, Productivity, ICT and Service Industries: Europe and the United States, assesses the impact of ICT on productivity. Europe's productivity gap in ICT-using industries, especially services, is notable. Thus the EU faces competitive challenges in ICT from both supply and demand perspectives.

HLT is a 'small' technology in pure market terms, but its potential impact - on accessibility, innovation, and integration - is significant, and its crucial role in unlocking the potential for eEurope is unchallenged. The EUROMAP Study explores how HLT research fits into the larger picture of advanced technologies for the future of eEurope. It demonstrates the important role of HLT in new paradigms for next-generation ICT, and outlines suggestions for integrating HLT into the emerging European Research Area.


Background to the EUROMAP Study

EUROMAP Language Technologies is a European Commission supported initiative dedicated to promoting greater awareness and faster take-up of HLT within Europe. Since 1996 the project has served as a central resource and marketing support unit providing information to all communities involved in the language technology field, from researchers and developers to suppliers and users. EUROMAP supports national HLT communities of interest through National Focal Points that provide direct services to their constituencies. Through pan-EU initiatives - such as the hltcentral.org web site and the LangTech Conferences - EUROMAP knits together the various stakeholders in the European HLT community.

The EUROMAP Study brings together the experience gained in more than five years serving Europe's HLT community. It draws on the resources and knowledge of many experts and practitioners, from every Member State and from a number of New Accession Countries. The EUROMAP network has documented the state of the HLT research community, and tracked the steadily growing number of new companies operating in the HLT field. Through seminars and fieldwork, the network has documented the evolving market for HLT technologies. Through consultation with leading HLT visionaries and co-ordination with the HLT research network ELSNET, EUROMAP has developed a wide view of the many opportunities and challenges in the field.

This study follows on from a report published in 1998 that provided the first pan-EU perspective on this emerging technology, at the beginning of the Fifth Framework Research and Technology Development Programme. EUROMAP has continued to track the progress of HLT, and has developed a prototype benchmarking method that measures progress in the field.

The results of the current study point toward the future of HLT in Europe, and identify policies and practices that have yielded successful outcomes. The study offers suggestions for leveraging the success of previous research investments into the next-generation IST research agenda for Europe.


State-of-the-Art: the technologies

The EUROMAP project has identified some 300 European companies offering HLT-based products and services. Most of these companies are based in current Member States of the EU, but there is also a small but growing base of companies in Candidate Countries in Eastern Europe. Many of these companies offer combinations of different HLT features and functions, ranging from basic components to advanced solutions based on speech and text processing.

Components and resources

All HLT relies on core language processing components that digitally model or replicate the way humans process language. These components can be based on linguistic rules (such as grammar), on statistical analysis (e.g. to measure the probability that a text or an utterance has a particular meaning), or on a mix of the two. In addition, all HLT techniques need a source of linguistic data as a reference, such as a lexicon (a dictionary coded with grammatical information), or a 'corpus' that provides a large database of the raw material of language, either text or speech. The existence and availability of these basic components provide the baseline for development of HLT. EUROMAP has identified around 120 European companies offering core HLT components and language resources in roughly 25 languages.

Knowledge processing

HLT components can be embedded in a wide range of what are generally called 'knowledge' applications - i.e. products and services that process information using some level of linguistic intelligence. Search engines use HLT components to improve the matching of search terms, e.g. by retrieving different morphological forms of a word, or even synonyms. More advanced applications, such as knowledge mining, can use complex combinations of HLT tools to find, analyse, and create reports on the content of text or document repositories. Increasingly, large companies are developing taxonomies (i.e. structured trees of linguistic concepts) to organise and manage their content assets. EUROMAP has identified around 120 European companies offering HLT-based knowledge processing products and services, in some 25 languages.
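The search-term matching described above can be illustrated with a minimal sketch of query expansion. The lexicon, the synonym table and the function name `expand_query` are all hypothetical, chosen only for illustration; a real search engine would rely on a full morphological analyser and a far larger lexicon.

```python
# Toy lexicon mapping inflected forms to a base form (lemma).
# Every entry here is an invented illustration, not EUROMAP data.
LEMMAS = {
    "translations": "translation",
    "translated": "translate",
    "translating": "translate",
    "translates": "translate",
}

# Toy synonym table keyed by lemma (also invented for illustration).
SYNONYMS = {
    "translate": {"render"},
    "translation": {"rendering"},
}


def expand_query(term):
    """Return the set of word forms a search engine could match for `term`."""
    term = term.lower()
    # Fall back to the term itself when the lexicon does not know it.
    lemma = LEMMAS.get(term, term)
    # Collect every surface form that shares the same lemma.
    forms = {form for form, base in LEMMAS.items() if base == lemma}
    forms.add(lemma)
    forms.add(term)
    # Add synonyms of the lemma, if the lexicon lists any.
    forms |= SYNONYMS.get(lemma, set())
    return forms
```

With this toy data, a query for one inflected form of a verb is expanded to its other listed forms and to the listed synonym, while an unknown word is returned unchanged.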

Interface and Interaction

Common interface and interaction technologies are usually speech-based. The most familiar uses of speech technology are telephone-based speech recognition systems that eliminate the need for a keypad, commonly used in call centres and telephone transaction systems. Speech recognition systems are also used in dictation systems that bypass the keyboard. On the other hand, speech synthesis (Text-To-Speech) systems are increasingly common for applications such as 'listening to email', after having served for a long time as support for blind people. More advanced applications include voice authentication. Speech systems have gone beyond traditional platforms, and are now embedded in common consumer items, and in telematics systems in cars. EUROMAP has identified around 130 European companies offering HLT-based Interface and Interaction products and services, in some 25 languages.

Cross-linguality

Automatic translation (machine translation, MT) was the earliest NLP application, and remains one of the most technically challenging. Nevertheless a large number of products have been developed for many different language pairs, and free 'gisting' translation is widely available on the web. Aside from MT, cross-lingual applications can overlap with both knowledge and interface applications. A cross-lingual search engine can translate a term in order to search repositories in different languages, retrieve the 'foreign' language text, and provide a rough translation, or even a summary, in the language of the original search term. Several prototype systems exist that provide cross-language speech applications, such as telephone reservation systems that allow people speaking different languages to communicate. EUROMAP has identified around 60 European companies providing cross-lingual products and services in 25 languages.

The Multilingual Semantic Web

The next-generation Internet will embed core linguistic data at the heart of the web. The Semantic Web initiative aims to capture and encode the semantics of all types of digital content, and use that embedded knowledge to enable more predictable levels of interaction between different systems and services. Agent technology armed with semantic knowledge about a user will interact with virtually any electronic system that shares its semantic knowledge of the world. This knowledge will be captured in 'ontologies' - structured sets of concepts with agreed relationships that represent real-world knowledge. European HLT should be capable of sustaining its position as thought-leader in the development of the Multilingual Semantic Web, assuring that all European language communities participate in the development of semantic resources, and that the services that use them are expressed in all the languages of Europe.

Visionary technologies

HLT will be a key embedded technology as next-generation ICT products and services emerge from the lab. Visionary work on information processing is focused on 'ambient intelligence' for 'ubiquitous computing', where knowledge is embedded in devices throughout the environment, responding to human activities in natural modes of interaction. Research on 'e-sense' will model the way all the human senses are processed in the experience of communication. Thus the boundary between 'knowledge' processing and 'interfaces' will blur, and machines will cease to dominate the modality of electronic communication. Machines will interact with humans in a more human way, and humans will interact with humans using machines that are more transparent. New ICT paradigms will process information about human experience, through all the human senses, in the most natural coding and representational system, i.e. language, creating what has been called the 'perceptually aware cross-lingual human interface'.


The HLT market

The transfer of HLT to market has now been through one full 'crossing the chasm' cycle, moving from 'innovators', to 'early adopters', and into the mainstream ('early majority').

Figure 1: The Chasm

In this first-generation language technology market, innovators either had a uniquely compelling requirement (e.g. the use of machine translation by the European Commission and the US defence), or experimented with component technology in innovative ways (e.g. Reuters' early use of NLP for information retrieval for news resources). Early adopters exploited the increasing maturity of some HLT products for very specialised purposes, e.g. the use of MT for technical publishing by Xerox and Caterpillar, and the introduction of speech recognition in medical transcription systems. HLT has now entered the mainstream, in early majority embedded capabilities. Telephone-based speech recognition is now widely used; most major search engines have embedded HLT components; and millions of web pages are translated automatically every day using MT.

This progress is reflected in market spending for HLT. Datamonitor puts the worldwide speech technology market for 2003 at just short of €1 Billion. IDC estimates the current NLP market at around €400 Million. By 2005, the combined speech/NLP market is forecast to exceed €2 Billion. While these are respectable market opportunities in themselves, they do not reflect the multiplier effect of embedded HLT. The value added to products and services employing HLT creates markets worth many times the value of the core technology itself.

The next market stage for HLT - exploiting language knowledge in more complex and advanced applications - will initiate a new cycle of development. In second-generation HLT, innovators will experiment with new combinations of components and tools, while the mainstream market will wait for proven embedded solutions. It is unlikely that second-generation HLT will produce many standalone 'pure HLT' products; instead, language technology will be incorporated into other applications, creating innovative features or superior performance to provide differentiation for mainstream products and services. This likely shape of the future HLT market should direct future language technology research, which will need to be carried out in the context of advanced research in companion, or hosting, technologies.


The HLT research base

Until now, the language technology community in Europe has managed to remain competitive against strong HLT research initiatives in the US and Japan, as well as the growing levels of R&D in other parts of Asia (especially translation technology in China, Korea, and India). Indeed, HLT is one of the few areas of software research where European research is clearly world class.

Importance of EU support

Language technology research has been supported for many years within EC Framework Research Programmes, and the timing and structure of that support has been well suited to the needs of the HLT domain. Up to the mid-1990s the research programmes had a 'technology push' focus that was very effective for HLT, as market conditions were not yet favourable. Funding for language engineering in FP4, and the HLT action in FP5, has been more market-focused, and has tracked the evolution of market opportunities for language technology very closely.

EU funding has been particularly important for the HLT domain, and has been largely responsible for the creation of a coherent research community in Europe. Industry-sponsored research in HLT has been weak, though stronger in speech than in NLP. Moreover, national-level public support for HLT research has been highly variable, and with a few notable exceptions, somewhat inconsistent. EU funding for machine translation research produced a network of NLP researchers across the EC, and spawned a variety of research efforts in different languages, as well as an established academic base for MT experts. In addition, the tendency to fund a larger number of smaller projects (compared to the practice in the US and Japan) has had the effect of broadening the technical base across the Union; at the same time the structure of FP projects, requiring cross-border collaboration, has created a genuinely pan-European research base.

EU funding has, in addition, had a significant impact on technology transfer in the HLT field, though not always a direct one. The number of suppliers active in the HLT market has expanded exponentially in the last decade, from fewer than 30 companies in 1993, to 10 times that many in 2003. Almost all these European suppliers have some roots in EU-funded programmes, either through technology inherited (often through several generations) directly from projects, or through the technical capacities of experts who have been involved in projects.

Benchmarking HLT performance

This EUROMAP Study is based on a benchmarking analysis of the opportunities and achievements of the HLT research effort in Europe. The analysis compared Member States, and created indexes for two broad measures: the robustness of the opportunity to exploit HLT (the 'Opportunity Index'), and the prospects for and success of HLT research and technology transfer (the 'HLT Benchmark').

Factors measured for the Opportunity Index were based on third-party research that rates conditions such as the general environment for research innovation; supply-side factors including ease of business formation, access to key channels (as defined by the EUROMAP study) for HLT, and ability to adopt innovation; and demand-side factors including trade competitiveness, ICT infrastructure, and capacity to absorb innovation. The factors were then weighted to reflect a judgment of their relative significance as a potential success factor for HLT.
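The weighting step can be illustrated with a short sketch of a weighted average over factor scores. The factor names, the 0-10 scale and the weights below are hypothetical: this summary does not publish the actual factors' scores or weights, so the code shows only the general technique and is not the study's own calculation.

```python
def weighted_index(scores, weights):
    """Weighted average of factor scores; weights need not sum to 1."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same factors")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[f] * weights[f] for f in scores) / total_weight


# Hypothetical factor scores (0-10 scale) for one imaginary country.
example_scores = {
    "research_environment": 8.0,
    "business_formation": 6.0,
    "ict_infrastructure": 7.0,
}

# Hypothetical weights expressing a judgment of relative significance.
example_weights = {
    "research_environment": 2.0,
    "business_formation": 1.0,
    "ict_infrastructure": 1.0,
}

index = weighted_index(example_scores, example_weights)
```

Dividing by the total weight keeps the index on the same scale as the individual factor scores, so indexes for different countries remain comparable.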

Factors measured for the HLT Benchmark were based on EUROMAP desk research and fieldwork. They included depth and breadth of HLT research (in both speech and NLP); funding commitments by both the public-sector and industry; and the breadth of language coverage in research and products (considering both the number and choice of languages processed, and coverage of low-density or minority languages). The measurement of research depth considered whether core HLT components have been fully developed, and also the extent to which more advanced applications are the subject of research or technology development projects.

Figure 2: Comparison HLT/Opportunity

The Opportunity Index was then mapped against the HLT Benchmark to create the 'HLT Scorecard' - a summary measure that captures the relationship between the two. There was a notably strong correlation between the Opportunity Index and the HLT Benchmark. In general, countries with the most favourable business environment and the most highly developed infrastructure also have the most successful HLT research efforts.
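The summary does not say how the correlation between the two indexes was measured. As one common choice, Pearson's correlation coefficient could be computed as in the sketch below. The country values are invented placeholders, not figures from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return cov / math.sqrt(var_x * var_y)


# Invented placeholder values for four imaginary countries.
opportunity_index = [4.0, 5.5, 7.0, 8.5]
hlt_benchmark = [3.5, 5.0, 7.5, 8.0]

r = pearson(opportunity_index, hlt_benchmark)
```

A value of `r` close to 1 would correspond to the pattern described above, in which countries scoring well on one index also tend to score well on the other.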

Member States Scorecard

The 'Leaders' include Germany, the Netherlands and the UK. Each of these countries has enjoyed strong national commitment to HLT research. Germany, which scored highest on the HLT Benchmark, has had consistent, long-term effective national investment in HLT from both the public and private sector ever since the SPICOS project in 1985. The Leaders are judged to be 'market ready' for advanced HLT research.

A 'Strong Potential' group, which scored near or below average on the Opportunity Index but above average on the HLT Benchmark, includes France, Belgium and Spain. France would have clustered with the Leaders on the HLT measure, but scores significantly lower on business opportunity environment measures. These countries have well-developed research communities, and a significant depth of HLT research, so they are in a strong position to exploit HLT as opportunity factors improve, e.g. as rates of Internet use rise and greater support for business creation is forthcoming.

A third group is rated 'Promising', with Ireland and Denmark ranked near average on both scores, just behind Sweden, which scored highest on Opportunity factors, and Finland, which is above average on HLT. While all these countries stand more or less at the EU median, with comparable performances in both 'first generation' HLT R&D and transferring results to the marketplace, they need both to boost their HLT research investment and to improve their technology transfer record if they wish to aim for next-generation standards.

Finally, there is a group of four countries (Greece, Italy, Portugal and Austria) which have reached the 'Structural Limits' of their existing HLT market situation, and require a new approach to catch up with the leaders. They all scored below average on both measures, though with different profiles. Both Greece and Portugal scored low on Opportunity factors, though Greece scored higher on HLT measures, due to its strong R&D base. Both these countries may need to look beyond their borders for opportunities to exploit their HLT research, and will benefit from enhanced EU collaboration. Portugal, in particular, could improve its research opportunities with more cross-border collaboration. Italy has a slightly stronger research base than most of its fellow countries, but with Austria is pulled down to the average on Opportunity measures. Austria has the advantage of sharing a language with the leading HLT research country, but this very fact might also act as a disincentive when it comes to expanding its own HLT activities.

Figure 3: The HLT Scorecard


Major conclusions

HLT research and development

For obvious reasons, HLT research has historically evolved with a national R&D bias towards the native language(s) of the national research communities. While this was essential in early HLT research, it is increasingly common to find a multi-language focus, especially in the more successful research departments and labs. This is a healthy development, and should help overcome inappropriate biases about 'ownership' of HLT for a particular language. As the HLT research community in Europe becomes ever more integrated, language expertise migrates across the whole of the EU, while naturally retaining its roots in national language communities. It is essential that language technology expertise and linguistic expertise be free to migrate and integrate across the EU research community.

HLT research and development is a long, complex process that needs substantial public support. The necessary training, resource, tool and technology development cannot be assured by market forces alone. Europe's success in the HLT field has been built on public funding, in the universities, national research institutes, and in funded projects. It is unlikely that the field can advance effectively - especially in bringing all languages to the same level of sophistication and incorporating the new languages of the expanded Union - without continued public investment on a significant scale.

Consistent and long-term funding of HLT research at the national level has paid off handsomely, and has contributed significantly to the strong national research base in Germany, France and the UK. It is unlikely, however, that all Member States, especially in the expanded Union, will be able to support programmes at the level of the more technologically advanced members (including the Netherlands and Finland, as well as other Nordic countries). Consequently, the structure of EU funding will need to accommodate variations in the level of national support.

While national programmes in key Member States have been crucial in building core capabilities in HLT (as a complement to EU programmes), they have by no means been 'one size fits all'. National approaches to HLT research have mirrored local priorities and structures. In Germany, for example, large comprehensive programmes with a single focus (e.g. Verbmobil) linked industry to the research community in a very structured way. In France, HLT research was closely linked to (then) national laboratories (e.g. France Telecom). In the UK a relatively early Speech and Language Technology Programme solidified a strong network of national researchers, kick-starting market transfer at around the beginning of deregulation of the telecoms industry. This suggests that a truly 'European' approach to future HLT research will need to be adaptable, variable, and able to adjust to the different environmental conditions of Member States.

While research activities funded under the HLT-specific actions of the EU Framework Research Programmes are relatively visible both inside and outside the research community, the resulting picture is nevertheless incomplete. For example, there is as yet no coherent, transparent view of the considerable language-related R&D in other areas of IST research (for example in the area of Digital Libraries at ERCIM, or of Fraud Prevention at the JRC), nor of the important if structurally quite varied national programmes (for example in France, Italy, Lithuania and Estonia), nor commercially funded research (especially in the field of in-vehicle speech technology applications in Sweden's Telematics Valley, for example, or in controlled language applications in the aeronautics and vehicle documentation sectors). This lack of a coherent and comprehensive overview is likely to become even more extreme as the Sixth Framework Programme gradually implements a policy of embedding previously 'stand-alone' HLT activities into its more mainstream IST research. Without a clear map with which to identify the patterns of ongoing R&D actions, there is a risk not only of unnecessary duplication of effort, but also of making it harder for the investment community to contribute effectively to the process of transferring technology to the marketplace.

The research status of the languages of Europe is highly variable. A few languages (English, German, French) are well served, enabling the emergence of more advanced research topics and market applications. Some of the less 'dense' languages (measured in numbers of speakers) are not even fully enabled for full exploitation of first-generation HLT applications. This means that there must be further public investment to bring all languages to a relatively equal status, at a baseline level, since this is an absolute prerequisite for future development of advanced ISTs capable of serving all European citizens equally.

At the same time, HLT research has, for several years, been moving steadily toward 'engineering' and away from theoretical research. Even apparent theoretical shifts (e.g. statistical and data-driven, as opposed to rule-based, NLP) are more like natural hybrids than true changes of paradigm. While this is a natural cycle, it is likely that the field will be refreshed by substantial re-thinking of its basic assumptions; accommodating this level of basic theoretical work (as opposed to back-filling baseline R&D for 'new' EU languages) should therefore be on the agenda for next-generation HLT research. It seems quite plausible that new theoretical approaches will arise from cross-fertilisation with other technical computing and engineering disciplines, which further emphasises the benefits of incorporating advanced HLT components into the mainstream FP6 research agenda.

But to ensure that Europe does not evolve into a two-speed culture for language technology, with one well-funded half of the HLT R&D agenda focused on embedding advanced systems for just a few of the more 'strategic' languages, while the other half attempts to ensure baseline coverage for the lower density or less 'strategic' languages, it is imperative that there be some form of autonomous 'language technology agency' whose task would be to sustain an appropriate degree of autonomy for the HLT field (especially in the critical area of baseline language components and resources), independently of HLT's ultimate technological destiny of becoming an embedded component of the information society infrastructure.

Market transfer

So far, there has been no direct link between robustness of the HLT research effort in any particular language community, and actual effectiveness of transfer to market. There is of course a clear split between examples of successful language technology transfer for high-density languages (especially English, German and French), and transfer for low-density languages, which is clearly due to the commercial potential of larger markets where high-density languages are spoken. There is however a notable exception to this in European Spanish, where the research effort is still quite diffuse, partly because of national support for a number of 'regional' languages, all of which have official status.

Another special case is Italian, which is less widely spoken globally than Spanish but is high-density in Europe, and yet is comparatively weak in HLT transfer, no doubt due to specific conditions in the business environment and technical infrastructure. Italy has a long and powerful tradition of HLT research, going right back to the beginnings of computational linguistics in the 1950s, and it is clear that its current position is due more to commercial 'timing' than to any inherent technology weakness.

By contrast, the relatively strong research community in both Finland and the Netherlands, where the business environment and infrastructure are among the strongest in Europe, has nevertheless transferred less technology, especially higher-end tools and products, than might have been expected. The conclusion is that transfer of HLT to market is influenced by three strong factors: size of the linguistic community, business environment & infrastructure, and sharpness of research focus. Since the 'cost' of technologising a language is ultimately the same whether it is spoken by a population of 2 million or 200 million, it is now becoming clear that there need be no necessary link between language-specific research, technology development and market-transfer activities and the specific geographies where a language is spoken.

One effect of a truly European-wide, as opposed to country-based, marketplace for research and development, for example, may well be to encourage the creation of centres of best practice in language technology development, so that what turn out to be the 'best' HLT architectures are chosen as the optimum development environments for any 'national' language, wherever it may be spoken and written.

What is clearly needed in a truly interactive European information society is language parity at all levels, both inwards and outwards, to use an analogy from investment. Until now, there has been a natural yet constraining tendency to develop language technologies for transfer of information into the national language. In practice, of course, I need access in my language to content and interaction in your language, as much as you need my content and interaction accessible in your language. Enabling such language parity at a technology level and ensuring its commercialisation will only be achieved by setting up a comprehensive multi-language infrastructure, which to a far greater degree than is true today would delink language processing from R&D geographies. Achieving such parity would be a critical item on the agenda of any eventual 'European language technology agency'.

Policy priorities

It is widely acknowledged that if Europe is to become a leading knowledge- and technology-based economy, fulfilling the objectives of eEurope, the momentum of innovation and R&D progress must be increased. HLT should continue to be promoted as a key technology advantage for Europe.

Market opportunities for HLT correlate strongly with 'environmental' opportunities such as the state of the ICT infrastructure, the strength of existing local or national ICT markets, readiness to accept new products and services by consumers and businesses, and the availability of channels to market (products as well as services) in which HLT capabilities must be embedded. The close correlation between market opportunities and strong HLT research suggests that in this, as in other areas of IST research, the business environment and technical infrastructure cannot be ignored if Europe is fully to exploit its potential in this important technological domain.

The 'information highway' analogy is a powerful one for HLT research, since language-enabling will literally eliminate barriers to communication across the networks of the Union, permitting the free flow of information, and the services and facilities based on information. From this perspective, the HLT infrastructure should have the same status and priority as the physical infrastructure that permits the free flow of goods and people in the Union.


Recommendations

HLT in the ERA

Establish a concrete and visible presence for HLT activities in the European Research Area: The goal should be to have a set of robust, rich, stable, multilingual, 'autonomic' HLT modules, capable of being embedded into emerging IST operating environments. This is most likely to be achieved if HLT research is both a priority within the IST components of the ERA (cf. the contribution that research into interfaces, cognitive processes, interaction, knowledge technologies and semantics will make to IST research in FP6) and treated as a companion technology for an innovative research agenda. A baseline objective should be to have such modules for all present and future EU languages, and to assure that components and resources for low-density languages are back-filled as a matter of priority.

Structures for Visioneering in HLT

Establish a 'language technology agency' to collectively supervise the gradual transition from the national HLT efforts to a truly European technology level of language parity, and to federate and rapidly circulate best practices at all levels of HLT R&D. A plausible first stage would be to create a LangTech Observatory for HLT, which would bring several advantages to the European research community:

  • Research tracking to reduce or eliminate duplication of effort, provide more open access to exploitation opportunities, provide guidance in setting the HLT research agenda.
  • Promoting the inclusion of HLT in all relevant European research efforts by making the field more visible to researchers in other domains.
  • Shifting the focus from purely geographically defined national data, to a more 'language-oriented' observatory function, by transferring European best practices to the national level.
  • Providing reliable data for EU innovation tracking, at policy level.

HLT Infrastructural Funds

The equivalent of 'linguistic infrastructural funds' would be an appropriate investment to support languages that lack a strong core of components and resources, and are lagging in the move to next-generation embedded HLT applications.

Research planners should consider disengaging 'language' from 'geography', and support linguistic infrastructure research and development wherever it is most likely to succeed. This should involve cross-border collaborations between strong HLT research locations, and geographical locations where less-technologically-developed languages are spoken (especially in New Accession Countries). This would also benefit language communities with relatively strong HLT research, but weaker local opportunities for exploitation.

Digital Language Infrastructure

While language 'ownership' should ultimately be operationally disengaged from geography as a matter of funding principle, this will clearly take some time. Meanwhile, there will still be a major role for national HLT 'agencies' or sponsors, such as the 'digital language infrastructure' being developed by Nederlandse Taalunie for Dutch-Flemish (in joint programmes between the Netherlands and Flanders), and in similar initiatives under the French TechnoLangue programme. This could form one mechanism to assure that all European languages are adequately supplied with core resources and components - or at least that missing elements are identified.

The role of the EC (via the proposed 'language technology agency') should be to support the collective definition of what constitutes a core 'language kit' without which HLT development cannot advance, to promote the development of open-source platforms for developing and implementing such kits, and to initiate the process of setting standards for interoperability between language components, and between language components and application environments. This would include defining and agreeing on requirements of formal and content quality, availability (free of ownership rights or under certain conditions), multi-functionality and reusability.

During an initial phase, the Taalunie experience should be considered as a model that can be expanded to all European languages, initiating a process that could result in a pan-European network of structures to sponsor HLT for specific languages, with concrete benefits for technology transfer. Taalunie has estimated the cost of the agency to be in the range of €500,000 per year for Dutch-Flemish. At this order of magnitude, the Union could fund ongoing support for core HLT 'language kits' for 20 languages at a cost of €10M/year - a relatively modest sum in relation to current spending on language services in Europe, in what would be in effect a market-priming programme. The goal should be an 'open source' approach to the evolution of a digital language infrastructure for Europe; this could converge with other open-source software initiatives within the e-government agenda. It could also have a significant impact on near-term development and launch of HLT-based products and services in a much larger set of European languages than currently exists.

All of these infrastructural measures would be supervised by the proposed 'language technology agency', whose justification, status and composition would be subject to the broadest possible consultation. This would enable Europe's fundamental language technology agenda to gain progressive independence from the specific foci of Framework Programmes as such, and achieve continuity of action and impact over and above the specifically project-based approach favoured until now.



Abbreviations

ERA European Research Area

ERCIM European Research Consortium for Informatics and Mathematics

FP6 The Sixth EU Framework Programme for Research and Technological Development

HLT Human Language Technologies

ICT Information and Communication Technologies

IST Information Society Technologies, part of FP5 and FP6

JRC Joint Research Centre

R&D Research and Development



Last updated: 4/10/03 9:25 PM
