Semantic Business: 2010

[Originally published 12/10/09]

The Calais Initiative

Company: Thomson Reuters, Inc.

URL: http://www.opencalais.com/

HQ: New York, USA

Products (Primary): The Calais Initiative

Survey Respondents: Tom Tague, Krista Thomas

Vendor Category: NLP

Employees: 50,000

Revenue: US$13.94 Billion

Calais installed base: 7,000 developers, 2,000,000 pieces of content processed per day

Primary Offering:

The Calais Initiative (Calais) comprises several tools for processing text, but the core product is a Natural Language Processing (NLP) engine. When presented with a body of text, the Calais Web service returns the “named entities” (the categories to which the document’s key terms and concepts are assigned), facts, and events it discovers within the document. The relationships between these items are also identified and embedded in the results. Essentially, the results are the Semantic metadata of the document and can be thought of as the document’s “knowledge content,” which can be published and made available for searching and navigation.
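To make the idea of "knowledge content" concrete, here is a minimal sketch of how such results could be modeled once they come back from the service. This is not the Calais response format or API: the class names (`Entity`, `Relation`, `SemanticMetadata`), the field names, and the one-sentence example ("Thomson Reuters is headquartered in New York") are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Entity:
    """A named entity: a key term plus the category it was assigned to."""
    category: str  # hypothetical category label, e.g. "Company" or "City"
    name: str


@dataclass(frozen=True)
class Relation:
    """A fact or event that ties entities together."""
    kind: str  # "fact" or "event"
    label: str  # hypothetical relation name
    participants: Tuple[Entity, ...]


@dataclass
class SemanticMetadata:
    """The document's extracted entities and the relations between them."""
    entities: List[Entity] = field(default_factory=list)
    relations: List[Relation] = field(default_factory=list)

    def names_in_category(self, category: str) -> List[str]:
        """Return the names of all entities assigned to one category."""
        return [e.name for e in self.entities if e.category == category]

    def relations_involving(self, name: str) -> List[str]:
        """Return the labels of all relations that mention a given entity."""
        return [
            r.label
            for r in self.relations
            if any(p.name == name for p in r.participants)
        ]


# Hand-built metadata for: "Thomson Reuters is headquartered in New York."
company = Entity("Company", "Thomson Reuters")
city = Entity("City", "New York")
metadata = SemanticMetadata(
    entities=[company, city],
    relations=[Relation("fact", "headquartered_in", (company, city))],
)

print(metadata.names_in_category("Company"))
print(metadata.relations_involving("New York"))
```

Once metadata is held in a structure like this, searching and navigation become simple lookups over entities and relations rather than scans of the original text.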

On its own, and applied to one or two small, short documents, this might not seem terribly valuable. But deployed on the Web and made available as a free service, Calais is in a position to process massive amounts of data (text, quantitative, graphic, etc.) and extract their knowledge content. Once this task is complete, this content can be searched individually or combined with other similar content and searched in a larger context. This larger context can be based on other Web content, proprietary Thomson Reuters content, a combination of the two or the context of select data sources that may address a specific area of interest.

Ultimately, Calais’s goal is to be the world’s best tool for extracting the structure of any kind of content, recognizing its type, the concepts that are contained, their relationships, and doing so not just within a single file, but across a span of files that could be as large as the Web itself.

Key Differentiators:

Demand from large organizations, including well-established publishers, has grown at an unexpectedly high rate. This has led Thomson Reuters to introduce three contract-based versions of Calais in addition to the original free service:

Calais Professional - same as the free service but now backed by an SLA and with higher transaction limits.
Calais Professional for Publishers - Calais Professional tailored to meet the needs of large scale publishers and tied to an annual contract.
ClearForest On-Premise Solutions - ClearForest is the original name of the technology that makes Calais work. Now that it's available as a standalone application, enterprises will be able to closely tailor the service to their needs, ensure the privacy of their proprietary content, and also have access to what's under the hood for even further customization.

Thomson Reuters is another key differentiator – the fact that Calais is sponsored by a global information giant suggests that this entrant will be with us for a long time. Furthermore, at this time Calais is in the final stages of testing its “infinite scalability” initiative (e.g., cloud computing), designed to address growth in demand and/or spikes in utilization.

Another distinguishing characteristic is the rate at which the service has been adopted (the fact that it’s free is worth repeating). The net effect has been to discard the original projections for usage because demand has so vastly exceeded expectations. Note that until very recently, demand for Calais has existed almost entirely outside of any Thomson Reuters media property. This state of affairs is changing rapidly, with internal inquiries arriving with greater frequency.

Deploying Calais against the vast, professionally developed and controlled content in the Thomson Reuters empire would be a remarkable step in the company’s evolution. After 150 years as a traditional news wire service and publisher, Thomson Reuters’ content could quickly become something not yet fully defined, but possibly far more powerful and useful than what traditional publishers have offered before.

Six/Twelve Month Plans:

In January ’09, Calais is scheduled to launch Release 4, which will open the door to the world of “Linked Data,” a critical step toward fulfilling the promise of the Semantic Web. Essentially, URIs (Uniform Resource Identifiers) allow for the linking of individual data elements, a concept that goes much further than linking containers like files, pages, documents, or databases as we’re accustomed to on the WWW. The Semantic Web term for each pointer that leads to a datum is “dereferenceable URI”.

Wikipedia does a nice job of explaining references and their consequent dereferencing by using house addresses and houses. In this case, a house address is the reference, or pointer. Using this pointer and finding the actual house is the same as dereferencing the address.

In Calais’s case, after extracting the entities (e.g., people, places, companies, etc.) from your content you could then link to (or retrieve for processing by an application) relevant data on DBpedia, The CIA World Fact Book, Freebase, or a rapidly growing number of other compatible data sources. If you’re a talented content producer, the additional leverage that comes from linking to these “external” data could make your offering substantially more useful and in turn, much more valuable.

Let’s build on the example above, where the entities in an original document have been linked to data residing on DBpedia and The CIA World Fact Book. The idea is that the entities extracted from each source can be linked manually, through search results, or as a result of processing by an application. Simply knowing that these entities have an association can be valuable, but the key is that the URI provides a pointer to the specific data – not the file, not the document, and not the database, but to the actual datum, value, or record that’s stored in one of these containers. There’s no longer a need to call an entire file or database, read it to find what you’re looking for and then put it to use. Instead, you call just what you need – the specific data that matter to you.

This process is faster (read: cheaper in computer processing terms) and those URIs you’ve amassed can be reused by other people and applications because these pointers are durable and they persist – if the data remain in place, then each datum will keep the same individual URI (again: cheaper, highly reliable, and standardized to ensure universal access and use). It’s simply easier to exchange pointers to specific data (dereferenceable URIs) than it is to exchange potentially huge data files or documents.
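In practice, dereferencing usually means an ordinary HTTP GET on the URI, with an `Accept` header that asks for data rather than a Web page. The sketch below shows that idea with Python's standard library. It is an illustration under assumptions, not part of Calais: the helper names are hypothetical, the DBpedia URI is only an example of the `http://dbpedia.org/resource/...` pattern, and the media type a given data source actually supports may differ.

```python
from urllib.request import Request, urlopen

# Assumed example URI; any dereferenceable Linked Data URI could stand in here.
EXAMPLE_URI = "http://dbpedia.org/resource/Thomson_Reuters"


def build_dereference_request(
    uri: str, media_type: str = "application/rdf+xml"
) -> Request:
    """Prepare a GET request that asks for data about the thing the URI names."""
    return Request(uri, headers={"Accept": media_type})


def dereference(
    uri: str, media_type: str = "application/rdf+xml", timeout: float = 10.0
) -> bytes:
    """Follow the pointer and return the raw bytes of the description.

    This performs a network call, so it is defined here but not invoked.
    """
    request = build_dereference_request(uri, media_type)
    with urlopen(request, timeout=timeout) as response:
        return response.read()


# Building the request is purely local; nothing is sent until dereference() runs.
request = build_dereference_request(EXAMPLE_URI)
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

The point of the design is that the URI itself is all that needs to be stored or exchanged; whoever holds it can call `dereference` later and get just the data behind that one pointer.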

Once documents and information assets are connected to the Linked Data cloud, deep connections can be made between the entities, facts and events therein. This can, for instance, enable the resolution of complex queries, such as: “Which company boards of directors include CEOs that have been involved in the subprime mortgage meltdown?”
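A query like that one amounts to following links across several statements. The toy sketch below answers it over a handful of in-memory (subject, predicate, object) triples. Everything in it is invented for illustration: the people, companies, predicate names, and helper functions are hypothetical, and a real deployment would more likely use a triple store and a query language than Python sets.

```python
# Hypothetical triples; none of these names refer to real people or companies.
TRIPLES = {
    ("person:alice", "ceo_of", "company:lender_a"),
    ("person:alice", "board_member_of", "company:retailer_x"),
    ("person:bob", "ceo_of", "company:bakery_b"),
    ("person:bob", "board_member_of", "company:retailer_y"),
    ("company:lender_a", "involved_in", "event:subprime_meltdown"),
}


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is a known triple."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def subjects(predicate, obj):
    """All subjects s such that (s, predicate, obj) is a known triple."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


def boards_with_ceos_involved_in(event):
    """Companies whose boards include a CEO of a company involved in the event."""
    involved_companies = subjects("involved_in", event)
    ceos = {
        person
        for company in involved_companies
        for person in subjects("ceo_of", company)
    }
    return {
        board
        for ceo in ceos
        for board in objects(ceo, "board_member_of")
    }


result = boards_with_ceos_involved_in("event:subprime_meltdown")
print(sorted(result))
```

Each hop in the function corresponds to one link in the question: event to company, company to CEO, CEO to board. The triples could just as well come from different sources, which is what makes linking the data worthwhile.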

[Figure: Linked Data cloud diagram. The diagram (not the datasets) is CC-BY-SA licensed. Email comments to Richard Cyganiak at richard@cyganiak.de]

Analysis:

Let’s start with the premise that Thomson Reuters has 150 years of experience creating, managing, and presenting content that people want. Over this period, the company has amassed a body of high quality content that’s possibly the largest in the world. This content will continue to grow, but the advent of the Web has unleashed a torrent of content on a genuinely planetary scale. Since this content is outside Thomson Reuters’ editorial and/or production controls, the company considers it to be “wild” content. This doesn’t mean it’s bad – some of it’s exceedingly good.

Based on the environmental factors below, Calais puts Thomson Reuters in a position to extend its core competencies to include content it controls as well as wild content because:

The fundamental nature of publishing and using content is changing.
“World Wild Content” will dwarf the content Thomson Reuters controls.
Professionally produced content will continue to merit a premium.
The Open Access movement and similar efforts by academics, researchers, and other content authors seeking to retain control of their work will continue and grow.
Thomson Reuters has extensive experience in every aspect of the content industry.
Flexible integration/interoperation of different types of content may provide powerful added value.

Calais is a free service that stands to significantly benefit people and organizations around the world. The terms of use may vary to allow Calais rights to utilize the content’s metadata or not, but unless you’re a major publisher, this won’t be much of an issue. What matters, at least to Thomson Reuters, is that Calais is a very concrete step toward organizing and integrating the vast span of wild content with its own high quality content. Offering customers your own content combined with the very best of free, Web-based content in an easily searched, highly flexible and exceptionally expansive product is a strong competitive advantage that may ensure another 150 years of operation. This is the strategic thrust of The Calais Initiative.

Company Profile: IYOUIT

Company: IYOUIT

URL: http://www.iyouit.eu/portal/

HQ: Munich, Germany

Products (Primary): IYOUIT

Survey Respondents: Matthias Wagner

Vendor Category: R&D Project

Employees: --

Revenue: --

Installed base: --

Primary Offering:

The only reason IYOUIT isn’t a runaway global success is that it’s still a research project supported by NTT DOCOMO and the Telematica Instituut.

IYOUIT is a very deliberate effort to explore the use of Semantic Web (SW) technology in mobile environments. IYOUIT integrates a wide range of services such as GPS location, location-based points of interest, picture sharing, local weather, messaging, and more. Some of these data are user generated, while other data are generated automatically, and the application goes even further by connecting to services like Flickr and Twitter. Furthermore, the rich mobile experience is complemented by a Web site (https://www.iyouit.eu/portal/) that displays real time updates from IYOUIT users around the world.

The IYOUIT client is made to run on mobile phones that use the S60 operating system, which means just about any high end phone made by Nokia, LG, or Samsung, along with a few models made by Lenovo and Panasonic. The client is lightweight and its interaction with the network has been tuned to minimize the amount of data passed back and forth. This decision was made deliberately to reduce the impact on subscription plans that charge based on device throughput. Processing demands at the device level have been calibrated to reduce overhead while reasoning, ontology management, and processing-intensive functions occur on the network.

Launched in June 2008, IYOUIT’s user base is still small as these things go – as of December 2008 the project had roughly 1,000 users distributed across 50 countries, with most users concentrated in Europe.

Key Differentiators:

When you’re one of a kind, it’s difficult to contrast with existing products, but some fundamental (and remarkable) qualities include the fact that this application works, it’s available for download right now, it genuinely uses SW technology, and it’s made for mobile devices. IYOUIT seamlessly combines the mobile experience with context-based enhancements delivered by the network. Users can even set “triggers” to be alerted when specified conditions are met, e.g., while you’re at your favorite coffee shop you can be alerted when one of your IYOUIT buddies arrives.
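The coffee shop trigger can be pictured as a condition that is re-evaluated whenever the network receives fresh context. The sketch below is only a guess at the general shape of such a mechanism, not IYOUIT's implementation: the `Trigger` class, the helper functions, and the idea of representing context as a simple user-to-place mapping are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical context snapshot: user name -> current place label.
Locations = Dict[str, str]


@dataclass(frozen=True)
class Trigger:
    """A user-defined alert: a description plus a condition over the context."""
    owner: str
    description: str
    condition: Callable[[Locations], bool]


def buddy_arrives_where_i_am(me: str, buddy: str) -> Callable[[Locations], bool]:
    """Build a condition that holds when both users are at the same known place."""
    def condition(locations: Locations) -> bool:
        return me in locations and locations.get(me) == locations.get(buddy)
    return condition


def fired(triggers: List[Trigger], locations: Locations) -> List[str]:
    """Return the descriptions of all triggers whose condition currently holds."""
    return [t.description for t in triggers if t.condition(locations)]


triggers = [
    Trigger(
        owner="me",
        description="buddy arrived at my location",
        condition=buddy_arrives_where_i_am("me", "buddy"),
    )
]

before = fired(triggers, {"me": "coffee shop", "buddy": "office"})
after = fired(triggers, {"me": "coffee shop", "buddy": "coffee shop"})
print(before, after)
```

Evaluating conditions like this on the network side rather than on the phone would fit the lightweight-client approach described under Primary Offering, though the source does not say where IYOUIT evaluates its triggers.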

Six/Twelve Month Plans:

As a research project, IYOUIT serves as a learning environment and isn’t necessarily tied to commercial delivery schedules. Nonetheless, the team behind IYOUIT certainly has plans and one that could be discussed is the creation of a developer connection. If this effort succeeds, it’s easy to imagine the creation of more applications and in turn, growth in the user base. In fact, the IYOUIT team is counting on open participation and they’re looking forward to new discoveries.

Analysis:

IYOUIT is much more than an intriguing mobile SW application, so let’s broaden our context (fitting, isn’t it?).

While the application is presently geared for relatively high-end phones, all those phones use the S60 operating system originally created by Nokia.
Presently, Nokia holds about 40% of the global mobile device market and even if this figure is adjusted to reflect just the higher end of Nokia’s product line, that’s still a lot of phones.
Samsung, LG, and others combine to increase the potential user base even further.
Nokia has a history of SW research and development that dates back to roughly 1996 and equally, the company has a long history of participating in the open source community.
Nokia’s recent SW research seems to focus on the creation of application development tools (http://research.nokia.com/research/projects/), which would play into the promise of IYOUIT very nicely.
Nokia’s stated corporate strategy is based on its device business, mobile content, and network infrastructure. Offerings like IYOUIT could be a big win for NTT DOCOMO, Nokia, and just about anyone else who can get involved.
NTT DOCOMO is based in Japan and while it’s a cliché at this point, the Asian countries are probably still well ahead of the rest of the world when it comes to developing, deploying, and using mobile technology.

Put these factors together and IYOUIT begins to look like the tip of an iceberg – one that will mean big wins for NTT DOCOMO and other global companies and likely, big wins for innovative startups that create valuable products and services for an environment that’s increasingly ready-made to receive them. Wow!

Semantic Business

Wednesday, March 10, 2010

Updated Profile: Calais

Thursday, January 7, 2010

Company Profile: IYOUIT
