Wednesday, December 10, 2008

Changing Blog Sites

I'm now using TypePad and all past and future posts will be found here: http://davidjprovost.typepad.com/my_weblog/
>
In the near future I'll forward my domain http://www.davidprovost.com/ to the URL above as a further step toward consolidating my Web presence. 
>
Thanks for your patience,
>
David

Updated Profile: Calais

The Calais Initiative
>
Company: Thomson Reuters, Inc.
URL: http://www.opencalais.com/
HQ: New York, USA
Products (Primary): The Calais Initiative
Survey Respondents: Tom Tague, Krista Thomas
Vendor Category: NLP
>
Employees: 50,000
Revenue: US$13.94 Billion
Calais installed base: 7,000 developers, 2,000,000 pieces of content processed per day
>
Primary Offering:
>
The Calais Initiative (Calais) comprises several tools for processing text, but the core product is a Natural Language Processing (NLP) engine. When presented with a body of text, the Calais Web service returns the “named entities” (the categories to which the document’s key terms and concepts are assigned), facts, and events it discovers within the document. The relationships between these items are also identified and embedded in the results. Essentially, the results are the Semantic metadata of the document and can be thought of as the document’s “knowledge content,” which can be published and made available for searching and navigation.
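To make the idea of "knowledge content" a bit more concrete, here's a minimal sketch of how entities and the relationships between them might be held in memory once they've been extracted. The class names, field names, and category labels are all assumptions made for illustration; this is not the Calais response format or its API, and the sample data is written by hand rather than returned by the service.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    """A key term from the text plus the category assigned to it."""
    name: str
    category: str  # illustrative labels such as "Company" or "Person"


@dataclass(frozen=True)
class Relation:
    """A fact or event that links two entities found in the same document."""
    kind: str  # illustrative label such as "Acquisition"
    subject: Entity
    obj: Entity


@dataclass
class SemanticMetadata:
    """Hypothetical container for one document's entities and relations."""
    entities: list = field(default_factory=list)
    relations: list = field(default_factory=list)

    def by_category(self):
        """Group entity names by category, e.g. to drive navigation."""
        groups = defaultdict(list)
        for entity in self.entities:
            groups[entity.category].append(entity.name)
        return dict(groups)


# Hand-built metadata for the sentence "Nokia acquired Navteq."
nokia = Entity("Nokia", "Company")
navteq = Entity("Navteq", "Company")
metadata = SemanticMetadata(
    entities=[nokia, navteq],
    relations=[Relation("Acquisition", nokia, navteq)],
)

print(metadata.by_category())  # {'Company': ['Nokia', 'Navteq']}
```

Keeping relations as separate records that point at entities, rather than burying them in the text, is what makes the metadata searchable and navigable on its own.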
>
On its own, and applied to one or two small, short documents, this might not seem terribly valuable. But deployed on the Web and made available as a free service, Calais is in a position to process massive amounts of data (text, quantitative, graphic, etc.) and extract their knowledge content. Once this task is complete, this content can be searched individually or combined with other similar content and searched in a larger context. This larger context can be based on other Web content, proprietary Thomson Reuters content, a combination of the two or the context of select data sources that may address a specific area of interest.
>
Ultimately, Calais’s goal is to be the world’s best tool for extracting the structure of any kind of content: recognizing its type, the concepts it contains, and the relationships among them, and doing so not just within a single file, but across a span of files that could be as large as the Web itself.
>
Key Differentiators:
>
Demand from large organizations, including well established publishers, has grown at an unexpectedly high rate. This has led Thomson Reuters to introduce three contract-based versions of Calais in addition to the original free service:
  • Calais Professional - same as the free service but now backed by an SLA and with higher transaction limits.
  • Calais Professional for Publishers - Calais Professional tailored to meet the needs of large scale publishers and tied to an annual contract.
  • ClearForest On-Premise Solutions - ClearForest is the original name of the technology that makes Calais work. Now that it's available as a stand alone application, enterprises will be able to closely tailor the service to their needs, ensure the privacy of their proprietary content and also have access to what's under the hood for even further customization.
Thomson Reuters is another key differentiator – the fact that Calais is sponsored by a global information giant suggests that this entrant will be with us for a long time. Furthermore, Calais is in the final stages of testing its “infinite scalability” initiative (e.g., cloud computing), which is designed to address growth in demand and/or spikes in utilization.
>
Another distinguishing characteristic is the rate at which the service has been adopted (the fact that it’s free is worth repeating). As a result, the original usage projections have been discarded because demand has so vastly exceeded expectations. Note that until very recently, demand for Calais existed almost entirely outside of any Thomson Reuters media property. This state of affairs is changing rapidly, with internal inquiries arriving with greater frequency.
>
Deploying Calais against the vast, professionally developed and controlled content in the Thomson Reuters empire would be a remarkable step in the company’s evolution. After 150 years as a traditional news wire service and publisher, Thomson Reuters’ content could quickly become something not yet fully defined, but possibly far more powerful and useful than what traditional publishers have offered before.
>
Six/Twelve Month Plans:
>
In January ’09, Calais is scheduled to launch Release 4, which will open the door to the world of “Linked Data,” a critical step toward fulfilling the promise of the Semantic Web. Essentially, URIs (Uniform Resource Identifiers) allow for the linking of individual data elements, a concept that goes much further than linking containers like files, pages, documents, or databases as we’re accustomed to on the WWW. The Semantic Web term for each pointer that leads to a datum is “dereferenceable URI”. 
>
Wikipedia does a nice job of explaining references and their consequent dereferencing by using house addresses and houses. In this case, a house address is the reference, or pointer. Using this pointer and finding the actual house is the same as dereferencing the address.
>
In Calais’s case, after extracting the entities (people, places, companies, etc.) from your content you could then link to (or retrieve for processing by an application) relevant data on DBpedia, The CIA World Factbook, Freebase, or a rapidly growing number of other compatible data sources. If you’re a talented content producer, the additional leverage that comes from linking to these “external” data could make your offering substantially more useful and, in turn, much more valuable.
>
Let’s build on the example above, where the entities in an original document have been linked to data residing on DBpedia and The CIA World Factbook. The idea is that the entities extracted from each source can be linked manually, through search results, or as a result of processing by an application. Simply knowing that these entities have an association can be valuable, but the key is that the URI provides a pointer to the specific data – not the file, not the document, and not the database, but to the actual datum, value, or record that’s stored in one of these containers. There’s no longer a need to call an entire file or database, read it to find what you’re looking for and then put it to use. Instead, you call just what you need – the specific data that matter to you.
>
This process is faster (read: cheaper in computer processing terms) and those URIs you’ve amassed can be reused by other people and applications because these pointers are durable and they persist – if the data remain in place, then each datum will keep the same individual URI (again: cheaper, highly reliable, and standardized to ensure universal access and use). It’s simply easier to exchange pointers to specific data (dereferenceable URIs) than it is to exchange potentially huge data files or documents.
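The pointer idea can be sketched in a few lines. In this toy version a Python dictionary stands in for the Web, so "dereferencing" is just a lookup; on the real Semantic Web the same step is an HTTP request. The example.org URIs, the stored values, and the function name `dereference` are all placeholders invented for illustration, not identifiers from DBpedia or any other real data source.

```python
# Hypothetical store in which every datum has its own URI.
# The URIs and values below are placeholders, not real Linked Data.
DATA_BY_URI = {
    "http://example.org/country/france/capital": "Paris",
    "http://example.org/country/france/currency": "Euro",
    "http://example.org/city/paris/country": "France",
}


def dereference(uri, store=DATA_BY_URI):
    """Follow a pointer (URI) to the single datum it identifies."""
    try:
        return store[uri]
    except KeyError:
        raise LookupError(f"No datum found at {uri}") from None


# Exchanging pointers instead of documents: the sender passes along only
# the URIs, and the receiver resolves just the data it needs.
shared_pointers = [
    "http://example.org/country/france/capital",
    "http://example.org/country/france/currency",
]
resolved = {uri: dereference(uri) for uri in shared_pointers}

for uri, value in resolved.items():
    print(uri, "->", value)
```

Because each datum keeps its URI for as long as it stays in place, the `shared_pointers` list can be handed to another person or application and resolved again later without copying the underlying data.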
>
Once documents and information assets are connected to the Linked Data cloud, deep connections can be made between the entities, facts and events therein. This can, for instance, enable the resolution of complex queries, such as: “Which company boards of directors include CEOs that have been involved in the subprime mortgage meltdown?”
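A query like that one boils down to following links from one datum to the next. Here's a rough sketch over a handful of in-memory (subject, predicate, object) triples. Every name, predicate, and relationship in it is made up for illustration, and it reads the question one particular way: "find the boards that include the CEO of a company involved in the event." A real deployment would more likely run a SPARQL query against linked data sources than loop over a Python list.

```python
# Hypothetical triples (subject, predicate, object). All names are placeholders.
TRIPLES = [
    ("Alice", "ceo_of", "LenderCo"),
    ("Bob", "ceo_of", "RetailCo"),
    ("Alice", "board_member_of", "BankCo"),
    ("Bob", "board_member_of", "BankCo"),
    ("Alice", "board_member_of", "MediaCo"),
    ("LenderCo", "involved_in", "SubprimeMeltdown"),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return every object linked to `subject` by `predicate`."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def subjects(predicate, obj, triples=TRIPLES):
    """Return every subject linked to `obj` by `predicate`."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def boards_with_involved_ceos(event, triples=TRIPLES):
    """Map each board to its members who are CEOs of firms involved in `event`."""
    result = {}
    for company in subjects("involved_in", event, triples):
        for ceo in subjects("ceo_of", company, triples):
            for board in objects(ceo, "board_member_of", triples):
                result.setdefault(board, set()).add(ceo)
    return result


answer = boards_with_involved_ceos("SubprimeMeltdown")
for board in sorted(answer):
    print(board, sorted(answer[board]))
# BankCo ['Alice']
# MediaCo ['Alice']
```

The point of the sketch is that no single source needs to hold the whole answer: the event, the CEO role, and the board seats could each live in a different dataset, as long as they refer to the same entities.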
>
[Figure: Linked Data cloud diagram. The diagram (not the datasets) is CC-BY-SA licensed. Email comments to Richard Cyganiak at richard@cyganiak.de]
>
Analysis:
>
Let’s start with the premise that Thomson Reuters has 150 years of experience creating, managing, and presenting content that people want. Over this period, the company has amassed a body of high quality content that’s possibly the largest in the world. This content will continue to grow, but the advent of the Web has unleashed a torrent of content on a genuinely planetary scale. Since this content is outside Thomson Reuters editorial and/or production controls, the company considers it to be “wild” content. This doesn’t mean it’s bad – some of it’s exceedingly good.
>
Based on the environmental factors below, Calais puts Thomson Reuters in a position to extend its core competencies to include content it controls as well as wild content because:
  • The fundamental nature of publishing and using content is changing.
  • “World Wild Content” will dwarf the content Thomson Reuters controls.
  • Professionally produced content will continue to merit a premium.
  • The Open Access movement and similar efforts by academics, researchers, and other content authors seeking to retain control of their work will continue and grow.
  • Thomson Reuters has extensive experience in every aspect of the content industry.
  • Flexible integration/interoperation of different types of content may provide powerful added value. 
Calais is a free service that stands to significantly benefit people and organizations around the world. The terms of use may vary to allow Calais rights to utilize the content’s metadata or not, but unless you’re a major publisher, this won’t be much of an issue. What matters, at least to Thomson Reuters, is that Calais is a very concrete step toward organizing and integrating the vast span of wild content with its own high quality content. Offering customers your own content combined with the very best of free, Web-based content in an easily searched, highly flexible and exceptionally expansive product is a strong competitive advantage that may ensure another 150 years of operation. This is the strategic thrust of The Calais Initiative.

Wednesday, December 3, 2008

Changes At The Semantic Business Blog

If you're a regular reader of Semantic Business you'll recognize that things have been changing - I've added a jobs feed, some of the formatting is slightly off (to my chagrin), and I'm planning to start serving ads in a relatively unobtrusive way. All of this is a precursor to my move over to TypePad (preview here), which looks much better suited to my needs going forward. Please bear with me during this time - this isn't a random decision and I'm aware that links are likely to be broken, things may be lost (hopefully just temporarily), and I may spill my coffee. Or not.
I'm also planning to forward my domain www.davidprovost.com and consolidate my online presence. We'll see how this goes...

Bet On Nokia & S60, Not Google & Android

The excitement of Android's launch has died down a bit, so this seems like a good time to cast Google's entry against Nokia. Sounds like an unfair contest, right? After all, one is a giant in its industry, a leader in R&D and technical innovation, with a long history of support for the developer/open source community, not to mention a globally recognized brand. I'm referring to Nokia, in case you were wondering. I'm not ignoring Apple and the iPhone, but I don't get the impression they're pressing as hard to shape a play that stretches from fundamental infrastructure up to the end user experience - which I do believe has crossed the minds of people at Google & Nokia.
>
Both companies have money, talent, recognition, and deep commercial relationships. Nokia probably has the lead in governmental relationships due to the highly regulated nature of the telecom industry. Google's got search so solidly nailed it's scary. With Android, Google is moving onto Nokia's turf. In the meantime, Nokia's plans appear to call for an increasing emphasis on Web services and certainly mobile content, as demonstrated by its launch of Ovi and its own music store (maybe Nokia's taking a shot at Apple, after all). 
>
I won't dwell on the increasing sophistication of mobile devices or the fact that they'll become a prominent means of Internet access (or even the primary one for some people). Instead, I'll focus on the evolution we're seeing on the WWW toward the Semantic Web (SW). Since the SW is simply an extension of the WWW, it's safe to assume that wherever the Web can be reached, the Semantic Web can be reached as well - it's all due to HTTP after all.
>
Here's where the respective paths of these companies begin to diverge. Google's demonstrated its schizophrenic approach to the SW pretty clearly, as I explored in this post. It seems they're quietly exploring the fringes of the SW, but they're probably expending just as much effort in denying the existence of these activities. I haven't looked through any of Android's documentation for SW references, but since I haven't found any elsewhere, it's possible that there simply aren't any.
>
On the other hand, Nokia's involvement in the Semantic Web goes back at least to 1997, when Ora Lassila wrote a brief note titled "Introduction to RDF Metadata," likely when he was a visiting fellow at W3C. Ora's gone on to play an influential role within Nokia, promoting the SW all along the way. Others within the company have followed suit so that now there's a raft of internal SW projects here and here (they're not all SW projects, but a good number are). So, given the complexity of mobile environments, the differences in platforms, operating systems, regulatory environments, etc., it seems like SW technology might be well suited to service composition and delivery, provisioning issues generally, and offloading processing requirements from the handset.
>
Oh, did I forget to mention that Nokia acquired Navteq over the summer? Frankly, I don't think there's enough of a difference between Google Maps, Navteq, Yahoo! Maps, or others to really matter. But the point is that by purchasing Navteq, Nokia no longer needs to rely on Google and it no longer needs to divulge any information regarding how location based services might be developed and deployed. That's knowledge Google will have to acquire on its own. Since I've already indicated in my earlier post about Google that there's no evidence they're recruiting people with expertise in RDF, OWL, ontologies, SPARQL, graphs, triples, etc., it doesn't look to me like they're going to catch up any time soon. As a matter of fact, now that Google's cutting its workforce, I'm guessing the company's appetite for innovation may slow down, meaning that if they want to catch up in any contest with Nokia, they're going to be sucking serious wind.
>
Diversifying away from a reliance on search makes all the sense in the world to me. But can a mobile operating system succeed on its own without its very own fleet of handsets, infrastructure, content beyond maps, software development in what I believe will be a key technology, and deep experience in global telecom regulation? I don't know, but it's not a bet I would take.

Thursday, November 20, 2008

Company Updates and New Profiles

Since publishing On The Cusp at the end of September, a number of companies have contacted me to schedule briefings and quite frankly, I'm very happy to do so. Of course, now I feel self-conscious because I have a bit of a backlog to attend to...
With that said, I'm planning to provide updates on the companies I've profiled while also profiling new companies (or at least, new to me) in the Semantic Web industry. Additionally, as I become aware of new information or discover something interesting, I'll provide updates accordingly. Along these lines, anyone reading this is welcome to contact me about companies they'd like to see profiled, even if it's their own(!).
In the meantime, you'll be seeing new profiles and updates starting in the very near future.
David