


Calais, the Semantic Web service from Thomson Reuters, is today announcing a new commercial version at the EmTech Conference on the MIT campus in Cambridge, Massachusetts. Calais is a web service and open API that lets web publishers automatically scan content and extract semantic metadata. In other words, services built on the Calais API can mark up content semantically without manual effort.

According to Tom Tague, who leads the Calais initiative at Thomson Reuters, Calais has finally reached a critical mass of users, including both large companies and smaller web startups, who were telling the team that to rely on Calais in earnest they needed a professional version with a service-level agreement (SLA). Tague and company responded with the professional version: for $2,000 per month with a one-year commitment, it comes with 24×7 monitoring and 100,000 transactions per day (20 per second), up from 40,000 per day and 4 per second on the free version.

Tague told me that most users asking for the professional version didn’t need more volume; they just needed a guarantee that the service would be available and that Thomson Reuters was serious about keeping it going.


In addition to the professional edition, Calais is also announcing an enterprise version that can be installed on-site for clients that can’t let their content outside their firewall. Tague tells me that the enterprise version will appeal to clients dealing with health records, financial data, or other sensitive information, and to clients with transaction volumes so large that it makes more sense to process content locally than to send it out to a service in the cloud.

One of the biggest knocks against Calais early on was that, because of its pedigree as a business application called Clear Forest (which Thomson Reuters acquired), it was biased toward business language. That made it of limited use to sites that didn’t deal with business topics. I asked Tague for an update on their progress in expanding Calais’ vocabulary to understand semantics outside the business realm, and he told me that Calais had improved by leaps and bounds since it first launched.

The vocabulary has grown by about 40% since the Clear Forest days, according to Tague, and now includes pop culture entities such as musical groups, events, entertainers, and sports teams, as well as healthcare industry entity types. Calais is even working with some clients to create specialized vocabularies, and has about a dozen full-time natural language programmers adding new entities at the rate of 10 to 12 items per month. Tague says that he can’t recall hearing the “too focused on business” complaint at all in the past three or four months.

SemanticProxy

Even though the professional version of Calais was the big news, Tague was more excited to talk to me about SemanticProxy, a new service from Open Calais.

SemanticProxy, which is built on Calais 3.0, works like a proxy server for extracting semantic information from web content. It takes a URL, fetches the page, cleans it up and processes it with Calais, and then returns semantic metadata in HTML, RDF, or Microformats.
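
To make the fetch, clean, process, and return steps concrete, here is a minimal Python sketch of that kind of pipeline. It is not the SemanticProxy or Calais API: every name in it (`VOCABULARY`, `clean_html`, `extract_entities`, `render_html`, `semantic_proxy`) is hypothetical, the "semantic processing" is a toy vocabulary lookup standing in for Calais, and the output is a plain HTML list rather than the service's actual HTML, RDF, or Microformats output.

```python
import html
import re
from html.parser import HTMLParser
from typing import Callable, Dict, List

# Hypothetical toy vocabulary (surface form -> entity type).
# A stand-in for the real Calais extraction engine, which is not modeled here.
VOCABULARY: Dict[str, str] = {
    "Thomson Reuters": "Company",
    "Cambridge": "City",
}


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def clean_html(page: str) -> str:
    """'Clean up' step: strip markup and collapse runs of whitespace."""
    parser = _TextExtractor()
    parser.feed(page)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


def extract_entities(text: str) -> List[Dict[str, str]]:
    """'Process' step: naive substring match against the toy vocabulary."""
    return [
        {"name": name, "type": kind}
        for name, kind in VOCABULARY.items()
        if name in text
    ]


def render_html(entities: List[Dict[str, str]]) -> str:
    """'Return' step: serialize the entities as a simple HTML list."""
    items = "".join(
        '<li class="entity" data-type="{}">{}</li>'.format(
            html.escape(e["type"], quote=True), html.escape(e["name"])
        )
        for e in entities
    )
    return "<ul>" + items + "</ul>"


def semantic_proxy(url: str, fetch: Callable[[str], str]) -> str:
    """Take a URL, fetch the page, clean it, extract entities, render them.

    `fetch` is injected so the sketch runs without network access; a real
    caller could pass a function built on urllib.request.urlopen.
    """
    return render_html(extract_entities(clean_html(fetch(url))))
```

Passing the fetcher in as a parameter keeps the four stages separate and lets the pipeline be exercised with a canned page, for example `semantic_proxy("http://example.com/", lambda url: "<p>Thomson Reuters</p>")`.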

“In the future, the Web will be one giant yet tightly interconnected information asset that delivers the content and services people need in the fashion and format they desire. Beyond publishing information for people, every site will expose its content in a way that’s readable by machines. Machines will mix, match, filter and aggregate information to greatly improve the experience for everyone,” said Tague. Unfortunately, for a lot of publishers, investing in semantically marking up their content is infeasible, either because they have overwhelmingly large back catalogs of content that would need attention, or because they publish transient content (such as news) that is only read for a short time.

The goal of Calais, and of services built on it like SemanticProxy, is to remove the barriers to marking up content and adding semantics to it. “The Semantic Web is going to be a critical mass play,” Tague told me. The vision only works if enough publishers produce semantically marked-up content, and the easier it is for them to add semantics to their content, the more of them will do it.

Like Yahoo!’s Search Monkey, which encourages the use of RDF and other semantic markup, Calais and SemanticProxy will help publishers along the road to the Semantic Web by stimulating activity and making it easier to mark up content.

Original article: https://www.sitepoint.com/calais-semantic-web-service-adds-professional-version/

