
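What Can Linked Data Do for an Enterprise?

As an evangelist for semantic technology (nowadays, Linked Data), I am constantly asked "what does it deliver?" and "why?". It is a simple technology of great, even revolutionary, value, yet for some reason many people find it hard to grasp, which I in turn find hard to understand.

Today I answered another reader's question, and I am posting the answer here in the hope that more people will see it. A single spark can light many more.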

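What benefits can Linked Data bring to an enterprise or institution? Today's enterprises and organizations are much like library and information services: those that adopted IT early and heavily already run many systems (business management, office automation, HR, finance, sales, customer management, inventory, logistics, and so on). Yet many organizations that have these systems still cannot find their data. Every time some data is needed (personnel data, say), a form has to be filled in again, which hurts both efficiency and consistency.

Getting these systems to work together, and above all getting their data reused, is a major problem. Semantic technology, with Linked Data as its representative, can play a large role in data integration and even business integration, ensuring that the large body of existing product (object) data and other data can be used across systems conveniently and effectively. To use a fashionable term, this is "semantics-based system (data) integration".

In many cases enterprises exchange information across systems through XML messages or other B2B standards. But when a single enterprise has dozens of systems managing different business processes and involving tens of thousands of entities (products, parts, collection items, ...), correctly describing each product's complex attributes and values and keeping the data consistent is no easy matter. Even the most powerful XML DOM tree cannot cope with highly complex, multidimensional link relationships. There is only one answer: use graph data.

This is where the value of Linked Data shows. It gives data consumers a single, trusted, easy-to-use source of entity data. Linked Data is itself an open API. The benefit for end users is that the information, data tables, menus, guides, partner information, links, and so on published on a website can stay highly consistent, and in particular the problem of keeping them consistent on update is solved.

How is it done?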

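Following the Linked Data (LD) publishing principles, first identify every independently existing entity (products and suppliers, for example) and give each one a unique HTTP URI as its identifier. The back end may have to support the systems that originally manage this data; whether the data arrives through an XML-RPC interface, as CSV, or from an RDBMS, it will certainly have to be converted to RDF.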
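As a rough illustration of this first step, here is a minimal sketch that mints HTTP URIs for the rows of a CSV export and emits N-Triples. The base URI `http://data.example.com/`, the `vocab/` property names, and the column names `sku`, `name`, and `supplier_id` are all made up for the example; a real project would use its own namespace and an RDF library, and would handle the full literal escaping rules.

```python
import csv
import io

BASE = "http://data.example.com/"  # hypothetical base URI for minted identifiers


def mint_uri(kind, local_id):
    """Give an entity a stable HTTP URI built from its type and local key."""
    return f"{BASE}{kind}/{local_id}"


def literal(value):
    """Quote a plain string literal (simplified: escapes only backslash and quote)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def csv_to_ntriples(csv_text):
    """Turn each CSV row into two triples: a name and a link to the supplier."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        product = mint_uri("product", row["sku"])
        supplier = mint_uri("supplier", row["supplier_id"])
        triples.append(f"<{product}> <{BASE}vocab/name> {literal(row['name'])} .")
        triples.append(f"<{product}> <{BASE}vocab/suppliedBy> <{supplier}> .")
    return triples


# Hypothetical export from an inventory system.
SAMPLE_CSV = "sku,name,supplier_id\nP001,Bolt M6,S9\nP002,Nut M6,S9\n"

for line in csv_to_ntriples(SAMPLE_CSV):
    print(line)
```

Note that both products point at the same supplier URI, which is exactly what lets other systems link to that supplier later.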

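A nice property of RDF is that merging data is very easy: data from different sources can be combined readily. If a big-data solution is adopted at this point, for example a graph NoSQL database, the flexibility is even more apparent.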
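Why merging is easy can be shown with a toy model in which a graph is simply a set of (subject, predicate, object) triples, so that merging is set union. The URIs, property names, and the person record below are invented for the sketch, and it assumes the graphs contain only URIs and literals (graphs with blank nodes need those nodes renamed apart before merging).

```python
PERSON = "http://data.example.com/person/42"  # hypothetical shared identifier
VOCAB = "http://data.example.com/vocab/"      # hypothetical vocabulary namespace

# Triples exported from a (hypothetical) HR system.
hr_graph = {
    (PERSON, VOCAB + "name", "Li Wei"),
    (PERSON, VOCAB + "department", "Logistics"),
}

# Triples exported from a (hypothetical) customer management system.
crm_graph = {
    (PERSON, VOCAB + "name", "Li Wei"),  # the same fact, held in both systems
    (PERSON, VOCAB + "accountManagerFor", "http://data.example.com/customer/7"),
}


def merge_graphs(*graphs):
    """Merge graphs by set union; identical triples collapse into one."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, subject):
    """List every (predicate, object) pair known about one subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)


merged = merge_graphs(hr_graph, crm_graph)
for predicate, obj in describe(merged, PERSON):
    print(predicate, obj)
```

Because both systems use the same URI for the person, the merged graph describes that person once, with the facts from both sources attached.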

Setting up a query "endpoint" for such an RDF graph database is easy, and we can then query it using the SPARQL standard.
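For readers who have not seen one, a SPARQL query over HTTP looks roughly like the sketch below. It only builds the request (a GET with the query in a `query` parameter and a SPARQL JSON results media type in `Accept`, as the SPARQL protocol allows) and does not send it. The endpoint URL and the vocabulary URIs are placeholders, not a real service.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://data.example.com/sparql"  # hypothetical endpoint URL

# Find every product supplied by one supplier, together with its name.
QUERY = """
SELECT ?product ?name WHERE {
  ?product <http://data.example.com/vocab/suppliedBy> <http://data.example.com/supplier/S9> ;
           <http://data.example.com/vocab/name> ?name .
}
"""


def build_sparql_request(endpoint, query):
    """Build (but do not send) a SPARQL query request using HTTP GET."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.get_method(), request.full_url)
# Against a real endpoint, urllib.request.urlopen(request) would return the results.
```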

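One tool worth mentioning is Dydra, a cloud Database-as-a-Service. It suits a small trial application where you learn as you go; all you need to do is upload your RDF data. There are now many such tools, and they are powerful: for example, the latest releases from the Apache Jena and OpenRDF Sesame projects, or a "Linked Data Platform (LDP)" such as Graphity. They already make it possible to build a Linked Data system with very little expertise, to create APIs quickly, to access data from different sources, and even to support very complex queries. This kind of semantics-based integration offers a depth of (intelligent) querying that earlier systems lacked. If it develops quickly enough, it should find use in the next generation of "library services platforms (LSP)".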

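When a query hits a product identifier (in the form of an HTTP URI), that URI is dereferenceable, which means content negotiation can be supported: different data is served for different requests. A browser (a human request) gets HTML, while a machine request gets RDF data in XML, JSON, or Turtle. Graphity is built with Java and XSLT 2.0 and is quite general-purpose.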
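The server-side decision behind content negotiation can be sketched as follows. This is a simplified illustration, not how Graphity or any particular server implements it: it reads the `Accept` header, honours `q` weights and the `*/*` wildcard, and ignores partial wildcards such as `text/*`. The list of supported media types is an assumption for the example.

```python
# Representations this (hypothetical) server can produce for an entity URI.
SUPPORTED = ["text/html", "text/turtle", "application/rdf+xml", "application/ld+json"]


def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header, highest q first."""
    preferences = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        preferences.append((media_type, q))
    # sorted() is stable, so equal weights keep their order in the header.
    return sorted(preferences, key=lambda pref: pref[1], reverse=True)


def negotiate(accept_header, supported=SUPPORTED, default="text/html"):
    """Pick the representation to serve, or None if nothing is acceptable."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type in supported:
            return media_type
        if media_type == "*/*":
            return default
    return None  # a real server would answer 406 Not Acceptable here


print(negotiate("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"))
print(negotiate("text/turtle;q=0.9, application/rdf+xml;q=0.5"))
```

The first call mimics a typical browser header and selects HTML; the second mimics an RDF client and selects Turtle.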

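If the data of the enterprise or organization is of reasonably general use, and it is willing to publish that data on the public Web as "authority data" and offer a public service under some open license (a commercial service may charge a fee), then the data model (ontology) and the description rules (metadata specification) can go on to form a domain standard, raising the value of the enterprise or organization still further.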

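Computer Science Research on Language

Computer science research on language (both natural and artificial languages) broadly follows three directions: syntax, semantics, and pragmatics. Syntax studies the formal structure of language; semantics studies the relation between language and the objects it refers to; pragmatics studies the relation between language and its users (differentiating the corpus from the user's point of view and according to the user's needs). Computers have no intelligence. Computer intelligence is all fake, installed by people (like a magic trick: someone creates it in order to deceive, and of course it achieves its purposes of entertainment, education, conveying information, and so on). The most prominent characteristic of computer languages is therefore formalization (which includes the sense of normalization). The formalization of a computer language has two sides, syntactic and semantic. Formal semantics studies the formalization of semantics and comprises four kinds: operational, denotational, axiomatic, and algebraic semantics. For details see: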

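  • Lu Ruqian (陆汝钤), 《计算机语言的形式语义》 (Formal Semantics of Computer Languages). Beijing: Science Press, 1992
  • Qu Yanwen (屈延文), 《形式语义学基础与形式说明》 (Foundations of Formal Semantics and Formal Specification). Beijing: Science Press, 1998
  • Zhou Chaochen (周巢尘), 《形式语义学引论》 (An Introduction to Formal Semantics). Changsha: Hunan Science and Technology Press, 1985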

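Professor Chen Yixiang (陈仪香) of the School of Physics and Information at Shanghai Normal University has also studied this in depth. The brief introduction to the four kinds of semantics below comes from Professor Chen's article "Research Progress in the Domain Theory of Formal Semantics" (Chapter 2 of: Lu Ruqian (ed.), 《知识科学与计算科学》 (Knowledge Science and Computation Science). Beijing: Tsinghua University Press, 2003).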

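  1. The basic idea of operational semantics is to build an abstract machine that simulates how a program processes data as it executes.
  2. The idea of denotational semantics is to map every construct of the language to a mathematical object, called the denotation of that construct. A program is viewed as a mapping from an input domain to an output domain; input and output domains are collectively called domains. Domains and mappings are therefore the basic objects of study in denotational semantics.
  3. Axiomatic semantics grew out of program correctness verification. It provides a method for verifying, under given preconditions, whether a certain property holds.
  4. The basic idea of algebraic semantics is to unify the logical system that describes the semantics with the various models that satisfy it, to treat the collection of models as an algebraic structure, and to study the relationships among those models.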
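To make the first two items concrete, here is a toy sketch of my own (not taken from the cited books) for a tiny language of arithmetic expressions. `denote` gives meaning in the denotational style, mapping each phrase compositionally to a number, while `compile_expr` and `run` give meaning in the operational style through a small stack machine that executes step by step. The tuple encoding of expressions and all function names are assumptions of the example.

```python
# Abstract syntax: ("num", n) | ("add", left, right) | ("mul", left, right)


def denote(expr):
    """Denotational style: the meaning of a phrase is a number built
    only from the meanings of its sub-phrases."""
    tag = expr[0]
    if tag == "num":
        return expr[1]
    left, right = denote(expr[1]), denote(expr[2])
    return left + right if tag == "add" else left * right


def compile_expr(expr):
    """Translate an expression into instructions for the stack machine."""
    tag = expr[0]
    if tag == "num":
        return [("push", expr[1])]
    return compile_expr(expr[1]) + compile_expr(expr[2]) + [(tag,)]


def run(program):
    """Operational style: an abstract machine that processes the data
    one instruction at a time."""
    stack = []
    for instruction in program:
        if instruction[0] == "push":
            stack.append(instruction[1])
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if instruction[0] == "add" else left * right)
    return stack.pop()


# 2 + 3 * 4, written as a syntax tree.
EXPR = ("add", ("num", 2), ("mul", ("num", 3), ("num", 4)))

print(denote(EXPR))
print(run(compile_expr(EXPR)))
```

Both routes give the same value for the same expression, which is the kind of agreement one wants between two semantic descriptions of a language.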

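The book 《信息组织》 (Information Organization), edited by Dai Weimin (戴维民) (Higher Education Press, 2004; a "Course Textbooks for the 21st Century" title), holds that the division into syntactic, semantic, and pragmatic information follows the levels of information organization. Its explanation is as follows:
From the cognitive point of view, information can be divided into syntactic, semantic, and pragmatic information. Because the subject has powers of observation and can perceive the outward form of a thing's state of motion and the way that state changes, the information so obtained is called syntactic information; because the subject has powers of understanding and can grasp the logical meaning of a thing's state of motion and the way it changes, the information so obtained is called semantic information; and because the subject has definite purposes and can judge the utility of a thing's state of motion and the way it changes, the information so obtained is called pragmatic information.
There is something to this, yet it does not seem quite accurate. Where does this interpretation come from? Linguistics? Philosophy? Epistemology?

Semantic Web Services Initiative (SWSI)

The goal of the Semantic Web Services Initiative (SWSI) is to combine current Web technology with the latest related advances so that the Web can reach its full potential.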

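Semantic Web Technology

Tim Berners-Lee, Director of the World Wide Web Consortium (W3C), holds that the future of the Web is the "Semantic Web": an extension of the Web toward machine-readable information and automated services that goes far beyond its present capabilities. Explicit semantics layered over data, programs, pages, and other Web resources will turn the Web into a knowledge-based Web and lift today's services to a new level. By "understanding" the content of the Web, information resources can be filtered, classified, and retrieved more precisely, and automated services will help people achieve their goals on a much larger scale. This process will ultimately lead to extremely rich knowledge systems and to specialized reasoning services built on them. These services will help in every aspect of daily life and become as ubiquitous and indispensable as electricity is today.

Today's Web merely accumulates information without processing it; in other words, it does not treat the computer as a computational device. New technologies that have recently grown up around UDDI, WSDL, and SOAP are turning the Web into services at a new level. Application software can be obtained and executed over the Web; this technology is called Web services. By providing a mechanism for programs to communicate and discover services automatically, Web services can greatly increase the potential of the Web architecture, and they have therefore attracted the attention of many software companies. Web services connect computing devices and use the Internet in a new way to exchange and combine data. The key to Web services technology is delivering services by composing loosely coupled, reusable software components "on the fly". This has far-reaching consequences both technically and for business.

Semantic Web Services seems to have gained a sibling: Semantic Web enabled Web Services, a European IST project.

Related projects, organizations, and sites: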

http://swws.semanticweb.org/

http://swsi.semanticweb.org/

Software can be delivered and paid for as fluid streams of services as opposed to packaged products. It is possible to achieve automatic, ad hoc interoperability between systems to accomplish organizational tasks. Examples include business application, such as automated procurement and supply chain management, but also non-commercial applications as well as military applications. Web services can be completely decentralized and distributed over the Internet and accessed by a wide variety of communications devices. Organizations can be released from the burden of complex, slow and expensive software integration and focus instead on the value of their offerings and mission critical tasks. The dynamic enterprise and dynamic value chains would become achievable and may be even mandatory for competitive advantage.



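What Will the Semantic Web Look Like?

I had never taken a proper look at the W3C and Semantic Web mailing list ( semanticweb@yahoogroups.com ). Its scope is somewhat narrow, but many of the discussions are helpful for my thesis. I pay most attention to the longer, more systematic posts, especially those on big-picture questions.

There has recently been an interesting discussion about what the Semantic Web will look like ("How is Semantic Web going to look"):

It began when someone named Rohan Abraham asked a very naive question (Sent: Friday, January 14, 2005 8:02 AM). But naive questions are often the most fundamental ones, and this is a question we ourselves are often ambushed with: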

Can anyone tell me how semantic web is going to look in future?? Is all the HTML going to be taken away?? Or is RDF going to be along side with HTML.. Can any one answer the question and give me a link to the architecture of the Semantic Web. …

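I was curious to see how the W3C heavyweights would answer; I even thought they might not deign to answer a question like this. Many such questions sink without a trace on the list.

Soon a Chinese compatriot of ours (a "fake foreign devil" living abroad, of course) was the first to respond: "Hi, Sir Tim's writings can answer your question!" (Hi, TB Lee's vision answers all.) A fine display of how well-informed and kind-hearted we Chinese are.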

From: Jun Shen

Sent: Friday, January 14, 2005 8:07 AM

Hi, TB Lee's vision answers all.

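Next, a Charles who is said to have worked with Sir Tim for many years gave his view. He noted that Sir Tim has written quite a lot on such questions, and that many other clever people have added their views and are working very hard to prove the point to everyone, but that things are still in progress, so ...

The gist of his answer was simply:

The next-generation Web does not replace the present Web. Markup tools also evolve and get new versions (HTML 4 to XHTML 1 to XHTML 2, with RDF embedded); the old is not abolished.

His examples, of course, leave beginners even more baffled: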

From: Charles McCathieNevile

Sent: Friday, January 14, 2005 4:43 PM

Along with other kinds of XML already on the Web (SVG, MathML, VoiceXML starting to appear more, SMIL, etc – all W3C XML languages for purposes that HTML is no good for, and capable of including RDF) this is already appearing all over the place.

But it isn't something you see, except in the functionality. It is something meant to be read by the machines, so they they can present things that are more like we want them to look (cool documents with little floating asterisks and aliens, or browsers that can tell you HOW they figured out why a particular flight seems like a good deal, or images that can explain themselves through a voice system to a blind child, or whatever you want the web to do)

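Then a student of Sir Tim's at MIT, having heard that this Charles had worked with Sir Tim for years, wanted to debate a question about ontologies, and took the thread off topic.

Sir Tim holds that ontologies should be built through a process in which a group of people reaches consensus; the student's view is exactly the opposite. Arguing from human nature, he believes that reaching consensus is impossible. Interesting.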

From: Shashi Kant

Sent: Monday, January 17, 2005 11:50 PM

I notice that you mention your involvement with TimBL… I am a grad student at MIT under Tim's supervision and we have regular debates about Ontology creation. As you are probably aware, Tim's view is that Ontologies should be created through a consensus approach- an “Ontology-by-committee” approach.

My view is exactly the opposite – I am a firm believer that such a consensual approach is a utopian pipedream. After all consensuses is, at the best of times, a very fickle entity. In fact I remember reading somewhere that when they got 3 domain experts in a single domain to create Ontologies, they only found about 30% commonality. And that is not even considering other typically human factors – egos (“is he really an expert?”), politics, and whatnots…

Plus it is impractical to assume that a corpus of Ontologies could be generated to accommodate the breathtaking rate at which information is being generated. I think it is just humanly impossible!

IMHO Ontologies are best generated using accepted machine learning approaches – sure they may turn out be at best 50% accurate, as compared to say a committee that takes 1 year to come up with an Ontology and spends millions of dollars to come up with an Ontology that is obsolete the moment even before it is created.

What are your thoughts on this subject? As a regular member of this board I would love to hear your thoughts on this matter.

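Someone then suggested they take the discussion offline, since this digression was not of general interest.

But Sören, a German from Leipzig, took the question further. He first sided with Sir Tim's "consensus" view, observing that people always tend to shift the meaning of concepts, and that machines absolutely cannot be allowed to do the same (the day machines learn to do so will be a disaster for humanity; that is how the stories in science fiction begin). He went on to discuss the importance of first- and second-order predicate logic and of mathematics for describing domain knowledge, and considered some of the current progress worth boasting about. This looks like another master (at the very least an "uncle" figure who has been with Sir Tim for years).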

From: Sören Auer

Sent: Tuesday, January 18, 2005 12:25 AM

Seems reasonable to me too. People are only able to communicate since there is a consensus about what distinct words mean. Unfortunately people (sometimes) tend to have (slightly) different concepts in mind when communicating – that seems from time to time the reason for problems like divorce till even war. 😉
When machines are communicating we can't tolerate such misunderstandings. That's why I think there is strong need for a terminological knowledge representation like the one provided by SemWeb standards like OWL, which base on description logic and thus may support ensuring consistency and the other DL services.

To represent the whole (not only terminological) knowledge of a domain you have to use a knowledge representation at least as expressive as first order logic. Probably even second, since mathematics needs SO and which serious domain may live without maths? Unfortunately already FO logic has terrible computational caracteristics. AI communities try (more or less successful) to develop more efficient knowledge representation strategies here such as nonmonotonic resoning.
I think ontologies are not for representing all knowledge now lying around on webpages, but rather shall provide a grid to classify and maybe rearrange this knowledge, further to build common vocabularies for application systems to communicate (see WSMO, OWL-S). I think already this would be I gigantic achievement!

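John Flynn drew an analogy with a lot of long-winded examples. He compared the creation of ontologies to the creation of web pages, arguing that ontologies form a diverse world, that there will be good ontologies and bad ones, that "authoritative" ontologies should emerge in time, and so on.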

From: John Flynn

Sent: Tuesday, January 18, 2005 6:30 AM

I believe it is likely that ontologies will emerge much in the same way that html web sites and xml schema have evolved. Almost anyone can create an html web site but some become better accepted than others. Communities of interest evolve around almost every subject and out of those communities a few “authoritative” web sites emerge. For example, if you are interested in the subject of human resources there are many web sites that focus on that subject. The HR-XML Consortium provides a reliable set of xml schemas on various aspects of human resources that have been vetted by their large corporate membership. If you are interested in news you might naturally go to CNN, Google News, or one of the other widely recognized news web sites. If you are more adventurous you might try some of the news blogs as your news source. Over time selected web sites become known and accepted as providing mostly reliable information. This process will probably hold true for ontologies as well. Some ontologies will emerge as quasi standards, such as Dublin Core, and people will incorporate, modify and/or extend those ontologies as required to meet their needs. But, just as on today's public html web, there will be lots of junk ontologies posted and some ontologies created to intentionally mislead people. We will learn to deal with these just as we do with such html sites today. There will also be ontologies that are created and maintained by educational, commercial and government organizations on intranets. Basically, I don't see the growth and availability of ontologies as anything much different that what has been happening with html sites and xml schema.

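Another participant, Mr. Neil, who wishes he had some connection with Sir Tim, found the topic very interesting and joined in. He agrees with Flynn that ontology creation is not absolute: it is market-driven and falls somewhere between the fully formal and the informal, and achieving purely academic formalization is very difficult. He puts forward a "market-orientation" view: economy and rapid adoption are the criteria for whether an ontology survives, while complexity and functional completeness can be goals for later refinement.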

From: neil.mcevoy@ondemand-network.com

Sent: Tuesday, January 18, 2005 2:11 PM

I thought I'd join in at this point as its very interesting thread. I'd like to say I work with TimBL in some way, but I don't, in any way… 😉

I'm inputting from a business point of view, which I think like in many technical projects does feel to be missing from the semantic web discussions, and suggest it offers a few points and ideas. Prompted by agreement with John Flynn, in that I'm working on the basis that in general the production of ontologies will be a dynamic balance of formal and informal processes, mainly driven my market demand.
One would imagine that within a purely academic context, consensual methods would be more difficult because let's just say there is more appetite for absolute technical correctness and authority with more likelihood of egos and ivory towers etc. I'm quite sure if they wanted to they could stretch out the process for years! 😉
What business adds is the imperative to get something working quickly, and the understanding that it doesn't need to be perfect to be useful. Hence why I see the balance of the two; in the early days of domain development there will be much greater freedom to define and implement with less formal controls, enabling small domain teams to drive the first chunk and make it available. The point at which you need a committee approach is to enable it to scale and become universal. Quite simply for example, if you want all the big media companies to adopt a single framework, they will all need some form of equalised involvement in its development, or they won't play ball. Once you have a large cross-company team working from all over the world together, the only way to facilitate it will be via committee processes. The general idea that a committee doesn't work is not correct because we can see it can; check out VISA for example.
I'd also suggest that what business will offer is the simplicity to get things moving along. Although I'm sure it will get much more complex, all you need to start creating business value is the simple bits. For example, a tag for [Graphic designers] so that you can search the semantic web for [Graphic designers] in [London]. Hardly a massive ontology, but would actually enable lots of flow of commerce.
So it seems it's less so about the complexities of ontologies at this stage, and more about universal adoption and basic foundations, such as the DNS equivalent for registries etc. ie everyone agreeing that [Graphic designers] is the common method, so that we can move on to defining more complex elements.

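An Italian, Dario, jumped in with a paradox: machines cannot reach a consensus by themselves; what they produce has to be translated into human language. So how is a machine to know whether the result has been translated into the human conceptual system?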

From: Dario Bonino

Sent: Tuesday, January 18, 2005 6:41 PM

I thought I'd join in at this point as its very interesting thread. I perfectly agree with Sashi about the process of ontology creation, however there is a point that it is not clear, wheter or not human knowledge and machine knowledge should have a contact point. In the last case I think that, at this moment, we are committed to the human classification. In other words, we could extract many clusters (or other, I don't know which is the exact term, sorry for my english) using LSI, or similar techniques but we also need a group of humans saying “ok, for a human being this cluster means that concept” at least with a certain degree of confidence… This is the biggest problem I think, the join point between human and machines. In my opininion, it doesn't matter where the join point is,
on the ontology rather than on mapping automatic extracted knowledge to human knowledge.
The problem is in that, if we want to deal with human beings we need humans to tell about what resources are… I don't know any machine thinking like humans, until now….

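The MIT student, perhaps embarrassed by the grammatical errors in his earlier post, came back with a good summary of the topic. It is clear that this young man has done a fair amount of research in the field.

1. He thinks the ratio of machine to human contribution in ontology creation should be 8:2;

2. Top-level ontologies can be created by people, but domain ontologies can be created entirely by machine and merged with the top-level ones;

3. His intuition is that human-created ontologies add complexity to machine processing, so he recommends using machines to create ontologies as far as possible; putting humans in the ontology-creation pipeline is quite unreasonable (note: here is a young man poisoned by computer science);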

4. An automatically created ontology, even one that covers only 10% of a domain, is better than no ontology at all;

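5. The Semantic Web has not taken off precisely because ontologies are created too slowly!

He then piled up examples (what the people at the MIT Data Center say, how formidable those people are, how the Semantic Web would get its killer app if they and the likes of Walmart and Dell adopted it, ...) to drive home his fifth point.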

From: Shashi Kant

Sent: Tuesday, January 18, 2005 8:09 PM

Hello Charles and everyone for responding and making this an interesting discussion. IIRC this thread has turned out to be one of the most interesting on this forum for a very long time. First off, let me apologize for the poor grammar and typos in my last post …I was very sleep-deprived and tired..take pity on me I am @MIT 🙂

1. I largely agree with the positions that Charles, Dario et al have taken, that ultimately we may end up with a hybrid approach to Ontology creation – a combination of machine-generated with human-generated. If I were to hazard a guess… perhaps in 80/20 proportion.

2. I would take another guess at this and say that the majority of top-level Ontologies would likely be human-generated, and most domain-specific ontologies would be machine generated. Perhaps Aligned and/or merged with the top-level ones.

3. Another thing counter-intuitive about the idea of human-generated Ontologies is …after all the semantic web is about making the web machine-comprehensible, so why not automate the Ontology generation process to the extent possible? It just does not make sense to place humans in the middle of this process.

4. I would further argue that if someone were to come up with a good IR algorithm and feed the encyclopedia Britannica to it. The resultant Ontologies may be contain..say only 10% of the concepts/relations in that domain. But that's 10% (some might say 10^n %) better than nothing! Take Charles' example – “medieval European Recipes”. Unless someone really has a vested interest in creating a domain Ontology for medieval culinary art I would doubt anyone would ever bother creating one. I would be very surprised if DARPA or MIT or Stanford would fund a medieval cooking ontology creation committee.

5. The semantic web idea has been out there for quite a while now, but we don't really have very many Ontologies that can claim to be acceptably complete. Ontology availability is, IMHO, the single biggest challenge of the semantic web and what's really holding the semantic web back. Unless you provide “real-world” applications (no hand-waving) for people to create Ontologies, they just cannot be bothered to do so. It's that simple.

Bottomline: One doesn't get more chicken-and-egg than this!
“It is unrealistic to believe that any independent body of academics or practitioners could formulate an all-inclusive canon that would stand the test of time. The ontology approach is a throwback to the philosophy of Scholasticism that dominated Western thought during the high middle ages. History has proven that canonical structures, meant to organize and communicate knowledge, often have the unintended outcome of restricting the adoption of further innovations that exist outside the bounds of the canon.”

That is how an MIT Data Center paper (www.mitdatacenter.org) puts it. While this opinion may be the other extreme of the spectrum, I think it sums up how the Walmarts, and the Dells of the world see the semantic web today. This is very unfortunate, because the semantic web badly needs the ballyhooed “killer app”, and the coming “data tsunami” because of RFID systems, sensor networks
etc. would have been a good, good one.

BTW MIT Data Center is an offshoot of the former MIT Auto ID center – the people who came with the EPC standards for RFID etc. So their buy-in would have been a huge boost for the S/web. It now looks they are going their separate ways – in fact they are even proposing a new modeling language called “M” (counterpart of OWL).

If you are interested I recommend reading up on their website – their contrarian viewpoint is fascinating.

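Sören came back to clarify a few points and gave several examples. His view is far more realistic, comprehensive, and rational than those of the purely "computer" minds, though whether he can convince the machine-brained is another matter. It seems that graduate students at famous foreign universities do not necessarily understand many of these issues accurately either.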

From: Sören Auer

Sent: Tuesday, January 18, 2005 9:45 PM

I'm a bit confused since all of you seem to understand Ontologies as a tool for arbitrary knowledge representation. As I mentioned in my last posting I don't think they are prepared to solve this task (especially if based on Description Logic as OWL).
Textual knowledge on websites contains so many vaguenesses, contradictions and exceptions. Humans can cope with them and sometimes it's even easier (for us synapse based reasoners) to get the spirit of an idea if it is described from contradictory viewpoints. But I'm quite sure machines won't be able to do the same at least within next 20 years or so.
Artificial intelligence research developed a variety of theories to make machines more intelligent in the human way. I'm not an expert in default reasoning, nonmonotinicity or horn logic, but my impression is that they are still far from being efficiently applicable. Description Logics and ontologies probably are a bit more mature but still there are many open problems (such as perspective reasoning, linking, merging, reconciliation, versioning). Even if all those problems are solved and if you manage to automatically generate ontologies from textual documents the benefit won't be much better than todays elaborated full-text searches, since DL can't (and is not intended) to cope with vaguenesses, contradictions and exceptions at all. And already one contradiction makes any further DL reasoning more or less senseless.

Already today quite much of the current web content is structured in proprietary database schema, xml-dialects. Here I think is the real impact of a terminological knowledge representation like OWL – defining globally shared, common vocabularies for distributed searching, view generation, querying, syndication of such structured data.

Projects in this context like – OWL-S/WSMO (description for automatic selection/composition of web-services),

– D2RQ (Treating non-RDF Databases as Virtual RDF Graphs)
– future (Semantic) WebApplications (you can have a look at my Powl
project for this – http://powl.sf.net) seem very promising to me.

For applications intended by the W3C you can have a look at the “OWL Web Ontology Language Use Cases and Requirements” document ( http://www.w3.org/TR/webont-req/).
Of course enriching arbitrary web pages with terminological classifications may be an application as well. But I think even this won't be possible automatically in a quality that gives us an real impact. But I'm open to conviction. 😉

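Alex then looked ahead to resolving the vagueness of textual knowledge; technology, it seems, may yet solve these problems. The topic does not look finished, so let us wait and see.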

From: Alex Abramovich

Sent: Thursday, January 20, 2005 6:10 PM

Yes, textual knowledge vagueness is a stumbling block of SW investigations. But it has an own nature that one can to make clear. What just is vague? A current operational context is uncertain. Nothing shall prevent us from building a library of operational contexts today!
An analysis of a sentence (based on this library) will derives a set of expectations of operational contexts. An analysis of subsequent sentences will confirm one of them.
It seems to me that something similar to this approach suggested Roger Schank (“Conceptual Dependency”).



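On OWL-S Service Descriptions

Does service discovery assume that requesters and providers use the same ontology? Certainly not. Otherwise the use of OWL-S would be severely limited, or even pointless: its purpose is precisely to find suitable ontologies, so such a premise would itself be flawed. (But does it assume the same ontology encoding language? It should not either, though that is less certain.)

Will OWL-S specify anything about ontology mediation, as WSMO does with its OO mediators (in WSMO terminology, ontology-to-ontology mediators)? It could, but so far there is no research plan or progress in that direction. OWL-S is a service description language that uses OWL; it does not prescribe whether mediators or services must exist to implement particular functions.

Does it contain only input and output descriptions? Again the answer is no. The OWL-S profile is used to "advertise" Semantic Web services, and what it describes includes the service description, product description, inputs, outputs, preconditions, effects, access conditions, quality of service, security parameters, and so on. Any parameter related to a service can be described in OWL-S.

Below is a post by Katia Sycara <katia+@cs.cmu.edu> clarifying questions about OWL-S service descriptions, which I felt was worth archiving.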

        There were a number of issues raised in this discussion:

        1. Does OWL-S discovery assumes that requesters and provides use a unique ontology?

        The answer is NO. OWL-S does not assume the use of a single onrology. It is difficult, however, to see what you mean by “one single ontology”. If you mean “one single OWL file”, then of course trivially OWL-S does not assume a single ontology since you can import as many OWL files as you desire in an OWL-S description, and use any of the concepts defined in those files to describe the OWL-S profile or any other OWL-S component. During the discovery process the Profile of the requested service may refer to a concept, say a:A (the concept A defined in “file” a), and an advertisement may refer to concept b:B that belong to a different ontology (different owl files), and yet b:B may be defined as a subclass of a:A. In this case, matching engines would still be able to match exploiting the logical relation between A and B. At CMU, we have shown different kinds of matches (e.g. exact, plug-in etc) in our matching algorithm (see e.g. [3]).

        Another way in which the use of different ontologies can be handled in OWL-S is through mapping rules that could be expressed in SWRL. In this way, to the extent that the similarity between A and B can be made explicit, then the mapping can be exploited. Of course there are issues of where these mappings live, how it is discovered where they live, since of course in the process of service discovery one does not know a priori what the ontological needs of one request would be vis a vis the current advertisement knowledge base. Even if one assumes a unique knowledge base containing such mappings, another set of issues is of course, how this knowledge base gets searched efficiently. (Blogger's note: part of my thesis aims to solve exactly this problem by using a registry service to perform the mapping automatically. But on the basis of what kind of request? The mechanism is still an open question.)

        The issue of ontological mapping is an old and well known one that has predated Semantic Web Services. Work on how to express mappings to achieve semantic interoperability efficiently (even assuming the mapping rules are known) has been going on since the late 80's (perhaps even earlier).

        The general problem of arbitrary ontology mapping is an open research problem. The real scientific work is in trying to attack the technical issues that I outlined (and others that are there but I did not refer to). After we solve these scientific problems (ie. How to derive the mappings, and how to use them), we can worry about what to call the algorithms.

        Since ONTOLOGY MEDIATION is an open research issue, OWL-S is agnostic about the actual ontology mediation process used. (Blogger's note: there should be research papers on this that are worth consulting; this would count as an innovative part of the thesis.) To the extent that the mediation process is a service, rather than a set of rules, it can be represented in OWL-S and discovered.

        2. Should OWL-S do something about ontological mediation like WSMO is doing with the OO mediators?

        Up to now, there is no clear operational definition of what a WSMO mediator is, neither is there a clear specification of an ontology or language for describing mediators, or an algorithm for ontological mediation.

        To the extent that WSMO mediators are services, rather than sets of rules, they can be represented in OWL-S by specifying what is their profile, process model and grounding (for a detailed discussion on this point see [2]). Furthermore, the discovery mechanism may then become similar to a composition procedure where you combine discovery of the appropriate mediator with the discovery of the appropriate service.

        Note that if you take this viewpoint, the sentence “OWL-S has no mediators” is non-sensical: it is analogous to sentences like “Java has no Operating System” or other such sentences. OWL-S is a language (it uses OWL semantics) that allows you to describe Web services, it does not tell you what infrastructure Web services need, nor does it stipulate the existence of mediators or of a discovery registry or any other component. If you think you need a mediator, the role of OWL-S is to provide you the tools to describe a mediator if you decide to implement it as a Web service. If you look at [2] there is a discussion on how to do that.
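The subclass-based matching described under point 1 of the post can be illustrated with a small sketch. This is my own simplification, not the CMU matching algorithm: the class hierarchy is a hand-written single-parent map rather than the output of an OWL reasoner, the concept `c:C` is invented, and the result labels are descriptive names I chose rather than the match degrees defined in the paper the post cites.

```python
# Hypothetical class hierarchy gathered from two ontology files, a and b.
# Each entry maps a concept to its direct superclass.
SUBCLASS_OF = {
    "b:B": "a:A",        # b:B is declared a subclass of a:A, as in the post
    "a:A": "owl:Thing",
    "c:C": "owl:Thing",  # an unrelated concept, added for contrast
}


def ancestors(concept):
    """Walk up the hierarchy and collect every superclass of a concept."""
    found = []
    while concept in SUBCLASS_OF:
        concept = SUBCLASS_OF[concept]
        found.append(concept)
    return found


def match(requested, advertised):
    """Relate a requested concept to an advertised one via subclass links."""
    if requested == advertised:
        return "exact"
    if requested in ancestors(advertised):
        return "advertised is more specific"
    if advertised in ancestors(requested):
        return "advertised is more general"
    return "fail"


# The request refers to a:A and the advertisement to b:B, from a different file.
print(match("a:A", "b:B"))
print(match("a:A", "c:C"))
```

Even though the requester and the provider name concepts from different ontology files, the declared subclass link is enough for the first pair to match, while the second pair fails.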

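Some Issues in Applying OWL-S

Some issues in applying OWL-S (from a post by Evan K. Wallace to the W3C Semantic Web discussion list public-sws-ig@w3.org):

At a recent meeting Eric Miller mentioned that many software companies seem slower to adopt OWL-S than they were with RDF-S and OWL. The likely reason is that OWL-S is still a W3C submission rather than a Recommendation; it is under discussion and may yet change considerably. In addition, good tools are scarce and there are few reference documents and reference cases, which also holds back adoption.

In fact there are many comparable specifications at the same level as OWL-S, for example XPDL, BPML, BPE4WS, ebBPSS, BPRI, WMF, and UML2 Action Semantics, as well as more formal ones such as PSL and SWSL. OWL-S does not seem to stand out among its peers the way OWL does (especially as a conceptual modeling language), and it does not seem to have absorbed enough from them.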



An Interview with Ontology Guru Tom Gruber

Dr. Tom Gruber's (Co-founder and Chief Technical Officer of Intraspect Software) Interview
For the Official Quarterly Bulletin of AIS Special Interest Group on Semantic Web and Information Systems, Volume 1, Issue 3, 2004

Tom Gruber (tomgruber.org), the biggest name in ontology, whose definition of "ontology" has been cited by countless people, has recently made some more striking remarks:

He says: "Every ontology is a treaty – a social agreement – among people with some common motive in sharing."

He divides ontologies into formal, semiformal, and informal. He thinks formal ontologies are hard to agree on and carry many restrictions, while semiformal ontologies are more useful: in a semiformal ontology the formal part is processed by machines and the informal part is left for people to read. Interesting.

The term “Semiformal Ontology” refers to a ontology which has a few bits of formality but is largely informal. It is the analog of what Tom Malone calls semistructured data, such as email or office forms. A semiformal ontology could support technology to processing of its formal parts but leaves it to the reader make sense of the informal parts.

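Tom believes that ontology tools (his company Intraspect is working on this sort of thing) will make ontologies much easier to apply, especially for non-technical users.

Tom also believes that semiformal ontologies will work well because they can draw on context.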

