Meta-search: Will SRW/U Become the NISO Metasearch Standard?

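1. D-Lib Magazine has published an article on SRW/U that compares SRW/U with the OAI protocol and proposes a way to use the two protocols together. It made me think that I should introduce both protocols, and their compatibility, in my "Knowledge Organization" course.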

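2. SRW/U, the Web/XML version of Z39.50, is a complete reinvention. In fact, Z39.50's functionality is being taken over not by SRW/U alone but by a whole family of new protocols under ZING (Z39.50-International: Next Generation). See: http://www.loc.gov/z3950/agency/zing/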

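3. Also in this issue of D-Lib, I saw the metadata scheme of the OLAC project. It is published online in the fairly standard form of a DC profile and could serve as a reference for our own project.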
http://www.language-archives.org/OLAC/metadata.html
http://www.language-archives.org/REC/olac-extensions.html

4. What exactly is the relationship between NISO's Metasearch Initiative (http://www.niso.org/committees/MetaSearch-info.html) and ZING? Perhaps NISO hopes that the work on ZING can become the next-generation metasearch standard.
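To make the contrast in items 1 and 2 concrete for the course, here is a minimal Python sketch of the two request styles. With SRU the query runs at the source: the client sends a CQL query and gets matching records back. With OAI-PMH the client harvests metadata records (here as oai_dc) into its own store and searches them locally. The endpoint URLs are hypothetical placeholders, the code only builds request URLs, and it is not the combined scheme proposed in the D-Lib article.

```python
from urllib.parse import urlencode

# Hypothetical endpoints, used only to show the shape of the requests.
SRU_BASE = "http://sru.example.org/catalog"
OAI_BASE = "http://oai.example.org/provider"


def sru_search_url(cql_query, start=1, max_records=10, schema="dc"):
    """Build an SRU 1.1 searchRetrieve URL: the query is executed at the source."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,            # a CQL query string
        "startRecord": start,
        "maximumRecords": max_records,
        "recordSchema": schema,        # schema short name, assumed to be supported by the server
    }
    return SRU_BASE + "?" + urlencode(params)


def oai_list_records_url(metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords URL: metadata is harvested for local indexing."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date     # selective harvesting by datestamp, e.g. "2005-01-01"
    return OAI_BASE + "?" + urlencode(params)


print(sru_search_url('dc.title = "metadata"'))
print(oai_list_records_url(from_date="2005-01-01"))
```

The first URL asks the remote catalog to search; the second only pulls records, leaving the searching to whoever harvested them.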

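Addendum, 2005/2/23:

On the blog 年心搏客 (http://hjn66.blogchina.com/) I came across a way of classifying meta-search that seems to make some sense. I wonder whether it reflects the common view in China?

2. Integrated search: the metadata of each database is harvested to build a new secondary database, and links to the source documents are managed centrally. This approach is technically demanding and requires the cooperation of the database vendors. TRS is of this type, and it is also the direction in which cross-database search in digital libraries abroad is heading; however, domestic database vendors are relatively closed, so it is hard to put into practice!

To be continued…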
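The "integrated search" approach can be pictured with a minimal sketch: metadata harvested from several databases (for example via OAI-PMH, which is my assumption here) goes into one central index, while the full text stays at the source and is reached through a stored link. The records, source names and URLs below are invented for illustration; this is not how TRS or any particular product works.

```python
from collections import defaultdict

# Hypothetical harvested records: (source database, record id, title, link to full text).
HARVESTED = [
    ("DB-A", "a1", "Metadata for digital libraries", "http://db-a.example.org/a1"),
    ("DB-B", "b7", "Digital library interoperability", "http://db-b.example.org/b7"),
    ("DB-B", "b9", "Ontology engineering", "http://db-b.example.org/b9"),
]


def build_union_index(records):
    """Build one inverted index over metadata harvested from many sources.

    Each posting is a (source, record id) key; the catalog keeps the title and a
    link back to the original document, so the full text never leaves the source.
    """
    index = defaultdict(set)
    catalog = {}
    for source, rec_id, title, link in records:
        key = (source, rec_id)
        catalog[key] = {"title": title, "link": link}
        for term in title.lower().split():
            index[term].add(key)
    return index, catalog


def search(index, catalog, query):
    """AND-query over the union index; returns sorted (source, title, link) tuples."""
    terms = query.lower().split()
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted((k[0], catalog[k]["title"], catalog[k]["link"]) for k in hits)


index, catalog = build_union_index(HARVESTED)
print(search(index, catalog, "digital library"))
```

Unlike federated search, nothing is sent to the source databases at query time; the cost is moved to harvesting and keeping the central index up to date, which is why vendor cooperation matters.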


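Retrieval Problems in Digital Libraries

Continuing my study of the parts of Modern Information Retrieval that bear on my recent interests: meta-search, the basic problems of digital libraries, knowledge organization, and so on.

Modern Information Retrieval looks at digital libraries from the perspective of computer science: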

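A digital library is:

The authors further argue that because digital libraries span geographic regions, multilinguality is their foremost problem. Solving it starts with character sets, which can be handled by downloading them over the network; cross-language retrieval is another important open problem. Techniques such as QBIC (query by image content), visual browsing and visual aids can help with cross-language retrieval.

Multimedia retrieval is also one of the core technologies of digital libraries.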

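If the document is taken as the structural unit of a digital library, then the structure and metadata of documents supply the digital library with its micro-level structure and semantics. Structure and semantics are the most important content of a digital library.

The resources of a digital library may be physically or logically dispersed, so retrieval in a distributed environment is an important problem for digital libraries.

Retrieval in a distributed environment can be approached in two ways:

Of these, federated search is defined as: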

Federated search is the support for finding items that are scattered among a distributed collection of information sources or services, typically involving sending queries to a number of servers and then merging the results to present in an integrated, consistent, coordinated format.
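The three parts of this definition (send the query to a number of servers, merge the results, present them in one consistent format) can be sketched as follows. The two "sources" are hypothetical stand-ins for remote servers, and the merge rules (min-max score normalization, de-duplication by a shared document ID) are my own simplifying assumptions, not what BioKleisli or the NISO Metasearch work prescribes.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical sources standing in for remote servers; each returns
# (doc_id, title, score) on its own, incomparable score scale.
def source_a(query):
    return [("doi:1", "Federated search in digital libraries", 12.0),
            ("doi:2", "Z39.50 and its successors", 3.0)]


def source_b(query):
    return [("doi:1", "Federated Search in Digital Libraries", 0.75),
            ("doi:3", "Merging ranked result lists", 0.25)]


SOURCES = {"A": source_a, "B": source_b}


def normalize(results):
    """Min-max normalise one source's scores to [0, 1] so sources can be compared."""
    if not results:
        return []
    scores = [score for _, _, score in results]
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0          # avoid division by zero when all scores are equal
    return [(doc, title, (score - lo) / span) for doc, title, score in results]


def federated_search(query, sources=SOURCES):
    # 1. Send the query to every source concurrently.
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in sources.items()}
        raw = {name: fut.result() for name, fut in futures.items()}
    # 2. Merge: normalise scores, collapse duplicates by doc_id, keep the best score.
    merged = {}
    for name, results in raw.items():
        for doc_id, title, score in normalize(results):
            entry = merged.setdefault(doc_id, {"title": title, "score": 0.0, "sources": []})
            entry["score"] = max(entry["score"], score)
            entry["sources"].append(name)
    # 3. Present one consistent list of (score, doc_id, title, sources), best first.
    return sorted(((e["score"], d, e["title"], e["sources"]) for d, e in merged.items()),
                  key=lambda row: (-row[0], row[1]))


for row in federated_search("federated search"):
    print(row)
```

Score normalization is needed because each server ranks on its own scale; without it, a source that happens to use large numbers would dominate the merged list.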

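Federated search goes by many names today, such as meta-search and cross-database search. I have not looked into whether their specific workflows and steps differ; perhaps I should. Modern Information Retrieval includes a diagram of a working system, BioKleisli, as an example.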

(The figure could not be posted here.)

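It is strikingly similar to the Metasearch standard that the NISO group is now drafting.

Ricardo and Berthier's book describes the steps of federated search as follows:

This is somewhat vague and hard to follow. By comparison, a master's thesis by Du Jianfeng, a computer science student at Sun Yat-sen University, treats the subject in more detail:

I also need to consult some recent papers from abroad.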


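Modern Information Retrieval

Modern Information Retrieval, written in 1999 by two computer science professors, Ricardo Baeza-Yates of Chile and Berthier Ribeiro-Neto of Brazil, has been heavily cited in recent years and widely praised, and it has become a textbook or required reading at many universities. I downloaded the introduction, Chapter 1 and Chapter 10 from the web, and it is indeed good: clearly structured and, above all, fairly up to date. By comparison, information retrieval courses in China teach little besides outdated material and an ill-assorted mix of topics.

I checked our library catalog and, to my surprise, we hold a copy; I'll borrow it after the holiday.

Website: http://sunsite.dcc.uchile.cl/irbook/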


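Domain Ontology: Information Retrieval on the Wide Area Network

Time for my thesis is tight, so I'll have to keep working even over the Spring Festival.

Sorting out my ideas:

My thesis topic is really information search on the wide area network. The problem domain centers on the digital library as "one kind of" wide-area information environment (a notion I must first define clearly), and I hope to address it with ideas from the Semantic Web, including metadata and ontologies.

First I need an ontology of the problem domain I want to tackle:

So I should start by finding some survey papers to read.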


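Reference: National Taiwan University's "Knowledge Organization" Courseware

I am to teach a graduate course, "Knowledge Organization and Metadata", for which the department has assigned no textbook, and no suitable one seems to exist yet. While preparing the course content recently, I found that National Taiwan University's Department of Library and Information Science (Professor Chen Hsueh-hua's department) offered a similar course called "Knowledge Organization" back in 2003, and all of its courseware can be downloaded. I was thrilled. (See http://ceiba3.cc.ntu.edu.tw/course/cb9879/.)

Having gone through NTU's course content, my overall impression is that it leans toward the knowledge organization needed for "knowledge management", the currently fashionable kind used in many knowledge-intensive companies (consulting firms, IT R&D firms, etc.), rather than the knowledge representation and manipulation rooted in philosophical epistemology, logic or computer science. It therefore reads like a cross between library science, computer science and management. The content is rich and very practical, but as a disciplinary system it feels somewhat scattered; turning this course into a textbook would still take a lot of work.
Moreover, the material dates from 2003 or earlier, and ontologies have advanced considerably in the past two years, so the course materials feel somewhat dated.

Peking University expects graduate teaching not to cover every detail: an outline is enough, so that students grasp the framework, then study on their own and learn through practice. NTU's courseware does not quite fit this either; it looks as if it was made for undergraduates. Still, I want my own materials to be as thorough as possible, and I can be flexible in class. This helps me develop research topics of my own, and it lets students study independently from the courseware and go on to choose a research direction.

Looking again at the courseware I have prepared, it still overemphasizes metadata: it grew out of my metadata lectures rather than being framed from the perspective of knowledge organization, which would better explain what metadata is for and where it comes from.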


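Study Notes on the Philosophy of Information

"Philosophy is like the owl of wisdom that takes flight only at dusk, circling over its own domain to see whether anything new has appeared; and what it has found this time is 'information'."

– Liu Gang, "A New Paradigm for the Philosophy of Science and Technology"

Yet in fact these people are mostly the decision-makers, bearing out the saying of Mencius: "Those who labor with their minds govern others; those who labor with their strength are governed by others."

So let me indulge in some theorizing too, and see what the philosophy of information can offer libraries, and digital libraries in particular.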

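On building the discipline of library science

Masters of the philosophy of science (formerly called "dialectics of nature" in China) such as Kuhn, Lakatos, Feyerabend and Popper criticized and explained the formation and development of modern science from the standpoints of logical positivism, critical rationalism, anarchism, historicism and sophisticated falsificationism, and they once served as beacons for building the discipline of library science. Yet a philosophy of science built on a "physics-centered" view does not necessarily suit other disciplines: the light it cast on library science never quite fell in the right place, and it never won general acceptance. Now even the philosophy of science itself is under attack: it is said to have produced no demonstrable or predictive results and to have done science more harm than good, and some have even called these masters "betrayers of the truth". Meanwhile, a self-proclaimed new interpretation of the real world, made from the standpoint of information, has burst onto the scene on the back of the sweeping impact of information technology, and some expect the research paradigm of philosophy as a whole to change as a result.

This new interpretation is called the "Philosophy of Information".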

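To trace it to its roots: a discipline without a history will inevitably "lack cohesion and fail to go far". The philosophy of information originates in one of the three great Western philosophical traditions, the "Leibniz–Russell tradition", a tradition that has historically been distorted, dismembered and even abandoned. This "formal tradition", characterized by "semiotics" and "logic", differs sharply from the other two great traditions, Plato's "classical tradition" and Kant's "modern tradition", and it is this "symbolic tradition" that gives the philosophy of information its long-standing theoretical foundation.

The development of information technology, together with the introduction of quantum information theory, really does prompt us to re-examine whether the world is material, mental, or simply "information" (the great philosopher Popper has already defined something close to this for us, his "world of knowledge"). Must we really doubt the reality of Zion, the human city in The Matrix?

This cannot help reminding one that philosophy, a discipline claiming to give a fundamental understanding of nature, society and history, can likewise be played with endlessly. To put it more neutrally: philosophy too needs to keep developing; or, human understanding keeps progressing. … And what about our library science?

Let us see what the philosophy of information actually is: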

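Social informatization has profoundly changed how people produce, interact, live and think. Together with other changes of our time, it has altered the background and context of philosophical inquiry and raised new questions that urgently demand philosophical answers. Some scholars try to grasp it at the metaphilosophical level, developing the related inquiry into a research programme and an independent field of investigation, so as to supply original methodology for both traditional and new philosophical topics and to provide a conceptual foundation and systematic argument for understanding the information world and the information society. The "philosophy of information" covers two aspects: the critical investigation of the nature and basic principles of information, including its dynamics, utilization and sciences; and the elaboration and application of information-theoretic and computational methodologies to philosophical problems. Its theoretical concerns are fourfold: (1) Core: seeking a unified theory of information, whose basic problem is reflection on the nature of information, while analyzing, explaining and evaluating the dynamics and utilization of information, with a focus on the systemic problems that arise in the information environment. (2) Innovation: chiefly, providing a new perspective on old and new philosophical problems across many areas of philosophy. (3) System: using the concepts, methods, tools and techniques of information to model, explain and provide solutions for traditional and new problems. (4) Methodology: systematically organizing the concepts, methods and theories of information and computer science, information and communication technology, and related disciplines, and providing them with a metatheoretical framework of analysis.

(Liu Gang, "Chinese Philosophy of Science and Technology Entering the 21st Century", see: http://philo.ruc.edu.cn/pol04/Article/science/s_digital/200409/1084.html )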


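A New Toy for Firefox: Piggy Bank

I've had Firefox installed for quite some time. Apart from being a bit faster and apparently more stable, it hadn't struck me as particularly special, and some web pages didn't seem to render well in it. But two add-ons I installed recently are changing my mind.

Wizz RSS is a blog (RSS) reader. I used 看天下 for a while but kept running into problems, and I had to keep a browser open alongside it, which was inconvenient. Wizz sits in the browser as a sidebar and is much more convenient, although it still has few features; for example, there is no update notification.
Piggy Bank is really impressive: it can be seen as a general-purpose Semantic Web browser (ha, finally!), whereas before there were only specialized ones such as FOAF browsers. There are still few practical Semantic Web applications, but I tried one of the applications offered by the project (http://simile.mit.edu/piggy-bank/guide.html) and found it excellent: http://citeseer.csail.mit.edu/. This is a site for searching the full text of computer science research papers, and Piggy Bank supports downloading and storing their metadata, full-text links, and annotation. Excellent!!!

This application is a lively topic on the SIMILE project's discussion list (general@simile.mit.edu); it is worth a look.

Below I repost someone else's blog entry about this software, for reference, rather than adding more of my own chatter.

Source: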
http://lylejohnson.name/blog/2005/01/browsing-semantic-web-with-piggy-bank.html

Browsing the Semantic Web with Piggy-Bank
Piggy-Bank is a new extension for the Mozilla Firefox browser that allows you to easily browse the semantic data linked to from regular web pages. I've seen some other projects along these lines, but they tend to be focused on a particular flavor of RDF data (such as Joel's FOAFer extension, or Christopher Schmidt's DOAP Viewer extension ).

I'm still not quite sure how Piggy-Bank works, but at the least it's scraping web pages for any embedded links that have well-recognized types in the Semantic Web, such as “application/rss+xml” (for RSS feeds) and “application/xml+rdf”. It then follows those links and parses out the “information tidbits” from those sources, and presents that information to you in a sidebar. Piggy-Bank attempts to categorize the tidbits into high-level categories, such as “News” for RSS Channel and Item resources, or “Contacts” for FOAF's “Person” resources. You can save the tidbits of interest in a local database (“My Piggy Bank”) and search through them later; Piggy-Bank remembers the original source of the data and allows you to annotate them with comments as you desire.

In response to the question, “Why was [Piggy-Bank] built?” the developers offer the simple answer:
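Returning to the mechanism the reposted entry guesses at: its first step, finding links with well-known Semantic Web types in an ordinary page, can be sketched with Python's standard html.parser. This is an illustration under my own assumptions (it only looks at link elements using the common rel="alternate" autodiscovery convention), not Piggy Bank's actual code. Note also that the registered media type for RDF/XML is application/rdf+xml, which the post writes as "application/xml+rdf".

```python
from html.parser import HTMLParser

# Media types this sketch treats as machine-readable data worth following.
# The registered RDF/XML type is application/rdf+xml.
DATA_TYPES = {"application/rss+xml", "application/rdf+xml"}


class LinkFinder(HTMLParser):
    """Collect (type, href) from <link rel="alternate" ...> elements in a page."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)                       # attribute values may be None
        rels = (a.get("rel") or "").lower().split()
        mime = (a.get("type") or "").lower()
        if "alternate" in rels and mime in DATA_TYPES and a.get("href"):
            self.found.append((mime, a["href"]))


def discover(html):
    """Return the data links advertised by an HTML page, in document order."""
    finder = LinkFinder()
    finder.feed(html)
    finder.close()
    return finder.found


page = ('<head><link rel="alternate" type="application/rss+xml" href="/feed.rss"/>'
        '<link rel="stylesheet" type="text/css" href="style.css">'
        '<link rel="alternate" type="application/rdf+xml" href="foaf.rdf"></head>')
print(discover(page))
```

The later steps the post describes (fetching those links, parsing out the "information tidbits", categorizing and storing them) would build on a list like this.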


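What Will the Semantic Web Look Like?

I had never properly followed the W3C and SW forums (semanticweb@yahoogroups.com). Although their scope is a little narrow, many of the discussions are useful for my thesis. I pay particular attention to longer, systematic posts, especially those on big-picture questions.

There has recently been an interesting discussion on what the Semantic Web will look like ("How is Semantic Web going to look"):

It began with a very naive question from someone named Rohan Abraham (Sent: Friday, January 14, 2005 8:02 AM). But naive questions are often the most fundamental, and they are also the kind we ourselves are often ambushed with: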

Can anyone tell me how semantic web is going to look in future?? Is all the HTML going to be taken away?? Or is RDF going to be along side with HTML.. Can any one answer the question and give me a link to the architecture of the Semantic Web. …

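I was curious how the W3C heavyweights would respond; I even suspected they would not deign to answer such a question. Many questions like this sink without a trace on the forum.

Very quickly a fellow Chinese responded first (an overseas one, of course, one of our "fake foreign devils"): "Hey, old Sir Tim (Berners-Lee)'s writings answer your question!" (Hi, TB Lee's vision answers all.) This amply displays how well informed and kind-hearted we Chinese are.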

From: Jun Shen

Sent: Friday, January 14, 2005 8:07 AM

Hi, TB Lee's vision answers all.

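Next, Charles, who has reportedly worked with Sir Tim for many years, gave his view. He pointed out that Sir Tim has written quite a lot on such questions, and that many smart people have added their own views and are working very hard to demonstrate it to everyone, but that things are still a work in progress ("-ing" status), so…

The gist of his answer:

The next-generation Web will not replace the current one; markup languages evolve through new versions (HTML4 to XHTML1 to XHTML2, with RDF built in) without abolishing the old ones.

Of course, his examples left newcomers even more baffled: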

From: Charles McCathieNevile

Sent: Friday, January 14, 2005 4:43 PM

Along with other kinds of XML already on the Web (SVG, MathML, VoiceXML starting to appear more, SMIL, etc – all W3C XML languages for purposes that HTML is no good for, and capable of including RDF) this is already appearing all over the place.

But it isn't something you see, except in the functionality. It is something meant to be read by the machines, so they can present things that are more like we want them to look (cool documents with little floating asterisks and aliens, or browsers that can tell you HOW they figured out why a particular flight seems like a good deal, or images that can explain themselves through a voice system to a blind child, or whatever you want the web to do)

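Then one of Sir Tim's graduate students at MIT, having heard that Charles had worked with Sir Tim for many years, wanted to debate a question about ontologies, which pulled the thread off topic.

Sir Tim holds that ontologies should be built by a group of people reaching consensus; the student holds exactly the opposite view and, arguing from human nature, believes consensus is impossible. Interesting.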

From: Shashi Kant

Sent: Monday, January 17, 2005 11:50 PM

I notice that you mention your involvement with TimBL… I am a grad student at MIT under Tim's supervision and we have regular debates about Ontology creation. As you are probably aware, Tim's view is that Ontologies should be created through a consensus approach- an “Ontology-by-committee” approach.

My view is exactly the opposite – I am a firm believer that such a consensual approach is a utopian pipedream. After all consensus is, at the best of times, a very fickle entity. In fact I remember reading somewhere that when they got 3 domain experts in a single domain to create Ontologies, they only found about 30% commonality. And that is not even considering other typically human factors – egos (“is he really an expert?”), politics, and whatnots…

Plus it is impractical to assume that a corpus of Ontologies could be generated to accommodate the breathtaking rate at which information is being generated. I think it is just humanly impossible!

IMHO Ontologies are best generated using accepted machine learning approaches – sure they may turn out to be at best 50% accurate, as compared to say a committee that takes 1 year to come up with an Ontology and spends millions of dollars to come up with an Ontology that is obsolete the moment even before it is created.

What are your thoughts on this subject? As a regular member of this board I would love to hear your thoughts on this matter.

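Someone then suggested they take it private, since this off-topic subject was not of general interest.

But Sören, a German from Leipzig, took the question further. He first endorsed Sir Tim's "consensus" view: people always tend to shift the meanings of concepts, but machines must never be allowed to do so (the day machines learn to do that will be a disaster for humanity; that is exactly how it starts in science fiction). He went on to discuss the importance of first- and second-order predicate logic and of mathematics for describing domain knowledge, and considered some current progress worth boasting about. He seems to be a master too (at the very least, a senior figure who has followed Sir Tim for years).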

From: Sören Auer

Sent: Tuesday, January 18, 2005 12:25 AM

Seems reasonable to me too. People are only able to communicate since there is a consensus about what distinct words mean. Unfortunately people (sometimes) tend to have (slightly) different concepts in mind when communicating – that seems from time to time the reason for problems like divorce till even war. 😉
When machines are communicating we can't tolerate such misunderstandings. That's why I think there is strong need for a terminological knowledge representation like the one provided by SemWeb standards like OWL, which base on description logic and thus may support ensuring consistency and the other DL services.

To represent the whole (not only terminological) knowledge of a domain you have to use a knowledge representation at least as expressive as first order logic. Probably even second, since mathematics needs SO and which serious domain may live without maths? Unfortunately already FO logic has terrible computational characteristics. AI communities try (more or less successfully) to develop more efficient knowledge representation strategies here such as nonmonotonic reasoning.
I think ontologies are not for representing all knowledge now lying around on webpages, but rather shall provide a grid to classify and maybe rearrange this knowledge, further to build common vocabularies for application systems to communicate (see WSMO, OWL-S). I think already this would be a gigantic achievement!

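John Flynn offered a long-winded analogy with many examples: he compared creating ontologies to creating web pages, arguing that ontologies will form a diverse world with good ones and bad ones, that "authoritative" ontologies should emerge over time, and so on.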

From: John Flynn

Sent: Tuesday, January 18, 2005 6:30 AM

I believe it is likely that ontologies will emerge much in the same way that html web sites and xml schema have evolved. Almost anyone can create an html web site but some become better accepted than others. Communities of interest evolve around almost every subject and out of those communities a few “authoritative” web sites emerge. For example, if you are interested in the subject of human resources there are many web sites that focus on that subject. The HR-XML Consortium provides a reliable set of xml schemas on various aspects of human resources that have been vetted by their large corporate membership. If you are interested in news you might naturally go to CNN, Google News, or one of the other widely recognized news web sites. If you are more adventurous you might try some of the news blogs as your news source. Over time selected web sites become known and accepted as providing mostly reliable information. This process will probably hold true for ontologies as well. Some ontologies will emerge as quasi standards, such as Dublin Core, and people will incorporate, modify and/or extend those ontologies as required to meet their needs. But, just as on today's public html web, there will be lots of junk ontologies posted and some ontologies created to intentionally mislead people. We will learn to deal with these just as we do with such html sites today. There will also be ontologies that are created and maintained by educational, commercial and government organizations on intranets. Basically, I don't see the growth and availability of ontologies as anything much different from what has been happening with html sites and xml schema.

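Then Mr. Neil, who wished he had some connection with Sir Tim, found the thread very interesting and joined in. He agreed with Flynn that ontology creation is not absolute: it is market-driven and lies between the fully formal and the informal, and purely academic formalization is very hard to achieve. He proposed a "market-oriented" view: cost-effectiveness and rapid adoption decide whether an ontology survives, while sophistication and completeness of function can be goals for later refinement.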

From: neil.mcevoy@ondemand-network.com

Sent: Tuesday, January 18, 2005 2:11 PM

I thought I'd join in at this point as it's a very interesting thread. I'd like to say I work with TimBL in some way, but I don't, in any way… 😉

I'm inputting from a business point of view, which I think like in many technical projects does feel to be missing from the semantic web discussions, and suggest it offers a few points and ideas. Prompted by agreement with John Flynn, in that I'm working on the basis that in general the production of ontologies will be a dynamic balance of formal and informal processes, mainly driven by market demand.
One would imagine that within a purely academic context, consensual methods would be more difficult because let's just say there is more appetite for absolute technical correctness and authority with more likelihood of egos and ivory towers etc. I'm quite sure if they wanted to they could stretch out the process for years! 😉
What business adds is the imperative to get something working quickly, and the understanding that it doesn't need to be perfect to be useful. Hence why I see the balance of the two; in the early days of domain development there will be much greater freedom to define and implement with less formal controls, enabling small domain teams to drive the first chunk and make it available. The point at which you need a committee approach is to enable it to scale and become universal. Quite simply for example, if you want all the big media companies to adopt a single framework, they will all need some form of equalised involvement in its development, or they won't play ball. Once you have a large cross-company team working from all over the world together, the only way to facilitate it will be via committee processes. The general idea that a committee doesn't work is not correct because we can see it can; check out VISA for example.
I'd also suggest that what business will offer is the simplicity to get things moving along. Although I'm sure it will get much more complex, all you need to start creating business value is the simple bits. For example, a tag for [Graphic designers] so that you can search the semantic web for [Graphic designers] in [London]. Hardly a massive ontology, but would actually enable lots of flow of commerce.
So it seems it's less so about the complexities of ontologies at this stage, and more about universal adoption and basic foundations, such as the DNS equivalent for registries etc. ie everyone agreeing that [Graphic designers] is the common method, so that we can move on to defining more complex elements.
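Neil's [Graphic designers] in [London] example shows how little shared vocabulary is needed before something becomes useful. A minimal sketch, assuming plain Python tuples as RDF-style triples and hypothetical terms such as ex:GraphicDesigner and ex:basedIn (a real system would use agreed URIs and an RDF query language):

```python
# Hypothetical shared vocabulary terms; a real deployment would use agreed URIs.
GRAPHIC_DESIGNER = "ex:GraphicDesigner"
LONDON = "ex:London"

# RDF-style (subject, predicate, object) statements, possibly from different publishers.
TRIPLES = [
    ("ex:alice", "rdf:type", GRAPHIC_DESIGNER),
    ("ex:alice", "ex:basedIn", LONDON),
    ("ex:bob", "rdf:type", GRAPHIC_DESIGNER),
    ("ex:bob", "ex:basedIn", "ex:Paris"),
    ("ex:carol", "rdf:type", "ex:Photographer"),
    ("ex:carol", "ex:basedIn", LONDON),
]


def subjects(triples, predicate, obj):
    """All subjects s for which (s, predicate, obj) is asserted."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def find(triples, kind, place):
    """Subjects of type `kind` located in `place` (works only because everyone reuses the same terms)."""
    return sorted(subjects(triples, "rdf:type", kind) & subjects(triples, "ex:basedIn", place))


print(find(TRIPLES, GRAPHIC_DESIGNER, LONDON))
```

If one publisher wrote "ex:GraphicArtist" instead, the query would silently miss them, which is exactly the "everyone agreeing on the common term" point Neil makes.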

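An Italian, Dario, then raised a paradox: machines cannot reach consensus by themselves, so their results must be translated into human terms. But how can a machine know whether it has been translated into the human conceptual system?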

From: Dario Bonino

Sent: Tuesday, January 18, 2005 6:41 PM

I thought I'd join in at this point as it's a very interesting thread. I perfectly agree with Sashi about the process of ontology creation, however there is a point that it is not clear, whether or not human knowledge and machine knowledge should have a contact point. In the last case I think that, at this moment, we are committed to the human classification. In other words, we could extract many clusters (or other, I don't know which is the exact term, sorry for my english) using LSI, or similar techniques but we also need a group of humans saying “ok, for a human being this cluster means that concept” at least with a certain degree of confidence… This is the biggest problem I think, the join point between human and machines. In my opinion, it doesn't matter where the join point is, on the ontology rather than on mapping automatic extracted knowledge to human knowledge.
The problem is in that, if we want to deal with human beings we need humans to tell about what resources are… I don't know any machine thinking like humans, until now….

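The MIT student, perhaps embarrassed by the grammatical errors in his earlier post, came back with a good summary of the topic. It is clear that this young man has done quite a bit of research in the field.

1. He thinks machines and humans should contribute to ontology creation in roughly an 8 : 2 ratio;

2. Top-level ontologies may be created by humans, but domain ontologies can be largely created by machines and then aligned or merged with the top-level ones;

3. He finds human-created ontologies counter-intuitive: since the Semantic Web is about making the web machine-comprehensible, ontology creation should be automated as far as possible, and placing humans in the middle of the process makes no sense (note: here is a young man poisoned by computer science);

4. Even if an automatically created ontology captures only 10% of a domain, that is better than nothing, since in many domains no one would ever build one by hand;

5. The Semantic Web has not taken off mainly because ontologies are created too slowly!!!

He then piled on examples (what the people at the MIT Data Center say…, how impressive these people are…, if they and the likes of Walmart/Dell adopted the S/W it would become a killer app…) to drive home his point 5.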

From: Shashi Kant

Sent: Tuesday, January 18, 2005 8:09 PM

Hello Charles and everyone for responding and making this an interesting discussion. IIRC this thread has turned out to be one of the most interesting on this forum for a very long time. First off, let me apologize for the poor grammar and typos in my last post …I was very sleep-deprived and tired..take pity on me I am @MIT 🙂

1. I largely agree with the positions that Charles, Dario et al have taken, that ultimately we may end up with a hybrid approach to Ontology creation – a combination of machine-generated with human-generated. If I were to hazard a guess… perhaps in 80/20 proportion.

2. I would take another guess at this and say that the majority of top-level Ontologies would likely be human-generated, and most domain-specific ontologies would be machine generated. Perhaps Aligned and/or merged with the top-level ones.

3. Another thing counter-intuitive about the idea of human-generated Ontologies is …after all the semantic web is about making the web machine-comprehensible, so why not automate the Ontology generation process to the extent possible? It just does not make sense to place humans in the middle of this process.

4. I would further argue that if someone were to come up with a good IR algorithm and feed the encyclopedia Britannica to it, the resultant Ontologies may contain, say, only 10% of the concepts/relations in that domain. But that's 10% (some might say 10^n %) better than nothing! Take Charles' example – “medieval European Recipes”. Unless someone really has a vested interest in creating a domain Ontology for medieval culinary art I would doubt anyone would ever bother creating one. I would be very surprised if DARPA or MIT or Stanford would fund a medieval cooking ontology creation committee.

5. The semantic web idea has been out there for quite a while now, but we don't really have very many Ontologies that can claim to be acceptably complete. Ontology availability is, IMHO (in my humble opinion), the single biggest challenge of the semantic web and what's really holding the semantic web back. Unless you provide “real-world” applications (no hand-waving) for people to create Ontologies, they just cannot be bothered to do so. It's that simple.

Bottomline: One doesn't get more chicken-and-egg than this!
“It is unrealistic to believe that any independent body of academics or practitioners could formulate an all-inclusive canon that would stand the test of time. The ontology approach is a throwback to the philosophy of Scholasticism that dominated Western thought during the high middle ages. History has proven that canonical structures, meant to organize and communicate knowledge, often have the unintended outcome of restricting the adoption of further innovations that exist outside the bounds of the canon.”

That is how an MIT Data Center paper (www.mitdatacenter.org) puts it. While this opinion may be the other extreme of the spectrum, I think it sums up how the Walmarts, and the Dells of the world see the semantic web today. This is very unfortunate, because the semantic web badly needs the ballyhooed “killer app”, and the coming “data tsunami” because of RFID systems, sensor networks etc. would have been a good, good one.

BTW MIT Data Center is an offshoot of the former MIT Auto ID center – the people who came up with the EPC standards for RFID etc. So their buy-in would have been a huge boost for the S/web. It now looks like they are going their separate ways – in fact they are even proposing a new modeling language called “M” (counterpart of OWL).

If you are interested I recommend reading up on their website – their contrarian viewpoint is fascinating.

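Sören came back to clarify a few points and gave several examples. His view is far more realistic, comprehensive and rational than those of the purely "computer-minded", though I doubt it will convince the machine-brained. It seems that graduate students at famous universities abroad do not always understand these issues accurately either.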

From: Sören Auer

Sent: Tuesday, January 18, 2005 9:45 PM

I'm a bit confused since all of you seem to understand Ontologies as a tool for arbitrary knowledge representation. As I mentioned in my last posting I don't think they are prepared to solve this task (especially if based on Description Logic as OWL).
Textual knowledge on websites contains so many vaguenesses, contradictions and exceptions. Humans can cope with them and sometimes it's even easier (for us synapse based reasoners) to get the spirit of an idea if it is described from contradictory viewpoints. But I'm quite sure machines won't be able to do the same at least within next 20 years or so.
Artificial intelligence research developed a variety of theories to make machines more intelligent in the human way. I'm not an expert in default reasoning, nonmonotonicity or horn logic, but my impression is that they are still far from being efficiently applicable. Description Logics and ontologies probably are a bit more mature but still there are many open problems (such as perspective reasoning, linking, merging, reconciliation, versioning). Even if all those problems are solved and if you manage to automatically generate ontologies from textual documents the benefit won't be much better than today's elaborated full-text searches, since DL can't (and is not intended) to cope with vaguenesses, contradictions and exceptions at all. And already one contradiction makes any further DL reasoning more or less senseless.

Already today quite much of the current web content is structured in proprietary database schema, xml-dialects. Here I think is the real impact of a terminological knowledge representation like OWL – defining globally shared, common vocabularies for distributed searching, view generation, querying, syndication of such structured data.

Projects in this context, like
– OWL-S/WSMO (description for automatic selection/composition of web-services),
– D2RQ (Treating non-RDF Databases as Virtual RDF Graphs),
– future (Semantic) WebApplications (you can have a look at my Powl project for this – http://powl.sf.net),
seem very promising to me.

For applications intended by the W3C you can have a look at the “OWL Web Ontology Language Use Cases and Requirements” document ( http://www.w3.org/TR/webont-req/).
Of course enriching arbitrary web pages with terminological classifications may be an application as well. But I think even this won't be possible automatically in a quality that gives us a real impact. But I'm open to conviction. 😉
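Sören's D2RQ item (treating a non-RDF database as a virtual RDF graph) can be illustrated with a much-simplified sketch: each row of a relational table is exposed as triples, with the subject URI built from the primary key and each column becoming a predicate. The base URI and vocabulary are hypothetical, and D2RQ itself uses a declarative mapping and translates RDF queries into SQL against the live database rather than iterating over rows like this.

```python
import sqlite3


def rows_as_triples(conn, table, key, base="http://example.org/"):
    """Yield each row of `table` as (subject, predicate, object) triples.

    Subject URI = base + table + "/" + primary-key value; every other non-NULL
    column becomes one triple. `table` is interpolated into the SQL, so it must
    be a trusted name in this sketch.
    """
    cur = conn.execute(f"SELECT * FROM {table}")
    cols = [d[0] for d in cur.description]
    for row in cur:
        record = dict(zip(cols, row))
        subject = f"{base}{table}/{record[key]}"
        for col, value in record.items():
            if col != key and value is not None:
                yield (subject, f"{base}vocab#{col}", value)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")
conn.execute("INSERT INTO book VALUES (1, 'Modern Information Retrieval', 1999)")
for triple in rows_as_triples(conn, "book", "id"):
    print(triple)
```

The data stays in its relational schema; only the view changes, which is the point Sören makes about reusing content already held in proprietary databases.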

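Alex then offered an outlook on handling the vagueness of textual knowledge, suggesting that technology may yet solve these problems. The topic seems far from closed; let us wait and see.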

From: Alex Abramovich

Sent: Thursday, January 20, 2005 6:10 PM

Yes, textual knowledge vagueness is a stumbling block of SW investigations. But it has its own nature that one can make clear. What just is vague? A current operational context is uncertain. Nothing shall prevent us from building a library of operational contexts today!
An analysis of a sentence (based on this library) will derive a set of expectations of operational contexts. An analysis of subsequent sentences will confirm one of them.
It seems to me that something similar to this approach was suggested by Roger Schank (“Conceptual Dependency”).
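Alex's proposal (a library of operational contexts; each sentence yields a set of expected contexts, and later sentences confirm one of them) can be read as a simple narrowing procedure. The toy sketch below assumes each context is nothing more than a set of cue words, which is far cruder than Schank's Conceptual Dependency; the contexts and cue words are invented for illustration.

```python
# Hypothetical "library of operational contexts": context name -> cue words.
CONTEXTS = {
    "banking": {"bank", "deposit", "loan", "interest"},
    "river": {"bank", "water", "fishing", "shore"},
    "finance-news": {"interest", "rate", "market"},
}


def expectations(sentence, contexts=CONTEXTS):
    """Contexts a sentence is compatible with (those sharing at least one cue word)."""
    words = set(sentence.lower().replace(".", "").split())
    return {name for name, cues in contexts.items() if words & cues}


def resolve(sentences, contexts=CONTEXTS):
    """Narrow the candidate contexts sentence by sentence; stop once one is left."""
    candidates = set(contexts)
    for s in sentences:
        expected = expectations(s, contexts)
        if expected:                      # sentences without cue words change nothing
            narrowed = candidates & expected
            if narrowed:                  # ignore a sentence that would rule out every candidate
                candidates = narrowed
        if len(candidates) == 1:
            break
    return candidates


print(resolve(["I went to the bank.", "We sat by the water fishing."]))
```

The first sentence leaves two candidate contexts open; the second confirms one of them, which is the behavior Alex describes.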

