Professor Hsinchun Chen (陈炘钧)'s Research Approach in His DLI2 Project

Interoperability projects in the Digital Libraries Initiative
Professor Hsinchun Chen's research approach

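In his NSF DLI2 proposal (High Performance Digital Library Classification System: From Information Retrieval to Knowledge Management), Professor Hsinchun Chen proposes a “bottom-up” approach to the automatic clustering of web pages, their organization into trees, and automatic summarization and visualization. This purely computational approach should be usable by search engines such as Google (or by domain portals, as in the application examples Professor Chen gives in his case study).

The approach does not fit my thesis topic, nor what the W3C is currently doing. I do not even agree with Professor Chen describing his research as “solving the semantic interoperability problem”. Most of the methods he offers have machines discover “pragmatic” information structures automatically (or with the help of ontologies), whereas what we have in mind is solving the problem at the architectural level. That is a “top-down” approach, and of course the social difficulty of putting it into practice is greater.

The best course is probably a combination of the two approaches, although it would be better to combine them only after both have produced some results and become reasonably mature.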

  • concept spaces and category maps in the Illinois project
  • textile and word sense disambiguation in the Berkeley project
  • voice recognition in the CMU project
  • image segmentation and clustering in the UCSB project

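None of the methods above falls within the topic I plan to cover: “interoperability projects in the Digital Libraries Initiative”. I will keep digging and report back.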


A passage about the IITA quoted by Professor Chen, which can be cited in the thesis:

The Information Infrastructure Technology and Applications (IITA) Working Group, the highest level of the country’s National Information Infrastructure (NII) technical committee, held an invited workshop in May 1995 to define a research agenda for digital libraries.

The shared vision that emerged is an entire Net of distributed repositories, where objects of any type can be searched within and across different indexed collections [45]. In the short term, technologies must be developed to search across these repositories transparently, handling any variations in protocols and formats (i.e., addressing structural interoperability [35]). In the long term, technologies must be developed to handle the variations in content and meanings (knowledge) transparently as well. Meeting these requirements constitutes steps along the way toward matching the concepts requested by users with objects indexed in collections [44].

The ultimate goal, as described in the IITA report, is the Grand Challenge of Digital Libraries:

“deep semantic interoperability – the ability of a user to access, consistently and coherently, similar (though autonomously defined and managed) classes of digital objects and services, distributed across heterogeneous repositories, with federating or mediating software compensating for site-by-site variations…Achieving this will require breakthroughs in description as well as retrieval, object interchange and object retrieval protocols. Issues here include the definition and use of metadata and its capture or computation from objects (both textual and multimedia), the use of computed descriptions of objects, federation and integration of heterogeneous repositories with disparate semantics, clustering and automatic hierarchical organization of information, and algorithms for automatic rating, ranking, and evaluation of information quality, genre, and other properties.”


Tags: DLI, digital libraries, Hsinchun Chen
