Title / Titel SimPack - A Java Library for Similarity Functions
Summary / Zusammenfassung The question of similarity is heavily researched in the computer science, artificial intelligence, psychology, and linguistics literature. Typically, these studies focus on the similarity between vectors [Baeza-Yates & Ribeiro-Neto '99, Salton & McGill '83], strings [Lord et al. '03], trees or graphs [Shasha & Zhang '97], or simple objects [Gentner & Medina '98, Resnik '99, Lin '98].

In our case, we are particularly interested in the similarity between concepts (complex objects) in ontologies. All measures are implemented in SimPack, our Java-based generic similarity framework.
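To make the notion of a similarity measure concrete, the following is a minimal sketch of two of the kinds of measures mentioned above: one over vectors and one over strings. It is not SimPack's API. The class name `SimilaritySketch` and all method names are hypothetical, and the choice of cosine similarity and a normalized Levenshtein distance as representatives is an assumption made purely for illustration.

```java
/**
 * Illustrative sketch only: NOT part of SimPack. All names are hypothetical.
 */
public final class SimilaritySketch {

    private SimilaritySketch() {}

    /** Cosine similarity of two equal-length vectors; 0.0 if either vector has zero norm. */
    public static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Levenshtein edit distance, computed with two rolling rows of the DP table. */
    public static int levenshtein(String s, String t) {
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) {
            prev[j] = j; // distance from the empty prefix of s
        }
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i; // distance to the empty prefix of t
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[t.length()];
    }

    /** Maps the edit distance to a similarity in [0, 1]; two empty strings count as identical. */
    public static double stringSimilarity(String s, String t) {
        int maxLen = Math.max(s.length(), t.length());
        if (maxLen == 0) {
            return 1.0;
        }
        return 1.0 - (double) levenshtein(s, t) / maxLen;
    }

    public static void main(String[] args) {
        System.out.println(cosine(new double[] {1, 2, 0}, new double[] {2, 4, 0}));
        System.out.println(levenshtein("kitten", "sitting"));
        System.out.println(stringSimilarity("Professor", "Profesor"));
    }
}
```

Both functions return higher values for more similar inputs, which is the property a similarity-based comparison of concepts would rely on. Measures for trees, graphs, or ontology concepts are more involved and are not sketched here.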

SimPack is intended primarily for research on the similarity between concepts in ontologies, or between ontologies as a whole. Other possible application areas of SimPack include:

* the investigation of similarity between pieces of software source code, for instance to detect changes between classes in different software releases.
* research on the similarity between hierarchically structured data, such as XML, in order to compare, search, or integrate data from different data sources.

SimPack is used, for example, in iRDQL, an extension of traditional RDQL (RDF Data Query Language) that allows querying for similar concepts in ontologies.
Patrick Ziegler, Christoph Kiefer, Christoph Sturm, Klaus R. Dittrich, and Abraham Bernstein. Generic Similarity Detection in Ontologies with the SOQA-SimPack Toolkit (Demo Paper). To appear in 2006 ACM SIGMOD International Conference on Management of Data (SIGMOD 2006). Chicago, USA, June 26-29, 2006.

Tobias Sager, Abraham Bernstein, Martin Pinzger, and Christoph Kiefer. Detecting Similar Java Classes Using Tree Algorithms. To appear in MSR '06: Proceedings of the 2006 International Workshop on Mining Software Repositories. Shanghai, China, May 22-23, 2006.

Patrick Ziegler, Christoph Kiefer, Christoph Sturm, Klaus Dittrich, and Abraham Bernstein. Detecting Similarities in Ontologies with the SOQA-SimPack Toolkit. 10th International Conference on Extending Database Technology (EDBT 2006). Munich, Germany, March 26-31, 2006.

Abraham Bernstein and Christoph Kiefer. Imprecise RDQL: Towards Generic Retrieval in Ontologies Using Similarity Joins. 21st Annual ACM Symposium on Applied Computing (SAC/SIGAPP). Dijon, France, April 23-24, 2006.

Abraham Bernstein and Christoph Kiefer. iRDQL Prototype Description (Demo Paper). Proceedings of the 15th Workshop on Information Technology and Systems (WITS). Las Vegas, Nevada, United States, 2005.

Abraham Bernstein and Christoph Kiefer. iRDQL - Imprecise Queries Using Similarity Joins for Retrieval in Ontologies (Poster Paper). 4th International Semantic Web Conference (ISWC). Galway, Ireland, November 6-10, 2005.

Abraham Bernstein and Christoph Kiefer. iRDQL - Imprecise RDQL Queries Using Similarity Joins. Third International Conference on Knowledge Capture (K-CAP). Banff, Alberta, Canada, October 2-5, 2005.

Abraham Bernstein, Esther Kaufmann, Christoph Kiefer, and Christoph Bürki. SimPack: A Generic Java Library for Similarity Measures in Ontologies (Working Paper). Department of Informatics, University of Zurich, 2005.

Keywords / Suchbegriffe Similarity, code, Java, library
Prof. Abraham Bernstein, PhD (Project Leader)  
Universität Zürich (position pursuing an academic career)
Duration of Project / Projektdauer Feb 2006 to Jul 2010