One of the projects I am working on right now is the construction of OE, the Ontology of Evolution. OE is a project of the Darwin Digital Library of Evolution, which is itself a special project of the Biodiversity Heritage Library and the American Museum of Natural History Library. The immediate aim of producing OE is to provide an intelligent search capability for the library catalog of the Darwin Digital Library, which is still in its early design stages. The broader aim of OE is to provide a computation and classification tool for researchers in evolutionary biology and those who work in other fields but would like such a tool for application to their own disciplines.

Although the word “ontology” is familiar to philosophers, I do not intend it here in the usual philosophical sense. Rather, I am using it in the sense that librarians and information scientists use the term: to indicate the formal description of the concepts used in some discipline of study in a manner suitable for use as a classification system, or for computation.

OE will be modeled in OWL-DL, the sublanguage of OWL (the Web Ontology Language) that corresponds to a description logic. Description logics are decidable fragments of first-order logic, which means that a reasoner working over the ontology is guaranteed to finish its computations. OWL is designed for use in semantic web applications. OWL ontologies such as OE can be viewed, manipulated and created using ontology editors and browsers such as Protégé or GrOWL.

As a classification scheme, an ontology is richer than a thesaurus or a traditional subject index. Thesauri generally indicate only relationships of synonymy; traditional subject indexes typically indicate only whether one term is narrower than, broader than, or related to another term. An ontology describes the kinds of things in the domain of study and also the relationships among them that are posited by the concepts being modeled. These relationships go beyond synonymy and the broader, narrower and related links of thesauri and traditional key word lists. For instance, consider the following relationship between geographical isolation and allopatric speciation.

Geographical isolation
-> Is a component sub-process of: allopatric speciation

It would not be possible to formulate this relationship in a thesaurus or traditional key word index. “Geographical isolation” is not synonymous with “allopatric speciation.” Geographical isolation is not a kind of allopatric speciation, and so it would be incorrect to represent the former as narrower than the latter. At best, a traditional key word index would indicate that the two terms are related; it would not provide any indication of how they are related.

To further illustrate the power of an ontology, consider the following terms and relationships:

natural selection
-> Is a cause of: adaptation
-> Is a component sub-process of: speciation
-> Has a kind: balancing selection
-> Has a kind: directional selection
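
The relationships listed above can be sketched as typed triples of subject, relation and object. This is only a minimal illustration in Python, not the OWL-DL model that OE will actually use; the relation names and the two helper functions are my own hypothetical labels.

```python
from collections import defaultdict

# (subject, relation, object) triples taken from the examples in the text.
# The relation names are hypothetical labels, not OE's actual vocabulary.
TRIPLES = [
    ("geographical isolation", "is_component_subprocess_of", "allopatric speciation"),
    ("natural selection", "is_cause_of", "adaptation"),
    ("natural selection", "is_component_subprocess_of", "speciation"),
    ("natural selection", "has_kind", "balancing selection"),
    ("natural selection", "has_kind", "directional selection"),
]


def relations_from(term, triples=TRIPLES):
    """Group the terms related to `term` by the kind of relationship."""
    grouped = defaultdict(list)
    for subject, relation, obj in triples:
        if subject == term:
            grouped[relation].append(obj)
    return dict(grouped)


def how_related(a, b, triples=TRIPLES):
    """Return the names of the relations that hold from term a to term b."""
    return [relation for subject, relation, obj in triples
            if subject == a and obj == b]


if __name__ == "__main__":
    print(relations_from("natural selection"))
    print(how_related("geographical isolation", "allopatric speciation"))
```

The point of the sketch is that the relation itself is stored alongside the two terms, so a query can report how two terms are related, which a "related term" link in a key word index cannot do.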

To use OE as a classification and literature search system, terms describing processes, objects, particular individuals and other aspects of the natural world studied by evolutionary biologists would be applied to information resources. Searching a database of such terms and the corresponding resources, researchers would be led from resource to resource by moving among those tagged with related terms, and from term to term by starting from known papers of interest and searching on the key word tags applied to those papers. The process of resource discovery would be much richer than in the case of a thesaurus or traditional key word index, because the ontology-driven keywording system directs the researcher along paths generated by a rich description of the relationships among objects in the domain of knowledge. Researchers could also browse the ontology directly: looking for papers categorized under a given term, a researcher would find others on related topics, and would also gain some understanding of how those topics are related.
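
To make the idea of moving from resource to resource concrete, here is a small Python sketch. The paper identifiers, the tags applied to them and the function name are all invented for illustration; the two term links come from the natural selection example.

```python
# Typed links between terms: (subject, relation, object).
TERM_LINKS = [
    ("natural selection", "is a cause of", "adaptation"),
    ("natural selection", "has a kind", "directional selection"),
]

# Hypothetical resources and the ontology terms applied to them.
RESOURCE_TAGS = {
    "paper-A": {"natural selection"},
    "paper-B": {"adaptation"},
    "paper-C": {"directional selection"},
    "paper-D": {"genetic drift"},
}


def related_resources(resource_id):
    """List (other resource, explanation) pairs reachable through one typed link."""
    found = []
    for term in sorted(RESOURCE_TAGS[resource_id]):
        for subject, relation, obj in TERM_LINKS:
            # The link may be followed in either direction.
            if term == subject:
                neighbour = obj
            elif term == obj:
                neighbour = subject
            else:
                continue
            explanation = f"{subject} {relation} {obj}"
            for other, tags in sorted(RESOURCE_TAGS.items()):
                if other != resource_id and neighbour in tags:
                    found.append((other, explanation))
    return found


if __name__ == "__main__":
    for other, why in related_resources("paper-A"):
        print(other, "because", why)
```

Each suggested resource comes with the relationship that led to it, which is the "understanding of the nature of their relationship" that a flat key word index does not supply.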

Building OE requires starting from scratch, for the most part. MeSH, the only controlled vocabulary that would be expected to serve as a source of key word terms about evolutionary biology, is greatly impoverished. There are few terms describing evolution, and many are incorrectly defined, or occupy places in the MeSH hierarchy that do not accurately represent their relative positions.

OE can also function as an addition to the semantic web. The semantic web has an advantage over the free-text searches of web pages provided by Google or other search engines because it provides a mechanism for distinguishing among terms and phrases that differ in meaning, even though they might have the same morphology. For instance, users searching the open Internet at Google for “Darwin” will get many hits concerning the operating system created by Apple, as well as those concerning evolution. A semantic web search would separate the results about the operating system from those about Darwin the evolutionist.

For the same reasons that OE can bring some organization to the Internet, it can bring organization to databases of literature in a way that citation indexing and free-text searching in article titles, abstracts, and author-assigned key words cannot. These searches are unable to detect differences between words that have a common morphology but differ in meaning. They are also unable to detect similarities between phrases and terms that share no common syntactic elements, but have the same meaning. The controlled vocabulary that will make up OE, together with the conceptual structure it represents, will facilitate both targeted searches and exploratory browsing among the linguistically heterogeneous literature of evolutionary biology.
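
Both problems, one word with several meanings and several phrases with one meaning, can be handled by indexing resources under concept identifiers rather than under the words themselves. The following Python sketch shows the idea; the identifiers, the page names and the choice of "descent with modification" as a second label for evolution are illustrative assumptions of mine, not entries from OE.

```python
# Labels map to concept identifiers. One label may have several senses,
# and several labels may point at the same concept.
# All identifiers here are hypothetical.
LABELS = {
    "darwin": ["person:charles-darwin", "software:darwin-os"],
    "evolution": ["process:evolution"],
    "descent with modification": ["process:evolution"],
}

# Hypothetical resources, indexed by concept rather than by word.
RESOURCE_CONCEPTS = {
    "page-1": {"person:charles-darwin", "process:evolution"},
    "page-2": {"software:darwin-os"},
    "page-3": {"process:evolution"},
}


def search(label):
    """Return {concept id: sorted resource ids}, one group per sense of the label."""
    groups = {}
    for concept in LABELS.get(label.strip().lower(), []):
        groups[concept] = sorted(
            resource for resource, concepts in RESOURCE_CONCEPTS.items()
            if concept in concepts
        )
    return groups


if __name__ == "__main__":
    print(search("Darwin"))
    print(search("descent with modification"))
```

A search for "Darwin" returns two separate groups, one per sense, while the two labels for evolution return exactly the same resources even though the strings have nothing in common.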

(In an article in the High Energy Physics Libraries Webzine, Arturo Montejo Ráez and Ralf Steinberger discuss the value of keywording in a large, heterogeneous literature; my discussion in the previous two paragraphs has been strongly influenced by them. They do not argue for ontologies, but because ontologies are a kind of key word indexing strategy, ontologies bring users the same benefits as do thesauri and traditional subject indexes.)

Intelligent searching in a database of information resources, be they web pages on the semantic web or articles in a digital library, represents a computational use of the ontology. For instance, suppose that a researcher used “speciation” as a search key. An artificial reasoner searching the ontology-driven database of information resources would find papers about geographical isolation, because the reasoner would “infer” that such papers are relevant: in the ontology, geographical isolation is described as a component sub-process of allopatric speciation, and allopatric speciation is in turn a kind of speciation.
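
The inference in this example can be sketched in a few lines of Python. This is a toy graph walk, not a description logic reasoner, and it assumes that the ontology records allopatric and sympatric speciation as kinds of speciation; the paper identifiers and their tags are invented.

```python
# child -> parent links. The "kind of" entries are assumptions made for this
# sketch; the sub-process entry comes from the example in the text.
KIND_OF = {
    "allopatric speciation": "speciation",
    "sympatric speciation": "speciation",
}
SUBPROCESS_OF = {
    "geographical isolation": "allopatric speciation",
}

# Hypothetical papers, each tagged with a single term for simplicity.
PAPERS = {
    "paper-1": "geographical isolation",
    "paper-2": "sympatric speciation",
    "paper-3": "adaptation",
}


def reaches(term, target):
    """Follow kind-of and sub-process links upward from term until target is found."""
    seen = set()
    frontier = [term]
    while frontier:
        current = frontier.pop()
        if current == target:
            return True
        if current in seen:
            continue
        seen.add(current)
        for relation in (KIND_OF, SUBPROCESS_OF):
            if current in relation:
                frontier.append(relation[current])
    return False


def search(key):
    """Return papers whose tag is the search key or leads to it."""
    return sorted(paper for paper, term in PAPERS.items() if reaches(term, key))


if __name__ == "__main__":
    print(search("speciation"))
```

The paper on geographical isolation is returned for "speciation" only because the two links are chained together; neither link alone connects the two terms.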

Representing the results of such intelligent searching by showing the degree and kind of relationship between terms would help researchers locate results of greatest interest. For instance, results from the “speciation” search should show the kinds of speciation (allopatric, sympatric, peripatric, etc.) as being more closely related to speciation than geographical isolation is. The latter is a sub-process of one kind of speciation process; a researcher may not want to see all papers on that topic, and might prefer to steer toward those on sympatric speciation. A “tree” view that can be expanded or closed, and that represents the number of papers on a given “branch,” is probably a good way to represent the results of this kind of intelligent search.
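
A rough Python sketch of such a tree view follows. The paper counts are invented, the relation labels are hypothetical, and the totals simply add up the branches, so a paper tagged with two terms would be counted twice; a real implementation would count distinct papers.

```python
# term -> list of (relation, child term). Relation labels are hypothetical.
CHILDREN = {
    "speciation": [
        ("kind", "allopatric speciation"),
        ("kind", "sympatric speciation"),
        ("kind", "peripatric speciation"),
    ],
    "allopatric speciation": [("sub-process", "geographical isolation")],
}

# Invented numbers of papers tagged directly with each term.
PAPER_COUNTS = {
    "speciation": 2,
    "allopatric speciation": 3,
    "sympatric speciation": 4,
    "peripatric speciation": 1,
    "geographical isolation": 5,
}


def branch_total(term):
    """Papers tagged with term plus those on every branch below it."""
    below = sum(branch_total(child) for _, child in CHILDREN.get(term, []))
    return PAPER_COUNTS.get(term, 0) + below


def render(term, relation=None, depth=0):
    """Return the tree as indented lines, each showing relation, term and count."""
    label = term if relation is None else f"[{relation}] {term}"
    lines = [f"{'  ' * depth}{label} ({branch_total(term)})"]
    for child_relation, child in CHILDREN.get(term, []):
        lines.extend(render(child, child_relation, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(render("speciation")))
```

Depth in the tree stands in for the degree of relationship and the bracketed label for its kind, so geographical isolation appears one level further from speciation than the kinds of speciation do.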

There are probably other computational uses of OE. For instance, OE might be used for hypothesis discovery, like other ontologies such as the Gene Ontology or those provided by the Science Environment for Ecological Knowledge (SEEK) project. It also probably has important uses in bibliometrics.