About Taxonomies & Controlled Vocabularies

Controlled Vocabularies

A controlled vocabulary, also called an authority file, is an authoritative list of terms to be used in indexing (human or automated). A controlled vocabulary for a project might actually include multiple authority files for different kinds of terms.

Controlled vocabularies are used to ensure consistent indexing, particularly when indexing multiple documents, periodical articles, web pages or sites, etc. They may also be used when a single work, such as an encyclopedia, is indexed by multiple indexers.

Controlled vocabularies do not necessarily have any structure or relationships between terms within the list. Controlled vocabularies are often used for name authorities (proper nouns), such as persons, organization names, company names, etc.

Online controlled vocabularies often have synonyms or See references that point the user or search engine from a nonpreferred variant to the equivalent preferred term in the controlled vocabulary. However, this is not a required attribute of a controlled vocabulary.
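As a minimal sketch of how See references might work in practice, the Python below maps nonpreferred variants to preferred terms. The term list and the resolve function are hypothetical illustrations, not part of any particular indexing system.

```python
# Hypothetical authority file: a flat set of preferred terms.
PREFERRED_TERMS = {"United Nations", "World Health Organization"}

# Hypothetical See references: nonpreferred variant -> preferred term.
SEE_REFERENCES = {
    "UN": "United Nations",
    "U.N.": "United Nations",
    "WHO": "World Health Organization",
}


def resolve(term):
    """Return the preferred term for an entry, or None if it is not in the vocabulary."""
    if term in PREFERRED_TERMS:
        return term
    return SEE_REFERENCES.get(term)
```

Here resolve("UN") returns "United Nations", while a term absent from both tables returns None, which an indexing tool could treat as a candidate for review.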

Controlled vocabularies are the broadest category, which includes thesauri and taxonomies. Thesauri and taxonomies are specific kinds of controlled vocabularies, but not all controlled vocabularies are thesauri or taxonomies.

Taxonomies

A taxonomy is typically a controlled vocabulary with a hierarchical structure, with the understanding that there are different definitions of a hierarchy. Terms within a taxonomy have relations to other terms within the taxonomy. These are typically: parent/broader term, child/narrower term, or often both if the term is at mid-level within a hierarchy.

Taxonomies are often displayed as a tree structure. Terms within a taxonomy are often called "nodes." A node may be repeated at more than one place within the taxonomy if it has multiple broader terms. This is referred to as a polyhierarchy.
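A polyhierarchy can be sketched by letting each node list more than one broader term. The example below is an illustration under assumed data: the node names and the paths_to_root helper are hypothetical, and the code assumes the hierarchy contains no cycles.

```python
# Hypothetical taxonomy: each node maps to its list of broader terms (parents).
# "Pediatric nursing" has two parents, which makes this a polyhierarchy.
BROADER = {
    "Health care": [],
    "Nursing": ["Health care"],
    "Pediatrics": ["Health care"],
    "Pediatric nursing": ["Nursing", "Pediatrics"],
}


def paths_to_root(term, broader):
    """Return every path from a top-level node down to the given term."""
    parents = broader[term]
    if not parents:
        return [[term]]  # a top-level node is its own path
    paths = []
    for parent in parents:
        for path in paths_to_root(parent, broader):
            paths.append(path + [term])
    return paths
```

Calling paths_to_root("Pediatric nursing", BROADER) yields two paths, one through "Nursing" and one through "Pediatrics", which corresponds to the node appearing at two places in a tree display.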

Another type of taxonomy, with a more limited hierarchy, comprises multiple sub-taxonomies or "facets," whereby the top-level node of each represents a different type of taxonomy, attribute, or context. This is used in post-coordinated searching, whereby the user chooses a combination of nodes, one from each facet.
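Post-coordinated searching over facets might look like the following sketch, in which each document carries one node per facet and the user's selection is combined at search time. The facet names, documents, and the faceted_search function are all assumptions made for illustration.

```python
# Hypothetical indexed documents: one node assigned per facet.
DOCUMENTS = [
    {"title": "Q3 budget memo",
     "facets": {"Document type": "Memo", "Department": "Finance"}},
    {"title": "Hiring policy",
     "facets": {"Document type": "Policy", "Department": "Human Resources"}},
    {"title": "Expense policy",
     "facets": {"Document type": "Policy", "Department": "Finance"}},
]


def faceted_search(documents, selection):
    """Return titles of documents that match every facet/node pair selected."""
    return [
        doc["title"]
        for doc in documents
        if all(doc["facets"].get(facet) == node
               for facet, node in selection.items())
    ]
```

Selecting {"Document type": "Policy", "Department": "Finance"} returns only "Expense policy", while selecting a single facet returns every document indexed with that node.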

Equivalent synonyms or See references may or may not exist in a taxonomy. If a hierarchy is not too large and can be browsed, and especially if there are polyhierarchies, then there is less of a need for nonpreferred variants.

The term taxonomy tends to be used to refer to two different things:

  1. a tree-hierarchical controlled vocabulary lacking more complex relationships found in thesauri or ontologies, or
  2. any kind of controlled vocabulary, especially when applied to the world of enterprise content management and web site information architecture, rather than library science literature retrieval.

Thesauri

A thesaurus, as used in information science and literature retrieval, is essentially a controlled vocabulary following a standard structure, where all terms in the thesaurus have relationships to each other. These relationships are typically of three kinds: hierarchical (broader term/narrower term), associative (see also), and equivalent (use/used for or see/seen from). In addition, it is common in thesauri for some or all terms to have scope notes, which are brief explanations of how the term should be used in indexing. Term history notes may also be present.
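The three kinds of relationships, plus scope notes, can be sketched as a simple record per term. The ThesaurusTerm class, the sample terms, and both helper functions below are hypothetical and meant only to illustrate the structure described above.

```python
from dataclasses import dataclass, field


@dataclass
class ThesaurusTerm:
    name: str
    broader: list = field(default_factory=list)   # hierarchical: broader terms
    narrower: list = field(default_factory=list)  # hierarchical: narrower terms
    related: list = field(default_factory=list)   # associative: see also
    used_for: list = field(default_factory=list)  # equivalent: nonpreferred variants
    scope_note: str = ""


# Hypothetical sample entries.
TERMS = [
    ThesaurusTerm("Motor vehicles", narrower=["Automobiles"]),
    ThesaurusTerm("Automobiles", broader=["Motor vehicles"],
                  related=["Highways"], used_for=["Cars", "Motor cars"],
                  scope_note="Use for passenger vehicles only."),
    ThesaurusTerm("Highways", related=["Automobiles"]),
]


def build_use_index(terms):
    """Map each nonpreferred variant to the preferred term it points to."""
    return {variant: t.name for t in terms for variant in t.used_for}


def missing_reciprocals(terms):
    """List broader-term links whose parent lacks the matching narrower-term link."""
    by_name = {t.name: t for t in terms}
    return [
        (t.name, parent)
        for t in terms
        for parent in t.broader
        if parent not in by_name or t.name not in by_name[parent].narrower
    ]
```

The reciprocity check reflects one possible design choice: storing both directions of a hierarchical link makes display easy but requires validating that the two sides agree.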

Thesauri are most often used in indexing periodical literature, especially over a period of time. Detailed thesauri have been created for specialized subject areas by publishers of periodical indexes. Often the thesauri themselves have become published works for purchase.

Ontologies

An ontology, like a thesaurus, is a kind of taxonomy with structure and specific types of relationships between terms. In an ontology the types of relationships are greater in number and more specific in their function. Relationships could include, for example, "located in" to relate an organization to a place, "produces/is produced by" to relate a company and its product, and "employs/employed by" to relate a company and a person. Information that, in a simple controlled vocabulary or taxonomy, is conveyed through indexing is instead embedded in the ontology itself.
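One common way to picture such typed relationships is as subject/relationship/object statements. The sketch below uses the relationship names from the paragraph above; the entities ("Acme Corp", "Jane Doe", and so on) and both functions are invented for illustration and do not represent any specific ontology language.

```python
# Hypothetical statements: (subject, relationship, object).
TRIPLES = [
    ("Acme Corp", "located in", "Chicago"),
    ("Acme Corp", "produces", "Anvils"),
    ("Acme Corp", "employs", "Jane Doe"),
]

# Relationship -> its inverse, for the paired relationships.
INVERSES = {"produces": "is produced by", "employs": "employed by"}


def with_inverses(triples, inverses):
    """Return the triples plus the inverse statement for each paired relationship."""
    expanded = list(triples)
    for subject, relation, obj in triples:
        if relation in inverses:
            expanded.append((obj, inverses[relation], subject))
    return expanded


def related(triples, subject, relation):
    """Return every object linked to the subject by the given relationship."""
    return [o for s, r, o in triples if s == subject and r == relation]
```

After expansion, asking for what "Jane Doe" is "employed by" returns "Acme Corp" without any document having been indexed with that fact, which is the sense in which the information lives in the ontology itself.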

Ontological relationships are used in more complex information systems, such as the Semantic Web.
