On Term Formation: Consistency, Standards, and Exceptions

Term Formation in Taxonomy

As taxonomy consultants at Factor, we are frequently engaged to help our clients clean up and improve existing taxonomies. These have often developed ad hoc and, in many cases, have not been constructed or managed by taxonomists. This is fine! Having controlled lists of tags is better than not…having controlled lists of tags, and it’s a great place to start.

Often, one of the primary problems is term formation: as terms are added on an as-needed basis (say, when a new tag is required), inexperienced taxonomists (and non-taxonomists, who I do realize make up the vast majority of the population) add terms that make sense to them. Again: this is a fine place to begin. But good term formation is the foundation upon which most taxonomy practice rests.

The various taxonomy standards (ANSI/NISO Z39.19 and ISO 25964-1, which I understand is currently being updated! Check out the session by Margie Hlava and Joseph Busch at Taxonomy Boot Camp next week) provide some basic guidelines (this is not an exhaustive list):

1. Labels must be unique

This is really, I think, the First Rule of Taxonomy; it’s designed to prevent cases like:

Chemical elements

  • Mercury

Planets

  • Mercury

This is no good. Although I really hate parenthetical qualifiers (as they are not natural language expressions), the options are limited: you can use “Mercury (planet)” or “Planet Mercury” here, although the latter looks pretty dumb in a list of other planet names.
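If your taxonomy lives in a spreadsheet or a script rather than a dedicated tool, this rule is easy to check automatically. Here is a minimal Python sketch; the `taxonomy` dictionary and the `find_duplicate_labels` function are hypothetical names made up for illustration, and “Gold” and “Venus” are filler terms I added to round out the example.

```python
from collections import defaultdict

# Hypothetical toy taxonomy: parent label -> child labels.
taxonomy = {
    "Chemical elements": ["Mercury", "Gold"],
    "Planets": ["Mercury", "Venus"],
}

def find_duplicate_labels(tree):
    """Return {label: [parents]} for child labels that are listed more than once."""
    parents_by_label = defaultdict(list)
    for parent, children in tree.items():
        for child in children:
            # Compare case-insensitively so "mercury" and "Mercury" collide too.
            parents_by_label[child.casefold()].append(parent)
    return {
        label: parents
        for label, parents in parents_by_label.items()
        if len(parents) > 1
    }

print(find_duplicate_labels(taxonomy))
```

A check like this only finds the collision; deciding between “Mercury (planet)” and some other fix is still an editorial call.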

2. Labels must be sensible without reference to the hierarchy

You should not have to look at a parent term to figure out what a concept is. This is, I think, the Second Rule of Taxonomy, and it’s the one non-practitioners most frequently get wrong. This is an example from eBay (way back in the day) that has stuck with me:

Musical instruments

  • Cases

Yeah; no. I know it’s verbose, but this really has to be “Musical instrument cases” or you’re going to have real problems. The term has to stand alone and express the concept. Besides, what are you going to do with

Eyeglasses

  • Cases

…or any number of other things that have Cases? (Because: Labels Must Be Unique.)

3. Use nouns and noun phrases (i.e., instead of verbs)

This is pretty intuitive: use (example taken from Z39.19: read it!) “Distillation” or “Distilling” instead of “Distill”.

4. Prefer plural versions of concept labels

Again, with some exceptions, this is pretty easy to put into practice; instead of:

Content type

  • Article
  • Blog
  • White paper

You should prefer:

Content types

  • Articles
  • Blogs
  • White papers

5. Use natural language versions of terms, and avoid stuffing abbreviations and alternative versions into preferred labels

Natural language expressions make terms easier to understand and are friendly to NLP-based autocategorization tools and other machine-assisted tagging and classification. They also look better:

Internet of Things (IoT) developers or engineers

…is not a phrase that will ever appear in text; this is essentially keyword stuffing. It’s far better (assuming you have a dedicated taxonomy management tool or some other way of storing and tracking altLabels/NPTs) to use one clean preferred label and record the variants as non-preferred terms:

IoT Developers

UF: IoT engineers

UF: Internet of Things developers

UF: Internet of Things engineers
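For anyone without a dedicated taxonomy management tool, here is a rough Python sketch of the same idea: one preferred label per concept, with the variants stored separately and all of them resolving back to the preferred label. The `Concept` class and `build_label_index` function are hypothetical names for illustration, not the data model of any particular tool.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Concept:
    pref_label: str                                       # the one preferred label
    alt_labels: List[str] = field(default_factory=list)   # UF / non-preferred terms

def build_label_index(concepts):
    """Map every label variant (case-insensitively) to its preferred label."""
    index = {}
    for concept in concepts:
        for label in [concept.pref_label, *concept.alt_labels]:
            index[label.casefold()] = concept.pref_label
    return index

iot = Concept(
    pref_label="IoT Developers",
    alt_labels=[
        "IoT engineers",
        "Internet of Things developers",
        "Internet of Things engineers",
    ],
)

index = build_label_index([iot])
print(index["internet of things engineers"])
```

The preferred label stays clean and readable, while a tagger or search box can still match any of the variants.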

6. Avoid abbreviations and acronyms, with a few exceptions

This gets garbled a lot in large enterprises where acronyms tend to be thick on the ground. The basic rule of thumb is: use acronyms when they have replaced the spelled-out version of natural language terms. Thus

  • Lasers
  • AIDS

…are perfectly good preferred labels, as no one needs to write or read “Light Amplification by Stimulated Emission of Radiation”. Of course, it is always good practice to store the long versions of such terms as synonyms.

7. Use initial caps only (or adhere consistently to your organization’s style guidelines)

Using initial caps is far less important than most of the other guidelines in the standards; what’s important is consistency. Camel case, initial caps, whatever works–just use the same one all the time, across taxonomies. This might seem mostly cosmetic, but consistency gives the impression (true or not) of good governance!
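Consistency is also something you can spot-check with a few lines of code. The sketch below assumes your chosen style is initial caps (first word capitalized, later words lowercase) and treats all-caps words as acronyms. The function name `uses_initial_caps` and the sample labels are made up for illustration, and the check is deliberately naive: proper nouns inside a label (the “Things” in “Internet of Things”, say) would be flagged and need an allow-list.

```python
def uses_initial_caps(label):
    """True if the label starts with a capital and later words are lowercase or all-caps."""
    words = label.split()
    if not words or not words[0][0].isupper():
        return False
    # Later words must be lowercase, unless fully uppercase (treated as acronyms).
    return all(w.islower() or w.isupper() for w in words[1:])

# Hypothetical sample labels with mixed capitalization styles.
labels = ["Articles", "Blogs", "White Papers", "musical instrument cases"]

inconsistent = [label for label in labels if not uses_initial_caps(label)]
print(inconsistent)
```

Swap in whatever rule matches your own style guide; the point is to apply the same test to every label in every taxonomy.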

All of this is good practice and great advice. I highly recommend familiarity with the guidelines laid out in the standards for all practitioners.

Some Exceptions

However, the standards are absolutely designed for taxonomies for information retrieval. This means taxonomies for tagging and perhaps search and browse interfaces–but mostly for tagging. Today, many taxonomies are user-facing (for, say, site navigation or e-commerce browsing) or have other applications, which may or may not include tagging and retrieval. In such cases, other considerations are important; these may occasionally conflict with the standards. But if (for example) people can’t find what they’re looking for on your website, they can’t buy it, so having labels (or at least some kind of Display Label) that users understand is critical.

Any taxonomy destined for user-facing display on a website will have requirements that sometimes conflict with the best practices outlined above (and, not to beat a dead horse, explicated in detail in the standards); these may include:

  • Display space limitations
  • Hierarchy level limitations
  • Labels that match a user’s mental model (as opposed to strict semantic labeling)

In such cases, of course, one hopes that a strict semantic and/or product taxonomy is somewhere below the surface; but the taxonomy and labels displayed to users might differ from the “real” taxonomy helping to structure things in the background.
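One way to picture that split is a concept that carries both a standards-style preferred label and an optional display label for the interface. This is only a sketch under my own assumptions: the `Concept` class, the `display_label` field, and the `label_for_display` function are hypothetical, and pairing “Musical instrument cases” with a short “Cases” display label is my illustration, not something a particular site does.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Concept:
    pref_label: str                      # standards-style label used for tagging
    display_label: Optional[str] = None  # hypothetical shorter label for the UI

def label_for_display(concept):
    """Show the display label when one exists, else fall back to the preferred label."""
    return concept.display_label or concept.pref_label

cases = Concept("Musical instrument cases", display_label="Cases")
planets = Concept("Planets")

print(label_for_display(cases))
print(label_for_display(planets))
```

The unique, self-explanatory label still exists underneath for tagging and retrieval; only the navigation shows the shorter form.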

So be aware of use case considerations as well as the standards and best practices. If anyone has stories to share I’d love to see them in the comments.
