Thing 12: Vocabularies for data description

Data descriptor, keyword, subject … these are all terms commonly used when discussing metadata.  Learn about the use of controlled vocabularies to enhance data discovery.

  • Get started: Control your language, please!
  • Learn more: Make a contribution to Research Vocabularies Australia
  • Challenge me: KWA - the CSIRO Science Keyword Aggregator - a service and widget

Get started

Controlled vocabularies for data description

In addition to selecting a metadata standard or schema, you should use a controlled vocabulary whenever possible. A controlled vocabulary provides a consistent way to describe aspects of your data such as location, time, place name, and subject.

Controlled vocabularies significantly improve data discovery. They make data more shareable with researchers in the same discipline because everyone is ‘talking the same language’ when searching for specific data, e.g. plants, animals, medical conditions, or places.

1. Start by browsing Controlling your Language: a Directory of Metadata Vocabularies from JISC in the UK. Make sure you scroll down to section 5, Conclusion; it’s worth a read.

2. Next, see some controlled vocabularies in action in the Atlas of Living Australia. Use the Species Search: type text:magpie in the search box. Choose your favourite magpie species and click on the red View Records button under the occurrence records map. Pick a record and click on the View record link. Any metadata field where you see Supplied… tells you that the value supplied by the person who submitted the record (often a ‘citizen scientist’) has been changed to match the controlled vocabulary used in that field, e.g. Observer, Record date and Common name.
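The idea behind those Supplied… fields can be sketched in a few lines of Python. This is a minimal illustration only: the vocabulary, the variant spellings and the function names (normalise, build_lookup, describe) are all hypothetical, and it does not reproduce how the Atlas of Living Australia actually processes records. It simply shows a free-text value being matched to a preferred term while the original supplied value is kept alongside it.

```python
# Hypothetical controlled vocabulary: preferred label -> accepted variants.
# These terms are illustrative and are not taken from any real service.
CONTROLLED_VOCAB = {
    "Australian Magpie": {"magpie", "aussie magpie"},
    "Magpie-lark": {"peewee", "mudlark", "magpie lark"},
}


def normalise(text):
    """Lower-case, treat hyphens as spaces, and collapse repeated whitespace."""
    return " ".join(text.lower().replace("-", " ").split())


def build_lookup(vocab):
    """Map every normalised label or variant to its preferred label."""
    lookup = {}
    for preferred, variants in vocab.items():
        lookup[normalise(preferred)] = preferred
        for variant in variants:
            lookup[normalise(variant)] = preferred
    return lookup


def describe(supplied, lookup):
    """Return the supplied value with its controlled term (None if unmatched)."""
    return {"supplied": supplied, "controlled": lookup.get(normalise(supplied))}


LOOKUP = build_lookup(CONTROLLED_VOCAB)
print(describe("  PEEWEE ", LOOKUP))
print(describe("crow", LOOKUP))
```

Keeping the supplied value next to the controlled term means nothing the contributor wrote is lost, and unmatched values (returned here with a controlled term of None) can be flagged for review rather than silently dropped.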

If you have time: browse the stunning level of data description, and the data itself, in the Atlas of Living Australia.

Consider: How do you think we could encourage people to use controlled vocabularies in their data descriptions?

Learn more

What controlled vocabularies exist?

Think about (or find out!) what standard vocabularies are used, or could be used, by research groups in a discipline which interests you. Note there may be more than one vocabulary used in a discipline.

1. Choose a vocabulary and determine if it would be of use to people working in a specific field. Try the University of North Carolina’s Metadata Tutorial to browse for discipline and general vocabularies.

2. Research Vocabularies Australia (RVA) is a service that helps you find, access, and reuse vocabularies for research. Go to the RVA Portal and see if your chosen vocabulary is included.

Consider: Why should (or shouldn’t) your vocabulary be included in RVA?

Challenge me 

Supporting multiple vocabularies

The Science Keyword Aggregator has been developed by CSIRO and released as open source so that others can adapt and reuse it. It is a service that allows users to search for defined keywords across a range of managed vocabularies.

A widget that uses the service is also available for system owners to embed in their application and thus provide a term search there.
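To make the idea of searching across several vocabularies concrete, here is a minimal Python sketch. It is an assumption-laden illustration, not the KWA's implementation or API: the vocabulary names (vocab-a, vocab-b), their terms and the search_keywords function are all hypothetical, and the vocabularies are held in memory rather than fetched from a managed service.

```python
# Hypothetical in-memory vocabularies; names and terms are placeholders.
VOCABULARIES = {
    "vocab-a": ["Soil chemistry", "Soil moisture", "Sea surface temperature"],
    "vocab-b": ["Soil Sciences", "Oceanography"],
}


def search_keywords(query, vocabularies):
    """Case-insensitive substring search across every vocabulary.

    Each hit records which vocabulary the term came from, so a user can
    tell identically worded terms from different sources apart.
    """
    needle = query.strip().lower()
    if not needle:
        return []
    hits = []
    for name, terms in vocabularies.items():
        for term in terms:
            if needle in term.lower():
                hits.append({"vocabulary": name, "term": term})
    # Sort alphabetically by term, then by vocabulary name, for stable output.
    return sorted(hits, key=lambda hit: (hit["term"].lower(), hit["vocabulary"]))


for hit in search_keywords("soil", VOCABULARIES):
    print(f'{hit["term"]} ({hit["vocabulary"]})')
```

A real aggregator would also need to deal with vocabulary versions, term identifiers and remote lookups; the sketch only shows why returning the source vocabulary with each term is useful.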

  1. Start by viewing the metadata record describing the Keyword Aggregator (KWA), noting the rich description, e.g. links to the source code and related materials.
  2. Now go to the Science KWA to read more about the KWA and try out the widget.
  3. Then take a look at the service documentation for the KWA web service.

Consider: Is this a tool that could be implemented in your organisation? How would you use it?

