THESAURUS

Paper-M-108

Dr. Manika Lamba

28th May 2023

THESAURUS - History, Definition

INTRODUCTION

  • Efficiency of an IR system depends on the indexing language used

  • In particular, it depends on the language's capability to handle 2 fundamentally different but interdependent types of relationship between the terms used to represent the subject content of a document

  • The thesaurus was conceived mainly in the context of post-coordinate indexing systems, but it can be used for pre-coordinate systems as well

  • The word ‘Thesaurus’ is of Greek origin, literally meaning treasury or storehouse of knowledge

  • But in modern usage, it denotes a list of terms arranged according to the relationships between the ideas they express

EXAMPLES

1. Unesco Thesaurus: A Structured List of Descriptors for Indexing and Retrieving Literature in the Fields of Education, Science, Social and Human Science, Culture, Communication and Information
2. Thesaurus of ERIC Descriptors
3. Thesaurus of Sociological Research Terminology
4. Thesaurus of Sociological Indexing Terms
5. Social Science and Business Microthesaurus: A Hierarchical List of Indexing Terms Used by NTIS
6. Political Science Thesaurus
7. SPINES Thesaurus: A Controlled and Structured Vocabulary of Science and Technology for Policy Making
8. Thesaurus of Psychological Index Terms
9. Thesaurus of Engineering and Scientific Terms (TEST)
10. INSPEC Thesaurus
11. NASA Thesaurus
12. Thesaurus of Computing Terms
13. Thesaurus of Scientific, Technical and Engineering Terms
14. International Road Research Documentation (IRRD) Thesaurus
15. ASIS Thesaurus of Information Science and Librarianship
16. Thesaurus of Information Science Terminology
17. Food: Multilingual Thesaurus
18. Thesaurus of Agricultural Terms
19. The ISDD Thesaurus. Keywords Relating to Non-Medical Use of Drugs and Drug Dependence

DEFINITION

  • “A thesaurus may be defined either in terms of its function or its structure. In terms of function, a thesaurus is a terminological control device used in translating from the natural language of documents, indexers or users into a more constrained system language (documentation language, information language). In terms of structure, a thesaurus is a controlled and dynamic vocabulary of semantically and generically related terms which covers a specific domain of knowledge”

DEFINITION (Cont.)

  • “A compilation of words and phrases showing synonymous, hierarchical, and other relationships and dependencies, the function of which is to provide a standardized vocabulary for information storage and retrieval”
  • “A controlled vocabulary arranged in a known order in which equivalence, homographic, hierarchical, and associative relationships among terms are clearly displayed and identified by standardized relationship indicators, which must be employed reciprocally”

DEFINITION (Cont.)

A thesaurus in the field of information storage and retrieval is a list of terms and/or of other signs (or symbols) indicating relationships among these elements, provided that the following criteria hold:

  1. the list contains a significant proportion of non-preferred terms and/or of preferred terms not used as descriptors;

  2. terminological control is intended.

HISTORY

  • Peter Mark Roget first conceived the idea of such a compilation and brought out his Thesaurus of English Words and Phrases in 1852 for the benefit of writers looking for appropriate words to express their ideas

  • Helen Brownson is said to be the first person who used the term ‘Thesaurus’ in the context of IR in a paper presented in 1957 at the Dorking Conference in Classification Research

  • Hans P. Luhn was possibly the first person to think in terms of an ‘Information Retrieval Thesaurus’

  • The first thesaurus used in an IR system was developed by Du Pont in the US around 1959

HISTORY (Cont.)

Year Event
1959 Engineering Information Center of E. I. du Pont de Nemours developed the first true thesaurus
1960 Armed Services Technical Information Agency (ASTIA) produced the Thesaurus of ASTIA Descriptors
1961 American Institute of Chemical Engineers (AIChE) published the Chemical Engineering Thesaurus

HISTORY (Cont.)

1964 Engineers Joint Council (EJC) published the Thesaurus of Engineering Terms
1967 Thesaurus of Engineering and Scientific Terms (TEST)
1967 Committee on Scientific and Technical Information (COSATI) published the first set of guidelines for thesaurus construction
1970 Unesco Guidelines for the Establishment and Development of Monolingual Scientific and Technical Thesauri

HISTORY (Cont.)

1974 American National Standards Institute (ANSI) Z39.19 - a US national standard for thesaurus construction
1974 First international standard for thesaurus construction, ISO 2788

THESAURUS - Purpose & Use

PURPOSE & USE

  • To provide a map of a given field of knowledge indicating how concepts or ideas are related to each other, which helps the indexer and the searcher to understand the structure of the field

  • To provide a standard vocabulary for a given subject

  • To provide a consistent representation of subject matter, avoiding subject dispersion at input & output by controlling synonyms, quasi-synonyms, & homographs

  • To bring together terms that are semantically related

  • To limit the number of terms assigned to a document

  • To serve as a search aid in retrieval

PURPOSE & USE (Cont.)

  • Its purpose is to promote consistency in the indexing of documents predominantly for post-coordinated ISAR

  • To facilitate searching by linking entry terms with descriptors

The 4 principal purposes are:

  1. Translation: To provide a means for translating the natural language of authors, indexers, and users into a controlled vocabulary used for indexing and retrieval

  2. Consistency: To promote consistency in the assignment of index terms

  3. Indication of Relationships: To indicate semantic relationships among terms

  4. Retrieval: To serve as a searching aid in retrieval of documents

DIFFERENCE B/W LIST OF SUBJECT HEADING (LSH) & THESAURUS

LSH VS. THESAURUS

  • LSH: It is a complete list of names of subjects usually arranged in alphabetical order

  • Thesaurus: It is a list of terms arranged in a helpful order. In other words, it is a compilation of all isolate ideas that occur within a subject or group of subjects arranged in alphabetical order

LSH VS. THESAURUS (Cont.)

  • LSH: Designed with the needs of pre-coordinate indexes in view
  • Thesaurus: Designed to meet the special needs of post-coordinate indexes

  • LSH: The emphasis is on references from broader to narrower subjects, i.e., downward references
  • Thesaurus: Thesauri have a more elaborate network of references, and the direction of each type of reference is clearly indicated

STRUCTURE & RELATIONSHIP

BASIC THESAURUS RELATIONSHIPS

A. EQUIVALENCE RELATIONSHIP

  • It is the relationship between preferred & non-preferred terms where 2 or more terms are regarded for indexing purposes as referring to the same concept

  • It is denoted by USE and UF

  • It includes synonyms, lexical variants, quasi-synonyms, & upward posting
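
The USE / UF pairing above can be sketched as a simple lookup. This is a minimal illustration, not taken from any published thesaurus: the entry terms, the preferred term "automobiles", and the helper names `build_uf` and `to_preferred` are all assumptions made for the example.

```python
# Hypothetical entry vocabulary: non-preferred term -> preferred term (USE).
USE = {
    "cars": "automobiles",
    "motor cars": "automobiles",
    "autos": "automobiles",
}

def build_uf(use_map):
    """Derive the reciprocal UF (Used For) lists from the USE references."""
    uf = {}
    for non_preferred, preferred in use_map.items():
        uf.setdefault(preferred, []).append(non_preferred)
    return {term: sorted(variants) for term, variants in uf.items()}

def to_preferred(term, use_map):
    """Translate an entry term to its preferred term.

    Terms without a USE reference are returned unchanged (lower-cased).
    """
    key = term.strip().lower()
    return use_map.get(key, key)

UF = build_uf(USE)
print(to_preferred("Motor cars", USE))  # the preferred term for this entry term
print(UF["automobiles"])                # the UF list generated for the descriptor
```

Deriving UF from USE, rather than typing both by hand, is one way to keep the two directions of the reference from drifting apart.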

B. HIERARCHICAL RELATIONSHIP

  • This relationship shows level of super-ordination and sub-ordination

  • It is used in locating broader and narrower concepts in a logically progressive sequence

  • The relationship is reciprocal and is set out in a thesaurus using the following conventions:

    • BT (Broader Term)
    • NT (Narrower Term)
  • It includes generic relationship, hierarchical whole-part relationship, instance relationship, & polyhierarchical relationship

B. HIERARCHICAL RELATIONSHIP (Cont.)

  • Generic relationship: It identifies the link between a class or category and its members or species

  • It is also known as Inclusion Relationship

  • EG.: VERTEBRATA

    • NT Amphibia
      • Mammalia
      • Aves
      • Pisces
      • Reptilia

  • Instance relationship: This relationship is between a general category of things or events, expressed by a common noun, and an individual instance of that category, the instance then forming a class of one which is represented by a proper name

  • EG.: SEAS

    • NT Baltic Sea

      • Caspian Sea

      • Mediterranean Sea

B. HIERARCHICAL RELATIONSHIP (Cont.)

  • Polyhierarchical relationship: It is the relationship between a term & its 2 or more super-ordinate terms

  • Some terms may belong to more than one hierarchy and consequently may be related to more than one broader term & more than one set of narrower terms

  • EG.: WHALE

    • BT Mammalia
      • Marine Animals
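
The BT / NT conventions, including the polyhierarchical case, can be sketched as a small data structure. This is an illustrative sketch only: the class name `Hierarchy` and its methods are assumptions, and the terms are the ones used in the examples above.

```python
from collections import defaultdict

class Hierarchy:
    """Stores BT/NT links so that every BT reference has a matching NT."""

    def __init__(self):
        self.bt = defaultdict(set)  # term -> its broader terms
        self.nt = defaultdict(set)  # term -> its narrower terms

    def add(self, narrower, broader):
        # Record both directions at once to keep the relationship reciprocal.
        self.bt[narrower].add(broader)
        self.nt[broader].add(narrower)

    def all_broader(self, term):
        """Every term above `term`, following all hierarchies it belongs to."""
        seen = set()
        stack = list(self.bt.get(term, ()))
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.bt.get(current, ()))
        return seen

h = Hierarchy()
for animal_class in ["Amphibia", "Mammalia", "Aves", "Pisces", "Reptilia"]:
    h.add(animal_class, "Vertebrata")

# Polyhierarchy: one term with two broader terms.
h.add("Whale", "Mammalia")
h.add("Whale", "Marine animals")

print(sorted(h.bt["Whale"]))           # direct broader terms of Whale
print(sorted(h.all_broader("Whale")))  # also includes Vertebrata via Mammalia
```

Storing broader terms as a set rather than a single value is what allows a term to sit in more than one hierarchy.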

C. ASSOCIATIVE RELATIONSHIP

  • This relationship is found between terms which are closely related conceptually but not hierarchically and are not members of an equivalence set

  • The relation is reciprocal, and is distinguished by the abbreviation “RT” (Related Term)

  • EG. TEACHING

    • RT Teaching aids
  • TEACHING AIDS

    • RT Teaching
  • Two types of associative relationship:
  1. Terms belonging to the same category (e.g., motorcycle / bicycle)

  2. Terms belonging to different categories, such as:

    1. Whole-part (e.g., buildings / doors)
    2. A discipline and the objects studied (e.g., ethnography / primitive societies)
    3. An operation or process and the agent or instrument (e.g., motor racing / racing cars)
    4. An occupation and the person in that occupation (e.g., accountancy / accountants)
    5. An action and the product of the action (e.g., publishing / music scores)
    6. An action and its patient (e.g., data analysis / data)
    7. Concepts related to their properties (e.g., women / femininity)
    8. Concepts linked by causal dependence (e.g., injury / accidents)
    9. A thing or action and its counter-agent (e.g., pests / pesticides)
    10. A raw material and its product
    11. An action and a property associated with it (e.g., precision measurement / accuracy)
    12. A concept and its opposite (e.g., single people / married people)
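
Because RT references must be reciprocal, one possible approach is to store each link in both directions and to check a vocabulary for missing mirror references. The following is a minimal sketch under that assumption; the function names `add_rt` and `is_reciprocal` are hypothetical, and the term pairs come from the examples above.

```python
from collections import defaultdict

related = defaultdict(set)  # term -> its related terms (RT)

def add_rt(term_a, term_b):
    """Link two terms as RT in both directions."""
    if term_a == term_b:
        raise ValueError("a term cannot be related to itself")
    related[term_a].add(term_b)
    related[term_b].add(term_a)

def is_reciprocal(rt_map):
    """Check that every RT reference has its mirror image."""
    return all(
        a in rt_map.get(b, ())
        for a, targets in rt_map.items()
        for b in targets
    )

add_rt("Teaching", "Teaching aids")
add_rt("Pests", "Pesticides")
add_rt("Accountancy", "Accountants")

print(sorted(related["Teaching aids"]))  # the reverse reference was created too
print(is_reciprocal(related))

# A hand-made map with a missing reverse reference fails the check.
broken = {"Injury": {"Accidents"}, "Accidents": set()}
print(is_reciprocal(broken))
```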

THESAUROFACET

  • The concept was developed by Jean Aitchison in 1969

  • It is a multipurpose retrieval language tool

  • It consists of 2 sections:

  1. Faceted Classification
  2. Alphabetical Subject Index

FORMAT OF IR THESAURUS

  • An IR thesaurus may be arranged and presented by one or more of the following methods:
  1. Alphabetical - in which descriptors and cross references are arranged in alphabetical order

  2. Systematic or Classified - in which descriptors are arranged in their hierarchical order with level of hierarchy represented by indentions, dashes, dots, etc.

  3. Graphic - in which the hierarchy is shown by a tree or an arrowgraph
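
The first two display formats can be generated from the same underlying data. The sketch below assumes a hypothetical mini-thesaurus held as a dictionary of NT links; the function names and the choice of dots to mark hierarchy levels are illustrative, not a prescribed layout.

```python
# Hypothetical mini-thesaurus: descriptor -> its narrower terms (NT).
NT = {
    "Vertebrata": ["Amphibia", "Aves", "Mammalia"],
    "Mammalia": ["Whale"],
}

def alphabetical_display(nt_map):
    """One entry per descriptor in A-Z order, with its BT and NT references."""
    bt_map = {}
    for broader, narrower_terms in nt_map.items():
        for narrower in narrower_terms:
            bt_map.setdefault(narrower, []).append(broader)
    lines = []
    for term in sorted(set(nt_map) | set(bt_map)):
        lines.append(term.upper())
        lines += [f"  BT {b}" for b in sorted(bt_map.get(term, []))]
        lines += [f"  NT {n}" for n in sorted(nt_map.get(term, []))]
    return lines

def systematic_display(nt_map, root, level=0):
    """Descriptors in hierarchical order; dots mark the level of hierarchy."""
    lines = [". " * level + root]
    for narrower in sorted(nt_map.get(root, [])):
        lines += systematic_display(nt_map, narrower, level + 1)
    return lines

print("\n".join(alphabetical_display(NT)))
print()
print("\n".join(systematic_display(NT, "Vertebrata")))
```

A graphic display (tree or arrowgraph) would draw the same NT links as a diagram and is not attempted here.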

ESSENTIAL STEPS IN CONSTRUCTION OF THESAURUS

ESSENTIAL STEPS

ADVANTAGES OF THESAURUS

ADVANTAGES

  • It effects vocabulary control in the language being used in IR
  • It helps an indexer in selecting preferred terms
  • It provides more access points
  • It enables the searcher to find information not only on a specific topic but also on all related topics
  • By using indexing terms and search terms from the same thesaurus, the speed of retrieval can be increased
  • It helps in obtaining high recall ratio & high precision ratio in information search
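
How a thesaurus can widen a search to related topics may be sketched as query expansion. Everything in this example is an assumption made for illustration: the thesaurus fragment, the document ids, and the `expand` and `search` helpers. Adding narrower and related terms tends to raise recall, possibly at some cost to precision, so the related-term step is left optional.

```python
# Hypothetical thesaurus fragment; all terms and links are illustrative.
USE = {"instruction": "teaching"}          # entry term -> preferred term
NT = {"teaching": ["team teaching"]}       # descriptor -> narrower terms
RT = {"teaching": ["teaching aids"]}       # descriptor -> related terms

# Hypothetical index: descriptor -> ids of the documents indexed under it.
INDEX = {
    "teaching": {1, 2},
    "team teaching": {3},
    "teaching aids": {4},
}

def expand(query, include_related=False):
    """Map the query to its preferred term, then add NT (and optionally RT)."""
    term = USE.get(query.lower(), query.lower())
    terms = [term] + NT.get(term, [])
    if include_related:
        terms += RT.get(term, [])
    return terms

def search(query, include_related=False):
    """Union of the postings of every term in the expanded query."""
    hits = set()
    for term in expand(query, include_related):
        hits |= INDEX.get(term, set())
    return hits

print(sorted(search("Instruction")))
print(sorted(search("Instruction", include_related=True)))
```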

PREVIOUS YEAR QUESTIONS

Long Questions (12.5 Marks)

  1. Discuss the usefulness of a thesaurus in ISAR

  2. Discuss the need & purpose of a thesaurus in indexing & searching. Discuss with suitable examples, the various types of relationship among the terms in a thesaurus

  3. Discuss the thesaurus as a tool in information organization & retrieval and the relationships among the terms found in a thesaurus

Short Questions (5 Marks)

  1. Steps in Thesaurus construction