Paper-M-108
21st May 2023
A vocabulary is a set of terms (words, codes, etc.) that are used in a specific community
It provides a mechanism for communication (written, oral, or electronic) because the meanings of the terms are known & agreed upon by the community members
When a vocabulary is formally managed, it becomes a Controlled Vocabulary (CV). Here "managed" means that the terms are stored & maintained using agreed-upon procedures
Procedures should exist for adding terms, modifying terms, &, more rarely, deprecating terms from a controlled vocabulary
Aspect | Indexes | CVs |
---|---|---|
End product | index | term list |
Use | content locator | content tagging, website navigation, search enhancement |
Project time | weeks | months |
Methodology | reading | research |
A CV is a carefully selected list of words and phrases that are used to tag units of information so that they can be retrieved more easily by a search
The terms are chosen and organized by trained professionals (including librarians and information scientists) who possess expertise in the subject area
CV terms can accurately describe what a given document is actually about, even if the terms themselves do not occur in the text
Examples of fully developed CV systems are LCSH, the Sears List, thesauri, etc.
In other words, CV is a collection of terms that are:
It can help users to find data (also known as a discovery vocabulary)
It can assist in the interpretation of data (also known as usage vocabulary)
It can provide human-understandable meaning (also known as semantic vocabulary)
It can produce machine-readable format information (also known as syntactic vocabulary)
From the above we can say that a CV ensures consistency in indexing, tagging, or categorizing, and guides the user to where the desired information is
The most important characteristic of a CV is the relationships among its terms
Terms in a CV are related in certain ways:
Equivalence, the relationship between synonyms, is the most basic term relationship
Note that context is important in determining synonyms
Example: if you use “automobiles” on your homepage & “cars” on the next page, users might get confused and start to wonder whether there is a difference between the two terms. Instead, choose “automobiles” & do not use “cars”.
So “automobiles” is the “preferred term” & “cars” is a variant term that represents the same concept
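The preferred/variant arrangement above can be sketched as a simple lookup table. This is a minimal illustration, not a standard tool: the table name, the function name, and the extra variant "autos" are assumptions added for the example.

```python
# Hypothetical equivalence table: each variant term points to its preferred term.
# "cars" comes from the example above; "autos" is an added assumption.
VARIANT_TO_PREFERRED = {
    "cars": "automobiles",
    "autos": "automobiles",
}

def preferred_term(term: str) -> str:
    """Return the preferred term for a variant, or the cleaned term itself
    when the table has no entry for it."""
    key = term.strip().lower()
    return VARIANT_TO_PREFERRED.get(key, key)

print(preferred_term("Cars"))
print(preferred_term("automobiles"))
```

Applying the same function both when content is tagged and when a user searches means the two sides always meet on the preferred term.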
There are many examples where alternate terms are used:
Terms display a hierarchical relationship when one term is broader in meaning than its child terms (which have narrower meanings)
Pairs of terms are represented in their superordinate & subordinate status
The superordinate term (BT, broader term) represents the whole, while the subordinate term (NT, narrower term) represents a member or a part
EG. LIBRARY USERS
It can help you formulate your homepage navigation. It could improve your searching and browsing. It can help users broaden and narrow their search results quickly by showing them where each set of results fits into the site’s hierarchy
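Broadening and narrowing a search amounts to walking the BT/NT links. The sketch below assumes a tiny hierarchy built from the terms of the Dairying entry that appears later in these notes (Agriculture, Dairying, Dairy cattle); the dictionary layout and function names are illustrative only.

```python
# Narrower-term (NT) links, keyed by the broader term (BT).
NARROWER = {
    "Agriculture": ["Dairying"],
    "Dairying": ["Dairy cattle"],
}

def broader_of(term):
    """Return the BT of a term by inverting the NT links, or None for a top term."""
    for bt, nts in NARROWER.items():
        if term in nts:
            return bt
    return None

def path_to_top(term):
    """List the term followed by each successively broader term."""
    path = [term]
    while (bt := broader_of(path[-1])) is not None:
        path.append(bt)
    return path

def all_narrower(term):
    """Collect every term below the given term, depth first."""
    found = []
    for nt in NARROWER.get(term, []):
        found.append(nt)
        found.extend(all_narrower(nt))
    return found

print(path_to_top("Dairy cattle"))   # broaden a search step by step
print(all_narrower("Agriculture"))   # narrow a search to every sub-topic
```

A site could use `path_to_top` for breadcrumb navigation and `all_narrower` to expand a query to all sub-topics.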
The associative relationship is the relationship between terms that is neither hierarchical nor equivalence
Yet the terms are mentally associated to such an extent that the link between them should be made explicit in the CV, because it reveals alternative terms that could be used in indexing or retrieval
This relationship is very difficult to define, so some guidelines are followed to decide whether it holds between a pair of terms:
These are the lists of terms that are used to control the variant names for an entity or the domain value for a particular field
EG. names for countries, individuals, organizations
Sometimes within a catalog there are different names or spellings for only one person or subject. This can bring confusion since researchers may miss some information. Authority control is used by cataloguers to collocate materials that logically belong together but which present themselves differently.
Authority records are used to establish uniform titles, which collocate all versions of a given work under one unique heading even when such versions are issued under different titles, different spellings, pen names etc. The unique heading can guide users to all relevant information, including related or collocated subjects. Authority records can be combined into a database called an authority file. Maintaining and updating these files, as well as the “logical linkages” to other files within them, is the work of librarians and other information cataloguers.
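Collocation through an authority file can be sketched as grouping catalog records under one authorized heading. Everything below is illustrative: the variable names, the tiny authority file, and the "name as printed" values in the records are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical authority file: every variant form of a name points to one
# authorized heading.
AUTHORITY_FILE = {
    "Mark Twain": "Twain, Mark, 1835-1910",
    "Samuel Clemens": "Twain, Mark, 1835-1910",
    "Samuel Langhorne Clemens": "Twain, Mark, 1835-1910",
}

# Hypothetical catalog records: (title, author name as entered on the record).
RECORDS = [
    ("The Adventures of Tom Sawyer", "Mark Twain"),
    ("Life on the Mississippi", "Samuel Clemens"),
    ("Adventures of Huckleberry Finn", "Samuel Langhorne Clemens"),
]

def collocate(records, authority_file):
    """Group titles under the authorized heading for each author name."""
    grouped = defaultdict(list)
    for title, name in records:
        heading = authority_file.get(name, name)  # unknown names stay as given
        grouped[heading].append(title)
    return dict(grouped)

for heading, titles in collocate(RECORDS, AUTHORITY_FILE).items():
    print(heading, "->", titles)
```

Without the authority file the three records would sit under three different names, and a researcher searching one form would miss the other two.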
A glossary is a list of terms, usually with definitions
Terms may be from a specific subject field or from a particular work
Terms are defined within a specific environment and rarely include variant meanings
EG. Environmental Protection Agency (EPA), Terms of the Environment, Glossary of Library and Information Science
Dictionaries are alphabetical lists of words and their definitions
Variant senses are provided where applicable
Dictionaries are more general in scope than are glossaries
They may also provide information about the origin of a word, variants (by spelling and morphology), and multiple meanings across disciplines
While a dictionary may also provide synonyms and, through the definitions, related words, there is no explicit hierarchical structure or attempt to group them by concept
A gazetteer is a list of place names
Traditional gazetteers have been published as books or have appeared as indexes to atlases
Each entry may also be identified by feature type, such as river, city, or school
Geospatially referenced gazetteers provide coordinates for locating the place on the earth’s surface
The term gazetteer has several other meanings, including an announcement publication such as a patent or legal gazetteer
These gazetteers are often organized using classification schemes or subject categories
EG. U.S. Code of Geographic Names
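A geospatially referenced gazetteer entry, as described above, combines a place name, a feature type, and coordinates. The sketch below is one possible representation; the class and function names are assumptions, and the coordinates are rounded approximations included only for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GazetteerEntry:
    name: str
    feature_type: str   # e.g. "city", "river", "school"
    latitude: float     # decimal degrees, north positive
    longitude: float    # decimal degrees, east positive

# Illustrative entries; coordinates are rough approximations.
GAZETTEER = [
    GazetteerEntry("Paris", "city", 48.86, 2.35),
    GazetteerEntry("Toronto", "city", 43.65, -79.38),
    GazetteerEntry("Seine (mouth)", "river", 49.43, 0.2),
]

def lookup(name):
    """Return all entries whose name matches, ignoring case."""
    return [e for e in GAZETTEER if e.name.lower() == name.lower()]

def by_feature_type(feature_type):
    """Return the names of all entries of one feature type."""
    return [e.name for e in GAZETTEER if e.feature_type == feature_type]

print(lookup("paris"))
print(by_feature_type("city"))
```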
This scheme type provides a set of controlled terms to represent the subjects of items in a collection
The main objective of subject cataloguing is to fulfill the subject related needs of the readers
A subject heading scheme helps the cataloguer/indexer to summarize the thought content of a document into a number of accepted terms
Subject Heading schemes are:
MeSH (Medical Subject Headings) is the controlled vocabulary thesaurus that gives uniformity and consistency to the indexing and cataloging of biomedical literature. It connects all the different ways to express a concept, such as “cancer.”
MeSH is a standardized vocabulary of approx. 20,000 terms that describe the biomedical concepts covered in the MEDLINE/PubMed database. MEDLINE is directly searchable from PubMed
MeSH consists of a set of terms or subject headings that are arranged in both an alphabetic and a hierarchical structure
MeSH thesaurus is produced by the National Library of Medicine (NLM)
When each article is indexed, an indexer at NLM assigns from 5 to 20 headings describing the concepts covered in the article
MeSH headings are powerful searching tools
They locate documents by assigned controlled vocabulary, not free text words, and are independent of the occurrence of specific words in any other field
MeSH headings allow you to retrieve all references to a particular topic, even if different terminology was used in the records
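The difference between free-text searching and searching by assigned headings can be sketched with a toy citation list. The citations, identifiers, and the small entry-term table below are hypothetical and simplified; "Neoplasms" is the MeSH heading under which cancer-related terminology is gathered.

```python
# Hypothetical citations: a free-text title plus the headings an indexer assigned.
CITATIONS = [
    {"id": 1, "title": "New therapies for lung cancer", "headings": {"Neoplasms"}},
    {"id": 2, "title": "Tumor growth in mice", "headings": {"Neoplasms"}},
    {"id": 3, "title": "Managing seasonal influenza", "headings": {"Influenza, Human"}},
]

# Entry terms that lead to a heading (a tiny illustrative subset).
ENTRY_TERMS = {"cancer": "Neoplasms", "tumor": "Neoplasms", "tumors": "Neoplasms"}

def free_text_search(word):
    """Match only citations whose title contains the word itself."""
    return [c["id"] for c in CITATIONS if word.lower() in c["title"].lower()]

def heading_search(word):
    """Map the word to its heading, then match on assigned headings."""
    heading = ENTRY_TERMS.get(word.lower(), word)
    return [c["id"] for c in CITATIONS if heading in c["headings"]]

print(free_text_search("cancer"))  # misses the citation that says "tumor"
print(heading_search("cancer"))    # finds both, whatever wording the title used
```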
They are associated with MeSH main headings to pinpoint the specific aspect of the concept represented by the subject heading
They are a way of grouping together those citations that are concerned with a particular characteristic of a subject
These are special-use descriptors that do not represent subject matter per se but that reflect parameters or aspects of a subject concept
Special effort in indexing ensures that these will be included or “checked” each time they appear as aspects in an item being indexed
The following list of descriptors must be entered by an indexer for every journal article citation to which they apply
They describe the type of publication being indexed (i.e. format of the publication) or characteristics of the research (i.e. research design)
There are also Publication Type terms that describe what type of organization funded the research
They are of 3 main categories:
LCSH was brought into existence in 1898 by the Library of Congress (LoC), USA, which also maintains it
The LCSH system was originally designed as a controlled vocabulary for representing the subject and form of the books and serials in the LoC collection
It is now widely accepted by libraries & information centers around the world
LCSH is also known as the “Big Red Books”
It consists of 5 volumes and is published annually
Subject authority records are available online
The last print edition was published in 2016. Access to the continuously revised vocabulary is now available via subscription and free services
The current edition is the 44th (2022)
There were 382,713 authority records in the file as of March 2022
The creation and revision of subject headings is a continuous process. Approximately 4,000 new headings, including headings with subdivisions, are added to LCSH each year
The fundamental principle guiding the development of LoC subject headings system are effective responses to
A. Single Concept Headings
B. Pre-coordinated Multiple-Concept Headings
It is that part of the subject heading string which represents the main concept without subdivision
Main headings may be categorized according to their functions (topical headings, form headings, & different kinds of proper name headings)
A. Topical Headings
B. Form Headings
A form heading reflects the form of the material
There are various forms of reading material in the library. Eg. (a) Bibliographic Form (b) Artistic and Literary Form
Topical and form headings
All main headings consist of single nouns or noun equivalents. Noun equivalents may be in the form of adjectives or gerunds or in the form of adjectival phrases, conjunctive phrases, or prepositional phrases
Qualifiers are added to headings when necessary
A. Single Noun Headings
Many topical and form headings consist of a single noun or a noun equivalent in the form of a single adjective or gerund
Nouns representing concrete objects are normally in the plural form, and nouns representing abstract concepts appear in the singular. Examples: Enzymes, Running, Art, Education, Religion, Philosophies, Deaf, Agriculture
B. Phrase Headings
Some concepts that involve two areas of knowledge can be expressed by more or less complex phrases. Examples: Bible as literature, Freedom of information
There are various types of Phrase Headings which are as follows:
Direct Subdivision: Music-Japan, Music-California
Indirect Subdivision: Music-France-Paris, Music-Ontario-Toronto
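The two patterns above differ only in whether a larger jurisdiction is interposed between the topic and the place. A minimal sketch, assuming a hypothetical helper named `geographic_heading` and the single-hyphen separator used in these notes:

```python
def geographic_heading(topic, place, via=None, sep="-"):
    """Build a heading with a geographic subdivision.

    Direct subdivision:   topic-place
    Indirect subdivision: topic-larger jurisdiction-place (pass it as `via`)
    """
    parts = [topic]
    if via:
        parts.append(via)
    parts.append(place)
    return sep.join(parts)

print(geographic_heading("Music", "Japan"))                   # direct
print(geographic_heading("Music", "Paris", via="France"))     # indirect
print(geographic_heading("Music", "Toronto", via="Ontario"))  # indirect
```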
A heading may contain a single concept or a combination of multiple concepts
The combination may be formed when the heading is being established or when it is assigned to a particular bibliographic item
A. Multiple-concept main headings: Children and politics, Electricity in art, Religious education of teenage boys
B. Headings with Subdivisions: Birth control-Moral and ethical aspects, Cinematography-Electronic equipment, Philosophy and Ancient-Oriental
A. Equivalence relationships
USE references are made from unauthorized or non-preferred terms to an authorized or preferred heading
UF (Used For) precedes the term not used
The codes USE and UF function as reciprocals
B. Hierarchical relationships
C. Associative relationships
D. General and Specific references
A general reference is a reference made not to specific individual headings but to an entire group of headings, frequently listing one or more headings by way of example
It is denoted by see also (SA)
Example:
Chemistry
The assignment of subject heading for audiovisual and special instructional materials should follow the same principles that are applied to books
The heading most specifically describing the contents of the material should be used
Example:
American poetry-Periodicals
Tuberculosis-Statistics-Periodicals
Jesus Christ-Travel-Palestine-Maps-To 1800
Accounting-Periodicals
A) Definitions
B) Relations to other headings
C) Instructions, explanations
A Library of Congress Classification number is added to a heading if the caption for the number is identical or nearly identical in scope, meaning, and language to the subject heading, or if the topic is explicitly mentioned in an “Including” note under the caption for the number
Multiple class numbers may be added to a heading when the subject is treated from more than one perspective
For the heading of a subject covered by a span of class numbers, the full span of pertinent class numbers is included
It has an American bias
The words used in it are those popular in American usage, which may not be familiar in the Indian context
Many discrepancies in the subject headings can also be seen, such as:
Labor-Labour
Color-Colour
Elevators-Lifts
The Sears List of Subject Headings (popularly called the Sears List) is a well-known tool for assigning standardized subject headings to all types of documents in small general libraries holding up to 20,000 titles across all subjects
The Sears List of Subject Headings was first designed in 1923 by Minnie Earl Sears (1873-1933) and has continued to carry her name
It was designed to give small libraries simpler and broader subject headings
The first edition contained only 3200 preferred headings
The 2nd (1926) and 3rd (1933) editions were also edited by her
The fourth to fourteenth editions appeared between 1939 and 1991, adding new terms, modernizing the terminology of older ones, and so on
The format remained the same, with some new features such as the addition of Abridged DDC numbers
It is based on the principles of the LoC Subject headings
The principles are:
It means that specific headings should be entered directly as the lead point, instead of a subdivision. For example
If a word has more than one spelling, the most popular one is chosen, following common usage. For example
“USE” directs us to the preferred heading, e.g.
Dairying
UF Dairies
Dairy farming
Dairy industry
BT Agriculture
NT Dairy cattle
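An entry in this display format can be read into a structured record: the first line is the heading, a line that starts with a relationship code opens a new field, and any other line continues the current field. The parser below is a minimal sketch under those assumptions; the function name and the list of codes (RT is added as the customary code for related terms) are illustrative.

```python
ENTRY = """
Dairying
UF Dairies
Dairy farming
Dairy industry
BT Agriculture
NT Dairy cattle
"""

# Relationship codes recognized at the start of a line (RT is an added assumption).
CODES = ("UF", "BT", "NT", "RT", "SA")

def parse_entry(text):
    """Parse one displayed entry into a dict of heading plus coded term lists."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    record = {"heading": lines[0]}
    current = None
    for line in lines[1:]:
        code, _, rest = line.partition(" ")
        if code in CODES and rest:
            # A coded line starts (or extends) that field.
            current = code
            record.setdefault(current, []).append(rest.strip())
        elif current is not None:
            # An uncoded line continues the field opened above it.
            record[current].append(line)
    return record

print(parse_entry(ENTRY))
```

From such a record the reciprocal references follow mechanically: each UF term gets a "USE Dairying" reference, and each BT or NT term gets the opposite link back to Dairying.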
It is the classification of entities in an ordered system that indicates natural relationships
Thus, a taxonomy is a controlled vocabulary used to describe or characterize explicit concepts of information, for the purposes of capture, management, & presentation
A defined data model that describes structured and unstructured information through
Ontologies provide context
Effective ontologies require a deep understanding of the knowledge domain
Example: Digital Library
Application Silos: An application that does not interact with other applications or information systems
The aim is to connect not just the documents, and not just the high-level links between applications, but the data at a lower level
So that the data stored in one application is shareable and connectable in another app
Go beyond just the document and get at the data level so that specific data elements can be referenced between documents
The main advantage is that you do not have to think about a specific document but about data and information
Just as Web 1.0 meant you did not have to think about where the information was sitting, or about the network layer and the machine layer
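Linking at the data level rather than the document level can be sketched with subject-predicate-object statements shared between two otherwise separate applications. This is a plain-Python illustration of the idea, not an implementation of any Semantic Web standard; every identifier, predicate, and value below is made up.

```python
# Each fact is a (subject, predicate, object) triple.
# Two hypothetical application silos hold facts about related things.
CATALOG_APP = [
    ("book:42", "title", "Introduction to Indexing"),
    ("book:42", "author", "person:7"),
]
STAFF_APP = [
    ("person:7", "name", "A. Librarian"),
    ("person:7", "worksAt", "org:1"),
]

def query(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o) for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]

# Because both silos use the same identifier for the person, their data can
# simply be combined and followed across the former application boundary.
merged = CATALOG_APP + STAFF_APP

def author_names(book_id):
    """Follow book -> author -> name across the two data sets."""
    names = []
    for _, _, person in query(merged, subject=book_id, predicate="author"):
        names += [o for _, _, o in query(merged, subject=person, predicate="name")]
    return names

print(author_names("book:42"))
```

The question is answered from individual data elements, without knowing which application or document originally held them.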
Q1: Explain why controlled vocabulary is important for information storage and retrieval.
Q2: Define the concepts of ‘Controlled Vocabulary’, ‘Ontology’ and ‘Thesaurus’? Discuss the various types of relationship among the terms in a thesaurus.
Q3: What do you understand by Semantic Web? Discuss the key technology of Semantic Web.
Q4: Explain the need and purpose of controlled vocabulary in information storage and retrieval. Discuss thesaurus as a tool in information organization and retrieval and their relationship among the terms found in a thesaurus.
Q5: What do you understand by controlled vocabulary? Discuss the need and usefulness of controlled vocabulary in information storage and retrieval
Q6: Discuss the structure, principles, and usefulness of Sears List of Subject Headings in information storage and retrieval