LIS 4/5523: Online Information Retrieval
In Part 1, we looked at information systems and database structures. We also addressed representation and indexing from the data modeling and technical side.
In this lecture, Part 2, we are going to look at representation and indexing from the human/creator and retrieval side.
Descriptive Data
Subject Representation
Definition
A system for choosing or highlighting some characteristics (attributes), together with a specification of the rules for selection (code)
This implies a tradeoff – if some characteristics are highlighted, other characteristics are left behind
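The tradeoff can be sketched in a few lines of Python. This is only an illustration: the document, its attribute values, and the names `SELECTION_RULE` and `represent` are hypothetical, not part of any real cataloging system.

```python
# Hypothetical document with many possible attributes (illustrative values only).
document = {
    "author": "A. Author",
    "title": "An Example Title",
    "publisher": "Example Press",
    "pages": 250,
    "cover_color": "blue",
    "weight_grams": 410,
}

# The "code": the rule specifying which attributes the representation keeps.
SELECTION_RULE = ("author", "title", "publisher", "pages")


def represent(doc, rule):
    """Return (surrogate, left_behind) for a document under a selection rule."""
    surrogate = {name: doc[name] for name in rule if name in doc}
    left_behind = sorted(set(doc) - set(rule))
    return surrogate, left_behind


surrogate, left_behind = represent(document, SELECTION_RULE)
print(surrogate)    # the highlighted characteristics
print(left_behind)  # ['cover_color', 'weight_grams']
```

Changing the rule changes what a searcher can later find: anything in `left_behind` cannot be used as an access point.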
ENTITIES: objects or concepts
ATTRIBUTES: characteristics of entities
DIACHRONIC Attribute: stable across time
SYNCHRONIC Attribute: changes across time
Example attributes: author, title, publisher, number of pages
Definition
Determining intellectual content or subject content or aboutness
Types
Document Analysis: information professional (cataloger, indexer) studies document to determine document surrogate for system
Query Analysis: information professional (intermediary) or end-user studies user request to determine search terms
Familiarization: Acquainting oneself with general content of document and query
Extraction: Identifying and pulling out significant concepts and natural-language terms
Translation: Converting extracted terms into controlled vocabulary of system
Formalization: Applying rules for exact format, spelling, punctuation, codes, etc. for input to system
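The four stages above can be sketched as a small pipeline. This is a minimal sketch under stated assumptions: the controlled vocabulary, the sample text, and all function names are hypothetical, and real subject analysis relies on human judgment that simple string matching does not capture.

```python
import re

# Hypothetical controlled vocabulary: natural-language term -> preferred term.
CONTROLLED_VOCABULARY = {
    "cars": "Automobiles",
    "automobiles": "Automobiles",
    "fuel economy": "Fuel efficiency",
}


def familiarize(text):
    """Familiarization: normalize the text so it can be scanned as a whole."""
    return re.sub(r"\s+", " ", text.strip().lower())


def extract(text, candidates):
    """Extraction: pull out the natural-language terms that occur in the text."""
    return [
        term for term in candidates
        if re.search(r"\b" + re.escape(term) + r"\b", text)
    ]


def translate(terms, vocabulary):
    """Translation: map extracted terms to controlled-vocabulary terms."""
    preferred = []
    for term in terms:
        mapped = vocabulary[term]
        if mapped not in preferred:  # avoid duplicate preferred terms
            preferred.append(mapped)
    return preferred


def formalize(terms):
    """Formalization: apply a fixed input format (sorted, semicolon-delimited)."""
    return "; ".join(sorted(terms))


text = "Cars and   fuel economy: how automobiles are changing"
normalized = familiarize(text)
extracted = extract(normalized, list(CONTROLLED_VOCABULARY))
result = formalize(translate(extracted, CONTROLLED_VOCABULARY))
print(result)  # Automobiles; Fuel efficiency
```

The same four functions could be applied to a user request as well as to a document, which mirrors the parallel between document analysis and query analysis.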
Subject analysis is a dance….
based on literary warrant (information objects/author’s intentions) and on user warrant (user needs)
requiring evaluation and verification at every stage in a continuous, iterative process
During production of primary document
Author's abstract and/or index
Indexing commissioned by publisher
Cataloging in publication (CIP)
Prior to storage for retrieval
Cataloging or indexing by bibliographic utility
Cataloging or indexing by individual library
During information retrieval
Problem statement or question from user
Query formulation by intermediary or user
Indexing
Process of creating index for purpose of representing and providing access to information objects
May be performed by humans or computers
Index Entry
Any pointer or indicator included in an index
Index Term
Any word/phrase used for physical or subject description
Any word/phrase used to search for and retrieve document
May describe any attribute of document (author, title, year, subject, etc.)
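As a rough illustration of these definitions, an index can be modeled as a mapping from index terms to pointers (index entries) that lead back to documents. The records, identifiers, and the `build_index` function below are hypothetical; this is one simple way to build such a structure, not a description of any particular system.

```python
from collections import defaultdict

# Hypothetical surrogates: document id -> attributes (author, year, subject).
records = {
    "doc1": {"author": "Smith", "year": "2001", "subject": ["Indexing", "Cataloging"]},
    "doc2": {"author": "Jones", "year": "2001", "subject": ["Indexing"]},
}


def build_index(records):
    """Map each (attribute, term) pair to the sorted ids of matching documents."""
    index = defaultdict(set)
    for doc_id, attributes in records.items():
        for attribute, value in attributes.items():
            terms = value if isinstance(value, list) else [value]
            for term in terms:
                # Each stored doc_id acts as an index entry (a pointer).
                index[(attribute, term.lower())].add(doc_id)
    return {key: sorted(ids) for key, ids in index.items()}


index = build_index(records)
print(index[("subject", "indexing")])  # ['doc1', 'doc2']
print(index[("year", "2001")])         # ['doc1', 'doc2']
print(index[("author", "smith")])      # ['doc1']
```

Note that the index terms here describe both physical attributes (author, year) and subject, and the same terms serve for searching and retrieval.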
| Indexing | Consistency; Subject expertise; Indexing experience |
| Types of Knowledge | Search experience; System knowledge; Domain knowledge |
| Affective/Cognitive | Motivation level; Emotional state |
System Factors
| Index Language | Specificity; Level of coordination |
| Indexing Assignment | Exhaustivity; Specificity of term; Accuracy |
Domain: overall subject, topic, discipline, or theme
Scope: Extent or limitations of domain
Considerations
Specificity: extent to which index terms precisely represent the subject of the document. Terms can range from general to highly specific.
Exhaustivity: extent to which indexing represents all concepts in a document.
Considerations
► Level and complexity of terms in subject area/discipline
► Users’ vocabulary level
► Terminology used in documents
Example
► Specificity is high if detailed math topics covered, e.g., set operations
► Exhaustivity is high if all math operations in textbook covered
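One way to make the math textbook example concrete is to treat exhaustivity as the share of a document's concepts that receive index terms, and specificity as how deep a term sits in a subject hierarchy. Both measures are simplifying assumptions made for illustration, and the concepts, hierarchy, and function names are hypothetical.

```python
# Hypothetical concepts covered by a math textbook.
document_concepts = {"addition", "subtraction", "multiplication", "division"}

# Hypothetical hierarchy: term -> broader term (None marks the top level).
BROADER_TERM = {
    "mathematics": None,
    "set theory": "mathematics",
    "set operations": "set theory",
}


def exhaustivity(index_terms, concepts):
    """Fraction of the document's concepts that the indexing represents."""
    return len(set(index_terms) & concepts) / len(concepts)


def specificity_depth(term, broader):
    """Number of broader terms above `term`; deeper means more specific."""
    depth = 0
    while broader[term] is not None:
        term = broader[term]
        depth += 1
    return depth


print(exhaustivity({"addition", "division"}, document_concepts))  # 0.5
print(exhaustivity(document_concepts, document_concepts))          # 1.0
print(specificity_depth("mathematics", BROADER_TERM))              # 0
print(specificity_depth("set operations", BROADER_TERM))           # 2
```

Under these assumptions, indexing every operation gives exhaustivity 1.0, and "set operations" is a more specific term than "mathematics".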
RECALL = Relevant documents in the retrieved set / All relevant documents in the database
A measure of how good a system is at retrieving all the relevant documents
Tends to be inversely related to precision
Dependent upon the user's expectations and objectives
Difficult to estimate, because it requires knowing the number of relevant documents in the entire collection
PRECISION = Relevant documents in the retrieved set / All documents in the retrieved set
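The two ratios can be computed directly from sets of document identifiers. The identifiers below are made up for illustration, and the sketch assumes the full set of relevant documents is known, which is rarely true in practice.

```python
def recall(retrieved, relevant):
    """Relevant documents in the retrieved set / all relevant documents."""
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)


def precision(retrieved, relevant):
    """Relevant documents in the retrieved set / all documents retrieved."""
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)


# Hypothetical judgments: five relevant documents exist in the database.
relevant = {"d1", "d2", "d3", "d4", "d5"}
# Hypothetical search result: four documents retrieved, two of them relevant.
retrieved = {"d1", "d2", "d6", "d7"}

print(recall(retrieved, relevant))     # 0.4
print(precision(retrieved, relevant))  # 0.5
```

Retrieving more documents can only keep recall the same or raise it, but it often lowers precision, since the extra documents may not be relevant.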
Higher specificity means higher precision (P) or recall (R)? And lower P or R?
Higher exhaustivity means higher P or R? And lower P or R?
How do users' information needs/expectations fit in?
