LIS 5043: Organization of Information
Data Stewardship
Data stewardship is concerned with all aspects of the creation, management, analysis, and communication of data, focusing particularly on the application of computational methods to digital data.
Data Stewardship = Data Management + Data Curation + Data Analytics
It includes, among other things: acquisition and collection, modeling, workflow, provenance, validity and integrity, metadata, preservation, integration, retrieval, re-use, policy, standards, identifiers, format conversions, processing levels, supporting reproducibility, etc.
It includes active and on-going management of data through its lifecycle of interest and usefulness to scholarship, science, and education; curation activities enable data discovery and retrieval, maintain quality, add value, and provide for re-use over time.
The science of data stewardship:
research and development on new methods of data management and use;
draws on mathematical and engineering methods, but also on methods from social science, law, economics, and other disciplines
The practice of data stewardship:
use and adaptation of data management methods to meet user needs and support data analytics
Data analytics values:
Extraction should be novel, fast, precise, accurate
Data stewardship values:
Data should be efficient and reliable: findable, useable, legal (thereby supporting novelty, speed, precision, accuracy)
Large amounts of rapidly changing data, often heterogeneous in nature and developed by different scientific communities, must be found, retrieved, authenticated, reformatted, integrated with other data, and managed for effective use, and must remain demonstrably reliable even after processing and preparation.
. . . it involves the complex management of large-scale data storage and preservation, creation of metadata and tools for retrieval and context documentation, preparation of computationally accessible documentation of provenance and workflow, conducting reliable format conversions to support new tools and applications, the management of identifiers and validity checks that accommodate format changes, the integration of related data elements from substantially different data sources, and more. . . .
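One way to picture "validity checks that accommodate format changes" is a checksum computed over the content of the records rather than over the bytes of a file, stored alongside a small provenance record. The following is a minimal sketch under stated assumptions, not a standard or a recommended tool: the function names, the provenance field names, the sample data, and the identifier "example:dataset-001" are all hypothetical and chosen only for illustration.

```python
import csv
import hashlib
import io
import json


def canonical_digest(records):
    """SHA-256 over a canonical JSON form of the records.

    Sorting keys and fixing separators makes the digest depend on the
    content only, not on the file format or its whitespace.
    """
    canonical = json.dumps(records, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def csv_to_records(csv_text):
    """Parse CSV text into a list of plain dictionaries (all values are strings)."""
    return [dict(row) for row in csv.DictReader(io.StringIO(csv_text))]


def convert_with_provenance(csv_text, source_id):
    """Convert CSV to JSON and return the JSON text plus a provenance record."""
    records = csv_to_records(csv_text)
    json_text = json.dumps(records, indent=2)
    provenance = {  # hypothetical field names, for illustration only
        "source_id": source_id,
        "activity": "csv-to-json conversion",
        "source_bytes_sha256": hashlib.sha256(csv_text.encode("utf-8")).hexdigest(),
        "content_sha256": canonical_digest(records),
        "record_count": len(records),
    }
    return json_text, provenance


def verify(json_text, provenance):
    """Check that the converted file still carries the same content."""
    return canonical_digest(json.loads(json_text)) == provenance["content_sha256"]


# Hypothetical sample data and identifier.
SAMPLE_CSV = "station,temp_c\nA1,21.5\nB2,19.0\n"
converted, prov = convert_with_provenance(SAMPLE_CSV, "example:dataset-001")
print(verify(converted, prov))
```

The byte-level checksum of the source file changes as soon as the format changes, so it can only authenticate the original file. The content-level checksum is the part that survives the conversion, which is why the sketch records both.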
Without successful data management & curation, data analysis is not possible: it would be prohibitively expensive and dangerously unreliable.
Data Stewardship is the larger part of data science.
Not only is Data Stewardship essential for reliable, efficient analysis, but most of the cost associated with using data is, by far, in management & curation, not analysis, and most of the workforce needs are, also by far, in management & curation, not analysis.
Some of the broader activities in Data Stewardship include:
Analysis: To determine needs, develop relevant data models and metadata, and reformat, correct, or update data.
Documentation: To record essential information (typically via metadata).
System design and implementation: To support all data curatorial activities; to support the generation and use of data documentation and processing documentation.
Policy: To specify objectives, procedures, practices, and formats.
Process: To ensure success and efficiency by managing the development of appropriate organizational units and roles, providing training, advocating for change, and managing curatorial activities.
There is no single occupational category for [data stewardship] and no precise mapping between knowledge and skills needed for [data stewardship] and existing professions, careers, or job titles.
The knowledge and skills required of those engaged in [data stewardship] are dynamic and highly interdisciplinary. They include an integrated understanding of computing and information science, librarianship, archival practice, and the disciplines and domains generating and using data. Additional knowledge and skills for effective [data stewardship] are emerging in response to data-driven scholarship.
Some professional “data” jobs:
Data Scientist
Data/Business Analyst
Data Wrangler
Data Curator
Data Steward
Data Engineer
… ML, AI Engineer
and “database” jobs:
Database Engineer
Database Programmer
Database Architect
Database Administrator
and “library” jobs:
Research Data Services Librarian
Research Data Steward
Data Librarian
Data Scholarship Librarian
Digital Humanities Librarian
AI Librarian
Through our web activity, we are assigned a gender, ethnicity, class, age, education level, and potential status as a parent of x children (digital trace data / digital footprint / digital breadcrumbs).
If internet metadata identifies a user as a foreigner, then they lose the right to privacy afforded to U.S. citizens.
Who would have thought that class status, citizenship, and ethnicity could be algorithmically understood?
Cheney-Lippold, J. (2017). We Are Data: Algorithms and the Making of Our Digital Selves. New York University Press.
Why Create Visualizations Generally?
Sambasivan, N., et al. (2021). “Everyone wants to do the model work, not the data work”: Data Cascades in High-Stakes AI. Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, 1–15.