Bibliographic Details
Main Author: Huapu Liu
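Format: Open Access educational resource
Language: English
Published: 2024
Subjects: Information Retrieval; Indexes; Reference Materials; Semantics; Measurement Techniques; Books; Graphs; Classification; Computational Linguistics; Electronic Libraries; Medicine
Online Access: https://eric.ed.gov/?id=ED661266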
author Huapu Liu
collection Education Resources Information Center
contents This two-part dissertation re-examines the role of book indexes in information retrieval research on full-text digital book collections in digital libraries. Early research on information retrieval and book indexes (along with other parts of books) took place in the 2000s, when the Google Books corpus was first released to the digital libraries research community. The current technical environment, however, presents new opportunities to examine book indexes in digital library research and to address new theoretical roles for the book index as a human-constructed source of conceptual and semantic relationships. The first part of this dissertation is an empirical, systems-oriented evaluation of book indexes deployed in a specific information retrieval role: it explores the book index as a novel source of "local" conceptual knowledge, used as a thesaurus during the query expansion phase of an information retrieval process. The performance of queries expanded with the book index was compared with that of the original queries and of queries expanded with the Local Context Analysis and Word Embedding techniques, as measured by a series of standardized retrieval effectiveness metrics. Overall, the evaluation results are promising: they indicate benefits of book index-based query expansion in multiple respects and demonstrate the potential of the information embedded in book indexes to enhance information retrieval in the context of query expansion.
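The idea of using a book index as a "local" thesaurus for query expansion can be illustrated with a minimal sketch. It assumes the index has already been parsed into a dictionary that maps each main heading to its subheadings, and it treats a heading and its subheadings as one group of related terms. The function name `expand_query`, the parameter `max_new_terms`, and the sample index are hypothetical; this is not the dissertation's implementation, and it omits term weighting and the retrieval step itself.

```python
from typing import Dict, List


def expand_query(query: List[str], book_index: Dict[str, List[str]],
                 max_new_terms: int = 5) -> List[str]:
    """Expand a query with related terms taken from a parsed book index.

    Hypothetical sketch: a main heading and its subheadings are treated as
    one group of related terms (a small "local" thesaurus).
    """
    original = [term.lower() for term in query]
    query_terms = set(original)
    added: List[str] = []
    for heading, subheadings in book_index.items():
        entry = [heading.lower()] + [s.lower() for s in subheadings]
        # Skip index entries that share no term with the original query.
        if query_terms.isdisjoint(entry):
            continue
        for term in entry:
            if term not in query_terms and term not in added:
                added.append(term)
    # Cap the number of expansion terms to limit query drift.
    return original + added[:max_new_terms]


# Hypothetical sample index, not taken from the dissertation's data.
sample_index = {
    "fever": ["typhoid", "intermittent", "scarlet"],
    "digitalis": ["dosage", "heart disease"],
}

print(expand_query(["fever"], sample_index))
# ['fever', 'typhoid', 'intermittent', 'scarlet']
print(expand_query(["Heart disease"], sample_index))
# ['heart disease', 'digitalis', 'dosage']
```

Because matching works in both directions (a query term can hit a heading or a subheading), the index's hierarchy supplies related terms that a purely corpus-statistical method might not surface.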
The second part of this dissertation investigates book indexes in a more speculative mode, asking: How could book indexes be integrated into, and advance, the contemporary information retrieval addressed in Part 1, and also further the fields of semantic graphs and ontology engineering? This inquiry leads directly to the problem tackled in this part of the dissertation: how to recognize and recover the hierarchical typographical layout of book indexes that is lost in standard OCR during digitization. Part 2 describes and evaluates a novel clustering-based computational workflow designed to automatically recognize this lost hierarchical layout, a key first step toward future research in various contemporary information retrieval areas related to digital libraries. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone: 1-800-521-0600. Web page: http://www.proquest.com/en-US/products/dissertations/individuals.shtml.]
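One way a clustering step could recover index hierarchy is sketched below, assuming each OCR line's left x-coordinate is available (for example, from hOCR bounding boxes) and that indentation encodes the entry level. It clusters the offsets with plain 1-D k-means (Lloyd's algorithm) and ranks the cluster centres from left to right, so level 0 is a main heading. The helper names `kmeans_1d` and `assign_levels`, the sample coordinates, and the choice of `k` are hypothetical; the record does not specify which features or clustering algorithm the dissertation's workflow actually uses.

```python
from typing import List, Tuple


def kmeans_1d(values: List[float], k: int, iterations: int = 50) -> List[float]:
    """Plain 1-D k-means (Lloyd's algorithm); returns centres sorted left to right."""
    lo, hi = min(values), max(values)
    if k < 2 or lo == hi:
        return [sum(values) / len(values)]
    # Start with k evenly spaced centres between the smallest and largest offset.
    centres = [lo + i * (hi - lo) / (k - 1) for i in range(k)]
    for _ in range(iterations):
        groups: List[List[float]] = [[] for _ in centres]
        for v in values:
            nearest = min(range(len(centres)), key=lambda i: abs(v - centres[i]))
            groups[nearest].append(v)
        # Recompute each centre as its group mean; keep it if the group is empty.
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
        if updated == centres:
            break
        centres = updated
    return sorted(centres)


def assign_levels(lines: List[Tuple[str, float]], k: int = 2) -> List[Tuple[int, str]]:
    """Label each (text, left_x) line with an indentation level, 0 = main heading."""
    centres = kmeans_1d([x for _, x in lines], k)
    return [
        (min(range(len(centres)), key=lambda i: abs(x - centres[i])), text)
        for text, x in lines
    ]


# Hypothetical OCR lines with left x-coordinates in pixels.
ocr_lines = [
    ("Fever, 112", 100.0),
    ("typhoid, 118", 131.0),
    ("intermittent, 120", 129.0),
    ("Digitalis, 45", 102.0),
    ("dosage of, 47", 130.0),
]

for level, text in assign_levels(ocr_lines):
    print("  " * level + text)
```

Sorting the centres turns arbitrary cluster IDs into ordered nesting levels. Real index pages would also need column detection, handling of wrapped lines, and a way to choose `k`, none of which this sketch attempts.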
format Open Access educational resource
id eric_ED661266
institution ERIC Institute of Education Sciences
language en
publishDate 2024
record_format eric
title Unlocking Hierarchical Structural Semantics in HathiTrust Digital Library Historical Medical Book Indexes for Information Retrieval: Two Preliminary Investigations
topic Information Retrieval
Indexes
Reference Materials
Semantics
Measurement Techniques
Books
Graphs
Classification
Computational Linguistics
Electronic Libraries
Medicine
url https://eric.ed.gov/?id=ED661266