Staff View :: Library Catalog

Bibliographic Details
Main Author:	Huapu Liu
Format:	Open Access educational resource
Language:	en
Published:	2024
Subjects:	Information Retrieval; Indexes; Reference Materials; Semantics; Measurement Techniques; Books; Graphs; Classification; Computational Linguistics; Electronic Libraries; Medicine
Online Access:	https://eric.ed.gov/?id=ED661266

_version_	1867181729727381504
author	Huapu Liu
author_facet	Huapu Liu
collection	Education Resources Information Center
contents	Unlocking Hierarchical Structural Semantics in HathiTrust Digital Library Historical Medical Book Indexes for Information Retrieval: Two Preliminary Investigations. Huapu Liu. Information Retrieval; Indexes; Reference Materials; Semantics; Measurement Techniques; Books; Graphs; Classification; Computational Linguistics; Electronic Libraries; Medicine. This two-part dissertation centers on a re-examination of the role of book indexes in information retrieval research on full-text digital book collections in digital libraries. Early research on information retrieval and book indexes (along with other parts of books) dates to the 2000s, when the Google Books corpus was first released to the digital libraries research community. However, the current technical environment presents new opportunities to examine book indexes in digital library research that can address new theoretical roles for the book index as a human-constructed source of conceptual and semantic relationships. The first part of this dissertation is an empirical, systems-oriented evaluation of book indexes deployed in a specific information retrieval role: the investigation explores the book index as a novel source of "local" conceptual knowledge for use as a thesaurus during the query expansion phase of an information retrieval process. The performance of expanded queries based on the book index was compared to that of the original queries and of expanded queries generated through the Local Context Analysis and Word Embedding techniques, as measured by a series of standardized retrieval effectiveness metrics. Overall, the evaluation results are promising and indicate that book index-based query expansion offers benefits in multiple respects, demonstrating the potential of the information embedded in book indexes to enhance information retrieval in the context of query expansion.
The second part of this dissertation investigates book indexes in a more speculative mode, asking: How could book indexes be integrated into, and advance, the contemporary information retrieval addressed in Part 1, as well as further the fields of semantic graph and ontology engineering? This inquiry leads directly to the problem tackled in this part of the dissertation: how to recognize and recover the hierarchical typographical layout of book indexes that is lost in standard OCR during the digitization process. Part 2 describes and evaluates a novel clustering-based computational workflow designed to automatically recognize this lost hierarchical typographical layout, which is the key first step toward future research possibilities in various contemporary information retrieval research areas related to digital libraries. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone: 1-800-521-0600. Web page: http://www.proquest.com/en-US/products/dissertations/individuals.shtml.]
format	Open Access educational resource
id	eric_ED661266
institution	ERIC Institute of Education Sciences
language	en
publishDate	2024
record_format	eric
title	Unlocking Hierarchical Structural Semantics in HathiTrust Digital Library Historical Medical Book Indexes for Information Retrieval: Two Preliminary Investigations
topic	Information Retrieval; Indexes; Reference Materials; Semantics; Measurement Techniques; Books; Graphs; Classification; Computational Linguistics; Electronic Libraries; Medicine
url	https://eric.ed.gov/?id=ED661266