Leveraging Data Semantics for Relational Data Management Tasks

dc.contributor.author: Xing, Junjie
dc.date.accessioned: 2025-05-12T17:35:20Z
dc.date.available: 2025-05-12T17:35:20Z
dc.date.issued: 2025
dc.date.submitted: 2025
dc.identifier.uri: https://hdl.handle.net/2027.42/197103
dc.description.abstract: In an era of rapidly growing data, efficient and intelligent relational data management is essential for generating actionable insights and automating decision-making. A key factor driving advancements in this domain is the use of data semantics, which captures the deeper meaning and context of data, extending beyond traditional heuristic and syntactic approaches. By leveraging data semantics, we can enhance tasks such as insight generation, data integration, and other essential relational data management tasks. This dissertation explores how advanced data semantics can address several key challenges in relational data management. First, we investigate methods to capture user-defined semantics for assessing the interestingness of data insights, moving beyond traditional developer-defined measures of interestingness. Second, we leverage the enhanced natural language understanding capabilities of large language models (LLMs) to generate fine-grained column semantics for relational data and introduce the concept of “aggregate-related table search”, which captures table semantics across varying aggregation levels. Finally, we propose a self-training framework for LLM fine-tuning on table-related tasks, incorporating table task semantics by generating and validating training data to improve model performance in tasks such as natural language to SQL and schema matching. Through these contributions, this dissertation aims to advance relational data management by embedding a deeper understanding of different aspects of data semantics into various data applications, including data analysis and data discovery systems, ultimately improving the performance of relational data management tasks.
dc.language.iso: en_US
dc.subject: data semantics
dc.subject: relational data management task
dc.subject: large language model for database
dc.subject: data exploration
dc.title: Leveraging Data Semantics for Relational Data Management Tasks
dc.type: Thesis
dc.description.thesisdegreename: PhD
dc.description.thesisdegreediscipline: Computer Science & Engineering
dc.description.thesisdegreegrantor: University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember: Jagadish, H V
dc.contributor.committeemember: Hemphill, Libby
dc.contributor.committeemember: Mozafari, Barzan
dc.contributor.committeemember: Wang, Xinyu
dc.subject.hlbsecondlevel: Computer Science
dc.subject.hlbtoplevel: Engineering
dc.contributor.affiliationumcampus: Ann Arbor
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/197103/1/jjxing_1.pdf
dc.identifier.doi: https://dx.doi.org/10.7302/25529
dc.identifier.orcid: 0009-0003-6188-9851
dc.identifier.name-orcid: Xing, Junjie; 0009-0003-6188-9851 (language: en_US)
dc.working.doi: 10.7302/25529 (language: en)
dc.owningcollname: Dissertations and Theses (Ph.D. and Master's)

