Leveraging Data Semantics for Relational Data Management Tasks

Xing, Junjie

dc.contributor.author	Xing, Junjie
dc.date.accessioned	2025-05-12T17:35:20Z
dc.date.available	2025-05-12T17:35:20Z
dc.date.issued	2025
dc.date.submitted	2025
dc.identifier.uri	https://hdl.handle.net/2027.42/197103
dc.description.abstract	In an era of rapidly growing data, efficient and intelligent relational data management is essential for generating actionable insights and automating decision-making. A key factor driving advancements in this domain is the use of data semantics, which captures the deeper meaning and context of data, extending beyond traditional heuristic and syntactic approaches. By leveraging data semantics, we can enhance tasks such as insight generation, data integration, and other essential relational data management tasks. This dissertation explores how advanced data semantics can address several key challenges in relational data management. First, we investigate methods to capture user-defined semantics for assessing the interestingness of data insights, moving beyond traditional developer-defined measures of interestingness. Second, we leverage the enhanced natural language understanding capabilities of large language models (LLMs) to generate fine-grained column semantics for relational data and introduce the concept of “aggregate-related table search”, which captures table semantics across varying aggregation levels. Finally, we propose a self-training framework for LLM fine-tuning on table-related tasks, incorporating table task semantics by generating and validating training data to improve model performance in tasks such as natural language to SQL and schema matching. Through these contributions, this dissertation aims to advance relational data management by embedding a deeper understanding of different aspects of data semantics into various data applications, including data analysis and data discovery systems, ultimately improving the performance of relational data management tasks.
dc.language.iso	en_US
dc.subject	data semantics
dc.subject	relational data management task
dc.subject	large language model for database
dc.subject	data exploration
dc.title	Leveraging Data Semantics for Relational Data Management Tasks
dc.type	Thesis
dc.description.thesisdegreename	PhD
dc.description.thesisdegreediscipline	Computer Science & Engineering
dc.description.thesisdegreegrantor	University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember	Jagadish, H V
dc.contributor.committeemember	Hemphill, Libby
dc.contributor.committeemember	Mozafari, Barzan
dc.contributor.committeemember	Wang, Xinyu
dc.subject.hlbsecondlevel	Computer Science
dc.subject.hlbtoplevel	Engineering
dc.contributor.affiliationumcampus	Ann Arbor
dc.description.bitstreamurl	http://deepblue.lib.umich.edu/bitstream/2027.42/197103/1/jjxing_1.pdf
dc.identifier.doi	https://dx.doi.org/10.7302/25529
dc.identifier.orcid	0009-0003-6188-9851
dc.identifier.name-orcid	Xing, Junjie; 0009-0003-6188-9851	en_US
dc.working.doi	10.7302/25529	en
dc.owningcollname	Dissertations and Theses (Ph.D. and Master's)

Files in this item

Name: jjxing_1.pdf
Size: 8.016 MB
Format: PDF
