Leveraging Data Semantics for Relational Data Management Tasks
dc.contributor.author | Xing, Junjie | |
dc.date.accessioned | 2025-05-12T17:35:20Z | |
dc.date.available | 2025-05-12T17:35:20Z | |
dc.date.issued | 2025 | |
dc.date.submitted | 2025 | |
dc.identifier.uri | https://hdl.handle.net/2027.42/197103 | |
dc.description.abstract | In an era of rapidly growing data, efficient and intelligent relational data management is essential for generating actionable insights and automating decision-making. A key factor driving advancements in this domain is the use of data semantics, which captures the deeper meaning and context of data, extending beyond traditional heuristic and syntactic approaches. By leveraging data semantics, we can enhance tasks such as insight generation, data integration, and other essential relational data management tasks. This dissertation explores how advanced data semantics can address several key challenges in relational data management. First, we investigate methods to capture user-defined semantics for assessing the interestingness of data insights, moving beyond traditional developer-defined measures of interestingness. Second, we leverage the enhanced natural language understanding capabilities of large language models (LLMs) to generate fine-grained column semantics for relational data and introduce the concept of “aggregate-related table search”, which captures table semantics across varying aggregation levels. Finally, we propose a self-training framework for LLM fine-tuning on table-related tasks, incorporating table task semantics by generating and validating training data to improve model performance in tasks such as natural language to SQL and schema matching. Through these contributions, this dissertation aims to advance relational data management by embedding a deeper understanding of different aspects of data semantics into various data applications, including data analysis and data discovery systems, ultimately improving the performance of relational data management tasks. | |
dc.language.iso | en_US | |
dc.subject | data semantics | |
dc.subject | relational data management task | |
dc.subject | large language model for database | |
dc.subject | data exploration | |
dc.title | Leveraging Data Semantics for Relational Data Management Tasks | |
dc.type | Thesis | |
dc.description.thesisdegreename | PhD | |
dc.description.thesisdegreediscipline | Computer Science & Engineering | |
dc.description.thesisdegreegrantor | University of Michigan, Horace H. Rackham School of Graduate Studies | |
dc.contributor.committeemember | Jagadish, H V | |
dc.contributor.committeemember | Hemphill, Libby | |
dc.contributor.committeemember | Mozafari, Barzan | |
dc.contributor.committeemember | Wang, Xinyu | |
dc.subject.hlbsecondlevel | Computer Science | |
dc.subject.hlbtoplevel | Engineering | |
dc.contributor.affiliationumcampus | Ann Arbor | |
dc.description.bitstreamurl | http://deepblue.lib.umich.edu/bitstream/2027.42/197103/1/jjxing_1.pdf | |
dc.identifier.doi | https://dx.doi.org/10.7302/25529 | |
dc.identifier.orcid | 0009-0003-6188-9851 | |
dc.identifier.name-orcid | Xing, Junjie; 0009-0003-6188-9851 | en_US |
dc.working.doi | 10.7302/25529 | en |
dc.owningcollname | Dissertations and Theses (Ph.D. and Master's) |