Power DataMate Tool: Leveraging Logistic Regression Classification for Interactive Data Modeling
dc.contributor.author | Abu Alrub, Mahmoud Ibrahim | |
dc.contributor.advisor | Shaout, Adnan | |
dc.date.accessioned | 2024-05-10T16:58:33Z | |
dc.date.available | 2025-05-10 12:58:33 | en |
dc.date.issued | 2024-04-27 | |
dc.date.submitted | 2024-04-18 | |
dc.identifier.uri | https://hdl.handle.net/2027.42/193121 | |
dc.description.abstract | The demand for efficient predictive modeling techniques has become crucial due to the growing occurrence of binary classification problems in diverse fields. It is therefore desirable to use logistic regression classification as a potent technique for data modeling, specifically by examining its efficacy in capturing and analyzing correlations across varied datasets; to that end, the Power DataMate software tool was developed. The promise of logistic regression in modeling complicated data structures is thoroughly examined because of its simplicity, interpretability, and adaptability to binary classification tasks. This research focuses on logistic regression because of its capability to represent intricate interactions between predictors and the binary response variable. The goal is to forecast the likelihood of discovering Primary Keys (PK) and Foreign Keys (FK) among datasets. While many off-the-shelf data analytics software packages and logistic regression classification studies are available, there is a lack of research or solutions that provide a method in which an entity's data is analyzed using logistic regression to detect its PKs and features automatically or interactively. The research technique encompasses the acquisition of six datasets, a combination of fictitious and real-world data. Four are in the form of data files, while two are in the form of databases. The data is then preprocessed to verify its quality, followed by the deployment of data training and prediction algorithms. Sufficient training and testing datasets were incorporated to efficiently train and evaluate the model's performance. Breaking new ground, we allow users not only to have their data modeled automatically, but also to interactively review and confirm primary keys and features for further data analysis and modeling. 
The research entails a comprehensive evaluation of model performance indicators, including accuracy, precision, and recall. Results show an accuracy of 89% for PK detection and 82% for FK detection. These results are the first of their kind and could be a starting point for further model enhancements and data analytics research, especially for projects that analyze data files, where the Power DataMate user can choose to interactively feed the learning algorithm for better outcomes. Keywords: Data Mining, Data Modeling, Classification Problem, Logistic Regression, Primary Key, Foreign Key. | en_US |
dc.language.iso | en_US | en_US |
dc.subject | Data modeling | en_US |
dc.subject | Data mining | en_US |
dc.subject | Data classification | en_US |
dc.subject | Logistic regression | en_US |
dc.subject | Primary Key or Foreign Key Discovery | en_US |
dc.subject.other | Computer and Information Science | en_US |
dc.title | Power DataMate Tool: Leveraging Logistic Regression Classification for Interactive Data Modeling | en_US |
dc.type | Thesis | en_US |
dc.description.thesisdegreename | Master of Science (MS) | en_US |
dc.description.thesisdegreediscipline | Software Engineering, College of Engineering & Computer Science | en_US |
dc.description.thesisdegreegrantor | University of Michigan-Dearborn | en_US |
dc.contributor.committeemember | Medjahed, Brahim | |
dc.contributor.committeemember | Watt, Paul | |
dc.identifier.uniqname | miabu | en_US |
dc.description.bitstreamurl | http://deepblue.lib.umich.edu/bitstream/2027.42/193121/1/Abu_Alrub_Thesis_Power_DataMate_Tool (1).pdf | en |
dc.identifier.doi | https://dx.doi.org/10.7302/22766 | |
dc.description.mapping | febc42ae-d444-43ae-98fd-dc98ee638897 | en_US |
dc.identifier.orcid | 0000-0002-8916-0259 | en_US |
dc.description.filedescription | Description of Abu_Alrub_Thesis_Power_DataMate_Tool (1).pdf : Thesis | |
dc.identifier.name-orcid | Abu Alrub, Mahmoud; 0000-0002-8916-0259 | en_US |
dc.working.doi | 10.7302/22766 | en_US |
dc.owningcollname | Dissertations and Theses (Ph.D. and Master's) |