Empirical Analysis on CI/CD Pipeline Evolution in Machine Learning Projects
dc.contributor.author | Houerbi, Alaa | |
dc.contributor.advisor | Hassan, Foyzul | |
dc.date.accessioned | 2024-05-07T12:47:48Z | |
dc.date.available | 2025-05-07 08:47:49 | en |
dc.date.issued | 2024-04-27 | |
dc.date.submitted | 2024-02-16 | |
dc.identifier.uri | https://hdl.handle.net/2027.42/192878 | |
dc.description.abstract | The growing popularity of Machine Learning (ML) and the integration of ML components with other software artifacts has led to the use of CI/CD tools, such as Travis CI and GitHub Actions, that enable faster integration and testing for ML projects. Such CI/CD configurations and services require synchronization during the life cycle of the projects. Several works have discussed how CI/CD configurations and services change during their usage in traditional software systems. However, there is minimal knowledge of how CI/CD configurations and services change in ML projects. To fill this knowledge gap, this work presents the first empirical analysis of how CI/CD configuration evolves for ML software systems. We manually analyzed 343 commits collected from 508 open-source ML projects to identify frequent CI/CD configuration change categories in ML projects. We devised a taxonomy of 14 co-changes in CI/CD and ML components. Moreover, we developed a CI/CD configuration change clustering tool that identified frequent CI/CD configuration change patterns in 15,634 commits. Furthermore, we measured the expertise of ML developers who modify CI/CD configurations. Based on this analysis, we found that 61.8% of commits include a change to the build policy, and that changes related to performance and maintainability are minimal compared to general open-source projects. Additionally, the co-evolution analysis identified that CI/CD configurations, in many cases, changed unnecessarily due to bad practices, such as the direct inclusion of dependencies and a lack of usage of standardized testing frameworks. The change pattern analysis revealed further bad practices, such as the use of deprecated settings and reliance on a generic build language. Finally, our developer expertise analysis suggests that experienced developers are more inclined to modify CI/CD configurations. | en_US |
dc.language.iso | en_US | en_US |
dc.subject | Continuous Integration (CI) | en_US |
dc.subject | Continuous Delivery (CD) | en_US |
dc.subject | CI/CD Tools | en_US |
dc.subject | Machine Learning | en_US |
dc.subject | Software Engineering | en_US |
dc.subject | Empirical Analysis | en_US |
dc.subject | CI/CD Change Patterns | en_US |
dc.subject.other | Computer and Information Science | en_US |
dc.title | Empirical Analysis on CI/CD Pipeline Evolution in Machine Learning Projects | en_US |
dc.type | Thesis | en_US |
dc.description.thesisdegreename | Master of Science (MS) | en_US |
dc.description.thesisdegreediscipline | Software Engineering, College of Engineering & Computer Science | en_US |
dc.description.thesisdegreegrantor | University of Michigan-Dearborn | en_US |
dc.contributor.committeemember | Xu, Zhiwei | |
dc.contributor.committeemember | Ferreir, Thiago | |
dc.identifier.uniqname | houerbi | en_US |
dc.description.bitstreamurl | http://deepblue.lib.umich.edu/bitstream/2027.42/192878/1/Houebi_Thesis_Empirical_Analysis.pdf | |
dc.identifier.doi | https://dx.doi.org/10.7302/22610 | |
dc.description.mapping | febc42ae-d444-43ae-98fd-dc98ee638897 | en_US |
dc.identifier.orcid | 0009-0007-9724-0703 | en_US |
dc.description.filedescription | Description of Houebi_Thesis_Empirical_Analysis.pdf : Thesis | |
dc.working.doi | 10.7302/22610 | en_US |
dc.owningcollname | Dissertations and Theses (Ph.D. and Master's) |