Tackling DevOps and CI within ML projects
Rzig, Dhia Elhaq
2024-12-21
Abstract
Systems based on Machine Learning (ML), including Deep Learning (DL), are emerging technologies applied to solve complex problems such as autonomous driving and recommendation systems. To enhance the quality and deliverability of ML-based applications, the software development community is adopting DevOps practices. However, there is little insight into how DevOps is practiced in the context of ML projects. This gap shaped the over-arching goal of my thesis: to perform a use-driven, empirically validated discovery and resolution of problems related to DevOps in ML projects. The thesis is split into two phases: an exploration phase, in which we rely on empirical studies to understand the current state of DevOps in ML projects, and a resolution phase, in which we propose solutions to the problems identified during exploration.

The first obstacle to achieving this goal was a lack of knowledge about DevOps adoption trends, maintenance effort, and benefits in ML projects. Hence, my first research project was a large-scale empirical analysis of 4031 ML projects to quantify DevOps adoption, maintenance effort, and benefits. These ML projects were categorized into ML-Tool and ML-Applied projects, where Tool projects are libraries, frameworks, and tools for ML development, and Applied projects apply ML to solve real-world problems. Additionally, we performed the same analysis on 4076 Non-ML projects to contextualize the results. We found that ML projects, especially ML-Applied projects, show slower, lower, and less efficient DevOps adoption than traditional software projects. Despite this, adopting DevOps in ML projects correlates with increased development productivity, improved code quality, and reduced bug-resolution time, especially in ML-Applied projects.

After identifying the DevOps adoption trends in ML projects, we further investigated Continuous Integration (CI), a subset and the central tenet of DevOps, in ML projects. CI tools automate repetitive tasks such as building, testing, and deployment, which are essential for ML projects. However, unlike for traditional software, the adoption of and issues with CI in ML projects had not been empirically studied. This study compares CI adoption between ML and Non-ML projects using TraVanalyzer, the first Travis CI configuration analyzer, and a CI log analyzer. Our findings show that Travis CI is the most popular CI tool for ML projects, though their CI adoption lags behind that of Non-ML projects. ML projects that use CI focus more on building, testing, code analysis, and deployment. CI in ML projects faces varied build-breakage reasons, with testing-related problems being the most frequent.

This project gave us a clearer picture of CI in ML from a static point of view, but we were interested in a more dynamic understanding of how CI evolves in ML projects. While several works have discussed how CI/CD configurations and services change during their usage in traditional software systems, there is very limited knowledge of how they change in ML projects. To fill this knowledge gap, we manually analyzed 701 commits from 578 open-source ML projects and devised a taxonomy of 14 co-changes in CI/CD and ML components. We also expanded TraVanalyzer to support GitHub Actions in order to identify frequent CI/CD configuration change patterns in 38,982 commits encompassing Travis CI and GitHub Actions changes.
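To make the static configuration analysis mentioned above concrete, the following is a minimal, hypothetical sketch in the spirit of what a Travis CI configuration analyzer such as TraVanalyzer might do: it maps each Travis CI lifecycle phase to the activities its commands appear to perform. The activity keywords and the function name are illustrative assumptions made for this example, not the tool's actual rules.

import yaml  # requires PyYAML

# Phase names follow the Travis CI job lifecycle; keyword lists are assumptions.
TRAVIS_PHASES = ["install", "before_script", "script", "after_success", "deploy"]
ACTIVITY_KEYWORDS = {
    "testing": ["pytest", "unittest", "tox", "nosetests"],
    "code_analysis": ["flake8", "pylint", "mypy", "coverage"],
    "deployment": ["twine upload", "docker push"],
}

def categorize_travis_config(yaml_text):
    """Map each Travis CI phase to the activities its commands appear to perform."""
    config = yaml.safe_load(yaml_text) or {}
    activities = {}
    for phase in TRAVIS_PHASES:
        raw = config.get(phase)
        if raw is None:
            continue
        commands = raw if isinstance(raw, list) else [raw]
        found = {
            activity
            for cmd in commands
            for activity, keywords in ACTIVITY_KEYWORDS.items()
            if any(kw in str(cmd) for kw in keywords)
        }
        if phase == "deploy":  # the presence of a deploy phase implies deployment
            found.add("deployment")
        if found:
            activities[phase] = sorted(found)
    return activities

if __name__ == "__main__":
    example = (
        "language: python\n"
        "install: pip install -r requirements.txt\n"
        "script:\n"
        "  - pytest tests/\n"
        "  - flake8 src/\n"
        "deploy:\n"
        "  provider: pypi\n"
    )
    print(categorize_travis_config(example))

Running the sketch on the inline example reports the script phase as performing testing and code analysis, and the deploy phase as performing deployment, which is the kind of per-phase activity breakdown the CI adoption comparison relies on.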
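Similarly, as a hedged illustration of the co-change analysis described in the previous paragraph (not the actual study pipeline), the snippet below flags commits in which CI/CD configuration files and ML code change together. The repository URL, path patterns, and ML-file heuristic are assumptions for the example; PyDriller is one mining library that exposes the needed commit data.

from pydriller import Repository  # pip install pydriller

CI_FILES = (".travis.yml",)
CI_DIRS = (".github/workflows/",)
ML_HINTS = ("model", "train", "dataset")  # crude, assumed heuristic for ML components

def is_ci_file(path):
    """Return True if the path looks like a CI/CD configuration file."""
    return path is not None and (
        path.endswith(CI_FILES) or any(path.startswith(d) for d in CI_DIRS)
    )

def is_ml_file(path):
    """Return True if the path looks like ML-related Python code (heuristic)."""
    return (
        path is not None
        and path.endswith(".py")
        and any(hint in path.lower() for hint in ML_HINTS)
    )

def ci_ml_cochange_commits(repo_url):
    """Yield hashes of commits that modify both CI/CD config and ML code."""
    for commit in Repository(repo_url).traverse_commits():
        paths = [m.new_path or m.old_path for m in commit.modified_files]
        if any(is_ci_file(p) for p in paths) and any(is_ml_file(p) for p in paths):
            yield commit.hash

if __name__ == "__main__":
    # Hypothetical repository URL used purely for illustration.
    for sha in ci_ml_cochange_commits("https://github.com/example/ml-project"):
        print(sha)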
We found that most changes in Travis CI and GitHub Actions configurations were related to build policy, with fewer changes related to performance, maintainability, and infrastructure. We also identified some CI bad practices, such as the direct inclusion of dependencies in CI files, and we found that experienced developers were more likely to modify and maintain CI/CD configurations.

Having completed this exploration phase, we focused on two common problems in ML DevOps: the difficulty of CI migration between platforms, and issues with ML testing and ML issue resolution. Concerning CI migration, based on existing research and our findings on CI, we believe that the efficiency of CI systems is a crucial factor for development velocity. As a result, developers often migrate from their existing CI systems to new CI systems with more features, such as matrix building and better logging support. We also noticed trends of such migrations between Travis CI and GitHub Actions, two popular CI systems, which were also confirmed by other studies. However, this process is challenging and error-prone due to limited knowledge and complex configurations. To address this, we propose CIMig, which uses Apriori rule mining and frequent tree mining to automate CI system migrations. Our automatic evaluation on a set of 251 projects shows that CIMig achieves a 70.82% success rate for GitHub Actions and 51.86% for Travis CI, with performance comparable to manual-mapping-based tools. Our user-study evaluation also yielded ratings competitive with those of the manual-mapping-based tools. Unlike other tools, CIMig supports bidirectional migrations and relies on technology-agnostic techniques, making it versatile and beneficial for developers.

Finally, we shift our focus to the emerging sector of Large Language Models (LLMs). The advancements in LLMs have led to their swift integration into application-level logic through prompts, referred to as Developer Prompts (Dev Prompts). Through our previous works, we noted a severe lack of ML-specific testing practices, putting into doubt whether these Dev Prompts are being properly tested for vulnerabilities, bias, and performance. Further complicating matters, unlike traditional software artifacts, Dev Prompts blend natural language instructions with artificial languages such as programming and markup languages, thus requiring specialized tools for analysis. In our study of 2,173 Dev Prompts, we found that 3.46% contained one or more forms of bias and that 10.75% were vulnerable to prompt injection attacks. We introduce PromptDoctor to address these issues; using it, we de-biased 68.29%, hardened 41.81%, and improved the performance of 37.1% of the flawed prompts. We developed a PromptDoctor VSCode extension, and we plan to extend PromptDoctor for easy integration with other IDEs and CI/CD pipelines.
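The abstract states that CIMig relies on Apriori rule mining; purely as a hedged sketch (not CIMig's actual pipeline), the snippet below shows how association rules mined from projects that use both CI systems could suggest Travis CI to GitHub Actions configuration mappings. The mlxtend calls are standard library functions, but the transaction encoding and the sample data are fabricated for illustration.

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Each transaction pairs the keys seen in a project's .travis.yml with the keys
# seen in its GitHub Actions workflow; the data below is made up for this example.
transactions = [
    ["travis:language=python", "gha:uses=actions/setup-python"],
    ["travis:language=python", "gha:uses=actions/setup-python", "gha:uses=actions/checkout"],
    ["travis:cache=pip", "gha:uses=actions/cache"],
    ["travis:cache=pip", "gha:uses=actions/cache", "gha:uses=actions/checkout"],
]

encoder = TransactionEncoder()
onehot = pd.DataFrame(
    encoder.fit(transactions).transform(transactions), columns=encoder.columns_
)

itemsets = apriori(onehot, min_support=0.4, use_colnames=True)
rules = association_rules(itemsets, metric="confidence", min_threshold=0.8)

# Keep only rules that translate Travis CI items into GitHub Actions items.
cross_platform = rules[
    rules["antecedents"].apply(lambda s: all(i.startswith("travis:") for i in s))
    & rules["consequents"].apply(lambda s: all(i.startswith("gha:") for i in s))
]
print(cross_platform[["antecedents", "consequents", "confidence"]])

On the toy data this recovers, for instance, a high-confidence rule from "travis:cache=pip" to "gha:uses=actions/cache", which is the kind of cross-platform mapping a migration tool can then apply when rewriting configurations.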
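The abstract does not detail PromptDoctor's detection rules, so the following is only a hypothetical heuristic for one of the flaw classes it targets: it flags Dev Prompt templates that splice user-controlled text into instructions without any delimiting, a common prompt-injection risk. The placeholder names, delimiter list, and function are assumptions made for illustration.

import re

# Assumed placeholder names that typically carry user-controlled input.
USER_INPUT_PLACEHOLDER = re.compile(r"\{(user_input|query|message|text)\}")
# Assumed delimiters that signal the user text is at least fenced off.
DELIMITERS = ('```', '"""', "<user>")

def flags_injection_risk(prompt_template: str) -> bool:
    """Return True if the template splices user-controlled text without delimiters."""
    if not USER_INPUT_PLACEHOLDER.search(prompt_template):
        return False
    return not any(d in prompt_template for d in DELIMITERS)

if __name__ == "__main__":
    prompts = [
        "Summarize the following text: {user_input}",
        'Summarize the text between triple quotes:\n"""{user_input}"""',
    ]
    for p in prompts:
        print(flags_injection_risk(p), "-", p.splitlines()[0])

Under these assumptions the first template is flagged (raw user text flows straight into the instruction) while the second is not, illustrating the kind of per-prompt check that could sit in an IDE extension or a CI/CD pipeline.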
Subjects
Machine Learning; DevOps; Continuous Integration; Software Engineering for ML
Types
Thesis