Comparing Costs for Cloud-based Data Archives
dc.contributor.author | Hemphill, Libby | |
dc.contributor.author | Xing, Junjie | |
dc.contributor.author | Fan, Lizhou | |
dc.date.accessioned | 2023-05-02T13:53:23Z | |
dc.date.available | 2023-05-02T13:53:23Z | |
dc.date.issued | 2023 | |
dc.identifier.uri | https://hdl.handle.net/2027.42/176337 | en |
dc.description.abstract | Research data management is an expensive enterprise. Computing infrastructure for storing, retrieving, and preserving data is one area of expense, and computing infrastructure costs grow as the size and number of datasets and demands for their retrieval grow. This paper compares the costs and performance of two database infrastructures, PostgreSQL and Elasticsearch, for digital data archives. We used benchmarking experiments and data from social media to estimate the costs of loading, indexing, and querying data from these two databases. The results show that traditional relational open-source databases can be effective for large social science data and run on relatively low-cost computing infrastructure, where PostgreSQL queries can be faster and less expensive than Elasticsearch. PostgreSQL required higher up-front costs and time, and adding computing resources did not improve Elasticsearch’s query performance. These findings are useful for digital archives evaluating back-end storage systems. | en_US |
dc.description.sponsorship | MIDAS PODS | en_US |
dc.language.iso | en_US | en_US |
dc.rights | CC0 1.0 Universal | * |
dc.rights.uri | http://creativecommons.org/publicdomain/zero/1.0/ | * |
dc.subject | elasticsearch | en_US |
dc.subject | data curation | en_US |
dc.subject | postgres | en_US |
dc.title | Comparing Costs for Cloud-based Data Archives | en_US |
dc.type | Preprint | en_US |
dc.subject.hlbsecondlevel | Statistics and Numeric Data | |
dc.subject.hlbtoplevel | Social Sciences | |
dc.contributor.affiliationum | Inter-university Consortium for Political and Social Research | en_US |
dc.contributor.affiliationum | School of Information | en_US |
dc.contributor.affiliationum | Computer Science and Engineering | en_US |
dc.contributor.affiliationumcampus | Ann Arbor | en_US |
dc.description.bitstreamurl | http://deepblue.lib.umich.edu/bitstream/2027.42/176337/1/Hemphill - Comparing Costs.pdf | |
dc.identifier.doi | https://dx.doi.org/10.7302/7187 | |
dc.description.mapping | 4ae71d2a-01c0-4084-84c3-c32ce960e81c | en_US |
dc.description.mapping | 5836d8a9-776f-4cd5-ba6e-a0cfd10d555d | en_US |
dc.identifier.orcid | 0000-0002-3793-7281 | en_US |
dc.description.filedescription | Description of Hemphill - Comparing Costs.pdf : Main article | |
dc.description.depositor | SELF | en_US |
dc.identifier.name-orcid | Hemphill, Libby; 0000-0002-3793-7281 | en_US |
dc.working.doi | 10.7302/7187 | en_US |
dc.owningcollname | Inter-university Consortium for Political and Social Research (ICPSR) | |