Saving social media data: Understanding data management practices among social media researchers and their implications for archives

dc.contributor.authorHemphill, Libby
dc.contributor.authorHedstrom, Margaret L.
dc.contributor.authorLeonard, Susan Hautaniemi
dc.date.accessioned2021-01-05T18:44:48Z
dc.date.availableWITHHELD_13_MONTHS
dc.date.available2021-01-05T18:44:48Z
dc.date.issued2021-01
dc.identifier.citationHemphill, Libby; Hedstrom, Margaret L.; Leonard, Susan Hautaniemi (2021). "Saving social media data: Understanding data management practices among social media researchers and their implications for archives." Journal of the Association for Information Science and Technology 72(1): 97-109.
dc.identifier.issn2330-1635
dc.identifier.issn2330-1643
dc.identifier.urihttps://hdl.handle.net/2027.42/163798
dc.description.abstractSocial media data (SMD) offer researchers new opportunities to leverage those data for their work in broad areas such as public opinion, digital culture, labor trends, and public health. The success of efforts to save SMD for reuse by researchers will depend on aligning data management and archiving practices with evolving norms around the capture, use, sharing, and security of datasets. This paper presents an initial foray into understanding how established practices for managing and preserving data should adapt to demands from researchers who use and reuse SMD, and from people who are subjects in SMD. We examine the data management practices of researchers who use SMD through a survey, and we analyze published articles that used data from Twitter. We discuss how researchers describe their data management practices and how these practices may differ from the management of conventional data types. We explore conceptual, technical, and ethical challenges for data archives based on the similarities and differences between SMD and other types of research data, focusing on the social sciences. Finally, we suggest areas where archives may need to revise policies, practices, and services in order to create secure, persistent, and usable collections of SMD.
dc.publisherJohn Wiley & Sons, Inc.
dc.titleSaving social media data: Understanding data management practices among social media researchers and their implications for archives
dc.typeArticle
dc.rights.robotsIndexNoFollow
dc.subject.hlbsecondlevelInformation Science
dc.subject.hlbtoplevelSocial Sciences
dc.description.peerreviewedPeer Reviewed
dc.description.bitstreamurlhttp://deepblue.lib.umich.edu/bitstream/2027.42/163798/1/asi24368_am.pdf
dc.description.bitstreamurlhttp://deepblue.lib.umich.edu/bitstream/2027.42/163798/2/asi24368.pdf
dc.identifier.doi10.1002/asi.24368
dc.identifier.sourceJournal of the Association for Information Science and Technology
dc.identifier.citedreferenceThomson, S. D., & Kilbride, W. ( 2015 ). Preserving social media: The problem of access. New Review of Information Networking, 20 ( 1–2 ), 261 – 275.
dc.identifier.citedreferenceWallis, J. C., Rolando, E., & Borgman, C. L. ( 2013 ). If we share data, will anyone use them? data sharing and reuse in the long tail of science and technology. PLoS One, 8 ( 7 ), e67332.
dc.identifier.citedreferenceWeinberg, D. H., Abowd, J. M., Belli, R. F., Cressie, N., Folch, D. C., Holan, S. H., … Wikle, C. K. ( 2019 ). Effects of a Government‐Academic Partnership: Has the NSF‐Census Bureau Research Network Helped Improve the US Statistical System? Journal of Survey Statistics and Methodology, 7 ( 4 ), 589–619. https://doi.org/10.1093/jssam/smy023
dc.identifier.citedreferenceWeller, K., & Kinder‐Kurlanda, K. ( 2017 ). To share or not to share? ethical challenges in sharing social media‐based research data. In M. Zimmer & K. Kinder‐Kurlanda (Eds.), Internet research ethics for the social age: New challenges, cases, and contexts (pp. 115 – 129 ). New York: Peter Lang Publishing, Incorporated.
dc.identifier.citedreferenceWeller, K., & Kinder‐Kurlanda, K. E. ( 2015 ). Uncovering the challenges in collection, sharing and documentation: The hidden data of social media research. In Standards and practices in large‐scale social media research. Oxford: International conference on web and social media.
dc.identifier.citedreferenceWeller, K., & Kinder‐Kurlanda, K. E. ( 2016 ). A manifesto for data sharing in social media research. In Proceedings of the 8th ACM conference on web science (pp. 166–172). ACM.
dc.identifier.citedreferenceWheeler, J. ( 2018 ). Mining the first 100 days: Human and data ethics in twitter research. Journal of Librarianship and Scholarly Communication, 6 ( 2 ), eP2235. http://dx.doi.org/10.7710/2162-3309.2235
dc.identifier.citedreferenceWhitmire, A. L., Boock, M., & Sutton, S. C. ( 2015 ). Variability in academic research data management practices: Implications for data services development from a faculty survey. Program, 49 ( 4 ), 382 – 407.
dc.identifier.citedreferenceWilkinson, M. D., Dumontier, M., Aalbersberg, I. J. J., Appleton, G., Axton, M., Baak, A., … Mons, B. ( 2016 ). The FAIR guiding principles for scientific data management and stewardship. Scientific Data, 3, 160018.
dc.identifier.citedreferenceWilliams, A., & Gonlin, V. ( 2017 ). I got all my sisters with me (on black twitter): second screening of how to get away with murder as a discourse on black womanhood. Information, Communication & Society, 20 ( 7 ), 984 – 1004.
dc.identifier.citedreferenceWilliams, S. A. ( 2013 ). What do people study when they study twitter? classifying twitter related academic papers. Journal of Documentation, 69 ( 3 ), 384 – 410.
dc.identifier.citedreferenceWilliams, S. A., Terras, M., & Warwick, C. ( 2013 ). How twitter is studied in the medical professions: A classification of twitter papers indexed in PubMed. Medicine 2.0, 2 ( 2 ), e2.
dc.identifier.citedreferenceWolf, C., Joye, D., Smith, T. W., & Fu, Y.‐C. ( 2016 ). The SAGE handbook of survey methodology. London: SAGE Publications.
dc.identifier.citedreferenceZelenkauskaite, A., & Niezgoda, B. ( 2017 ). “Stop kremlin trolls:” Ideological trolling as calling out, rebuttal, and reactions on online news portal commenting. First Monday, 22 ( 5 ).
dc.identifier.citedreferenceZhang, Y., Wells, C., Wang, S., & Rohe, K. ( 2017 ). Attention and amplification in the hybrid media system: The composition and activity of donald trump’s twitter following during the 2016 presidential election. New Media & Society, 20 ( 9 ), 3161–3182.
dc.identifier.citedreferenceZimmer, M. ( 2010 ). “But the data is already public”: On the ethics of research in facebook. Ethics and Information Technology, 12 ( 4 ), 313 – 325.
dc.identifier.citedreferenceZimmer, M. ( 2015 ). The twitter archive at the library of congress: Challenges for information practice and information policy. First Monday, 20 ( 7 ).
dc.identifier.citedreferenceZubiaga, A. ( 2018 ). A longitudinal assessment of the persistence of twitter datasets. Journal of the Association for Information Science and Technology, 69 ( 8 ), 974 – 984.
dc.identifier.citedreferenceEngesser, S., Ernst, N., Esser, F., & Büchel, F. ( 2017 ). Populism and social media: how politicians spread a fragmented ideology. Information, Communication & Society, 20 ( 8 ), 1109 – 1126.
dc.identifier.citedreferenceAcquisti, A., Brandimarte, L., & Loewenstein, G. ( 2015 ). Privacy and human behavior in the age of information. Science, 347 ( 6221 ), 509 – 514.
dc.identifier.citedreferenceAelst, P. V., Erkel, P. v., D’heer, E., & Harder, R. A. ( 2017 ). Who is leading the campaign charts? comparing individual popularity on old and new media. Information, Communication & Society, 20 ( 5 ), 715 – 732.
dc.identifier.citedreferenceAkers, K. G., & Doty, J. ( 2013 ). Disciplinary differences in faculty research data management practices and perspectives. International Journal of Digital Curation, 8 ( 2 ), 5 – 26.
dc.identifier.citedreferenceAlvarez, M. R. ( 2016 ). Introduction. In Alvarez, M. R. (Ed.), Computational social science: Discovery and prediction (pp.1–25). New York: Cambridge University Press.
dc.identifier.citedreferenceAntenucci, D., Cafarella, M., Levenstein, M., Ré, C., & Shapiro, M. D. ( 2014 ). Using social media to measure labor market flows (No. 20010).
dc.identifier.citedreferenceAsur, S., & Huberman, B. A. ( 2010 ). Predicting the future with social media. In 2010 IEEE/WIC/ACM international conference on web intelligence and intelligent agent technology, Vol. 1, pp. 492–499.
dc.identifier.citedreferenceBarnard, S. R. ( 2018 ). Tweeting #Ferguson: Mediatized fields and the new activist journalist. New Media & Society, 20 ( 7 ), 2252 – 2271. https://doi.org/10.1177/1461444817712723
dc.identifier.citedreferenceBishop, L., & Gray, D. ( 2017 ). Ethical challenges of publishing and sharing social media research data. In K. Woodfield (Ed.), The ethics of online research (Advances in Research Ethics and Integrity, Vol. 2, pp. 159 – 187 ). UK: Emerald Publishing Limited. https://doi.org/10.1108/S2398-601820180000002007
dc.identifier.citedreferenceBoukes, M., & Trilling, D. ( 2017 ). Political relevance in the eye of the beholder: Determining the substantiveness of TV shows and political debates with twitter data. First Monday, 22 ( 4 ).
dc.identifier.citedreferenceBoulianne, S. ( 2015 ). Social media use and participation: a meta‐analysis of current research. Information, Communication & Society, 18 ( 5 ), 524 – 538.
dc.identifier.citedreferenceBrock, A. ( 2012 ). From the blackhand side: Twitter as a cultural conversation. Journal of Broadcasting & Electronic Media, 56 ( 4 ), 529 – 549.
dc.identifier.citedreferenceBruns, A. ( 2019 ). After the “APIcalypse”: social media platforms and their fight against critical scholarly research. Information, Communication & Society, 22 ( 11 ), 1544–1566. https://doi.org/10.1080/1369118X.2019.1637447
dc.identifier.citedreferenceBruns, A., & Weller, K. ( 2016 ). Twitter as a first draft of the present: and the challenges of preserving it for the future. In Proceedings of the 8th ACM conference on web science, pp. 183–189. ACM.
dc.identifier.citedreferenceCoppock, A., Guess, A., & Ternovski, J. ( 2016 ). When treatments are tweets: A network mobilization experiment over twitter. Political Behavior, 38 ( 1 ), 105 – 128.
dc.identifier.citedreferenceCragin, M. H., Palmer, C. L., Carlson, J. R., & Witt, M. ( 2010 ). Data sharing, small science and institutional repositories. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 368 ( 1926 ), 4023 – 4038.
dc.identifier.citedreferenceDeveloper Policy. ( 2017 ). Retrieved from https://developer.twitter.com/en/developer-terms/policy.html.
dc.identifier.citedreferenceDixon, K. ( 2014 ). Feminist online identity: Analyzing the presence of hashtag feminism. Journal of Arts and Humanities, 3 ( 7 ), 34 – 40.
dc.identifier.citedreferenceDocNow. (n.d.). Retrieved from https://www.docnow.io/.
dc.identifier.citedreferenceDriscoll, K., & Walker, S. ( 2014 ). Big data, big questions—working within a black box: Transparency in the collection and production of big twitter data. International Journal of Communication, 8, 20.
dc.identifier.citedreferenceEllison, N. B., Vitak, J., Gray, R., & Lampe, C. ( 2014 ). Cultivating social resources on social network sites: Facebook relationship maintenance behaviors and their role in social capital processes. Journal of Computer‐Mediated Communication, 19 ( 4 ), 855 – 870.
dc.identifier.citedreferenceFaniel, I. M., Kriesberg, A., & Yakel, E. ( 2016 ). Social scientists’ satisfaction with data reuse. Journal of the Association for Information Science and Technology, 67 ( 6 ), 1404 – 1416.
dc.identifier.citedreferenceFederer, L. M., Lu, Y.‐L., Joubert, D. J., Welsh, J., & Brandys, B. ( 2015 ). Biomedical data sharing and reuse: Attitudes and practices of clinical and scientific research staff. PLoS One, 10 ( 6 ), e0129506.
dc.identifier.citedreferenceField, D., Sansone, S.‐A., Collis, A., Booth, T., Dukes, P., Gregurick, S. K., … Wilbanks, J. ( 2009 ). omics data sharing. Science, 326 ( 5950 ), 234 – 236.
dc.identifier.citedreferenceFiesler, C., & Proferes, N. ( 2018 ). “Participant” perceptions of twitter research ethics. Social Media + Society, 4 ( 1 ), 1–14. https://doi.org/10.1177/2056305118763366
dc.identifier.citedreferenceFranzke, A. S., Bechmann, A., Zimmer, M., Ess, C., & the Association of Internet Researchers. ( 2020 ). Internet research: Ethical guidelines 3.0 (Technical Report).
dc.identifier.citedreferenceFreelon, D. ( 2015 ). Discourse architecture, ideology, and democratic norms in online political discussion. New Media & Society, 17 ( 5 ), 772 – 791.
dc.identifier.citedreferenceFreelon, D., McIlwain, C. D., & Clark, M. ( 2016 ). Beyond the hashtags: #ferguson, #blacklivesmatter, and the online struggle for offline justice. Washington, D.C.: Center for Media and Social Impact, American University. http://dx.doi.org/10.2139/ssrn.2747066
dc.identifier.citedreferenceGainous, J., & Wagner, K. M. ( 2014 ). Tweeting to power: The social media revolution in american politics. New York: Oxford University Press.
dc.identifier.citedreferenceGebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H., Daumé III, H., & Crawford, K. ( 2019 ). Datasheets for datasets. http://arxiv.org/abs/1803.09010
dc.identifier.citedreferenceGil de Zúñiga, H., Jung, N., & Valenzuela, S. ( 2012 ). Social media use for news and individuals’ social capital, civic engagement and political participation. Journal of Computer‐Mediated Communication, 17 ( 3 ), 319 – 336.
dc.identifier.citedreferenceGolder, S., Ahmed, S., Norman, G., & Booth, A. ( 2017 ). Attitudes toward the ethics of research using social media: A systematic review. Journal of Medical Internet Research, 19 ( 6 ), e195.
dc.identifier.citedreferenceGoodman, A., Pepe, A., Blocker, A. W., Borgman, C. L., Cranmer, K., Crosas, M., … Slavkovic, A. ( 2014 ). Ten simple rules for the care and feeding of scientific data. PLoS Computational Biology, 10 ( 4 ), e1003542.
dc.identifier.citedreferenceHalse, S. E., Tapia, A., Squicciarini, A., & Caragea, C. ( 2018 ). An emotional step toward automated trust detection in crisis social media. Information, Communication & Society, 21 ( 2 ), 288 – 305.
dc.identifier.citedreferenceHarford, T. ( 2014 ). Big data: A big mistake? Significance, 11 ( 5 ), 14 – 19.
dc.identifier.citedreferenceHaustein, S., Bowman, T. D., Holmberg, K., Tsou, A., Sugimoto, C. R., & Larivière, V. ( 2016 ). Tweets as impact indicators: Examining the implications of automated “bot” accounts on twitter. Journal of the Association for Information Science and Technology, 67 ( 1 ), 232 – 238.
dc.identifier.citedreferenceHilgartner, S., & Brandt‐Rauf, S. I. ( 1994 ). Data access, ownership, and control: Toward empirical studies of access practices. Knowledge, 15 ( 4 ), 355 – 372.
dc.identifier.citedreferenceHochman, N., & Schwartz, R. ( 2012 ). Visualizing instagram: Tracing cultural visual rhythms. In Proceedings of the workshop on social media visualization (SocMedVis) in conjunction with the sixth international AAAI conference on weblogs and social media (ICWSM–12), pp. 6–9.
dc.identifier.citedreferenceJungherr, A. ( 2014 ). The logic of political coverage on twitter: Temporal dynamics and content. The Journal of Communication, 64 ( 2 ), 239 – 259.
dc.identifier.citedreferenceKennan, M. A., & Markauskaite, L. ( 2015 ). Research data management practices: A snapshot in time. International Journal of Digital Curation, 10 ( 2 ), 69 – 95.
dc.identifier.citedreferenceKim, Y., & Adler, M. ( 2015 ). Social scientists’ data sharing behaviors: Investigating the roles of individual motivations, institutional pressures, and data repositories. International Journal of Information Management, 35 ( 4 ), 408 – 418.
dc.identifier.citedreferenceKim, Y., & Stanton, J. M. ( 2016 ). Institutional and individual factors affecting scientists’ data‐sharing behaviors: A multilevel analysis. Journal of the Association for Information Science and Technology, 67 ( 4 ), 776 – 799.
dc.identifier.citedreferenceKinder‐Kurlanda, K., Weller, K., Zenk‐Möltgen, W., Pfeffer, J., & Morstatter, F. ( 2017 ). Archiving information from geotagged tweets to promote reproducibility and comparability in social media research. Big Data & Society, 4 ( 2 ), 2053951717736336.
dc.identifier.citedreferenceLittman, J., Chudnov, D., Kerchner, D., Peterson, C., Tan, Y., Trent, R., … Wrubel, L. ( 2018 ). API‐based social media collecting as a form of web archiving. International Journal on Digital Libraries, 19 ( 1 ), 21 – 38.
dc.identifier.citedreferenceMassey, C. G., Genadek, K. R., Alexander, J. T., Gardner, T. K., & O’Hara, A. ( 2018 ). Linking the 1940 U.S. census with modern data. Historical Methods: A Journal of Quantitative and Interdisciplinary History, 51 ( 4 ), 246 – 257.
dc.identifier.citedreferenceMayernik, M. S. ( 2016 ). Research data and metadata curation as institutional issues. Journal of the Association for Information Science and Technology, 67 ( 4 ), 973 – 993.
dc.identifier.citedreferenceOverton, J. McC., Young, T. C., & Overton, W. S. ( 1993 ). Using ‘found’ data to augment a probability sample: Procedure and case study. Environmental Monitoring and Assessment, 26 ( 1 ), 65 – 83.
dc.identifier.citedreferenceMorstatter, F., Pfeffer, J., Liu, H., & Carley, K. M. ( 2013 ). Is the sample good enough? comparing data from twitter’s streaming API with twitter’s firehose. In ICWSM. Retrieved from aaai.org.
dc.identifier.citedreferenceMostert, M., Bredenoord, A. L., Biesaart, M. C. I. H., & van Delden, J. J. M. ( 2016 ). Big data in medical research and EU data protection law: Challenges to the consent or anonymise approach. European Journal of Human Genetics, 24 ( 7 ), 956 – 960.
dc.identifier.citedreferenceOCDX‐Specification. ( 2016 ). Retrieved from https://github.com/OCDX/OCDX-Specification.
dc.identifier.citedreferencePapacharissi, Z., & de Fatima Oliveira, M. ( 2012 ). Affective news and networked publics: The rhythms of news storytelling on #Egypt. The Journal of Communication, 62 ( 2 ), 266 – 282.
dc.identifier.citedreferencePepe, A., Goodman, A., Muench, A., Crosas, M., & Erdmann, C. ( 2014 ). How do astronomers share data? reliability and persistence of datasets linked in AAS publications and a qualitative study of data practices among US astronomers. PLoS One, 9 ( 8 ), e104798.
dc.identifier.citedreferencePolitou, E., Alepis, E., & Patsakis, C. ( 2018 ). Forgetting personal data and revoking consent under the GDPR: Challenges and proposed solutions. Journal of Cybersecurity, 4 ( 1 ), 1–20. http://dx.doi.org/10.1093/cybsec/tyy001
dc.identifier.citedreferenceRambukkana, N. ( 2015 ). Hashtag publics: The power and politics of discursive networks. New York: Peter Lang.
dc.identifier.citedreferenceRandall, S., & Coast, E. ( 2016 ). The quality of demographic data on older africans. Demographic Research, 34, 143 – 174.
dc.identifier.citedreferenceRoback, A., & Hemphill, L. ( 2013 ). I’d have to vote against you: issue campaigning via twitter. In Proceedings of the 2013 conference on computer supported cooperative work companion (pp. 259–262). New York, NY: ACM.
dc.identifier.citedreferenceSayogo, D. S., & Pardo, T. A. ( 2013 ). Exploring the determinants of scientific data sharing: Understanding the motivation to publish research data. Government Information Quarterly, 30, S19 – S31.
dc.identifier.citedreferenceSchaub, F., Balebako, R., & Cranor, L. F. ( 2017 ). Designing effective privacy notices and controls. IEEE Internet Computing, 1 – 1. http://dx.doi.org/10.1109/MIC.2017.265102930
dc.identifier.citedreferenceShapiro, M. A., & Hemphill, L. ( 2017 ). Politicians and the policy agenda: Does use of twitter by the U.S. congress direct new york times content? Policy & Internet, 9 ( 1 ), 109 – 132.
dc.identifier.citedreferenceSoroka, S., Daku, M., Hiaeshutter‐Rice, D., Guggenheim, L., & Pasek, J. ( 2018 ). Negativity and positivity biases in economic news coverage: Traditional versus social media. Communication Research, 45 ( 7 ), 1078 – 1098.
dc.identifier.citedreferenceTenopir, C., Allard, S., Douglass, K., Aydinoglu, A. U., Wu, L., Read, E., … Frame, M. ( 2011 ). Data sharing by scientists: practices and perceptions. PLoS One, 6 ( 6 ), e21101.
dc.identifier.citedreferenceTenopir, C., Dalton, E. D., Allard, S., Frame, M., Pjesivac, I., Birch, B., … Dorsett, K. ( 2015 ). Changes in data sharing and data reuse practices and perceptions among scientists worldwide. PLoS One, 10 ( 8 ), e0134826.
dc.identifier.citedreferenceThelwall, M., Haustein, S., Larivière, V., & Sugimoto, C. R. ( 2013 ). Do altmetrics work? twitter and ten other social web services. PLoS One, 8 ( 5 ), e64841.
dc.identifier.citedreferenceTownsend, L., & Wallace, C. ( 2017 ). The ethics of using social media data in research: A new framework. In K. Woodfield (Ed.), The ethics of online research (Vol. 2, pp. 189 – 207 ). Bingley, UK: Emerald Publishing Limited.
dc.identifier.citedreferenceVoss, A., Lvov, I., & Thomson, S. D. ( 2017 ). Data storage, curation and preservation. In L. Sloan & A. Quan‐Haase (Eds.), The SAGE handbook of social media research methods (pp. 161 – 176 ). London: SAGE Publications Ltd.
dc.owningcollnameInterdisciplinary and Peer-Reviewed

