Overcoming Barriers to Information Exchange on the Web

dc.contributor.author: Goel, Ayush
dc.date.accessioned: 2024-02-13T21:17:41Z
dc.date.available: 2024-02-13T21:17:41Z
dc.date.issued: 2023
dc.date.submitted: 2023
dc.identifier.uri: https://hdl.handle.net/2027.42/192375
dc.description.abstract: We increasingly rely on the internet, and specifically the World Wide Web (WWW), to exchange information and access services. Despite its ubiquitous use, there are two key barriers to accessing information shared on the web: 1) many web pages perform poorly, both in terms of end-user page load latency and in terms of the crawling throughput that large-scale web crawlers achieve; and 2) many web pages cease to exist over time, so a significant fraction of published information is no longer available. My dissertation addresses these issues by employing fine-grained data-flow and control-flow analysis of web computations, specifically JavaScript (JS) execution. Using this analysis, I am able to extract and modify JS runtime behavior during web page loads, and I leverage this ability to build a number of web systems. First, I propose a client-side computation caching system that stores the results of JS execution to reduce compute delays and improve web page load times. I show that up to 85% of JS runtime can be skipped by using such a computation cache. Second, I demonstrate that legacy JS code has untapped potential for parallelization across the multiple cores of modern smartphones, which can be exploited to improve page load times. I show that an 88% speedup in JS execution can be achieved by parallelizing it across 8 cores of a mobile device. Third, I present Sprinter, a distributed web crawler that crawls the web at 5 times the rate of traditional browser-based crawlers while preserving perfect fidelity. Sprinter accomplishes this by carefully selecting a subset of pages on each site, crawling those pages with a browser, and caching the corresponding computations. It then crawls the remaining pages on that site without a browser, reusing those cached computations. Finally, I present Jawa, a web archival crawler that reduces the storage overhead of web archives by 41% while eliminating all fidelity issues.
Jawa accomplishes this by exploiting the differences between live and archived pages and by accurately identifying and patching the sources of non-determinism that impair JS execution on archived pages.
dc.language.iso: en_US
dc.subject: Web Performance
dc.subject: Web Archival
dc.subject: Adverse Impacts of JavaScript on the Web
dc.title: Overcoming Barriers to Information Exchange on the Web
dc.type: Thesis
dc.description.thesisdegreename: PhD
dc.description.thesisdegreediscipline: Computer Science & Engineering
dc.description.thesisdegreegrantor: University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember: Madhyastha, Harsha
dc.contributor.committeemember: Prakash, Atul
dc.contributor.committeemember: Huan, Xun
dc.contributor.committeemember: Huang, Ryan
dc.contributor.committeemember: Netravali, Ravi
dc.subject.hlbsecondlevel: Computer Science
dc.subject.hlbtoplevel: Engineering
dc.subject.hlbtoplevel: Science
dc.contributor.affiliationumcampus: Ann Arbor
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/192375/1/goelayu_1.pdf
dc.identifier.doi: https://dx.doi.org/10.7302/22284
dc.identifier.orcid: 0000-0002-2343-670X
dc.identifier.name-orcid: Goel, Ayush; 0000-0002-2343-670X [en_US]
dc.working.doi: 10.7302/22284 [en]
dc.owningcollname: Dissertations and Theses (Ph.D. and Master's)