Overcoming Barriers to Information Exchange on the Web
dc.contributor.author | Goel, Ayush | |
dc.date.accessioned | 2024-02-13T21:17:41Z | |
dc.date.available | 2024-02-13T21:17:41Z | |
dc.date.issued | 2023 | |
dc.date.submitted | 2023 | |
dc.identifier.uri | https://hdl.handle.net/2027.42/192375 | |
dc.description.abstract | We are increasingly relying on the internet, and specifically the World Wide Web (WWW), to exchange information and access services. Despite its ubiquitous use, there are two key barriers to accessing information that is shared on the web: 1) Many web pages suffer from poor performance with respect to both end-user loading latency and crawling throughput as observed by large-scale web crawlers. 2) Many web pages cease to exist over time, causing a significant fraction of published information to no longer be available. My dissertation addresses these issues by employing fine-grained data-flow and control-flow analysis of web computations, specifically JavaScript execution. Using this analysis, I am able to extract and modify JavaScript runtime behavior during web page loads and leverage this ability to build a number of web systems. First, I propose a client-side computation caching system that stores results of JavaScript (JS) execution to reduce compute delays and improve web page load times. I show that up to 85% of JavaScript runtime can be skipped by using such a computation cache. Second, I demonstrate that legacy JavaScript code has untapped potential for parallelization across multiple cores of modern smartphones to improve page load times. I show that an 88% speedup in JS execution can be achieved by parallelizing execution on 8 cores of a given mobile device. Third, I built Sprinter, a distributed web crawler that crawls the web at 5 times the rate of traditional browser-based crawlers while preserving perfect fidelity. Sprinter accomplishes this by carefully selecting a subset of pages on any site, which it crawls using a browser, and caching the corresponding computations. It then performs browser-less crawling of the remaining pages on that site using those cached computations. Finally, I built Jawa, a web archival crawler that reduces the storage overhead of web archives by 41% while eliminating all fidelity issues. 
Jawa accomplishes this by exploiting the differences between live and archived pages, and accurately identifying and patching the sources of non-determinism that impair JavaScript execution on archived pages. | |
dc.language.iso | en_US | |
dc.subject | Web Performance | |
dc.subject | Web Archival | |
dc.subject | Adverse Impacts of JavaScript on the Web | |
dc.title | Overcoming Barriers to Information Exchange on the Web | |
dc.type | Thesis | |
dc.description.thesisdegreename | PhD | |
dc.description.thesisdegreediscipline | Computer Science & Engineering | |
dc.description.thesisdegreegrantor | University of Michigan, Horace H. Rackham School of Graduate Studies | |
dc.contributor.committeemember | Madhyastha, Harsha | |
dc.contributor.committeemember | Prakash, Atul | |
dc.contributor.committeemember | Huan, Xun | |
dc.contributor.committeemember | Huang, Ryan | |
dc.contributor.committeemember | Netravali, Ravi | |
dc.subject.hlbsecondlevel | Computer Science | |
dc.subject.hlbtoplevel | Engineering | |
dc.subject.hlbtoplevel | Science | |
dc.contributor.affiliationumcampus | Ann Arbor | |
dc.description.bitstreamurl | http://deepblue.lib.umich.edu/bitstream/2027.42/192375/1/goelayu_1.pdf | |
dc.identifier.doi | https://dx.doi.org/10.7302/22284 | |
dc.identifier.orcid | 0000-0002-2343-670X | |
dc.identifier.name-orcid | Goel, Ayush; 0000-0002-2343-670X | en_US |
dc.working.doi | 10.7302/22284 | en |
dc.owningcollname | Dissertations and Theses (Ph.D. and Master's) |