Delivering Affordable Fault-tolerance to Commodity Computer Systems.

dc.contributor.author: Feng, Shuguang [en_US]
dc.date.accessioned: 2011-09-15T17:16:55Z
dc.date.available: NO_RESTRICTION [en_US]
dc.date.available: 2011-09-15T17:16:55Z
dc.date.issued: 2011 [en_US]
dc.date.submitted: [en_US]
dc.identifier.uri: https://hdl.handle.net/2027.42/86483
dc.description.abstract: To meet an insatiable consumer demand for greater performance at lower power, silicon technology has scaled to unprecedented dimensions. This aggressive scaling has provided designers with an ever-increasing budget of cheaper and faster transistors. Unfortunately, this trend has also been accompanied by a decline in individual device reliability, as transistors have become increasingly susceptible to a host of threats. With each new technology generation, the challenges associated with process variation, wearout, and transient faults gain greater prominence. We are quickly approaching a new era in which fault tolerance is a first-order design constraint, no longer a luxury reserved exclusively for high-reliability, mission-critical domains. Even commodity microprocessors used in mainstream computing will require protection. However, just as the reliability needs of NASA and Apple differ dramatically, so does their ability to absorb the costs necessary to ensure fault tolerance. Viable solutions targeting commodity systems must not only recognize this fact but embrace it. Simply stripping down techniques developed for enterprise servers may not result in the most appropriate designs for a laptop or cellphone. The best solutions will exploit the relaxed reliability constraints of commodity systems, judiciously sacrificing a small degree of fault tolerance to achieve far greater reductions in overhead costs. This thesis proposes a collection of works that can be selectively mixed and matched to assemble reliability solutions tailor-fit for the commodity systems community. Although the works presented address a variety of issues, from wearout to transient faults and from prevention to detection, they were all motivated by the same observation: much of the overhead cost associated with conventional fault tolerance mechanisms is spent in pursuit of the last few “nines” of reliability.
This conclusion gave rise to the philosophy permeating the chapters of this work: that summarily dismissing techniques that cannot supply mission-critical fault tolerance is no longer acceptable. In presenting concrete solutions to a few of the more interesting challenges (proactive wear-leveling orchestrated through intelligent job scheduling, and software-only transient fault detection and recovery that exploits intrinsic computational patterns within applications), we establish fundamental principles that can be applied more broadly to formulate a comprehensive reliability strategy. [en_US]
dc.language.iso: en_US [en_US]
dc.subject: Fault Tolerant Computing [en_US]
dc.subject: Computer Architecture [en_US]
dc.subject: Compiler Analysis [en_US]
dc.title: Delivering Affordable Fault-tolerance to Commodity Computer Systems. [en_US]
dc.type: Thesis [en_US]
dc.description.thesisdegreename: PhD [en_US]
dc.description.thesisdegreediscipline: Computer Science & Engineering [en_US]
dc.description.thesisdegreegrantor: University of Michigan, Horace H. Rackham School of Graduate Studies [en_US]
dc.contributor.committeemember: Mahlke, Scott [en_US]
dc.contributor.committeemember: Blaauw, David [en_US]
dc.contributor.committeemember: Bose, Pradip [en_US]
dc.contributor.committeemember: Mudge, Trevor N. [en_US]
dc.contributor.committeemember: Sylvester, Dennis Michael [en_US]
dc.contributor.committeemember: Wenisch, Thomas F. [en_US]
dc.subject.hlbsecondlevel: Computer Science [en_US]
dc.subject.hlbsecondlevel: Electrical Engineering [en_US]
dc.subject.hlbtoplevel: Engineering [en_US]
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/86483/1/shoe_1.pdf
dc.owningcollname: Dissertations and Theses (Ph.D. and Master's)


