Improving Articulated Pose Tracking and Contact Force Estimation for Qualitative Assessment of Human Actions

dc.contributor.author: Louis, Nathan
dc.date.accessioned: 2024-05-22T17:23:06Z
dc.date.available: 2024-05-22T17:23:06Z
dc.date.issued: 2024
dc.date.submitted: 2024
dc.identifier.uri: https://hdl.handle.net/2027.42/193278
dc.description.abstract: Using video to automate human performance metrics or skill analysis is an important but underexplored task. Measuring the quality of an action is currently highly subjective: even assessments from experts are affected by bias and limited inter-rater reliability. In contrast, computer vision and AI have the potential to provide real-time, non-intrusive solutions with increased objectivity, scalability, and repeatability across various domains. From video alone, we can automatically provide supplemental objective scoring of Olympic sports, evaluate the technical skill of surgeons for training purposes, or monitor the physical rehabilitation progress of a patient. Today we solve these problems with supervised learning, learning features that correlate strongly with the desired measure of performance. Supervised learning is powerful, data-driven, and sometimes the best available option. On its own, however, it may be sub-optimal when data are scarce, and insufficient when models must generalize to varying conditions or truly understand the target task. In this dissertation, the basis of our human analysis is skeletal pose, namely hand poses and full-body poses. For articulated hand poses, we improve tracking with our CondPose network, which integrates prior detection confidences and encourages tracking consistency. For full-body poses, we propose two physical-simulation-based metrics for evaluating physical plausibility and perform external force estimation through predicted ground reaction forces (GRFs). However, in the human analysis domain, collecting and annotating data at the scale of other deep learning tasks is a recurring challenge, which limits generalizability to different environments, procedures, and motions. We address this by exploring semi-supervised learning methods such as contrastive pre-training and multi-task learning.
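To make the physical-grounding idea concrete, the following is a minimal sketch (not the dissertation's actual metric) of how a tracked center-of-mass trajectory implies a net ground reaction force via Newton's second law, F = m(a + g); the function name `estimate_net_grf` and the finite-difference scheme are illustrative assumptions.

```python
import numpy as np

def estimate_net_grf(com_traj, mass, fps):
    """Estimate the net ground reaction force implied by a center-of-mass
    trajectory, F = m * (a + g), using finite differences. Illustrative only.

    com_traj : (T, 3) array of COM positions in meters (z-axis is up)
    mass     : subject mass in kg
    fps      : frames per second of the trajectory
    """
    dt = 1.0 / fps
    # Two applications of np.gradient give the COM acceleration per frame.
    acc = np.gradient(np.gradient(com_traj, dt, axis=0), dt, axis=0)
    g = np.array([0.0, 0.0, 9.81])  # gravity must be balanced by the ground
    return mass * (acc + g)

# Sanity check: a motionless subject should register roughly body weight
# (m * g) in the vertical component and zero horizontally.
still = np.zeros((100, 3))
grf = estimate_net_grf(still, mass=70.0, fps=100)
```

A predicted GRF that deviates wildly from such Newtonian constraints is one signal that an estimated pose sequence is physically implausible.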
We apply articulated hand pose tracking in the surgical environment to assess surgical skill. By applying a time-shifted sampling augmentation, we introduce clip-contrastive pre-training on embedded hand features as an unsupervised learning step, and we show that this pre-training improves performance when fine-tuned on surgical skill classification and assessment tasks. Unlike most prior work, we evaluate on open-surgery videos rather than solely simulated environments; specifically, we use videos of non-laparoscopic procedures collected in collaboration with the Cardiac Surgery department at Michigan Medicine. We use full-body poses and contact force estimation to bridge the gap between visual observations and the physical world. This physically grounded component is vital for understanding actions, such as sports or physical rehabilitation, in which humans interact with their environment. We leverage multi-task learning to perform 2D-to-3D human pose estimation and to integrate other abundant sources of motion capture data without requiring additional force-plate supervision; our experiments show that this improves GRF estimation on unseen motions. To address data limitations, we also collect two novel datasets, SurgicalHands and ForcePose. SurgicalHands is a multi-instance articulated hand pose tracking dataset for the surgical domain, encompassing a degree of complexity in appearance and movement not present in prior datasets. ForcePose is a multi-view GRF dataset of tracked human poses and time-synchronized force plates, to our knowledge the largest and most varied of its kind. This dataset serves as a benchmark for mapping between human body motion and physical forces, enabling physical grounding of specific actions.
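Clip-contrastive pre-training of the kind described above is commonly implemented with an InfoNCE-style objective, where each clip's positive is an embedding of a time-shifted clip from the same video and other batch entries act as negatives. The following is a hedged numpy sketch under that assumption; it is not the dissertation's implementation, and `info_nce_loss` is a hypothetical name.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE loss over a batch of clip embeddings. Row i of `positives` is
    the embedding of a time-shifted clip from the same video as row i of
    `anchors`; all other rows in the batch serve as negatives.

    anchors, positives : (N, D) arrays of clip embeddings
    """
    # Cosine similarities via L2 normalization.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature               # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Minimizing the loss pushes each matching (diagonal) pair together.
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 32))
# Correctly matched pairs should score a much lower loss than mismatched ones.
aligned = info_nce_loss(emb, emb)
shuffled = info_nce_loss(emb, rng.permutation(emb))
```

After pre-training with such an objective, the encoder is fine-tuned on the downstream skill classification and assessment labels.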
dc.language.iso: en_US
dc.subject: Through limited labeled video data, we can infer quantitative evaluation characteristics of human actions.
dc.title: Improving Articulated Pose Tracking and Contact Force Estimation for Qualitative Assessment of Human Actions
dc.type: Thesis
dc.description.thesisdegreename: PhD
dc.description.thesisdegreediscipline: Electrical and Computer Engineering
dc.description.thesisdegreegrantor: University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember: Corso, Jason
dc.contributor.committeemember: Owens, Andrew
dc.contributor.committeemember: Likosky, Donald
dc.contributor.committeemember: Nicolella, Daniel
dc.subject.hlbsecondlevel: Computer Science
dc.subject.hlbsecondlevel: Electrical Engineering
dc.subject.hlbtoplevel: Engineering
dc.contributor.affiliationumcampus: Ann Arbor
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/193278/1/natlouis_1.pdf
dc.identifier.doi: https://dx.doi.org/10.7302/22923
dc.identifier.orcid: 0000-0003-4502-6012
dc.identifier.name-orcid: Louis, Nathan; 0000-0003-4502-6012
dc.working.doi: 10.7302/22923
dc.owningcollname: Dissertations and Theses (Ph.D. and Master's)

