Deep Signal Compression with Feature Representation Learning
Liu, Bowen
2024
Abstract
Deep learning-based lossy signal compression methods have achieved substantial progress and significantly enriched signal compression methodologies in recent years. Signal source coding can benefit from learned methods in two major respects. First, the data-driven nature of deep signal compression methods allows them to capture the probability distribution of feature representations well, which, with proper modeling, leads to efficient entropy coding. Second, neural network architectures provide powerful solutions for feature extraction and representation learning, enabling the elimination of spatial and temporal redundancies by mapping the raw signal to more compact feature domains. This thesis presents four related works addressing the compression of different data formats: speech audio, images, video, and point clouds. The first work introduces a unified compression method that uses generative adversarial networks (GANs) to compress speech audio and images. The compressed signal is represented by a latent vector fed into a generator network, which is trained to produce high-quality signals that minimize a target objective function. A non-uniform quantization scheme based on the alternating direction method of multipliers (ADMM) is incorporated to effectively discretize the resulting latent vectors. The second work presents a deep video coding framework that predicts and compresses video sequences in the latent vector space. The proposed method first learns an efficient feature-domain representation of each video frame and then performs inter-frame prediction in that lower-dimensional space. To exploit the temporal correlation among the feature-space frames, it employs a convolutional long short-term memory (ConvLSTM) based network to predict the representation of the future frame. The transmitted bitstream is obtained by quantizing and entropy encoding the feature-space residual.
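To make the quantization step concrete, the following is a minimal sketch of non-uniform scalar quantization of a latent vector against a learned codebook. It illustrates only the nearest-codeword assignment; the codebook values here are invented for the example, and the ADMM-based training that the thesis uses to learn them is omitted.

```python
import numpy as np

def quantize_latent(z, codebook):
    """Map each latent entry to its nearest codeword (non-uniform grid).

    `z` is a latent vector from the encoder; `codebook` is a set of scalar
    quantization levels (fixed here for illustration -- the thesis learns
    them with an ADMM-based scheme).
    """
    # Distance from every latent entry to every codeword.
    dist = np.abs(z[:, None] - codebook[None, :])
    idx = dist.argmin(axis=1)          # transmitted symbol indices
    z_hat = codebook[idx]              # dequantized latent at the decoder
    return idx, z_hat

# Toy example: a 6-dim latent and a hypothetical non-uniform 4-level codebook.
codebook = np.array([-1.5, -0.3, 0.3, 1.5])
z = np.array([-1.4, 0.1, 0.35, 1.7, -0.2, 0.4])
idx, z_hat = quantize_latent(z, codebook)
```

Because the levels are non-uniform, the codebook can spend finer resolution where latent values concentrate; only the integer indices need to be entropy coded and transmitted.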
The application of the proposed video prediction scheme is studied in the anomaly detection task. The third work aims to address the motion-pattern adaptability issue that widely exists in video codecs via a block-wise, mode-ensemble deep video compression framework. It selects the optimal mode for feature-domain prediction, adapting to different motion patterns. The proposed modes include ConvLSTM-based feature-domain prediction, optical-flow-conditioned feature-domain prediction, and feature propagation, covering a wide range of cases from static scenes without apparent motion to dynamic scenes with a moving camera. Guided by a binary density map, dense and sparse post-quantization residual blocks are coded with separate entropy coding schemes. On top of that, optionally applying run-length coding to the sparse residuals further improves the compression rate. The last work focuses on methods for compressing light detection and ranging (LiDAR) data, extending the study of deep signal compression from the 2D to the 3D domain. LiDAR sensors are widely adopted in the autonomous navigation, virtual reality (VR), and augmented reality (AR) industries, where communication bandwidth is a top concern yet remains understudied. With point clouds and range images being two interchangeable LiDAR data representations, a hybrid framework is introduced to combine the best of both worlds. The proposed pipeline mostly relies on a prediction-based approach to exploit spatial and temporal correlations in range images, while providing an octree-based path as an important fallback in certain cases to preserve reconstruction quality. A content-adaptive point cloud sampling technique is also introduced to obtain extra compression gains while having minimal impact on machine perception tasks.
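The run-length coding mentioned above pays off because sparse post-quantization residual blocks are dominated by zeros. A generic run-length encoder/decoder pair over a flattened residual block can be sketched as follows; this illustrates the idea only and is not the exact scheme from the thesis.

```python
def run_length_encode(symbols):
    """Encode a 1-D symbol sequence as (value, run-count) pairs."""
    runs = []
    for s in symbols:
        if runs and runs[-1][0] == s:
            runs[-1][1] += 1       # extend the current run
        else:
            runs.append([s, 1])    # start a new run
    return runs

def run_length_decode(runs):
    """Invert run_length_encode: expand each (value, count) pair."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out

# A flattened sparse residual block: mostly zero after quantization.
block = [0, 0, 0, 2, 0, 0, 0, 0, -1, 0, 0, 0]
encoded = run_length_encode(block)
```

Here the 12-symbol block collapses to 5 (value, count) pairs; the gain grows with sparsity, which is why the framework applies this step only to the sparse residual blocks identified by the binary density map.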
Subjects
Deep Learning, Representation Learning, Signal Compression
Types
Thesis