Deep Signal Compression with Feature Representation Learning
Liu, Bowen
2024
Abstract
Deep learning-based lossy signal compression methods have achieved substantial progress and significantly enriched signal compression methodologies in recent years. Signal source coding can benefit from learned methods in two major respects. First, the data-driven nature of deep signal compression methods allows them to capture the probability distribution of feature representations well, which, with proper modeling, leads to efficient entropy coding. Second, neural network architectures provide powerful solutions for feature extraction and representation learning, enabling the elimination of spatial and temporal redundancies by mapping the raw signal to more compact feature domains. This thesis presents four related works addressing the compression of different data formats: speech audio, images, video, and point clouds. The first work introduces a unified compression method that uses generative adversarial networks (GANs) to compress speech audio and images. The compressed signal is represented by a latent vector fed into a generator network, which is trained to produce high-quality signals that minimize a target objective function. A non-uniform quantization scheme based on the alternating direction method of multipliers (ADMM) is incorporated to effectively discretize the resulting latent vectors. The second work presents a deep video coding framework that predicts and compresses video sequences in the latent vector space. The proposed method first learns an efficient feature-domain representation of each video frame and then performs inter-frame prediction in that lower-dimensional space. To exploit the temporal correlation among the feature-space frames, it employs a convolutional long short-term memory (ConvLSTM) based network to predict the representation of the future frame. The transmitted bitstream is obtained by quantizing and entropy encoding the feature-space residual.
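To make the quantization step concrete, the following is a minimal sketch of non-uniform scalar quantization of a latent vector against a learned codebook. It illustrates only the nearest-codeword assignment; the codebook values here are invented for the example, and the ADMM-based training that the thesis uses to learn them is omitted.

```python
import numpy as np

def quantize_latent(z, codebook):
    """Map each latent entry to its nearest codeword (non-uniform grid).

    `z` is a latent vector from the encoder; `codebook` is a set of scalar
    quantization levels (fixed here for illustration -- the thesis learns
    them with an ADMM-based scheme).
    """
    # Distance from every latent entry to every codeword.
    dist = np.abs(z[:, None] - codebook[None, :])
    idx = dist.argmin(axis=1)          # transmitted symbol indices
    z_hat = codebook[idx]              # dequantized latent at the decoder
    return idx, z_hat

# Toy example: a 6-dim latent and a hypothetical non-uniform 4-level codebook.
codebook = np.array([-1.5, -0.3, 0.3, 1.5])
z = np.array([-1.4, 0.1, 0.35, 1.7, -0.2, 0.4])
idx, z_hat = quantize_latent(z, codebook)
```

Because the levels are non-uniform, the codebook can spend finer resolution where latent values concentrate; only the integer indices need to be entropy coded and transmitted.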
The application of the proposed video prediction scheme is studied in the anomaly detection task. The third work aims to address the motion-pattern adaptability issue that widely exists in video codecs via a block-wise, mode-ensemble deep video compression framework. It selects the optimal mode for feature-domain prediction, adapting to different motion patterns. The proposed modes include ConvLSTM-based feature-domain prediction, optical-flow-conditioned feature-domain prediction, and feature propagation, covering a wide range of cases from static scenes without apparent motion to dynamic scenes with a moving camera. Guided by a binary density map, dense and sparse post-quantization residual blocks are coded with separate entropy coding schemes. On top of that, optionally applying run-length coding to the sparse residuals further improves the compression rate. The last work focuses on methods for compressing light detection and ranging (LiDAR) data, extending the study of deep signal compression from the 2D to the 3D domain. LiDAR sensors are widely adopted in the autonomous navigation, virtual reality (VR), and augmented reality (AR) industries, where communication bandwidth is a top concern yet remains understudied. With point clouds and range images being two interchangeable LiDAR data representations, a hybrid framework is introduced to combine the best of both worlds. The proposed pipeline mostly relies on a prediction-based approach to exploit spatial and temporal correlations in range images, while providing an octree-based path as an important fallback in certain cases to preserve reconstruction quality. A content-adaptive point cloud sampling technique is also introduced to obtain extra compression gains while having minimal impact on machine perception tasks.
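The run-length coding mentioned above pays off because sparse post-quantization residual blocks are dominated by zeros. A generic run-length encoder/decoder pair over a flattened residual block can be sketched as follows; this illustrates the idea only and is not the exact scheme from the thesis.

```python
def run_length_encode(symbols):
    """Encode a 1-D symbol sequence as (value, run-count) pairs."""
    runs = []
    for s in symbols:
        if runs and runs[-1][0] == s:
            runs[-1][1] += 1       # extend the current run
        else:
            runs.append([s, 1])    # start a new run
    return runs

def run_length_decode(runs):
    """Invert run_length_encode: expand each (value, count) pair."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out

# A flattened sparse residual block: mostly zero after quantization.
block = [0, 0, 0, 2, 0, 0, 0, 0, -1, 0, 0, 0]
encoded = run_length_encode(block)
```

Here the 12-symbol block collapses to 5 (value, count) pairs; the gain grows with sparsity, which is why the framework applies this step only to the sparse residual blocks identified by the binary density map.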
Subjects
Deep Learning, Representation Learning, Signal Compression
Types
Thesis