Work Description

Title: ASVSpoof Laundered Database (Open Access, Deposited)

Methodology
  • This database is based on the ASVspoof 2019 logical access (LA) eval partition. The ASVspoof 2019 LA eval database is passed through five types of additive noise at three signal-to-noise ratio (SNR) levels, three types of reverberation, six re-compression bit rates, four resampling rates, and one type of low-pass filtering, yielding a total of 1388.22 hours of audio data. The additive noise types are white, babble, volvo, street, and cafe noise. The three reverberation types use reverberation times (RT60) of 0.3, 0.6, and 0.9 s. The re-compression bit rates are 16, 64, 128, 192, 256, and 320 kbit/s. The resampling rates are 8 kHz, 11 kHz, 22 kHz, and 44 kHz.
Description
  • Voice-cloning (VC) systems have seen an exceptional increase in the realism of synthesized speech in recent years. The high quality of synthesized speech and the availability of low-cost VC services have given rise to many potential abuses of this technology, such as online smear campaigns and the dissemination of fabricated information. A number of detection methodologies have been proposed over the years that can detect voice spoofs with reasonably good accuracy. However, these methodologies are mostly evaluated on clean audio databases, such as ASVspoof 2019. This research aims to evaluate state-of-the-art (SOTA) audio spoof detection approaches in the presence of laundering attacks. To that end, a new laundering attack database, called the ASVspoof Laundered Database, is created. This database is based on the ASVspoof 2019 LA eval database and comprises a total of 1388.22 hours of audio recordings. Seven SOTA audio spoof detection approaches are evaluated on this laundered database. The results indicate that SOTA systems perform poorly in the presence of aggressive laundering attacks, especially reverberation and additive noise attacks. This suggests the need for robust audio spoof detection.
Creator
Creator ORCID
Depositor
  • alhashim@umich.edu
Contact information
Discipline
Keyword
Date coverage
  • 2024-02-10
Resource type
Last modified
  • 06/04/2024
Published
  • 06/04/2024
Language
DOI
  • https://doi.org/10.7302/za3p-5005
License
To Cite this Work:
Ali, H., Subramani, S., Sudhir, S., Varahamurthy, R., Malik, H. (2024). ASVSpoof Laundered Database [Data set], University of Michigan - Deep Blue Data. https://doi.org/10.7302/za3p-5005


Files (Count: 3; Size: 120 GB)

========================================================================================================
ASVspoof Laundered Database: This database is based on ASVspoof 2019 logical access (LA) eval partition.

The ASVspoof 2019 LA eval database is passed through five different types of additive noise at three
different signal-to-noise ratio (SNR) levels, three types of reverberation, six different re-compression
rates, four different resampling rates, and one type of low-pass filtering, accumulating to a total of
1388.22 hours of audio data.

Dataset Creators: Hashim Ali, Surya Subramani, Shefali Sudhir, Raksha Varahamurthy and Hafiz Malik

Dataset Contact: Hashim Ali alhashim@umich.edu

Date Written: 05/29/2024

*** WARNING ***:
The 'flac' folder contains over 2 million (2065873) files. Open this folder at your own risk.

========================================================================================================

1. Directory Structure
_______________________

--> ASVspoofLauneredDatabase
    --> flac
    --> protocols
    --> Readme.txt

2. Description of the audio files
_________________________________

The directory flac contains audio files for each type of laundering attack, namely Noise_Addition, Reverberation, Recompression, Resampling, and Filtering. Each laundering
attack (i) has different parameters (j), which are described below in the protocols section. All audio files in this directory are in FLAC format.
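
Because the flac folder holds over 2 million files, listing it in a file browser or loading a full directory listing into memory can be slow. The following is a minimal sketch of walking the folder lazily with Python's standard library. The path used in the example and the helper names (iter_flac_files, count_flac_files) are assumptions made for illustration; the README does not state whether the audio files sit directly in flac or in subfolders, so the sketch handles both.

```python
import os
from typing import Iterator


def iter_flac_files(root: str) -> Iterator[str]:
    """Yield paths of .flac files under root one at a time.

    os.scandir streams directory entries, so no full listing of the
    folder is ever held in memory.
    """
    stack = [root]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    # Subfolders (if any) are visited later.
                    stack.append(entry.path)
                elif entry.name.lower().endswith(".flac"):
                    yield entry.path


def count_flac_files(root: str) -> int:
    """Count .flac files under root without storing their paths."""
    return sum(1 for _ in iter_flac_files(root))


if __name__ == "__main__":
    # Assumed location of the extracted data; adjust to your setup.
    root = os.path.join("ASVspoofLauneredDatabase", "flac")
    if os.path.isdir(root):
        print(count_flac_files(root))
```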

3. Description of the protocols
_______________________________

The directory protocols contains five protocol files, one for each laundering attack.

Each line of a protocol file is formatted as:

SPEAKER_ID AUDIO_FILE_NAME SYSTEM_ID KEY Laundering_Type Laundering_Param

1) SPEAKER_ID: LA_****, a 4-digit speaker ID
2) AUDIO_FILE_NAME: LA_****, name of the audio file
3) SYSTEM_ID: ID of the speech spoofing system (A01 - A19); for bonafide speech, SYSTEM_ID is set to '-'
4) KEY: 'bonafide' for genuine speech, or 'spoof' for spoofed speech
5) Laundering_Type: Type of laundering attack. One of 'Noise_Addition', 'Reverberation', 'Recompression', 'Resampling', or 'Filtering'
6) Laundering_Param: Parameters of the laundering attack. For example, in the case of Noise_Addition, it can be 'babble_0', where babble is the type of
additive noise and 0 is the SNR level at which the babble noise is added to the audio signal.
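
The six-field layout above can be read with a few lines of Python. This is a minimal sketch, assuming the fields are separated by whitespace as in the format line; the ProtocolRow type, the parse_protocol_line helper, and the IDs in the example line are hypothetical and are not taken from the actual protocol files.

```python
from typing import NamedTuple


class ProtocolRow(NamedTuple):
    speaker_id: str
    audio_file_name: str
    system_id: str        # 'A01'..'A19', or '-' for bonafide speech
    key: str              # 'bonafide' or 'spoof'
    laundering_type: str
    laundering_param: str


def parse_protocol_line(line: str) -> ProtocolRow:
    """Split one protocol line into its six fields (assumes whitespace separators)."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    row = ProtocolRow(*fields)
    if row.key not in ("bonafide", "spoof"):
        raise ValueError(f"unexpected KEY value: {row.key!r}")
    return row


# Made-up example line for illustration only.
example = "LA_0001 LA_E_0000001 A07 spoof Noise_Addition babble_0"
row = parse_protocol_line(example)
print(row.key, row.laundering_type, row.laundering_param)
```

A named tuple keeps the column order from the README while letting downstream code refer to fields by name, for example to filter rows by laundering_type before scoring a detector.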

Note that:

1) The first four columns are the same as in ASVspoof2019_LA_cm_protocols (refer to the ASVspoof2019 database), except that the column of the
original protocol that is NOT used for LA is omitted.
2) Brief description of the Laundering_Param values:

babble_0             Babble noise at SNR level of 0
babble_10            Babble noise at SNR level of 10
babble_20            Babble noise at SNR level of 20
cafe_0               Cafe noise at SNR level of 0
cafe_10              Cafe noise at SNR level of 10
cafe_20              Cafe noise at SNR level of 20
street_0             Street noise at SNR level of 0
street_10            Street noise at SNR level of 10
street_20            Street noise at SNR level of 20
volvo_0              Volvo noise at SNR level of 0
volvo_10             Volvo noise at SNR level of 10
volvo_20             Volvo noise at SNR level of 20
white_0              White noise at SNR level of 0
white_10             White noise at SNR level of 10
white_20             White noise at SNR level of 20
RT_0_3               Reverberation with RT60 = 0.3 sec
RT_0_6               Reverberation with RT60 = 0.6 sec
RT_0_9               Reverberation with RT60 = 0.9 sec
recompression_128k   Compression using bit rate of 128 kbit/s
recompression_16k    Compression using bit rate of 16 kbit/s
recompression_196k   Compression using bit rate of 196 kbit/s
recompression_256k   Compression using bit rate of 256 kbit/s
recompression_320k   Compression using bit rate of 320 kbit/s
recompression_64k    Compression using bit rate of 64 kbit/s
resample_11025       Resampling rate of 11025 Hz
resample_22050       Resampling rate of 22050 Hz
resample_44100       Resampling rate of 44100 Hz
resample_8000        Resampling rate of 8000 Hz
lpf_7000             Low pass filtering with a cut-off frequency of 7 kHz
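
The labels above encode their numeric settings, so they can be turned back into structured values. The sketch below enumerates the labels exactly as listed in the table (including recompression_196k) and parses each one. The function names, the dictionary keys, and the mapping from each label family to a Laundering_Type are assumptions made for illustration. As a sanity check, it also confirms that the listed conditions number 5*3 + 3 + 6 + 4 + 1 = 29 and that the 2065873 files reported for the flac folder divide evenly across them.

```python
import re

NOISE_TYPES = ("babble", "cafe", "street", "volvo", "white")
SNR_LEVELS = (0, 10, 20)


def all_laundering_params() -> list:
    """Rebuild the list of Laundering_Param labels from the table."""
    params = [f"{noise}_{snr}" for noise in NOISE_TYPES for snr in SNR_LEVELS]
    params += ["RT_0_3", "RT_0_6", "RT_0_9"]
    params += [f"recompression_{k}k" for k in (16, 64, 128, 196, 256, 320)]
    params += [f"resample_{r}" for r in (8000, 11025, 22050, 44100)]
    params += ["lpf_7000"]
    return params


def parse_laundering_param(param: str) -> dict:
    """Turn one label into named values (keys are illustrative choices)."""
    m = re.fullmatch(r"(babble|cafe|street|volvo|white)_(\d+)", param)
    if m:
        return {"attack": "Noise_Addition", "noise": m[1], "snr": int(m[2])}
    m = re.fullmatch(r"RT_(\d+)_(\d+)", param)
    if m:
        # 'RT_0_3' encodes RT60 = 0.3 s
        return {"attack": "Reverberation", "rt60_s": float(f"{m[1]}.{m[2]}")}
    m = re.fullmatch(r"recompression_(\d+)k", param)
    if m:
        return {"attack": "Recompression", "bitrate_kbps": int(m[1])}
    m = re.fullmatch(r"resample_(\d+)", param)
    if m:
        return {"attack": "Resampling", "rate_hz": int(m[1])}
    m = re.fullmatch(r"lpf_(\d+)", param)
    if m:
        return {"attack": "Filtering", "cutoff_hz": int(m[1])}
    raise ValueError(f"unrecognized Laundering_Param: {param!r}")


params = all_laundering_params()
print(len(params))                    # 29
print(divmod(2065873, len(params)))   # (71237, 0)
print(parse_laundering_param("babble_10"))
```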
