Bulgin, James and De Decker, Paul and Nycz, Jennifer (2010) Reliability of Formant Measurements from Lossy Compressed Audio. In: British Association of Academic Phoneticians Colloquium, March 29-31, 2010, London, UK. (Unpublished)
Lossy audio compression algorithms, such as those used in mp3s and VoIP services like Skype, achieve their high levels of compression by destructively modifying the original waveform. Lightly compressed audio is indistinguishable from uncompressed audio to the human ear. However, it is unclear how strongly compression might affect the accuracy of acoustic measurements made for phonetic study. This study examined formant measurements from a series of sociolinguistic recordings of both male and female speakers, and tested whether those measurements remained reliable when the recordings were compressed.

If acoustic analysis of compressed audio is sufficiently reliable, it could provide practical benefits to researchers. For instance, some collections of linguistically interesting recordings may not be available in a lossless format. In addition, compressed audio files are much smaller (potentially 10-30 times smaller than uncompressed audio). This could simplify the management of large corpora and make it more feasible to share them with other researchers, especially over the internet. Finally, VoIP services could offer a useful tool for gathering linguistic recordings from remote locations.

In this study, recordings originally encoded as 24-bit, 44.1 kHz WAV files were re-encoded as mp3s at three constant bit rates (CBR): 64, 128, and 320 kbps. The encoding used the encoder built into Sound Forge. In addition, the originals were transmitted and re-recorded over Skype to test the compression algorithms that program uses internally. F0 through F4 were measured using Praat (Boersma and Weenink 2009) at the temporal midpoint of approximately 100 vowel utterances from two speakers. These measurements were repeated at the same timestamps for each version of the recording, and the results for each compressed version were then compared with the original measurements.
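The file-size savings from lossy compression follow from simple bit-rate arithmetic. Uncompressed PCM audio runs at sample rate × bit depth × channels bits per second, while a CBR mp3 runs at its nominal bit rate. The minimal sketch below makes that comparison. It assumes a 44.1 kHz sample rate and a single (mono) channel; the abstract does not fully specify either. It also ignores container headers and metadata, so the ratios are approximate. The function names are illustrative only.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bit_depth: int, channels: int) -> float:
    """Bit rate of uncompressed PCM (e.g. WAV) audio, in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000


def compression_ratio(sample_rate_hz: int, bit_depth: int, channels: int,
                      mp3_kbps: int) -> float:
    """Approximate size ratio of uncompressed audio to a CBR mp3 of the same
    duration (headers and metadata are ignored)."""
    return pcm_bitrate_kbps(sample_rate_hz, bit_depth, channels) / mp3_kbps


if __name__ == "__main__":
    # Assumptions: 44.1 kHz, 24-bit, mono source audio.
    for kbps in (64, 128, 320):
        ratio = compression_ratio(44_100, 24, 1, kbps)
        print(f"{kbps:>3} kbps CBR: about {ratio:.1f}x smaller")
    # Prints 16.5x, 8.3x and 3.3x for 64, 128 and 320 kbps respectively.
```

Stereo source audio would double every ratio. This is one reason the achievable savings depend on how the original recordings were made.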
Results suggest that even high levels of mp3 compression have a minimal effect on the accuracy of formant measurements, a result similar to that of Van Son (2005). For the speakers examined, F1 and F2 for each vowel type differed from the original recording by an average of 3 Hz and 9 Hz, respectively. For many linguistic purposes, this is an acceptable margin of error. Recordings transmitted over Skype differed from their originals to a significantly greater degree. Skype therefore does not currently appear to be a suitable tool for gathering recordings where accurate acoustic analysis is required.
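The abstract does not specify exactly how the per-vowel differences were computed. One plausible reading works in three steps. First, compute the mean F1 and F2 of each vowel type in the original recording and in the compressed version. Next, take the absolute difference for each vowel type. Finally, average those differences across vowel types. The sketch below implements that reading as an assumption, not as the authors' procedure. The function names and the `(vowel, F1, F2)` tuple layout are hypothetical, and the toy values are made up for illustration; they are not data from the study.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical token layout: (vowel label, F1 in Hz, F2 in Hz), with each token
# measured at the same timestamp in the original and the compressed version.
Token = tuple[str, float, float]


def vowel_means(tokens: list[Token]) -> dict[str, tuple[float, float]]:
    """Mean F1 and F2 for each vowel type."""
    groups: dict[str, list[tuple[float, float]]] = defaultdict(list)
    for vowel, f1, f2 in tokens:
        groups[vowel].append((f1, f2))
    return {
        v: (mean(f1 for f1, _ in pairs), mean(f2 for _, f2 in pairs))
        for v, pairs in groups.items()
    }


def mean_vowel_shift(original: list[Token],
                     compressed: list[Token]) -> tuple[float, float]:
    """Average absolute shift (F1, F2) of vowel-type means after compression."""
    orig = vowel_means(original)
    comp = vowel_means(compressed)
    if orig.keys() != comp.keys():
        raise ValueError("both versions must contain the same vowel types")
    f1_shift = mean(abs(comp[v][0] - orig[v][0]) for v in orig)
    f2_shift = mean(abs(comp[v][1] - orig[v][1]) for v in orig)
    return f1_shift, f2_shift


if __name__ == "__main__":
    # Made-up toy values, for illustration only.
    original = [("i", 300, 2300), ("i", 310, 2250), ("a", 750, 1300)]
    compressed = [("i", 302, 2290), ("i", 312, 2262), ("a", 744, 1310)]
    print(mean_vowel_shift(original, compressed))  # (4, 5.5)
```

Averaging the signed differences instead of the absolute ones would let shifts in opposite directions cancel out. For this reason the absolute difference is the more conservative choice for an error estimate.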
|Item Type:||Conference or Workshop Item (UNSPECIFIED)|
|Keywords:||vowel formants, compression, wav, mp3, skype, phonetics, sociolinguistics|
|Department(s):||Humanities and Social Sciences, Faculty of > Linguistics|
|Date:||29 March 2010|