Digital Audio Processing

The Digital Audio Processing Group is a multidisciplinary research group working in the areas of Music Informatics, Digital Signal Processing (DSP), and computational musicology.

With a wealth of knowledge gained from working alongside industry and in collaboration with other institutions, we are ideally placed to engage in cutting-edge research work. As part of a faculty with a long history of working alongside creative companies, we use innovative computational techniques to develop a range of outputs that can be employed by forward-thinking businesses. 

Housed in Millennium Point, Birmingham, the Group is a vibrant hub for audio technology research, with close links to research teams in Digital Image and Video Processing as well as the Centre for Music and Performance at Birmingham Conservatoire. Research within the group applies a range of scientific and mathematical techniques, with a strong emphasis on DSP. We have expanding opportunities for students wishing to pursue MPhil and PhD study.

Current projects

Low Latency High Resolution Audio Processing

This research focuses on minimising and quantifying audio processing latency in live music processing chains, including low-latency sigma-delta ADC/DAC architectures, low-latency multichannel live audio system architectures, and low-latency DSP algorithms with sample-based processing.
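The dominant latency term in a buffered system is straightforward to quantify: each processing stage operating on N samples at sample rate fs contributes N/fs seconds of delay. A minimal sketch of this arithmetic (illustrative only, not the group's measurement method):

```python
def buffer_latency_ms(buffer_size: int, sample_rate: int) -> float:
    """Latency contributed by one processing buffer, in milliseconds."""
    return 1000.0 * buffer_size / sample_rate

# A 64-sample buffer at 96 kHz adds well under a millisecond per stage,
# while a typical 1024-sample buffer at 44.1 kHz adds over 23 ms --
# hence the interest in sample-based (rather than block-based) processing.
print(buffer_latency_ms(64, 96000))
print(buffer_latency_ms(1024, 44100))
```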

Low Latency Future Network

This research is a collaboration with Jerry Foss and John Grant. It focuses on low-delay multichannel, multi-rate audio networking, including a novel layer-two media access control (MAC) architecture to ensure the delivery of time-deterministic interactive audio/video signals. The research also investigates new audio routing/switching architectures and an FPGA-based prototype.

Application of perceptual models to automatic music mixing

In recent years, researchers among the music signal processing community have turned to the development of intelligent mixing tools to perform fundamental mixing procedures automatically. This research aims to develop novel audio systems driven by computational hearing models for applications in mixing multichannel musical audio. With sophisticated auditory models readily available, an opportunity arises in which one can employ them to inform the processing chain to perform operations that are very much influenced by masking and loudness perception, such as level balancing, equalisation and dynamics compression.
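As an illustration of the basic idea (using raw RMS as a crude stand-in for the sophisticated perceptual loudness models the research actually employs), automatic level balancing can be sketched as:

```python
import numpy as np

def balance_levels(tracks, target_rms=0.1):
    """Gain each track so all share a common RMS level.

    RMS is a crude proxy here; a perceptually informed system would
    replace it with a computational loudness/masking model.
    """
    balanced = []
    for x in tracks:
        rms = np.sqrt(np.mean(x ** 2))
        gain = target_rms / rms if rms > 0 else 1.0
        balanced.append(gain * x)
    return balanced

# Two one-second tracks at very different levels come out matched.
rng = np.random.default_rng(0)
tracks = [0.5 * rng.standard_normal(44100), 0.01 * rng.standard_normal(44100)]
out = balance_levels(tracks)
```

A model-driven version would additionally account for inter-track masking, so that one track's gain decision depends on the content of the others.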

Probabilistic Models for Expressive Musical Performance

Systems to emulate expression in musical performances have been developed using Hidden Markov Models (HMMs) and Bayesian networks. These systems are trained with human performance data in order to synthesise musical sequences with realistic articulation patterns. During the project we have developed DAW-based humanisers that model professional drummers and singing voice synthesisers that model pop-music vocalists.
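A toy sketch of probabilistic humanisation, substituting a simple first-order Gaussian process for the trained HMM/Bayesian models described above (the function name and parameter values are illustrative, not from the published systems):

```python
import random

def humanise_onsets(grid_times, sigma=0.004, rho=0.6, seed=42):
    """Perturb quantised onset times with correlated timing deviations.

    Each deviation depends on the previous one (a first-order Markov
    assumption), so the timing drifts smoothly rather than jittering
    independently -- closer to how a human performer behaves.
    """
    random.seed(seed)
    deviation = 0.0
    out = []
    for t in grid_times:
        deviation = rho * deviation + random.gauss(0.0, sigma)
        out.append(t + deviation)
    return out

# Quantised eighth notes at 120 BPM (0.25 s apart):
grid = [i * 0.25 for i in range(8)]
humanised = humanise_onsets(grid)
```

In the real systems, the deviation model is learned from recordings of professional performers rather than fixed by hand.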

Acoustic Variance in Musical Performance

This research explores variance between players in how they physically influence the sound of the instrument being played. Initially focused on the playing of traditional folk flute players, the project analyses different personal styles of playing. Timbre, melodic variation and ornamentation are detected to aid the understanding and learning of authentic folk music styles. The project also examines acoustic variance between musical instruments of the same type in order to inform musicians and instrument manufacturers.

Intelligent Music Production

In the production of music, there are a large number of complicated processes that can be extremely difficult to use effectively without years of practice and experience. It is therefore useful to make these processes more accessible to a wider audience, bridging the gap between musicians and technology. To do this, we consider ways in which semantic terminology can be used to control low-level music production parameters. We approach this problem using a number of techniques, including natural language processing algorithms, unsupervised machine learning and adaptive audio processing.
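As a minimal illustration of the idea (the descriptors and parameter values below are hypothetical, not taken from the group's systems), a semantic term can be mapped to low-level equaliser settings:

```python
# Hypothetical mapping from semantic descriptors to parametric EQ bands,
# each given as (centre frequency in Hz, gain in dB, Q). In practice
# such mappings are learned from production data rather than hand-coded.
SEMANTIC_EQ = {
    "warm":   [(250, 3.0, 1.0), (4000, -2.0, 1.0)],
    "bright": [(8000, 4.0, 0.8)],
    "muddy":  [(300, 4.0, 1.2)],
}

def params_for(descriptor):
    """Look up low-level EQ settings for a high-level semantic term."""
    return SEMANTIC_EQ.get(descriptor.lower(), [])
```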

Academic Staff

Prof Cham Athwal - Associate Head of School (Research)
Cham is a Professor of Digital Technology and Head of Research within the DMT school. His research interests cover 3D modelling, image processing, video processing, digital signal processing, web technologies and simulation. Currently Cham is supervising eight PhD/MPhil projects covering a range of subjects including digital audio processing, digital image processing and virtual environments. Email: Phone: +44 (0)121 331 5458

Dr Ryan Stables

Ryan is a lecturer in audio engineering and acoustics, and subject leader in sound technology. His research interests span a wide range of topics from machine listening and music information retrieval to sonification and humanisation. He is currently involved in writing grant applications and supervising PhD students in the areas of musical semantics and informed source separation. Email:

Islah Ali-MacLachlan
Islah is a senior lecturer and subject leader in sound technology within the school of DMT. His research areas are musical acoustics, player variance and traditional music. Email: Phone: +44 (0)121 331 7435

Yonghao Wang
Yonghao is a Senior Lecturer within the School of Digital Media Technology. His recent research focuses on low-latency high-resolution audio processing, sigma delta ADC/DAC, and low-latency future networks in collaboration with John Grant and Jerry Foss. Email:

Dr Jason Hockman
Jason is a lecturer in audio engineering with interests in the areas of music information retrieval and computational musicology. He has performed research on a variety of topics including beat and meter analysis, rhythm description, audio effects, and human computer interaction. His current research is related to the automated analysis of UK electronic music and culture, including the genres of Hardcore, Jungle, and Drum and Bass. Email:

Research Students

Matthew Cheshire

Matthew is working towards a PhD in the area of musical semantics. He is currently developing a system that will analyse a large corpus of music production data in order to provide high-level controls for digital audio effects. Email:

Xueyang Wang

Xueyang is currently working in the area of informed source separation. She is using techniques derived from Music Information Retrieval (MIR) to improve the accuracy of algorithms such as Non-negative Matrix Factorisation (NMF) when applied to musical signals. Email:
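For context, a minimal NMF sketch using the standard Lee-Seung multiplicative updates on a toy "spectrogram" (this is textbook NMF, not the informed variant under development):

```python
import numpy as np

def nmf(V, rank, n_iter=500, seed=0):
    """Factorise a non-negative matrix V (e.g. a magnitude spectrogram)
    as V ~= W @ H using multiplicative updates that minimise the
    Euclidean distance. W holds spectral templates; H their activations."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, rank)) + 1e-9
    H = rng.random((rank, T)) + 1e-9
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H

# Toy "spectrogram": two spectral templates active at different times.
V = np.outer([1, 0, 2, 0], [1, 1, 0, 0]) + np.outer([0, 3, 0, 1], [0, 0, 1, 1])
W, H = nmf(V.astype(float), rank=2)
```

An "informed" separation would constrain W or H with side information (e.g. a score, or MIR-derived pitch estimates) rather than leaving both free.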

Carl Southall

Carl is a PhD student working in the field of music information retrieval. His main area of interest is using machine learning to improve the tasks of automatic music transcription and automatic music generation. Email:

Nicholas Jillings

Nick is a full-time PhD student in the DMT Lab researching intelligent music production. He has a BSc in Sound Engineering and Production from Birmingham City University and an MSc in Digital Signal Processing from Queen Mary, University of London. His main PhD focus is on simplifying the mixing process. This includes building intelligent tools that leverage semantic descriptors to power automatic mixing technologies, such as automatic track grouping or effect recommendation. In parallel, he is developing new software for data collection, including web-powered production suites that gather large volumes of production data, furthering the understanding of the mixing process. He is also involved in the Birmingham In Real Time project and is lead developer of the Web Audio Evaluation Toolbox, developed in collaboration with Queen Mary.

Maciek Tomczak

Maciek is working towards his PhD in the area of music informatics with a focus on rhythm analysis using machine learning techniques.

Spyridon Stasis

Spyros is a PhD student researching the field of semantic equalisation, and the correlation between timbral adjectives and operational actions undertaken by sound engineers and music producers. The techniques he is using include machine learning applications and specifically neural networks, which are implemented for data analysis purposes as well as to provide the basis for simplified audio effect interfaces.

PhD Opportunities

We have many emerging areas of research, including but not limited to:

  • Bayesian models of musical perception
  • Sparse signal representations
  • Multi-modal signal processing
  • High-level musical semantics
  • Symbolic representation of musical structure
  • Mood-based musical similarity
  • Scalable algorithms for large datasets
  • New interfaces for musical expression
  • Adaptive signal processing
  • Speech and singing synthesis
  • Folk music analysis
  • Machine listening
  • Algorithmic composition
  • Automatic music transcription
  • Auditory perception
  • Low-latency audio processing

For more information on any of these topics or studying with us, please contact Prof Cham Athwal.

Recent Publications

M. Köküer, P. Jančovič, I. A. MacLachlan, C. Athwal, (2014) “Automated detection of single- and multi-note ornaments in Irish traditional flute playing,” The 15th International Society for Music Information Retrieval Conference (ISMIR), Taipei, Taiwan, Oct. 27-31 2014.

M. Köküer, I. A. MacLachlan, P. Jančovič, C. Athwal, (2014) “Automated detection of single-note ornaments in Irish traditional flute playing,” Fourth International Workshop on Folk Music Analysis, Istanbul, Turkey, Jun. 6-7, 2014.

Isacco Arnaldi, Yonghao Wang, "A SIMULINK toolbox of Sigma-Delta modulators for high resolution audio conversions", to appear at the 137th AES Convention, Los Angeles, 2014.

Nicholas Jillings, Yonghao Wang, "CUDA Accelerated Audio Digital Signal Processing for Real-Time Algorithms", to appear at the 137th AES Convention, Los Angeles, 2014.

Stables, R., Enderby, S., De Man, B., Fazekas, G., Reiss, J. SAFE: A System for the Extraction and Retrieval of Semantic Audio Descriptors. The 15th Int. Society for Music Information Retrieval Conference (ISMIR-2014), Taipei, Taiwan, 2014.

Stables, R., Endo, S., Wing, A. Multi-Player Microtiming Humanisation using a Multivariate Markov Model. The 17th Int. Conference on Digital Audio Effects (DAFx-14), Erlangen, Germany, 2014.

Vicinanza, D., Stables, R., Clemens, G., Baker, M. Differentiated Stem Cell Classification in Infrared Spectroscopy using Auditory Feedback. The 20th Int. Conference on Auditory Display (ICAD-2014), New York, USA, 2014.

R. Stables, “Semantic Music Production using Producer-defined Descriptors”, Audio Engineering Society 53rd Conference on Semantic Audio, London UK 2014.

Hockman, J.A. 2014. An ethnographic and technological study of breakbeats in Hardcore, Jungle, and Drum & Bass. Ph.D Dissertation, McGill University.

D. Ward, C. Athwal, M. Köküer, “An Efficient Time-varying Loudness Model,” IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New York, USA, Oct. 20-23, 2013.

I. A. MacLachlan, M. Köküer, P. Jančovič, I. Williams, C. Athwal, “Quantifying Timbral Variations in Traditional Irish Flute Playing,” Third International Workshop on Folk Music Analysis, Amsterdam, Netherlands, Jun. 6-7, 2013.

Enderby S., Baracskai Z., “Harmonic Instability of Digital Soft Clipping Algorithms” DAFx-12, York, September 2012.

Y. Wang, J. D. Reiss, "Time domain performance of decimation filter architectures for high resolution sigma delta analogue to digital conversion", 132nd AES Convention, Budapest, Hungary, April 26–29, 2012.

Y. Wang, X. Zhu, Q. Fu, "A Low Latency Multichannel Audio Processing Evaluation Platform", 132nd AES Convention, Budapest, Hungary, April 26–29, 2012.

Stables, R. Bullock, J. and Athwal, C., "Fundamental Frequency Modulation in Singing Voice Synthesis." Lecture Notes in Computer Science, Springer March 2012.

Hockman, J.A., and I. Fujinaga. 2012. One in the jungle: Downbeat detection in hardcore, jungle, and drum and bass. In Proceedings of the International Society of Music Information Retrieval Conference, Porto, Portugal. 169–74.

Stables, R., Kokuer, M., Athwal, C.,"Objective Evaluation of Naturalness in Singing Voice Synthesis." UKSpeech, Birmingham, UK. 2012.

Stables, R., Athwal, C. and Cade, R. "Percussion Humanisation Using a Recursive Bayesian Framework. " 133rd Audio Engineering Society Convention, San Francisco, USA. 2012

Ward D, Athwal C and Reiss J, " Multi-track mixing using a model of loudness and partial loudness", Audio Engineering Society (AES) 133rd Convention, San Francisco, 2012

Y. Wang, J. Grant, and J. Foss, "Flexilink: A unified low latency network architecture for multichannel live audio", to appear at the 133rd AES Convention, San Francisco, 2012.

Trutzschler von Falkenstein, J. & Baracskai, Z., “A Graphical User Interface for SuperCollider Audio Units”, International Computer Music Conference (ICMC-2011), Huddersfield, July 2011.

Hockman, J.A., D.M. Weigl, C. Guastavino, and I. Fujinaga. 2011. Discrimination between phonograph playback systems. In Proceedings of the 131st Convention of the Audio Engineering Society, New York, United States.

Stables, R., Bullock, J. and Athwal, C. "The Humanisation of Stochastic Processes for the Modelling of f0 Drift in Singing" FRSM/CMMR, Bhubaneswar, India. 2011.

Wang, Y., Engineering Brief: "Latency Measurements of Audio Sigma Delta Analogue to Digital and Digital to Analogue Converters", 131st AES Convention, New York, USA, Oct 20-23, 2011.

Stables, R., Bullock, J. and Williams, I. "Perceptually Relevant Models for Articulation in Synthesised Drum Patterns" 131st Audio Engineering Society Convention, New York, USA. 2011

Stables, R., Bullock, J. and Athwal, C. "Towards a Model for the Humanisation of Pitch Drift in Singing Voice Synthesis" International Computer Music Conference (ICMC 2011), Huddersfield, UK. 2011

Baracskai Z., “New Trends in Algorithmic Composition”, 7th Int. Symp. “Music in Society”, Sarajevo, October 2010, published in the journal Musica.

Baracskai Z., and Stables, R., “Algorithms for Digital Subharmonic Distortion”, Audio Engineering Society (AES) 128th Convention in London, May 2010

Hockman, J.A., and I. Fujinaga. 2010. Fast vs. slow: Learning tempo octaves from user data. In Proceedings of the International Society of Music Information Retrieval Conference, Utrecht, Netherlands. 231–6.

McKay, C., J.A. Burgoyne, J.A. Hockman, J. Smith, G. Vigliensoni, and I. Fujinaga. 2010. Evaluating the performance of lyrical features relative to and in combination with audio, symbolic and cultural features. In Proceedings of the International Conference on Music Information Retrieval, Utrecht, Netherlands. 213–8.

Li, Z., Q. Xiang, J.A. Hockman, J. Yang, Y. Yi, I. Fujinaga, and Y. Wang. 2010. A music search engine for therapeutic gait training. In Proceedings of ACM Multimedia 2010, Florence, Italy.

Wang, Y., Stables, R. and Reiss, J. "Audio Latency Measurement for Desktop Operating Systems with Onboard Soundcards" 128th Audio Engineering Society Convention, London, UK. 2010.

Baracskai Z., “The Max for Live platform”, Digital Music Research Network, Queen Mary London, December 2009

Baracskai Z., “Presentation on Interactive Systems”, Symposium of Performing Technologies, Sonic Arts Research Centre Belfast, May 2009.

Hockman, J.A., M.M. Wanderley, and I. Fujinaga. 2009. Phase vocoder manipulation by runner’s pace. In Proceedings of the Conference for NIME, Pittsburg, United States. 90–3.