Towards human-quality drum accompaniment using deep generative models and transformers

Sadeghi Amjadi, Arash (2025) Towards human-quality drum accompaniment using deep generative models and transformers. Masters thesis, Memorial University of Newfoundland.

Available under License - The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.

Abstract

Automatic music generation has garnered significant interest among musicians and composers. In particular, the task of accompaniment in music generation presents unique challenges, as it involves generating an instrument track responsive to other played instruments. This project focuses on accompanying musicians with automatically generated tracks, specifically accompanying bass guitar players with AI-generated drum tracks. The proposed system was trained on multi-track songs to capture the connection between bass and drum tracks using the framework of Conditional Generative Adversarial Networks (CGANs). Unlike typical AI-generated drum tracks, which often lack nuanced dynamics, human-performed drums feature expressive elements such as velocity, the varying loudness of each strike. To capture this expressiveness, our transformer model is trained on human drum performances and focuses on assigning realistic velocities to the generated drum hits. An ablation study was conducted, and the results indicate that combining pitch and velocity generation into a single network significantly reduces music quality (measured by groove consistency), reinforcing our approach of separating velocity assignment to maintain coherent drum patterns while enhancing expressiveness. We also evaluate the generated music using objective metrics, demonstrating the models' performance and evolution during training. The drum generation system supports real-time interaction, enabling spontaneous live jamming sessions. Simplifications facilitate real-time operation, and we provide results from sample sessions.
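The abstract evaluates generated drums with groove consistency. As a rough illustration of what such a metric measures, the sketch below (an assumption on our part, following the common formulation of groove consistency as one minus the mean Hamming distance between the onset patterns of consecutive bars; the thesis's exact definition may differ) scores a binary onset grid, where a value of 1.0 means every bar repeats the same rhythmic pattern exactly:

```python
import numpy as np

def groove_consistency(onset_grid: np.ndarray) -> float:
    """Score rhythmic regularity over a binary onset grid of shape (bars, steps).

    Computed as one minus the mean per-step Hamming distance between the
    onset patterns of consecutive bars, so identical bars score 1.0 and
    bars that disagree at every step score 0.0.
    """
    bars, steps = onset_grid.shape
    if bars < 2:
        raise ValueError("need at least two bars to compare")
    # Per-step Hamming distance between each bar and the next
    diffs = np.abs(onset_grid[1:] - onset_grid[:-1]).sum(axis=1) / steps
    return 1.0 - float(diffs.mean())

# A perfectly repeating two-bar pattern scores 1.0
pattern = np.array([[1, 0, 1, 0],
                    [1, 0, 1, 0]])
print(groove_consistency(pattern))  # → 1.0
```

Under this formulation, a network that scrambles rhythm while juggling both pitch and velocity would show a depressed groove-consistency score, which matches the ablation finding reported above.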

Item Type: Thesis (Masters)
URI: http://research.library.mun.ca/id/eprint/16957
Item ID: 16957
Additional Information: Includes bibliographical references (pages 44-47)
Keywords: accompaniment generation, deep learning, music, large language models
Department(s): Science, Faculty of > Computer Science
Date: May 2025
Date Type: Submission
Library of Congress Subject Heading: Deep learning (Machine learning); Composition (Music); Musical accompaniment--Computer simulation; Computer music--Performance; Drum set--Computer simulation
