A deep generative model framework for creating high quality synthetic transaction sequences

Nickerson, Kyle (2023) A deep generative model framework for creating high quality synthetic transaction sequences. Doctoral (PhD) thesis, Memorial University of Newfoundland.

PDF - Accepted Version (English)
Available under License - The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.


Abstract

Synthetic data are artificially generated data that closely model real-world measurements, and they can be a valuable substitute for real data in domains where real data are costly to obtain or where privacy concerns exist. Synthetic data have traditionally been generated using computational simulations, but deep generative models (DGMs) are increasingly used to generate high-quality synthetic data. In this thesis, we create a framework that employs DGMs to generate high-quality synthetic transaction sequences. Transaction sequences, such as those seen on an online banking platform or a credit card statement, are an important type of financial data for gaining insight into financial systems. However, research involving this type of data is typically limited to large financial institutions, as privacy concerns often prevent academic researchers from accessing it. Our work represents a step towards creating shareable synthetic transaction sequence datasets containing data not connected to any real individuals. To achieve this goal, we begin by developing Banksformer, a DGM based on the transformer architecture that is able to generate high-quality synthetic transaction sequences. Throughout the remainder of the thesis, we develop extensions to Banksformer that further improve the quality of the generated data. Additionally, we perform an extensive examination of the quality of the synthetic data produced by our method, using both qualitative visualizations and quantitative metrics.

Item Type: Thesis (Doctoral (PhD))
URI: http://research.library.mun.ca/id/eprint/16098
Item ID: 16098
Additional Information: Includes bibliographical references (pages 107-122)
Keywords: generative models, synthetic data, banking data, machine learning, evolutionary computing
Department(s): Science, Faculty of > Computer Science
Date: August 2023
Date Type: Submission
Digital Object Identifier (DOI): https://doi.org/10.48336/VRYB-9E21
Library of Congress Subject Heading: Machine learning; Deep learning (Machine learning); Evolutionary programming (Computer science); Electronic data processing Quantitative research
