Deep neural networks for conditional visual synthesis

Huang, Xin (2022) Deep neural networks for conditional visual synthesis. Doctoral (PhD) thesis, Memorial University of Newfoundland.

PDF (English) - Accepted Version, 168MB
Available under License - The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.

Abstract

Conditional visual synthesis is the process of artificially generating images or videos that satisfy desired constraints. Individual visual synthesis tasks include high-fidelity natural image generation, artwork creation, face animation, etc. Such tasks have many real-world applications, such as database expansion, face editing in beauty cameras, and face effects in short videos. With advances in deep learning, methods for conditional visual synthesis have evolved rapidly in recent years. Many of these recent approaches are based on Generative Adversarial Networks (GANs), which can generate samples following almost any implicit distribution, allowing the synthesis of visual content in an unconditional or input-conditional manner. However, GANs still have many limitations, such as difficulty in directly approximating high-resolution image distributions, poor generalization on unpaired datasets, and limited power for mimicking human actions. Hence, it is worthwhile to tackle these limitations and to investigate how to handle different conditional visual synthesis tasks. Four conditional visual synthesis tasks are investigated in this thesis. The first task studies how to generate high-resolution images from conditioning text descriptions. The second task simulates facial changes based on desired age inputs. The third task investigates how to synthesize realistic talking-face videos from conditioning audio inputs. Finally, the fourth task generates human-like painting actions based on desired target images. Both qualitative and quantitative validations are conducted for the method developed for each task. Comparisons with existing works demonstrate the respective merits of these techniques, and insights on how to design conditional visual synthesis approaches are summarized.
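As a rough illustration of the adversarial training idea the abstract refers to (this sketch is not from the thesis itself), the minimax game between a generator and a discriminator can be shown on a toy 1-D problem: a one-parameter generator shifts unit Gaussian noise toward the real data distribution N(3, 1), while a logistic discriminator tries to tell real from generated samples. All names, hyperparameters, and the model sizes here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

# Toy setup (illustrative, not the thesis's models):
# real data ~ N(3, 1); generator g(z) = z + b with learnable offset b;
# discriminator D(x) = sigmoid(w * x + c), a logistic classifier.
b = 0.0          # generator parameter; should drift toward 3.0
w, c = 0.0, 0.0  # discriminator parameters
lr, batch = 0.05, 128

for _ in range(3000):
    real = rng.normal(3.0, 1.0, batch)
    fake = rng.normal(0.0, 1.0, batch) + b

    # Discriminator: gradient ascent on log D(real) + log(1 - D(fake))
    d_real = sigmoid(w * real + c)
    d_fake = sigmoid(w * fake + c)
    w += lr * (np.mean((1 - d_real) * real) - np.mean(d_fake * fake))
    c += lr * (np.mean(1 - d_real) - np.mean(d_fake))

    # Generator: gradient ascent on the non-saturating objective log D(fake)
    d_fake = sigmoid(w * fake + c)
    b += lr * np.mean((1 - d_fake) * w)
```

After training, the generator offset b hovers near the real mean, so generated samples approximately follow the data distribution even though that distribution was only ever accessed through samples — the "implicit distribution" property mentioned above.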

Item Type: Thesis (Doctoral (PhD))
URI: http://research.library.mun.ca/id/eprint/15525
Item ID: 15525
Additional Information: Includes bibliographical references (pages 133-155).
Keywords: generative adversarial networks, visual synthesis, deep learning, talking face, attention mechanism
Department(s): Science, Faculty of > Computer Science
Date: May 2022
Date Type: Submission
Digital Object Identifier (DOI): https://doi.org/10.48336/5WM1-R081
Library of Congress Subject Heading: Deep learning (Machine learning); Generative art; Neural networks (Computer science); Image processing; Image processing--Digital techniques.
