Lu, Wanglong (2025) Generative models for semantic facial image editing: multimodal approaches. Doctoral (PhD) thesis, Memorial University of Newfoundland.
PDF (English) - Accepted Version (74MB)
Available under license: The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
Abstract
With the rapid development of digital imaging and machine learning, generative models for facial image manipulation have emerged as powerful tools with significant impact across domains ranging from entertainment to law enforcement. Despite major advances in generating natural-looking images, facial editing poses unique challenges, such as producing high-quality, detailed facial features while preserving identity, expression, and the integrity of facial structures. This thesis investigates the application of generative models to facial image manipulation, targeting four key tasks: unconditional global facial editing (face restoration), unconditional local facial editing (face inpainting), conditional facial editing (exemplar-guided facial inpainting), and multimodal face editing.

For the first task, traditional face restoration techniques typically miss finer facial details. We explore the use of latent representations as style prompts, using GANs and diffusion models to guide the restoration and improve image quality and detail. For the second task, existing image inpainting methods often depend on extensive training data, limiting their effectiveness in few-shot scenarios; we develop a GAN-based method that achieves high-quality results with small-scale data. For the third task, current methods usually require substantial professional skill to edit facial attributes such as identity, expression, and gender; we propose an exemplar-guided GAN framework that ensures a seamless blend between edited and unedited areas of the face. For the fourth task, current multimodal editing techniques can alter unedited background areas and rely heavily on manually annotated paired data; we introduce a novel GAN-based multimodal editing method that allows incremental editing of facial images and reduces reliance on manual annotations.

Together, these frameworks enhance the realism and applicability of facial image manipulation by addressing fidelity in restoration, data efficiency, exemplar-guided inpainting, and multimodal editing. Our contributions are fourfold:

- A novel framework for blind face restoration that leverages latent representations as style prompts to guide the restoration process, enhancing the fidelity and detail of facial images restored from degraded sources (see the style-prompt sketch after this list).
- A data-efficient generative model for facial image inpainting that achieves high-quality results on limited datasets, addressing data scarcity and overfitting in image inpainting.
- An interactive, exemplar-guided facial inpainting framework that enables users to manipulate facial features with high realism, facilitating user-driven customization in facial image editing (see the compositing sketch after this list).
- A multimodal facial image editing framework that integrates various types of inputs to achieve comprehensive, personalized facial edits, catering to the diverse needs of digital content editing.
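To make the style-prompt idea concrete, here is a minimal PyTorch sketch, not the thesis code: a degraded face is encoded into a compact latent "prompt," which then modulates the per-channel statistics of a decoder block in the spirit of AdaIN/StyleGAN-style conditioning. All names here (`StylePromptEncoder`, `ModulatedBlock`, `prompt_dim`) are illustrative assumptions, and the architecture is deliberately reduced.

```python
# Hypothetical sketch of latent "style prompts" guiding restoration.
# Not the thesis implementation; names and sizes are assumptions.
import torch
import torch.nn as nn

class StylePromptEncoder(nn.Module):
    """Maps a degraded face image to a compact latent 'style prompt'."""
    def __init__(self, prompt_dim=512):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),          # global pooling -> (B, 128, 1, 1)
        )
        self.to_prompt = nn.Linear(128, prompt_dim)

    def forward(self, x):
        h = self.features(x).flatten(1)       # (B, 128)
        return self.to_prompt(h)              # (B, prompt_dim)

class ModulatedBlock(nn.Module):
    """Decoder block whose normalized features are scaled/shifted per channel
    by the prompt, in the spirit of AdaIN conditioning."""
    def __init__(self, channels, prompt_dim=512):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.affine = nn.Linear(prompt_dim, channels * 2)  # scale and shift

    def forward(self, x, prompt):
        scale, shift = self.affine(prompt).chunk(2, dim=1)
        h = self.norm(self.conv(x))
        return h * (1 + scale[:, :, None, None]) + shift[:, :, None, None]

# Usage: encode the degraded input once; the prompt steers each decoder block.
encoder = StylePromptEncoder()
block = ModulatedBlock(channels=64)
degraded = torch.randn(2, 3, 128, 128)        # fake batch of degraded faces
feats = torch.randn(2, 64, 32, 32)            # intermediate decoder features
prompt = encoder(degraded)
restored_feats = block(feats, prompt)
```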
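The "seamless blend between edited and unedited areas" from the exemplar-guided framework can be illustrated with simple mask-based compositing. The thesis method is certainly more involved; this hypothetical sketch only shows the basic guarantee such designs aim for, namely that pixels outside the edit mask are preserved exactly.

```python
# Hypothetical sketch of mask-based compositing: only the masked region is
# replaced by generated content, so unedited areas are preserved exactly.
import torch

def composite(original, generated, mask):
    """mask is 1 inside the edit region, 0 outside; shape (B, 1, H, W)."""
    return mask * generated + (1.0 - mask) * original

original = torch.rand(1, 3, 256, 256)         # source face
generated = torch.rand(1, 3, 256, 256)        # generator output (stand-in)
mask = torch.zeros(1, 1, 256, 256)
mask[..., 96:160, 96:160] = 1.0               # e.g., a box around one region
edited = composite(original, generated, mask)
```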
| Item Type | Thesis (Doctoral (PhD)) |
|---|---|
| URI | http://research.library.mun.ca/id/eprint/16907 |
| Item ID | 16907 |
| Additional Information | Includes bibliographical references (pages 202-232) |
| Keywords | generative models, image editing, multimodalities |
| Department(s) | Science, Faculty of > Computer Science |
| Date | February 2025 |
| Date Type | Submission |
| Library of Congress Subject Heading | Face perception--Computer simulation; Image processing--Digital techniques; Machine learning; Neural networks (Computer science); Digital media--Editing |