Controllable Image Generation Based on Causal Representation Learning


Newswise — A study led by researchers from Chongqing University, published in Frontiers of Information Technology & Electronic Engineering (Vol. 25, No. 1, 2024), addresses the problem that existing artificial intelligence methods often struggle to generate images that are both flexible and controllable because they ignore the causal relationships within images. The study develops a novel causal controllable image generation (CCIG) method, which combines causal representation learning with bi-directional generative adversarial networks.

The study reviews the latest advances in causal structure learning, causal representation learning, and controllable image generation. Causal structure learning methods are divided into score-based, constraint-based, functional causal model (FCM)-based, and continuous optimization-based approaches, with deep learning driving progress in the continuous optimization family. Causal representation learning combines structural causal models (SCMs) with representation learning and has been applied to tasks such as image classification. Existing causal methods for controllable image generation mostly rely on prior causal graphs or variational autoencoders (VAEs), whereas CCIG learns the causal structure automatically within an end-to-end framework, overcoming these limitations.

The CCIG framework consists of three main components: a causal structure learning (CSL) module, an image generation module (IGM), and a loss function. The CSL module learns the causal relationships among attributes with a graph autoencoder, optimizing the causal graph under a reconstruction loss and an acyclicity constraint. The IGM uses the encoder, generator, and joint discriminator (JointD) of a bi-directional generative adversarial network (GAN) to strengthen representation capability, and combines an attention mechanism with residual structures to improve image quality. Causal controllable image generation is achieved by jointly optimizing a generative loss, a supervised loss, and a causal structure loss.
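Acyclicity constraints of this kind are commonly enforced in continuous causal structure learning via a trace-exponential penalty on the weighted adjacency matrix (the NOTEARS formulation). The paper's code is not reproduced here; the sketch below is only an illustration of that standard constraint, and the function name and the 2×2 example matrices are assumptions, not the authors' implementation:

```python
import numpy as np
from scipy.linalg import expm


def acyclicity(A: np.ndarray) -> float:
    """NOTEARS-style penalty h(A) = tr(exp(A ∘ A)) - d.

    h(A) = 0 exactly when the weighted adjacency matrix A
    encodes a directed acyclic graph; h(A) > 0 otherwise.
    """
    d = A.shape[0]
    return float(np.trace(expm(A * A)) - d)  # A * A is the Hadamard square


# DAG with a single edge 0 -> 1: penalty is zero
A_dag = np.array([[0.0, 1.0],
                  [0.0, 0.0]])

# Cycle 0 -> 1 -> 0: penalty is strictly positive
A_cyc = np.array([[0.0, 1.0],
                  [1.0, 0.0]])
```

In practice such a penalty is added to the structure-learning objective (here, alongside the graph autoencoder's reconstruction loss) and driven toward zero during optimization, e.g. with an augmented Lagrangian.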

On the CelebA dataset, CCIG was compared with four methods: CausalGAN, CausalVAE, CMGAN, and DEAR. Subjectively, CCIG not only outperforms the other methods in the visual quality of generated images but also produces more plausible images. On objective metrics, the proposed method achieves superior FID and IS scores, indicating its advantage in generating realistic and diverse images, and it also achieves a higher level of control over the generated images. Ablation experiments show that the CSL module significantly improves causal rationality, while the attention mechanism and the JointD structure refine image details. The experimental results verify the effectiveness of CCIG in both causal modeling and image generation quality.
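For context, the FID metric cited above compares Gaussians fitted to Inception features of real and generated images. The following is a minimal numerical sketch of the metric's formula only, not the authors' evaluation pipeline; the function name and the toy statistics are assumptions for illustration:

```python
import numpy as np
from scipy.linalg import sqrtm


def frechet_inception_distance(mu1, cov1, mu2, cov2):
    """FID = ||mu1 - mu2||^2 + tr(C1 + C2 - 2 (C1 C2)^{1/2}).

    mu*, cov* are the mean and covariance of Inception features
    extracted from the real and generated image sets.
    """
    diff = mu1 - mu2
    covmean = sqrtm(cov1 @ cov2)
    if np.iscomplexobj(covmean):  # discard tiny imaginary parts from sqrtm
        covmean = covmean.real
    return float(diff @ diff + np.trace(cov1 + cov2 - 2.0 * covmean))


mu, cov = np.zeros(3), np.eye(3)
# Identical feature distributions give FID = 0;
# shifting the mean makes FID strictly positive.
```

Lower FID indicates that the generated distribution is closer to the real one, which is why it is paired with IS (which rewards both image quality and diversity) in the comparison above.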

The paper “Controllable image generation based on causal representation learning” was authored by Shanshan HUANG, Yuanhao WANG, Zhili GONG, Jun LIAO, Shu WANG, and Li LIU. Full text of the open-access paper: link
