Enabling Local Editing in Diffusion Models by Joint and Individual Component Analysis


1. Archimedes AI, Athena Research Center
2. National Technical University of Athens
3. Institute for Language and Speech Processing, Athena Research Center
4. Department of Informatics and Telecommunications, National and Kapodistrian University of Athens
5. Computation-based Science and Technology Research Center, The Cyprus Institute
[Teaser figure]

Given one or multiple regions of interest, we can identify latent directions that result in diverse semantic edits without affecting the rest of the image. Linear interpolation along the identified semantic directions leads to gradual changes in the generated image, such as opening and closing the eyes.

Abstract

Recent advances in Diffusion Models (DMs) have led to significant progress in visual synthesis and editing tasks, establishing them as a strong competitor to Generative Adversarial Networks (GANs). However, the latent space of DMs is not as well understood as that of GANs. Recent research has focused on unsupervised semantic discovery in the latent space of DMs by leveraging the bottleneck layer of the denoising network, which has been shown to exhibit properties of a semantic latent space. However, these approaches are limited to discovering global attributes. In this paper, we address the challenge of local image manipulation in DMs and introduce an unsupervised method to factorize the latent semantics learned by the denoising network of pre-trained DMs. Given an arbitrary image and defined regions of interest, we utilize the Jacobian of the denoising network to establish a relation between the regions of interest and their corresponding subspaces in the latent space. Furthermore, we disentangle the joint and individual components of these subspaces to identify latent directions that enable local image manipulation. Once discovered, these directions can be applied to different images to produce semantically consistent edits, making our method suitable for practical applications. Experimental results on various datasets demonstrate that our method produces semantic edits that are more localized and of higher fidelity than the state-of-the-art. Our code will be made available upon acceptance.
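To make the editing step concrete, below is a minimal sketch (our illustration, not the authors' released code) of applying a discovered direction to the bottleneck ("h-space") activations of a diffusers-style denoising UNet during sampling. The names `unet`, `scheduler`, `x_T`, `v`, and the use of `mid_block` as the injection point are assumptions; the exact setup in the paper may differ.

```python
# Minimal sketch (assumed interfaces, not the authors' code): shift the UNet's
# bottleneck ("h-space") activations along a discovered direction v while sampling.
import torch

def edit_with_direction(unet, scheduler, x_T, v, scale=2.0):
    """DDIM-style sampling with the bottleneck features shifted by scale * v."""
    def shift_hook(module, inputs, output):
        # Returning a tensor from a forward hook replaces the module's output.
        return output + scale * v.to(output.device, output.dtype)

    handle = unet.mid_block.register_forward_hook(shift_hook)  # assumed bottleneck module
    x = x_T
    try:
        for t in scheduler.timesteps:       # scheduler.set_timesteps(...) called beforehand
            with torch.no_grad():
                eps = unet(x, t).sample     # noise prediction with shifted h-space
            x = scheduler.step(eps, t, x).prev_sample
    finally:
        handle.remove()                     # always restore the unmodified UNet
    return x
```

Sweeping `scale` (e.g. over `torch.linspace(-3, 3, 7)`) corresponds to the linear interpolation along a semantic direction illustrated in the teaser figure.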



Method



[Method overview figure]


  • Firstly, we utilize the row space of the Jacobian \( J_\mathcal{H} \) restricted to each specified region, obtained via its Singular Value Decomposition (SVD).

  • The row space \( \text{row}(J_\mathcal{H}) \) is spanned by directions that manipulate the attributes in each region of interest.

  • To achieve local manipulation, we propose decomposing the Jacobian associated with each region of interest into two distinct components: a joint and an individual component.

  • The row space of the joint component comprises latent directions that induce global changes across the entire image.

  • In contrast, the row space of the individual component, which is orthogonal to the joint one, is spanned by latent directions that specifically target a designated region of interest without influencing other regions.

  • To obtain this decomposition, we utilize the so-called Joint and Individual Variation Explained (JIVE) method; a simplified sketch of both steps is given below.
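As a rough illustration of the two steps above (per-region Jacobian plus SVD, then the joint/individual split), the following sketch assumes a callable `decode_fn` that maps flattened bottleneck features to the flattened output image, together with boolean pixel masks for the regions of interest. The joint subspace is estimated here with a single SVD over the stacked row-space bases, a simplified, JIVE-flavoured stand-in for the full JIVE algorithm used in the paper; `rank` and `joint_rank` are illustrative hyperparameters.

```python
# Minimal sketch (our illustration): region-restricted Jacobians and a simplified
# joint/individual split. `decode_fn`, `masks`, `rank`, `joint_rank` are assumptions.
import torch

def region_jacobians(decode_fn, h, masks):
    """One Jacobian per region: rows = masked output pixels, cols = dim(h)."""
    J = torch.autograd.functional.jacobian(decode_fn, h)   # (out_dim, h_dim)
    return [J[m] for m in masks]                            # list of (n_i, h_dim)

def joint_and_individual_directions(jacobians, rank=10, joint_rank=5):
    # Row space of each region's Jacobian (top right-singular vectors).
    bases = []
    for J in jacobians:
        _, _, Vh = torch.linalg.svd(J, full_matrices=False)
        bases.append(Vh[:rank])                              # (rank, h_dim)

    # Joint component: dominant shared directions of the stacked row-space bases
    # (a crude stand-in for the full JIVE estimate used in the paper).
    _, _, Vh = torch.linalg.svd(torch.cat(bases, dim=0), full_matrices=False)
    joint = Vh[:joint_rank]                                  # (joint_rank, h_dim)

    # Individual component per region: project the joint part out of each Jacobian,
    # then take the remaining row space -> localized edit directions.
    P_joint = joint.T @ joint                                # projector onto joint subspace
    individual = []
    for J in jacobians:
        J_ind = J - J @ P_joint
        _, _, Vh = torch.linalg.svd(J_ind, full_matrices=False)
        individual.append(Vh[:rank])
    return joint, individual
```

The rows of each entry of `individual` are the latent directions that edit only their corresponding region, while the rows of `joint` capture changes shared across the whole image.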

Localized Attribute Edits

Examples of localized edits on real images from the CelebA-HQ dataset.

Red Lips

Smile

Open-Close Mouth

Eye Color

Gaze

Cite Us


    @inproceedings{localdiff,
      title = {Enabling Local Editing in Diffusion Models by Joint and Individual Component Analysis},
      author = {Kouzelis, Theodoros and Plitsis, Manos and Nikolaou, Mihalis A. and Panagakis, Yiannis},
      booktitle = {BMVC},
      year = {2024},
    }
        