LatentSwap3D: Semantic Edits on 3D Image GANs

Enis Simsar, Alessio Tonioni, Evin Pınar Örnek, Federico Tombari
Technical University of Munich · Google Switzerland
Teaser videos: MVCGAN smiling attribute editing · EG3D wearing-eyeglasses attribute editing · pi-GAN smiling attribute editing

Abstract

Recent 3D-aware GANs rely on volumetric rendering techniques to disentangle the pose and appearance of objects, de facto generating entire 3D volumes rather than single-view 2D images from a latent code. Complex image editing tasks can be performed in standard 2D-based GANs (e.g., StyleGAN models) as manipulations of latent dimensions. However, to the best of our knowledge, similar properties have only been partially explored for 3D-aware GAN models. This work aims to fill this gap by showing the limitations of existing methods and proposing LatentSwap3D, a model-agnostic approach designed to enable attribute editing in the latent space of pre-trained 3D-aware GANs. We first identify the most relevant dimensions in the latent space of the model controlling the targeted attribute by relying on the feature importance ranking of a random forest classifier. Then, to apply the transformation, we swap the top-K most relevant latent dimensions of the image being edited with those of an image exhibiting the desired attribute. Despite its simplicity, LatentSwap3D provides remarkable semantic edits in a disentangled manner and outperforms alternative approaches both qualitatively and quantitatively. We demonstrate our semantic edit approach on various 3D-aware generative models such as pi-GAN, GIRAFFE, StyleSDF, MVCGAN, EG3D, and VolumeGAN, and on diverse datasets such as FFHQ, AFHQ, Cats, MetFaces, and CompCars.

MVCGAN Result
Figure 1: Given an image, we invert it into the latent space of an MVCGAN model pre-trained on FFHQ, enabling novel view synthesis. We then use LatentSwap3D to perform attribute editing. Rows two and three compare LatentSwap3D and StyleFlow on this task.

LatentSwap3D Framework

We aim to build a model-agnostic method that works on any 3D-aware image generator. Our method, LatentSwap3D, consists of two main components. The first identifies the dimensions in the latent space of a 3D GAN that control the desired attribute, using the feature importance ranking of a random forest classifier. The second manipulates the target attribute in an identity-preserving manner through feature swapping.
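For concreteness, the ranking step can be sketched in a few lines of Python with scikit-learn. The variable names and hyperparameters below are illustrative assumptions, not the paper's exact configuration; the binary labels are assumed to come from an off-the-shelf 2D attribute classifier run on the rendered images.

```python
# Sketch of the ranking step: train a random forest on latent codes and
# rank latent dimensions by their importance for the target attribute.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def rank_latent_dimensions(latents: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """latents: (N, D) latent codes; labels: (N,) 0/1 attribute labels.

    Returns latent dimension indices sorted from most to least important.
    """
    forest = RandomForestClassifier(n_estimators=100, random_state=0)
    forest.fit(latents, labels)  # predict presence/absence of the attribute
    # feature_importances_ is scikit-learn's mean-decrease-in-impurity score.
    return np.argsort(forest.feature_importances_)[::-1]
```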

Framework
Figure 2: (a) We propose to train a random forest classifier taking latent codes \( s_i \) to predict the presence/absence of a desired attribute. We use the trained forest to rank the importance of the dimensions of \( s_i \) with respect to the desired attribute. (b) Given the latent code \( s \) of an image, we first find the closest latent code in the support set exhibiting the desired attribute (e.g., \( s^+ \) to increase blondness), then swap the top \( K \) attribute-related dimensions to generate an edited latent code \( \hat{s} \) that can be decoded into an edited image.
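The swapping step in Fig. 2(b) admits an equally short sketch. Here `support` holds latent codes of images exhibiting the desired attribute, and the L2 nearest-neighbor search is our assumption for what "closest" means.

```python
# Sketch of the swapping step: replace the top-K attribute-related
# dimensions of s with those of its nearest code in the support set.
import numpy as np

def latent_swap(s: np.ndarray, support: np.ndarray,
                ranking: np.ndarray, k: int) -> np.ndarray:
    """s: (D,) code to edit; support: (M, D) codes with the attribute."""
    # Closest support code under L2 distance (e.g., s+ in Fig. 2).
    s_plus = support[np.argmin(np.linalg.norm(support - s, axis=1))]
    s_hat = s.copy()
    top_k = ranking[:k]           # indices from rank_latent_dimensions()
    s_hat[top_k] = s_plus[top_k]  # transfer only attribute-relevant dimensions
    return s_hat
```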

LatentSwap3D on Other 3D-aware Generators

GIRAFFE

GIRAFFE combines a NeRF with a 2D GAN: the NeRF outputs features encoding 3D shape and texture, while the 2D GAN renders the final image. Figure 3 shows smiling and wearing-eyeglasses edits from LatentSwap3D on the GIRAFFE - FFHQ model, following the same protocol detailed for the other FFHQ-trained generators.

GIRAFFE Result
Figure 3: LatentSwap3D on GIRAFFE - FFHQ.

To test how well LatentSwap3D generalizes to different datasets, we extended the experiment to CompCars using the pre-trained GIRAFFE generator. Since off-the-shelf classifiers for car attributes are not available, as a proof of concept we trained a ResNet-50 from scratch on the Myauto.ge Cars Dataset to classify the color of a car. As seen in Fig. 4, with this classifier our approach can successfully edit the color of the cars; a training sketch follows below.
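A rough sketch of how such a color classifier could be trained with torchvision is given below; the dataset path, number of color classes, and hyperparameters are hypothetical, and only a single training epoch is shown.

```python
# Sketch: train a ResNet-50 from scratch to classify car color.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

NUM_COLORS = 10  # hypothetical number of color classes in the dataset
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
# Assumed ImageFolder layout: one sub-directory per color class.
dataset = datasets.ImageFolder("myauto_cars/train", transform=transform)
loader = torch.utils.data.DataLoader(dataset, batch_size=64, shuffle=True)

model = models.resnet50(weights=None)  # from scratch, no pre-training
model.fc = nn.Linear(model.fc.in_features, NUM_COLORS)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for images, targets in loader:  # one epoch shown for brevity
    optimizer.zero_grad()
    loss = criterion(model(images), targets)
    loss.backward()
    optimizer.step()
```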

GIRAFFE Result
Figure 4: LatentSwap3D on GIRAFFE - CompCars.

StyleNeRF

StyleNeRF is another high-resolution 3D-aware generative model; it integrates a NeRF into a 2D style-based generator and can synthesize high-resolution, 3D-consistent images and shapes from unstructured 2D images. Figure 5 shows our attribute edits, e.g., smiling, removing bangs, and changing the hair color to blond, on StyleNeRF - FFHQ.

StyleNeRF Result
Figure 5: LatentSwap3D on StyleNeRF - FFHQ.

VolumeGAN

VolumeGAN is a high-quality NeRF-based 3D-aware generative model explicitly trained to learn separate structural and textural representations. The results of our approach on VolumeGAN - FFHQ are provided in Fig. 6. Our approach applies the desired edits, e.g., removing eyeglasses, changing the hair color, and reducing facial hair, in the latent space of VolumeGAN without changing the identity of the input face.

VolumeGAN Result
Figure 6: LatentSwap3D on VolumeGAN - FFHQ.

LatentSwap3D on StyleGAN2

LatentSwap3D is not limited to 3D-aware GANs; it also works on image-based GANs such as StyleGAN2, see Fig. 7. First, applying the same procedure as in Fig. 2(a), we identify the dimensions of the style space of StyleGAN2 that are most important for the desired attribute. Then, we swap these dimensions to generate the desired edits, as explained in Fig. 2(b); a sketch of the full pipeline follows below.
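Reusing rank_latent_dimensions() and latent_swap() from the sketches above, a hypothetical end-to-end driver for the StyleGAN2 case could look as follows; the `synthesize` callable stands in for a StyleGAN2 style-space decoder and is an assumption, not a real library call.

```python
# Hypothetical driver: edit one style code toward the target attribute.
import numpy as np

def edit_image(s: np.ndarray, latents: np.ndarray, labels: np.ndarray,
               synthesize, k: int = 20):
    """s: style code to edit; latents/labels: labeled style-space samples."""
    ranking = rank_latent_dimensions(latents, labels)  # Fig. 2(a)
    support = latents[labels == 1]                     # codes with the attribute
    s_hat = latent_swap(s, support, ranking, k)        # Fig. 2(b)
    return synthesize(s_hat)                           # assumed decoder call
```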

StyleGAN2
Figure 7: LatentSwap3D on StyleGAN2 - FFHQ, MetFaces, AFHQ Cats and AFHQ Dogs.

BibTeX

@misc{simsar2022latentswap3d,
  doi       = {10.48550/ARXIV.2212.01381},
  url       = {https://arxiv.org/abs/2212.01381},
  author    = {Simsar, Enis and Tonioni, Alessio and Örnek, Evin Pınar and Tombari, Federico},
  keywords  = {Computer Vision and Pattern Recognition (cs.CV), FOS: Computer and information sciences},
  title     = {LatentSwap3D: Semantic Edits on 3D Image GANs},
  publisher = {arXiv},
  year      = {2022},
  copyright = {Creative Commons Attribution 4.0 International}
}