Publications

Fantastic Style Channels and Where to Find Them: A Submodular Framework for Discovering Diverse Directions in GANs

Published as an arXiv preprint, 2022

The discovery of interpretable directions in the latent spaces of pre-trained GAN models has recently become a popular topic. In particular, StyleGAN2 has enabled various image generation and manipulation tasks due to its rich and disentangled latent spaces. The discovery of such directions is typically done either in a supervised manner, which requires annotated data for each desired manipulation, or in an unsupervised manner, which requires manual effort to identify the directions. As a result, existing work typically finds only a handful of directions in which controllable edits can be made. In this study, we design a novel submodular framework that finds the most representative and diverse subset of directions in the latent space of StyleGAN2. Our approach takes advantage of the latent space of channel-wise style parameters, the so-called stylespace, in which we cluster channels that perform similar manipulations into groups. Our framework promotes diversity by using the notion of clusters and can be efficiently solved with a greedy optimization scheme. We evaluate our framework with qualitative and quantitative experiments and show that our method finds more diverse and disentangled directions. Our project page can be found at http://catlab-team.github.io/fantasticstyles.
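The greedy scheme mentioned above can be illustrated with a minimal sketch. The objective below (a concave-over-cluster-counts function) is a standard monotone submodular diversity surrogate chosen for illustration, not the paper's exact formulation; the cluster assignments are assumed given.

```python
import numpy as np

def greedy_select(clusters, k):
    """Greedily pick k candidate directions so that the selection
    covers many different clusters.

    clusters: length-n list of cluster ids, one per candidate direction.
    Objective (illustrative): sum over clusters of sqrt(#selected in
    that cluster), which is monotone submodular, so greedy selection
    enjoys the classic (1 - 1/e) approximation guarantee.
    """
    counts = {}          # cluster id -> how many selected so far
    selected = []
    remaining = list(range(len(clusters)))

    def marginal_gain(i):
        n = counts.get(clusters[i], 0)
        return np.sqrt(n + 1) - np.sqrt(n)   # diminishing returns

    for _ in range(k):
        best = max(remaining, key=marginal_gain)
        selected.append(best)
        counts[clusters[best]] = counts.get(clusters[best], 0) + 1
        remaining.remove(best)
    return selected
```

Because the gain of adding a second direction from an already-covered cluster is strictly smaller than opening a new cluster, the greedy loop naturally spreads its picks across clusters.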

Project Page

Object-aware Monocular Depth Prediction with Instance Convolutions

Published in IEEE Robotics and Automation Letters, 2022

With the advent of deep learning, estimating depth from a single RGB image has recently received a lot of attention, being capable of empowering many different applications ranging from path planning for robotics to computational cinematography. Nevertheless, while the depth maps are in their entirety fairly reliable, the estimates around object discontinuities are still far from satisfactory. This can be attributed to the fact that the convolutional operator naturally aggregates features across object discontinuities, resulting in smooth transitions rather than clear boundaries. Therefore, in order to circumvent this issue, we propose a novel convolutional operator which is explicitly tailored to avoid feature aggregation across different object parts. In particular, our method is based on estimating per-part depth values by means of superpixels. The proposed convolutional operator, which we dub “Instance Convolution”, then only considers each object part individually on the basis of the estimated superpixels. Our evaluation on the NYUv2 and iBims datasets clearly demonstrates the superiority of Instance Convolutions over classical convolutions at estimating depth around occlusion boundaries, while producing comparable results elsewhere.
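The idea of restricting aggregation to a single object part can be sketched as follows. This is an illustrative re-implementation of the masking principle, not the authors' code: each output pixel only averages over kernel neighbours that share its superpixel id, with the weights renormalised so partial neighbourhoods keep a comparable scale.

```python
import numpy as np

def instance_conv2d(feat, seg, kernel):
    """Toy single-channel 'Instance Convolution'.

    feat:   (H, W) feature map
    seg:    (H, W) integer superpixel ids
    kernel: (kh, kw) convolution weights
    """
    H, W = feat.shape
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    out = np.zeros_like(feat, dtype=float)
    for y in range(H):
        for x in range(W):
            acc, wsum = 0.0, 0.0
            for dy in range(-ph, ph + 1):
                for dx in range(-pw, pw + 1):
                    yy, xx = y + dy, x + dx
                    # only aggregate neighbours in the SAME superpixel
                    if 0 <= yy < H and 0 <= xx < W and seg[yy, xx] == seg[y, x]:
                        w = kernel[dy + ph, dx + pw]
                        acc += w * feat[yy, xx]
                        wsum += abs(w)
            # renormalise by the weight mass actually used
            out[y, x] = acc / wsum if wsum > 0 else 0.0
    return out
```

On a feature map with a sharp step between two superpixels, this operator preserves the step exactly, whereas a classical box filter would smear it across the boundary.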

Project Page

LatentCLR: A Contrastive Learning Approach for Unsupervised Discovery of Interpretable Directions

Published in International Conference on Computer Vision (ICCV), 2021

Recent research has shown great potential for finding interpretable directions in the latent spaces of pre-trained Generative Adversarial Networks (GANs). These directions provide controllable generation and support a wide range of semantic editing operations such as zoom or rotation. The discovery of such directions is often performed in a supervised or semi-supervised fashion and requires manual annotations, limiting their applications in practice. In comparison, unsupervised discovery enables finding subtle directions that are hard to recognize a priori. In this work, we propose a contrastive-learning-based approach for discovering semantic directions in the latent space of pre-trained GANs in a self-supervised manner. Our approach finds semantically meaningful dimensions compatible with state-of-the-art methods.
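The contrastive idea can be sketched with a small NT-Xent-style objective. This is an illustrative toy in the spirit of the approach, not the paper's exact formulation: feature changes produced by the same candidate direction (across different latent codes) are treated as positives, and changes produced by different directions as negatives.

```python
import numpy as np

def direction_contrastive_loss(deltas, temperature=0.5):
    """Toy contrastive loss over edit-induced feature differences.

    deltas: (K, N, d) array — feature differences from applying each of
    K candidate directions to N latent codes. Minimising this loss pushes
    each direction to cause a consistent, distinctive change.
    """
    K, N, d = deltas.shape
    z = deltas.reshape(K * N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit-normalise
    sim = (z @ z.T) / temperature                      # scaled cosine sims
    labels = np.repeat(np.arange(K), N)                # direction id per row
    loss = 0.0
    for i in range(K * N):
        logits = np.delete(sim[i], i)                  # drop self-similarity
        pos = np.delete(labels == labels[i], i)        # same-direction mask
        log_den = np.log(np.exp(logits).sum())
        loss += -(logits[pos] - log_den).mean()        # InfoNCE per row
    return loss / (K * N)
```

When each direction produces consistent changes across samples, positives have high similarity and the loss is low; directions whose effects overlap are penalised.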

Project Page

Graph2Pix: A Graph-Based Image to Image Translation Framework

Published in Advances in Image Manipulation (ICCV Workshop), 2021

In this paper, we propose a graph-based image-to-image translation framework for generating images. We use rich data collected from the popular creativity platform Artbreeder, where users interpolate multiple GAN-generated images to create artworks. This unique approach of creating new images leads to a tree-like structure where one can track historical data about the creation of a particular image. Inspired by this structure, we propose a novel graph-to-image translation model called Graph2Pix, which takes a graph and corresponding images as input and generates a single image as output. Our experiments show that Graph2Pix is able to outperform several image-to-image translation frameworks on benchmark metrics, including LPIPS (with a 25% improvement) and human perception studies (n=60), where users preferred the images generated by our method 81.5% of the time.
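The tree-like creation history described above can be represented as a simple parent graph. The record format below is hypothetical, used only to illustrate how ancestor images could be gathered as multi-image context for a graph-to-image model:

```python
def build_lineage(records):
    """Build a child -> parents adjacency from (image_id, parent_ids)
    pairs (hypothetical Artbreeder-style records, for illustration)."""
    return {img: list(parents) for img, parents in records}

def ancestors(graph, img, depth):
    """Collect ancestors of `img` up to `depth` generations — the kind
    of multi-image input a graph-to-image model conditions on."""
    frontier, seen = [img], []
    for _ in range(depth):
        nxt = []
        for node in frontier:
            for parent in graph.get(node, []):
                if parent not in seen:
                    seen.append(parent)
                    nxt.append(parent)
        frontier = nxt
    return seen
```

For an image "c" interpolated from parents "a" and "b", where "b" itself came from "r", two generations of ancestry yield ["a", "b", "r"].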

Project Page

Dental enumeration and multiple treatment detection on panoramic X-rays using deep learning

Published in Scientific Reports (Nature), 2021

In this paper, a new powerful deep learning framework, named DENTECT, is developed to instantly detect five different dental treatment approaches and simultaneously number the dentition based on the FDI notation on panoramic X-ray images. This makes DENTECT the first system that focuses on the identification of multiple dental treatments, namely periapical lesion therapy, fillings, root canal treatment (RCT), surgical extraction, and conventional extraction, all of which are accurately located within their corresponding borders and tooth numbers. Although DENTECT is trained on only 1005 images, the annotations supplied by experts provide satisfactory results for both treatment and enumeration detection. The framework carries out enumeration with an average precision (AP) score of 89.4% and performs treatment identification with a 59.0% AP score. Clinically, DENTECT is a practical and adoptable tool that accelerates the process of treatment planning with a level of accuracy that could compete with that of dental clinicians.

Project Page