Generative Models: What do they know?
Do they know things? Let's find out!

1Toyota Technological Institute at Chicago, 2Adobe
Teaser

INTRINSIC LoRA (I-LoRA) uncovers the hidden capabilities of generative models like VQGAN, StyleGAN-XL, StyleGAN-v2, and Stable Diffusion. I-LoRA modulates key feature maps to extract intrinsic scene properties such as normals, depth, albedo, and shading, using the models' existing decoders without additional layers, revealing their deep understanding of scene intrinsics.

Abstract

Generative models have been shown to be capable of synthesizing highly detailed and realistic images. It is natural to suspect that they implicitly learn to model image intrinsics such as surface normals, depth, or shadows. In this paper, we present compelling evidence that generative models indeed internally produce high-quality scene intrinsic maps. We introduce INTRINSIC LoRA (I-LoRA), a universal, plug-and-play approach that transforms any generative model into a scene intrinsic predictor, extracting intrinsic maps directly from the original generator network without additional decoders and without fully fine-tuning the original network. Our method applies Low-Rank Adaptation (LoRA) to key feature maps, with newly learned parameters that make up less than 0.6% of the total parameters in the generative model. Optimized with a small set of labeled images, our model-agnostic approach adapts to various generative architectures, including diffusion models, GANs, and autoregressive models. We show that the scene intrinsic maps produced by our method compare well with, and in some cases surpass, those generated by leading supervised techniques.
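The abstract describes the core mechanism: freezing the generator and learning a low-rank update to key feature-map projections, so that the trainable parameters are a tiny fraction of the model. A minimal sketch of that idea, in PyTorch, is below. The layer names, rank, and scaling are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal LoRA sketch: wrap a frozen linear projection with a trainable
# low-rank update y = Wx + (alpha / r) * B A x. This is the general LoRA
# recipe the paper builds on; rank r and alpha here are assumed values.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # original generator weights stay frozen
        # A is small-random, B is zero-initialized, so the wrapped layer
        # starts out exactly equal to the frozen base layer.
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

layer = LoRALinear(nn.Linear(64, 64), r=4)
x = torch.randn(2, 64)
out = layer(x)

# Only A and B are trainable -- a small fraction of the layer's parameters.
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
```

Because B starts at zero, the adapted model initially reproduces the frozen generator exactly; training a small labeled set then steers the low-rank update toward emitting an intrinsic map instead of an RGB image.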

Summary of scene intrinsic extraction capabilities (surface normals, depth, albedo, shading) across different generative models, without changing the generator head. Each model-intrinsic pair is rated on a three-level scale: intrinsics can be extracted with high quality, with medium quality, or cannot be extracted.

Model                   Pretrain Type    Domain
VQGAN                   Autoregressive   FFHQ
StyleGAN-v2             GAN              FFHQ
StyleGAN-v2             GAN              LSUN Bed
StyleGAN-XL             GAN              FFHQ
StyleGAN-XL             GAN              ImageNet
Stable Diffusion-UNet   Diffusion        Open
Stable Diffusion        Diffusion        Open


Figure: Comparison of intrinsic maps (surface normals, depth, albedo, shading) generated by our method using augmented Stable Diffusion 2.1 against the pseudo ground truth from Omnidata-v2, ZoeDepth, and Paradigms.

BibTeX

@article{du2023generative,
  author    = {Du, Xiaodan and Kolkin, Nicholas and Shakhnarovich, Greg and Bhattad, Anand},
  title     = {Generative Models: What do they know? Do they know things? Let's find out!},
  journal   = {arXiv},
  year      = {2023},
}