Generative Models: What do they know? Do they know things? Let's find out!

(Previous Title: Intrinsic LoRA: A Generalist Approach for Discovering Knowledge in Generative Models)

Xiaodan Du, Nicholas Kolkin, Greg Shakhnarovich, Anand Bhattad

¹Toyota Technological Institute at Chicago, ²Adobe

Abstract

Generative models excel at mimicking real scenes, suggesting they might inherently encode important intrinsic scene properties. In this paper, we aim to explore the following key questions: (1) What intrinsic knowledge do generative models like GANs, Autoregressive models, and Diffusion models encode? (2) Can we establish a general framework to recover intrinsic representations from these models, regardless of their architecture or model type? (3) How minimal can the required learnable parameters and labeled data be to successfully recover this knowledge? (4) Is there a direct link between the quality of a generative model and the accuracy of the recovered scene intrinsics?

Our findings indicate that small Low-Rank Adaptation (LoRA) modules can recover intrinsic images (depth, normals, albedo, and shading) across different generators (autoregressive models, GANs, and diffusion models) while using the same decoder head that generates the image. Because LoRA is lightweight, we introduce very few learnable parameters (as few as 0.04% of the Stable Diffusion model weights for a rank of 2), and we find that as few as 250 labeled images are enough to generate intrinsic images with these LoRA modules. Finally, through controlled experiments, we also show a positive correlation between a generative model's quality and the accuracy of the recovered intrinsics.
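To make the parameter budget concrete, here is a minimal sketch of a low-rank adapter on a single linear layer. It assumes the standard LoRA formulation (a frozen weight W plus a trainable update BA scaled by alpha/rank); the class name `LoRALinear`, the 320-dimensional layer, and the initialization scales are hypothetical choices for illustration, not the authors' implementation.

```python
import random


class LoRALinear:
    """Frozen linear map W plus a trainable low-rank update B @ A (illustrative)."""

    def __init__(self, d_in, d_out, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        # Frozen "pretrained" weight of shape (d_out, d_in); a random stand-in here.
        self.W = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
        # Trainable factors: A is (rank, d_in) with small random values,
        # B is (d_out, rank) and starts at zero so the adapter is initially a no-op.
        self.A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def base_forward(self, x):
        # Output of the frozen layer alone: W x.
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.W]

    def forward(self, x):
        # Adapted output: W x + scale * B (A x).
        down = [sum(a * xi for a, xi in zip(row, x)) for row in self.A]
        up = [sum(b * d for b, d in zip(row, down)) for row in self.B]
        return [y + self.scale * u for y, u in zip(self.base_forward(x), up)]

    def trainable_fraction(self):
        # Adapter parameters relative to the frozen weight of this one layer.
        d_out, d_in = len(self.W), len(self.W[0])
        rank = len(self.A)
        return rank * (d_in + d_out) / (d_in * d_out)


layer = LoRALinear(d_in=320, d_out=320, rank=2)
x = [1.0] * 320
print(layer.forward(x) == layer.base_forward(x))
print(f"trainable fraction for this layer: {layer.trainable_fraction():.4%}")
```

Because B starts at zero, the adapted layer initially reproduces the frozen layer exactly, and training only moves the two small factors. The per-layer fraction printed here is not the 0.04% figure quoted above, which is measured against the full Stable Diffusion weights.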

Summary of scene intrinsic extraction capabilities across different generative models without changing the generator head.
✓: Intrinsics can be extracted with high quality. ～: Intrinsics can be extracted with medium quality. ✗: Intrinsics cannot be extracted.
Model	Pretrain Type	Domain	Normal	Depth	Albedo	Shading
VQGAN	Autoregressive	FFHQ	～	～	✓	✓
StyleGAN-v2	GAN	FFHQ	✓	～	✓	✓
StyleGAN-v2	GAN	LSUN Bed	✓	✓	✓	✓
StyleGAN-XL	GAN	FFHQ	✓	～	✓	✓
StyleGAN-XL	GAN	ImageNet	✗	✗	✗	✗
Stable Diffusion-UNet	Diffusion	Open	✓	✓	✓	✓
Stable Diffusion	Diffusion	Open	✓	✓	✓	✓


[Figure: qualitative comparison of extracted intrinsics. Columns: Image; Surface Normals (Omnidata-v2, Ours); Depth (ZoeDepth, Ours); Albedo (Paradigms, Ours); Shading (Paradigms, Ours).]

BibTeX

@article{du2023generative,
  title={Generative Models: What do they know? Do they know things? Let's find out!},
  author={Du, Xiaodan and Kolkin, Nicholas and Shakhnarovich, Greg and Bhattad, Anand},
  journal={arXiv preprint arXiv:2311.17137},
  year={2023}
}
