Generative Adversarial Networks

Juan Vera

March 2025

Abstract

just a whiteboard, dm me if errors

Preliminaries.

A new framework for estimating generative models by training two networks: a generative network that captures the distribution of the data, and a discriminative network that estimates the probability that a given sample came from the training data rather than from the generator.

The goal of the generator is to maximize the probability of the discriminator making a mistake; essentially, the generator $G$ and the discriminator $D$ play a minimax game where one attempts to maximize the objective the other is trained to minimize.

Adversarial Nets.

The generator $G$ takes in a random noise variable $z$.

The discriminator $D$ takes in $x$, a vector of data (real or synthetic), and outputs a single probability. $D$ is trained to maximize the probability of assigning the correct label: classifying samples drawn from the training data as real and outputs of $G$ as synthetic.
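To make this setup concrete, here is a minimal sketch (assuming PyTorch, with arbitrary toy dimensions and layer sizes) of $G$ mapping noise $z$ to a data-shaped sample and $D$ mapping a sample $x$ to a probability in $(0, 1)$:

```python
import torch
import torch.nn as nn

noise_dim, data_dim = 16, 2  # hypothetical dimensions, for illustration only

# G: noise vector z -> synthetic data sample
G = nn.Sequential(
    nn.Linear(noise_dim, 32), nn.ReLU(),
    nn.Linear(32, data_dim),
)

# D: data sample x -> probability that the sample is real
D = nn.Sequential(
    nn.Linear(data_dim, 32), nn.ReLU(),
    nn.Linear(32, 1), nn.Sigmoid(),
)

z = torch.randn(4, noise_dim)  # minibatch of noise samples
x_fake = G(z)                  # synthetic samples, shape (4, data_dim)
p_real = D(x_fake)             # D's belief that each sample is real, in (0, 1)
```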

$G$ is trained to minimize $\log\left(1 - D(G(z))\right)$, i.e., to push $D(\cdot)$ toward classifying the synthetic outputs of $G(\cdot)$ as if they were real data.

The entire objective function is constructed as:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]$$

Note that $V(D, G) \in [-\infty, 0]$

where we want to find the $G$ which minimizes the value function $V$ but the $D$ which maximizes $V$.
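In practice the two expectations are estimated with minibatch averages. A small sketch of the empirical estimate of $V(D, G)$ (the discriminator outputs below are made-up values, not from a trained model):

```python
import torch

# hypothetical discriminator outputs on a minibatch of m = 4 real samples
# and m = 4 generated samples
d_real = torch.tensor([0.90, 0.80, 0.95, 0.85])  # D(x_i),    x_i ~ p_data
d_fake = torch.tensor([0.10, 0.20, 0.05, 0.15])  # D(G(z_i)), z_i ~ p_z

# minibatch estimate of V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))]
V = torch.log(d_real).mean() + torch.log(1.0 - d_fake).mean()
print(V.item())  # ~ -0.27; approaches 0 (the maximum) as D classifies both sets perfectly
```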

We're essentially playing a minimax game, where $D$ is trained to maximize the value $V$, which essentially translates to $D$ being an optimal binary classifier between synthetic and non-synthetic data.

This is because if $D$ is optimal, then $D(x) = 1$ for real data so $\log(D(x)) = 0$, and $D(G(z)) = 0$ for synthetic data so $\log(1 - D(G(z))) = \log(1) = 0$, and finally $\max V(\cdot) = 0$

When we attempt to find the $G$ which satisfies $\min V(\cdot)$, we're trying to find the $G$ which pushes $D(G(z)) \rightarrow 1$, such that $\log(D(x)) + \log(1 - D(G(z))) \rightarrow -\infty$ as the second term goes to $\log(0)$

Reminder that $D(x)$ is assumed to be a classifier where $D(x) \rightarrow 1$ indicates that the input is real data while $D(x) \rightarrow 0$ indicates that the input is synthetic data

Then as we try to find the $G$ which satisfies $\min_G V(D, G)$ while we try to find the $D$ which satisfies $\max_D V(D, G)$, we're essentially playing what's called a zero-sum game, where the gain of the generator is equal to the loss of the discriminator (or vice versa).
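A quick numeric check of the limiting cases above, using a single-sample version of $V$ (plain Python, with a small clamp standing in for $\log(0)$):

```python
import math

def v(d_real, d_fake, eps=1e-12):
    # single-sample V(D, G) = log D(x) + log(1 - D(G(z)))
    return math.log(max(d_real, eps)) + math.log(max(1.0 - d_fake, eps))

# optimal D: confident and correct on both inputs -> V hits its maximum of 0
print(v(d_real=1.0, d_fake=0.0))    # 0.0

# G fools D: D(G(z)) -> 1 drives the second term, and hence V, toward -inf
print(v(d_real=1.0, d_fake=0.999))  # ~ -6.9
print(v(d_real=1.0, d_fake=1.0))    # ~ -27.6 (clamped stand-in for -inf)
```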

Training.

For each training iteration, take $k$ discriminator steps: sample a minibatch of $m$ data samples and $m$ noise samples, and update the discriminator's parameters via gradient ascent on

$$\nabla_{\theta_d} \frac{1}{m} \sum_{i = 1}^{m} \left[ \log D(x_i) + \log\left(1 - D(G(z_i))\right) \right]$$

while you update the generator by updating its parameters via gradient descent on

$$\nabla_{\theta_g} \frac{1}{m} \sum_{i = 1}^{m} \log\left(1 - D(G(z_i))\right)$$

all via the update rule,

$$\theta^{(q)}_i = \theta_i^{(q)} \pm \alpha \cdot \nabla_{\theta^{(q)}} V(D, G)$$

where $q \in \{g, d\}$, with $+$ if $q = d$ (gradient ascent on $V$) and $-$ if $q = g$ (gradient descent on $V$).
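Putting the pieces together, a minimal training-loop sketch (assuming PyTorch, a toy 1-D Gaussian as $p_{\text{data}}$, and arbitrary hyperparameters) that applies the $\pm \alpha \nabla$ update rule above directly to the minibatch objective:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
noise_dim, data_dim = 8, 1
m, alpha, k = 64, 0.05, 1          # minibatch size, learning rate, D steps -- toy values

G = nn.Sequential(nn.Linear(noise_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())

def sample_data():
    return torch.randn(m, data_dim) * 0.5 + 2.0          # toy p_data: N(2, 0.5^2)

def minibatch_V():
    x, z = sample_data(), torch.randn(m, noise_dim)
    eps = 1e-8                                            # avoid log(0)
    # (1/m) * sum_i [ log D(x_i) + log(1 - D(G(z_i))) ]
    return (torch.log(D(x) + eps) + torch.log(1 - D(G(z)) + eps)).mean()

for iteration in range(500):
    # k discriminator steps: gradient ASCENT on V w.r.t. theta_d
    for _ in range(k):
        V = minibatch_V()
        D.zero_grad(); G.zero_grad()
        V.backward()
        with torch.no_grad():
            for p in D.parameters():
                p += alpha * p.grad        # theta_d <- theta_d + alpha * grad

    # one generator step: gradient DESCENT on V w.r.t. theta_g
    V = minibatch_V()
    D.zero_grad(); G.zero_grad()
    V.backward()
    with torch.no_grad():
        for p in G.parameters():
            p -= alpha * p.grad            # theta_g <- theta_g - alpha * grad
```

Note the generator's gradient only flows through the $\log(1 - D(G(z)))$ term, so this matches the $\nabla_{\theta_g}$ expression above; the original paper also suggests having $G$ maximize $\log D(G(z))$ early in training, since $\log(1 - D(G(z)))$ saturates when $D$ confidently rejects the samples.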

The GAN is optimally trained when the discriminator can't distinguish the real data from the synthetic data -- this happens as we're training $G$ via gradient descent on $\log(1 - D(G(z)))$, which "confuses" or "fools" the discriminator so that it can no longer optimally distinguish between the two. At this optimum, $D(x) = \frac{1}{2}$ everywhere.