In this episode, we dive into one of the most fascinating architectures in deep learning: Generative Adversarial Networks (GANs).
🛰️ From Satellite Images to High-Resolution Aerial Views
Imagine you’re tasked with monitoring distribution centers for a gaming company using satellite imagery. Within the U.S., you could rely on traffic camera footage to track truck movements and gauge activity levels.
But what about areas outside the United States, where camera access is unavailable? Your best option is satellite imagery — but there’s a problem: resolution.
By law, most commercial satellite images are limited to roughly 30 cm per pixel, meaning even a small 2×2 pixel patch covers a 60 cm × 60 cm area — 3,600 cm² (about 4 ft²). At that resolution, you can spot large tractor-trailers, but smaller objects (cars, containers, or boats) become indistinguishable.
To improve visibility, we might turn to aerial imagery. Unlike satellites, aerial photos — taken by planes or drones — can reach much higher resolutions because they aren’t bound by the same legal limits. These images reveal details like the type of vehicle, parking lot density, and even container layouts.
Now imagine if we could generate aerial-like images from low-resolution satellite images — artificially enhancing clarity and detail. That’s exactly what a Generative Adversarial Network (GAN) can do.
🧠 What Is a GAN?
A GAN is a type of neural network architecture composed of two competing networks:
- Generator – tries to create realistic synthetic data (in this case, high-resolution aerial images).
- Discriminator – tries to tell apart real data (true aerial images) from fake ones (generated by the generator).
They’re locked in an adversarial game — like a counterfeiter trying to make convincing fake money while the police try to detect counterfeits. As both improve over time, the counterfeiter (generator) gets so good that even the police (discriminator) can’t tell the difference.
🧩 How GANs Work
Step 1: Setup
We start with:
- 200,000 satellite images of coastlines, ports, cities, farms, mountains, and suburbs.
- Matching aerial images of the same regions, taken at the same time.
Each aerial image has 4× the pixel count of its satellite counterpart (2× per side) — for instance, a 200×200 satellite image pairs with a 400×400 aerial one.
Step 2: Training Process
At first, both networks are untrained and random:
- The generator produces blurry nonsense images.
- The discriminator guesses randomly whether an image is “real” or “fake.”
Gradually:
- The discriminator learns to better detect fakes.
- The generator learns, through feedback, to fool the discriminator more effectively.
This continues until the discriminator’s accuracy drops to 50% — meaning it can no longer tell real from fake better than random chance. At this point, the generator has “won,” producing convincing high-resolution aerial images.
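The alternating updates described above can be sketched in PyTorch. This is a minimal illustration, not the exact training code from the episode — the `generator`, `discriminator`, and optimizer objects are assumed to exist, and a paired dataloader would supply the `(satellite, aerial)` batches:

```python
import torch
import torch.nn as nn

def train_step(generator, discriminator, g_opt, d_opt, satellite, aerial):
    """One adversarial round: update the discriminator, then the generator."""
    bce = nn.BCELoss()
    real_labels = torch.ones(aerial.size(0), 1)
    fake_labels = torch.zeros(aerial.size(0), 1)

    # --- Discriminator: learn to separate real aerial images from fakes ---
    d_opt.zero_grad()
    fake = generator(satellite)
    d_loss = (bce(discriminator(aerial), real_labels)
              + bce(discriminator(fake.detach()), fake_labels))
    d_loss.backward()
    d_opt.step()

    # --- Generator: learn to make the discriminator call fakes "real" ---
    g_opt.zero_grad()
    g_loss = bce(discriminator(fake), real_labels)
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```

Note the `fake.detach()` in the discriminator step: it blocks gradients from leaking into the generator while the discriminator trains, which is what keeps the two updates cleanly separated.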
⚙️ The Adversarial Min-Max Game
Mathematically, GANs train using a min-max optimization:
- The discriminator maximizes its accuracy (detecting real vs. fake).
- The generator minimizes the discriminator’s ability to detect fakes.
It’s a tug-of-war where progress in one forces improvement in the other. This adversarial tension is the secret behind GANs’ creative power.
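This tug-of-war has an explicit value function — the canonical GAN objective. Written here with the generator conditioned on a satellite image s rather than random noise, to match the paired setup above:

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{aerial}}}\!\left[\log D(x)\right]
  + \mathbb{E}_{s \sim p_{\text{satellite}}}\!\left[\log\bigl(1 - D(G(s))\bigr)\right]
```

The discriminator updates push V up (better detection); the generator updates push V down (better fakes).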
🧱 Building the GAN
🔹 The Generator
The generator is built as a Convolutional Neural Network (CNN) that:
- Takes a low-resolution satellite image as input.
- Produces a high-resolution aerial estimate.
- Uses pixel shuffle layers to up-sample images efficiently — rearranging multiple low-resolution feature maps into a single higher-resolution output.
- Includes residual connections to prevent vanishing gradients.
These design tricks let the generator learn fine-grained details and scale up images while maintaining structure and sharpness.
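A compact PyTorch sketch of such a generator follows. The layer sizes and block count are illustrative assumptions, not the episode's exact architecture; the key ingredients — residual connections and a pixel shuffle upsampler — are the ones named above:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Conv block whose skip connection keeps gradients flowing."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)  # residual connection

class Generator(nn.Module):
    """2x-per-side upsampler: satellite patch in, aerial-like patch out."""
    def __init__(self, channels=64):
        super().__init__()
        self.head = nn.Conv2d(3, channels, 3, padding=1)
        self.blocks = nn.Sequential(*[ResidualBlock(channels) for _ in range(4)])
        # Pixel shuffle: 4*C feature maps -> C maps at twice the height/width
        self.upsample = nn.Sequential(
            nn.Conv2d(channels, channels * 4, 3, padding=1),
            nn.PixelShuffle(2),
            nn.ReLU(inplace=True),
        )
        self.tail = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, x):
        x = self.head(x)
        x = self.blocks(x)
        x = self.upsample(x)
        return self.tail(x)
```

Stacking a second pixel-shuffle stage would double each side again; one stage already turns a 200×200 input into the 400×400 output our dataset pairs call for.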
🔹 The Discriminator
The discriminator is another CNN, structured much like those used for classification:
- Convolution + Max Pooling layers
- Followed by dense layers and a sigmoid output (real = 1, fake = 0).
- Uses leaky ReLU activations (with a slope of 0.2) for smoother gradients.
The discriminator learns to distinguish true aerial images from generated ones — and its feedback constantly pushes the generator to improve.
⚖️ Balancing the Two Networks
Training GANs is tricky — if one learns too fast, the other can’t keep up. To maintain balance:
- Use small mini-batches (e.g., 16 images) so the discriminator doesn’t overpower the generator.
- Tune learning rates carefully.
- Watch out for mode collapse — when the generator repeatedly outputs one convincing image.
One fix is the unrolled GAN, where the generator optimizes against how the discriminator will respond several steps into the future — discouraging it from exploiting the discriminator's current, temporary weaknesses.
📊 Evaluating GAN Performance
How do we know our model works?
- Mean Squared Error (MSE) – per-pixel difference between generated and real images.
- Peak Signal-to-Noise Ratio (PSNR) – measures reconstruction fidelity against the real image; higher PSNR means less distortion.
- Human Evaluation – if people can’t tell generated images from real ones, that’s a win.
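The first two metrics are closely related: PSNR is just the per-pixel MSE rescaled to decibels. A small NumPy sketch (assuming 8-bit images with a peak value of 255):

```python
import numpy as np

def psnr(real, generated, max_value=255.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the real image."""
    mse = np.mean((real.astype(np.float64) - generated.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)
```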
Our model performed impressively — it could identify small boats, SUVs, and parked cars, unlocking new analytics opportunities:
- Retail parking lot monitoring to estimate shopping activity.
- Vehicle-shipping tracking to gauge automobile sales.
In short, the GAN turned coarse satellite data into actionable intelligence.
🚀 Wrapping Up
Generative Adversarial Networks represent a creative leap in AI — systems that don’t just analyze data but generate new realities. From enhancing satellite imagery to art creation, medical imaging, and video synthesis, GANs redefine what’s possible in deep learning.
Our experiment succeeded, and the next step could be licensing the software or scaling it to other industries.
Stay tuned for the next episode as we continue exploring the frontiers of machine learning!