Can Your Model Learn This Function?
A Visual and Mathematical Dive into the VC Dimension: the Hidden Geometry of Learnability
1. What’s the Real Problem?
When you train a model on data and it performs poorly, it’s tempting to blame the data or the optimizer.
But sometimes, your model fundamentally cannot express the function you’re hoping for, even with infinite data.
That’s the gap between:
Representational power
Learnability
Visual: Function Expressivity Chart
2. What Is VC Dimension?
The Vapnik–Chervonenkis (VC) Dimension gives a crisp, geometric way to measure how “powerful” a hypothesis class is.
Definition:
VC Dimension = the size of the largest set of points that the model class can shatter (some arrangement of that many points must be shattered; not every arrangement needs to be).
What is “Shattering”?
A set of points is shattered by a hypothesis class if, for every possible labeling of those points, some hypothesis in the class classifies all of them correctly.
Visual: “Shattering in 2D (3 vs 4 Points)”
3. Mathematical Definition
Let H be a hypothesis class.
We say H shatters a set \{x_1, x_2, \dots, x_n\} if:
\forall binary labelings \{y_1, \dots, y_n\}, \ \exists\, h \in H such that h(x_i) = y_i \ \forall i.
Then:
\mathrm{VCdim}(H) = d \iff there exists a set of d points shattered by H, but no set of (d+1) points is.
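To make the definition concrete, here is a minimal sketch of a shattering check for 2D linear classifiers of the form sign(w·x + b). It assumes NumPy and SciPy are installed; the helper names can_realize and is_shattered are illustrative, not from any library. Each labeling is tested with a small linear feasibility problem.

import itertools
import numpy as np
from scipy.optimize import linprog

def can_realize(points, labels):
    # Is there a linear classifier sign(w.x + b) producing exactly these +/-1 labels?
    # Feasibility LP: find (w, b) with y_i * (w.x_i + b) >= 1 for every point.
    X = np.asarray(points, dtype=float)
    y = np.asarray(labels, dtype=float)
    n, d = X.shape
    A_ub = -y[:, None] * np.hstack([X, np.ones((n, 1))])   # encodes -y_i*(w.x_i + b) <= -1
    res = linprog(c=np.zeros(d + 1), A_ub=A_ub, b_ub=-np.ones(n),
                  bounds=(None, None), method="highs")
    return res.success

def is_shattered(points):
    # Shattered means: every one of the 2^n labelings is realizable by some hypothesis.
    return all(can_realize(points, labels)
               for labels in itertools.product([-1, 1], repeat=len(points)))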
Visual: Growth Function Plot (m(n))
4. Sauer’s Lemma & Growth Function
Let m_H(n) = the maximum, over any set of n points, of the number of distinct labelings of those points realizable by H (the growth function).
Then Sauer's Lemma says: if \mathrm{VCdim}(H) = d,
m_H(n) \le \sum_{i=0}^{d} \binom{n}{i},
which equals 2^n while n \le d but grows only like n^d afterwards.
This limits how "expressive" the model is with finite data.
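A quick numeric check of the bound, using only the Python standard library (growth_bound is an illustrative name): once n exceeds d, the Sauer bound grows only polynomially while 2^n keeps doubling.

from math import comb

def growth_bound(n, d):
    # Sauer's Lemma upper bound on the growth function: sum_{i=0}^{d} C(n, i).
    return sum(comb(n, i) for i in range(d + 1))

# For a class with VC dimension d = 3 (e.g. linear classifiers in the plane):
for n in (1, 2, 3, 4, 5, 10, 20):
    print(n, growth_bound(n, d=3), 2 ** n)
# Up to n = 3 the bound equals 2^n; after that it grows roughly like n^3.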
Visual: Combinatorics of Labelings
5. VC in Modern Models
Linear classifiers in 2D → VC = 3 (see the quick check after this list)
SVM with Gaussian kernel → potentially infinite VC
Neural networks → VC dimension grows with:
Number of weights
Activation nonlinearity
Depth
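A quick check of the first bullet, reusing the is_shattered helper sketched in Section 3 (run that block first so the name is in scope): three points in general position are shattered by 2D linear classifiers, while the four XOR-style corner points are not.

# Three non-collinear points: all 2^3 = 8 labelings are linearly realizable.
print(is_shattered([(0, 0), (1, 0), (0, 1)]))          # expected: True

# Four points in the XOR configuration: the diagonal labeling is not separable;
# in fact no 4 points in the plane can be shattered by a linear classifier.
print(is_shattered([(0, 0), (1, 1), (1, 0), (0, 1)]))  # expected: False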
Visual: Model Type vs VC Dimension
6. Algorithmic Insight (UML-style Pseudocode)
function estimate_VC(HypothesisClass H):
    for n = 1 to max_possible:
        found_shattered_set = false
        for all sets S = {x_1, ..., x_n} of n points:
            if for every labeling (y_1, ..., y_n) in {0,1}^n
                    there exists h in H with h(x_i) = y_i for all i:
                found_shattered_set = true
                break
        if not found_shattered_set:
            return n - 1    # no set of n points can be shattered
    return max_possible
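The search above is not literally computable (there are infinitely many candidate point sets), but a lower-bound version can be run over a finite pool of points. A minimal sketch in Python, again assuming the is_shattered helper from Section 3 is in scope; estimate_vc_lower_bound and the grid of candidates are illustrative choices.

import itertools

def estimate_vc_lower_bound(candidate_points, max_n):
    # Largest n such that SOME size-n subset of the candidate pool is shattered.
    # This lower-bounds the true VC dimension, which quantifies over all point sets.
    best = 0
    for n in range(1, max_n + 1):
        if any(is_shattered(list(S))
               for S in itertools.combinations(candidate_points, n)):
            best = n
        else:
            break
    return best

# 2D linear classifiers over a small grid of candidate points: reports 3.
grid = [(x, y) for x in range(3) for y in range(3)]
print(estimate_vc_lower_bound(grid, max_n=5))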
Visual: “VC Test Algorithm” Flowchart
7. Satyam’s Explanation
Imagine you’re playing a game with colored dots.
Your model is a magic pen that tries to draw lines to separate red from blue.
If it always finds a way, no matter how you color the dots, it's really smart.
But at some point, you add one more dot, and some coloring defeats the pen no matter how it tries.
The largest number of dots the pen can always handle is how smart your model is. That's its VC Dimension.
8. Satyam’s Explanation™ (Continued)
VC Dimension isn’t a performance metric.
It’s a theoretical boundary.
It says: beyond this point, for any set of points you pick, there is some labeling your model simply cannot realize, regardless of how much training you do.
It connects:
Geometry (separability)
Combinatorics (label counts)
Learning theory (generalization bounds)
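To make the third connection concrete, here is one classic form of the VC generalization bound (a sketch; exact constants vary across textbooks): with probability at least 1 - \delta over an i.i.d. sample of n points, every h \in H with \mathrm{VCdim}(H) = d satisfies

R(h) \le \hat{R}_n(h) + \sqrt{\frac{d\,(\ln(2n/d) + 1) + \ln(4/\delta)}{n}},

where R(h) is the true risk and \hat{R}_n(h) the training error. A finite VC dimension is exactly what makes this gap shrink as n grows.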
9. Q1 / A★ Research Framing
Problem:
We often scale models blindly, without knowing their fundamental limits of expression.
Solution:
Use VC theory to guide:
Model architecture decisions
Adaptive pruning
Sample efficiency prediction
Novelty:
Modern research is using VC-like ideas to:
Understand double descent
Formalize emergence in LLMs
Connect to minimum description length and compression bounds
TL;DR
VC Dimension = “How many points your model can label in all possible ways”
A theoretical cap on model capacity
Helps explain, debug, and design smarter ML systems
Follow and Share
You can follow me on Medium to read more: https://medium.com/@satyamcser
#satmis #VCdimension #TheoryOfLearning #MachineLearningMath
#ShatteringSets #CapacityVsGeneralization #StatisticalLearningTheory
#SauerLemma #MLFundamentals #DeepLearningTheory #PACLearning
#NeuralNetworkGeometry #MLExplainedSimply #SatyamsExplanation
#AIAlignmentMathematics