Can Your Model Learn This Function?
A Visual and Mathematical Dive into the VC Dimension: the Hidden Geometry of Learnability
1. What’s the Real Problem?
When you train a model on data and it performs poorly, it’s tempting to blame the data or the optimizer.
But sometimes, your model fundamentally cannot express the function you’re hoping for, even with infinite data.
That’s the gap between:
Representational power
Learnability
Visual: Function Expressivity Chart
2. What Is VC Dimension?
The Vapnik–Chervonenkis (VC) Dimension gives a crisp, geometric way to measure how “powerful” a hypothesis class is.
Definition:
VC Dimension = the size of the largest set of points that the model class can shatter (some arrangement of that many points must be shattered; not every arrangement needs to be).
What is “Shattering”?
A set of points is shattered by a hypothesis class if, for every possible labeling of those points, some hypothesis in the class classifies all of them correctly.
Visual: “Shattering in 2D (3 vs 4 Points)”
3. Mathematical Definition
Let H be a hypothesis class.
We say H shatters a set \{x_1, x_2, \dots, x_n\} if:
\forall binary labelings \{y_1, \dots, y_n\}, \ \exists\, h \in H such that h(x_i) = y_i \ \forall i.
Then:
\mathrm{VCdim}(H) = d \iff there exists a set of d points shattered by H, but no set of (d+1) points is.
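To make the definition concrete, here is a minimal sketch of a shattering check for 2D linear classifiers of the form sign(w·x + b). It assumes NumPy and SciPy are installed; the helper names can_realize and is_shattered are illustrative, not from any library. Each labeling is tested with a small linear feasibility problem.

import itertools
import numpy as np
from scipy.optimize import linprog

def can_realize(points, labels):
    # Is there a linear classifier sign(w.x + b) producing exactly these +/-1 labels?
    # Feasibility LP: find (w, b) with y_i * (w.x_i + b) >= 1 for every point.
    X = np.asarray(points, dtype=float)
    y = np.asarray(labels, dtype=float)
    n, d = X.shape
    A_ub = -y[:, None] * np.hstack([X, np.ones((n, 1))])   # encodes -y_i*(w.x_i + b) <= -1
    res = linprog(c=np.zeros(d + 1), A_ub=A_ub, b_ub=-np.ones(n),
                  bounds=(None, None), method="highs")
    return res.success

def is_shattered(points):
    # Shattered means: every one of the 2^n labelings is realizable by some hypothesis.
    return all(can_realize(points, labels)
               for labels in itertools.product([-1, 1], repeat=len(points)))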
Visual: Growth Function Plot (m(n))
4. Sauer’s Lemma & Growth Function
Let m_H(n) = the maximum, over any set of n points, of the number of distinct labelings of those points realizable by H (the growth function).
Then Sauer's Lemma says: if \mathrm{VCdim}(H) = d,
m_H(n) \le \sum_{i=0}^{d} \binom{n}{i},
which equals 2^n while n \le d but grows only like n^d afterwards.
This limits how "expressive" the model is with finite data.
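A quick numeric check of the bound, using only the Python standard library (growth_bound is an illustrative name): once n exceeds d, the Sauer bound grows only polynomially while 2^n keeps doubling.

from math import comb

def growth_bound(n, d):
    # Sauer's Lemma upper bound on the growth function: sum_{i=0}^{d} C(n, i).
    return sum(comb(n, i) for i in range(d + 1))

# For a class with VC dimension d = 3 (e.g. linear classifiers in the plane):
for n in (1, 2, 3, 4, 5, 10, 20):
    print(n, growth_bound(n, d=3), 2 ** n)
# Up to n = 3 the bound equals 2^n; after that it grows roughly like n^3.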
Visual: Combinatorics of Labelings
5. VC in Modern Models
Linear classifiers in 2D → VC = 3 (see the quick check after this list)
SVM with Gaussian kernel → potentially infinite VC
Neural networks → VC dimension grows with:
Number of weights
Activation nonlinearity
Depth
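A quick check of the first bullet, reusing the is_shattered helper sketched in Section 3 (run that block first so the name is in scope): three points in general position are shattered by 2D linear classifiers, while the four XOR-style corner points are not.

# Three non-collinear points: all 2^3 = 8 labelings are linearly realizable.
print(is_shattered([(0, 0), (1, 0), (0, 1)]))          # expected: True

# Four points in the XOR configuration: the diagonal labeling is not separable;
# in fact no 4 points in the plane can be shattered by a linear classifier.
print(is_shattered([(0, 0), (1, 1), (1, 0), (0, 1)]))  # expected: False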
Visual: Model Type vs VC Dimension
6. Algorithmic Insight (UML-style Pseudocode)
function estimate_VC(HypothesisClass H):
    for n = 1 to max_possible:
        found_shattered_set = false
        for all sets S = {x_1, ..., x_n} of n points:
            if for every labeling (y_1, ..., y_n) in {0,1}^n
                    there exists h in H with h(x_i) = y_i for all i:
                found_shattered_set = true
                break
        if not found_shattered_set:
            return n - 1    # no set of n points can be shattered
    return max_possible
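The search above is not literally computable (there are infinitely many candidate point sets), but a lower-bound version can be run over a finite pool of points. A minimal sketch in Python, again assuming the is_shattered helper from Section 3 is in scope; estimate_vc_lower_bound and the grid of candidates are illustrative choices.

import itertools

def estimate_vc_lower_bound(candidate_points, max_n):
    # Largest n such that SOME size-n subset of the candidate pool is shattered.
    # This lower-bounds the true VC dimension, which quantifies over all point sets.
    best = 0
    for n in range(1, max_n + 1):
        if any(is_shattered(list(S))
               for S in itertools.combinations(candidate_points, n)):
            best = n
        else:
            break
    return best

# 2D linear classifiers over a small grid of candidate points: reports 3.
grid = [(x, y) for x in range(3) for y in range(3)]
print(estimate_vc_lower_bound(grid, max_n=5))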
Visual: “VC Test Algorithm” Flowchart
7. Satyam’s Explanation
Imagine you’re playing a game with colored dots.
Your model is a magic pen that tries to draw lines to separate red from blue.
If it always finds a way, no matter how you color the dots, it's really smart.
But at some point, you add one more dot, and some coloring defeats the pen no matter how it tries.
The largest number of dots the pen can always handle is how smart your model is. That's its VC Dimension.
8. Satyam’s Explanation™ (Continued)
VC Dimension isn’t a performance metric.
It’s a theoretical boundary.
It says: beyond this point, for any set of points you pick, there is some labeling your model simply cannot realize, regardless of how much training you do.
It connects:
Geometry (separability)
Combinatorics (label counts)
Learning theory (generalization bounds)
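To make the third connection concrete, here is one classic form of the VC generalization bound (a sketch; exact constants vary across textbooks): with probability at least 1 - \delta over an i.i.d. sample of n points, every h \in H with \mathrm{VCdim}(H) = d satisfies

R(h) \le \hat{R}_n(h) + \sqrt{\frac{d\,(\ln(2n/d) + 1) + \ln(4/\delta)}{n}},

where R(h) is the true risk and \hat{R}_n(h) the training error. A finite VC dimension is exactly what makes this gap shrink as n grows.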
9. Q1 / A★ Research Framing
Problem:
We often scale models blindly, without knowing their fundamental limits of expression.
Solution:
Use VC theory to guide:
Model architecture decisions
Adaptive pruning
Sample efficiency prediction
Novelty:
Modern research is using VC-like ideas to:
Understand double descent
Formalize emergence in LLMs
Connect to minimum description length and compression bounds
TL;DR
VC Dimension = “How many points your model can label in all possible ways”
A theoretical cap on model capacity
Helps explain, debug, and design smarter ML systems
Follow and Share
You can follow me on Medium to read more: https://medium.com/@satyamcser
#satmis #VCdimension #TheoryOfLearning #MachineLearningMath
#ShatteringSets #CapacityVsGeneralization #StatisticalLearningTheory
#SauerLemma #MLFundamentals #DeepLearningTheory #PACLearning
#NeuralNetworkGeometry #MLExplainedSimply #SatyamsExplanation
#AIAlignmentMathematics