- (1) real-life distributions are combinations of sub-distributions
- (2) an MLP can learn to fit sub-regions (sub-distributions) of the data
- (1) is tricky to prove, but we can artificially create datasets that fit that description and run experiments on them:
- e.g. a multi-task dataset where each task represents a sub-distribution
- need to prove (2), starting with a simple MLP
- use a shallow MLP to approximate a simple function, e.g. one period of a sine wave
- show that the MLP can be trained to fit segments of the function
- show the effect of different activation functions:
- ReLU: acts like a gated linear unit (each unit is either off or linear), so the fit is piecewise linear
- SELU: similar to ReLU, but smooth (exponential) for negative inputs
- Tanh: smooth curves
- …
- analyse the effect on multiple periods of the sine wave
- discuss how Universal Approximation Theorems relate to this
- discuss how Kolmogorov-Arnold Networks (KANs) relate to this
- explore the idea of Discriminative-Generative Learning
- analyse the effect of depth vs width
- Beyond neural scaling laws - beating power law scaling via data pruning
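
As a starting point for the shallow-MLP sine experiment and the ReLU item above, here is a minimal sketch showing that a one-hidden-layer ReLU network *can* represent a segment-wise fit of one sine period. The weights are constructed by hand (one hidden unit per segment), not trained, so this only shows that such a solution exists; whether training finds it is the open question in these notes. The names `build_relu_mlp` and `mlp` and the choice of 16 segments are illustrative assumptions.

```python
import math

def relu(z):
    return max(0.0, z)

def build_relu_mlp(f, lo, hi, n_segments):
    """Hand-build a one-hidden-layer ReLU net equal to the
    piecewise-linear interpolant of f on [lo, hi] (no training)."""
    h = (hi - lo) / n_segments
    # One hidden unit per segment; unit k switches on at knot k.
    knots = [lo + k * h for k in range(n_segments)]
    values = [f(lo + k * h) for k in range(n_segments + 1)]
    slopes = [(values[k + 1] - values[k]) / h for k in range(n_segments)]
    # Output weight of unit k = change in slope at knot k
    # (the slope to the left of the first knot is taken as 0).
    out_weights = [slopes[0]] + [
        slopes[k] - slopes[k - 1] for k in range(1, n_segments)
    ]
    return knots, out_weights, values[0]

def mlp(x, knots, out_weights, bias):
    # Hidden unit k computes relu(x - knot_k): zero left of its knot,
    # linear right of it, i.e. a gated linear unit.
    return bias + sum(w * relu(x - t) for t, w in zip(knots, out_weights))

# Assumption for illustration: 16 hidden units on one period [0, 2*pi].
knots, out_weights, bias = build_relu_mlp(math.sin, 0.0, 2 * math.pi, 16)

grid = [2 * math.pi * i / 1000 for i in range(1001)]
max_err = max(abs(mlp(x, knots, out_weights, bias) - math.sin(x)) for x in grid)
print(f"hidden units: {len(knots)}, max |error| on [0, 2*pi]: {max_err:.4f}")
```

Each hidden unit contributes one kink, so the error is governed by the segment width: for sine, piecewise-linear interpolation with spacing h has error at most h²/8. Doubling the width of the hidden layer should therefore cut the worst-case error by roughly a factor of four, which gives a baseline to compare trained ReLU, SELU, and Tanh networks against.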