research ml
what
hypothesis
- (1) real-life data distributions are combinations of sub-distributions

- (2) an MLP can learn to fit sub-regions (sub-distributions) of the data
 
methodology
- (1) is tricky to prove, but we can artificially create datasets that fit that description and run experiments on them:
- e.g. a multi-task dataset where each task represents a sub-distribution
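A minimal sketch of such an artificial dataset, assuming 1-D inputs and three made-up tasks. The input ranges, the target functions, and the name `make_multitask_dataset` are illustrative assumptions, not part of the plan above. Each task covers its own input range with its own target function, so each task is one sub-distribution.

```python
import numpy as np

def make_multitask_dataset(n_per_task=200, seed=0):
    """Hypothetical toy dataset: each task is one sub-distribution."""
    rng = np.random.default_rng(seed)
    # Assumed sub-distributions: disjoint input ranges, each with its own target.
    tasks = [
        ((-3.0, -1.0), np.sin),
        ((-1.0, 1.0), lambda x: x ** 2),
        ((1.0, 3.0), lambda x: -0.5 * x),
    ]
    xs, ys, ids = [], [], []
    for task_id, ((lo, hi), fn) in enumerate(tasks):
        x = rng.uniform(lo, hi, size=n_per_task)  # samples from this task's range
        xs.append(x)
        ys.append(fn(x))
        ids.append(np.full(n_per_task, task_id))  # keep the task label for analysis
    return np.concatenate(xs), np.concatenate(ys), np.concatenate(ids)

x, y, task_id = make_multitask_dataset()
print(x.shape, y.shape, np.unique(task_id))
```

Keeping `task_id` alongside the data makes it possible to later measure the fit error per sub-distribution rather than only in aggregate.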
 
 
- need to show (2), starting with a simple MLP
- use a shallow MLP to approximate a simple function, e.g. one period of a sine wave
- show that the MLP can be trained to fit segments of the function
 
- show the effect of different activation functions:
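- ReLU: acts like a gated linear unit (passes the input when positive, outputs zero otherwise), giving piecewise-linear fits

- SELU: similar to ReLU, but the exponential negative branch can give smoother fits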
 
- Tanh: smooth curves
 
- …
 
 
- analyse the effect on multiple periods of the sine wave
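One possible starting point for the sine experiment, sketched in plain NumPy with full-batch gradient descent. The hidden size, learning rate, step count, and initialisation are arbitrary assumptions, and `train_shallow_mlp` is a hypothetical helper name. It trains a one-hidden-layer MLP on one period of the sine wave and records the loss curve for each activation.

```python
import numpy as np

# Standard SELU constants.
SELU_ALPHA, SELU_SCALE = 1.6732632423543772, 1.0507009873554805

def relu(z):
    return np.maximum(z, 0.0)

def relu_grad(z):
    return (z > 0).astype(z.dtype)

def selu(z):
    neg = SELU_ALPHA * (np.exp(np.minimum(z, 0.0)) - 1.0)
    return SELU_SCALE * np.where(z > 0, z, neg)

def selu_grad(z):
    return SELU_SCALE * np.where(z > 0, 1.0, SELU_ALPHA * np.exp(np.minimum(z, 0.0)))

def tanh_grad(z):
    return 1.0 - np.tanh(z) ** 2

ACTIVATIONS = {
    "relu": (relu, relu_grad),
    "selu": (selu, selu_grad),
    "tanh": (np.tanh, tanh_grad),
}

def train_shallow_mlp(name, hidden=16, steps=3000, lr=0.02, seed=0):
    """Fit sin(x) on [0, 2*pi] with a one-hidden-layer MLP; returns the MSE per step."""
    rng = np.random.default_rng(seed)
    act, act_grad = ACTIVATIONS[name]
    x = np.linspace(0.0, 2.0 * np.pi, 128).reshape(-1, 1)
    y = np.sin(x)
    u = x / np.pi - 1.0  # rescale inputs to [-1, 1]
    w1 = rng.normal(0.0, 1.0, (1, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(steps):
        # Forward pass.
        z = u @ w1 + b1
        h = act(z)
        err = h @ w2 + b2 - y
        losses.append(float(np.mean(err ** 2)))
        # Backward pass for the mean-squared-error loss.
        g_pred = 2.0 * err / len(u)
        g_z = (g_pred @ w2.T) * act_grad(z)
        # Plain gradient-descent updates.
        w2 -= lr * (h.T @ g_pred)
        b2 -= lr * g_pred.sum(axis=0)
        w1 -= lr * (u.T @ g_z)
        b1 -= lr * g_z.sum(axis=0)
    return losses

results = {name: train_shallow_mlp(name) for name in ACTIVATIONS}
for name, losses in results.items():
    print(f"{name}: initial MSE {losses[0]:.4f}, final MSE {losses[-1]:.4f}")
```

To study multiple periods, the upper bound of `np.linspace` could be widened (with the input rescaling adjusted to match). Plotting each hidden unit's contribution would show which segment of the curve it ends up fitting.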
 
 
 
- discuss how Universal Approximation Theorems relate to this
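As a concrete handle on the approximation-theorem discussion, here is a small constructive sketch. It covers only a 1-D special case and is not the theorem itself. A one-hidden-layer ReLU network can reproduce the piecewise-linear interpolant of a function exactly, with one hidden unit per segment. The helper name `relu_interpolant` and the knot counts are assumptions made for illustration.

```python
import numpy as np

def relu_interpolant(knots, values):
    """Build a one-hidden-layer ReLU net equal to the piecewise-linear interpolant."""
    slopes = np.diff(values) / np.diff(knots)
    coeffs = np.diff(slopes, prepend=0.0)  # change of slope at each knot
    def net(x):
        x = np.asarray(x, dtype=float)
        hidden = np.maximum(x[..., None] - knots[:-1], 0.0)  # one ReLU unit per segment
        return values[0] + hidden @ coeffs
    return net

def max_error(n_knots):
    knots = np.linspace(0.0, 2.0 * np.pi, n_knots)
    net = relu_interpolant(knots, np.sin(knots))
    xs = np.linspace(0.0, 2.0 * np.pi, 2001)
    return float(np.max(np.abs(net(xs) - np.sin(xs))))

err_coarse = max_error(9)   # 8 hidden units
err_fine = max_error(33)    # 32 hidden units
print(err_coarse, err_fine)
```

Each hidden unit switches on at one knot and contributes the change in slope there, which ties in with the idea of units fitting segments of the function. Adding knots (width) shrinks the worst-case error.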
 
- discuss how KANs (Kolmogorov-Arnold Networks) relate to this
 
- explore the idea of Discriminative-Generative Learning
 
- analyse the effect of depth vs width
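For a fair depth-vs-width comparison it may help to keep the parameter budget roughly matched. A small sketch, assuming fully connected layers with biases; the helper `mlp_param_count` and the two layer configurations are hypothetical.

```python
def mlp_param_count(layer_sizes):
    """Weights plus biases of a fully connected MLP with the given layer sizes."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    )

wide = [1, 64, 1]        # assumed shallow, wide configuration
deep = [1, 8, 8, 8, 1]   # assumed deeper, narrower configuration
print(mlp_param_count(wide), mlp_param_count(deep))
```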
 
- Beyond neural scaling laws - beating power law scaling via data pruning 