design a model that learns the sub-distributions in the data while also ensuring that those sub-distributions are separated
the distances between the sub-distributions of the latent vectors (at some middle layer, or any other part of the network) must be similar, proportional, or equivalent to the distances between the corresponding distributions of the final output
datasets can be created artificially to fit this description, i.e. the separations are known, so inductive bias can be used to design the algorithm
then try to make the algorithm learn to cluster the data without that inductive bias
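
a minimal sketch of the first steps above, under loud assumptions: the notes do not say which distance between distributions is meant, so this uses the Euclidean distance between class means as the simplest stand-in, and Pearson correlation as a rough proxy for "proportional". the "network" is only a fixed linear map, not a trained model, and every name here (make_blobs, class_means, pairwise_distances, distance_agreement) is hypothetical

```python
import numpy as np


def make_blobs(centers, n_per_class, std, rng):
    """Sample isotropic Gaussian clusters around known centers (known separation)."""
    centers = np.asarray(centers, dtype=float)
    dim = centers.shape[1]
    x = np.concatenate(
        [c + std * rng.standard_normal((n_per_class, dim)) for c in centers]
    )
    y = np.repeat(np.arange(len(centers)), n_per_class)
    return x, y


def class_means(z, y):
    """Mean vector of each sub-distribution, ordered by label."""
    return np.stack([z[y == k].mean(axis=0) for k in np.unique(y)])


def pairwise_distances(m):
    """Euclidean distance for every unordered pair of rows in m."""
    diff = m[:, None, :] - m[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    return d[np.triu_indices(len(m), k=1)]


def distance_agreement(latent, output, y):
    """Pearson correlation between latent-space and output-space class distances."""
    d_lat = pairwise_distances(class_means(latent, y))
    d_out = pairwise_distances(class_means(output, y))
    return np.corrcoef(d_lat, d_out)[0, 1]


rng = np.random.default_rng(0)

# Artificial dataset: four sub-distributions whose separations are chosen by hand.
centers = [[0, 0], [4, 0], [0, 8], [10, 10]]
x, y = make_blobs(centers, n_per_class=200, std=0.5, rng=rng)

# Assumption: a stand-in "network" made of two fixed linear maps.
# q has orthonormal columns, so x @ q.T preserves distances up to the scale 2.0.
q, _ = np.linalg.qr(rng.standard_normal((8, 2)))
latent = 2.0 * x @ q.T   # stand-in for a middle-layer representation
output = 3.0 * latent    # stand-in for the final output

agreement = distance_agreement(latent, output, y)
print(f"latent/output distance agreement: {agreement:.4f}")
```

with a real model, latent and output would come from a chosen middle layer and the final layer, and a score like this could be monitored or turned into a penalty during training. the labels y are used here only because the separation is known by construction; the last step in the notes would need the same check without them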