Lone's notes
Explorer
completed thoughts
a different type of intelligence
Be motivated by the action, not by the goal. Constant action gives constant progress, which is a recurring source of dopamine.
Cinnamon should rethink research
Don't seek to understand, seek for a satisfying intuition. Understanding is when you know something makes sense, intuition is when you can feel the sense that was made.
how Obsidian is good for me
I have a constant pressure of how much I don't know and how much more I don't even know that I don't know.
Live intentionally, spend time intentionally.
my favourite activity is thinking
The problem with modelling the human brain
what knowledge gives me
working in AI and understanding statistics
write your thoughts down
cs
computer architect
IEEE 754 floating-point
os
wsl
relay ssh-agent and gpg-agent in windows
set automounts option for WSL
symlink windows exe to wsl
Excalidraw
Scripts
Downloaded
Auto Draw for Pen
Hardware Eraser Support
Zoom to Fit Selected Elements
Drawing 2023-12-05 04.02.33.excalidraw
temperature-vs-nucleus-sampling.excalidraw
incompleted thoughts
(2x2)D attention
A pen that can write on any surface and the content will be recorded in an app
ads that appear in game environment
auto suggest which paper to cite
Dimensions of language
document layout synthesis
dynamic computation for deep learning models
is LLaMa better than LLaMa 2
network of documents connected by small-scale ideas
QA on documents
random attention
Related work writing tool
swap windows in 2 monitors
The relationship between knowledge and creativity
the self-attention equation is similar to Newton's gravity equation
literature notes
Omnivore
Refusal in LLMs is mediated by a single direction — LessWrong
LoRA Fine-tuning Efficiently Undoes Safety Training from Llama 2-Chat 70B — LessWrong
papers
BAM! Just Like That - Simple and Efficient Parameter Upcycling for Mixture of Experts
Beyond neural scaling laws - beating power law scaling via data pruning
CAMEx - Curvature-Aware Merging of Experts
web-clipper
A Mathematical Framework for Transformer Circuits
Memory-efficient Model Weight Loading
zotero
A Language Model's Guide Through Latent Space
Are Emergent Abilities of Large Language Models a Mirage
BAM! Just Like That - Simple and Efficient Parameter Upcycling for Mixture of Experts
CAMEx - Curvature-Aware Merging of Experts
Mechanistic Interpretability for AI Safety -- A Review
Refusal in Language Models Is Mediated by a Single Direction
Representation Engineering - A Top-Down Approach to AI Transparency
Scaling Laws for Fine-Grained Mixture of Experts
Steering Language Models With Activation Engineering
Superposition of many models into one
The Linear Representation Hypothesis and the Geometry of Large Language Models
Transformer Feed-Forward Layers Are Key-Value Memories
Universal and Transferable Adversarial Attacks on Aligned Language Models
Zotero Integration template
"the effort to use machines to try to mimic human reasoning is both foolish and dangerous"
1's complement
2's complement
Advice on building LLMs
All models are wrong. Some models are useful.
Bayesians are frequentists
Career Advice, Reading Research Papers by Andrew Ng
Do important work.
Earth's rotation can't be used for timekeeping
fleeting notes
Goodhart's Law
If I can't find a path I'll make one
Joy is being able to see things that others couldn't.
kasten (boxes)
literature notes
llama-cpp
llama-cpp-python
LLMs shortcomings
MosaicML Foundry
MosaicML Streaming Data Loader
MosaicML tools
offset binary
Open Problems in the Theory of Deep Learning - MIT CBMM
permanent notes
pipelining
Principles of Success by Ray Dalio
risc vs cisc
Sidereal day
sign-magnitude
Survey of LLMs
syncthing
The Earth is slowing down
The length of a solar day is not actually 24 hours
The opposite of a profound truth
The productivity of a knowledge worker should be tracked by how many permanent notes they produce a day.
The pursuit of ignorance.
The Unknown Unknown
Things that change the rotation of the Earth
Training and Fine-tuning LLMs - W&B course
What would you do even if you knew you would fail
You just can't compete with someone who is having fun
zettel (notes)
zettelkasten
permanent notes
A day is not actually 24h long
Activation Checkpointing
Adam
Aho-Corasick algorithm
ALiBi
Bayesian statistics
Building your own tokenizers
Checkpointing for LLM training
Chinchilla scaling laws
Choosing architecture for training LLM
classical probability
Cost to train a LLM
Data parallelism
descent
Dijkstra algorithm
Fermat's 2-square theorem
frequentist statistics
frequentist vs Bayesian
hash
HFU - Hardware FLOPs Utilization
HumanEval Pros and Cons
insertion sort
Intuition on understanding the meaning of variance and bias for Machine Learning
inversion
KMP - Knuth-Morris-Pratt
Lagrange's 4-square theorem
Legendre's 3-square theorem
likelihood function
LLM fine-tuning
LLM parallelism strategies
LLM sampling strategy
LLMs evaluation
Logistics of Data Loading for LLMs training
LoRA
LSP - Longest Suffix which is also a prefix
Mahonian numbers
major index
Memory usage for LLM training
MFU - Model FLOPs Utilization
modular multiplicative inverse
nucleus sampling (top_p)
obsidian backup
obsidian sync
palindrome substrings
pattern searching
permutation
perplexity vs entropy
Pipeline parallelism
polynomial rolling hash
principle of difference
probability vs likelihood
Problems during LLMs pre-training
Rabin-Karp
RLHF
rolling hash
RoPE - Rotary Position Embeddings
symlink windows exe to wsl
Techniques for improving the stability of training large ML models
temperature sampling
Tensor parallelism
The ability to make analogies indicates intelligence. If it helps you understand things, don't be afraid of making too many analogies.
To curb one's own ignorance is a joy cherished only by the most restless of minds.
union-find
What can't be done by an AI will be done by many AIs working together.
Z-function
ZeRO - Zero Redundancy Optimizer
todo
AQLM
AutoGPTQ
AWQ
BF16 - brain floating-point
Bias-Complexity tradeoff
Continuous Batching
Control LLM generation
Excellent explanation of the Euler equation
ExLlamaV2
Expressivity and Universal Approximation Theorems
From Autoencoder to Beta-VAE
FSDP - Fully Sharded Data Parallel
GPT4All
GPTQ
graphical model
great resources
Grouped-query Attention (GQA)
Gumbel-sigmoid
Hierarchical Navigable Small Worlds (HNSW)
KV Cache
Lessons learnt for training LLM (from bigscience)
LLM101n - Course by Andrej Karpathy
Locally Typical Sampling
Mechanistic Interpretability
Mirostat
MLC LLM
new English words
Paged Attention
quantization
QuIP
RAG
retrieval
Smoothquant+
SqueezeLLM
Tail-Free Sampling
task arithmetic
text chunking techniques
the brilliance of transformers' sinusoidal positional embeddings
To read
uncensor LLMs
Why embeddings are added, not concatenated
(some) functions are vectors
`model.eval()` vs `torch.no_grad()`
Adaptive LoRA
alias in CMD
anyway vs any way vs anyways
Aphrodite
Applying to Ph.D. Programs in Computer Science - Mor Harchol-Balter, CMU
Approximate Nearest Neighbors
AQLM
associative memory
auto start ssh-agent
AutoAWQ
auxiliary loss for language models
Bertrand's theorem
Best and Worst of both worlds - combining LSTM and Transformers
binary prefix trie
Bring Windows features to MacOS
bubble sort
Byte-level BPE
Byte-Pair Encoding (BPE)
Calculate GPU memory requirement and tokens for any LLM
career advice from Terence Tao
Central Limit Theorem
Changing MLP layers in Transformers to Probabilistic encoders
chat agent with action planning
combination
Combining Modular Skills in Multitask Learning
complex number vs 2d vectors
Conformal prediction
Connections of SVD, PCA, eigenvectors and eigenvalues
count the GCD values of all pairs
coupling object and relation representation
Cross-lingual transfer by forced alignment of embeddings in different languages
Cross-moments of a random vector
CTransformers
curse of dimensionality
detect if PATH has a specific directory entry in it
discriminative and generative models
Discriminative-Generative Learning
Dynamic-sized latent representations
Efficient Transformers with Dynamic Token Pooling
Euler's theorem
EXL2
Exploiting the Bias-Variance trade-off for data fitting
Failure is a favour to the future
Fenwick Tree
GCD Convolution
generate primes
generate random unit vectors
GGML
GGUF
GPTQ-for-LLaMa
Graduate Application Aid
Graduate Applications Advice - Nathan Lambert
Graph of Life
Hierarchical sliding window transformers
Highlight Colour Codings
how smaller versions of the same LLM (3B, 7B, 13B, etc.) are trained
How To Train Your LLM Efficiently
hybrid of full weight update and PEFT
Hyrum's Law
I do things to satisfaction.
I know nothing
IELTS materials
Information Retrieval service
it's better to learn unique skills instead of common ones
Just be. Smart. - letam.io
karabiner complex modification to make macOS feel more like Windows
learning by interacting with oneself
Legendre's conjecture
limit that equals e
llm fine-tuning resources
LLMs development
local AI server
Local LLM
LoRA-abliterated
Machine Learning vocabs
me
merge sort
Mixture of Experts
Mixture-of-Distribution learning
ML might be bad for science and the importance of understanding how ML works
modular multi-modal hard routing
Moravec's paradox
my hybrid sort
my naive theory on ML
my systematic errors in English
No-Free-Lunch theorem
overfitting-underfitting and variance-bias
priority queue
profile vs rc scripts
Pursue ignorance, pursue mastery.
python max heap
quick sort
random unit vectors in high dimension are nearly orthogonal
relay ssh-agent and gpg-agent in windows
research interests
Riemannian Manifolds and Fisher Information
Rotate from one vector to another vector in high dimensional space
safetensor
Scaling Sparse Fine-Tuning to Large Language Models
Segment Tree
SentencePiece
set automounts option for WSL
setup pyenv
SFT - Supervised Fine-tuning
Sharded checkpointing
show all files cmd
speculative decoding
Stanford cheatsheet
Structure LLMs output
stupid things
subfactorial
The Bitter Lesson
The effect of BatchNorm vs LayerNorm vs RMSNorm
The effects of neural net layers on activation space
The only work that really matters is the work that no one sees
The Reparameterization trick
the type of system design philosophy that I like
The wrong way of teaching
Theory of activation space
tokenizers for LLMs
Toy Models of Superposition
Transforming column vectors of model weights to standard bases
trie
Unigram Tokenization
Universal Approximation Theorems
Untitled
vector database
vector space
vimium c configs
vLLM
VRDSynth
what ML models learn and what's the real solution
When we think of what could go wrong, we achieve so little.
why I like ML
WordPiece
Writing to learn
Tag: career
3 items with this tag.
Dec 08, 2024
career advice from Terence Tao
advice
career
Dec 08, 2024
Career Advice, Reading Research Papers by Andrew Ng
advice
research
career
Dec 08, 2024
Do important work.
quote
advice
career
Recently Updated
Mechanistic Interpretability
Dec 08, 2024
cs/ai/ml/theory
cs/ai/ml/nlp/llm
cs/ai/ml/modular-learning
cs/ai/ml/mechanistic-interpretability
todo/study
When we think of what could go wrong, we achieve so little.
Dec 08, 2024
quote
Writing to learn
Dec 08, 2024
advice
Excellent explanation of the Euler equation
Dec 08, 2024
maths
todo/write
write your thoughts down
Dec 08, 2024
thought
advice