Lone's notes
Explorer
completed thoughts
a different type of intelligence
Be motivated by the action, not by the goal. Constant action gives constant progress, which is a recurring source of dopamine.
Cinnamon should rethink research
Don't seek to understand, seek for a satisfying intuition. Understanding is when you know something makes sense, intuition is when you can feel the sense that was made.
how Obsidian is good for me
I have a constant pressure of how much I don't know and how much more I don't even know that I don't know.
Live intentionally, spend time intentionally.
my favourite activity is thinking
The problem with modelling the human brain
what knowledge gives me
working in AI and understanding statistics
write your thoughts down
cs
computer architect
IEEE 754 floating-point
os
wsl
relay ssh-agent and gpg-agent in windows
set automounts option for WSL
symlink windows exe to wsl
Excalidraw
Scripts
Downloaded
Auto Draw for Pen
Hardware Eraser Support
Zoom to Fit Selected Elements
Drawing 2023-12-05 04.02.33.excalidraw
temperature-vs-nucleus-sampling.excalidraw
incompleted thoughts
(2x2)D attention
A pen that can write on any surface and the content will be recorded in an app
ads that appear in game environment
auto suggest which paper to cite
Dimensions of language
document layout synthesis
dynamic computation for deep learning models
is LLaMa better than LLaMa 2
network of documents connected by small-scale ideas
QA on documents
random attention
Related work writing tool
swap windows in 2 monitors
The relationship between knowledge and creativity
the self-attention equation is similar to Newton's gravity equation
literature notes
Omnivore
Refusal in LLMs is mediated by a single direction — LessWrong
LoRA Fine-tuning Efficiently Undoes Safety Training from Llama 2-Chat 70B — LessWrong
papers
BAM! Just Like That - Simple and Efficient Parameter Upcycling for Mixture of Experts
Beyond neural scaling laws - beating power law scaling via data pruning
CAMEx - Curvature-Aware Merging of Experts
web-clipper
A Mathematical Framework for Transformer Circuits
Memory-efficient Model Weight Loading
zotero
A Language Model's Guide Through Latent Space
Are Emergent Abilities of Large Language Models a Mirage
BAM! Just Like That - Simple and Efficient Parameter Upcycling for Mixture of Experts
CAMEx - Curvature-Aware Merging of Experts
Mechanistic Interpretability for AI Safety -- A Review
Refusal in Language Models Is Mediated by a Single Direction
Representation Engineering - A Top-Down Approach to AI Transparency
Scaling Laws for Fine-Grained Mixture of Experts
Steering Language Models With Activation Engineering
Superposition of many models into one
The Linear Representation Hypothesis and the Geometry of Large Language Models
Transformer Feed-Forward Layers Are Key-Value Memories
Universal and Transferable Adversarial Attacks on Aligned Language Models
Zotero Integration template
"the effort to use machines to try to mimic human reasoning is both foolish and dangerous"
1's complement
2's complement
Advice on building LLMs
All models are wrong. Some models are useful.
Bayesians are frequentists
Career Advice, Reading Research Papers by Andrew Ng
Do important work.
Earth's rotation can't be used for timekeeping
fleeting notes
Goodhart's Law
If I can't find a path I'll make one
Joy is being able to see things that others couldn't.
kasten (boxes)
literature notes
llama-cpp
llama-cpp-python
LLMs shortcomings
MosaicML Foundry
MosaicML Streaming Data Loader
MosaicML tools
offset binary
Open Problems in the Theory of Deep Learning - MIT CBMM
permanent notes
pipelining
Principles of Success by Ray Dalio
risc vs cisc
Sidereal day
sign-magnitude
Survey of LLMs
syncthing
The Earth is slowing down
The length of a solar day is not actually 24 hours
The opposite of a profound truth
The productivity of a knowledge worker should be tracked by how many permanent notes they produce a day.
The pursuit of ignorance.
The Unknown Unknown
Things that change the rotation of the Earth
Training and Fine-tuning LLMs - W&B course
What would you do even if you knew you would fail
You just can't compete with someone who is having fun
zettel (notes)
zettelkasten
permanent notes
A day is not actually 24h long
Activation Checkpointing
Adam
Aho-Corasick algorithm
ALiBi
Bayesian statistics
Building your own tokenizers
Checkpointing for LLM training
Chinchilla scaling laws
Choosing architecture for training LLM
classical probability
Cost to train a LLM
Data parallelism
descent
Dijkstra algorithm
Fermat's 2-square theorem
frequentist statistics
frequentist vs Bayesian
hash
HFU - Hardware FLOPs Utilization
HumanEval Pros and Cons
insertion sort
Intuition on understanding the meaning of variance and bias for Machine Learning
inversion
KMP - Knuth-Morris-Pratt
Lagrange's 4-square theorem
Legendre's 3-square theorem
likelihood function
LLM fine-tuning
LLM parallelism strategies
LLM sampling strategy
LLMs evaluation
Logistics of Data Loading for LLMs training
LoRA
LSP - Longest Suffix which is also a prefix
Mahonian numbers
major index
Memory usage for LLM training
MFU - Model FLOPs Utilization
modular multiplicative inverse
nucleus sampling (top_p)
obsidian backup
obsidian sync
palindrome substrings
pattern searching
permutation
perplexity vs entropy
Pipeline parallelism
polynomial rolling hash
principle of difference
probability vs likelihood
Problems during LLMs pre-training
Rabin-Karp
RLHF
rolling hash
RoPE - Rotary Position Embeddings
symlink windows exe to wsl
Techniques for improving the stability of training large ML models
temperature sampling
Tensor parallelism
The ability to make analogies indicates intelligence. If it helps you understand things, don't be afraid of making too many analogies.
To curb one's own ignorance is a joy cherished only by the most restless of minds.
union-find
What can't be done by an AI will be done by many AIs working together.
Z-function
ZeRO - Zero Redundancy Optimizer
todo
AQLM
AutoGPTQ
AWQ
BF16 - brain floating-point
Bias-Complexity tradeoff
Continuous Batching
Control LLM generation
Excellent explanation of the Euler equation
ExLlamaV2
Expressivity and Universal Approximation Theorems
From Autoencoder to Beta-VAE
FSDP - Fully Sharded Data Parallel
GPT4All
GPTQ
graphical model
great resources
Grouped-query Attention (GQA)
Gumbel-sigmoid
Hierarchical Navigable Small Worlds (HNSW)
KV Cache
Lessons learnt for training LLM (from bigscience)
LLM101n - Course by Andrej Karpathy
Locally Typical Sampling
Mechanistic Interpretability
Mirostat
MLC LLM
new English words
Paged Attention
quantization
QuIP
RAG
retrieval
Smoothquant+
SqueezeLLM
Tail-Free Sampling
task arithmetic
text chunking techniques
the brilliance of transformers' sinusoidal positional embeddings
To read
uncensor LLMs
Why embeddings are added, not concatenated
(some) functions are vectors
`model.eval()` vs `torch.no_grad()`
Adaptive LoRA
alias in CMD
anyway vs any way vs anyways
Aphrodite
Applying to Ph.D. Programs in Computer Science - Mor Harchol-Balter, CMU
Approximate Nearest Neighbors
AQLM
associative memory
auto start ssh-agent
AutoAWQ
auxiliary loss for language models
Bertrand's theorem
Best and Worst of both worlds - combining LSTM and Transformers
binary prefix trie
Bring Windows features to MacOS
bubble sort
Byte-level BPE
Byte-Pair Encoding (BPE)
Calculate GPU memory requirement and tokens for any LLM
career advice from Terence Tao
Central Limit Theorem
Changing MLP layers in Transformers to Probabilistic encoders
chat agent with action planning
combination
Combining Modular Skills in Multitask Learning
complex number vs 2d vectors
Conformal prediction
Connections of SVD, PCA, eigenvectors and eigenvalues
count the GCD values of all pairs
coupling object and relation representation
Cross-lingual transfer by forced alignment of embeddings in different languages
Cross-moments of a random vector
CTransformers
curse of dimensionality
detect if PATH has a specific directory entry in it
discriminative and generative models
Discriminative-Generative Learning
Dynamic-sized latent representations
Efficient Transformers with Dynamic Token Pooling
Euler's theorem
EXL2
Exploiting the Bias-Variance trade-off for data fitting
Failure is a favour to the future
Fenwick Tree
GCD Convolution
generate primes
generate random unit vectors
GGML
GGUF
GPTQ-for-LLaMa
Graduate Application Aid
Graduate Applications Advice - Nathan Lambert
Graph of Life
Hierarchical sliding window transformers
Highlight Colour Codings
how smaller versions of the same LLM (3B, 7B, 13B, etc.) are trained
How To Train Your LLM Efficiently
hybrid of full weight update and PEFT
Hyrum's Law
I do things to satisfaction.
I know nothing
IELTS materials
Information Retrieval service
it's better to learn unique skills instead of common ones
Just be. Smart. - letam.io
karabiner complex modification to make macOS feel more like Windows
learning by interacting with oneself
Legendre's conjecture
limit that equals e
llm fine-tuning resources
LLMs development
local AI server
Local LLM
LoRA-abliterated
Machine Learning vocabs
me
merge sort
Mixture of Experts
Mixture-of-Distribution learning
ML might be bad for science and the importance of understanding how ML works
modular multi-modal hard routing
Moravec's paradox
my hybrid sort
my naive theory on ML
my systematic errors in English
No-Free-Lunch theorem
overfitting-underfitting and variance-bias
priority queue
profile vs rc scripts
Pursue ignorance, pursue mastery.
python max heap
quick sort
random unit vectors in high dimension are nearly orthogonal
relay ssh-agent and gpg-agent in windows
research interests
Riemannian Manifolds and Fisher Information
Rotate from one vector to another vector in high dimensional space
safetensor
Scaling Sparse Fine-Tuning to Large Language Models
Segment Tree
SentencePiece
set automounts option for WSL
setup pyenv
SFT - Supervised Fine-tuning
Sharded checkpointing
show all files cmd
speculative decoding
Stanford cheatsheet
Structure LLMs output
stupid things
subfactorial
The Bitter Lesson
The effect of BatchNorm vs LayerNorm vs RMSNorm
The effects of neural net layers on activation space
The only work that really matters is the work that no one sees
The Reparameterization trick
the type of system design philosophy that I like
The wrong way of teaching
Theory of activation space
tokenizers for LLMs
Toy Models of Superposition
Transforming column vectors of model weights to standard bases
trie
Unigram Tokenization
Universal Approximation Theorems
Untitled
vector database
vector space
vimium c configs
vLLM
VRDSynth
what ML models learn and what's the real solution
When we think of what could go wrong, we achieve so little.
why I like ML
WordPiece
Writing to learn
Tag: career
3 items with this tag.
Dec 08, 2024
career advice from Terence Tao
advice
career
Dec 08, 2024
Career Advice, Reading Research Papers by Andrew Ng
advice
research
career
Dec 08, 2024
Do important work.
quote
advice
career
Recently Updated
Mechanistic Interpretability
Dec 08, 2024
cs/ai/ml/theory
cs/ai/ml/nlp/llm
cs/ai/ml/modular-learning
cs/ai/ml/mechanistic-interpretability
todo/study
When we think of what could go wrong, we achieve so little.
Dec 08, 2024
quote
Writing to learn
Dec 08, 2024
advice
Excellent explanation of the Euler equation
Dec 08, 2024
maths
todo/write
write your thoughts down
Dec 08, 2024
thought
advice