    Earth's rotation can't be used for timekeeping

    Dec 22, 2023 · 1 min read

    • physics/astronomy

    astronomy

    source: How long is a day? ⌚ | By StarTalk | Facebook

    Why

    • We can’t define time based on the rotation of the Earth, because its rotation rate is not stable.
    • If the Earth slowed down or sped up, we would never know, because our measure of time would drift along with it.

    Hence

    • We use the vibration of Cesium-133: an electron transition between two hyperfine energy levels of its ground state has a very precise frequency (9,192,631,770 Hz), which defines the SI second (see the sketch below).
    • And when we measure against this standard, it turns out that The Earth is slowing down.
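
    The cesium definition turns timekeeping into a counting problem: one SI second is exactly 9,192,631,770 oscillations of the Cs-133 hyperfine transition, so an atomic clock simply counts cycles. A minimal Python sketch of the arithmetic (not from the source; the 2 ms/day excess below is an illustrative assumption, since the real day-length excess varies):

    ```python
    # Illustrative sketch: how a slightly long solar day accumulates against the cesium standard.
    CS133_HZ = 9_192_631_770   # SI definition: Cs-133 hyperfine oscillations per second
    SI_DAY_S = 86_400          # an exact 24 h "clock day" in SI seconds

    def cycles_in(seconds: float) -> float:
        """Cesium oscillations an atomic clock counts in the given number of SI seconds."""
        return seconds * CS133_HZ

    # Assume today's solar day runs 2 ms longer than 24 h (illustrative value only).
    excess_per_day_s = 0.002
    days_to_one_second = 1.0 / excess_per_day_s  # days until Earth lags the clock by ~1 s

    print(f"Cesium cycles per SI day:       {cycles_in(SI_DAY_S):,.0f}")
    print(f"Extra cycles in this solar day: {cycles_in(excess_per_day_s):,.0f}")
    print(f"Days until a ~1 s offset:       {days_to_one_second:,.0f}")  # ~500 days, roughly a leap-second timescale
    ```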


    Backlinks

    • A day is not actually 24h long
