llm

loader

Tools used for loading LLMs for inference on CPU/GPU.

Each loader is compared on OS support (macOS, Windows, Windows WSL, Linux, Android, iOS/iPadOS, Web), hardware support (Apple Silicon, Apple Intel, Nvidia CUDA, AMD ROCm, Intel/AMD CPU, Intel Arc, Intel iGPU, offload) and format / quantization support (GGML, GGUF, GPTQ, AWQ, EXL2, [[QuIP#]], MLC, Hugging Face safetensors, SqueezeLLM).
huggingface/transformers (PyTorch)
Broad support across the listed OSes and hardware; offload is to system memory ("mem").
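A minimal sketch of loading a model through transformers, assuming accelerate is installed for device_map placement; the model id and prompt are placeholders:

```python
# Load a causal LM with transformers; device_map="auto" lets accelerate
# place layers on the GPU and spill the remainder to system memory.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```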
llama-cpp (llama-cpp-python)
Apple Silicon is a first-class target; Intel Arc/iGPU support via SYCL; layer offload between CPU and GPU; GGML support has been dropped in favor of GGUF.
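A minimal sketch using the llama-cpp-python bindings with a local GGUF file (the path is a placeholder); n_gpu_layers controls how many layers are offloaded to the GPU/Metal:

```python
# Load a GGUF model with llama-cpp-python; n_gpu_layers=-1 offloads all layers.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-7b.Q4_K_M.gguf",  # placeholder path
    n_ctx=4096,
    n_gpu_layers=-1,
)
out = llm("Q: What is the capital of France? A:", max_tokens=32, stop=["\n"])
print(out["choices"][0]["text"])
```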
GPTQ-for-LLaMa
GPTQ only, and only for LLaMa models; layer offload. Development is paused; use AutoGPTQ instead.
AutoGPTQ
GPTQ loader with memory offload; some other formats are supported only indirectly.
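A minimal sketch of loading a pre-quantized GPTQ checkpoint with AutoGPTQ (the repo id is a placeholder):

```python
# Load a pre-quantized GPTQ checkpoint with AutoGPTQ and run a short generation.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

repo = "TheBloke/Llama-2-7B-GPTQ"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoGPTQForCausalLM.from_quantized(repo, device="cuda:0", use_safetensors=True)

inputs = tokenizer("The capital of France is", return_tensors="pt").to("cuda:0")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```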
ExLlamaV2
GPU loader for its own EXL2 format as well as GPTQ.
AutoAWQ
AWQ loader; layer offload via accelerate.
CTransformers
Metal support (macOS / Apple Silicon) is limited to LLaMa 1/2 models; several other platforms have only limited support; layer offload.
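A minimal sketch using ctransformers with a local quantized model file (the path is a placeholder); gpu_layers gives the layer offload:

```python
# Load a local quantized model with ctransformers and offload 20 layers to GPU.
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "./models/llama-2-7b.Q4_K_M.gguf",  # placeholder path
    model_type="llama",
    gpu_layers=20,
)
print(llm("The capital of France is", max_new_tokens=20))
```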
[[QuIP#]]
Loader for the QuIP# quantization format; some support is indirect or unclear.
MLC LLM
Very broad platform coverage: macOS / Apple Silicon / Apple Intel via Metal, Android via OpenCL (Adreno, Mali), iOS/iPadOS via Metal on A-series chips, Web via WebGPU/WASM, and AMD/Intel GPUs via Vulkan. Models must be compiled into its own MLC format; other formats are reachable only indirectly.
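A minimal sketch assuming the mlc_chat Python package (package and class names have changed across MLC releases) and a model already compiled to the MLC format; the model name is a placeholder:

```python
# Chat with an MLC-compiled model via the mlc_chat Python bindings.
from mlc_chat import ChatModule

cm = ChatModule(model="Llama-2-7b-chat-hf-q4f16_1")  # placeholder compiled model
print(cm.generate(prompt="What is the capital of France?"))
```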
GPT4All
CPU inference requires AVX/AVX2 instructions; GPU support via Vulkan; only a limited set of model architectures is supported.
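A minimal sketch using the gpt4all Python bindings, which download the model file on first use (the filename is a placeholder):

```python
# Load a model through the gpt4all bindings and run a short completion.
from gpt4all import GPT4All

model = GPT4All("mistral-7b-instruct-v0.1.Q4_0.gguf")  # placeholder model file
print(model.generate("The capital of France is", max_tokens=20))
```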
vLLM
Serving-oriented loader; requires a GPU (no CPU inference).
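A minimal offline-batch sketch with vLLM (the model id is a placeholder):

```python
# Offline batch generation with vLLM; requires a supported GPU.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-7b-hf")  # placeholder model id
params = SamplingParams(temperature=0.8, max_tokens=32)
outputs = llm.generate(["The capital of France is"], params)
print(outputs[0].outputs[0].text)
```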
Aphrodite