llmquantizationbackend
what
- A Python package for 4-bit quantization of LLaMA using GPTQ, and for running the quantized models (see the sketch below).
- Discontinued; the author recommends AutoGPTQ instead.
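A minimal sketch (assumed, not the package's actual code) of what 4-bit weight quantization of a linear layer looks like numerically; GPTQ itself additionally uses second-order information to compensate rounding error, which this plain round-to-nearest baseline omits:

```python
import torch

def quantize_4bit_rtn(weight: torch.Tensor):
    """Per-output-channel asymmetric 4-bit round-to-nearest quantization."""
    wmin = weight.min(dim=1, keepdim=True).values
    wmax = weight.max(dim=1, keepdim=True).values
    scale = (wmax - wmin).clamp(min=1e-8) / 15.0        # 16 levels: 0..15
    zero = torch.round(-wmin / scale)                    # per-channel zero-point
    q = torch.clamp(torch.round(weight / scale) + zero, 0, 15)
    dequant = (q - zero) * scale                         # what inference actually multiplies with
    return q.to(torch.uint8), scale, zero, dequant

w = torch.randn(512, 512)                                # stand-in for a LLaMA projection weight
q, scale, zero, w_hat = quantize_4bit_rtn(w)
print("mean abs error:", (w - w_hat).abs().mean().item())
```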
features
--act-order
: heuristically quantizing columns in order of decreasing activation size (see the sketch after this list)
- dramatically improves GPTQ’s accuracy on the OPT-66B outlier model
- very slow
--true-sequential
: performing sequential quantization even within a single Transformer block (see the sketch after this list)
--act-order and --true-sequential
: the two flags can be passed together in the same quantization run
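Rough sketch of the idea behind --act-order (a simplification, not the repo's code): weight columns are reordered by decreasing activation size, quantized in that order, and the permutation is undone at the end. quantize_col is a hypothetical placeholder for the per-column quantizer; real GPTQ also propagates each column's rounding error onto the not-yet-quantized columns, which is what makes the ordering pay off.

```python
import torch

def act_order_permutation(calib_inputs: torch.Tensor) -> torch.Tensor:
    """calib_inputs: (num_samples, in_features) activations feeding the layer.
    Returns column indices sorted by decreasing activation size, i.e. by diag(X^T X)."""
    col_importance = calib_inputs.pow(2).sum(dim=0)
    return torch.argsort(col_importance, descending=True)

def quantize_columns_in_act_order(weight, calib_inputs, quantize_col):
    perm = act_order_permutation(calib_inputs)
    w = weight[:, perm].clone()                 # most "active" columns first
    for j in range(w.shape[1]):
        w[:, j] = quantize_col(w[:, j])         # hypothetical per-column quantizer
    inv_perm = torch.argsort(perm)              # undo the reordering
    return w[:, inv_perm]

X = torch.randn(128, 512)                       # calibration activations
W = torch.randn(512, 512)                       # layer weight (out_features, in_features)
W_q = quantize_columns_in_act_order(W, X, lambda col: torch.round(col * 8) / 8)
```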
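Rough sketch of the --true-sequential control flow (again an assumption, not the repo's API; collect_inputs and quantize_sublayer are hypothetical helpers): sub-layers inside one Transformer block are quantized in order, and each sub-layer's calibration inputs are recomputed through the earlier sub-layers that have already been quantized.

```python
import torch.nn as nn

def quantize_block_true_sequential(block: nn.Module, sublayer_names,
                                   collect_inputs, quantize_sublayer):
    """sublayer_names: e.g. attention projections before MLP projections.
    collect_inputs(block, name): run calibration data through the block and record
      what currently feeds the named sub-layer (hypothetical helper).
    quantize_sublayer(layer, inputs): quantize one linear layer given its inputs
      (hypothetical stand-in for the GPTQ solver)."""
    modules = dict(block.named_modules())
    for name in sublayer_names:
        calib = collect_inputs(block, name)      # reflects earlier, already-quantized sub-layers
        quantize_sublayer(modules[name], calib)
        # Without --true-sequential, calibration inputs for every sub-layer would be
        # collected once up front from the still-unquantized full-precision block.
```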
properties
- --act-order is very slow
- only supports Linux
resources