llm quantization backend
what
- A Python package for quantizing LLaMA to 4 bits with GPTQ and running the quantized models.
- Discontinued; the author recommends AutoGPTQ instead.
features
--act-order: heuristically quantizing columns in order of decreasing activation size
- dramatically improves GPTQ’s accuracy on the OPT-66B outlier model
- very slow
--true-sequential: performing sequential quantization even within a single Transformer block
--act-order and --true-sequential can be passed together on the same quantization run
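The two heuristics above can be sketched in a toy example. This is my reading of what the flags do, not the repo's actual code: plain round-to-nearest quantization stands in for GPTQ's column-by-column solver, and the function names are made up for illustration.

```python
import numpy as np

# Toy sketches of --act-order and --true-sequential (assumed behaviour,
# not the repo's implementation).

def act_order_permutation(X):
    """--act-order: order weight columns by decreasing activation 'size',
    measured here as diag(X^T X), the per-column squared activation norm."""
    return np.argsort(-np.einsum("ij,ij->j", X, X))

def quantize_rtn(W, n_bits=4):
    """Symmetric round-to-nearest quantization, a stand-in for GPTQ."""
    scale = np.abs(W).max() / (2 ** (n_bits - 1) - 1)
    return np.round(W / scale) * scale

def quantize_block_true_sequential(weights, X):
    """--true-sequential: quantize a block's sublayers one at a time,
    calibrating each on the outputs of the already-quantized sublayers
    (the calibration data X is unused by RTN, but GPTQ would consume it)."""
    quantized = []
    for W in weights:          # W has shape (out_features, in_features)
        Wq = quantize_rtn(W)   # GPTQ would use X and the act order here
        quantized.append(Wq)
        X = X @ Wq.T           # propagate through the quantized sublayer
    return quantized

X = np.array([[1.0, 3.0, 0.5],
              [2.0, 0.0, 0.5]])
print(act_order_permutation(X))  # -> [1 0 2]: column energies 9.0, 5.0, 0.5
```

With round-to-nearest the column order makes no difference; in GPTQ it matters because earlier columns' rounding errors are compensated by later ones, so handling high-activation columns first reduces the damage.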
properties
--act-order is very slow
- only supports Linux
resources