llmquantizationbackend

what

  • A Python package for quantizing LLaMA to 4 bits with GPTQ and running the quantized models.
  • Discontinued; the author recommends AutoGPTQ instead.

features

  • --act-order: heuristically quantizing columns in order of decreasing activation size (see the sketch after this list)
    • dramatically improves GPTQ’s accuracy on outlier-heavy models such as OPT-66B
    • very slow
  • --true-sequential: performing sequential quantization even within a single Transformer block, so later sublayers are quantized against the outputs of the already-quantized earlier ones
  • --act-order and --true-sequential can be used together
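
A minimal sketch of the --act-order idea, assuming the GPTQ Hessian proxy H built from calibration activations, whose diagonal stands in for per-column activation size; the function names, variables, and toy quantizer below are illustrative, not the repository's actual code:

    import torch

    def act_order_permutation(H: torch.Tensor) -> torch.Tensor:
        # Columns with the largest diag(H), i.e. the largest activations,
        # come first and are therefore quantized first.
        return torch.argsort(torch.diag(H), descending=True)

    def quantize_with_act_order(W: torch.Tensor, H: torch.Tensor, quantize_fn):
        # Permute columns, quantize them in decreasing-activation order,
        # then undo the permutation so the layer keeps its original layout.
        perm = act_order_permutation(H)
        inv = torch.argsort(perm)
        W_q = quantize_fn(W[:, perm], H[perm][:, perm])
        return W_q[:, inv]

    # Usage with a toy round-to-nearest quantizer standing in for GPTQ:
    W = torch.randn(16, 32)
    X = torch.randn(32, 64)
    H = X @ X.T                                   # synthetic calibration Hessian
    rtn = lambda w, h: torch.round(w * 8) / 8     # placeholder quantizer
    W_q = quantize_with_act_order(W, H, rtn)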

properties

  • --act-order is very slow
  • Only supports Linux

resources