research llm

  • selective refusal
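    • harmful inputs fall into many categories, and there is a dataset for each one
    • so we can apply multi-direction steering (one direction per category) to just a chosen subset of the categories, refusing those while leaving the others unaffected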
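A minimal sketch of the idea follows. It rests on several assumptions that the notes do not state. Each category's direction is taken as the difference of means between activations on that category's harmful prompts and activations on harmless prompts, a swapped-in technique borrowed from refusal-direction work. "Steering on a subset" is read as either adding only the selected directions (to push toward refusal) or projecting them out (to suppress refusal for those categories). The function names, the strength `alpha`, and the category names `cat_a`/`cat_b`/`cat_c` are all hypothetical. The activations are random stand-ins, not outputs from any model.

```python
import numpy as np


def category_directions(harmful_acts, harmless_acts):
    """Unit-norm difference-of-means direction per harmful category (assumed method).

    harmful_acts: dict of category name -> array (n_examples, d_model)
    harmless_acts: array (m_examples, d_model)
    """
    mu_harmless = harmless_acts.mean(axis=0)
    dirs = {}
    for cat, acts in harmful_acts.items():
        v = acts.mean(axis=0) - mu_harmless
        dirs[cat] = v / np.linalg.norm(v)
    return dirs


def add_directions(hidden, dirs, subset, alpha):
    """Add alpha times each selected unit direction (steer toward refusal).

    hidden: array (..., d_model); subset: list of category names; alpha: hypothetical strength.
    """
    delta = np.zeros(hidden.shape[-1])
    for cat in subset:
        delta += dirs[cat]
    return hidden + alpha * delta


def ablate_directions(hidden, dirs, subset):
    """Project out the span of the selected directions (suppress refusal for them).

    QR orthonormalizes the directions, so overlapping ones are removed exactly once.
    """
    if not subset:
        return hidden.copy()
    basis = np.stack([dirs[cat] for cat in subset], axis=1)  # (d_model, k)
    q, _ = np.linalg.qr(basis)                               # orthonormal columns
    return hidden - (hidden @ q) @ q.T


# Synthetic stand-in data: each hypothetical category is shifted along its own axis.
rng = np.random.default_rng(0)
d_model = 16
harmless = rng.normal(size=(200, d_model))
harmful = {
    "cat_a": rng.normal(size=(100, d_model)) + 3.0 * np.eye(d_model)[0],
    "cat_b": rng.normal(size=(100, d_model)) + 3.0 * np.eye(d_model)[1],
    "cat_c": rng.normal(size=(100, d_model)) + 3.0 * np.eye(d_model)[2],
}
dirs = category_directions(harmful, harmless)

h = rng.normal(size=(4, d_model))  # stand-in for hidden states at one layer
refuse_ab = add_directions(h, dirs, ["cat_a", "cat_b"], alpha=4.0)
no_refuse_c = ablate_directions(h, dirs, ["cat_c"])
```

In a real model these functions would be applied to a layer's hidden states during the forward pass (for example, from a hook). The subset passed in is what makes the refusal selective: only the listed categories are affected.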