Description
LLMs are prone to hallucinations, i.e., generating content that deviates from facts seen during training. There are two simple decoding strategies for reducing hallucinations with pretrained LLMs: DoLA and SLED. Both improve factual accuracy by contrasting logits from different layers, which requires a forward pass that exposes intermediate layer outputs. The standard mlx_lm.generate pipeline does not currently support this.
- DoLA (https://arxiv.org/pdf/2309.03883) contrasts the final layer with the premature layer whose output distribution has the highest Jensen-Shannon divergence (JSD) from it, and uses the difference in log-probabilities to update the token probabilities (a rough sketch follows this list).
- SLED (https://arxiv.org/pdf/2411.02433v3) instead estimates a latent knowledge distribution from all premature layers and evolves the output distribution by minimizing the KL divergence between the two.
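To make the request concrete, here is a rough sketch of DoLA's contrast step, assuming early-exit logits for each candidate layer are already available (obtained by applying the model's final norm and LM head to the intermediate hidden states). The function and argument names here are made up for illustration and are not part of the mlx_lm API:

```python
import mlx.core as mx

def dola_logits(final_logits, candidate_logits, alpha=0.1):
    # final_logits: [vocab_size] logits from the last layer at the current position
    # candidate_logits: list of [vocab_size] early-exit logits from candidate layers
    eps = 1e-10
    p_final = mx.softmax(final_logits, axis=-1)

    def jsd(p, q):
        # Jensen-Shannon divergence between two probability vectors
        m = 0.5 * (p + q)
        kl_pm = (p * (mx.log(p + eps) - mx.log(m + eps))).sum(axis=-1)
        kl_qm = (q * (mx.log(q + eps) - mx.log(m + eps))).sum(axis=-1)
        return 0.5 * (kl_pm + kl_qm)

    # Premature layer = candidate with the highest JSD from the final layer
    divs = mx.stack([jsd(p_final, mx.softmax(l, axis=-1)) for l in candidate_logits])
    premature_logits = candidate_logits[mx.argmax(divs).item()]

    # Contrast log-probabilities, restricted to the adaptive plausibility set
    # (tokens whose final-layer probability is at least alpha * max probability)
    log_final = final_logits - mx.logsumexp(final_logits, axis=-1, keepdims=True)
    log_premature = premature_logits - mx.logsumexp(premature_logits, axis=-1, keepdims=True)
    plausible = p_final >= alpha * p_final.max(axis=-1, keepdims=True)
    return mx.where(plausible, log_final - log_premature, float("-inf"))
```

The returned scores would replace the raw final-layer logits in the sampling step; tokens outside the plausibility set are masked to -inf, as in the paper.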
Implementing both should be relatively easy, since each individual model only needs to return its intermediate layer outputs (see the sketch below); reference code for the final computation is also available.
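On the model side, something like the following per-model change could expose the needed states. The helper name and the attribute layout (embed_tokens, layers, norm) follow common mlx_lm conventions but are assumptions, and the attention mask / KV cache handling that real models thread through each layer is elided:

```python
import mlx.core as mx

def forward_with_hidden_states(model, inputs: mx.array):
    """Run a decoder-only transformer and collect the hidden state after
    every block. Assumes model.embed_tokens / model.layers / model.norm;
    mask and cache handling are elided for brevity."""
    h = model.embed_tokens(inputs)
    hidden_states = []
    for layer in model.layers:
        h = layer(h)
        hidden_states.append(h)
    # Early-exit logits come from applying model.norm and the LM head to
    # each collected state; the final logits use the last entry.
    return model.norm(h), hidden_states
```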
Please implement both options (or just SLED, since it shows better metrics per https://jayzhang42.github.io/sled_page/). Thanks!