Feature Request: Implement Power Law sampling #1074

@Geechan

Description

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

I would like to request the implementation of the new Power Law sampler, which currently has an open PR in llama.cpp. It was originally authored by @MrJackSpade here, with a working implementation ported to upstream llama.cpp and refined by @ddh0 here.

What is it?

This sampler targets the 'creative yet coherent' token options in the middle of the sampling range, letting the end user specify a target value that the model swings towards. It changes the probability distribution less destructively than adjusting temperature, allowing more creativity while breaking many repetitive model patterns and phrases (or, as we know it, slop). It can be considered an evolved version of Mirostat and XTC, with similar goals but with less reliance on RNG and less complexity, setting an adaptive target instead of a static one determined by chance probability.
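As an illustrative sketch only (not the PR's exact math), the core idea of favoring tokens near a target probability could look like the following; the `exponent` knob and function name are hypothetical, not flags from the PR:

```python
import random

def power_law_pick(probs, target, exponent=2.0, rng=random):
    """Hypothetical sketch: re-weight candidates by proximity to `target`,
    then sample from the re-weighted distribution. Not the PR's actual code."""
    if target < 0.0:
        # Negative target disables the sampler: sample the raw distribution.
        return rng.choices(range(len(probs)), weights=probs)[0]
    # Tokens whose original probability is close to `target` get the highest
    # weight; distant tokens fall off following a power law.
    weights = [(1.0 - abs(p - target)) ** exponent for p in probs]
    return rng.choices(range(len(probs)), weights=weights)[0]
```

With `target=0.2`, a mid-probability token is weighted more heavily than the top token, which is the "creative yet coherent" middle of the range the description refers to.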

Adaptive target tracking

The sampler maintains a weighted history of the original probabilities of selected tokens. If recent selections have been higher-probability than the target, it compensates by temporarily lowering the effective target, and vice-versa. This keeps the average selection probability near your configured target over time.
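The weighted history described above suggests an exponential moving average (EMA). A minimal sketch of that bookkeeping, assuming a simple symmetric compensation rule (class and method names are assumptions, not the PR's actual code):

```python
class TargetTracker:
    """Sketch of adaptive target tracking: an EMA of the original
    probabilities of selected tokens steers the effective target."""

    def __init__(self, target, decay=0.9):
        self.target = target   # user-configured long-run target
        self.decay = decay     # history weight; roughly 1/(1-decay) tokens
        self.ema = target      # start the history at the target itself

    def effective_target(self):
        # If recent picks ran hotter than the target (ema > target),
        # lower the effective target to compensate, and vice versa.
        return self.target - (self.ema - self.target)

    def observe(self, picked_prob):
        # Fold the selected token's original probability into the EMA.
        self.ema = self.decay * self.ema + (1.0 - self.decay) * picked_prob
```

After a run of high-probability selections the effective target drops below the configured target, nudging later selections toward lower-probability tokens, so the average selection probability hovers near the configured value.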

Tunable Parameters

| Flag | Description | Valid range | Default |
| --- | --- | --- | --- |
| `--power-law-target` | Select tokens near this probability. Negative = disabled. | [0.0, 1.0] | -1.0 |
| `--power-law-decay` | Decay rate for target adaptation over time. Effective history length ≈ 1/(1 − decay) tokens. | [0.0, 0.99] | 0.9 |

Negative target values should disable the sampler and simply sample a token from the untransformed distribution. With the default target set to -1.0, the sampler is disabled by default, which is probably preferable since it's a specialized sampler.
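The ≈ 1/(1 − decay) rule of thumb from the table can be checked numerically; for example, the default decay of 0.9 corresponds to an effective history of about 10 tokens, and the maximum 0.99 to about 100:

```python
def effective_history(decay):
    """Effective history length of an exponential moving average,
    per the 1/(1-decay) rule of thumb from the parameter table."""
    return 1.0 / (1.0 - decay)

# decay=0.9  -> ~10 tokens of history
# decay=0.99 -> ~100 tokens of history
```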

Usage notes

This sampler must be last in the chain, like the existing greedy, dist, or mirostat samplers, because it selects a token ID rather than just transforming logits. As a result, an implementation that guarantees Power Law to be at the end of the chain is probably preferable; see here.
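The ordering constraint can be pictured as a chain in which only the final stage returns a token ID; a hypothetical sketch under assumed names (not ik_llama's actual API):

```python
def run_chain(probs, transforms, terminal):
    """Sketch of a sampler chain: transforms reshape the distribution,
    and exactly one terminal sampler at the end picks the token ID."""
    for t in transforms:        # e.g. top-k, top-p, temperature
        probs = t(probs)
    return terminal(probs)      # must be last: returns a token ID

def greedy(probs):
    """One possible terminal sampler: pick the highest-probability token."""
    return max(range(len(probs)), key=probs.__getitem__)
```

Power Law, like greedy, dist, or mirostat, would slot in as `terminal`, which is why the chain should guarantee it runs last.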

Motivation

This sampler effectively functions as a smart, dynamic XTC and works very well for creative tasks of all descriptions. It can also be used quite effectively for non-creative tasks by setting a higher target value (0.8-0.9), which leans closer to the original probabilities.

By implementing it in ik_llama, it would add another powerful sampler option for the end user. The actual sampler logic is quite simple and should be easy to port over.

Possible Implementation

A reference gist implementation, written for llama.cpp, contains all the sampler logic:

https://gist.github.com/ddh0/3870fba61e482e0ad27f1812e32581bf

The open PR for upstream llama.cpp, with a detailed technical explanation of the sampler and of the parameters chosen to be visible to the end user, alongside other pertinent information:

ggml-org/llama.cpp#17927

Labels: enhancement (New feature or request)