Feature Request: Implement Power Law sampling #1074

@Geechan

Description

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

I would like to request the implementation of the new Power Law sampler, which currently has an open PR in llama.cpp. It was originally authored by @MrJackSpade here, with a working implementation ported to upstream llama.cpp and refined by @ddh0 here.

What is it?

This sampler targets the 'creative yet coherent' token options in the middle of the sampling range, letting the end user specify a target value that the model swings towards. It changes the probability distribution less destructively than adjusting temperature, allowing more creativity while breaking many repetitive model patterns and phrases (or, as we know it, slop). It can be considered an evolved version of Mirostat and XTC, with similar goals but with less reliance on RNG and less complexity, setting an adaptive target instead of a static one determined by chance probability.
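As an illustrative sketch only (not the PR's exact math), the core idea of favoring tokens near a target probability could look like the following; the `exponent` knob and function name are hypothetical, not flags from the PR:

```python
import random

def power_law_pick(probs, target, exponent=2.0, rng=random):
    """Hypothetical sketch: re-weight candidates by proximity to `target`,
    then sample from the re-weighted distribution. Not the PR's actual code."""
    if target < 0.0:
        # Negative target disables the sampler: sample the raw distribution.
        return rng.choices(range(len(probs)), weights=probs)[0]
    # Tokens whose original probability is close to `target` get the highest
    # weight; distant tokens fall off following a power law.
    weights = [(1.0 - abs(p - target)) ** exponent for p in probs]
    return rng.choices(range(len(probs)), weights=weights)[0]
```

With `target=0.2`, a mid-probability token is weighted more heavily than the top token, which is the "creative yet coherent" middle of the range the description refers to.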

Adaptive target tracking

The sampler maintains a weighted history of the original probabilities of selected tokens. If recent selections have been higher-probability than the target, it compensates by temporarily lowering the effective target, and vice-versa. This keeps the average selection probability near your configured target over time.
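The weighted history described above suggests an exponential moving average (EMA). A minimal sketch of that bookkeeping, assuming a simple symmetric compensation rule (class and method names are assumptions, not the PR's actual code):

```python
class TargetTracker:
    """Sketch of adaptive target tracking: an EMA of the original
    probabilities of selected tokens steers the effective target."""

    def __init__(self, target, decay=0.9):
        self.target = target   # user-configured long-run target
        self.decay = decay     # history weight; roughly 1/(1-decay) tokens
        self.ema = target      # start the history at the target itself

    def effective_target(self):
        # If recent picks ran hotter than the target (ema > target),
        # lower the effective target to compensate, and vice versa.
        return self.target - (self.ema - self.target)

    def observe(self, picked_prob):
        # Fold the selected token's original probability into the EMA.
        self.ema = self.decay * self.ema + (1.0 - self.decay) * picked_prob
```

After a run of high-probability selections the effective target drops below the configured target, nudging later selections toward lower-probability tokens, so the average selection probability hovers near the configured value.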

Tunable Parameters

| Flag | Description | Valid range | Default |
| --- | --- | --- | --- |
| `--power-law-target` | Select tokens near this probability. Negative = disabled. | [0.0, 1.0] | -1.0 |
| `--power-law-decay` | Decay rate for target adaptation over time. Effective history length ≈ 1/(1 − decay) tokens. | [0.0, 0.99] | 0.9 |

Negative target values should disable the sampler and simply sample a token from the untransformed distribution. With the default target set to -1.0, the sampler is disabled by default, which is probably preferable since it's a specialized sampler.
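The ≈ 1/(1 − decay) rule of thumb from the table can be checked numerically; for example, the default decay of 0.9 corresponds to an effective history of about 10 tokens, and the maximum 0.99 to about 100:

```python
def effective_history(decay):
    """Effective history length of an exponential moving average,
    per the 1/(1-decay) rule of thumb from the parameter table."""
    return 1.0 / (1.0 - decay)

# decay=0.9  -> ~10 tokens of history
# decay=0.99 -> ~100 tokens of history
```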

Usage notes

This sampler must be last in the chain, like the existing greedy, dist, or mirostat samplers, because it selects a token ID rather than just transforming logits. As a result, an implementation that guarantees Power Law to be at the end of the chain is probably preferable; see here.
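The ordering constraint can be pictured as a chain in which only the final stage returns a token ID; a hypothetical sketch under assumed names (not ik_llama's actual API):

```python
def run_chain(probs, transforms, terminal):
    """Sketch of a sampler chain: transforms reshape the distribution,
    and exactly one terminal sampler at the end picks the token ID."""
    for t in transforms:        # e.g. top-k, top-p, temperature
        probs = t(probs)
    return terminal(probs)      # must be last: returns a token ID

def greedy(probs):
    """One possible terminal sampler: pick the highest-probability token."""
    return max(range(len(probs)), key=probs.__getitem__)
```

Power Law, like greedy, dist, or mirostat, would slot in as `terminal`, which is why the chain should guarantee it runs last.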

Motivation

This sampler effectively functions as a smart, dynamic XTC and works very well for creative tasks of all descriptions. It can also be used quite effectively for non-creative tasks by setting a higher target value (0.8-0.9), which leans closer to the original probabilities.

By implementing it in ik_llama, it would add another powerful sampler option for the end user. The actual sampler logic is quite simple and should be easy to port over.

Possible Implementation

A reference gist implementation, written for llama.cpp, contains all the sampler logic:

https://gist.github.com/ddh0/3870fba61e482e0ad27f1812e32581bf

The open PR for upstream llama.cpp, with a detailed technical explanation of the sampler and of the parameters chosen to be visible to the end user, alongside other pertinent information:

ggml-org/llama.cpp#17927

Labels: enhancement (New feature or request)