Directional steering is a runtime activation edit for DS4. A steering file is a
flat f32 matrix with one normalized 4096-wide direction per layer. During
inference, ds4 can apply the edit after attention outputs, FFN outputs, or both:
y = y - scale * direction[layer] * dot(direction[layer], y)
Positive scale removes the represented direction. Negative scale amplifies it. With no steering file or zero scales, ds4 follows the normal inference path.
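The edit above can be sketched in a few lines of plain Python. This is only an illustration of the formula on a toy 4-wide vector, not the ds4 implementation; the helper names `dot`, `normalize`, and `steer` are made up for this example.

```python
import math

def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale v to unit length, as the steering file expects for each row."""
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

def steer(y, direction, scale):
    """Return y - scale * direction * dot(direction, y)."""
    coeff = scale * dot(direction, y)
    return [yi - coeff * di for yi, di in zip(y, direction)]

# Toy data: real rows are 4096 wide, one row per layer.
direction = normalize([1.0, 1.0, 0.0, 0.0])
y = [3.0, 1.0, 2.0, -1.0]

for scale in (0.0, 1.0, -1.0):
    out = steer(y, direction, scale)
    # Print how much of `direction` is left in the output for each scale.
    print(scale, round(dot(direction, out), 6))
```

Components of `y` that are orthogonal to the direction pass through unchanged, which is why the edit only touches the represented direction.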
--dir-steering-file FILE    load a 43 x 4096 f32 direction file
--dir-steering-ffn F        apply steering after FFN outputs; default is 1 when a file is provided
--dir-steering-attn F       apply steering after attention outputs; default is 0
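To sanity-check a direction file before passing it to ds4, a small reader like the one below can help. It is a sketch under two assumptions that the text does not state explicitly: the values are little-endian and stored row-major (one layer after another). The function names are hypothetical.

```python
import array
import math
import os
import sys

N_LAYERS = 43   # rows, from the --dir-steering-file description
WIDTH = 4096    # f32 values per row

def load_directions(path, n_layers=N_LAYERS, width=WIDTH):
    """Read a flat f32 matrix and return one array per layer."""
    expected = n_layers * width * 4
    size = os.path.getsize(path)
    if size != expected:
        raise ValueError(f"expected {expected} bytes, found {size}")
    data = array.array("f")
    with open(path, "rb") as f:
        data.fromfile(f, n_layers * width)
    if sys.byteorder != "little":
        data.byteswap()  # assumption: the file is little-endian
    return [data[i * width:(i + 1) * width] for i in range(n_layers)]

def row_norms(rows):
    """Euclidean norm of each row; normalized directions should give ~1.0."""
    return [math.sqrt(sum(x * x for x in row)) for row in rows]
```

For example, `row_norms(load_directions("dir-steering/out/verbosity.f32"))` should return 43 values close to 1.0 if the file matches the layout assumed here.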
The FFN output is usually the best first target because it is late enough in each layer to represent behavior, style, and topic signals. Attention steering is available for experiments, but it can be more fragile.
The bundled example builds a style direction from 100 paired prompts. Each pair asks for the same information in two ways:
- examples/succinct.txt: terse target prompts.
- examples/verbose.txt: detailed contrast prompts.
Because the extracted direction is succinct minus verbose, negative FFN scales
make answers shorter, while positive FFN scales tend to make answers longer and
more explanatory.
Build the vector:
python3 dir-steering/tools/build_direction.py \
--ds4 ./ds4 \
--model ds4flash.gguf \
--good-file dir-steering/examples/succinct.txt \
--bad-file dir-steering/examples/verbose.txt \
--out dir-steering/out/verbosity.json \
--component ffn_out \
--ctx 512
This writes:
dir-steering/out/verbosity.json
dir-steering/out/verbosity.f32
Try a terse run:
./ds4 -m ds4flash.gguf --nothink --temp 0 -n 160 \
--dir-steering-file dir-steering/out/verbosity.f32 \
--dir-steering-ffn -1 \
-p "Explain why databases use indexes."
Try a verbose run:
./ds4 -m ds4flash.gguf --nothink --temp 0 -n 220 \
--dir-steering-file dir-steering/out/verbosity.f32 \
--dir-steering-ffn 2 \
-p "Explain why databases use indexes."
The same vector can be used in either direction. The sign is the important part:
- negative scale amplifies the succinct target direction;
- positive scale suppresses that direction and usually gives the model more room to elaborate.
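The sign behavior follows directly from the steering formula. As a short derivation, write d for the unit-norm direction of one layer, s for the scale, and y' for the edited activation (symbols chosen here for illustration):

```latex
\begin{aligned}
y' &= y - s\,d\,(d \cdot y), \qquad \lVert d \rVert = 1 \\
d \cdot y' &= d \cdot y - s\,(d \cdot d)(d \cdot y) = (1 - s)\,(d \cdot y)
\end{aligned}
```

So the component along the direction is multiplied by (1 - s): s = 0 leaves it unchanged, s = 1 removes it, s = -1 doubles it, and s = 2 keeps its magnitude but flips its sign. For the succinct minus verbose vector, that flip is what pushes the activation toward the verbose side.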
Use the sweep helper to test several strengths on a fixed prompt set:
python3 dir-steering/tools/run_sweep.py \
--ds4 ./ds4 \
--model ds4flash.gguf \
--direction dir-steering/out/verbosity.f32 \
--prompts dir-steering/examples/eval_prompts.txt \
--scales "-1,-0.5,0,0.5,1,2" \
--tokens 180 \
--nothink
Start with FFN scales between -1 and 2. If the model becomes repetitive,
ignores the prompt, or starts losing factual content, the scale is too strong.
For this example, -1 is a good first terse setting and 2 is a good first
verbose setting. Strong negative scales such as -2 or -3 can over-amplify
the terse direction and collapse into repetition on some prompts.
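When comparing sweep outputs by hand, two rough numbers are useful: the word count and how much of the text is repeated phrasing. The helper below is a hypothetical sketch for that kind of check; it is not part of run_sweep.py, and the repeated-trigram measure is just one simple heuristic for spotting collapse into repetition.

```python
from collections import Counter

def word_count(text):
    """Number of whitespace-separated words."""
    return len(text.split())

def repeated_trigram_fraction(text):
    """Fraction of word trigrams that occur more than once in the text."""
    words = text.lower().split()
    trigrams = [tuple(words[i:i + 3]) for i in range(len(words) - 2)]
    if not trigrams:
        return 0.0
    counts = Counter(trigrams)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(trigrams)

# Toy strings standing in for model outputs at two different scales.
healthy = "Indexes let the database find rows quickly."
collapsed = "indexes are fast indexes are fast indexes are fast"

for label, text in (("healthy", healthy), ("collapsed", collapsed)):
    print(label, word_count(text), repeated_trigram_fraction(text))
```

Comparing these numbers against the scale 0 output for the same prompt gives a baseline; a sharp rise in the repeated fraction at a given scale suggests that the scale is too strong.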
With the 100-pair vector built from the commands above, local greedy checks showed the expected behavior:
- Prompt: Explain why databases use indexes.
  - --dir-steering-ffn -1: 67 words, one compact paragraph.
  - --dir-steering-ffn 0: 136 words, structured explanation.
  - --dir-steering-ffn 1: 140 words, structured explanation with more detail.
On a prompt that the unsteered model already answered briefly, positive steering made the expansion more visible:
- Prompt: What does DNS do?
  - --dir-steering-ffn 0: 44 words.
  - --dir-steering-ffn 2: 171 words, with sections and step-by-step detail.
The extractor compares two prompt sets:
- good-file: target prompts for the direction you want to represent.
- bad-file: contrast prompts that should be separated from the target.
It captures DS4 activations from the same local GPU graph used for inference,
averages target minus contrast, normalizes one vector per layer, and writes both
metadata JSON and the runtime .f32 file.
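The averaging and normalization step can be sketched as follows for a single layer. This is a minimal illustration, assuming one activation vector has already been captured per prompt; how the real tool pools activations across tokens is not described here, and the function names are hypothetical.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def build_direction(good, bad):
    """Unit vector pointing from the contrast mean to the target mean."""
    mean_good = mean_vector(good)
    mean_bad = mean_vector(bad)
    diff = [g - b for g, b in zip(mean_good, mean_bad)]
    norm = math.sqrt(sum(x * x for x in diff))
    if norm == 0.0:
        raise ValueError("target and contrast means are identical")
    return [x / norm for x in diff]

# Toy 2-wide activations for one layer; real captures are 4096 wide.
good = [[2.0, 0.0], [4.0, 0.0]]   # target (good-file) activations
bad = [[1.0, 0.0], [1.0, 0.0]]    # contrast (bad-file) activations
direction = build_direction(good, bad)
print(direction)
```

Repeating this per layer and concatenating the rows yields a matrix of the shape the runtime expects. With equal numbers of target and contrast prompts, the difference of means used here equals the mean of the paired differences.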
Concept removal:
- Put concept-heavy prompts in good-file.
- Put neutral prompts in bad-file.
- Run with a positive FFN scale.
Concept amplification:
- Put desired concept prompts in good-file.
- Put neutral prompts in bad-file.
- Run with a negative FFN scale.
Style control:
- Put prompts for the target style in good-file.
- Put contrasting style prompts in bad-file.
- Use negative scale to amplify the target style, positive scale to reduce it.
The method is not a fine-tune. It is a low-rank runtime edit, so it works best for coarse behavior, topic, or style directions that are consistently present in the activation captures.