

@maximizemaxwell
Contributor

Implement GPT OSS on candle

@maximizemaxwell marked this pull request as draft October 12, 2025 02:04
@maximizemaxwell marked this pull request as ready for review October 12, 2025 02:06
Member

@ivarflakstad left a comment

Thank you for this! 👏

Generally this looks good to me, but there are a couple of things I'd like to address.

I'll also ask @EricLBuehler to have a look, since he has been working on an implementation which includes mxfp4 support 👍
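For context on mxfp4: in the OCP Microscaling (MX) format, MXFP4 stores blocks of 32 FP4 (E2M1) codes that share one E8M0 scale, so each element dequantizes to `code_value * 2^(scale - 127)`. The sketch below is a stdlib-only illustration of that decoding, not candle's API and not the GPT-OSS checkpoint loader; the function name is hypothetical, and the nibble order (low nibble first) is an assumption to check against the actual weight layout.

```rust
/// FP4 E2M1 code -> value. The top bit of the nibble is the sign bit.
const FP4_E2M1: [f32; 16] = [
    0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
    -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0,
];

/// Hypothetical helper: dequantize one MXFP4 block.
/// `packed` holds 32 four-bit codes (two per byte); `scale` is the shared
/// E8M0 exponent, i.e. the block factor is 2^(scale - 127).
fn dequant_mxfp4_block(packed: &[u8; 16], scale: u8) -> Option<[f32; 32]> {
    // E8M0 reserves 0xFF for NaN, so there is no usable block factor.
    if scale == 0xFF {
        return None;
    }
    let factor = 2f32.powi(scale as i32 - 127);
    let mut out = [0f32; 32];
    for (i, byte) in packed.iter().enumerate() {
        // Assumption: the low nibble is the earlier element of the pair.
        out[2 * i] = FP4_E2M1[(byte & 0x0F) as usize] * factor;
        out[2 * i + 1] = FP4_E2M1[(byte >> 4) as usize] * factor;
    }
    Some(out)
}
```

A 32-element block therefore costs 16 bytes of codes plus one scale byte, which is where the memory savings over bf16 weights come from.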

Member

Let's remove this binary 👍

Comment on lines +250 to +251
    self.sinks
        .reshape((1, self.num_heads, 1, 1))?
Member

I believe this reshape could be done just once in Attention::new, correct?
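A dependency-free sketch of this suggestion, assuming the sinks are one learned logit per head: reshape to `(1, num_heads, 1, 1)` once in the constructor and reuse the cached value on every forward pass. `Tensor` here is a hypothetical stand-in (flat data plus a shape), not `candle_core::Tensor`. The softmax helper also assumes the sink is used as an extra per-row logit that is normalised together with the scores and then dropped, so each row's probabilities sum to less than one.

```rust
/// Hypothetical stand-in for a tensor: contiguous data plus a shape.
#[derive(Debug, Clone)]
struct Tensor {
    data: Vec<f32>,
    shape: Vec<usize>,
}

impl Tensor {
    /// Reinterpret the same contiguous data under a new shape.
    fn reshape(&self, shape: &[usize]) -> Result<Tensor, String> {
        let n: usize = shape.iter().product();
        if n != self.data.len() {
            return Err(format!("cannot reshape {} elements to {:?}", self.data.len(), shape));
        }
        Ok(Tensor { data: self.data.clone(), shape: shape.to_vec() })
    }
}

struct Attention {
    num_heads: usize,
    /// Cached in broadcast-ready shape (1, num_heads, 1, 1).
    sinks: Tensor,
}

impl Attention {
    fn new(raw_sinks: Vec<f32>, num_heads: usize) -> Result<Self, String> {
        let flat = Tensor { shape: vec![raw_sinks.len()], data: raw_sinks };
        // Reshape once here instead of on every forward call.
        let sinks = flat.reshape(&[1, num_heads, 1, 1])?;
        Ok(Self { num_heads, sinks })
    }

    /// Probabilities for one row of `scores` in `head`, with the head's sink
    /// logit normalised alongside the scores and then dropped.
    fn probs_with_sink(&self, head: usize, scores: &[f32]) -> Vec<f32> {
        assert!(head < self.num_heads, "head index out of range");
        // With shape (1, H, 1, 1) and contiguous data, head h is at index h.
        let sink = self.sinks.data[head];
        // Subtract the max (including the sink) for numerical stability.
        let max = scores.iter().copied().fold(sink, f32::max);
        let denom: f32 = scores.iter().map(|s| (s - max).exp()).sum::<f32>() + (sink - max).exp();
        scores.iter().map(|s| (s - max).exp() / denom).collect()
    }
}
```

Hoisting the reshape also moves the shape check to load time, so a config whose head count doesn't match the sink weights fails when the model is built rather than on the first forward pass.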

Member

@ivarflakstad Oct 15, 2025

I don't really agree that this proves correctness. Best to remove this as well.

If you want to add some tests you could add something to gpt_oss.rs where you manually create a config and varbuilder and verify (for example) that the correct expert is chosen.
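A sketch of the kind of test suggested here, kept free of candle so it stays self-contained: hand-pick router weights so the expected experts are obvious, then assert on the chosen indices and their weights. `route` is a hypothetical mirror of top-k routing, assuming the GPT-OSS scheme of a linear router (weight plus bias) followed by a softmax over only the selected top-k logits. In the real test the weights would instead come from a manually built config and `VarBuilder`, as suggested above.

```rust
/// Hypothetical top-k router: logits = W·x + b, keep the top_k experts,
/// softmax over the kept logits only. Returns (expert indices, weights).
fn route(weight: &[Vec<f32>], bias: &[f32], x: &[f32], top_k: usize) -> (Vec<usize>, Vec<f32>) {
    assert!(top_k >= 1 && top_k <= weight.len(), "top_k out of range");
    // One logit per expert (one row of `weight` per expert).
    let logits: Vec<f32> = weight
        .iter()
        .zip(bias)
        .map(|(row, b)| row.iter().zip(x).map(|(w, v)| w * v).sum::<f32>() + b)
        .collect();
    // Sort expert indices by descending logit (panics on NaN) and keep top_k.
    let mut idx: Vec<usize> = (0..logits.len()).collect();
    idx.sort_by(|&a, &b| logits[b].partial_cmp(&logits[a]).unwrap());
    idx.truncate(top_k);
    // Softmax over the selected logits, stabilised by the largest one.
    let max = logits[idx[0]];
    let exps: Vec<f32> = idx.iter().map(|&i| (logits[i] - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    (idx, exps.iter().map(|e| e / sum).collect())
}
```

With four experts whose weight rows are `[1, 0]`, `[0, 1]`, `[-1, 0]`, `[0, -1]`, zero bias, and input `[2, 1]`, the logits are `[2, 1, -2, -1]`, so a top-2 router must pick experts 0 and 1 with weights `softmax([2, 1])`.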

@embedding-shapes

embedding-shapes commented Oct 21, 2025

I wanted to try this but couldn't even compile the example. On the surface it seems to be just syntax errors, but after trying to address them myself, it seems like it fundamentally hasn't been completely implemented? It doesn't seem to be a hardware compatibility issue either; Rust straight up cannot compile this.

$ git rev-parse HEAD
71f38130e9677baf0d3cb770cb9461c4064266a5

$ cargo run --example gpt-oss --release -- --prompt "The future of AI is"
   Compiling candle-transformers v0.9.1 (/projects/huggingface/candle/candle-transformers)
warning: unused import: `Activation`
  --> candle-transformers/src/models/gpt_oss.rs:14:37
   |
14 | use candle_nn::{linear_b as linear, Activation, Linear, VarBuilder};
   |                                     ^^^^^^^^^^
   |
   = note: `#[warn(unused_imports)]` on by default

error[E0599]: no method named `topk` found for struct `candle_core::Tensor` in the current scope
   --> candle-transformers/src/models/gpt_oss.rs:315:42
    |
315 |         let (top_vals, top_idx) = logits.topk(
    |                                   -------^^^^ method not found in `candle_core::Tensor`
    |
   ::: candle-transformers/src/models/deepseek2.rs:88:8
    |
 88 |     fn topk(&self, topk: usize) -> Result<TopKOutput>;
    |        ---- the method is available for `candle_core::Tensor` here
    |
    = help: items from traits can only be used if the trait is in scope
help: trait `TopKLastDimOp` which provides `topk` is implemented but not in scope; perhaps you want to import it
    |
 11 + use crate::models::deepseek2::TopKLastDimOp;
    |

error[E0599]: no method named `i` found for struct `candle_core::Tensor` in the current scope
    --> candle-transformers/src/models/gpt_oss.rs:384:27
     |
 384 |             let x_tok = x.i(token_idx)?; // (H,)
     |                           ^
     |
    ::: /projects/huggingface/candle/candle-core/src/indexer.rs:137:8
     |
 137 |     fn i(&self, index: T) -> Result<Tensor, Error>;
     |        - the method is available for `candle_core::Tensor` here
     |
     = help: items from traits can only be used if the trait is in scope
help: there is a method `id` with a similar name, but with different arguments
    --> /projects/huggingface/candle/candle-core/src/tensor.rs:1901:5
     |
1901 |     pub fn id(&self) -> TensorId {
     |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
help: trait `IndexOp` which provides `i` is implemented but not in scope; perhaps you want to import it
     |
  11 + use candle_core::IndexOp;
     |

error[E0599]: no method named `i` found for reference `&candle_core::Tensor` in the current scope
    --> candle-transformers/src/models/gpt_oss.rs:388:41
     |
 388 |                 let expert_id = top_idx.i((token_idx, k))?.to_scalar::<i64>()? as usize;
     |                                         ^
     |
     = help: items from traits can only be used if the trait is in scope
help: there is a method `id` with a similar name, but with different arguments
    --> /projects/huggingface/candle/candle-core/src/tensor.rs:1901:5
     |
1901 |     pub fn id(&self) -> TensorId {
     |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
help: trait `IndexOp` which provides `i` is implemented but not in scope; perhaps you want to import it
     |
  11 + use candle_core::IndexOp;
     |

error[E0599]: no method named `i` found for reference `&candle_core::Tensor` in the current scope
    --> candle-transformers/src/models/gpt_oss.rs:389:51
     |
 389 |                 let expert_weight = router_scores.i((token_idx, k))?;
     |                                                   ^
     |
     = help: items from traits can only be used if the trait is in scope
help: there is a method `id` with a similar name, but with different arguments
    --> /projects/huggingface/candle/candle-core/src/tensor.rs:1901:5
     |
1901 |     pub fn id(&self) -> TensorId {
     |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
help: trait `IndexOp` which provides `i` is implemented but not in scope; perhaps you want to import it
     |
  11 + use candle_core::IndexOp;
     |

error[E0599]: no method named `i` found for struct `candle_core::Tensor` in the current scope
    --> candle-transformers/src/models/gpt_oss.rs:392:38
     |
 392 |                 let w = self.gate_up.i(expert_id)?;
     |                                      ^
     |
    ::: /projects/huggingface/candle/candle-core/src/indexer.rs:137:8
     |
 137 |     fn i(&self, index: T) -> Result<Tensor, Error>;
     |        - the method is available for `candle_core::Tensor` here
     |
     = help: items from traits can only be used if the trait is in scope
help: there is a method `id` with a similar name, but with different arguments
    --> /projects/huggingface/candle/candle-core/src/tensor.rs:1901:5
     |
1901 |     pub fn id(&self) -> TensorId {
     |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
help: trait `IndexOp` which provides `i` is implemented but not in scope; perhaps you want to import it
     |
  11 + use candle_core::IndexOp;
     |

error[E0599]: no method named `i` found for struct `candle_core::Tensor` in the current scope
    --> candle-transformers/src/models/gpt_oss.rs:393:43
     |
 393 |                 let b = self.gate_up_bias.i(expert_id)?;
     |                                           ^
     |
    ::: /projects/huggingface/candle/candle-core/src/indexer.rs:137:8
     |
 137 |     fn i(&self, index: T) -> Result<Tensor, Error>;
     |        - the method is available for `candle_core::Tensor` here
     |
     = help: items from traits can only be used if the trait is in scope
help: there is a method `id` with a similar name, but with different arguments
    --> /projects/huggingface/candle/candle-core/src/tensor.rs:1901:5
     |
1901 |     pub fn id(&self) -> TensorId {
     |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
help: trait `IndexOp` which provides `i` is implemented but not in scope; perhaps you want to import it
     |
  11 + use candle_core::IndexOp;
     |

error[E0599]: no method named `i` found for struct `candle_core::Tensor` in the current scope
    --> candle-transformers/src/models/gpt_oss.rs:394:40
     |
 394 |                 let w_down = self.down.i(expert_id)?;
     |                                        ^
     |
    ::: /projects/huggingface/candle/candle-core/src/indexer.rs:137:8
     |
 137 |     fn i(&self, index: T) -> Result<Tensor, Error>;
     |        - the method is available for `candle_core::Tensor` here
     |
     = help: items from traits can only be used if the trait is in scope
help: there is a method `id` with a similar name, but with different arguments
    --> /projects/huggingface/candle/candle-core/src/tensor.rs:1901:5
     |
1901 |     pub fn id(&self) -> TensorId {
     |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
help: trait `IndexOp` which provides `i` is implemented but not in scope; perhaps you want to import it
     |
  11 + use candle_core::IndexOp;
     |

error[E0599]: no method named `i` found for struct `candle_core::Tensor` in the current scope
    --> candle-transformers/src/models/gpt_oss.rs:395:45
     |
 395 |                 let b_down = self.down_bias.i(expert_id)?;
     |                                             ^
     |
    ::: /projects/huggingface/candle/candle-core/src/indexer.rs:137:8
     |
 137 |     fn i(&self, index: T) -> Result<Tensor, Error>;
     |        - the method is available for `candle_core::Tensor` here
     |
     = help: items from traits can only be used if the trait is in scope
help: there is a method `id` with a similar name, but with different arguments
    --> /projects/huggingface/candle/candle-core/src/tensor.rs:1901:5
     |
1901 |     pub fn id(&self) -> TensorId {
     |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
help: trait `IndexOp` which provides `i` is implemented but not in scope; perhaps you want to import it
     |
  11 + use candle_core::IndexOp;
     |

error[E0277]: the `?` operator can only be applied to values that implement `Try`
   --> candle-transformers/src/models/gpt_oss.rs:412:32
    |
412 |                 token_output = (&token_output + &weighted_out)?;
    |                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the `?` operator cannot be applied to type `candle_core::Tensor`
    |
    = help: the trait `Try` is not implemented for `candle_core::Tensor`

error[E0277]: the `?` operator can only be applied to values that implement `Try`
   --> candle-transformers/src/models/gpt_oss.rs:492:21
    |
492 |         let x = x + residual?;
    |                     ^^^^^^^^^ the `?` operator cannot be applied to type `&candle_core::Tensor`
    |
    = help: the trait `Try` is not implemented for `&candle_core::Tensor`

error[E0277]: the `?` operator can only be applied to values that implement `Try`
   --> candle-transformers/src/models/gpt_oss.rs:610:17
    |
610 |         x = x * (self.hidden_size as f64).sqrt()?;
    |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the `?` operator cannot be applied to type `f64`
    |
    = help: the trait `Try` is not implemented for `f64`

Some errors have detailed explanations: E0277, E0599.
For more information about an error, try `rustc --explain E0277`.
warning: `candle-transformers` (lib) generated 1 warning
error: could not compile `candle-transformers` (lib) due to 11 previous errors; 1 warning emitted
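Most of these errors are the trait-scope problem the compiler already diagnoses (importing `candle_core::IndexOp` and `TopKLastDimOp`). The `?` errors at lines 492 and 610 are a precedence issue: `?` is a postfix operator that binds tighter than `+` or `*`, so `x + residual?` applies `?` to `residual` rather than to the `Result` returned by candle's tensor arithmetic. A dependency-free sketch with a hypothetical `Toy` type whose `+` and `*` return `Result`, as candle's tensor operators do:

```rust
use std::ops::{Add, Mul};

/// Hypothetical toy tensor whose arithmetic can fail (e.g. on a shape mismatch).
#[derive(Debug, Clone, PartialEq)]
struct Toy(Vec<f32>);

impl Add for &Toy {
    type Output = Result<Toy, String>;
    fn add(self, rhs: Self) -> Self::Output {
        if self.0.len() != rhs.0.len() {
            return Err(format!("shape mismatch: {} vs {}", self.0.len(), rhs.0.len()));
        }
        Ok(Toy(self.0.iter().zip(&rhs.0).map(|(a, b)| a + b).collect()))
    }
}

impl Mul<f64> for &Toy {
    type Output = Result<Toy, String>;
    fn mul(self, s: f64) -> Self::Output {
        Ok(Toy(self.0.iter().map(|v| v * s as f32).collect()))
    }
}

/// `x + residual?` parses as `x + (residual?)`; parenthesise the sum instead.
fn residual_add(x: &Toy, residual: &Toy) -> Result<Toy, String> {
    let out = (x + residual)?;
    Ok(out)
}

/// `x * (h as f64).sqrt()?` applies `?` to the f64; parenthesise the product.
fn scale(x: &Toy, hidden_size: usize) -> Result<Toy, String> {
    let out = (x * (hidden_size as f64).sqrt())?;
    Ok(out)
}
```

The same pattern, `(a + b)?` with the whole expression in parentheses, is what the PR would need at both call sites.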

@ivarflakstad
Member

> I wanted to try this but couldn't even compile the example. On the surface it seems to be just syntax errors, but after trying to address them myself, it seems like it fundamentally hasn't been completely implemented? It doesn't seem to be a hardware compatibility issue either; Rust straight up cannot compile this.

The dangers of vibe coding.

Haven't tested it myself, so thank you for taking the initiative.

If you want to try and complete the implementation you are very welcome to do so @maximizemaxwell :)
If you feel that it would be a bit too difficult, that is completely understandable as well; Claude struggles because it's not a simple task ;)

@embedding-shapes

> The dangers of vibe coding.

Judging by the commits, that seems to be the case indeed :) I initially found this because I've started my own GPT-OSS implementation with Candle too. I've already ended up with way more code than this, and got curious how there could be so little here!

3 participants