Welcome to xlnscpp Discussions! #1
Replies: 21 comments · 34 replies
-
I'm Arya Gupta, a pre-final year student at JIIT Noida. I'm passionate about deep learning, numerical computing, and system optimization. Excited to be part of this community and looking forward to learning from everyone!
-
Hello @pradeeban @markgarnold, I have completed the challenges for this project. Now I am trying to understand the structure of ggml. It has various objects like ggml_context, which holds the context for ggml operations, and ggml_backend_buffer, which manages the buffer for ggml backend operations, and many more. These are not objects exactly but C structs, built around standard data types like float, char, etc. Am I thinking in the correct direction?
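For readers new to ggml, this is roughly how those structures appear in user code: a minimal sketch against the public ggml C API (the field comments are illustrative, and details can vary between ggml versions).

```cpp
// Minimal sketch: creating a ggml_context and one tensor via the public
// ggml C API (values are illustrative).
#include "ggml.h"

int main(void) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16 * 1024 * 1024, // arena the context allocates from
        /*.mem_buffer =*/ NULL,             // let ggml allocate the arena
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    // Tensors live inside the context; their element type is one of ggml's
    // standard types (GGML_TYPE_F32, GGML_TYPE_F16, quantized types, ...).
    struct ggml_tensor * a = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 8);
    (void) a;

    ggml_free(ctx);
    return 0;
}
```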
-
I wanted to say the same but could not phrase it properly. Thanks @markgarnold. Here is what I plan to do:
@markgarnold, could you please provide some suggestions on what should be done ...
-
@markgarnold I have been digging through the ggml code base, and here are some of my findings. Suppose we want to add two tensors; this is what we do.
How ggml does it at the backend:
We will change this function to use LNS internally, but for an LLM this will involve a great many conversions from float to LNS, which we will need to optimize. Again, is this the correct direction for the thought process?
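To make that direction concrete, here is a hypothetical sketch of forking one such backend kernel. ggml_vec_add_f32 is ggml's real scalar vector-add; the GGML_USE_XLNS guard is invented for illustration, and the sketch assumes xlnscpp's xlns16_float type converts to and from float.

```cpp
// Hypothetical #ifdef fork of ggml's scalar vector-add kernel.
// GGML_USE_XLNS is an invented guard; xlns16_float is xlnscpp's
// overloaded-operator type (assumed convertible to/from float).
inline static void ggml_vec_add_f32(const int n, float * z,
                                    const float * x, const float * y) {
#ifdef GGML_USE_XLNS
    for (int i = 0; i < n; ++i) {
        // Round-trip per element: float -> xlns16 -> LNS add -> float.
        // These conversions are exactly the overhead discussed above.
        z[i] = (float)(xlns16_float(x[i]) + xlns16_float(y[i]));
    }
#else
    for (int i = 0; i < n; ++i) {
        z[i] = x[i] + y[i];
    }
#endif
}
```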
-
Vector functions / tensors: ggml_vec_xl_u8x2
Quantization functions: ascendc_quantize_f16_to_q4_0
These functions need to be changed. There are a lot...
-
@markgarnold I have completed preparing my proposal. Could you kindly review it? I will be sending it to markgarnold@yahoo.com, as I believe this is the correct email address. Please let me know if otherwise.
-
The GGML library includes support for various backends, such as CUDA, OpenCL, and Vulkan. For this project, do we need to implement the LNS backend (xlns32/xlns16) for all these modules, or should we focus on the CPU only?
-
Is this project considered medium-scale or large-scale? What should I write in the GSoC proposal?
-
I'm Ashwin from CS first year. I came to know about the project pretty late and didn't know we have to link the completed code challenges in the proposal application, so I didn't include any. What should I do now?
-
You can upload the proposal again if you wish.
Ed
-
Hello @markgarnold @echester, I am excited to work on this project! Apart from the attached research papers, is there anything we should go through in this regard?
-
I'm interested in contributing a modern CMake build system to xlnscpp. Goals: ...
Would this be a valuable contribution? I'm happy to discuss the details. Looking forward to your thoughts!
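For what it's worth, a minimal sketch of such a top-level CMakeLists.txt, treating xlnscpp as a header-only interface library (the target names and file names are assumptions about the repository layout):

```cmake
# Hypothetical CMakeLists.txt sketch for xlnscpp (names and paths assumed).
cmake_minimum_required(VERSION 3.14)
project(xlnscpp LANGUAGES CXX)

# Header-only interface target that downstream projects can link against.
add_library(xlnscpp INTERFACE)
target_include_directories(xlnscpp INTERFACE ${CMAKE_CURRENT_SOURCE_DIR})
target_compile_features(xlnscpp INTERFACE cxx_std_11)

option(XLNSCPP_BUILD_TESTS "Build xlnscpp test programs" ON)
if(XLNSCPP_BUILD_TESTS)
    add_executable(xlnscpp_tests test.cpp) # test file name assumed
    target_link_libraries(xlnscpp_tests PRIVATE xlnscpp)
endif()
```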
-
Hi @markgarnold, @pradeeban, @EdChester — I'm Arjun. I submitted the native LNS softmax (#22) and table-backed exp/log (#23) that were merged recently, and have a few more open PRs (layer normalization, table-backed exp2/log2, weight quantization). I'm interested in the GSoC project on LNS support for LLMs and have been looking at the ggml backend architecture to understand how the integration would work. Are there specific areas you'd like contributors to focus on?
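For anyone following the thread, "table-backed" refers to the standard LNS technique of reducing addition to a lookup of the Gaussian logarithm sb(d) = log2(1 + 2^d). The following is a self-contained illustration of that idea, not the code from those PRs:

```cpp
// Self-contained illustration of table-backed LNS addition (not PR code).
// In LNS a value x is stored as X = log2|x|, so multiplication is just
// X + Y; addition instead needs sb(d) = log2(1 + 2^d), precomputed below.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

int main(void) {
    const int    F     = 8;        // fractional bits of the log (illustrative)
    const double scale = 1 << F;
    const int    range = 16 << F;  // table covers d in [-16, 0]

    // Precompute sb(-i/scale); we always add toward the larger magnitude,
    // so the table argument d is non-positive.
    std::vector<double> sb(range + 1);
    for (int i = 0; i <= range; ++i) {
        sb[i] = std::log2(1.0 + std::exp2(-i / scale));
    }

    // Log-domain add of x = 3 and y = 5 (both positive, for simplicity).
    double X   = std::log2(3.0);
    double Y   = std::log2(5.0);
    long   idx = std::min<long>(range, std::lround(std::fabs(X - Y) * scale));
    double Z   = std::max(X, Y) + sb[idx];

    std::printf("x + y = %f (exact: 8)\n", std::exp2(Z)); // ~7.998 with F = 8
    return 0;
}
```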
-
@ArjunDeshwal: a further clarification. This project was not selected by Google last year. There were proposals on how to do it, but not quite good enough. For example, see haarit19058's list of functions in this thread. I am not saying that is the correct answer, but it illustrates the extent of the changes needed to ggml to make this work. Since it was posted here (open source), you are free to use it as a starting point, but there are other issues to consider. For example, will you make a clone of ggml and modify that? Or something else?
-
Hi @markgarnold, reading the whole thread, I see that the reasons for your concern about this project not getting selected last year are very understandable. The key aspect most applicants are missing is that they treat this as a regular GSoC project, where you contribute according to set guidelines to get selected. This one is different: instead of contributing or adding a new feature, we have to take a new technical perspective and implement it. It is not as hard as making things from scratch, but it means realigning the way things work at the core. Here is my current approach (sharing in case it's useful), with my perspective based on experience with large codebases like LLVM: ...
Let me know what you all think. Since there is a month left before the deadline, let's give it our best.
-
I appreciate the PRs and comments submitted (including those from @ArjunDeshwal, @Ayush3941, and @naman9271), but it is important to understand the goal of this project: to see whether 16-bit LNS can work in llama (by modifying ggml to use xlns16). This will be a proof of concept (probably a lot slower than FP).
The mentors for this project are @akrentz6 (who created xlnstorch last summer during GSoC 2025) and me. (Ed is no longer able to devote time to this.) @pradeeban is the Alaska org person who makes sure we comply with all GSoC rules. At most one contributor will be selected by Google to receive a stipend during the coding period.
As noted, this project is a bit different from those of other orgs. This is an open-source project; sharing your code and insights with your fellow contributors is a good thing. On the other hand, only one contributor will be selected to implement the ggml/xlns16 project. This will be based on the technical quality of the proposal you submit. If there are no "good" proposals, GSoC Alaska will choose not to fund this project.
Since @akrentz6 was selected last year, he may be able to give some hints on how to conceive and write a good proposal. Do not use AI to write your proposal. I am willing to give feedback on your proposal (sent privately to my email) when it is nearing completion. Check with @akrentz6 if he is also willing to do this; I don't know if he has the time. I will not write your proposal for you.
-
Hi @markgarnold and @akrentz6, I am Vedant from IIT Gandhinagar. I really liked the xlns support for llama.cpp project idea for GSoC 2026, so I have been working on it for about a week now. I went through the whole thread and identified the core objective: to build an XLNS backend that performs matrix multiplication using xlns16. Is this the correct approach? I will cover adding the other functions in the proposal. By referring to other backend implementations, I tested it using a very small model. More implementation and testing details can be found here: https://github.com/Ninjacoder-vedant/llama.cpp/blob/xlns-backend/docs/backend/XLNS.md. You can clone the repo's xlns-backend branch to try it.
-
Hi all, I've attached my proposal from last year's xlnstorch project for reference.
-
Hi Vedant,
I do not see your proposal on the GSoC portal. The deadline is 31 March. You are allowed to submit revisions up to that time. The portal may become slow near the deadline, so it is wise to submit early.
On Wednesday, March 18, 2026 at 12:48:33 AM EDT, Vedant Acharya wrote:
@markgarnold Okay, got your point! I think then it's similar to what I have implemented here: matmul_fun. According to what @akrentz6 said, it's a naive implementation in which I am converting the output to f32 every time, and then the next layer converts it back to xlns16.
Instead, what I think we can do is convert the input f32 token embeddings into xlns16 once, and then each operation after this will have at least one tensor (the activations) in xlns16. The other tensor (the weights in mat_mul) can stay in a quantized format. For that, we would need to implement dynamic dequantization functions that convert the most popular quantized formats into xlns16. Then at the output layer we can convert activations (logits) back to f32.
For the operations, these are the ones I think we would need to implement in xlns16: RoPE (Rotary Positional Encodings), RMS_NORM, ADD, SILU, MUL, MUL_MAT, and possibly a few others depending on the model we want to run. These are based on the requirements of the llama3.2-1B model. Is this the correct direction?
Another question I have:
- Llama3.2-1B supports flash attention, for which there is an op in ggml called FLASH_ATTN_EXT. It seems to be a fairly complex algorithm to implement in xlns16. I think normal attention should work fine for the GSoC PoC, which would just be Query–Key matmul followed by softmax. So, should we implement FLASH_ATTN_EXT in xlns16?
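As a concrete illustration of the dynamic dequantization idea above, here is a hypothetical sketch for ggml's q8_0 format. The block layout mirrors ggml's real definition (32 int8 quants sharing one fp16 scale); the function itself, and treating float2xlns16_ as the float-to-LNS conversion, are assumptions.

```cpp
// Hypothetical sketch of dynamic dequantization straight into xlns16,
// skipping the intermediate f32 tensor.
#include <stdint.h>
#include "ggml.h"     // for ggml_fp16_t and ggml_fp16_to_fp32()
#include "xlns16.cpp" // xlnscpp (file name assumed), for xlns16, float2xlns16_

#define QK8_0 32

typedef struct {
    ggml_fp16_t d;        // fp16 scale for the block (mirrors ggml's layout)
    int8_t      qs[QK8_0];// quantized weights
} block_q8_0;

static void dequantize_q8_0_to_xlns16(const block_q8_0 * bx, xlns16 * y, int nblocks) {
    for (int b = 0; b < nblocks; ++b) {
        const float d = ggml_fp16_to_fp32(bx[b].d);
        for (int j = 0; j < QK8_0; ++j) {
            // One float multiply per weight, then a single conversion;
            // the downstream activations stay in xlns16 from here on.
            y[b * QK8_0 + j] = float2xlns16_(d * (float) bx[b].qs[j]);
        }
    }
}
```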
-
Hi everyone, I'm Krishna, a final-year student currently focusing on SLM quantization. I've been following this project closely because replacing floats at the compute level is exactly the kind of on-device optimization I've been researching for smaller models. I've already completed the code challenges (running the xlns16/32 tests and the ggml FP32 matmul examples). I also put together a three-way comparison between FP32 and the LNS variants just to see where the precision trade-offs are. I'm looking forward to the technical discussions here, especially on picking the right vec functions to modify without disturbing the core stability code.
Challenge repo: https://github.com/krishnamurthi-ramesh/Gsoc-xlnscpp-CodeChallenge
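The kind of quick three-way precision check described above might look something like this (assuming xlnscpp's xlns16_float and xlns32_float types construct from float, support arithmetic operators, and convert back to float):

```cpp
// Quick precision comparison: f32 vs xlns16 vs xlns32 on a sum of squares.
// Assumes xlnscpp's xlns16_float / xlns32_float overloaded types.
#include <cmath>
#include <cstdio>
#include "xlns16.cpp" // xlnscpp file names assumed
#include "xlns32.cpp"

int main(void) {
    float        acc_f  = 0.0f;
    xlns16_float acc_16 = 0.0f;
    xlns32_float acc_32 = 0.0f;

    for (int i = 1; i <= 1000; ++i) {
        float x = 1.0f / (float) i; // simple synthetic workload
        acc_f  = acc_f  + x * x;
        acc_16 = acc_16 + xlns16_float(x) * xlns16_float(x);
        acc_32 = acc_32 + xlns32_float(x) * xlns32_float(x);
    }

    std::printf("f32:    %.8f\n", acc_f);
    std::printf("xlns16: %.8f (rel err %.2e)\n", (float) acc_16,
                std::fabs(((float) acc_16 - acc_f) / acc_f));
    std::printf("xlns32: %.8f (rel err %.2e)\n", (float) acc_32,
                std::fabs(((float) acc_32 - acc_f) / acc_f));
    return 0;
}
```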
-
I wanted to follow up on my introduction with some concrete progress. I've spent the last 24 hours focusing on the #ifdef fork strategy you described for ggml. I've implemented a proof-of-concept fork where I've patched the core vectorized functions in vec.h and vec.cpp. Specifically, for ggml_vec_dot_f32: I've implemented this with a persistent xlns16_float accumulator. It performs dynamic conversion (float2xlns16_) for the inputs, executes exact LNS multiplication, and uses table-referenced addition, only converting back to float at the very end of the dot product. I believe this core-patch approach is a good way to simulate LNS for llama.cpp without the memory overhead of shadow tensors. Looking forward to your feedback.
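A hypothetical reconstruction of what that patched kernel could look like (the real code lives in the fork; only ggml_vec_dot_f32, xlns16_float, and float2xlns16_ are names from the post, and the simple four-argument signature matches older ggml versions):

```cpp
// Hypothetical reconstruction of the patched dot product described above.
// Assumes xlnscpp's xlns16_float overloaded operators and float conversion.
inline static void ggml_vec_dot_f32(const int n, float * s,
                                    const float * x, const float * y) {
    xlns16_float acc = 0.0f; // persistent LNS accumulator
    for (int i = 0; i < n; ++i) {
        // Dynamic conversion of both inputs, exact LNS multiplication
        // (an integer add in the log domain), table-referenced addition.
        acc = acc + xlns16_float(x[i]) * xlns16_float(y[i]);
    }
    *s = (float) acc; // convert back only at the very end
}
```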
-
👋 Welcome!
We're using Discussions as a place to connect with other members of our community. We hope that you ask questions, share ideas, engage with other members, and remember that this is a community we build together 💪.
To get started, comment below with an introduction of yourself and tell us about what you do with this community.