Skip to content

Optimizing #570

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

Optimizing #570

wants to merge 4 commits into from

Conversation

GabrielCaetanoo
Copy link

Changes:

init_distributed function: Extracted the distributed setup logic into a separate function.
sample function: Modified it to use torch.multinomial instead of an exponentiation-based approach for sampling.
Argument Validation: Replaced the assert with a more user-friendly validation in main to ensure that at least one of the parameters (input-file or interactive) is provided.
Interactive Code Refactoring: The user interaction logic was kept, but the init_distributed function is now called separately at the beginning of main.

Refactored init_distributed function: Extracted distributed setup logic into a separate function.
Updated sample function: Replaced exponential approach with torch.multinomial for sampling.
Improved argument validation: Replaced assert with a more user-friendly validation in main to ensure at least one parameter (input-file or interactive) is provided.
Refactored interactive mode logic: Maintained user interaction logic but moved init_distributed call to the beginning of main.

Changes:

init_distributed function: Extracted the distributed setup logic into a separate function.
sample function: Modified it to use torch.multinomial instead of an exponentiation-based approach for sampling.
Argument Validation: Replaced the assert with a more user-friendly validation in main to ensure that at least one of the parameters (input-file or interactive) is provided.
Interactive Code Refactoring: The user interaction logic was kept, but the init_distributed function is now called separately at the beginning of main.
Here are the improvements made to the code for your commit message:

Refactored init_distributed function: Extracted distributed setup logic into a separate function.
Updated sample function: Replaced exponential approach with torch.multinomial for sampling.
Improved argument validation: Replaced assert with a more user-friendly validation in main to ensure at least one parameter (input-file or interactive) is provided.
Refactored interactive mode logic: Maintained user interaction logic but moved init_distributed call to the beginning of main.
@Magos-Technicus
Copy link

uh oh, a Brazilian? bem vindo amigo, e bora lá


# Validate input
if not (args.input_file or args.interactive):
print("Erro: É necessário especificar --input-file ou ativar --interactive.")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the new error message here are in Portuguese. Is this intended?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am Brazilian and i get that one language in a code is better, although, i would preffer to have some instructions in portuguese-Brazilian

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One language in code is essential, no more than one language. If we have two then most would probably be chinese.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so... Chinese is okay but Portuguese-Brazilian is not okay?
i don't get why people can't have multiple instructions in code, its not like there would come with any downside if the code and instructions are well structured

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no chinese, even though all of the main devs are chinese. There should only be english comments.

@a-holm
Copy link

a-holm commented Apr 4, 2025

So much of the comments are in Portuguese (which is non-standard), the PR also includes significant changes to inference/kernel.py that are not mentioned in the description:

  • Removal of act_quant / act_quant_kernel.
  • Rewriting of weight_dequant_kernel and fp8_gemm_kernel.
  • Removal of Triton autotuning (@triton.autotune).
  • Changes to the signatures and apparent logic of weight_dequant (renamed dequantize_weights) and fp8_gemm (e.g., scaling factors removed from fp8_gemm signature).

- Improved readability and structure of Triton kernels for FP8 weight dequantization and matrix multiplication (GEMM)
- Added comments for clarity
- Replaced hardcoded block sizes with configurable parameters
- Improved safety using tl.cdiv and masking
- Renamed variables and ensured consistency in naming
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants