Draft: Show how we can build libcuvs_c.tar.gz in CI #1438
base: main
Conversation
> Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

/okay to test
Force-pushed from d9e8b30 to 227ee79.
/okay to test
Force-pushed from b7803b2 to 58554d6.
/okay to test
Supports rollout of new branching strategy. https://docs.rapids.ai/notices/rsn0047/

xref: rapidsai/build-planning#224

Authors:
- Bradley Dice (https://github.com/bdice)
- Nate Rock (https://github.com/rockhowse)

Approvers:
- Robert Maynard (https://github.com/robertmaynard)
- Nate Rock (https://github.com/rockhowse)

URL: rapidsai#1439
Contributes to rapidsai/build-planning#224

## Notes for Reviewers

This is safe to admin-merge because the change is a no-op... configs on those 2 branches are identical.

Authors:
- Bradley Dice (https://github.com/bdice)

Approvers:
- Nate Rock (https://github.com/rockhowse)

URL: rapidsai#1444
…tion (rapidsai#1354)

This PR adds support for AVQ loss / noise shaping to the BFloat16 dataset quantization. AVQ loss is a modified version of L2 loss which separately penalizes the components of the residual vector that are parallel and perpendicular to the original vector. Quantizing vectors with AVQ loss rather than L2 loss gives a better approximation of the inner product, and thus performs better in Maximal Inner Product Search (https://arxiv.org/abs/1908.10396).

Math:

    x                            : original vector
    x_q                          : quantized vector
    r = x - x_q                  : residual vector
    r_para = <r, x> x / ||x||^2  : parallel component of the residual
    r_perp = r - r_para          : perpendicular component of the residual
    eta >= 1                     : AVQ parameter
    AVQ loss = eta * ||r_para||^2 + ||r_perp||^2

For a float vector x, the goal is to find a bfloat16 vector x_q which minimizes the AVQ loss for a given eta. Unlike L2 loss, AVQ loss is not separable (e.g. ||r_para||^2 contains cross terms from the inner product), so we cannot optimize individual dimensions in parallel and expect convergence. Instead, we use coordinate descent to optimize the dimensions of x_q one at a time until convergence. This coordinate descent happens in the new kernel `quantize_bfloat16_noise_shaped_kernel`.

For efficient memory accesses and compute, one warp is assigned to optimize each dataset vector. The computation of the AVQ loss is algebraically separated into two pieces: terms that can be computed in parallel (i.e. those depending only on local information for the assigned dimension) and terms that require global information (namely those depending on <r, x>). Finally, the threads in a warp serialize to compute the final cost for their dimension, update the quantized value and the value of <r, x> (if applicable), and broadcast the updated <r, x> to the other threads. This continues in blocks of 32 dimensions until convergence (or a maximum of 10 iterations). I've found this strategy does a good job of exploiting the inherently row-major structure of the dataset/index for efficient coalesced accesses, while still making good use of compute resources (hitting >90% compute throughput on an A6000).

Besides the coordinate descent kernel, this PR adds some helper functions for the above, refactors the existing bfloat16 quantization to take advantage of them, and adds configuration for the AVQ eta (the code uses normal bfloat16 quantization when the AVQ threshold is NaN).

Authors:
- https://github.com/rmaschal
- Corey J. Nolet (https://github.com/cjnolet)

Approvers:
- Ben Karsin (https://github.com/bkarsin)
- Tamas Bela Feher (https://github.com/tfeher)
- Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#1354
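The following is a minimal NumPy sketch of the quantization scheme described in the commit message above, not the cuVS CUDA implementation. The names `bf16_round`, `avq_loss`, and `quantize_avq` are illustrative, bfloat16 is emulated by masking float32 bits, eta = 4.0 in the example is an arbitrary choice, and the loss is re-evaluated in full for each candidate instead of incrementally tracking <r, x> as the warp kernel does.

```python
import numpy as np


def bf16_round(v):
    """Round float32 values to the nearest bfloat16 (round-to-nearest-even),
    emulated by keeping only the top 16 bits of the float32 bit pattern."""
    bits = np.ascontiguousarray(v, dtype=np.float32).view(np.uint32)
    rounded = (bits + np.uint32(0x7FFF) + ((bits >> np.uint32(16)) & np.uint32(1))) & np.uint32(0xFFFF0000)
    return rounded.view(np.float32)


def avq_loss(x, x_q, eta):
    """AVQ loss: eta * ||r_para||^2 + ||r_perp||^2 for the residual r = x - x_q."""
    r = x - x_q
    r_para = (np.dot(r, x) / np.dot(x, x)) * x   # component of r parallel to x
    r_perp = r - r_para                          # component of r perpendicular to x
    return eta * np.dot(r_para, r_para) + np.dot(r_perp, r_perp)


def quantize_avq(x, eta, max_iters=10):
    """Quantize one float32 vector to bfloat16 values by coordinate descent on the AVQ loss.

    Each dimension chooses between the two bracketing bfloat16 values (truncate toward
    zero, or one bfloat16 ulp further from zero), keeping whichever lowers the loss given
    the current values of the other dimensions. The loss is re-evaluated in full here;
    the warp kernel instead splits it into per-dimension terms plus a shared <r, x>.
    """
    x = np.ascontiguousarray(x, dtype=np.float32)
    trunc_bits = x.view(np.uint32) & np.uint32(0xFFFF0000)
    lo = trunc_bits.view(np.float32)                          # bf16 value toward zero
    hi = (trunc_bits + np.uint32(0x10000)).view(np.float32)   # next bf16 value away from zero
    x_q = bf16_round(x)                                       # start from plain rounding
    for _ in range(max_iters):
        changed = False
        for j in range(x.size):
            current = avq_loss(x, x_q, eta)
            for cand in (lo[j], hi[j]):
                trial = x_q.copy()
                trial[j] = cand
                if avq_loss(x, trial, eta) < current:
                    x_q, current = trial, avq_loss(x, trial, eta)
                    changed = True
        if not changed:   # no coordinate improved the loss -> converged
            break
    return x_q


# Example: descent starts from plain round-to-nearest and only accepts improving moves,
# so the noise-shaped result never has a higher AVQ loss than bf16_round alone.
rng = np.random.default_rng(0)
x = rng.standard_normal(128).astype(np.float32)
print(avq_loss(x, quantize_avq(x, eta=4.0), eta=4.0))
print(avq_loss(x, bf16_round(x), eta=4.0))
```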
Use JavaScript to avoid displaying duplicates in the doxygen docs.

Authors:
- Micka (https://github.com/lowener)
- Ben Frederickson (https://github.com/benfred)
- Corey J. Nolet (https://github.com/cjnolet)

Approvers:
- Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#1427
/okay to test
Force-pushed from 7c04bcb to 59c3a68.
/okay to test
Force-pushed from 59c3a68 to 3c46a2d.
/okay to test
Force-pushed from baed47c to 8104ad4.
/okay to test
No description provided.