[muP] Rework #1087
Open: lintangsutawika wants to merge 109 commits into main from rework-mup
Changes from 58 commits
Commits (109)
0d921f7  changed ordering for setting up norm_factor (lintangsutawika)
abee54d  Update NeoXArgs docs automatically (invalid-email-address)
a08c3ef  updated muP args to the minimum required (lintangsutawika)
c35e830  calculate m_width (lintangsutawika)
2807e52  Merge branch 'main' of https://github.com/EleutherAI/gpt-neox into re… (lintangsutawika)
2d127df  Merge branch 'rework-mup' of https://github.com/EleutherAI/gpt-neox i… (lintangsutawika)
81fdc4d  Update NeoXArgs docs automatically (invalid-email-address)
7d6b246  changed ordering for setting up norm_factor (lintangsutawika)
a0d1929  updated muP args to the minimum required (lintangsutawika)
d63b3b8  calculate m_width (lintangsutawika)
9be82fe  Update NeoXArgs docs automatically (invalid-email-address)
66214d9  removed redundant line (lintangsutawika)
17b7183  removed redundant lines (lintangsutawika)
a6bad07  Update NeoXArgs docs automatically (invalid-email-address)
63984bd  removed redundant lines (lintangsutawika)
02687a8  Merge branch 'rework-mup' of https://github.com/EleutherAI/gpt-neox i… (lintangsutawika)
11114e2  Update NeoXArgs docs automatically (invalid-email-address)
05c4de3  modify init with mup (lintangsutawika)
71a91e4  divide logits by the m_width (lintangsutawika)
99c8ce0  moved position of mup parameters being processed (lintangsutawika)
b253ab6  add note (lintangsutawika)
1919499  made param groups to hold flag for mup scaling (lintangsutawika)
17678e0  lr scale (lintangsutawika)
2bd5ae6  update config (lintangsutawika)
6642291  adjust process of mup variables (lintangsutawika)
8be6c66  remove calling save_base_shapes (lintangsutawika)
c9fb18b  lr adjustments is done in train_step to address lr being reset due to… (lintangsutawika)
795371c  lr scaling for mup is moved here instead (lintangsutawika)
087beee  removed mup usage for coord check (lintangsutawika)
16d04b1  merged with main (lintangsutawika)
e7b7bf6  latest update on coord check implementation (lintangsutawika)
8dea9ce  fix merge conflict (lintangsutawika)
3664eba  changed `mup_m_width` to `mup_width_multiplier` (lintangsutawika)
6a46247  fixed notations (lintangsutawika)
7439f9a  correct scale (lintangsutawika)
5b2d31c  m_emb * embed(X) (lintangsutawika)
98caa82  removed mup rescale in the layers (lintangsutawika)
5c99637  removed mup rescale in the layers (lintangsutawika)
a636f06  adjust mup_m_emb to mup_embedding_multiplier (lintangsutawika)
39190c5  add multiplier mup_output_multiplier (lintangsutawika)
2489cc0  reorder model loading (lintangsutawika)
23b8776  removed comments (lintangsutawika)
10e935e  removed comments (lintangsutawika)
a0aca99  implement full process (lintangsutawika)
9472b35  set neox_args.iteration to 0 for coord_check mode (lintangsutawika)
5c5f2df  move mup_width_multiplier init (lintangsutawika)
7eca3e7  mup_coord_check returns 2 df (lintangsutawika)
c9a3a65  can run (lintangsutawika)
a7877d4  remove commehts (lintangsutawika)
bd9d399  add hooks (lintangsutawika)
fe180d3  remove comments (lintangsutawika)
b240c19  uncomment activation data (lintangsutawika)
93b4241  plot coords (lintangsutawika)
d4899fc  removed variables, add way to plot only from rank 0 (lintangsutawika)
f589e29  changed key name in dict (lintangsutawika)
8261e0d  remove print (lintangsutawika)
25aa786  fix how width_multiplier is applied (lintangsutawika)
4d246a1  updated plot config (lintangsutawika)
84c5380  update files (lintangsutawika)
b2f1101  Merge branch 'main' into rework-mup (lintangsutawika)
42d4cde  Update NeoXArgs docs automatically (invalid-email-address)
4c477d5  init function, add input embedding different initialization (lintangsutawika)
64dc4c5  Merge branch 'rework-mup' of https://github.com/EleutherAI/gpt-neox i… (lintangsutawika)
65c103e  changeoutput layer to normal (lintangsutawika)
08b5d40  change from mean to std (lintangsutawika)
2ca94a8  double attention head for every hidden size doubled (lintangsutawika)
7483246  Merge branch 'main' into rework-mup (lintangsutawika)
497485c  Update NeoXArgs docs automatically (invalid-email-address)
34fb7ca  added args (lintangsutawika)
2d53f1f  simplify coordcheck (lintangsutawika)
7897610  seperate sp and mup configs (lintangsutawika)
4f39209  perform coordcheck for sp and mup seperately (lintangsutawika)
5f84a3f  Update NeoXArgs docs automatically (invalid-email-address)
479b854  update (lintangsutawika)
21a7e32  update how params are sorted (lintangsutawika)
bb2e0c9  remove unused comments (lintangsutawika)
bf1ce06  adjust (lintangsutawika)
50a3dba  simplify (lintangsutawika)
c4c1660  fix mup embedding multiplier (lintangsutawika)
1c35911  embeddingpipe fix init (lintangsutawika)
84be4d4  changed how manual seed is loaded (lintangsutawika)
fbb4daf  removed musgd and other changces (lintangsutawika)
fa142ff  update config (lintangsutawika)
ad2336f  fixed how params are sorted (lintangsutawika)
fe73bc3  update how seed is computed (lintangsutawika)
a3bd44c  update to follow pre-commit format (lintangsutawika)
56b6c9b  update from main (lintangsutawika)
2365fd5  update (lintangsutawika)
e8639a0  Update NeoXArgs docs automatically (invalid-email-address)
47e1438  fix lr weighting (lintangsutawika)
a064f9b  hard set to 1.0 if neox_args.use_mup is false (lintangsutawika)
b0da27a  Merge branch 'main' into rework-mup (Quentin-Anthony)
6fe55f4  Update NeoXArgs docs automatically (invalid-email-address)
8bf8bcd  add new parameters (lintangsutawika)
7f0b033  add parameter checks (lintangsutawika)
f802869  updates to argument processing for mup (lintangsutawika)
cc71104  add data save and descriptions being printed (lintangsutawika)
c8feb39  update mup (lintangsutawika)
b6b3a02  update seed (lintangsutawika)
847e892  remove print text (lintangsutawika)
1b0027c  fixed kv (lintangsutawika)
055596f  update (lintangsutawika)
fabb45b  update dewcriptions being printed (lintangsutawika)
5ccf693  removed unused lines (lintangsutawika)
9dd583b  Merge branch 'rework-mup' of https://github.com/EleutherAI/gpt-neox i… (lintangsutawika)
6a8ad71  Merge branch 'main' into rework-mup (lintangsutawika)
485cad4  Update NeoXArgs docs automatically (invalid-email-address)
c291906  Merge branch 'main' into rework-mup (Quentin-Anthony)
1ac9add  Merge branch 'main' into rework-mup (Quentin-Anthony)
During our call, we noted that this leads to a bug: because the width multiplier is applied to all layers, the embedding layer cannot be initialized differently from the transformer backbone layers. (Precisely: muP prescribes that layers whose input and output dimensions both scale with width need a sqrt(width) multiplying factor.)
Would recommend refactoring this code: remove the muP width multiplier completely from the initialization methods, and have them take only the initializer parameters (e.g., standard deviation). Then, wherever a layer uses an initializer, adjust it according to that particular layer's muP width adjustment requirements.
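A minimal sketch of what this refactor could look like, not the PR's actual code. Plain Python lists stand in for tensors so the snippet runs on its own. `mup_adjusted_std`, its `scales_with_width` flag, and the numeric values are hypothetical names chosen for illustration. The sketch also assumes the sqrt(width) factor is applied by dividing the base standard deviation by sqrt(width_multiplier), which is one reading of the comment above.

```python
import math
import random
from typing import Callable, List

Initializer = Callable[[List[float]], List[float]]


def init_method_normal(std: float) -> Initializer:
    """Build an initializer from its own parameters only (no muP logic here)."""

    def init_(weights: List[float]) -> List[float]:
        # Overwrite every entry in place with a draw from N(0, std^2).
        for i in range(len(weights)):
            weights[i] = random.gauss(0.0, std)
        return weights

    return init_


def mup_adjusted_std(
    base_std: float, width_multiplier: float, scales_with_width: bool
) -> float:
    """Hypothetical per-layer rule (assumption, see lead-in): only layers whose
    input and output dimensions both scale with width get the sqrt adjustment."""
    if scales_with_width:
        return base_std / math.sqrt(width_multiplier)
    return base_std


base_std = 0.02         # illustrative value
width_multiplier = 4.0  # illustrative value

# Each layer decides its own adjustment, then asks for a plain initializer.
embedding_init = init_method_normal(
    mup_adjusted_std(base_std, width_multiplier, scales_with_width=False)
)
hidden_init = init_method_normal(
    mup_adjusted_std(base_std, width_multiplier, scales_with_width=True)
)

embedding_weights = embedding_init([0.0] * 8)
hidden_weights = hidden_init([0.0] * 8)
```

With this split, the initializer factory never sees the width multiplier, so the embedding layer and the backbone layers can pick different standard deviations at the call site.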