diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
Lines changed: 34 additions & 16 deletions
diff --git a/keys_values/data/evaluation.py b/keys_values/data/evaluation.py
Lines changed: 0 additions & 5 deletions
diff --git a/keys_values/data/longbench_v2.py b/keys_values/data/longbench_v2.py
Lines changed: 0 additions & 7 deletions
diff --git a/keys_values/data/sequence_classification.py b/keys_values/data/sequence_classification.py
Lines changed: 0 additions & 1 deletion
diff --git a/keys_values/finetune/args.py b/keys_values/finetune/args.py
Lines changed: 2 additions & 2 deletions
diff --git a/keys_values/finetune/longcon_offload_full.py b/keys_values/finetune/longcon_offload_full.py
Lines changed: 4 additions & 3 deletions
diff --git a/keys_values/finetune/longcon_offload_lora.py b/keys_values/finetune/longcon_offload_lora.py
Lines changed: 5 additions & 4 deletions
diff --git a/keys_values/finetune/longcontext_eval.py b/keys_values/finetune/longcontext_eval.py
Lines changed: 0 additions & 1 deletion
diff --git a/keys_values/finetune/longcontext_full.py b/keys_values/finetune/longcontext_full.py
Lines changed: 31 additions & 9 deletions
@@ -1,18 +1,22 @@
 # Contributing Guidelines
 
-Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional
-documentation, we greatly value feedback and contributions from our community.
+Thank you for your interest in contributing to our project. Whether it's a bug
+report, a new feature, an integration into existing long context libraries, or
+additional documentation, we greatly value feedback and contributions from our
+community.
 
-Please read through this document before submitting any issues or pull requests to ensure we have all the necessary
-information to effectively respond to your bug report or contribution.
+Please read through this document before submitting any issues or pull requests
+to ensure we have all the necessary information to effectively respond to your
+bug report or contribution.
 
 
 ## Reporting Bugs/Feature Requests
 
 We welcome you to use the GitHub issue tracker to report bugs or suggest features.
 
-When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already
-reported the issue. Please try to include as much information as you can. Details like these are incredibly useful:
+When filing an issue, please check existing open, or recently closed, issues to
+make sure somebody else hasn't already reported the issue. Please try to include
+as much information as you can. Details like these are incredibly useful:
 
 * A reproducible test case or series of steps
 * The version of our code being used
@@ -21,27 +25,38 @@ reported the issue. Please try to include as much information as you can. Detail
 
 
 ## Contributing via Pull Requests
-Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that:
+Contributions via pull requests are much appreciated. Before sending us a pull
+request, please ensure that:
 
 1. You are working against the latest source on the *main* branch.
-2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already.
-3. You open an issue to discuss any significant work - we would hate for your time to be wasted.
+2. You check existing open, and recently merged, pull requests to make sure
+   someone else hasn't addressed the problem already. If the pull request is open,
+   feel free to add a comment to it, expressing your interest.
+3. You open an issue to discuss any significant work - we would hate for your
+   time to be wasted.
 
 To send us a pull request, please:
 
 1. Fork the repository.
-2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change.
-3. Ensure local tests pass.
+2. Modify the source; please focus on the specific change you are contributing.
+   If you also reformat all the code, it will be hard for us to focus on your
+   change.
+3. Ensure that all local tests pass.
 4. Commit to your fork using clear commit messages.
-5. Send us a pull request, answering any default questions in the pull request interface.
-6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation.
+5. Send us a pull request, answering any default questions in the pull request
+   interface.
+6. Pay attention to any automated CI failures reported in the pull request, and
+   stay involved in the conversation.
 
 GitHub provides additional document on [forking a repository](https://help.github.com/articles/fork-a-repo/) and
 [creating a pull request](https://help.github.com/articles/creating-a-pull-request/).
 
 
 ## Finding contributions to work on
-Looking at the existing issues is a great way to find something to contribute on. As our projects, by default, use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any 'help wanted' issues is a great place to start.
+Looking at the existing issues is a great way to find something to contribute
+on. As our projects, by default, use the default GitHub issue labels
+(enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at
+any 'help wanted' issues is a great place to start.
 
 
 ## Code of Conduct
@@ -51,9 +66,12 @@ opensource-codeofconduct@amazon.com with any additional questions or comments.
 
 
 ## Security issue notifications
-If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public github issue.
+If you discover a potential security issue in this project we ask that you notify
+AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/).
+Please do **not** create a public github issue.
 
 
 ## Licensing
 
-See the [LICENSE](LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution.
+See the [LICENSE](./LICENSE) file for our project's licensing. We will ask you
+to confirm the licensing of your contribution.
@@ -233,11 +233,6 @@ def _wrapped_collate_fn(
     task = next(iter(tasks))
     orig_collated_samples = orig_collate_fn(samples)
     orig_idxs = [elem[ORIG_IDX_NAME] for elem in samples]
-    # DEBUG
-    #print(f"*** evaluation._wrapped_collate_fn: {orig_idxs} ({task})")
-    #offset = samples[0]["prefix_len"]
-    #print(orig_collated_samples["input_ids"][:, offset:(offset + 15)])
-    # END DEBUG
     return {
         **orig_collated_samples,
         ORIG_IDX_NAME: orig_idxs,
 
@@ -648,13 +648,6 @@ def filter_and_transform(
         min_length = test_results[0]["num_tokens_instruction"]
         max_length = test_results[-1]["num_tokens_instruction"]
         print(f"Test dataset has {len(test_results)} records, token lengths between {min_length} and {max_length}")
-        # DEBUG
-        #prefix_len = len(tokenizer.encode("\n".join(PROMPTLINES_PREFIX) + "\n"))
-        #test_results = [
-        #    dict(entry, prefix_len=prefix_len)
-        #    for entry in test_results
-        #]
-        # END DEBUG
     else:
         test_results = None
     if seq_lengths is None:
 
@@ -102,7 +102,6 @@ def __getitem__(self, idx: int) -> Dict[str, Union[Tensor, Dict[str, Any]]]:
         return {
             INPUT_IDS_NAME: encoded_prompt,
             LABELS_NAME: label_idx,
-            #"prefix_len": example["prefix_len"],  # DEBUG!!
             "token_counts": token_counts,
         }
 
 
@@ -64,8 +64,8 @@ class KVCacheArgs:
             dtype. The default is delayed allocation with first usage
 
     """
-    name: str  # TODO: Different per layer
-    cache_length: int  # TODO: Different per layer
+    name: str
+    cache_length: int
     chunk_size: int = 16
     cache_kwargs: Optional[Dict[str, Any]] = None
     randomize_chunk_sizes: bool = False
 
@@ -73,6 +73,7 @@
     print_with_rank_and_timestamp,
     print_message,
     check_kv_cache,
+    adapt_requires_grad,
 )
 from keys_values.gpu_memory import RecordGPUMemory
 from keys_values.head_model import CrossEntropyOnLogits
@@ -210,7 +211,7 @@ def setup(
             - 1: Only record gradient computations (after initial forward). For
                 each update, we store one snapshot file per row of cells being
                 processed.
-            - 2: Special case (DEBUG)
+            - 2: Special case
             - 3: One snapshot file during initial validation
             Defaults to 0.
         record_gpu_memory_period: Only if `record_gpu_memory_snapshots` is used.
@@ -278,7 +279,7 @@ def setup(
     config = Config.from_file(checkpoint_dir / "model_config.yaml")
 
     precision = precision or get_default_supported_precision(training=True)
-    # TODO: Currently not used!
+    # Currently not used:
     logger = choose_logger(
         logger_name,
         out_dir,
@@ -409,6 +410,7 @@ def main(
             **head_model_kwargs,
         )
     gpt_model = gpt_model.to(optim_device)
+    adapt_requires_grad(gpt_model, head_model)
     batch_size = train.micro_batch_size
     if eval.micro_batch_size is not None:
         batch_size = max(batch_size, eval.micro_batch_size)
@@ -782,7 +784,6 @@ def fit(
                 )
             else:
                 generate_example_kwargs = None
-            # TODO: Fix bug in generation!
             valid_model = model.copy_model_for_evaluation()
             metrics = validate_and_all_reduce(
                 model=valid_model,
 
@@ -79,6 +79,7 @@
     print_with_rank_and_timestamp,
     print_message,
     check_kv_cache,
+    adapt_requires_grad,
 )
 from keys_values.head_model import CrossEntropyOnLogits
 from keys_values.head_model_factory import HeadModelFactory
@@ -230,7 +231,7 @@ def setup(
             - 1: Only record gradient computations (after initial forward). For
                 each update, we store one snapshot file per row of cells being
                 processed.
-            - 2: Special case (DEBUG)
+            - 2: Special case
             - 3: One snapshot file during initial validation
             Defaults to 0.
         record_gpu_memory_period: Only if `record_gpu_memory_snapshots` is used.
@@ -309,7 +310,7 @@ def setup(
     )
 
     precision = precision or get_default_supported_precision(training=True)
-    # TODO: Currently not used!
+    # Currently not used:
     logger = choose_logger(
         logger_name,
         out_dir,
@@ -440,6 +441,8 @@ def main(
             **head_model_kwargs,
         )
     gpt_model = gpt_model.to(optim_device)
+    mark_only_lora_as_trainable(gpt_model)
+    adapt_requires_grad(gpt_model, head_model)
     batch_size = train.micro_batch_size
     if eval.micro_batch_size is not None:
         batch_size = max(batch_size, eval.micro_batch_size)
@@ -456,7 +459,6 @@ def main(
         profile_parts=profile_parts,
         cpu_offload_device=device,
     )
-    mark_only_lora_as_trainable(model.gpt_model)
 
     num_trainable_params = num_parameters(model, requires_grad=True)
     print_message(f"\nNumber of trainable parameters: {num_trainable_params:,}")
@@ -820,7 +822,6 @@ def fit(
                 )
             else:
                 generate_example_kwargs = None
-            # TODO: Fix bug in generation!
             valid_model = model.copy_model_for_evaluation()
             metrics = validate_and_all_reduce(
                 model=valid_model,
 
@@ -220,7 +220,6 @@ def main(
         eos_id=tokenizer.eos_id,
         ignore_index=ignore_index,
     )
-    print(f"\ntokenizer.eos_id = {tokenizer.eos_id}\n")  # DEBUG!
 
     fabric.seed_everything(seed)  # same seed for every process to init model (FSDP)
 
 
@@ -32,7 +32,6 @@
 from keys_values.utils import flush_io_streams
 from litgpt.args import TrainArgs
 from litgpt.data import DataModule
-from litgpt.generate.base import generate
 from litgpt.config import Config
 from litgpt.prompts import save_prompt_style
 from litgpt.tokenizer import Tokenizer
@@ -80,6 +79,7 @@
     print_message,
     check_kv_cache,
 )
+from keys_values.generate.base import generate
 from keys_values.gpu_memory import RecordGPUMemory
 from keys_values.head_model import HeadModel, CrossEntropyOnLogits
 from keys_values.head_model_factory import HeadModelFactory
@@ -169,6 +169,7 @@ def setup(
     record_gpu_memory_snapshots: Optional[int] = None,
     record_gpu_memory_kind: int = 0,
     record_gpu_memory_period: int = 0,
+    generate_with_eval: bool = False,
     profile_grad_times: int = 0,
     profile_parts: Optional[str] = None,
 ) -> None:
@@ -365,6 +366,7 @@ def setup(
         record_gpu_memory_snapshots=record_gpu_memory_snapshots,
         record_gpu_memory_kind=record_gpu_memory_kind,
         record_gpu_memory_period=record_gpu_memory_period,
+        generate_with_eval=generate_with_eval,
         profile_grad_times=profile_grad_times,
         profile_parts=profile_parts,
     )
@@ -394,6 +396,7 @@ def main(
     record_gpu_memory_snapshots: Optional[RecordGPUMemory],
     record_gpu_memory_kind: int,
     record_gpu_memory_period: int,
+    generate_with_eval: bool,
     profile_grad_times: int,
     profile_parts: Optional[str],
 ) -> None:
@@ -540,6 +543,7 @@ def main(
         record_gpu_memory_snapshots=record_gpu_memory_snapshots,
         record_gpu_memory_kind=record_gpu_memory_kind,
         record_gpu_memory_period=record_gpu_memory_period,
+        generate_with_eval=generate_with_eval,
         profile_grad_params=profile_grad_params,
     )
     training_time = time.perf_counter() - train_time
@@ -550,13 +554,21 @@ def main(
     if eval.final_validation:
         print_with_rank_and_timestamp("Starting validation evaluations.", fabric.global_rank)
         print_message("\nFinal validation evaluation ...", fabric)
+        if generate_with_eval:
+            generate_example_kwargs = dict(
+                tokenizer=tokenizer,
+                data=data,
+            )
+        else:
+            generate_example_kwargs = None
         metrics = validate_and_all_reduce(
             model=model,
             val_dataloader=val_dataloader,
             eval=dataclasses.replace(eval, max_iters=len(val_dataloader)),
             batch_transform=batch_transform,
             log_metrics=False,
             fabric=fabric,
+            generate_example_kwargs=generate_example_kwargs,
         )
         fabric.log_dict(metrics, step=state["iter_num"])
         print_message(
@@ -576,7 +588,6 @@ def main(
             save_prompt_style(data.prompt_style, save_dir)
 
 
-# TODO: Support caches of different lengths, maybe even different types
 def wrap_gpt_model(
     gpt_model: GPT,
     head_model: HeadModel,
@@ -723,6 +734,7 @@ def fit(
     record_gpu_memory_snapshots: Optional[RecordGPUMemory],
     record_gpu_memory_kind: int,
     record_gpu_memory_period: int,
+    generate_with_eval: bool,
     profile_grad_params: Optional[Dict[str, Any]],
 ) -> Dict[str, Any]:
     model = state["model"]
@@ -740,12 +752,20 @@ def fit(
     if eval.initial_validation:
         print_with_rank_and_timestamp("Starting validation evaluations.", fabric.global_rank)
         print_message("\nInitial validation evaluation ...", fabric)
+        if generate_with_eval:
+            generate_example_kwargs = dict(
+                tokenizer=tokenizer,
+                data=data,
+            )
+        else:
+            generate_example_kwargs = None
         metrics = validate_and_all_reduce(
             model=model,
             val_dataloader=val_dataloader,
             eval=dataclasses.replace(eval, max_iters=len(val_dataloader)),
             batch_transform=batch_transform,
             fabric=fabric,
+            generate_example_kwargs=generate_example_kwargs,
         )
         val_loss = f"{metrics['val_loss']:.3f}"
         print_message(
@@ -905,17 +925,19 @@ def fit(
         if not is_accumulating and state["step_count"] % eval.interval == 0:
             print_with_rank_and_timestamp("Starting validation evaluations.", fabric.global_rank)
             print_message("\nPeriodic validation evaluation ...", fabric)
-            generate_example_kwargs = dict(
-                tokenizer=tokenizer,
-                data=data,
-            )
-            # TODO: Fix bug in generation!
+            if generate_with_eval:
+                generate_example_kwargs = dict(
+                    tokenizer=tokenizer,
+                    data=data,
+                )
+            else:
+                generate_example_kwargs = None
             metrics = validate_and_all_reduce(
                 model=model,
                 val_dataloader=val_dataloader,
                 eval=eval,
                 batch_transform=batch_transform,
-                # generate_example_kwargs=generate_example_kwargs,
+                generate_example_kwargs=generate_example_kwargs,
                 log_metrics=False,
                 fabric=fabric,
             )
@@ -1041,7 +1063,7 @@ def generate_example(
 
     if max_returned_tokens < gpt_model.max_seq_length:
         output = generate(
-            model=gpt_model,
+            model=model,
             prompt=encoded,
             max_returned_tokens=max_returned_tokens,
             temperature=0.8,
Original file line number	Diff line number	Diff line change
`@@ -102,7 +102,6 @@ def __getitem__(self, idx: int) -> Dict[str, Union[Tensor, Dict[str, Any]]]:`
`102`	`102`	`return {`
`103`	`103`	`INPUT_IDS_NAME: encoded_prompt,`
`104`	`104`	`LABELS_NAME: label_idx,`
`105`		`- #"prefix_len": example["prefix_len"], # DEBUG!!`
`106`	`105`	`"token_counts": token_counts,`
`107`	`106`	`}`
`108`	`107`
Original file line number	Diff line number	Diff line change
`@@ -220,7 +220,6 @@ def main(`
`220`	`220`	`eos_id=tokenizer.eos_id,`
`221`	`221`	`ignore_index=ignore_index,`
`222`	`222`	`)`
`223`		`- print(f"\ntokenizer.eos_id = {tokenizer.eos_id}\n") # DEBUG!`
`224`	`223`
`225`	`224`	`fabric.seed_everything(seed) # same seed for every process to init model (FSDP)`
`226`	`225`