
Model Modification for Autoregressive Usage #12


Open · wants to merge 7 commits into main

Conversation

@RongLirr (Collaborator) commented May 4, 2025

This pull request modifies the training pipeline to support autoregressive generation of fluent sign language poses, conditioned on the whole disfluent sequence and the previously generated fluent history.

1. data/load_data.py (SignLanguagePoseDataset):

  • Modified __getitem__ to enable autoregressive training. Instead of returning a fixed initial segment of the fluent sequence, it now randomly samples a target chunk (length chunk_len) from the ground-truth fluent sequence, extracts the ground-truth fluent pose history preceding that chunk, and returns the history as conditions['previous_output'].

  • The full disfluent sequence remains as conditions['input_sequence'].

  • Replaced the custom global mean/std calculation with pose_anonymization.data.normalization.normalize_mean_std. Data is now normalized by calling this function on the Pose objects after loading.

  • Ensured the sampled target_chunk is always padded with zeros to the fixed chunk_len within __getitem__ if the sampled segment is shorter (e.g., at the end of a sequence or for short sequences); the corresponding target_mask is padded with True (masked). A sketch of the sampling-and-padding logic follows this list.

  • Parameter renaming: the fluent_frames parameter in __init__ is now internally referred to as chunk_len to better reflect its role in the autoregressive setup.
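
A minimal sketch of that sampling-and-padding logic. The function name sample_chunk and the (T, K, D) array layout are illustrative assumptions; the real logic lives inside SignLanguagePoseDataset.__getitem__:

```python
import numpy as np

def sample_chunk(fluent_data: np.ndarray, chunk_len: int):
    """fluent_data: (T, K, D) ground-truth fluent sequence."""
    total_len = len(fluent_data)
    start = np.random.randint(0, total_len)             # random chunk start
    target_chunk = fluent_data[start:start + chunk_len]
    history = fluent_data[:start]                       # becomes conditions['previous_output']

    # Zero-pad the target up to the fixed chunk_len; padded frames are
    # marked True (masked) in the target mask.
    pad = chunk_len - len(target_chunk)
    target_mask = np.zeros(chunk_len, dtype=bool)
    if pad > 0:
        target_chunk = np.concatenate(
            [target_chunk, np.zeros((pad,) + target_chunk.shape[1:])], axis=0)
        target_mask[-pad:] = True
    return target_chunk, target_mask, history
```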

@RongLirr (Collaborator, Author) commented May 4, 2025

2. core/models.py (SignLanguagePoseDiffusion):

  • Autoregressive Input: Modified the forward method signature to accept previous_output: Optional[torch.Tensor].

  • Updated the forward method's implementation: it encodes previous_output with self.fluent_encoder, then concatenates that embedding with the embeddings of the timestep (t), the disfluent sequence (disfluent_seq), and the noisy target chunk (fluent_clip / x) along the sequence dimension (dim=1) before feeding the result into the sequence_encoder, as sketched below.
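
A rough sketch of that concatenation. Only fluent_encoder, sequence_encoder, previous_output, and the dim=1 concatenation come from the description above; the stand-in encoders (disfluent_encoder, embed_timestep), the shared width, and the transformer stack are placeholder assumptions:

```python
import torch
import torch.nn as nn

class SignLanguagePoseDiffusionSketch(nn.Module):
    def __init__(self, pose_dim: int, c: int = 256):
        super().__init__()
        # Stand-in encoders: simple projections to a shared width C.
        self.fluent_encoder = nn.Linear(pose_dim, c)
        self.disfluent_encoder = nn.Linear(pose_dim, c)
        self.embed_timestep = nn.Embedding(1000, c)
        self.sequence_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=c, nhead=4, batch_first=True),
            num_layers=2)

    def forward(self, fluent_clip, t, disfluent_seq, previous_output=None):
        t_emb = self.embed_timestep(t).unsqueeze(1)            # (B, 1, C)
        disfluent_emb = self.disfluent_encoder(disfluent_seq)  # (B, L_d, C)
        parts = [t_emb, disfluent_emb]
        if previous_output is not None:
            # Fluent history encoded with the same fluent_encoder.
            parts.append(self.fluent_encoder(previous_output))
        parts.append(self.fluent_encoder(fluent_clip))         # noisy target chunk
        # Concatenate along the sequence dimension (dim=1).
        return self.sequence_encoder(torch.cat(parts, dim=1))
```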

3. core/training.py (PoseTrainingPortal):

  • Mask handling for the loss: the L2 loss is computed with a masked_l2_per_sample function so that padded (masked) frames do not contribute to the loss; see the sketch after this list.

  • Unnormalization:
    • evaluate_sampling now passes the normalized NumPy arrays to export_samples.
    • export_samples now uses the imported unnormalize_mean_std function on temporary Pose objects to unnormalize the data before saving .pose files.
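
A plausible shape for the masked loss, assuming (B, T, K, D) tensors and a per-frame mask where True marks padding; the actual masked_l2_per_sample in core/training.py may differ:

```python
import torch

def masked_l2_per_sample(pred: torch.Tensor,
                         target: torch.Tensor,
                         mask: torch.Tensor) -> torch.Tensor:
    """pred/target: (B, T, K, D); mask: (B, T), True = padded frame."""
    valid = (~mask).float()                           # 1.0 where frames are real
    sq_err = ((pred - target) ** 2).sum(dim=(2, 3))   # (B, T) squared error per frame
    # Average over valid frames only, guarding against all-masked samples.
    return (sq_err * valid).sum(dim=1) / valid.sum(dim=1).clamp(min=1)
```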

@RongLirr (Collaborator, Author) commented May 7, 2025

  1. /infer_autoregressive.py

    • It iteratively generates chunks of the fluent pose sequence, feeding the previously generated chunk as a condition for the next; an illustrative version of this loop is sketched after the list.
    • Generation stops when the model predicts a "stop" signal (near-zero pose values) or a maximum length is reached.
    • Saves both the generated fluent poses and the original disfluent input poses in .pose and .npy formats.
  2. /data/load_data.py

    • In SignLanguagePoseDataset.__getitem__:
      • Modified the handling of previous_output (history_chunk). When the actual history length is zero (e.g., for the first chunk of a sequence), previous_output is now initialized as a single frame of zeros (np.zeros((1,) + ...)).
      • Reason: This change ensures that the pose_format.torch.masked.collator.zero_pad_collator works correctly. Previously, a mix of zero-length tensors (shape [0, K, D]) and very short tensors (e.g., shape [1, K, D]) for previous_output within the same batch could cause a RuntimeError during torch.stack inside the collator.
  3. /core/training.py

    • In PoseTrainingPortal.export_samples:
      • Added a call to normalize_pose_size(unnorm_pose) immediately after unnorm_pose = unnormalize_mean_std(pose_obj).
      • Reason: The unnormalize_mean_std function reverses the Z-score (mean/std) normalization; however, for valid visualization the pose must also be rescaled with normalize_pose_size (from pose_format.utils.generic). See the export sketch after this list.
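
An illustrative outline of the generation loop in /infer_autoregressive.py (item 1). The model.sample call, the stop threshold, and the single-batch shapes are assumptions, not the script's actual API:

```python
import torch

@torch.no_grad()
def generate(model, disfluent_seq, chunk_len, max_len=500, stop_eps=1e-4):
    """disfluent_seq: (1, T, K, D). Returns the generated fluent sequence."""
    history = torch.zeros(1, 1, *disfluent_seq.shape[2:])   # single zero frame
    chunks = []
    while sum(c.shape[1] for c in chunks) < max_len:
        chunk = model.sample(input_sequence=disfluent_seq,   # whole disfluent input
                             previous_output=history,        # fluent history so far
                             length=chunk_len)
        # Stop when the model emits a near-zero "stop" chunk.
        if chunk.abs().mean().item() < stop_eps:
            break
        chunks.append(chunk)
        history = torch.cat([history, chunk], dim=1)
    return torch.cat(chunks, dim=1) if chunks else history[:, 1:]
```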
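And a sketch of the export path from item 3, wrapping a normalized array in a temporary Pose before unnormalizing; the fps value and all-ones confidence are placeholder assumptions:

```python
import numpy as np
from pose_format import Pose
from pose_format.numpy import NumPyPoseBody
from pose_format.utils.generic import normalize_pose_size
from pose_anonymization.data.normalization import unnormalize_mean_std

def export_pose(norm_array: np.ndarray, header, out_path: str, fps: float = 25.0):
    """norm_array: normalized (frames, people, points, dims) array."""
    confidence = np.ones(norm_array.shape[:-1])      # placeholder confidence
    pose_obj = Pose(header, NumPyPoseBody(fps=fps, data=norm_array,
                                          confidence=confidence))
    unnorm_pose = unnormalize_mean_std(pose_obj)     # reverse Z-score normalization
    normalize_pose_size(unnorm_pose)                 # rescale for valid visualization
    with open(out_path, "wb") as f:
        unnorm_pose.write(f)
```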

@RongLirr (Collaborator, Author) commented May 8, 2025

1. config/option.py

Added parser.add_argument entries for '--lambda_vel' and '--load_num'.

2. training.py

Introduced a weight (lambda_vel) for the velocity loss term; a sketch of both changes follows.
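
A sketch of both pieces together; the argument defaults and the first-order-difference form of the velocity loss are assumptions:

```python
import argparse
import torch

# config/option.py: the two new flags (defaults here are guesses).
parser = argparse.ArgumentParser()
parser.add_argument('--lambda_vel', type=float, default=1.0,
                    help='weight on the velocity loss term')
parser.add_argument('--load_num', type=str, default='latest',
                    help='which checkpoint to load')

# training.py: position loss plus weighted first-order velocity loss.
def weighted_loss(pred: torch.Tensor, target: torch.Tensor, lambda_vel: float):
    pos_loss = ((pred - target) ** 2).mean()
    vel_pred = pred[:, 1:] - pred[:, :-1]        # frame-to-frame velocity
    vel_target = target[:, 1:] - target[:, :-1]
    vel_loss = ((vel_pred - vel_target) ** 2).mean()
    return pos_loss + lambda_vel * vel_loss
```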
