Skip to content

Add torch-compatible transformations#246

Open
Tarun-goswamii wants to merge 5 commits into
gc-os-ai:mainfrom
Tarun-goswamii:pytorch-string-transforms
Open

Add torch-compatible transformations#246
Tarun-goswamii wants to merge 5 commits into
gc-os-ai:mainfrom
Tarun-goswamii:pytorch-string-transforms

Conversation

@Tarun-goswamii

Copy link
Copy Markdown
Contributor

Closes #171

Added trafos.torch module with transforms for DataLoaders.

Includes str->str (Reverse, DNAtoRNA), str->tensor (GreedyEncode), and tensor->tensor (RandomMask) transforms. Also added get_torch_transform() to sklearn base class per discussion in #170.

@Tarun-goswamii Tarun-goswamii force-pushed the pytorch-string-transforms branch from 55b9f7b to 0e980af Compare March 8, 2026 07:03
@Tarun-goswamii Tarun-goswamii force-pushed the pytorch-string-transforms branch from f1c587a to 0e980af Compare March 17, 2026 05:53
@siddharth7113 siddharth7113 requested a review from fkiraly March 17, 2026 10:39

@fkiraly fkiraly left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain how this design would be used in practice? E.g., by providing a basic vignette on how you expect users to interact with the new classes?

@Tarun-goswamii Tarun-goswamii force-pushed the pytorch-string-transforms branch from 669edde to 0e980af Compare March 21, 2026 05:28
@Tarun-goswamii

Copy link
Copy Markdown
Contributor Author

@fkiraly plz sir review it .

@siddharth7113

Copy link
Copy Markdown
Collaborator

Hi @Tarun-goswamii , I think @fkiraly meant to give an example in conversation itself and not put it in code right now, I would suggest to revert the commit and post here in the conversation instead first and confirm with him.

@Tarun-goswamii

Tarun-goswamii commented Mar 29, 2026

Copy link
Copy Markdown
Contributor Author

Practical Design & Usage

Dear @fkiraly.
Here's the actual workflow this enables:

Real-world pipeline

Direct chaining example:

from pyaptamer.trafos.torch import GreedyEncode, RandomMask, DNAtoRNA, Reverse

# String transforms can be chained
dna_sequence = "ACGTACGT"
rna_transform = DNAtoRNA()
reverse_transform = Reverse()

rna_seq = rna_transform(dna_sequence)  # "ACGUACGU"
reversed_seq = reverse_transform(rna_seq)  # "UGCAUGCA"

# Encoding to tensor
vocab = {"A": 1, "C": 2, "G": 3, "U": 4}
encoder = GreedyEncode(vocab=vocab, max_len=16)
encoded = encoder(rna_seq)
# Result: tensor([1, 2, 3, 4, 1, 2, 3, 4, 0, 0, 0, 0, 0, 0, 0, 0])

# Apply augmentation
masker = RandomMask(mask_idx=99, mask_rate=0.15)
masked = masker(encoded)
# Some non-padded positions are randomly replaced with 99

@fkiraly @siddharth7113 plz review it

Comment thread pyaptamer/trafos/torch/_string.py Outdated
from pyaptamer.trafos.torch._base import BaseTorchTransform


class Reverse(BaseTorchTransform):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this have a BaseTorchTransform? My understanding is if a transform is using torch underneath we need a torch transformation, see @fkiraly comment's here

@siddharth7113

Copy link
Copy Markdown
Collaborator

Also please ensure code quality checks are passing in CI, and pre-commit is installed

@siddharth7113 siddharth7113 marked this pull request as draft April 6, 2026 17:55
String transforms (Reverse, DNAtoRNA) only operate on Python strings
and don't use torch, so they shouldn't inherit from BaseTorchTransform.

This separates the architecture properly - only transforms that
actually use torch (GreedyEncode, RandomMask) inherit from the base.

Also add __repr__ methods for consistency with other transforms.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Tarun-goswamii Tarun-goswamii marked this pull request as ready for review April 7, 2026 06:13
Comment thread pyaptamer/trafos/torch/_string.py Outdated
@@ -0,0 +1,21 @@
"""String-to-string transformations."""

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this still doesn't make sense, why have a _string in torch ? when there are no tensor operations involved.

Comment thread pyaptamer/trafos/__init__.py Outdated
@@ -1 +1 @@
"""Transformations."""
"""Transformations module."""

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why add module?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank u , i will correct it!

String transforms (Reverse, DNAtoRNA) don't use torch operations,
so they shouldn't be in the torch module. This was architecturally
confusing and didn't make sense.

- Removed pyaptamer/trafos/torch/_string.py
- Removed DNAtoRNA and Reverse from torch module exports
- Removed related tests from test_torch_transforms.py
- Reverted unnecessary docstring change in trafos/__init__.py

Only kept torch-specific transforms: GreedyEncode and RandomMask.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@siddharth7113 siddharth7113 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Tarun-goswamii ,

There are some design decision that would be needed for this PR, we can wait for @fkiraly response for some time then raise in discord if needed

X, _ = self._check_X_y(X, None)
return X

def get_torch_transform(self):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reasoning and use of the function that is being added here? Please explain, since it's just returning None?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dear @siddharth7113 The get_torch_transform() is designed as an extension point in the sklearn base class.

By default it returns None, meaning "this transform doesn't have a torch equivalent."

Subclasses that have torch-compatible versions can override this to return them. For example, the sklearn GreedyEncode encoder overrides this method to return the torch GreedyEncode transform with the same parameters.

This allows flexibility - not every sklearn transform needs a torch version, but those that do can provide it through this interface.

from torch import Tensor


class BaseTorchTransform(ABC):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @fkiraly , This is the proposed design for the torch base class, but the currently it only puts call and repr, do we have something similar in sktime to borrow for torch related transformers?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

PyTorch: string transformations

4 participants