Conversation

@sarahyurick
Contributor

@sarahyurick sarahyurick commented Nov 19, 2024

TODO:

  • Exact deduplication files
  • Semantic deduplication files
  • Fuzzy deduplication files
  • Tutorials folder

@sarahyurick sarahyurick changed the title Global cache variable for exact, fuzzy, and semantic deduplication Global cache_dir variable for exact, fuzzy, and semantic deduplication Nov 19, 2024
sarahyurick and others added 6 commits November 19, 2024 16:13
Signed-off-by: Sarah Yurick <[email protected]>
Signed-off-by: Sarah Yurick <[email protected]>
Signed-off-by: Sarah Yurick <[email protected]>
Signed-off-by: Sarah Yurick <[email protected]>
Signed-off-by: Sarah Yurick <[email protected]>
@sarahyurick sarahyurick added the gpuci Run GPU CI/CD on PR label Nov 20, 2024
@sarahyurick sarahyurick marked this pull request as ready for review November 20, 2024 23:27
@sarahyurick sarahyurick added gpuci Run GPU CI/CD on PR and removed gpuci Run GPU CI/CD on PR labels Dec 23, 2024
@sarahyurick sarahyurick added gpuci Run GPU CI/CD on PR and removed gpuci Run GPU CI/CD on PR labels Jan 3, 2025
@sarahyurick sarahyurick marked this pull request as draft January 22, 2025 00:54
@sarahyurick sarahyurick changed the title Global cache_dir variable for exact, fuzzy, and semantic deduplication Create Cache class for exact, fuzzy, and semantic deduplication Jan 22, 2025
@Maghoumi
Contributor

Thanks so much for working on this change. What I like about this now is that it gives users the option to either use the same cache directory for anything that requires caching, or provide a specific directory if they don't want to re-use the same cache.

The cache class implementation is functional but not thread-safe. I don't think that's a blocking problem for this PR.
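
To make the thread-safety point concrete, here is a minimal sketch of what a lock-guarded variant could look like. This is not the implementation in this PR: the singleton layout, the `get_cache_directory` method, and the `resolve_cache_dir` helper are all hypothetical names chosen for illustration, and the sketch only assumes the behavior described above (one shared directory, with an optional per-module override).

```python
import threading
from typing import Optional


class Cache:
    """Hypothetical process-wide holder for a shared cache directory."""

    _instance: Optional["Cache"] = None
    _lock = threading.Lock()

    def __new__(cls, cache_dir: Optional[str] = None) -> "Cache":
        # Creation and updates happen under a lock, so concurrent callers
        # always receive the same, fully initialised instance.
        with cls._lock:
            if cls._instance is None:
                instance = super().__new__(cls)
                instance._cache_dir = cache_dir
                cls._instance = instance
            elif cache_dir is not None:
                # A later call with an explicit directory updates the shared one.
                cls._instance._cache_dir = cache_dir
            return cls._instance

    def get_cache_directory(self) -> Optional[str]:
        return self._cache_dir


def resolve_cache_dir(explicit_dir: Optional[str] = None) -> Optional[str]:
    """Prefer a module-specific directory; otherwise fall back to the shared one."""
    if explicit_dir is not None:
        return explicit_dir
    return Cache().get_cache_directory()
```

With something like this, a deduplication module could call `resolve_cache_dir(cache_dir)` and get either the directory passed to it or the shared one, without racing on first initialisation.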

I didn't run the samples/tutorials, but I assume the change has been thoroughly verified?

@sarahyurick
Contributor Author

> Thanks so much for working on this change. What I like about this now is that it gives users the option to either use the same cache directory for anything that requires caching, or provide a specific directory if they don't want to re-use the same cache.
>
> The cache class implementation is functional but not thread-safe. I don't think that's a blocking problem for this PR.
>
> I didn't run the samples/tutorials, but I assume the change has been thoroughly verified?

Thanks! Yes, I tried to make as few breaking changes as possible. The examples and tutorials should all reflect those changes.

sarahyurick and others added 3 commits February 18, 2025 14:35
Signed-off-by: Sarah Yurick <[email protected]>
Signed-off-by: Sarah Yurick <[email protected]>
@sarahyurick sarahyurick added gpuci Run GPU CI/CD on PR and removed gpuci Run GPU CI/CD on PR labels Feb 18, 2025
Signed-off-by: Sarah Yurick <[email protected]>
@sarahyurick sarahyurick added gpuci Run GPU CI/CD on PR and removed gpuci Run GPU CI/CD on PR labels Feb 19, 2025
@sarahyurick sarahyurick added gpuci Run GPU CI/CD on PR and removed gpuci Run GPU CI/CD on PR labels Feb 25, 2025
Signed-off-by: Sarah Yurick <[email protected]>
@sarahyurick sarahyurick added gpuci Run GPU CI/CD on PR and removed gpuci Run GPU CI/CD on PR labels Feb 25, 2025
@sarahyurick sarahyurick added gpuci Run GPU CI/CD on PR and removed gpuci Run GPU CI/CD on PR labels Feb 28, 2025
@github-actions github-actions bot added the Stale label Jul 26, 2025
@sarahyurick sarahyurick removed the Stale label Jul 28, 2025
@NVIDIA-NeMo NVIDIA-NeMo deleted a comment from github-actions bot Jul 28, 2025
@github-actions github-actions bot added the Stale label Aug 12, 2025
@sarahyurick sarahyurick removed the Stale label Aug 12, 2025
@NVIDIA-NeMo NVIDIA-NeMo deleted a comment from github-actions bot Aug 12, 2025
Labels

gpuci Run GPU CI/CD on PR

2 participants