Skip to content

Conversation

@zhongkaifu
Copy link
Owner

Summary

  • add a VisionCaption application that turns image patches into transformer-friendly embeddings so we can train captioning models without the legacy Image2Seq code
  • expose the new application through a dedicated VisionCaptionConsole CLI and wire it into the solution
  • fix the vision corpus token accounting and add ImageSharp as a dependency for loading and normalizing images

Testing

  • ⚠️ dotnet build Seq2SeqSharp.sln (not run: dotnet CLI is unavailable in the execution environment)

Codex Task

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants