
Conversation

@zhongkaifu
Owner

Summary

  • add the CNN-based vision encoder, convolution ops and metadata so Seq2Seq models can ingest image tensors
  • extend Seq2Seq runtime/options with ImageCaptionOptions, ImageTensorBuilder, and new caches/tests to run the mixed vision/text flow safely
  • create the SeqImageCaptionConsole CLI that wires the new VisionTextCorpus pipeline for training/validation/testing image caption models (the overall flow is sketched just below)
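
For orientation, here is a hypothetical, self-contained sketch of the vision/text flow these bullets describe. None of the names below are the PR's real APIs — ImageTensorBuilder, VisionTextCorpus, and SeqImageCaptionConsole are new in this change and their signatures are not shown here — so every type and call is a stand-in for illustration:

```csharp
using System;

// Hypothetical stand-ins for the PR's vision/text pipeline; the real
// Seq2SeqSharp classes may look quite different.
class ImageCaptionFlowSketch
{
    static void Main()
    {
        // 1) Raw image pixels arrive as (channels, height, width).
        float[] pixels = new float[3 * 224 * 224];

        // 2) A builder packs and normalizes the pixels into a flat source
        //    tensor the CNN vision encoder can ingest instead of token embeddings.
        float[] srcTensor = BuildImageTensor(pixels);

        // 3) The CNN encoder would map this to patch features of shape
        //    (patches, EncoderEmbeddingDim); the ordinary Seq2Seq decoder then
        //    attends over those features to emit caption tokens.
        Console.WriteLine($"source tensor length: {srcTensor.Length}");
    }

    // Stand-in for an image tensor builder: simple 0..1 normalization
    // (the normalization scheme is an assumption, not the PR's).
    static float[] BuildImageTensor(float[] pixels)
    {
        var t = new float[pixels.Length];
        for (int i = 0; i < pixels.Length; i++)
            t[i] = pixels[i] / 255f;
        return t;
    }
}
```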

Testing

  • dotnet build Seq2SeqSharp.sln -c Release (fails: dotnet CLI is not available in the execution environment)

Codex Task


@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines +361 to +384
```csharp
if (modelMetaData.EncoderType != EncoderTypeEnums.VisionCNN)
{
    if (modelMetaData.SharedEmbeddings)
    {
        Logger.WriteLine(Logger.Level.debug, $"Creating shared embeddings for both source side and target side. Shape = '({modelMetaData.SrcVocab.Count} ,{modelMetaData.EncoderEmbeddingDim})'");

        srcEmbeddings = new MultiProcessorNetworkWrapper<IWeightTensor>(new WeightTensor(new long[2] { modelMetaData.SrcVocab.Count, modelMetaData.EncoderEmbeddingDim },
            raDeviceIds.GetNextItem(), initType: RandomInitType.Uniform, fanOut: true, name: "SharedEmbeddings", isTrainable: isSrcEmbeddingTrainable, learningRateFactor: encoderStartLearningRateFactor, dtype: elementType), DeviceIds);

        tgtEmbeddings = null;
    }
    else
    {
        Logger.WriteLine(Logger.Level.debug, $"Creating embeddings for source side. Shape = '({modelMetaData.SrcVocab.Count} ,{modelMetaData.EncoderEmbeddingDim})'");

        srcEmbeddings = new MultiProcessorNetworkWrapper<IWeightTensor>(new WeightTensor(new long[2] { modelMetaData.SrcVocab.Count, modelMetaData.EncoderEmbeddingDim },
            raDeviceIds.GetNextItem(), initType: RandomInitType.Uniform, fanOut: true, name: "SrcEmbeddings", isTrainable: isSrcEmbeddingTrainable, learningRateFactor: encoderStartLearningRateFactor, dtype: elementType), DeviceIds);
    }
}

Logger.WriteLine(Logger.Level.debug, $"Creating embeddings for target side. Shape = '({modelMetaData.TgtVocab.Count} ,{modelMetaData.DecoderEmbeddingDim})'");

tgtEmbeddings ??= new MultiProcessorNetworkWrapper<IWeightTensor>(new WeightTensor(new long[2] { modelMetaData.TgtVocab.Count, modelMetaData.DecoderEmbeddingDim },
    raDeviceIds.GetNextItem(), initType: RandomInitType.Uniform, fanOut: true, name: modelMetaData.SharedEmbeddings ? "SharedEmbeddings" : "TgtEmbeddings", isTrainable: isTgtEmbeddingTrainable, learningRateFactor: decoderStartLearningRateFactor, dtype: elementType), DeviceIds);
```


P1: Skip creating a second embedding when embeddings are shared

When SharedEmbeddings is enabled, the new logic still instantiates a target embedding wrapper (tgtEmbeddings ??= …) even though a shared source embedding has already been created. Both wrappers produce tensors named "SharedEmbeddings", so during SaveModel there are two calls to model.AddWeights("SharedEmbeddings", …), which throws an ArgumentException for the duplicate key and prevents models with shared vocabularies from being saved. The previous implementation avoided this by not allocating a separate target embedding in the shared case. The extra allocation should either be removed or redirected to reuse srcEmbeddings so that only one set of shared weights is saved; one possible shape of that fix is sketched below.
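
A minimal sketch of that direction, assuming the fields and guard from the diff above. The added condition is an assumption about how the VisionCNN path should behave, not the repo's confirmed fix:

```csharp
// Sketch: skip the extra allocation when embeddings are shared, leaving
// tgtEmbeddings null so that SaveModel registers the single "SharedEmbeddings"
// tensor created for the source side exactly once.
if (!modelMetaData.SharedEmbeddings || modelMetaData.EncoderType == EncoderTypeEnums.VisionCNN)
{
    tgtEmbeddings ??= new MultiProcessorNetworkWrapper<IWeightTensor>(new WeightTensor(
        new long[2] { modelMetaData.TgtVocab.Count, modelMetaData.DecoderEmbeddingDim },
        raDeviceIds.GetNextItem(), initType: RandomInitType.Uniform, fanOut: true,
        name: "TgtEmbeddings", isTrainable: isTgtEmbeddingTrainable,
        learningRateFactor: decoderStartLearningRateFactor, dtype: elementType), DeviceIds);
}
```

Readers of tgtEmbeddings would then need to fall back to srcEmbeddings when it is null in the shared case, which appears to be what the pre-PR shared-embedding path relied on.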

