Dear author, I have tried to reproduce the CVPR 2022 COS-Net model, but in the XE training stage the highest CIDEr only reached 1.260, and after SC training it only reached 1.393, which is well below the 1.411 reported in the paper. I post my XE training config below; could you help me find any mistakes? Thank you.
```yaml
CUDNN_BENCHMARK: true
DATALOADER:
  ANNO_FOLDER: ../open_source_dataset/mscoco_dataset
  ATTRIBUTE_FILE: ''
  FEATS_FOLDER: ../open_source_dataset/mscoco_dataset/features/CLIP_RN101_49
  FILE_PATHS: []
  GV_FEAT_FILE: ''
  INF_BATCH_SIZE: 200
  MAX_FEAT_NUM: 50
  NEGATIVE_SIZE: -1
  NUM_WORKERS: 6
  RELATION_FILE: ''
  SAMPLE_IDS: []
  SAMPLE_PROB: 0.2
  SEQ_PER_SAMPLE: 5
  TEST_BATCH_SIZE: 32
  TRAIN_BATCH_SIZE: 8
  USE_GLOBAL_V: true
DATASETS:
  TEST: MSCoCoCOSNetDataset
  TRAIN: MSCoCoCOSNetDataset
  VAL: MSCoCoCOSNetDataset
DECODE_STRATEGY:
  BEAM_SIZE: 3
  NAME: BeamSearcher
ENGINE:
  NAME: DefaultTrainer
INFERENCE:
  GENERATION_MODE: true
  ID_KEY: image_id
  NAME: COCOEvaler
  TEST_ANNFILE: ../open_source_dataset/mscoco_dataset/captions_test5k.json
  TEST_EVAL_START: -1
  VALUE: caption
  VAL_ANNFILE: ../open_source_dataset/mscoco_dataset/captions_val5k.json
  VAL_EVAL_START: -1
  VOCAB: ../open_source_dataset/mscoco_dataset/vocabulary.txt
LOSSES:
  LABELSMOOTHING: 0.1
  MARGIN: 0.2
  MAX_VIOLATION: true
  NAMES:
  - LabelSmoothing
  - SemComphderLoss
LR_SCHEDULER:
  FACTOR: 1.0
  GAMMA: 0.1
  MIN_LR: 1.0e-05
  MODEL_SIZE: 512
  NAME: NoamLR
  STEPS:
  - 3
  STEP_SIZE: 3
  WARMUP: 20000
  WARMUP_FACTOR: 0.0
  WARMUP_METHOD: linear
MODEL:
  BERT:
    ATTENTION_PROBS_DROPOUT_PROB: 0.1
    FFN_DROPOUT_PROB: 0.2
    G_LAYER_DROP: 0.0
    HIDDEN_ACT: relu
    HIDDEN_DROPOUT_PROB: 0.1
    HIDDEN_SIZE: 512
    INTERMEDIATE_DROP: 0.2
    INTERMEDIATE_SIZE: 2048
    LAYER_DROP: 0.0
    NUM_ATTENTION_HEADS: 8
    NUM_GENERATION_LAYERS: 6
    NUM_HIDDEN_LAYERS: 6
    NUM_UNDERSTANDING_LAYERS: 6
    U_LAYER_DROP: 0.0
    V_LAYER_DROP: 0.0
    V_NUM_HIDDEN_LAYERS: 6
    V_TARGET_SIZE: 0
  COSNET:
    FILTER_WEIGHT: 1.0
    MAX_POS: 26
    NUM_CLASSES: 906
    NUM_SEMCOMPHDER_LAYERS: 3
    RECONSTRUCT_WEIGHT: 0.1
    SLOT_SIZE: 6
  DECODER: COSNetDecoder
  DECODER_DIM: 512
  DEVICE: cuda
  EMA_DECAY: 0.9999
  ENCODER: COSNetEncoder
  ENCODER_DIM: 512
  ENSEMBLE_WEIGHTS:
  - ''
  ITM_NEG_PROB: 0.5
  MAX_SEQ_LEN: 20
  META_ARCHITECTURE: TransformerEncoderDecoder
  MODEL_WEIGHTS:
  - 1.0
  - 1.0
  PREDICTOR: BasePredictor
  PRED_DROPOUT: 0.5
  PRETRAINING:
    DO_LOWER_CASE: true
    FROM_PRETRAINED: bert-base-uncased
    MODEL_NAME: bert-base-uncased
  TOKEN_EMBED:
    ACTIVATION: none
    DIM: 512
    DROPOUT: 0.1
    ELU_ALPHA: 0.5
    NAME: TokenBaseEmbedding
    POSITION: SinusoidEncoding
    POSITION_MAX_LEN: 5000
    TYPE_VOCAB_SIZE: 0
    USE_NORM: true
  USE_EMA: false
  VISUAL_EMBED:
    ACTIVATION: relu
    DROPOUT: 0.5
    ELU_ALPHA: 0.5
    G_IN_DIM: 512
    IN_DIM: 2048
    LOCATION_SIZE: 0
    NAME: VisualGridEmbedding
    OUT_DIM: 512
    USE_NORM: true
  VOCAB_SIZE: 10200
  V_PREDICTOR: ''
  WEIGHTS: ''
OUTPUT_DIR: ./cosnet_output_baseline
SCHEDULED_SAMPLING:
  INC_EVERY_EPOCH: 3
  INC_PROB: 0.05
  MAX_PROB: 0.5
  START_EPOCH: 9999
SCORER:
  CIDER_CACHED: ../open_source_dataset/mscoco_dataset/mscoco_train_cider.pkl
  EOS_ID: 0
  GT_PATH: ../open_source_dataset/mscoco_dataset/mscoco_train_gts.pkl
  NAME: BaseScorer
  TYPES:
  - Cider
  WEIGHTS:
  - 1.0
SEED: -1
SOLVER:
  ALPHA: 0.99
  AMSGRAD: false
  BASE_LR: 0.0005
  BETAS:
  - 0.9
  - 0.999
  BIAS_LR_FACTOR: 1.0
  CENTERED: false
  CHECKPOINT_PERIOD: 1
  DAMPENING: 0.0
  EPOCH: 35
  EPS: 1.0e-08
  EVAL_PERIOD: 1
  GRAD_CLIP: 0.1
  GRAD_CLIP_TYPE: value
  INITIAL_ACCUMULATOR_VALUE: 0.0
  LR_DECAY: 0.0
  MOMENTUM: 0.9
  NAME: Adam
  NESTEROV: 0.0
  NORM_TYPE: 2.0
  WEIGHT_DECAY: 0.0
  WEIGHT_DECAY_BIAS: 0.0
  WEIGHT_DECAY_NORM: 0.0
  WRITE_PERIOD: 20
VERSION: 1
```
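For reference, since the learning-rate schedule is one of the first things to check when a reproduction undershoots: with `NAME: NoamLR`, `FACTOR: 1.0`, `MODEL_SIZE: 512`, and `WARMUP: 20000`, the standard Noam schedule (from "Attention Is All You Need") peaks around step 20000 at roughly 3.1e-4. This is only a sketch of the standard formula; the repo's actual `NoamLR` class may scale it differently (e.g. by `BASE_LR`), so treat the function below as an assumption, not the repo's implementation.

```python
def noam_lr(step, model_size=512, factor=1.0, warmup=20000):
    """Standard Noam schedule:
    lr = factor * model_size^-0.5 * min(step^-0.5, step * warmup^-1.5)
    Linearly warms up for `warmup` steps, then decays as step^-0.5.
    """
    step = max(step, 1)  # avoid division by zero at step 0
    return factor * model_size ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)

# Peak LR at the end of warmup (~3.1e-4 with these config values):
peak = noam_lr(20000)
```

If your reimplementation or a changed `TRAIN_BATCH_SIZE` alters the number of optimizer steps per epoch, the effective LR seen at each epoch shifts, which could plausibly account for a CIDEr gap of this size.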