Description
Hi @urialon,
As mentioned in one of the previous issues, I am trying to train and test Code2Seq for the code summarization task on our own Python dataset. I am able to train the model, but the predictions don't seem to be correct. This issue looks similar to #62, which was also never fully resolved. Here is what I have tried so far:
- The first time, I trained with the default config; after a couple of epochs the predicted text for all cases was like `the|the|the|the|the|the`.
- Following the suggestions in #17 (Code Captioning Task) and #45 (reproducing the code-documentation results from the paper), I updated the model config to make it suitable for predicting longer sequences (see the config sketch after this list). The predictions were still similar, though the lengths of the predicted texts varied, probably because I changed `MAX_TARGET_PARTS` in the config.
- Next, following the suggestions in #62 (Empty hypothesis when periods are included in dataset), I made sure there are no extra delimiters (`,`, `|`, or spaces), no punctuation or numbers, and no other non-alphanumeric characters (using a `str.isalpha()` check over both the docs and the paths), and I removed extra pipes (`||`); see the cleaning sketch after this list. This time the hypothesis was empty for all validation data points, exactly as in #62.
- To check whether there is an issue with my setup, I trained the model on the python150k dataset, and it trains properly there, so I assume it is a problem with my dataset.
- I have also observed that during the first 1 or 2 epochs there is some text in the predictions, but with more epochs the output degrades until it is empty for all data points.
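For reference, here is a minimal sketch of the config change from the second bullet. The field names are from the repo's `config.py` as I understand it; the exact values are illustrative rather than the precise ones I used:

```python
# Inside Config.get_default_config() in config.py
# The defaults target short method names; for longer summaries the
# decoder needs a larger budget of target sub-tokens per prediction.
config.MAX_TARGET_PARTS = 30  # illustrative; the default is much smaller
config.BEAM_WIDTH = 0         # 0 = greedy decoding; > 0 enables beam search
```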
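And here is a minimal sketch of the cleaning from the third bullet. The function names are mine, just to show the checks; it assumes the code2seq data format of `<target> <ctx1> <ctx2> ...`, with `|`-joined sub-tokens and `,`-separated context pieces:

```python
from typing import Optional

def is_clean(tok: str) -> bool:
    # str.isalpha() drops punctuation, digits, and other non-alphabetic
    # characters; it is also False for the empty strings left behind
    # by doubled pipes ('||') or extra spaces.
    return tok.isalpha()

def clean_line(line: str) -> Optional[str]:
    # Each line is '<target> <ctx1> <ctx2> ...': the target is
    # 'word|word|...' and each context is 'token,path,token'.
    target, *contexts = line.strip().split(' ')
    target_parts = [t for t in target.split('|') if is_clean(t)]
    kept = [
        ctx for ctx in contexts
        if all(is_clean(part)
               for piece in ctx.split(',')
               for part in piece.split('|'))
    ]
    if not target_parts or not kept:
        return None  # drop examples that end up empty after cleaning
    return '|'.join(target_parts) + ' ' + ' '.join(kept)
```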
Here are some of the training logs from my experiments:
training-logs-1.txt
training-logs-2(config change).txt
training-logs-3(alnum).txt
Thanks & Regards,
Tamal Mondal