Description
Hi, I tested the code on the 'labeledTrainData.tsv' dataset, using 80% of the data for training and 20% for validation.
I am using the Theano backend; however, the performance is much worse than expected. The results are as follows:
```
Train on 20000 samples, validate on 5000 samples
Epoch 1/10
20000/20000 [==============================] - 104s - loss: 0.6974 - acc: 0.5082 - val_loss: 0.6938 - val_acc: 0.5058
Epoch 2/10
20000/20000 [==============================] - 103s - loss: 0.6982 - acc: 0.5025 - val_loss: 0.6930 - val_acc: 0.5124
Epoch 3/10
20000/20000 [==============================] - 104s - loss: 0.6959 - acc: 0.5128 - val_loss: 0.6950 - val_acc: 0.5058
Epoch 4/10
20000/20000 [==============================] - 104s - loss: 0.6978 - acc: 0.4936 - val_loss: 0.6939 - val_acc: 0.4942
Epoch 5/10
20000/20000 [==============================] - 103s - loss: 0.6983 - acc: 0.4958 - val_loss: 0.6934 - val_acc: 0.4954
Epoch 6/10
20000/20000 [==============================] - 103s - loss: 0.6994 - acc: 0.5002 - val_loss: 0.7012 - val_acc: 0.4944
Epoch 7/10
20000/20000 [==============================] - 104s - loss: 0.6992 - acc: 0.4973 - val_loss: 0.6931 - val_acc: 0.5054
Epoch 8/10
20000/20000 [==============================] - 103s - loss: 0.6977 - acc: 0.5032 - val_loss: 0.6931 - val_acc: 0.4940
Epoch 9/10
20000/20000 [==============================] - 103s - loss: 0.6966 - acc: 0.5070 - val_loss: 0.6937 - val_acc: 0.4942
Epoch 10/10
20000/20000 [==============================] - 103s - loss: 0.6961 - acc: 0.5068 - val_loss: 0.7287 - val_acc: 0.4942
```
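For what it's worth, a binary cross-entropy loss pinned near 0.693 is exactly what a classifier that always predicts p ≈ 0.5 produces, i.e. the model does not appear to be learning at all. A quick check:

```python
import math

# Binary cross-entropy when the model always predicts p = 0.5,
# regardless of the true label: -log(0.5) = log(2)
chance_loss = -math.log(0.5)
print(round(chance_loss, 4))  # 0.6931, matching the stuck training loss above
```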
The best performance falls far short of the 90.4% reported on your website: https://richliao.github.io/supervised/classification/2016/12/26/textclassifier-HATN/.
I wonder whether the attention layer code below is correct for the Theano backend. The implementation of the attention layer is as follows:
```python
class AttLayer(Layer):
    def __init__(self, attention_dim, **kwargs):
        self.init = initializers.get('normal')
        self.supports_masking = True
        self.attention_dim = attention_dim
        super(AttLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        assert len(input_shape) == 3
        self.W = K.variable(self.init((input_shape[-1], self.attention_dim)))
        self.b = K.variable(self.init((self.attention_dim, )))
        self.u = K.variable(self.init((self.attention_dim, 1)))
        self.trainable_weights = [self.W, self.b, self.u]
        super(AttLayer, self).build(input_shape)

    def compute_mask(self, inputs, mask=None):
        return mask

    def call(self, x, mask=None):
        # size of x: [batch_size, seq_len, attention_dim]
        # size of u: [batch_size, attention_dim]
        # uit = tanh(xW + b)
        uit = K.tanh(K.bias_add(K.dot(x, self.W), self.b))
        ait = K.dot(uit, self.u)
        ait = K.squeeze(ait, -1)
        ait = K.exp(ait)
        if mask is not None:
            # Cast the mask to floatX to avoid float64 upcasting in Theano
            ait *= K.cast(mask, K.floatx())
        ait /= K.cast(K.sum(ait, axis=1, keepdims=True) + K.epsilon(), K.floatx())
        ait = K.expand_dims(ait)
        weighted_input = x * ait
        output = K.sum(weighted_input, axis=1)
        return output

    def compute_output_shape(self, input_shape):
        return (input_shape[0], input_shape[-1])
```
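For reference, here is a minimal NumPy sketch (my own, not from the repo) of the forward pass the layer is meant to compute, with masking omitted. It can be used to sanity-check the shapes and confirm that the attention weights form a softmax over the time steps:

```python
import numpy as np

def att_forward(x, W, b, u):
    """NumPy mirror of AttLayer.call (no mask). x has shape (batch, seq_len, dim)."""
    uit = np.tanh(x @ W + b)                              # (batch, seq_len, att_dim)
    ait = np.squeeze(uit @ u, axis=-1)                    # (batch, seq_len)
    ait = np.exp(ait)
    ait = ait / (ait.sum(axis=1, keepdims=True) + 1e-7)   # softmax over time steps
    return (x * ait[..., None]).sum(axis=1)               # (batch, dim)

rng = np.random.RandomState(0)
x = rng.randn(2, 5, 8)        # batch=2, seq_len=5, input dim=8 (illustrative sizes)
W = rng.randn(8, 4) * 0.05    # attention_dim=4, small random init like 'normal'
b = np.zeros(4)
u = rng.randn(4, 1) * 0.05
out = att_forward(x, W, b, u)
print(out.shape)              # (2, 8)
```

The output shape (batch, input dim) matches `compute_output_shape` above, which collapses the time axis by the attention-weighted sum.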
Thanks a lot