Description
Hi, I tested the code on the 'labeledTrainData.tsv' dataset, using 80% of the data for training and 20% for validation.
I am using the Theano backend; however, the performance is much worse than expected. The results are as follows:
```
Train on 20000 samples, validate on 5000 samples
Epoch 1/10
20000/20000 [==============================] - 104s - loss: 0.6974 - acc: 0.5082 - val_loss: 0.6938 - val_acc: 0.5058
Epoch 2/10
20000/20000 [==============================] - 103s - loss: 0.6982 - acc: 0.5025 - val_loss: 0.6930 - val_acc: 0.5124
Epoch 3/10
20000/20000 [==============================] - 104s - loss: 0.6959 - acc: 0.5128 - val_loss: 0.6950 - val_acc: 0.5058
Epoch 4/10
20000/20000 [==============================] - 104s - loss: 0.6978 - acc: 0.4936 - val_loss: 0.6939 - val_acc: 0.4942
Epoch 5/10
20000/20000 [==============================] - 103s - loss: 0.6983 - acc: 0.4958 - val_loss: 0.6934 - val_acc: 0.4954
Epoch 6/10
20000/20000 [==============================] - 103s - loss: 0.6994 - acc: 0.5002 - val_loss: 0.7012 - val_acc: 0.4944
Epoch 7/10
20000/20000 [==============================] - 104s - loss: 0.6992 - acc: 0.4973 - val_loss: 0.6931 - val_acc: 0.5054
Epoch 8/10
20000/20000 [==============================] - 103s - loss: 0.6977 - acc: 0.5032 - val_loss: 0.6931 - val_acc: 0.4940
Epoch 9/10
20000/20000 [==============================] - 103s - loss: 0.6966 - acc: 0.5070 - val_loss: 0.6937 - val_acc: 0.4942
Epoch 10/10
20000/20000 [==============================] - 103s - loss: 0.6961 - acc: 0.5068 - val_loss: 0.7287 - val_acc: 0.4942
```
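For what it's worth, a binary cross-entropy loss pinned near 0.693 is exactly what a classifier that always predicts p ≈ 0.5 produces, i.e. the model does not appear to be learning at all. A quick check:

```python
import math

# Binary cross-entropy when the model always predicts p = 0.5,
# regardless of the true label: -log(0.5) = log(2)
chance_loss = -math.log(0.5)
print(round(chance_loss, 4))  # 0.6931, matching the stuck training loss above
```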
The best performance falls far short of the 90.4% reported on your website: https://richliao.github.io/supervised/classification/2016/12/26/textclassifier-HATN/.
I wonder whether the attention layer code below is correct for the Theano backend. The implementation of the attention layer is as follows:
```python
class AttLayer(Layer):
    def __init__(self, attention_dim, **kwargs):
        self.init = initializers.get('normal')
        self.supports_masking = True
        self.attention_dim = attention_dim
        super(AttLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        assert len(input_shape) == 3
        self.W = K.variable(self.init((input_shape[-1], self.attention_dim)))
        self.b = K.variable(self.init((self.attention_dim, )))
        self.u = K.variable(self.init((self.attention_dim, 1)))
        self.trainable_weights = [self.W, self.b, self.u]
        super(AttLayer, self).build(input_shape)

    def compute_mask(self, inputs, mask=None):
        return mask

    def call(self, x, mask=None):
        # size of x: [batch_size, seq_len, attention_dim]
        # size of u: [batch_size, attention_dim]
        # uit = tanh(xW + b)
        uit = K.tanh(K.bias_add(K.dot(x, self.W), self.b))
        ait = K.dot(uit, self.u)
        ait = K.squeeze(ait, -1)
        ait = K.exp(ait)
        if mask is not None:
            # Cast the mask to floatX to avoid float64 upcasting in Theano
            ait *= K.cast(mask, K.floatx())
        ait /= K.cast(K.sum(ait, axis=1, keepdims=True) + K.epsilon(), K.floatx())
        ait = K.expand_dims(ait)
        weighted_input = x * ait
        output = K.sum(weighted_input, axis=1)
        return output

    def compute_output_shape(self, input_shape):
        return (input_shape[0], input_shape[-1])
```
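For reference, here is a minimal NumPy sketch (my own, not from the repo) of the forward pass the layer is meant to compute, with masking omitted. It can be used to sanity-check the shapes and confirm that the attention weights form a softmax over the time steps:

```python
import numpy as np

def att_forward(x, W, b, u):
    """NumPy mirror of AttLayer.call (no mask). x has shape (batch, seq_len, dim)."""
    uit = np.tanh(x @ W + b)                              # (batch, seq_len, att_dim)
    ait = np.squeeze(uit @ u, axis=-1)                    # (batch, seq_len)
    ait = np.exp(ait)
    ait = ait / (ait.sum(axis=1, keepdims=True) + 1e-7)   # softmax over time steps
    return (x * ait[..., None]).sum(axis=1)               # (batch, dim)

rng = np.random.RandomState(0)
x = rng.randn(2, 5, 8)        # batch=2, seq_len=5, input dim=8 (illustrative sizes)
W = rng.randn(8, 4) * 0.05    # attention_dim=4, small random init like 'normal'
b = np.zeros(4)
u = rng.randn(4, 1) * 0.05
out = att_forward(x, W, b, u)
print(out.shape)              # (2, 8)
```

The output shape (batch, input dim) matches `compute_output_shape` above, which collapses the time axis by the attention-weighted sum.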
Thanks a lot