Best practices for sentence-vector feature extraction #160

@RingoD


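I want to use albert_chinese_tiny to extract features from a sentence, and then use the extracted vectors for my own training.
Since the output of albert_chinese_tiny has shape (None, 512, 21148), I applied max pooling to it, which gives an output of shape (None, 512).
Is this the right way to do it?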

if __name__ == '__main__':
    import keras_albert_model
    import keras_bert
    import numpy as np
    from tensorflow import keras
    model = keras_albert_model.load_brightmart_albert_zh_checkpoint('./albert_tiny_489k/')
    tokenizer = keras_bert.Tokenizer(keras_bert.load_vocabulary('./albert_tiny_489k/vocab.txt'))

    outputs = keras.layers.GlobalMaxPool1D(name='MaxPooling', data_format='channels_first')(model.outputs[0])

    model = keras.models.Model(inputs=model.inputs, outputs=outputs)
    model.summary()
    token, segment = tokenizer.encode('I like it', max_len=512)
    prediction = model.predict([np.array([token]), np.array([segment])])[0]
    print(prediction, prediction.shape)
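
To make it easier to see which axis the pooling above collapses, here is a minimal NumPy sketch. It rests on assumptions that are not confirmed in this post: that the last dimension (21148) is a per-token score over the vocabulary rather than a hidden state, and that a separate per-token hidden-state tensor of shape (batch, seq_len, hidden_dim) is available from the model. The helper names, the toy sizes, and the masked mean pooling are illustrative only; mean pooling is a commonly used alternative and not something this code or the library prescribes.

```python
import numpy as np


def max_pool_last_axis(x):
    """Reduce the last axis, like GlobalMaxPool1D(data_format='channels_first')."""
    return x.max(axis=-1)


def masked_mean_pool(hidden, mask):
    """Average token vectors over the sequence axis, ignoring padded positions."""
    weights = mask[..., None].astype(hidden.dtype)   # (batch, seq_len, 1)
    summed = (hidden * weights).sum(axis=1)          # (batch, hidden_dim)
    counts = np.maximum(weights.sum(axis=1), 1.0)    # avoid division by zero
    return summed / counts


# Small stand-in sizes (hypothetical; the post uses seq_len=512, last dim=21148).
batch, seq_len, vocab, hidden_dim = 2, 8, 50, 16
rng = np.random.default_rng(0)
vocab_scores = rng.random((batch, seq_len, vocab))   # assumed per-token vocabulary scores
hidden = rng.random((batch, seq_len, hidden_dim))    # assumed per-token hidden states
mask = np.array([[1, 1, 1, 0, 0, 0, 0, 0],
                 [1, 1, 1, 1, 1, 0, 0, 0]])          # 1 = real token, 0 = padding

per_position = max_pool_last_axis(vocab_scores)      # one scalar per token position
sentence_vec = masked_mean_pool(hidden, mask)        # one vector per sentence

print(per_position.shape)
print(sentence_vec.shape)
```

Under these assumptions, pooling with `channels_first` keeps one number per token position (padding included), so the resulting length-512 vector depends on `max_len` rather than on a hidden size. Pooling hidden states over the token axis instead gives a vector whose length is the hidden size, which is the more usual shape for a sentence feature.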
