Learning Kernels over Strings using Gaussian Processes
Daniel Beck, Trevor Cohn
Proceedings of IJCNLP | Asian Federation of Natural Language Processing | Published: 2017
Non-contiguous word sequences are widely known to be important in modelling natural language. However, they are not explicitly encoded in common text representations. In this work we propose a model for text processing using string kernels, capable of flexibly representing non-contiguous sequences. Specifically, we derive a vectorised version of the string kernel algorithm and its gradients, allowing efficient hyperparameter optimisation as part of a Gaussian Process framework. Experiments on synthetic data and text regression for emotion analysis show the promise of this technique.
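
To make the idea of matching non-contiguous sequences concrete, the following is a minimal brute-force sketch of a gap-weighted subsequence kernel. It is not the vectorised algorithm derived in the paper, and it leaves out the gradients and the Gaussian Process component. The function names, the single `decay` hyperparameter and the default values are assumptions made for illustration only. Each length-`n` subsequence is weighted by `decay` raised to the number of positions it spans, so matches with gaps still count but contribute less.

```python
from collections import defaultdict
from itertools import combinations


def subsequence_features(tokens, n, decay):
    """Map each length-n subsequence of `tokens` to its gap-weighted count.

    Hypothetical helper: enumerates every index tuple, so the cost grows
    combinatorially and it is only suitable for short sequences.
    """
    features = defaultdict(float)
    for idx in combinations(range(len(tokens)), n):
        span = idx[-1] - idx[0] + 1  # positions covered, gaps included
        key = tuple(tokens[i] for i in idx)
        features[key] += decay ** span
    return features


def string_kernel(s, t, n=2, decay=0.5):
    """Inner product of the gap-weighted feature maps of s and t."""
    fs = subsequence_features(s, n, decay)
    ft = subsequence_features(t, n, decay)
    return sum(w * ft[u] for u, w in fs.items() if u in ft)


# Word-level example: the pair ("not", "good") is contiguous in the first
# sentence and separated by "very" in the second, yet it still matches.
a = "the movie was not good".split()
b = "the movie was not very good".split()
similarity = string_kernel(a, b, n=2, decay=0.5)
```

In this sketch `decay` is a fixed constant. In the setting the abstract describes, such hyperparameters would instead be tuned by gradient-based optimisation within the Gaussian Process framework, which is why the gradients of the kernel are needed.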