Conference Proceedings
An empirical evaluation of doc2vec with practical insights into document embedding generation
JH Lau, T Baldwin
Proceedings of the Annual Meeting of the Association for Computational Linguistics | The Association for Computational Linguistics | Published: 2016
DOI: 10.18653/v1/w16-1609
Abstract
Recently, Le and Mikolov (2014) proposed doc2vec as an extension to word2vec (Mikolov et al., 2013a) to learn document-level embeddings. Despite promising results in the original paper, others have struggled to reproduce those results. This paper presents a rigorous empirical evaluation of doc2vec over two tasks. We compare doc2vec to two baselines and two state-of-the-art document embedding methodologies. We found that doc2vec performs robustly when using models trained on large external corpora, and can be further improved by using pre-trained word embeddings. We also provide recommendations on hyper-parameter settings for general-purpose applications, and release source code to induce docu..