Conference Proceedings

PubSE: A hierarchical model for publication extraction from academic homepages

Y Zhang, J Qi, R Zhang, C Yin

Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018 | Association for Computational Linguistics | Published: 2020

Abstract

Publication information in a researcher's academic homepage provides insights about the researcher's expertise, research interests, and collaboration networks. We aim to extract all the publication strings from a given academic homepage. This is a challenging task because the publication strings in different academic homepages may be located at different positions with different structures. To capture the positional and structural diversity, we propose an end-to-end hierarchical model named PubSE based on Bi-LSTM-CRF. We further propose an alternating training method for training the model. Experiments on real data show that PubSE outperforms the state-of-the-art models by up to 11.8% in F1-..




Funding Acknowledgements

We thank Afshin Rahimi and the anonymous reviewers for their valuable suggestions, and we gratefully acknowledge the support of NVIDIA Corporation with the donation of the Tesla K40 GPU used for this research. We also sincerely thank Han Liu and other annotators for their help in creating the HomePub dataset.