Journal article

The CHEMDNER corpus of chemicals and drugs and its annotation principles

M Krallinger, O Rabal, F Leitner, M Vazquez, D Salgado, Z Lu, R Leaman, Y Lu, D Ji, DM Lowe, RA Sayle, RT Batista-Navarro, R Rak, T Huber, T Rocktäschel, S Matos, D Campos, B Tang, H Xu, T Munkhdalai

Journal of Cheminformatics | Chemistry Central | Published : 2015

Abstract

Copyright © 2015 Krallinger et al. The automatic extraction of chemical information from text requires the recognition of chemical entity mentions as one of its key steps. When developing supervised named entity recognition (NER) systems, the availability of a large, manually annotated text corpus is desirable. Furthermore, large corpora permit the robust evaluation and comparison of different approaches that detect chemicals in documents. We present the CHEMDNER corpus, a collection of 10,000 PubMed abstracts that contain a total of 84,355 chemical entity mentions labeled manually by expert chemistry literature curators, following annotation guidelines specifically defined for this task. Th..

Grants

Awarded by MICROME grant


Funding Acknowledgements

This work is supported by the Innovative Medicines Initiative Joint Undertaking (IMI-eTOX) and the MICROME grant 222886-2. We would like to thank Peter Corbett, Colin Batchelor, Corinna Kolarik, and their colleagues for their pioneering work on chemical entity annotation.