Journal article

Uncovering protein interaction in abstracts and text using a novel linear model and word proximity networks

Alaa Abi-Haidar, Jasleen Kaur, Ana Maguitman, Predrag Radivojac, Andreas Rechtsteiner, Karin Verspoor, Zhiping Wang, Luis M Rocha

Genome Biology | BMC | Published : 2008


BACKGROUND: We participated in three of the protein-protein interaction subtasks of the Second BioCreative Challenge: classification of abstracts relevant for protein-protein interaction (interaction article subtask [IAS]), discovery of protein pairs (interaction pair subtask [IPS]), and identification of text passages characterizing protein interaction (interaction sentences subtask [ISS]) in full-text documents. We approached the abstract classification task with a novel, lightweight linear model inspired by spam detection techniques, as well as an uncertainty-based integration scheme. We also used a support vector machine and singular value decomposition on the same features for compariso..


Awarded by NSF

Funding Acknowledgements

We would like to thank Santiago Schnell for graciously providing us with additional proteomics-related articles not containing PPI information. We would also like to thank the FLAD Computational Biology Collaboratorium at the Gulbenkian Institute in Oeiras, Portugal, for hosting and providing facilities used to conduct part of this research. It was at the collaboratorium that we interacted with Florentino Riverola, whose SpamHunting system inspired our approach to the IAS task, and who was most helpful in discussing his system with us. We are also grateful to Indiana University's Research and Technical Services for technical support. The AVIDD Linux Clusters used in our analysis are funded in part by NSF Grant CDA-9601632.