2ED: An efficient entity extraction algorithm using two-level edit-distance
Z Wen, D Deng, R Zhang, R Kotagiri
2019 IEEE 35th International Conference on Data Engineering (ICDE) | IEEE | Published : 2019
© 2019 IEEE. Entity extraction is fundamental to many text mining tasks such as organisation name recognition. A popular approach to entity extraction is based on string matching against a dictionary of known entities. For approximate entity extraction from free text, considering solely character-based or solely token-based similarity cannot simultaneously deal with minor name variations at the token level and typos at the character level. Moreover, the tolerance of mismatch at the character level may differ from that at the token level, and the tolerance thresholds of the two levels should be individually customisable. In this paper, we propose an efficient character-level and token-level ed..
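To make the idea of separate character-level and token-level tolerances concrete, the following is a minimal sketch of one plausible reading of a two-level edit distance. It is not the paper's algorithm or definition (the abstract above is truncated), and it makes no attempt at efficiency. The names `edit_distance`, `two_level_match`, `char_tau` and `token_tau` are hypothetical, chosen only for illustration. Two tokens are treated as equal when their character-level edit distance is at most `char_tau`, and a candidate string matches a dictionary entity when the token-level edit distance computed under that relaxed equality is at most `token_tau`.

```python
def edit_distance(a, b, equal=lambda x, y: x == y):
    """Levenshtein distance between two sequences, using a custom equality test."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if equal(x, y) else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def two_level_match(candidate, entity, char_tau, token_tau):
    """Illustrative assumption: tokens match if within char_tau character edits;
    strings match if within token_tau token edits under that token equality."""
    def tokens_close(s, t):
        return edit_distance(s, t) <= char_tau

    token_dist = edit_distance(candidate.lower().split(),
                               entity.lower().split(),
                               equal=tokens_close)
    return token_dist <= token_tau
```

Under these assumptions, `two_level_match("Univercity of Melbourne", "The University of Melbourne", char_tau=1, token_tau=1)` tolerates one typo inside a token and one missing token, while setting either threshold to 0 rejects the pair. Keeping the two thresholds as separate parameters mirrors the abstract's point that the tolerances of the two levels should be customised individually.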
Awarded by Australian Research Council
This work is supported by the Australian Research Council (ARC) Discovery Project DP180102050. The majority of the work was done when Zeyi Wen was with The University of Melbourne. We would also like to thank the anonymous reviewers for their insightful comments.