Conference Proceedings

Using relative entropy for authorship attribution

Ying Zhao, Justin Zobel, Phil Vines, HT Ng (ed.), MK Leong (ed.), MY Kan (ed.), D Ji (ed.)

INFORMATION RETRIEVAL TECHNOLOGY, PROCEEDINGS | SPRINGER-VERLAG BERLIN | Published : 2006

Abstract

Authorship attribution is the task of deciding who wrote a particular document. Several attribution approaches have been proposed in recent research, but none of these approaches is particularly satisfactory; some of them are ad hoc and most have defects in terms of scalability, effectiveness, and efficiency. In this paper, we propose a principled approach motivated from information theory to identify authors based on elements of writing style. We make use of the Kullback-Leibler divergence, a measure of how different two distributions are, and explore several different approaches to tokenizing documents to extract style markers. We use several data collections to examine the performance of ..
