Conference Proceedings

Clustering near-duplicate images in large collections

JJ Foo, J Zobel, R Sinha

Proceedings of the ACM International Multimedia Conference and Exhibition | Published: 2007


Near-duplicate images introduce problems of redundancy and copyright infringement in large image collections. The problem is acute on the web, where appropriation of images without acknowledgment of source is prevalent. In this paper, we present an effective clustering approach for near-duplicate images, using a combination of techniques from invariant image local descriptors and an adaptation of near-duplicate text-document clustering techniques; we extend our earlier approach of near-duplicate image pairwise identification for this clustering approach. We demonstrate that our clustering approach is highly effective for collections of up to a few hundred thousand images. We also show ...

