Dataset

Attribute-Value-Level Matching

Z Lim, B Rubinstein

The University of Melbourne | Published: 2014

Abstract

We explore normalisation of attribute values across multiple data sources, where attributes may be categorical, numerical, or of another type, and may be multi-valued, for example the genre(s) of a movie or the cuisine(s) of a restaurant. To benchmark our statistical approach (based on Canonical Correlation Analysis) against baselines, we crawled and prepared two datasets of four sources each: movie genres (7852 records across IMDB, Rotten Tomatoes, The Movie DB, Yahoo! Movies) and restaurant cuisines (3120 records across Factual, Foursquare, Google Places, Yelp). After performing a simple entity resolution (record linkage) to align matched records across sources, we extracted the attribute to be m..
