Hypothesis-Based Collaborative Filtering

Retrieving Like-Minded People Based on the Comparison of Hypothesized Preferences

The vast product variety and product variation offered by online retailers provide individuals with an enormous number of choice options, which makes it challenging for them to find and choose the products that provide them the most utility. Consequently, consumers have to be satisfied with finding a product that provides them sufficient utility. Beyond that, individuals even tend to defer product choice [Dhar, 1997].

Recommender systems have emerged in recent years as an effective method to help individuals find interesting products. As a result, consumer welfare was enhanced by $731 million to $1.03 billion in the year 2000 due to the increased product variety of online bookstores [Brynjolfsson et al., 2003]. Consumer welfare refers to consumers’ total satisfaction. This enhancement in consumer welfare is 7 to 10 times larger than the consumer welfare gain from increased competition and lower prices in the book market [Brynjolfsson and Smith, 2000]. In other words, recommender systems are essential for increasing consumer welfare, which ultimately leads to an increase in economic and social welfare.

Typically, recommender systems use the collective wisdom of individuals to expose individuals to the products which best fit their preferences, thus maximizing their utility. More precisely, the recommender system considers the product ratings of like-minded individuals to provide recommendations. Commonly, like-minded individuals are retrieved by comparing their ratings for common rated products, that is, products which both individuals have rated. This filtering technology is commonly referred to as collaborative filtering.
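The retrieval of like-minded individuals from common rated products can be illustrated with a minimal sketch. It assumes Pearson correlation as the similarity measure, which is a widespread choice in collaborative filtering but is not stated in this text; the function name, the individuals, and the ratings are hypothetical.

```python
from math import sqrt


def corated_similarity(ratings_a, ratings_b):
    """Pearson correlation over the products both individuals rated.

    Returns None when the similarity cannot be assessed, i.e. with
    fewer than two common rated products or with constant ratings.
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return None
    xs = [ratings_a[p] for p in common]
    ys = [ratings_b[p] for p in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


# Hypothetical ratings on an assumed 1-to-5 scale.
alice = {"book_1": 5, "book_2": 3, "book_3": 1}
bob = {"book_1": 4, "book_2": 3, "book_3": 2, "book_4": 5}
carol = {"book_7": 5, "book_8": 4}

print(corated_similarity(alice, bob))    # three common rated products
print(corated_similarity(alice, carol))  # no common rated products
```

In this sketch, only the three products rated by both alice and bob enter the comparison, and the pair alice and carol cannot be compared at all because they share no rated product.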

However, retrieving like-minded individuals based on their ratings for common rated products may be inappropriate, because common rated products are not necessarily a representative sample of the preferences of the two individuals being compared. There are four reasons. Firstly, the set of common rated products is often too sparse to draw a significant conclusion about the preference similarity of both individuals.

Secondly, ratings for common rated products correspond to the intersection of two individuals’ rated products and thus may only partially represent both individuals’ preferences. Consequently, overall preference similarity is, in fact, deduced from partial preference similarity.

Thirdly, the preference similarity between two individuals is not assessable when the two individuals share no ratings for the same products. Consequently, like-minded individuals are missed due to a lack of ratings.

Lastly, retailers collect only a fraction of individuals’ ratings in their store, because individuals purchase products from different stores. Hence, individuals’ ratings are distributed across multiple retailers, which limits the set of common rated products per retailer.

In this dissertation, we propose hypothesis-based collaborative filtering (HCF) to expose individuals to the products which best fit their preferences. In HCF, machine learning algorithms hypothesize individuals’ preferences, and like-minded individuals are retrieved based on the similarity of their respective hypothesized preferences. Machine learning is a method for extracting patterns from observations in order to generalize from them, and is thus well suited to hypothesizing individuals’ preferences from their product ratings.

Generally, the similarity of two individuals’ hypothesized preferences can be computed in two different ways. One way is to compare the hypothesized utilities which products provide to both individuals. To this end, we use both individuals’ hypothesized preferences to predict the utilities of some products. To compute the preference similarity, we propose three similarity metrics to compare product utilities.
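The idea of comparing hypothesized utilities can be sketched as follows. Everything concrete in this sketch is an assumption made for illustration and is not taken from the dissertation: the preference model (mean rating per genre), the genre catalog, the 1-to-5 rating scale, and the similarity measure (one minus the normalized mean absolute difference), which is not claimed to be one of the three proposed metrics.

```python
def hypothesize_preferences(ratings, catalog):
    """Learn a toy preference model: the mean rating per genre.

    Assumes `ratings` is non-empty. Returns the genre means and the
    individual's overall mean rating (used for unseen genres).
    """
    totals, counts = {}, {}
    for product, rating in ratings.items():
        for genre in catalog[product]:
            totals[genre] = totals.get(genre, 0.0) + rating
            counts[genre] = counts.get(genre, 0) + 1
    genre_means = {g: totals[g] / counts[g] for g in totals}
    overall = sum(ratings.values()) / len(ratings)
    return genre_means, overall


def predict_utility(model, genres):
    """Predict a product's utility as the mean of its genre means."""
    genre_means, overall = model
    scores = [genre_means.get(g, overall) for g in genres]
    return sum(scores) / len(scores)


def utility_similarity(model_a, model_b, probe_products, catalog,
                       rating_range=4.0):
    """Similarity in [0, 1] from predicted utilities on probe products."""
    diffs = [
        abs(predict_utility(model_a, catalog[p])
            - predict_utility(model_b, catalog[p]))
        for p in probe_products
    ]
    return 1.0 - (sum(diffs) / len(diffs)) / rating_range


# Hypothetical catalog and ratings; no two individuals share a product.
catalog = {
    "book_1": ["crime"], "book_2": ["romance"],
    "book_7": ["crime"], "book_8": ["romance"],
    "book_9": ["crime", "romance"],
}
alice = hypothesize_preferences({"book_1": 5, "book_2": 2}, catalog)
carol = hypothesize_preferences({"book_7": 5, "book_8": 2}, catalog)
dave = hypothesize_preferences({"book_7": 1, "book_8": 5}, catalog)

probe = ["book_1", "book_2", "book_9"]
print(utility_similarity(alice, carol, probe, catalog))
print(utility_similarity(alice, dave, probe, catalog))
```

Because the comparison runs on predicted utilities rather than on observed ratings, the two individuals in this sketch can be compared even though they have rated no product in common.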

The other way is to analyze the composition of both individuals’ hypothesized preferences. For this purpose, we introduce the notion of hypothesized partial preferences (HPPs), which are self-contained and form the components which constitute hypothesized preferences. We propose several methods to compare HPPs to compute the similarity of two individuals’ preferences.

We conduct a large empirical study on a quasi-benchmark dataset and on diverse variations of this dataset, which vary in their degree of sparsity, to evaluate the cold-start behavior of HCF. Based on this empirical study, we provide empirical evidence for the robustness of HCF against data sparsity and its superiority over state-of-the-art collaborative filtering methods.

We use the research methodology of grounded theory to scrutinize the empirical results and to explain the cold-start behavior of HCF for retrieving like-minded individuals relative to other collaborative filtering methods. Based on this theory, we show that HCF is more efficient in retrieving like-minded individuals from large sets of individuals and is more appropriate for individuals who provide few ratings. We verify the validity of the grounded theory by means of an empirical study.

In conclusion, HCF provides individuals with better recommendations, particularly for individuals who provide few ratings and for frequently rated products, cases that complicate the retrieval of like-minded individuals. Hence, HCF increases consumer welfare, which ultimately leads to an increase in economic and social welfare.