Personal, Private Movie Recommender System at the Semantic Web Challenge

Together with Gerald Reif, I advised Tobias Bannwart's master thesis on a personal cross-site movie recommender system implemented as a Firefox add-on. The add-on, called OMORE, is available for download. We decided to bring OMORE to market maturity. For this, we had to rethink its architecture and usability, and besides its advantages we identified the following open challenges:

  1. Movie cross-references: It is not known which movie page of one provider corresponds to which movie page of another provider. Providers may be commercial sites such as Amazon.com, review sites such as RottenTomatoes.com, or knowledge bases such as IMDb.com.
  2. Retrieval of movie cross-references: No flexible search service exists to retrieve movie cross-references based on movie title and release year information.
  3. Maintenance of movie cross-references: A vast number of potential movie cross-references exists, which is difficult to gather with a Web crawler. In addition, the set of movie pages grows fast.

We came up with the following solutions:

  1. Movie cross-references: A knowledge base is needed to persist movie cross-references: concretely, for every movie, (1) which movie pages represent the movie itself as content and (2) which movie pages represent a commercial product of the movie, such as a DVD, Blu-ray, VHS, or even Video-on-Demand (VoD). This semantic distinction is necessary because the VHS and the Blu-ray release of a movie are not the same product but show the same movie. Therefore, we decided to apply Semantic Web technologies to persist movie cross-references. We applied D2R, which maps data from a relational database management system to RDF, the basic format for representing resources semantically. Our knowledge base of movie cross-references is called LiMo.
  2. Retrieval of movie cross-references: A search service is required that supports fuzzy search over the movie cross-reference knowledge base. We argue for fuzzy search because movies are presented quite heterogeneously across Web pages: movie titles may be misspelled, transformed, or extended in various ways. Especially in online shops, we found movie titles extended with information about special or collector's editions and the type of medium on which the movie is provided. Instead of trying to extract the original title from the unpurified one, we decided to apply fuzzy search over movie title and release year to retrieve movies. Our movie retrieval service can be accessed at MOLookup.
  3. Maintenance of movie cross-references: A Web crawler approach is not feasible due to time latency and resource requirements. Thus, we invented a collaborative approach: whenever a user browses a movie Web page that is not yet cross-referenced in LiMo, OMORE automatically queries the movie retrieval service MOLookup with the URL of the new movie page, and this URL is then cross-referenced to the retrieved movie. With this approach we gather all relevant movie cross-references automatically with the users' help. This way, ordinary users contribute to the Semantic Web without even knowing it.
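To illustrate the fuzzy retrieval idea from point 2, the following sketch matches a noisy shop title against a candidate movie by normalizing the title and combining an edit-distance similarity with a release-year bonus. The normalization rules, the scoring, and all names are illustrative assumptions, not the actual MOLookup implementation.

```javascript
// Strip noise that online shops append to titles (edition info, medium, brackets).
function normalizeTitle(title) {
  return title
    .toLowerCase()
    .replace(/\[.*?\]|\(.*?\)/g, "")                                     // bracketed add-ons
    .replace(/\b(special|collector'?s)\s+edition\b|\b(dvd|blu-?ray|vhs)\b/g, "")
    .replace(/[^a-z0-9 ]/g, " ")                                         // punctuation -> space
    .replace(/\s+/g, " ")
    .trim();
}

// Classic Levenshtein edit distance between two strings.
function levenshtein(a, b) {
  const d = Array.from({ length: a.length + 1 }, (_, i) => [i]);
  for (let j = 1; j <= b.length; j++) d[0][j] = j;
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      d[i][j] = Math.min(
        d[i - 1][j] + 1,
        d[i][j - 1] + 1,
        d[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1)
      );
    }
  }
  return d[a.length][b.length];
}

// Combine normalized-title similarity with a small bonus for a matching release year.
function matchScore(queryTitle, queryYear, candidate) {
  const a = normalizeTitle(queryTitle);
  const b = normalizeTitle(candidate.title);
  const titleSim = 1 - levenshtein(a, b) / Math.max(a.length, b.length, 1);
  const yearBonus = queryYear === candidate.year ? 0.1 : 0;
  return titleSim + yearBonus;
}
```

A caller would rank all candidate movies by `matchScore` and accept the best candidate above some threshold.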

With this new approach, we decided to participate in this year’s Semantic Web Challenge, co-located with the International Semantic Web Conference 2009 (ISWC) in Washington, D.C. Sixteen entries made it to the Semantic Web Challenge, and ours was among them. We presented our movie recommender system and its revised architecture alongside the official poster and demonstration session of the main conference. Our secret weapon to attract people to our stand was Swiss chocolate. And well, it worked ;). The official time slot for the challenge presentation was 19:15-21:15, but people already showed up at our stand at half past six and kept coming by until ten in the evening. One reason may of course be the chocolate ;), but another was the word of mouth spread by people who had seen our demo. Overall, people were really excited about our personal and private movie recommender system, which even provides cross-site movie recommendations.
Despite the great success, we didn’t make it to the finals. However, the challenge was a really nice experience, and we still had the great success of getting people excited about OMORE.

Abstract

Online stores and Web portals bring information about a myriad of items such as books, CDs, restaurants or movies at the user’s fingertips. Although the Web reduces the barrier to information, the user is overwhelmed by the number of available items. Therefore, recommender systems aim to guide the user to relevant items. Current recommender systems store user ratings on the server side, so the scope of the recommendations is limited to this server only. In addition, the user entrusts the operator of the server with valuable information about his preferences.
Thus, we introduce the private, personal movie recommender OMORE, which learns the user model based on the user’s movie ratings. To preserve privacy, OMORE is implemented as Firefox add-on which stores the user ratings and the learned user model locally at the client side. Although OMORE uses the features from the movie pages on the IMDb site, it is not restricted to IMDb only. To enable cross-referencing between various movie sites such as IMDb, Amazon.com, Blockbuster, Netflix, Jinni, or Rotten Tomatoes we introduce the movie cross-reference database LiMo which contributes to the Linked Data cloud.

Presentation

In the following, you can watch the presentation I prepared for the Semantic Web Challenge:

Poster

In the following, you can see a preview of the Semantic Web Challenge 2009 poster titled “OMORE”:
Poster of OMORE, a Firefox plugin for online movie recommendations

Downloads

We include the papers on this page to ensure timely dissemination on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by the copyrights. These works may not be reposted without the explicit permission of the copyright holder.

Personal Cross-Site Movie Recommender System Implemented as Mozilla Firefox Add-On

Online stores and Web portals bring information about a myriad of items such as books, CDs, restaurants or movies to the user’s fingertips. Although the Web lowers the barrier to information, the user is overwhelmed by the number of available items. Therefore, online stores provide recommender systems that aim to guide the user to relevant items. However, recommender systems are generally limited to the Web page’s content and the explicit or implicit ratings provided by users on that particular page. Users are reluctant to repeatedly provide rating information to recommender systems on other Web pages. This is a typical lock-in situation based on high transaction costs, such that people are tied to one or at most a limited number of Web pages.

People are required to have an account on a particular Web page before being provided with interesting recommendations. People may have concerns about providing explicit or implicit ratings on items that may expose delicate details about their private lives. And many smaller online stores do not provide recommendations at all.

Thus, we need a recommender system that (1) recognizes items across various Web pages as the same and remembers their ratings, (2) applies algorithms to provide recommendations, and (3) smoothly integrates the rating and recommendation functionality directly into the Web site. We found the basic infrastructure for such a recommender system in the Firefox Add-on API and WEKA, a common data mining library.
We formulated these requirements as a master thesis and were very happy to engage Tobias, an excellent master student.

The described recommender system, implemented as a Firefox add-on, can be downloaded at the s.e.a.l. group site.

Abstract

Online stores and Web portals bring information about a myriad of items such as books, CDs, restaurants or movies at the user’s fingertips. Although the Web reduces the barrier to information, the user is overwhelmed by the number of available items. Therefore, recommender systems aim to guide the user to relevant items. Current recommender systems store user ratings on the server side, so the scope of the recommendations is limited to this server only. In addition, the user entrusts the operator of the server with valuable information about his preferences.

In this thesis, we introduce our recommender system OMORE, a private, personal movie recommender, which learns the user model based on the user’s movie ratings. To preserve privacy, OMORE is implemented as a Mozilla Firefox add-on, which stores the user’s ratings and the learned user model locally at the client side. Although OMORE makes use of the movie features provided by the movie pages on the IMDb site, it is not restricted to IMDb only: it also supports cross-referencing with the movie pages on Amazon.com, Blockbuster, Netflix, and Rotten Tomatoes.

Tobias Bannwart: “OMORE – Private, Personal Movie Recommendations implemented in a Mozilla Firefox Add-on“, ed. by Amancio Bouza, Gerald Reif and Harald C. Gall, University of Zurich, July 2009. (master thesis)

Downloads

Probabilistic Partial User Model Similarity for Collaborative Filtering

Our current work on a probabilistic approach to computing partial user preference similarities was accepted and published at the 1st International Workshop on Inductive Reasoning and Machine Learning for the Semantic Web (IRMLeS) at the 6th European Semantic Web Conference (ESWC) 2009. The paper is available online at CEUR-WS.org as volume 474. The presentation is available at the IRMLeS Web page.

The general idea is that people may share similar preferences only partially. For instance, one person may like Italian food, just like another person, but not Chinese food, while the other person does. Traditional collaborative filtering computes global preference similarity and fails to detect this relation. Our approach is able to compute partial preference similarities on the basis of hypothesized user preferences, which are learned by applying traditional machine learning algorithms. We could show that our approach performs significantly better than traditional user-based collaborative filtering, especially in cases where people have only few commonly rated items. The strengths of our approach are the use of partial preference similarities and the use of hypothesized user preferences instead of item ratings, which are always biased.
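The intuition can be sketched as follows. The per-category preference models and the distance-based similarity measure are deliberately simplified illustrations, not the probabilistic formulation of the paper:

```javascript
// A hypothesized user model maps item categories to a preference score (1..5 scale).
const SCALE = 4; // maximum possible score difference on a 1..5 scale

// Similarity on a single partition (category) of the preference model.
function partitionSimilarity(a, b) {
  return 1 - Math.abs(a - b) / SCALE;
}

// Partial similarities: one value per category both models cover.
function partialSimilarities(modelA, modelB) {
  const sims = {};
  for (const cat of Object.keys(modelA)) {
    if (cat in modelB) sims[cat] = partitionSimilarity(modelA[cat], modelB[cat]);
  }
  return sims;
}

// Global similarity (roughly what classic user-based CF computes): one averaged value.
function globalSimilarity(modelA, modelB) {
  const sims = Object.values(partialSimilarities(modelA, modelB));
  return sims.reduce((sum, v) => sum + v, 0) / sims.length;
}

// Example: alice and bob agree on Italian food but disagree on Chinese food.
const alice = { italian: 5, chinese: 1 };
const bob = { italian: 5, chinese: 5 };
// Globally they look only moderately similar, but the partial view reveals that
// bob's opinions on Italian restaurants are fully relevant for alice.
```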

It was my first real presentation at a conference and it was a great success. I got very positive feedback, but I also noticed that a too-fancy presentation may irritate some people. Well, I had just installed the new version of Keynote on my Mac, so I had to try out the new fancy features. This workshop was one of the most successful at this year’s ESWC, measured by the number of participants. I also enjoyed the workshop dinner, where I took part in interesting discussions on artificial intelligence, data mining in practice, football, and Shakespeare.

At the conference, I got in touch with some very interesting people, especially at the very well organised poster session and after the conference dinner. Unfortunately, it was the last European Semantic Web Conference, because the organizers decided to make industry a main target as well. Thus, the abbreviation ESWC now stands for Extended Semantic Web Conference.

Abstract

Recommender systems play an important role in supporting people in getting items they like. One type of recommender system is user-based collaborative filtering. The fundamental assumption of user-based collaborative filtering is that people who share similar preferences for common items behave similarly in the future. The similarity of user preferences is computed globally on commonly rated items, such that partial preference similarities might be missed. Consequently, valuable ratings of partially similar users are ignored. Furthermore, two users may even have similar preferences while the set of commonly rated items is too small to infer preference similarity. We propose, first, an approach that computes user preference similarities based on learned user preference models and, second, a method to compute partial user preference similarities based on partial user model similarities. For users with few commonly rated items, we show that user similarity based on preferences significantly outperforms user similarity based on commonly rated items.

Presentation

In the following, you can watch the paper presentation I gave at the IRMLeS workshop:

Presentation at the IRMLeS workshop

Downloads


Evaluation of the Adoption of New Features in a Web-Based Social Network

Together with Marc Vontobel’s main advisor Gerald Reif, I successfully advised and supported Marc in his bachelor thesis. He was the first student I ever advised. He reengineered Purple Leaf, a party portal, and investigated the impact of social relations among people on the acceptance rate of new features on a Web page. By the time he finished his thesis, the Purple Leaf community consisted of several thousand people.
It seems that one is more likely to accept and adopt a new feature when a friend has already adopted it. The inhibition threshold appears to be much lower because people trust their friends and thus trust the feature and its added value. Marc did a great job that exceeded what could be expected from a bachelor student, from the software engineering skills to the data gathering and the complex Social Network Analysis (SNA). But he was already well prepared by our preceding seminar on “Trust and Recommendation in Social Networks”.

We were surprised when he delivered not only his thesis but also a complete picture book with fantastic analyses of different aspects of his party community. He discovered the core group of party people who attend most parties, and the latent semantics among different music styles and drinks.

Abstract

Purple Leaf is a social network which offers its members several possibilities to personalize its exclusive events by providing them unique online services. After the size of our platform suddenly increased from 300 initially invited guests to a multiple of that, we were obliged to completely revise the platform and enlarge our range of services. To embed these new services smoothly into the existing Web presence, we fully restructured the application and moved its basis to a modern Web framework. After that makeover, we designed five further services with which we aimed to increase customer loyalty and the entertainment value of our platform. Because new features are often not instantly accepted by existing users, we developed an integrated concept for boosting the acceptance of novel functionality. This concept is based on the technology acceptance model developed by Davis (1986). The model postulates that the actual use of a new feature is based solely on external factors: on the one hand, factors which influence the ‘perceived ease-of-use’ and, on the other hand, some that have an impact on the ‘perceived usefulness’. In order to foster the perceived ease-of-use, we developed several usability concepts and tried to figure out how Web 2.0 features can help to simplify different processes. Beside the creation of intuitive user interfaces and plain procedures, we worked on an elaborate data and application structure which itself also contributed a big part to the simplicity of the new functionality. After we had embedded the services into our Internet portal, we started to analyze the acceptance of one new feature: ‘The Most Favored Guest’. This service allows every signed-up member to define his personal list of favored guests for an upcoming event. Once the selected users are informed about their election, they, in turn, have the chance to define their own list.
After a first round of selection, we tried to boost the personal acceptance of our members by providing specific incentives. Beside the active interventions into the process of adoption, we also analyzed a passive phenomenon: does some kind of peer pressure exist within virtual cliques? If so, there might emerge interesting changes in common marketing strategies, which could narrow down the target audience to some single users of the network. In addition, we visualized some of the encountered situations and put them together in an illustrated book as a supplement to this paper.

Marc Vontobel: “Purple Leaf – Evaluation of the Adoption of New Features in a Web-Based Social Network”, ed. by Gerald Reif, Amancio Bouza, Harald C. Gall, University of Zurich, December 2008. (bachelor thesis)

Starting as PhD Student in the Area of Recommender Systems and Semantic Web Technologies

I’m starting my PhD in July. I’m associated with two research groups at the Department of Informatics of the University of Zurich: the Software Evolution and Architecture Lab (s.e.a.l.) headed by Prof. Harald C. Gall, and the Dynamic and Distributed Information Systems (DDIS) group headed by Prof. Bernstein. I’m lucky to start working on the hot topic of recommender systems. The overall goal of the joint work with an industry partner is the development of a Web-based location recommender system. The project is partially supported by the CTI (Commission for Technology and Innovation), the national agency for innovation, because of its high potential use in areas like tourism. The CTI supports knowledge transfer between companies and universities to create innovations, not just inventions.

Well, I’m extremely happy and proud to have this chance to pursue a PhD advised by excellent advisors, even if I get paid less than I would in industry. But who cares about money, right? What I was looking for was a great challenge that only a PhD can provide.

Kick-Off Meeting of the Localina project


Graph-Based Knowledge Browser for Content Management Systems

A week ago, on the 16th of May, I finished my master thesis on the implementation of a graph-based knowledge browser for a content management system (CMS). Now I’m spending time relaxing and organising my next steps for the future. The current IT market is eager for graduates, and I notice it every time I take a look at my email account, if you know what I mean ;).

The time writing my master thesis was absolutely great. It was extremely work-intensive, but great! I got a lot of support from my family, my girlfriend, and my friends too; thank you at this point. Now I know what it means to work from 7.30 to 23.00 every day over several months, including weekends. I experienced that working over 100 hours a week isn’t really efficient: you don’t work at 100%, and your personal efficiency decreases every additional day. It’s not advisable to work that much over a long period of time. It does not only affect your work, if you know what I mean.

Anyhow, my thesis topic seems to be very attractive. During my research I got lots of requests from all over the world. A professor from Saint Louis University in America was interested in using my tool for further research on Natural Language Processing (NLP), and other knowledge workers wanted to share experiences. The results of my research and the potential of an interactive knowledge map for knowledge transfer lead me to possible future work. I even got some recommendations to look for venture capital to turn my invention into an innovation. We’ll see; it’s not the only project at my fingertips.

Abstract

The success of knowledge transfer is crucial in the area of knowledge management. Not only companies in outsourcing relations need successful knowledge transfer; organisations need it too in order to create market advantages. This thesis introduces a graph-based knowledge browser for a CMS that supports knowledge transfer by providing “shared material” for generating knowledge and by providing easy access to knowledge through visualising it as associative networks. Knowledge is presented in a graph or radial layout in hyperspace. Web 2.0 technologies like AJAX and SVG are used for the implementation.

Entering the Semi Finals of International User Interface Competition with a Knowledge-Graph Visualization

I’m proceeding to the next round of the Imagine Cup 2007 as the 3rd-ranked team. Only 30 teams of 1-2 persons had the chance to proceed to the next round. These 30 teams were selected by community voting: only registered competitors were allowed to vote for other teams, and no one could vote for his own team. The Imagine Cup consists of 7 categories, and over 100'000 students from all around the world competed.

Imagine the Wiki concept combined with Web 2.0, rendered in 2D. The knowledge of a wiki or any other CMS is visualised as a topic map with nodes (e.g., articles, persons, knowledge entities, activities) and relations between them. You don’t see one article at a time, you see the whole context of an article! You can add new articles directly in the topic map or knowledge browser and directly draw relations between nodes. The topic map is rendered in hyperspace to focus on the nodes in the center of the screen, but you can use your mouse to move the hyperspace and the whole topic map (i.e., the graph). The layout is calculated in real time with either a spring model or a radial layout. In the spring model, repulsive and attractive forces between nodes are calculated to obtain a layout with a minimum of edge crossings etc. (graph layout heuristics). It looks really nice ;). It runs in a Web browser with Web 2.0 technologies (Ajax).

My mentor Benjamin promoted my currently successful participation in the Imagine Cup 2007 at the Department of Informatics of the University of Zurich. He published a news article about my current ranking (3rd) and my proceeding to the semi-finals:

Leaderboard of the Imagine Cup 2007. My team is called IfIface


IfI Student Enters Semi Finals in International User Interface Competition

IfI diploma student Amancio Bouza ranked 3rd in the user interface discipline to enter the semi-finals of Microsoft’s international computer science talent competition Imagine Cup. His successful contribution presents a novel AJAX powered user interface. The solution improves accessing and modifying graph based knowledge structures in an enterprise content management system. The user interface unites editing and browsing functions, and therefore will empower regular knowledge workers to view and change how knowledge is represented within their organization more easily.

The diploma thesis is currently under development with the Information Management Research Group at the IfI. Since Mr. Bouza seems to be the only Swiss participant in the competition, he hopefully will advance to the final round held this summer in Korea.

Published: 04.04.07

Downloads

Knowledge Graph for the Exploration of the Wikipedia

Based on the graph browser JSaurus, I implemented Wikigraph, a simple graph-based visualization of the content of Wikipedia.org. Every node in the graph represents a topic. Topics are connected to each other if and only if one topic refers to the other. The references are taken from the keywords meta tag of the topic’s website.

But what is the advantage of the graph browser Wikigraph? First of all, it is possible to create a knowledge map of Wikipedia’s content. The knowledge map shows which topics are related to which other topics, giving you a broad overview of related topics. In other words, you see the context of a selected topic.
As an example, you can search for Informatics. As a result you get Informatics and some linked topics (i.e., Mathematics, Information, Information System). You also get the topics related to those related topics. With all these relations you see the context of informatics.
The context can support you in understanding a specific topic better than reading its content twice.

The main advantage is that you no longer have to find the right keyword to find a specific topic. You search by context and not by keyword: you only have to search for a topic in the same context, get a map of topics of that context, and can then select the right one or browse further. So Wikigraph provides not only searching by keyword but also searching by browsing.
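The core of Wikigraph can be sketched in a few lines: build an undirected topic graph from each page's keywords, then collect a topic's context as its neighbourhood up to depth two. The topic names and data below are made up for illustration.

```javascript
// Build an undirected graph: topics are nodes, and two topics are linked
// when one lists the other among its keywords.
function buildGraph(pages) {
  const edges = new Map();
  for (const page of pages) edges.set(page.topic, new Set());
  for (const page of pages) {
    for (const kw of page.keywords) {
      if (edges.has(kw)) {
        edges.get(page.topic).add(kw);
        edges.get(kw).add(page.topic);
      }
    }
  }
  return edges;
}

// The "context" of a topic: its neighbours and their neighbours (breadth-first).
function context(graph, topic, depth = 2) {
  const seen = new Set([topic]);
  let frontier = [topic];
  for (let i = 0; i < depth; i++) {
    const next = [];
    for (const t of frontier) {
      for (const n of graph.get(t) || []) {
        if (!seen.has(n)) { seen.add(n); next.push(n); }
      }
    }
    frontier = next;
  }
  seen.delete(topic);
  return [...seen];
}

// Hypothetical example data, mirroring the Informatics example above.
const pages = [
  { topic: "Informatics", keywords: ["Mathematics", "Information"] },
  { topic: "Mathematics", keywords: ["Logic"] },
  { topic: "Information", keywords: [] },
  { topic: "Logic", keywords: [] },
];
```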

Evaluation of Graph Layout Algorithms

A readable graph respects aesthetic criteria of syntactic validity, perceptual organisation and aesthetic optimality, as proposed by Kosak et al. in 1994. Some algorithms focus on minimising edge crossings whilst others focus on other aesthetic criteria.
Spring-embedder models and their variants fit these aesthetic criteria. But which of them is the best?
First of all, you have to define what “best” means. It depends on the scenario in which the graph is used. Do you need a symmetric graph with lots of edge crossings, or do you need a graph to simulate molecular interactions? You have to give the answer! In this evaluation, “best” means fitting the following criteria:

  1. Performance: Short calculation time of node placement
  2. Scalability
  3. Aesthetic criteria: small number of edge crossings, optimal organisation of vertices

I’m evaluating the above criteria for an implementation in JavaScript. JavaScript is highly sensitive to calculation complexity: many calculations and look-ups slow it down very quickly, so short calculation time is very important.

As already mentioned, spring-embedder models fit the aesthetic criteria very well, satisfying criterion 3. Let’s evaluate the algorithms with respect to criteria 1 and 2.

The Spring Model

The spring model was originally proposed by Eades (1984). The concept is simple: for all connected vertices an attractive force fa(d) is calculated, and a repulsive force fr(d) is calculated among all nodes that are not connected.

  • fa(d) = ka * log(d)
  • fr(d) = kr / d²

d is the current distance between two nodes, and ka and kr are constants.
Let n be the number of nodes and r the number of relations.
In a JavaScript implementation, a division is as expensive as a multiplication, but calculating a logarithm is about 4 times more expensive than a multiplication or a division. The calculation time is:

(n * (1+1))² + r * (1 + 4) = 4*n² + 5*r

The complexity is O(n²) because the repulsive force is calculated among all nodes.

The spring model is symmetric. It doesn’t try to reduce edge crossings, so reading those graphs can be problematic.
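A minimal sketch of one relaxation step of the spring model, directly implementing the two formulas above. The constants ka, kr and the step size are illustrative choices, not tuned values:

```javascript
// Illustrative constants: attraction strength, repulsion strength, integration step.
const ka = 2, kr = 1000, step = 0.01;

// One relaxation step of Eades' spring model.
// nodes: [{x, y}, ...]; edges: [[i, j], ...] index pairs.
function springStep(nodes, edges) {
  const adjacent = (i, j) =>
    edges.some(([a, b]) => (a === i && b === j) || (a === j && b === i));
  const force = nodes.map(() => ({ x: 0, y: 0 }));

  for (let i = 0; i < nodes.length; i++) {
    for (let j = 0; j < nodes.length; j++) {
      if (i === j) continue;
      const dx = nodes[j].x - nodes[i].x;
      const dy = nodes[j].y - nodes[i].y;
      const d = Math.max(Math.hypot(dx, dy), 1e-6);
      // fa(d) = ka * log(d) pulls connected nodes together (positive magnitude);
      // fr(d) = kr / d^2 pushes non-connected nodes apart (negative magnitude).
      const f = adjacent(i, j) ? ka * Math.log(d) : -kr / (d * d);
      force[i].x += (dx / d) * f;
      force[i].y += (dy / d) * f;
    }
  }
  for (let i = 0; i < nodes.length; i++) {
    nodes[i].x += step * force[i].x;
    nodes[i].y += step * force[i].y;
  }
  return nodes;
}
```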

Force-directed Placement

Force-directed placement was proposed by Fruchterman and Reingold (1991). This algorithm fits the criterion of minimised edge crossings, which the spring model does not. The algorithm consists of attractive and repulsive forces among nodes. As in the spring model, attractive forces fa are calculated between connected nodes and repulsive forces fr among all nodes.

  • fa(d) = d² / k
  • fr(d) = -k² / d

d is the distance between two nodes and k is the optimal distance between two nodes. k is calculated from the number of nodes and the drawing area.

Let n be the number of nodes and r the number of relations.
As before, a division is as expensive as a multiplication in JavaScript; no logarithm is needed here. The calculation time is:

(n * (1+1))² + r * (1 + 1) = 4*n² + 2*r

The complexity is O(n²) because the repulsive force is calculated among all nodes.
Force-directed placement is better than the spring model because it better fits criterion 3 on minimising edge crossings and has lower complexity in calculating the attractive forces. Force-directed placement also beats the spring model on performance (criterion 1).

Local Minimum and Simulated Annealing

Both try to arrange nodes and relations so as to minimise the energy of the forces between the nodes. The visual results are the best, but they are extremely expensive in terms of calculation complexity, which is at least O(n²).
I implemented the local-minimum algorithm by minimising the energy state, where the energy is calculated with the functions of the force-directed placement. My experience with the local minimum is that the nodes need a lot more time to organise themselves.
Simulated annealing is a very interesting concept. The difference between it and other spring-model-based algorithms is that the temperature (i.e., the nodes’ ability to move) is cooled down at every step. You get a stable system very quickly, depending on the rate of temperature decrease. Force-directed placement can be seen as a special version of simulated annealing without a temperature decrease. Depending on the implementation, it is possible to get a runtime of O(n): imagine a stable system of nodes to which you add one node with high kinetic energy; you only have to calculate the energies between the new node and all the other nodes. But doing this, you’ll get a badly organised system of nodes and relations.
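The cooling idea can be sketched independently of the concrete force model: every node may move at most `temperature` units per iteration, and the temperature decays each step until the layout freezes. The decay factor and the stand-in displacement function are illustrative assumptions:

```javascript
// Generic cooling loop for simulated-annealing-style layout.
// computeDisplacement(nodes) returns one desired displacement vector per node;
// every move is capped at the current temperature, which decays each step.
function coolDown(nodes, computeDisplacement, { t0 = 10, decay = 0.9, steps = 50 } = {}) {
  let temperature = t0;
  for (let s = 0; s < steps; s++) {
    const disp = computeDisplacement(nodes);
    for (let i = 0; i < nodes.length; i++) {
      const len = Math.max(Math.hypot(disp[i].x, disp[i].y), 1e-9);
      const allowed = Math.min(len, temperature); // cap the move at the temperature
      nodes[i].x += (disp[i].x / len) * allowed;
      nodes[i].y += (disp[i].y / len) * allowed;
    }
    temperature *= decay; // cool down: the system settles quickly
  }
  return nodes;
}

// Stand-in displacement for demonstration: push every node away from the origin
// (in a real layout this would be the spring or force-directed displacement).
const awayFromOrigin = (nodes) => nodes.map((n) => ({ x: n.x, y: n.y }));
```

Because the total movement is bounded by the geometric series of temperatures, the layout is guaranteed to settle, regardless of the underlying force model.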

Conclusion

I think the best graph drawing algorithm is a combination of force-directed placement with local minimum or simulated annealing. With force-directed placement you get a well-organised system in a short time. To reduce CPU time, a change of algorithm is needed, because it doesn’t make sense to move nodes only a little while calculating so much. I think local minimum or simulated annealing is the better choice for the final phase, because they filter nodes out of the calculation. But no spring-embedder model is scalable, because of the calculation time complexity of O(n²). We all have to live with it.