Archive for the ‘Tool’ category

Applicability of Social Network Graph Patterns to Recommender Systems

January 24th, 2010
spotting user interface

User interface of the spotting application

Some research has already investigated Web-based social networks and their applicability to tasks such as trust inference in trust networks or collaborative filtering in recommender systems. My master student Reto Hodel applied social network analysis to a social network and used the resulting metrics to predict item ratings.
For this purpose, he built a very nice-looking Web-based location recommender system called spotting. In this application, users are presented with locations on a geographical map based on Google Maps. The visual encoding of the locations is very intuitive: the size of a location’s marker reflects how well the location matches the person’s preferences. The bigger a location is drawn, the more relevant it is.
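The size encoding can be sketched in a few lines. This is a minimal illustration that assumes each location already has a predicted match score between 0 and 1; the function name `marker_radius` and the pixel limits are hypothetical and not taken from spotting.

```python
import math


def marker_radius(match: float, r_min: float = 6.0, r_max: float = 24.0) -> float:
    """Map a preference match in [0, 1] to a marker radius in pixels (hypothetical).

    The radius grows with the square root of the match, so the marker's area,
    which is what the eye compares, grows linearly with the match.
    """
    match = min(max(match, 0.0), 1.0)  # clamp out-of-range scores
    return math.sqrt(r_min ** 2 + match * (r_max ** 2 - r_min ** 2))


# Hypothetical locations with predicted match scores
for name, score in [("Cafe A", 0.2), ("Bar B", 0.9)]:
    print(name, round(marker_radius(score), 1))
```

Scaling the radius linearly instead would make the area grow quadratically, so a slightly better match would look disproportionately more relevant.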
To bootstrap the social network, we decided to build upon an existing one. This way, we could rely on a social network in which the relations among people had already developed. We chose Facebook’s Web-based social network because it provides a useful API, the Facebook API. Instead of developing a Facebook app, we developed a stand-alone application that uses Facebook Connect. Facebook Connect lets a person log in to a different Web page with his Facebook account. With this function, the application can still access the person’s public information on Facebook, such as personal information and friends.
The value, or usefulness, of a collaborative recommender system increases with the number of people and ratings. Hence, the theory of network externalities, also known as network effects, applies to such systems. Attracting people to use the application and rate locations in the first place is therefore crucial.
Besides the social network and the APIs provided by Facebook and others, Facebook also lets applications publish news on a person’s wall, where his friends see it. We used this function to publish location ratings and attract the person’s friends to spotting as well. Our experience showed that, even when relying on an existing and well-developed social network, attracting people takes a lot of marketing. People do not simply use an application because their friends use it. No general approach to overcoming this problem exists. Early approaches try to mitigate the so-called cold-start problem, which describes the situation where only few ratings exist. But this is only part of the solution.

Social network

Social Network of spotting

Nevertheless, we attracted 139 people within three weeks. Their ratings formed the basis of our analysis. We analyzed the metrics trendsetters, cliques, friend chains and some others. Unfortunately, we could not show that any social network metric leads to more accurate rating predictions. On the contrary, the location’s average rating performed better than, or at least as well as, the social network graph patterns we investigated. In addition, the average rating is very cheap to compute! The Friedman test gave us evidence of significant performance differences at the 5% significance level. We therefore ran pairwise Wilcoxon rank tests. At the given significance level, we could not identify a single approach that is better than all the others. Of course, we applied the Bonferroni correction to control the family-wise error rate. The predictions based on trendsetters performed worst.
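The Bonferroni step can be illustrated as follows. This is a minimal sketch with hypothetical approach names and made-up p-values (they are not the thesis results); it only shows how the per-comparison threshold shrinks as the number of pairwise Wilcoxon tests grows. Computing the Friedman and Wilcoxon statistics themselves is left to a statistics library.

```python
from itertools import combinations


def bonferroni_pairwise(p_values: dict[tuple[str, str], float],
                        alpha: float = 0.05) -> dict[tuple[str, str], bool]:
    """Flag which pairwise comparisons stay significant after Bonferroni correction.

    With m comparisons, each p-value is tested against alpha / m, which keeps
    the family-wise error rate at or below alpha.
    """
    if not p_values:
        return {}
    threshold = alpha / len(p_values)
    return {pair: p < threshold for pair, p in p_values.items()}


# Hypothetical set of prediction approaches: k = 4 gives k(k-1)/2 = 6 pairs,
# so each pairwise p-value must fall below 0.05 / 6, roughly 0.0083.
approaches = ["average rating", "cliques", "trendsetters", "friend chains"]
pairs = list(combinations(approaches, 2))
example_p = {pair: 0.01 for pair in pairs}  # made-up p-values, not thesis results
print(bonferroni_pairwise(example_p))
```

A p-value of 0.01 would count as significant in a single test at the 5% level, but not after correcting for six comparisons.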
But these results have to be taken with caution. Despite the statistical tests, the data set itself has some major threats to validity. The social network consists mainly of the master student’s personal social network, so the average rating consisted mainly of his friends’ ratings. This fact strongly influenced the performance of all other social network patterns.
However, I think the experiment shows that the simple concept of the wisdom of crowds, i.e., the average rating, is a simple but effective approach to predicting ratings. Further, the average rating has the highest item coverage, since patterns such as cliques and trendsetters provide ratings for far fewer items.
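The following sketch contrasts the average rating with a simple friends-only predictor and measures item coverage, i.e., the share of requested ratings a predictor can produce at all. The toy users, locations and friendships are hypothetical, and the friends-only predictor is a simplified stand-in for the direct-friends approach, not the implementation evaluated in the thesis.

```python
from statistics import mean
from typing import Callable, Optional

# Hypothetical toy data: user -> {location: rating on a 1-5 scale}
ratings = {
    "alice": {"cafe": 4.0, "bar": 2.0},
    "bob": {"cafe": 5.0},
    "carol": {"park": 3.0},
    "dave": {"bar": 4.0, "park": 5.0},
}
friends = {"eve": ["alice", "bob"]}  # eve's direct friends


def average_rating(item: str) -> Optional[float]:
    """Wisdom-of-crowds prediction: mean of all ratings for the item."""
    values = [r[item] for r in ratings.values() if item in r]
    return mean(values) if values else None


def friends_average(user: str, item: str) -> Optional[float]:
    """Social prediction: mean rating of the user's direct friends only."""
    values = [ratings[f][item] for f in friends.get(user, [])
              if item in ratings.get(f, {})]
    return mean(values) if values else None


def coverage(predict: Callable[[str, str], Optional[float]], pairs) -> float:
    """Share of (user, item) pairs for which the predictor returns a rating."""
    return sum(predict(u, i) is not None for u, i in pairs) / len(pairs)


targets = [("eve", "cafe"), ("eve", "bar"), ("eve", "park")]
print(coverage(lambda u, i: average_rating(i), targets))  # every target item has some rating
print(coverage(friends_average, targets))                 # no friend of eve rated the park
```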

Error distribution of cliques


To conclude, social network patterns may provide valuable information for generating more accurate recommendations, but not in general. The high cost of computing social network metrics should also be taken into account, because computing some of them is NP-hard.
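To make the cost argument concrete, here is the basic Bron-Kerbosch algorithm for enumerating maximal cliques in a friendship graph. It is an illustrative sketch, not the clique detection used in spotting, and the example graph is hypothetical. Finding a maximum clique is NP-hard, and the number of maximal cliques can grow exponentially with the number of people.

```python
def maximal_cliques(graph: dict[str, set[str]]) -> list[set[str]]:
    """Enumerate all maximal cliques with the basic Bron-Kerbosch algorithm.

    A graph with n nodes can have up to 3^(n/3) maximal cliques, so the
    worst-case running time is exponential in the number of nodes.
    """
    cliques: list[set[str]] = []

    def expand(r: set[str], p: set[str], x: set[str]) -> None:
        # r: current clique, p: candidates to add, x: nodes already explored
        if not p and not x:
            cliques.append(r)           # r cannot be extended: it is maximal
            return
        for v in list(p):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}                 # v has been fully explored
            x = x | {v}                 # exclude it from later branches

    expand(set(), set(graph), set())
    return cliques


# Hypothetical, undirected friendship graph
graph = {
    "ann": {"ben", "cid"},
    "ben": {"ann", "cid"},
    "cid": {"ann", "ben", "dan"},
    "dan": {"cid"},
}
print(maximal_cliques(graph))
```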
However, other researchers have reported more positive results. But their evaluations should be read carefully. Sometimes the experimental setting is tweaked, with weak hypotheses and units of analysis chosen to favour social networks.
So be careful when people sell their work on social networks, and be aware that the structure of the social network itself strongly influences the experimental results. For instance, not all Facebook friends are real friends in real life.

Abstract

Generating accurate recommendations for items, such as locations, movies or books, is challenging. Common Web-based recommender systems require information about users’ past behaviour to generate suitable recommendations for them.
In this thesis, we first present spotting.li, a location recommender system based on Facebook, which allows users to rate locations and generates recommendations inferred from their friends’ ratings. In doing so, we examine the requirements for successfully implementing such a system using the latest Web technologies (i.e., Grails) and describe key elements of our approach. Our focus is on performance and on providing an easy-to-use interface incorporating Google Maps.
Furthermore, we analyse different recommendation approaches that leverage structural information from a social network to predict ratings. In particular, we examine the use of social network patterns, such as cliques and trendsetters, as well as direct friends and two levels of indirect friends. We finally conduct an extensive evaluation of these approaches, based on real data collected during the thesis.
To validate our findings, we test our dataset of 139 users for statistical significance. We demonstrate that even a simple algorithm, such as the average rating, yields results similar to those of more elaborate algorithms.

Reto Hodel: “spotting – Realisation and Analysis of a Location Recommender System Based on Facebook”, ed. by Amancio Bouza and Harald C. Gall, University of Zurich, December 2009. (master thesis)

Personal, Private Movie Recommender System at the Semantic Web Challenge

December 19th, 2009
Preparing the stand for the OMORE presentation


Together with Gerald Reif, I advised Tobias Bannwart’s master thesis on a personal cross-site movie recommender system implemented as a Firefox add-on. The add-on is called OMORE and can be downloaded. We decided to bring OMORE to market maturity. To this end, we had to rethink its architecture and usability, and, besides its advantages, we identified the following open challenges:

  1. Movie cross-references: It is not known which movie page of one provider corresponds to which movie page of another provider. Providers may be commercial pages like Amazon.com, review pages like RottenTomatoes.com or knowledge bases such as IMDb.com.
  2. Retrieval of movie cross-references: No flexible search service exists to retrieve movie cross-references based on movie title and release year information.
  3. Maintenance of movie cross-references: The vast number of potential movie cross-references is difficult to gather with a Web crawler. In addition, the set of movie pages grows quickly.

We came up with the following solutions:

  1. Movie cross-references: A knowledge base is needed to persist movie cross-references. Concretely, for each movie it has to store (1) which movie pages represent the movie as content and (2) which movie pages represent a commercial product of the movie, such as a DVD, Blu-ray, VHS or even Video-On-Demand (VoD) offer. This semantic distinction is necessary because a VHS and a Blu-ray edition are not the same product, even though they show the same movie. Therefore, we decided to apply Semantic Web technologies to persist movie cross-references. We used D2R, which maps data from a relational database management system to RDF, the basic format for representing resources semantically. Our knowledge base of movie cross-references is called LiMo.
  2. Retrieval of movie cross-references: A search service is needed that supports fuzzy search over the movie cross-reference knowledge base. We opted for fuzzy search because movies are presented quite heterogeneously across Web pages. Movie titles may be misspelled, transformed or extended in various ways. On online shops in particular, we found movie titles extended with information about special or collector’s editions and about the medium on which the movie is provided. Instead of trying to extract the original title from such noisy titles, we decided to apply fuzzy search over movie titles and release years to retrieve movies. Our movie retrieval service can be accessed at MOLookup.
  3. Maintenance of movie cross-references: A Web crawler approach is not feasible because of its latency and resource requirements. Instead, we devised a collaborative approach. Whenever a user browses a movie Web page that is not yet cross-referenced in LiMo, OMORE automatically queries the movie retrieval service MOLookup and submits the URL of the new movie page. This URL is then cross-referenced with the retrieved movie. With this approach, we gather all relevant movie cross-references automatically with the users’ help. In this way, ordinary users contribute to the Semantic Web without even knowing it.
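The interplay of fuzzy lookup and collaborative cross-referencing can be sketched with an in-memory stand-in for LiMo. This is a minimal sketch under stated assumptions: the `Movie` record, the title normalization, the one-year tolerance and the 0.8 similarity threshold are all hypothetical, Python’s `difflib.SequenceMatcher` stands in for whatever fuzzy matching MOLookup actually uses, and no RDF or D2R is involved.

```python
import re
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Movie:
    title: str
    year: int
    content_pages: set[str] = field(default_factory=set)  # pages about the movie itself
    product_pages: set[str] = field(default_factory=set)  # DVD, Blu-ray, VHS, VoD offers


def normalize(title: str) -> str:
    """Lower-case, drop bracketed edition/medium info, strip punctuation."""
    title = re.sub(r"[\(\[].*?[\)\]]", " ", title.lower())
    title = re.sub(r"[^\w ]+", " ", title)
    return " ".join(title.split())


def lookup(movies: list[Movie], title: str, year: int,
           min_score: float = 0.8) -> Optional[Movie]:
    """Fuzzy title match among movies released within one year of `year`."""
    query = normalize(title)
    best, best_score = None, min_score
    for movie in movies:
        if abs(movie.year - year) > 1:
            continue
        score = SequenceMatcher(None, query, normalize(movie.title)).ratio()
        if score >= best_score:
            best, best_score = movie, score
    return best


def cross_reference(movies: list[Movie], url: str, title: str, year: int,
                    is_product: bool) -> Optional[Movie]:
    """Collaborative step: attach a newly browsed page to the matching movie."""
    movie = lookup(movies, title, year)
    if movie is not None:
        (movie.product_pages if is_product else movie.content_pages).add(url)
    return movie


movies = [Movie("The Matrix", 1999), Movie("The Matrix Reloaded", 2003)]
cross_reference(movies, "https://shop.example/matrix-bluray",
                "The Matrix (Special Edition) [Blu-ray]", 1999, is_product=True)
```

Keeping content pages and product pages in separate sets mirrors the semantic distinction from point 1: a Blu-ray offer is attached to the movie without being treated as the movie itself.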

With this new approach, we decided to participate in this year’s Semantic Web Challenge, co-located with the International Semantic Web Conference 2009 (ISWC) in Washington D.C. We were one of 16 participants that made it to the Semantic Web Challenge. We presented our movie recommender system and its revised architecture alongside the official Poster and Demonstrations session of the main conference. Our secret weapon for attracting people to our stand was Swiss chocolate. And, well, it worked out ;) . The official time for the challenge presentation was 19:15-21:15, but people already showed up at our stand at half past six and kept coming by until 10 in the evening. One reason may of course be the chocolate ;) , but another was the word of mouth spread by people who had seen our challenge entry. Overall, people were really excited about our personal and private movie recommender system, which even provides cross-site movie recommendations.
Despite this success, we didn’t make it to the finals. However, the challenge was a really nice experience, and we still had the great satisfaction of seeing people excited about OMORE.

Abstract

Online stores and Web portals bring information about a myriad of items, such as books, CDs, restaurants or movies, to the user’s fingertips. Although the Web lowers the barrier to information, the user is overwhelmed by the number of available items. Therefore, recommender systems aim to guide the user to relevant items. Current recommender systems store user ratings on the server side. As a result, the scope of the recommendations is limited to that server. In addition, the user entrusts the server’s operator with valuable information about his preferences.
Thus, we introduce the private, personal movie recommender OMORE, which learns the user model from the user’s movie ratings. To preserve privacy, OMORE is implemented as a Firefox add-on that stores the user ratings and the learned user model locally on the client side. Although OMORE uses the features from the movie pages on the IMDb site, it is not restricted to IMDb. To enable cross-referencing between various movie sites such as IMDb, Amazon.com, Blockbuster, Netflix, Jinni, or Rotten Tomatoes, we introduce the movie cross-reference database LiMo, which contributes to the Linked Data cloud.

Presentation

Below, you can watch the presentation I prepared for the Semantic Web Challenge:

Poster

In the following, you can see a preview of the Semantic Web Challenge 2009 poster titled “OMORE”:

Poster presentation of OMORE at the Semantic Web Challenge 2009


Downloads

We include the papers on this page to ensure timely dissemination on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by the copyrights. These works may not be reposted without the explicit permission of the copyright holder.

OMORE – Personal Cross-Site Movie Recommender System Implemented as Mozilla Firefox Add-On

August 15th, 2009

OMORE recognizes a movie Web page at Amazon.com and automatically adds rating and recommendation functionality to it



Online stores and Web pages bring information about a myriad of items, such as books, CDs, restaurants or movies, to the user’s fingertips. Although the Web lowers the barrier to information, the user is overwhelmed by the number of available items. Therefore, online stores provide recommender systems that aim to guide the user to relevant items. However, recommender systems are generally limited to the Web page’s content and to the explicit or implicit ratings that users provide on that particular page. Users are reluctant to provide rating information all over again to recommender systems on other Web pages. This is a typical lock-in situation caused by high transaction costs, which ties people to one or a few Web pages.
People are required to have an account on a particular Web page before receiving interesting recommendations. They may also be concerned that explicit or implicit ratings on items expose sensitive details of their private lives.
Moreover, many smaller online stores do not provide recommendations at all.

Thus, we need a recommender system that (1) recognizes items across various Web pages as the same and remembers their ratings, (2) applies algorithms to provide recommendations and (3) smoothly integrates the rating and recommendation functionality into the Web site. We found the basic infrastructure for such a recommender system in the Firefox add-on API and WEKA, a widely used data mining library.
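Requirement (2) can be illustrated with a deliberately simple content-based user model that lives entirely on the client. OMORE used WEKA for learning; this sketch swaps in a plain per-feature average instead, and the class name `LocalUserModel` and the feature strings are hypothetical. Persisting the model to disk is omitted.

```python
from collections import defaultdict
from statistics import mean
from typing import Iterable, Optional


class LocalUserModel:
    """Hypothetical content-based user model kept only on the user's machine."""

    def __init__(self) -> None:
        self.ratings: dict[str, float] = {}       # movie id -> user's rating
        self.features: dict[str, set[str]] = {}   # movie id -> features (genre, director, ...)

    def rate(self, movie_id: str, features: Iterable[str], rating: float) -> None:
        self.ratings[movie_id] = rating
        self.features[movie_id] = set(features)

    def predict(self, features: Iterable[str]) -> Optional[float]:
        """Average the user's mean rating per known feature; fall back to the overall mean."""
        if not self.ratings:
            return None
        by_feature: dict[str, list[float]] = defaultdict(list)
        for movie_id, rating in self.ratings.items():
            for feature in self.features[movie_id]:
                by_feature[feature].append(rating)
        scores = [mean(by_feature[f]) for f in features if f in by_feature]
        return mean(scores) if scores else mean(list(self.ratings.values()))


model = LocalUserModel()
model.rate("m1", {"genre:sci-fi", "director:wachowski"}, 5.0)
model.rate("m2", {"genre:romance"}, 2.0)
print(model.predict({"genre:sci-fi", "genre:romance"}))
```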
We formulated these requirements as a master thesis topic. We were very happy to engage Tobias, an excellent master student.

The described recommender system, implemented as a Firefox add-on, can be downloaded from the s.e.a.l. group site.

Abstract

Online stores and Web portals bring information about a myriad of items, such as books, CDs, restaurants or movies, to the user’s fingertips. Although the Web lowers the barrier to information, the user is overwhelmed by the number of available items. Therefore, recommender systems aim to guide the user to relevant items. Current recommender systems store user ratings on the server side. As a result, the scope of the recommendations is limited to that server. In addition, the user entrusts the server’s operator with valuable information about his preferences.

In this thesis, we introduce our recommender system OMORE, a private, personal movie recommender that learns the user model from the user’s movie ratings. To preserve privacy, OMORE is implemented as a Mozilla Firefox add-on, which stores the user’s ratings and the learned user model locally on the client side. OMORE makes use of the movie features provided by the movie pages of Amazon.com, Blockbuster, Netflix and Rotten Tomatoes.

Summary

Online stores and Web portals generally offer customers a huge selection of movies or books. Often, however, customers are overwhelmed by the vast number of available items and need support to actually find the products that interest them. Recommender systems have proven themselves and are very successful at filtering large data sets. Yet only a few portals, such as the online retailer Amazon, actively support their customers with such a recommender system.

As a rule, the product recommendations suggested by a recommender system are based on the ratings of other customers. In today’s recommender systems, these ratings are often managed individually by the operator of each Web portal, so they cannot be used on other portals such as the online DVD rental service Netflix. In addition, customers often entrust the operator of such a portal with highly confidential information about their purchasing behaviour and preferences.

This thesis therefore develops a portal-independent recommender system that is integrated directly into the Web browser. The recommender system, which we named OMORE, is a security-oriented, personalized recommender system for movie enthusiasts that is offered as an extension for the Mozilla Firefox browser. It learns the user’s preferences from the user’s movie ratings and stores the learned model of these preferences locally on the user’s system. This ensures that the user’s preferences are protected from unauthorized access. OMORE offers users cross-portal recommendations; the current implementation covers the movie pages of Amazon.com, Blockbuster, the Internet Movie Database, Netflix and Rotten Tomatoes.

Tobias Bannwart: “OMORE – Private, Personal Movie Recommendations implemented in a Mozilla Firefox Add-on”, ed. by Amancio Bouza, Gerald Reif and Harald C. Gall, University of Zurich, July 2009. (master thesis)

Downloads:

Wikigraph – A Graph-Browser for Wikipedia

December 3rd, 2006
Wikigraph


Based on the graph browser JSaurus, I implemented Wikigraph, a simple graph-based visualization of the content of Wikipedia.org. Every node in the graph represents a topic. Two topics are connected if and only if one topic refers to the other. The references are taken from the keywords meta tag of the topic’s Web page.
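The edge rule can be sketched with Python’s standard `html.parser`. This is a minimal sketch that assumes each topic page is already downloaded as HTML and that the keywords in the meta tag match topic names exactly; `KeywordsParser`, `extract_keywords` and `build_graph` are hypothetical helpers, not part of Wikigraph or JSaurus.

```python
from html.parser import HTMLParser


class KeywordsParser(HTMLParser):
    """Collects the comma-separated entries of <meta name="keywords" content="...">."""

    def __init__(self) -> None:
        super().__init__()
        self.keywords: list[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "keywords":
            content = attributes.get("content") or ""
            self.keywords = [k.strip() for k in content.split(",") if k.strip()]


def extract_keywords(html: str) -> list[str]:
    parser = KeywordsParser()
    parser.feed(html)
    parser.close()
    return parser.keywords


def build_graph(pages: dict[str, str]) -> dict[str, set[str]]:
    """pages maps topic -> HTML; two topics are linked iff one refers to the other."""
    graph: dict[str, set[str]] = {topic: set() for topic in pages}
    for topic, html in pages.items():
        for ref in extract_keywords(html):
            if ref in graph and ref != topic:   # ignore unknown topics and self-links
                graph[topic].add(ref)
                graph[ref].add(topic)           # undirected: either direction counts
    return graph


# Hypothetical, heavily simplified topic pages
pages = {
    "Informatics": '<html><head><meta name="keywords" content="Mathematics, Information"></head></html>',
    "Mathematics": '<meta name="keywords" content="Informatics">',
    "Information": '<meta name="keywords" content="Information System" />',
    "Information System": "<p>no keywords here</p>",
}
graph = build_graph(pages)
```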

But what is the advantage of the graph browser Wikigraph? First of all, it makes it possible to create a knowledge map of Wikipedia’s content. The knowledge map shows which topics are related to which other topics, giving you a broad overview of related topics. In other words, you see the context of a selected topic.
As an example, you can search for Informatics. As a result, you get Informatics and some linked topics (e.g., Mathematics, Information, Information System). You also get the topics related to these related topics. Together, these relations show the context of Informatics.
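Collecting the related topics and the topics related to them amounts to a breadth-first search limited to two hops. The sketch below assumes an undirected adjacency map in which topics are linked as described above; the function name `context` and the topics “Algebra” and “Group Theory” are hypothetical additions to the example.

```python
from collections import deque


def context(graph: dict[str, set[str]], start: str, depth: int = 2) -> dict[str, int]:
    """Return every topic within `depth` hops of `start`, mapped to its hop distance."""
    if start not in graph:
        return {}
    dist = {start: 0}
    queue = deque([start])
    while queue:
        topic = queue.popleft()
        if dist[topic] == depth:
            continue                    # do not expand beyond the requested depth
        for neighbour in graph[topic]:
            if neighbour not in dist:
                dist[neighbour] = dist[topic] + 1
                queue.append(neighbour)
    return dist


# Hypothetical, undirected topic graph
graph = {
    "Informatics": {"Mathematics", "Information", "Information System"},
    "Mathematics": {"Informatics", "Algebra"},
    "Algebra": {"Mathematics", "Group Theory"},
    "Group Theory": {"Algebra"},
    "Information": {"Informatics"},
    "Information System": {"Informatics"},
}
print(context(graph, "Informatics"))
```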
The context can help you understand a specific topic rather than having to read its content twice.

The main advantage is that you no longer have to find the right keyword to find a specific topic. You search by context, not by keyword: you only have to search for a topic from the same context. You get a map of topics from that context and can select the right one or browse further. So Wikigraph supports not only searching by keyword but also searching by browsing.

Introducing Jsaurus

November 23rd, 2006

Jsaurus is a visualization tool that displays a thesaurus with its nodes and the relations between them. Jsaurus is written in JavaScript and DHTML. The goal of Jsaurus is to provide a piece of software that handles any type of thesaurus and also manages the visualization and behavior of its nodes and relations.
Jsaurus is built with the MVC design pattern. This pattern separates the model (data), the visualization and the control of the model from each other and defines interfaces for communication between the layers. The advantages are greater transparency and the fact that each layer can easily be replaced by a new version or by a completely different implementation. In Jsaurus, the model is the thesaurus, the controller and event handler form the control layer, and the visualization layer consists of a particle system and a renderer.
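The layer separation can be sketched as follows. This is a Python illustration, not Jsaurus code (Jsaurus is JavaScript); the class names, the event names and the text renderer are hypothetical, and the observer callback is one common way of wiring a model to its view.

```python
from typing import Callable


class Thesaurus:
    """Model: nodes and relations of the thesaurus, plus change listeners."""

    def __init__(self) -> None:
        self.nodes: set[str] = set()
        self.relations: set[tuple[str, str]] = set()
        self._listeners: list[Callable[["Thesaurus"], None]] = []

    def subscribe(self, listener: Callable[["Thesaurus"], None]) -> None:
        self._listeners.append(listener)

    def _notify(self) -> None:
        for listener in self._listeners:
            listener(self)

    def add_node(self, name: str) -> None:
        self.nodes.add(name)
        self._notify()

    def relate(self, source: str, target: str) -> None:
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both nodes must exist before they can be related")
        self.relations.add((source, target))
        self._notify()


class TextRenderer:
    """View: turns the model into a frame; could be swapped for another renderer."""

    def __init__(self) -> None:
        self.frame = ""

    def render(self, model: Thesaurus) -> None:
        nodes = ", ".join(sorted(model.nodes))
        relations = ", ".join(f"{a}->{b}" for a, b in sorted(model.relations))
        self.frame = f"nodes: {nodes} | relations: {relations}"


class Controller:
    """Control layer: translates UI events into model updates."""

    def __init__(self, model: Thesaurus) -> None:
        self.model = model

    def handle(self, event: str, *args: str) -> None:
        if event == "add":
            self.model.add_node(*args)
        elif event == "relate":
            self.model.relate(*args)
        else:
            raise ValueError(f"unknown event: {event}")


model = Thesaurus()
view = TextRenderer()
model.subscribe(view.render)
controller = Controller(model)
controller.handle("add", "Informatics")
controller.handle("add", "Mathematics")
controller.handle("relate", "Informatics", "Mathematics")
print(view.frame)
```

Because the view only receives the model through the callback, replacing `TextRenderer` with a graphical renderer leaves the model and controller untouched.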

Below you can see an example of a thesaurus with 5 nodes and without any relations. The particle system calculates the behavior of the nodes in the visualization. The current particle system gives each node a kind of gravitation: it calculates the gravitational forces and infers the velocity and position of each node from them. The example below is reminiscent of a 3D planetary system.
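One step of such a gravitation-based particle system can be sketched as follows. This is an illustrative sketch, not the Jsaurus implementation: the `Particle` record, the gravitational constant, the time step, the softening term (which avoids infinite forces when two nodes overlap) and the semi-implicit Euler integration are assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Particle:
    pos: list[float]      # x, y, z
    vel: list[float]      # velocity per axis
    mass: float = 1.0


def step(particles: list[Particle], g: float = 1.0, dt: float = 0.01,
         softening: float = 0.1) -> None:
    """Advance all particles by one time step under mutual gravitation."""
    acc = [[0.0, 0.0, 0.0] for _ in particles]
    for i, p in enumerate(particles):
        for j, q in enumerate(particles):
            if i == j:
                continue
            d = [q.pos[k] - p.pos[k] for k in range(3)]
            dist2 = sum(c * c for c in d) + softening ** 2
            # a_i += G * m_j * d / |d|^3, with a softened distance
            scale = g * q.mass / (dist2 * math.sqrt(dist2))
            for k in range(3):
                acc[i][k] += scale * d[k]
    for p, a in zip(particles, acc):
        for k in range(3):
            p.vel[k] += a[k] * dt       # force -> velocity
            p.pos[k] += p.vel[k] * dt   # velocity -> position


nodes = [Particle([-1.0, 0.0, 0.0], [0.0, 0.0, 0.0]),
         Particle([1.0, 0.0, 0.0], [0.0, 0.0, 0.0])]
step(nodes)
```

Updating the velocity before the position (semi-implicit Euler) is a common choice for orbit-like simulations because it tends to stay stable longer than plain explicit Euler.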

I’m developing Jsaurus for my diploma thesis on a graph-based knowledge browser for a CMS. I’m looking forward to visualizing knowledge maps for enterprises using Microsoft SharePoint Portal Server. But I’m still at the beginning of my diploma thesis, which will end six months from now.

Information Visualisation of Mailinglists with the Game Engine Doom3

January 23rd, 2006
Screenshot of visualised e-mails.


This semester (WS05/06) I am attending the lecture Information Visualization in the Information Management domain, given by Dr. Malgorzata Bugajska. As a project alongside the lecture, we have to implement a visualization that solves an existing problem from a field of information management. In a group, we chose mailing list archives as our topic, since I am currently writing a seminar paper on expert finding in mailing lists. We discussed for a long time which technology to use for the visualization. All of us knew Java, but a visualization in Java would require investing a lot of time. Flash is different: with little effort, you get a very impressive appearance. However, nobody in the group had enough experience with it. I finally came up with the idea of building the visualization on a game engine. By then, I had gained enough experience with mapping in a project at the Institut für Publizistik under Prof. Werner Wirth, which investigated the flow experience and emotions in virtual spaces.

Today, mailing list archives on the Web are displayed hierarchically. The problem is that the subjects of the e-mails reveal little or nothing about their content. Most replies simply add a “Re:” to the subject. The user therefore has to look at every e-mail, which of course takes a lot of time.
With our visualization, we want to offer the user an immersive world in which, as in a multiplayer game, he browses a mail archive together with other users. Each room stands for an e-mail. Replies are also represented as rooms and are connected to the original e-mail by corridors. Users walk from room to room and can read the e-mails, which are projected in the middle of each room. This gives the user the same functionality as the other common representations of mailing list archives.
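The room-and-corridor layout follows directly from the reply structure of a mailing list. The sketch below derives it from the standard `Message-ID` and `In-Reply-To` headers using Python’s `email` package; it is a hypothetical illustration, not the tool chain used to build the Doom3 map, and it assumes that each reply names a single parent.

```python
from email import message_from_string


def build_level(raw_messages: list[str]) -> tuple[dict[str, str], list[tuple[str, str]]]:
    """Turn a mailing-list archive into rooms (one per e-mail) and corridors.

    A corridor links the room of the answered e-mail to the room of its reply.
    """
    rooms: dict[str, str] = {}            # Message-ID -> subject shown in the room
    links: list[tuple[str, str]] = []
    for raw in raw_messages:
        msg = message_from_string(raw)
        if msg["Message-ID"] is None:
            continue                      # cannot place an e-mail without an ID
        msg_id = msg["Message-ID"].strip()
        rooms[msg_id] = msg["Subject"] or ""
        parent = msg["In-Reply-To"]
        if parent:
            links.append((parent.strip(), msg_id))
    # Drop corridors to e-mails that are not part of this archive
    corridors = [(a, b) for a, b in links if a in rooms]
    return rooms, corridors


# Hypothetical three-message archive
archive = [
    "Message-ID: <1@list.example>\nSubject: Map compiler crashes\n\nBody...",
    "Message-ID: <2@list.example>\nIn-Reply-To: <1@list.example>\n"
    "Subject: Re: Map compiler crashes\n\nBody...",
    "Message-ID: <3@list.example>\nIn-Reply-To: <0@elsewhere.example>\n"
    "Subject: Re: Old thread\n\nBody...",
]
rooms, corridors = build_level(archive)
```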
What is special about our visualization is that users rate the quality of the e-mails by shooting at the projected e-mails, which dims the light in the room. A bright room thus contains a good e-mail and a dark room a poor one. By shooting at a trash can, the user classifies the e-mail as spam; here we used the metaphor of e-mail clients that move spam directly to the trash. As a visual effect, monitors in the room show an animation of the CrazyFrog, its Jamba ringtone plays, and disco lights flash with a strobe effect. Shooting at a telephone classifies the e-mail as indecent. As a visual effect, pink fog appears and the song “Je t’aime…moi non plus” plays in the background. To classify e-mails as flame mails, you have to shoot at the Super Turbo Turkey Puncher game console.

These ratings remain stored in the level, so every new user enters a mailing list archive that previous users have already rated. He thus sees at a glance which e-mails are worth reading and which are not, since the ratings are shown as colour signals above the doors leading to the e-mail rooms.
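The rating mechanics can be sketched as a small state object per room. All concrete values are assumptions: the `EmailRoom` class, the linear dimming step per hit, the colour thresholds and the colour names are hypothetical, since the description only states that shots dim the light, that special targets classify the e-mail, and that ratings appear as colour signals above the doors.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical shot targets and the classification each one assigns
TARGET_LABELS = {
    "trash can": "spam",
    "telephone": "indecent",
    "Super Turbo Turkey Puncher": "flame",
}


@dataclass
class EmailRoom:
    hits: int = 0                  # shots on the projected e-mail
    label: Optional[str] = None    # set by shooting one of the special targets

    def shoot(self, target: str = "e-mail") -> None:
        if target == "e-mail":
            self.hits += 1         # every hit dims the light a bit more
        elif target in TARGET_LABELS:
            self.label = TARGET_LABELS[target]
        else:
            raise ValueError(f"unknown target: {target}")

    def brightness(self, dim_per_hit: float = 0.1) -> float:
        """1.0 = fully lit (good e-mail), 0.0 = dark (poor e-mail)."""
        return max(0.0, 1.0 - self.hits * dim_per_hit)

    def door_signal(self) -> str:
        """Colour shown above the door that leads to this room."""
        if self.label is not None:
            return "red"           # spam, indecent or flame mail
        level = self.brightness()
        if level >= 0.7:
            return "green"
        if level >= 0.4:
            return "yellow"
        return "grey"


room = EmailRoom()
for _ in range(5):
    room.shoot()
print(room.brightness(), room.door_signal())
```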
An additional plus is that several people can browse the mail archive at the same time. Users meet each other and can get in touch via a chat function. This way, they can help each other and point to e-mails they have already visited.

I will soon publish the prototype here on this homepage. Anyone who has Doom3 installed on their computer will then be able to try out the map of the mailing list archive.

Downloads

mapfiles4mailinglist