Starting as PhD Student in the Area of Recommender Systems and Semantic Web Technologies

I’m starting my PhD in July. I’m associated with two research groups from the Department of Informatics at the University of Zurich. That are the Software Evolution and Architecture Lab (s.e.a.l.) headed by Prof. Harald C. Gall and the Dynamic and Distributed Information Systems (DDIS) headed by Prof. Bernstein. I’m lucky to start working on the hot topic of recommender systems. The overall goal of the joint work with an industry partner is the development of a web-based location recommender system. This project is partially supported by the CTI (commission of technology and innovation) because of its high potential use in different areas like tourism. The CTI is the national agency for innovation. The CTI supports the knowledge transfer between companies and universities to create innovation and not just inventions.

Well, I’m extremely happy and proud having this chance to promovate advised by excellent advisors even if I get payed less then I would get in the industry. But who cares about money, right?. What I looked for was a great challenge that only a PhD can provide.

Kick-Off Meeting of the Localina project

Kick-Off Meeting of the Localina project

Graph-Based Knowledge Browser for Content Management Systems

A week ago, the 16th of May, I finished my master thesis about an implementation of a graph-based knowledge browser for a content management system (CMS). Now, I spend time in relaxing and organising my next steps to the future. The current IT market is asking for graduates and I remark it every time taking a look to my email account, if you know what I mean ;).

The time writing my master thesis was absolutely great. It was extremely work intensive, but great! I got a lot of support from my family and girlfriend, and from my friends too. Thank you at this point. Now, I know what it means to work from 7.30 to 23.00 over several months every day including the weekend. I experienced that working over 100 hours a week isn’t really efficient. You work not at 100% and your personal efficiency rate decreases every additional day. It’s not recommendable to work many over a long period of time. It does not only affect your work, if you know what I mean.

Anyhow, my thesis’s topic seems to be very attractive. During my research I got lots of requests from all over the world. A Professor from the St. Louis University in America was interested in using my tool for further research on Natural Language Processing (NLP). Other knowledge workers wanted to share experience. The results of my research and the potential of an interactive knowledge map for knowledge transfer leads me to possible future works. I just got some recommendations searching for venture capitalist to innovate my invention. We’ll see. It’s not the only project on my fingertips.

Abstract

The success of knowledge transfer is crucial in the area of knowledge management. Not only companies in outsourcing-relations have the need of successful knowledge transfer. Organisations have the need of successful knowledge transfer too in order to create market advantages. This thesis introduces a graph-based knowledge browser for a CMS to support the topic of knowledge transfer by providing ?shared material? for generating knowledge and providing easy access to knowledge by visualising knowledge as associative networks. Knowledge is presented as graph or radial layout in hyperspace. Web 2.0 technologies like AJAX and SVG are used for the implementation.

Knowledge Graph for the Exploration of the Wikipedia

Based on the graph-browser JSaurus, I implemented Wikigraph, a simple graph-based visualization of the content of Wikipedia.org. Every node in the graph represents a topic. Topics are connected to each other if and only if one topic refers to the other one. The references are take from the meta tag keywords of the topic’s website.

But what is the advantage of the graph-browser Wikigraph? Well, first of all, it is possible to create a knowledge map of wikipedia‘s content. The knowledge map shows which topics are related to other topics. You have a breath overview about related topics. In other words you see the context of a selected topic.
As an example you can search for Informatics. As result you get Informatics and some linked topics (i.e., Mathematics, Information, Information System). You get the related topics to the related topics to. With all the relations you the context of the informatics.
The context can support you understanding a specific topic rather to read its content twice.

The main advantage is that you don’t have to find the right keyword to find the specific topic anymore. You search by context and not by keyword. You only have to search for a topic of the same context. You get a map of topics of the same context and you can selected the right one or browse further. So, Wikigraph provides not only searching by keyword, it provides searching by browsing too.

Evaluation of Graph Layout Algorithms

A readable graph respects aesthetic criteria of syntactic validity, perceptual organisation and aesthetic optimality as proposed by Kosak et al. in 1994. Some algorithms focus on minimising edge crossing whilst other focus on other aesthetic criteria.
Spring-embedded models and its variants fits aesthetic criteria. But which of them is the best?
First of all you have to define what best means. It depends on the scenario where the graph is used. Do you need a symmetric graph with lots of edge crossing, or to you need a graph to simulate molecular interactions? You have to give the answer! In this evaluation here best fits the following criteria:

  1. Performance: Short calculation time of node placement
  2. Scalability
  3. Aesthetic criteria: Small number of edge crossing, optimal organisation of vertices.

I’m evaluating the upper criterias for an implementation in Javascript. Javascript is highly sensible on calculation complexity. Lots of calculations and look-ups break down the speed very fast. Short calculation time is very important.

As already mentioned spring-embeded models fits aesthetic criteria very well. Using spring embeded models fits the 3. criteria of aesthetics. Let’s evaluate the algorithms in respect criteria 1 and 2.

The Spring Model

The spring model was originally proposed by Eades (1984). The concept is easy. For all connected vertices a attractive force fa(d) is calculated. A repulsive force fr(d) is calculated among all nodes not connected.

  • fa(d) = ka * log(d)
  • fr(d) = kr / d2

d is the current distance between two nodes and ka and kr are constants.
Let be n the number of nodes and r the number of relations.
According to the implementation with Javascript a division is as expensive as a multiplication. But the calculation of a logarithm is 4 times more expensive then a multiplication or a division. The calculation time is:

(n * (1+1))2+ r * (1 + 4) = 4*n2 + 5*r

The complexity is O(n2) because the repulsive force is calculated among all nodes.

The spring model is symmetric. It doesn’t try to reduce edge crossing. Reading those graphs could be problematic.

Force-directed Placement

The force-directed placement has been proposed by Fruchterman and Reingold (1991). This algorithm fits the criteria of minimised edge crossing. The spring model does not. This algorithm is based consists of attractive repulsive forces among nods. As in the spring model, attractive forces fa are calculated between two connected nodes and repulsive forces fr among all nodes.

  • fa(d) = d2 / k
  • fr(d) = -k2 / d

d is the distance between two nodes and k is the optimal distance betweent two nodes. k is calculated by the number of nodes and the drawing area.

Let be n the number of nodes and r the number of relations.
According to the implementation with Javascript a division is as expensive as a multiplication. But the calculation of a logarithm is 4 times more expensive then a multiplication or a division. The calculation time is:

(n * (1+1))2 + r * (1 + 1) = 4*n2 + 2*r

The complexity is O(n2) because the repulsive force is calculated among all nodes.
The force directed placement is better then the spring model because it fits better the criteria 3 about minimising edge crossing and has lower calculation complexity in calculating attractive forces. Force-directed placement strikes the spring model in performance issues too (criteria 1);

Local Minimum and Simulated Annealing

Both try to organise nodes and relations to minimise the energy of forces between nodes. The display results are the best, but is extremely expensive in the meaning of calculation complexity that is at least O(n2).
I implemented the algorithm of local minimum with minimising the energy state. The energy was calculated by the functions of the force-directed-graph. My experience with the local minimum is, that the nodes need lot more time to organise them self.
Simulated annealing is a very interesting concept. The difference between the this and other spring model based algorithm is, that you cool down the temperature (i.e., decrease ability of movement) on every step. You’ll get a stable system very fast depending on the amount of temperature decrease. The force-directed placement can be seen as a special version of simulated annealing, but without a temperature decrease. Depending on the implementation of simulated annealing, it is possible to get a runtime of O(n). Imagine a stable system with nodes. You add one with high kinetic energy. You only have to calculate the energies for the new nodes with all the other nodes. But doing this, you’ll get a bad organised system of nodes and relations.

Conclusion

I think the best graph drawing algorithm is a combination of the force-directed placement in combination with local minimum or simulated annealing. With the force-directed placement you get a good organised system in a short time. To reduce cpu time a change of algorithm is need, because it doesn’t make sense to move nodes only a little and calculate so much. I think local minimum or simulated annealing is a better choice for the calculation at the end because they are going to filter nodes from calculation. But all spring-embedded models aren’t scalable of cause the calculation time complexity of O(n2). We all have to live with it.

Information Visualization of Meta Data of an Operational Datastore

My student project about Metadata-Management at the Zürcher Kantonalbank (ZKB) was a big success. The goals were to implement a prototype of an easy-to-use application for browsing, searching and navigating through meta data, defining processes how people have to updated and import meta data, defining a data model where meta data of every business area can inserted and to evaluate some more questions.
I invented a new type of visualization for meta data of an operational data store or Data-Warehouse.
Well, people were quite impressed being presented such a visualization, that abstracts from technical elements and elements from the business areas. With the developed browser you can see the relations between technical elements and elements from the business area. With this prototype it’s easy to make a complete impact analysis, making reports, see how elements are related to each other, and so on.
Unfortunately I can’t publish a screenshot of the prototype or go more deeper into details because the results of my student project is only for ZKB’s internal.

The duration of this student project was 12 weeks. In this time a met a lot of friendly and competent people at the ZKB. They all were nicely and friendly. I feel sadly to stop working there, but I have to finish my study and write my diploma thesis about an Implementation of a Graph Based Knowledge Browser for a CMS. With this implementation it is possible to measure the amount of knowledge transfer from one company to another e.g. in a outsourcing process.
In this student project I learned a lot of the factor humans. At the university you learn to solve problems and invent and innovate new solutions. But the focus is set to the solution. But in the real life, you have to make people understand your solution, sensibilize people for the problem and the need for a solution. It’s not a problem, it’s a challenge!
I wrote quite a lot source code and documentation. Just to imagine how much I wrote in 12 Weeks, I’ll list them all:

  • over 14’000 lines of code
  • 119 pages of program documentation
  • 29 pages of user documentation
  • 25 pages of data model descriptions (tables, attributes)
  • 61 pages for the final paper
  • 11 presentations in different business areas of the banc

I can recommend the ZKB to everyone making a studyproject there. It’s a good company with very nice people.

Expert Identification Based on E-Mail Analysis

Answer distribution of different authors

Answer distribution of different authors

Dieses Semester hatte ich das Seminar Informationsmanagement am Institut für Informatik an der Universität Zürich belegt. Das Thema des Seminars war “empirische Forschung in der Wirtschaftsinformatik”. Während dem Semester hatten alle Teilnehmer eine Seminarbeit verfasst. Heute und morgen präsentieren die Seminarteilnehmer ihre Arbeit und Resultate.
Meine Präsentation fand schon heute morgen statt von 10:30-11:15. Meine Präsentation wurde zeitlich etwas nach vorne geschoben, da ein paar Teilnehmer und ich selber um 12Uhr noch eine Prüfung in “Innovations und Technologiemanagement” ablegen mussten. Aber das ist ja nur Nebensache ;).
Mein Thema war Dokumentenanalyse: Expertenbestimmung durch E-Mail-Analyse. Die Präsentation verlief sehr gut. Nach der Präsentation überhäufte mich mein Professor mit Komplimenten. Er war begeistert wie methodisch ich bei der Analyse der Fragestellung vorgegangen bin und sagte sogar, dass so manche Redner an einer wissenschaftlichen Konferenz mit weit weniger auftreten. Nach seinem Kommentar war ich natürlich mächtig stolz auf meine Leistung und die Euphorie übertrug sich dann auch auf die gleich anschliessende Prüfung. Da meine Seminararbeit eigentlich bereits eine kleine Forschungsarbeit war möchte er, dass ich noch etwas mehr in meine Arbeit stecke, um die Resultate anschliessend an einer wissenschaftlichen Konferenz zu publizieren.
Hui, ich bin ja so was von durch den Wind über diese neue Perspektive. Mein erstes Paper ist zum greifen nah!!!

Download

3D-Ego-Shooter-Visualization and Experience of Mailinglists with the Game Engine of Doom3

Screenshot of visualised e-mails.

Screenshot of visualised e-mails.

Ich besuche in diesem Semester (WS05/06) die Vorlesung Information Visualization in the Information Management domain, die von Dr. Malgorzata Bugajska abgehalten. Als Projekte neben der Vorlesung müssen wir eine Visualisierung implementieren, die ein vorhandenes Problem eines Gebietes aus dem Informationsmanagement lösen soll. Mit einer Gruppe haben wir Mailinglistenarchive als Thema ausgesucht, da ich zeitgleich gerade eine Seminararbeit über die Expertenfindung in Mailinglisten schreibe. Wir haben lange diskutiert mit welcher Technologie wir die Visualisierung vornehmen wollen. Java konnten wir alle. Doch für eine Visualisierung müsste man sehr viel Zeit investieren. Anders ist das Flash. Mit nur wenig Aufwand kommt man zu einer sehr prächtigen äusseren Erscheinung. Allerdings hatte keiner aus der Gruppe genügend Erfahrung damit. Ich bin dann schliesslich auf die Idee mit der Visualisierung auf Basis einer Spieleengine gekommen. Ich hatte bis Dato genügend Erfahrung mit Mapping sammeln können in einem Projekt am Institut für Publizistik unter Prof. Werner Wirth, wo es um die Erforschung des Flowerlebnisses und Emotionen in virtuellen Räumen ging.

Mailinglistenarchive werden heute im Web als hierarchische angezeigt. Das Problem dabei ist, dass die Betreffs der Emails wenig oder überhaupt nichts über den Inhalt einer Email verrrät. Die meisten Antwortmails führen einfach noch ein “Re:” im Betreff. Der Benutzer muss sich also alle Emails ansehen, was natürlich viel Zeit kostet.
Mit unserer Visualisierung möchten wir dem Benutzer eine Erlebniswelt bieten, wo er wie in einem Multiplayerspiel, zusammen mit anderen Benutzern ein Mailarchiv durchstöbert. Dabei steht ein Raum für eine Email. Die Antwortmails werden ebenfalls als Raum dargestellt und über Gänge mit dem Ausgangsmail verbunden. Die Benutzer laufen von Raum zu Raum und können die Emails lesen, die in der Mitter des Raumes projiziert ist. Damit hat der Benutzer die gleiche Funktionalität wie bei den anderen üblichen Darstellungen von Mailinglistenarchive.
Das spezielle an unserer Visualisierung ist, dass die Benutzer durch Beschuss auf die projizierten Emails das Licht im Raum dämmen und so die Qualität der Emails bewerten. Dabei enthält ein heller Raum eine qualitativ gute Email und ein dunkler Raum eine schlechte Email. Durch beschuss auf einen Mülleimer klassiert der Benutzer die Email als Spam. Dabei haben wir die Metapher in Emailclients verwendet, die Spam direkt in den Mülleimer werfen. Als visueller Effekt wir auf im Raum befindlichen Monitoren eine Animation des CrazyFrogs gezeigt, dessen Jamba-Klingeltonsound abgespielt und ein Spiel von Discolichtern angezeigt, die mit einem Stroboeffekt aufleuchten. Durch beschuss auf eine Telefon wird die Email als sittenwiedrig klassiert. Als visueller Effekt taucht pinkfarbener Nebel auf und im Hintergrund wird das Lied “Je t’aime…moi non plus” gespielt. Um Emails als Flamingmails zu klassieren, muss man auf die Spielekonsole Super Turbo Turkey Puncher schiessen.

Diese Bewertungen bleiben im Level enthalten und jeder neue Benutzer tritt in ein Mailinglistenarchiv ein, welches durch früherer Benutzer bewertet worden ist. Somit sieht er auf einen Blick, welche Emails lesenswert sind und welche nicht, da diese Bewertungen als Farbsignal über den Türen stehen, die zu diesen Emailräumen führen.
Ein zusätzliches Plus ist die Tatsache, dass mehrere Leute gleichzeitig im Mailarchiv stöbern können. Die Benutzer begenen sich und können mit einander mittels Chatfunktion in Kontakt treten. Dabei könnten sie sich gegenseitig helfen und auf Emails verweisen, die sie bereits besucht haben.

In kürze werde ich den Prototypen hier auf dieser Homepage veröffentlichen. Jeder der Doom3 bei sich auf dem Computer installiert hat, kann dann die Map des Mailinglistenarchives testen.

Downloads

mapfiles4mailinglist