PhD Progress Report to August 2012

It's been awhile since I have posted on this topic so here's what I've been doing with my PhD. I have since made several improvement to the semantic web browser described in my last PhD update and have been experimenting with techniques to reduce information overload.

Reminder: My research tries to reduce information overload when displaying data where the structure of that data is not known ahead of time.

There is a good body of research in linking determining how related text strings are by meaning (the linguistics meaning of semantics). The problem for me is that the speed is not fast enough for building displays at runtime in a web-browser. I instead went with lexical methods that use naïve rules to just compare characters in a label. There are well establish algorithms for this; Dice, Levenshtein to name a two. My implementation of Dice in Javascript is now in the WikiBook project called Algorithm Implementation

What is not immediately clear is how these algorithms should determine similarity when two triplets with labels resolved are compared. Assuming that the subject is equal/equivalent then just how does x->firstname:John relate to x-->surname:Xeedown? My most recent chapter took Dice and Levenshtein string similarity algorithms and then compared an averaging of the predicate and object similarities versus simply taking the higher of the predicate or object similarities. I conducted a study with twenty human participants and have analysed the data. The data is indicating that a naïve lexical algorithm can give acceptable (i.e. “usefully better than random”) results in line with what human participants would say about the same triplet pairs. This is encouraging.

In that study I also tested any algorithm for determining if triplets were redundant. This is used to subsume (not display) triplets when it is deemed that other triplets contain equivalent information. I already had built such an algorithm using intuition and voodoo – and the testing on humans indicates that the algorithm is usefully better than random at matching what humans also rank as redundant data. This is also encouraging.

Using the redundancy algorithms I now manage to avoid displaying quite a number of triplets. In some cases, using data from dbpedia, about 55% of triplets are simply not displayed because they contain redundant information or are turned into lists.

I also used the similarity algorithms (actually the inverse) as the distance metric in hierarchical clustering of triplets. The cluster hierarchy is then flattened which results in groups of related triplets. While not giving perfect results, for a “first naïve attempt” at grouping triplets the results appear to be better than random – though I have not tested this.

I am currently writing up research results and learning tons about statistics as I go. It is particularly interesting to read about the debates in statistics for multi-rater agreement in Likert scales. After trying to come to grips with so much math I eventually went with the old method of just just looking at the shape of the of raters histogram.

Going forward from here is the next chapter. This one leaps off from the “first attempt” approaches described above and attempts to find algorithms that learn the user's preferences for ordering, grouping and redundancy. Recommender systems research has some particularly interesting avenues to explore here. I currently invisage a conversational User Interface that allows the user to express their data display preferences (order, grouping, redundancy etc) via direct manipulation.

The performance of any algorithms found and adjusted will then be tested against human raters. Once that study is complete I then theoretically switch into “write up” mode for the thesis but will need the ocassional coding distraction of actually implementing the results of the studies into Eme. Things are looking promising.

Arduino LCD: Hunt the Wumpus Conversion

This week I recreated an old text game that I remember as a kid in the '80s: Hunt the Wumpus. The game is apparently from 1971. Basically you wander around a map trying not to fall into a pit trap or get eaten by the wumpus. You win by shooting the wumpus - but don't shoot yourself!

Code: http://eturnerx.com/files/arduino/003_wumpus_hunt_eturnerx_arduino_lcd.html

The map is a squashed Dodecahedron. Arranged into three rings. The inner and outer rings have five rooms while the ring between them has ten rooms. Every room links to exactly three others; back and forth on the current ring and another to change the ring. While the interconnections in the map never change, my version of the game scrambles the room names whenever a new map is generated (on power up or when selected at game over). A new map also randomises the starting positions of the player, the pit, the bats and the wumpus.

Shots can travel up to five rooms, but can only travel into rooms that you have already visited and or have been one room away from. It is possible to shoot yourself.

The implementation uses a state machine in the main loop. It's a pretty big one but things don't seem to get out of hand. That's why I love the state machine approach; it allows the programmer to think of the overall structure but then just concentrate on the single task they were working on. For a much simpler state machine check out the previous video.

Checkout the other games I have converted


Arduino LCD & State Machine Demo: Life & Poison counters

Continuing on from the last video; adding a hidden Poison counter is a good chance to build a simple switch()...case state machine.

The code: http://eturnerx.com/files/arduino/002_mtg_lifepoisoncounter_eturnerx_arduino_lcd.html
Last video: http://youtu.be/t6Y_GCoSUt4

A more comprehensive example of a state machine is in the next video; A Hunt the Wumpus game conversion.


Arduino & Freetronics LCD - Magic:The Gathering Life Counter

My very first YouTube video. This is my first real Arduino project. The next video in the sequence explores adding a hidden poison counter.

Link to code:

LCD Quick Start Guide

LCD Product Page