DSA2006: Xquery

Visual-Literacy.org has published this great Periodic Table of methods of visualisation. This displays around 100 diagram types, with examples and a multi-faceted classification by:

simple to complex
data/information/concept/strategy/metaphor/compound
process/structure
detail/overview
divergence/convergence

The web page uses a JavaScript library to display an example of a diagram type when you mouse over its box. A neat trick, but perhaps not very accessible, so I took the liberty of massaging this table to create a full listing of all the diagram types in alphabetical order. This format is more convenient for my purposes when teaching, and it is a nice example of XML scraping using XQuery.

These listings are made by:

taking the HTML source of the Periodic table
loading it into the eXist database. eXist accepts the source even though it is not well-formed XML: it has missing quotes and bare < and > characters
writing a query in XQuery to generate the page. The query has to:

Find the HTML document with 'Periodic' in the title
Find all the A tags
Get the onmouseover attribute of each
Use some string functions to get the name and the source of the image from this string
Sort by name
Generate a div per tag

Here is the basic XQuery script for the plain listing:

List of methods

(: pull the name and image path out of each onmouseover handler :)
for $item in data(/HTML[contains(.//TITLE,'Periodic')]//A/@onmouseover)
let $name := lower-case(substring-before(substring-after($item,"window.status='"),"';"))
let $pix := substring-before(substring-after($item,'src="'),'">')
(: skip links whose handler has no image :)
where string-length($pix) > 0
order by $name
return
<div><a href="http://www.visual-literacy.org/periodic_table/{$pix}">{$name}</a>
</div>
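
To see what the string functions are doing, here is a small stand-alone sketch that runs the same extraction on two made-up onmouseover values. The values themselves (the method names, the preview() call and the image path) are my own invention for illustration; only the window.status='...'; and src="..."> patterns come from the script above.

```xquery
xquery version "1.0";

(: Two hypothetical onmouseover strings: one with an image, one without.
   Single quotes inside the literals are escaped by doubling them. :)
let $items := (
  'window.status=''Mind Map''; preview(''<img src="images/mindmap.jpg">'');',
  'window.status=''Legend''; return true;'
)
for $item in $items
(: the text between window.status=' and the first '; is the method name :)
let $name := lower-case(substring-before(substring-after($item, "window.status='"), "';"))
(: the text between src=" and the first "> is the image path;
   substring-after returns "" when the pattern is absent :)
let $pix := substring-before(substring-after($item, 'src="'), '">')
where string-length($pix) > 0
order by $name
return
  <div><a href="http://www.visual-literacy.org/periodic_table/{$pix}">{$name}</a></div>

(: result: a single div, linking to
   http://www.visual-literacy.org/periodic_table/images/mindmap.jpg
   with the text "mind map" :)
```

The second value has no src=" in it, so $pix comes back as the empty string and the where clause drops it. That is why the test on string-length is needed.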

In fact, instead of running a query against the raw HTML, I wrote a slightly different query to generate a simple XML file in which the basic data is stored in alphabetical order. Using an intermediate file also allows me to correct a couple of typos in the method names, and of course it is faster to generate the page. In addition, I've added the facility for a user to group methods and tag the group. Some links to Google images and Wikipedia have been added too. There's a lot more that could be done with this.
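
A query to build such an intermediate file might look roughly like the sketch below. This is only a guess at the shape, not the query actually used: the element names methods, method, name and image are hypothetical, and the grouping and tagging facility is not shown.

```xquery
xquery version "1.0";

(: Sketch only: the methods/method/name/image element names are assumptions. :)
<methods>{
  for $item in data(/HTML[contains(.//TITLE, 'Periodic')]//A/@onmouseover)
  let $name := lower-case(substring-before(substring-after($item, "window.status='"), "';"))
  let $pix := substring-before(substring-after($item, 'src="'), '">')
  (: keep only the links that carry an image :)
  where string-length($pix) > 0
  order by $name
  return
    <method>
      <name>{$name}</name>
      <image>http://www.visual-literacy.org/periodic_table/{$pix}</image>
    </method>
}</methods>
```

Once the result is stored, the page query only has to walk the method elements in document order, and a misspelt name can be corrected by hand in the stored file.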

Now what would be nice would be to get the raw data including the class names as XML so it could be re-organised and extended, without having to descend to scraping.


Monday, 19 February 2007

Periodic table of Visualization Methods
