
Thursday, 22 February 2007

Week 17 - XQuery and XML databases

This week we look at an alternative technology for working with XML: XQuery, a query language that extends XPath. There are many XQuery implementations, and we will be using the one that is part of an open-source project providing a native XML database.

Monday, 19 February 2007

Periodic table of Visualization Methods

Visual-Literacy.org has published this great Periodic Table of Visualization Methods. It shows around 100 diagram types, each with an example, classified along several facets:
  • simple to complex
  • data/information/concept/strategy/metaphor/compound
  • process/structure
  • detail/overview
  • divergence/convergence
The web page uses a JavaScript library to display an example of a diagram type when you mouse over its box. It's a neat trick, but perhaps not very accessible, so I took the liberty of massaging the table into a full listing of all the diagram types in alphabetical order. This format is more convenient for my teaching, and it is a nice example of XML scraping using XQuery.

These listings are made by:
  1. taking the HTML source of the Periodic table;
  2. loading it into the eXist database. eXist accepts the source even though it is not well-formed XML (it has missing attribute quotes and bare < and > characters);
  3. writing an XQuery query to generate the page, which:
    1. finds the HTML document with 'Periodic' in the title;
    2. finds all the A tags;
    3. gets the onmouseover attribute of each;
    4. uses some string functions to extract the method name and the image source from that string;
    5. sorts by name;
    6. generates a div per method.
Here is the basic XQuery script for the plain listing:

List of methods

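(: every onmouseover handler on an A tag in the Periodic table page :)
for $item in data(/HTML[contains(.//TITLE,'Periodic')]//A/@onmouseover)
(: the method name is the text assigned to window.status :)
let $name := lower-case(substring-before(substring-after($item,"window.status='"),"';"))
(: the example image is the src="..." value in the popup markup :)
let $pix := substring-before(substring-after($item,'src="'),'">')
(: substring-after returns '' when the marker is missing, so this skips links with no image :)
where string-length($pix) > 0
order by $name
return
  <div><a href="http://www.visual-literacy.org/periodic_table/{$pix}">{$name}</a></div>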
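For readers without an eXist installation, here is a rough Python sketch of the same steps. It is a port for illustration, not the query used for the listing. It assumes that Python's standard html.parser (which, like eXist here, does not require well-formed XML) can read the page, and that the onmouseover handlers have the shape the XQuery expects: window.status='name'; followed somewhere by src="image">. The SAMPLE page and the names PeriodicScraper, between and list_methods are hypothetical. between mirrors substring-before(substring-after(...)), returning an empty string when either marker is missing, which is why the where clause above drops links without an image.

```python
from html import escape
from html.parser import HTMLParser

BASE = "http://www.visual-literacy.org/periodic_table/"

class PeriodicScraper(HTMLParser):
    """Collect the page title and every <a onmouseover="..."> handler."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.in_title = False
        self.handlers = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag/attribute names and unescapes attribute values
        if tag == "title":
            self.in_title = True
        elif tag == "a":
            for name, value in attrs:
                if name == "onmouseover" and value:
                    self.handlers.append(value)

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

def between(text, start, end):
    """Like substring-before(substring-after(text, start), end): '' if a marker is missing."""
    i = text.find(start)
    if i < 0:
        return ""
    rest = text[i + len(start):]
    j = rest.find(end)
    return rest[:j] if j >= 0 else ""

def list_methods(html):
    parser = PeriodicScraper()
    parser.feed(html)
    parser.close()
    if "Periodic" not in parser.title:  # only the Periodic table page
        return []
    items = []
    for handler in parser.handlers:  # the onmouseover of each A tag
        name = between(handler, "window.status='", "';").lower()
        pix = between(handler, 'src="', '">')
        if pix:  # same filter as the where clause
            items.append((name, pix))
    items.sort()  # order by name
    # one div per method, escaping text and attribute values
    return [f'<div><a href="{escape(BASE + pix)}">{escape(name)}</a></div>'
            for name, pix in items]

# Hypothetical sample page; the real markup may differ.
SAMPLE = (
    "<html><head><title>Periodic Table of Visualization Methods</title></head><body>"
    '<a href="#" onmouseover="window.status=&#39;Line Chart&#39;; '
    'show(&quot;&lt;img src=&quot;line_chart.gif&quot;&gt;&quot;)">Li</a>'
    '<a href="#" onmouseover="window.status=&#39;Home&#39;;">home</a>'
    "</body></html>"
)

for line in list_methods(SAMPLE):
    print(line)
```

On the sample, only the first link survives: the second handler has no src="..." part, so its image string is empty and it is filtered out, just as the where clause does in the XQuery.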

In fact, rather than running the query against the raw HTML each time, I wrote a slightly different query to generate a simple XML file holding the basic data in alphabetical order. Using an intermediate file also allowed me to correct a couple of typos in the method names, and generating the page from it is, of course, faster. In addition, I've added the facility for a user to group methods and tag the group, plus some links to Google Images and Wikipedia. There is a lot more that could be done with this.
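The intermediate-file idea can be sketched in Python with xml.etree.ElementTree, assuming the scraped data is available as (name, image) pairs. The element and attribute names (methods, method, name, image), the CORRECTIONS table and its entry are all hypothetical: the post shows neither the real file format nor which typos were fixed. Grouping, tagging and the Google Images and Wikipedia links are left out.

```python
import xml.etree.ElementTree as ET

BASE = "http://www.visual-literacy.org/periodic_table/"

# Hypothetical typo fixes, applied once when the intermediate file is built.
CORRECTIONS = {"line chrat": "line chart"}

def build_methods_xml(items, corrections=CORRECTIONS):
    """Store scraped (name, image) pairs as a small XML document in name order."""
    root = ET.Element("methods")
    fixed = sorted((corrections.get(name, name), pix) for name, pix in items)
    for name, pix in fixed:
        ET.SubElement(root, "method", name=name, image=pix)
    return root

def render_page(root, base=BASE):
    """Generate the listing from the stored data rather than re-scraping the HTML."""
    page = ET.Element("div")
    for method in root.iter("method"):
        div = ET.SubElement(page, "div")
        link = ET.SubElement(div, "a", href=base + method.get("image"))
        link.text = method.get("name")
    return ET.tostring(page, encoding="unicode")

root = build_methods_xml([("line chrat", "line_chart.gif"), ("area chart", "area_chart.gif")])
print(ET.tostring(root, encoding="unicode"))  # the intermediate XML
print(render_page(root))                      # the page generated from it
```

In practice the intermediate document would be saved once (for example with ET.ElementTree(root).write(...)) and each page request would read it instead of re-parsing the original HTML. That is where the speed-up and the single place for corrections come from.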


What would be nice now is to get the raw data, including the class names, as XML, so that it could be reorganised and extended without having to resort to scraping.