DSA2006: February 2007

Wednesday 28 February 2007

Week 18 - XSLT

XSLT is a rather different language from XQuery for transforming XML documents, but it shares much of the same functionality. It is more widely used than XQuery, partly because a number of XSLT processors are readily available for server-side use, and partly because browsers now include an XSLT processor so that the transformation can be done in the client.

I also want to show how XML schemas can be used: for validation of an XML document, and in InfoPath to create a data entry form.

slides

Workshop

Based on the sample data, stylesheet and CSS in this directory

http://www.cems.uwe.ac.uk/~cjwallac/apps/scotch/

1. Copy these files to a directory of your own. There is a zip of the files you need.

2. Test distillery-2.xml to make sure it is working as it does in my directory

3. Make a simple modification to the CSS to change the output

4. Add another simple template to the XSLT to display another item of information in the file.

5. Modify the stylesheet for distilleries to display

6. Use InfoPath to create a simple form for either the whisky data or for your own data.

Thursday 22 February 2007

Week 17 - XQuery and XML database

This week we look at an alternative technology for working with XML. This uses a language called XQuery, which is an extension of XPath. There are many XQuery implementations, and we will be using one which is part of an open-source project providing a native XML database.

slides
Distillery example

eXist Database
XQuery on W3Schools

Monday 19 February 2007

Periodic table of Visualization Methods

Visual-Literacy.org has published this great Periodic Table of methods of visualisation. This displays around 100 diagram types, with examples and a multi-faceted classification by:

simple to complex
data/information/concept/strategy/metaphor/compound
process/structure
detail/overview
divergence/convergence

The web page uses a JavaScript library to display an example of a diagram type when you mouse over its box. A neat trick, but perhaps not very accessible, so I took the liberty of massaging this table to create a full listing of all the diagram types in alphabetical order. This format is more convenient for my purpose when teaching, and is a nice example of XML-scraping using XQuery.

These listings are made by:

taking the HTML source of the Periodic table
loading it into the eXist database. The source is accepted by eXist even though it is not well-formed XML (missing quotes, bare < and >)
writing a query in XQuery to generate the page:

Find the html document with 'Periodic' in the title
Find all the A tags,
Get the onmouseover attribute
use some string functions to get the name and the source of the image from this string
sort by name
generate a div per tag

Here is the basic XQuery script for the plain listing:

List of methods

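(: every onmouseover attribute on an A tag in the Periodic Table page :)
for $item in data(/HTML[contains(.//TITLE,'Periodic')]//A/@onmouseover)
(: the method name is the text assigned to window.status :)
let $name := lower-case(substring-before(substring-after($item,"window.status='"),"';"))
(: the image file name is whatever follows src=" in the handler :)
let $pix := substring-before(substring-after($item,'src="'),'">')
(: skip entries that have no image :)
where string-length($pix) > 0
order by $name
return
<div><a href="http://www.visual-literacy.org/periodic_table/{$pix}">{$name}</a>
</div>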
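For readers more at home outside XQuery, the same steps can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not the script used for the listing: SAMPLE_HTML is a made-up fragment that mimics the shape of the onmouseover strings (it is not the real page source), and the names MouseoverCollector, between and list_methods are hypothetical.

```python
from html.parser import HTMLParser

BASE = "http://www.visual-literacy.org/periodic_table/"

# Hypothetical fragment imitating the page: the handler sets window.status
# to the method name and embeds an image tag with a src attribute.
SAMPLE_HTML = """
<html><head><title>A Periodic Table of Visualization Methods</title></head><body>
<a href="#" onmouseover="window.status='Mind Map'; show('&lt;img src=&quot;mindmap.gif&quot;&gt;')">Mm</a>
<a href="#" onmouseover="window.status='Bar Chart'; show('&lt;img src=&quot;bar.gif&quot;&gt;')">B</a>
<a href="#" onmouseover="window.status='No Picture';">X</a>
</body></html>
"""

class MouseoverCollector(HTMLParser):
    """Collect the onmouseover attribute of every A tag."""
    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for key, value in attrs:
                if key == "onmouseover" and value:
                    self.values.append(value)

def between(text, start, end):
    """Mimic substring-before(substring-after(text, start), end)."""
    _, found, rest = text.partition(start)
    if not found:
        return ""
    head, found_end, _ = rest.partition(end)
    return head if found_end else ""

def list_methods(html):
    """Return (name, image URL) pairs sorted by name, skipping items with no image."""
    collector = MouseoverCollector()
    collector.feed(html)
    items = []
    for value in collector.values:
        name = between(value, "window.status='", "';").lower()
        pix = between(value, 'src="', '">')
        if pix:
            items.append((name, BASE + pix))
    return sorted(items)

for name, url in list_methods(SAMPLE_HTML):
    print(name, url)
```

The between helper returns an empty string when either marker is missing, which matches the behaviour of the XQuery string functions and is what makes the "no image" filter work.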

In fact, instead of running a query against the raw HTML, I wrote a slightly different query to generate a simple XML file in which the basic data was stored in alphabetical order. Using an intermediate file also allows me to correct a couple of typos in the method names, and of course it is faster to generate the page. In addition, I've added the facility for a user to group methods and tag the group. Some links to Google images and Wikipedia have been added too. There's a lot more that could be done with this.

Now what would be nice would be to get the raw data including the class names as XML so it could be re-organised and extended, without having to descend to scraping.

Sunday 18 February 2007

Hotlinks - week 16

Bloglines Image Wall
Mashups tagged with science
Elliotte Rusty Harold's XML predictions for 2007

Friday 16 February 2007

Coursework tips

Icons

Here is the full set of icons which Google supply.

See Lecture 16 for an example of its use

Locations

Google Earth will display the location of a point in either decimal degrees or degrees minutes seconds - you can select which in the options.

With GoogleMaps, the lat and long appear in the URL of a place - you may have to zoom in and out to get it to appear in this format.

If your data has locations in degrees, minutes and seconds, you can convert to decimal degrees using the formula

decimal-degrees := degrees + minutes / 60 + seconds / 3600

You will have to take account of the direction too. N and E are positive, S and W negative.
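The formula and the sign rule together can be sketched as a small Python function. The function name dms_to_decimal and the sample values are illustrative only.

```python
def dms_to_decimal(degrees, minutes, seconds, direction):
    """Convert degrees, minutes, seconds plus N/S/E/W to signed decimal degrees."""
    direction = direction.upper()
    if direction not in ("N", "S", "E", "W"):
        raise ValueError("direction must be one of N, S, E, W")
    magnitude = degrees + minutes / 60 + seconds / 3600
    # N and E are positive, S and W negative
    return -magnitude if direction in ("S", "W") else magnitude

print(dms_to_decimal(51, 30, 0, "N"))  # 51.5
print(dms_to_decimal(2, 15, 0, "W"))   # -2.25
```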

Wednesday 14 February 2007

Week 16 - designing an XML vocabulary, XML Schema

This week we turn our attention to the design of an XML document or documents. We look at designing the schema for a single document using the QSEE CASE tool to generate an XML schema - the top-down route. We also look at a bottom-up approach using trang.

The workshop looks at creating a simple XML document to describe Whisky Distilleries.

Saturday 10 February 2007

Pipes and Filters Architecture

Yahoo have recently launched Pipes, a visual programming environment for creating a mashup RSS feed from user inputs and available RSS sources. XML languages for defining pipelines are emerging.

Here are some bloggers on the subject:

John Musser (ProgrammableWeb)
Tim O'Reilly
Fred Stutzman (about the need to change web applications to support fine-grained RSS feeds.)
TechCrunch
Kurt Cagle (on XML pipelines)
Jeni Tennison's Xtech paper is an excellent overview of XML pipelines

Issues with these languages include

whether the pipeline itself is expressed in XML (and thus processable with XML tools)
whether non-XML data streams are allowed, for example where an intermediate file is non-XML (e.g. Graphviz dot) or the output is non-XML (e.g. a GIF image)

The concept of a pipe, in which the output of one process is connected to the input of another, originates in the Unix operating system - Unix Pipe
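The idea of connecting one stage's output to the next stage's input can be sketched in a few lines of Python. This only illustrates the architecture; it is not how Yahoo Pipes or any of the XML pipeline languages work, and the names pipeline, drop_blank, lower_case and sort_lines are hypothetical.

```python
from functools import reduce

def pipeline(*filters):
    """Connect filters so that each one's output becomes the next one's input."""
    def run(lines):
        return reduce(lambda stream, f: f(stream), filters, lines)
    return run

def drop_blank(lines):
    # filter: pass on only the non-empty lines
    return (line for line in lines if line.strip())

def lower_case(lines):
    # filter: transform each line as it flows through
    return (line.lower() for line in lines)

def sort_lines(lines):
    # filter: must see the whole stream before it can emit anything
    return iter(sorted(lines))

tidy = pipeline(drop_blank, lower_case, sort_lines)
print(list(tidy(["Talisker", "", "Ardbeg", "Lagavulin"])))
# ['ardbeg', 'lagavulin', 'talisker']
```

Each filter knows nothing about its neighbours; it only agrees on the form of the stream passing between them. That agreement is the issue raised above about whether intermediate data must be XML.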

Ant is widely used in the Java community as a build tool, but can perform XML pipelining.

To use a pipe architecture, we need component filters to carry out standard transformations.

Dapper is a tool for scraping HTML pages to create an XML or RSS feed. The neat thing about this tool is that you can give it a number of similar pages and Dapper will try to infer which data items differ page to page, and how to recognise each item. You then name the items you want to scrape and can form these into an HTML, XML or RSS feed.
RSS or Atom to PDF e.g. BBC Bristol Weather

Thursday 8 February 2007

Coursework 2

Here is the specification for the second Coursework. This is an individual assignment in which you will build a simple mashup based on GoogleEarth. This brings together the results of the workshops in which PHP and SimpleXML are used to transform RSS and generate kml, and in which a basic XML schema and data are developed.

Specification HTML Word

Wednesday 7 February 2007

Lecture and workshop week 15

In this workshop you will be preparing to create a dynamic overlay for GoogleEarth.

The key learning points are

GoogleEarth is extended with user-defined overlays, either static or dynamic
the XML vocabulary is called kml
a minimal valid kml file needs very few elements
the file must be served with a MIME type of application/vnd.google-earth.kml+xml
there are geocoding services which will translate a place name to its latitude and longitude
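To show how few elements a minimal kml file needs, here is a sketch in Python rather than the PHP used in the workshop. It assumes the KML 2.2 namespace (http://www.opengis.net/kml/2.2); files from the time of this course may declare an earlier version. The function name placemark_kml and the sample place and coordinates are illustrative.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"  # assumed version of the vocabulary
KML_MIME_TYPE = "application/vnd.google-earth.kml+xml"

def placemark_kml(name, lat, lon):
    """Build a kml document containing a single named point."""
    ET.register_namespace("", KML_NS)  # serialise with a default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    placemark = ET.SubElement(kml, f"{{{KML_NS}}}Placemark")
    ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
    point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
    # kml coordinates are longitude,latitude,altitude in that order
    ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat},0"
    return ET.tostring(kml, encoding="unicode")

# A web script would send "Content-Type: " + KML_MIME_TYPE before this body.
print(placemark_kml("Example place", 51.45, -2.59))
```

Note the order of the coordinates: longitude comes first, which is the reverse of the usual "lat, long" way of quoting a position.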

Resources

blog entry on Google Earth
my wiki entries on Geocoding and Location
slides
worksheet
simple PHP to kml script

Friday 2 February 2007

Workshop 2 - SimpleXML in PHP

Using the xpath function in SimpleXML in PHP is a bit tricky, so here is how to do the decoding:

Create an XML file like this, called bbcCodes.xml

and these PHP statements do the lookup:

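// place name supplied by the user, e.g. ?name=...
$name = $_REQUEST["name"];

// load the file of Place name/code pairs
$places = simplexml_load_file("bbcCodes.xml");

// xpath() returns an array of SimpleXMLElement objects, one per match
$codes = $places->xpath("//Place[name='$name']/code");

// print the first match, if there is one
if ($codes) print $codes[0];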

Here it is running -

This PHP script uses the xpath function, which returns an array of SimpleXMLElements (since an XPath match can in general produce a sequence of elements), so you need to pick out the first one (assuming there is only one match).
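The same lookup can be sketched in Python to make the "list of matches" point concrete. The structure of SAMPLE_XML is inferred from the XPath expression above (Place elements with name and code children); the root element name Places and the code values are made up for illustration and are not real BBC ids.

```python
import xml.etree.ElementTree as ET

# Hypothetical contents for bbcCodes.xml; the codes are placeholders.
SAMPLE_XML = """
<Places>
  <Place><name>Bristol</name><code>0001</code></Place>
  <Place><name>Bath</name><code>0002</code></Place>
</Places>
"""

def lookup_code(xml_text, name):
    """Return the code of the first Place with the given name, or None."""
    if "'" in name:
        # a quote in the name would break the quoted literal in the path
        return None
    root = ET.fromstring(xml_text)
    # findall returns a list, because a path can match many elements
    matches = root.findall(f".//Place[name='{name}']/code")
    return matches[0].text if matches else None

print(lookup_code(SAMPLE_XML, "Bath"))     # 0002
print(lookup_code(SAMPLE_XML, "Glasgow"))  # None
```

As in the PHP version, the name is pasted straight into the path expression, so input containing a quote character has to be rejected or escaped before it is used.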

Lecture week 14 - XML and XPath

In this lecture I will discuss character encoding, an issue which arose from the workshop last week. It turns out that the essential problem here is the same as the problem which namespaces try to solve - how to mix data from multiple sources (here in multiple languages).

Then we do a bit of revision on XML structures and well-formedness, introducing the XML diagrammer in QSEE.

Then I look at the basics of XPath, a language for selecting parts of an XML document.

This leads into the continuation of last week's worksheet, extending the PHP script with the means to enter a place name and get the formatted forecast for that area.

Thursday 1 February 2007

Workshop 2 - continuing with the weather feed.

Last week you wrote a PHP script using SimpleXML to fetch an RSS feed from the BBC and format a page to display the forecast which was embedded in the RSS.

You noted that the way detailed weather data was handled by the three feeds (The Weather Channel, BBC and Yahoo) was very different, which illustrates the point that merely using XML doesn't solve the problems of communicating complex data. We also encountered problems with namespaces and attributes with the Yahoo feed. However, the worksheet is about the BBC feed, so we will avoid this problem for the moment.

The last part of the worksheet asks you to parameterise the script so it can be used for different locations, identified by name. This is a problem because the feeds are identified by an id internal to the BBC.

To solve this problem you can add your own data file which contains pairs of place names and the corresponding BBC codes. This data could be held in any of several forms, for example as a simple text file or as a MySQL table, but for this part of the course you will create a small XML file to hold these pairs and then use a bit of XPath to find the matching record.

We will cover the basics of XPath in the lecture and how it is used in SimpleXML.

Next week we will be looking at creating more complex XML - kml files to create overlays for Google Earth. In preparation, please take a look at the introduction to GoogleEarth in this blog.
