Wednesday, 14 February 2007

Week 16 - designing an XML vocabulary, XML Schema

This week we turn our attention to the design of an XML document or documents. We look at designing the schema for a single document using the QSEE CASE tool to generate XML Schema: the top-down route. We also look at a bottom-up approach using Trang, which infers a schema from example documents.

The workshop looks at creating a simple XML document to describe Whisky Distilleries.
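Here is a minimal sketch of the kind of result the workshop is aiming for: a small distilleries document and a hand-written XML Schema, checked in PHP with DOMDocument::schemaValidateSource(). The element names (distilleries, distillery, name, region) are hypothetical rather than the workshop's vocabulary, and a schema generated by QSEE or Trang would look different.

```php
<?php
// Hypothetical whisky vocabulary: element names are illustrative only.
$xsd = <<<'XSD'
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="distilleries">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="distillery" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="name" type="xs:string"/>
              <xs:element name="region" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD;

$valid = <<<'XML'
<?xml version="1.0"?>
<distilleries>
  <distillery><name>Lagavulin</name><region>Islay</region></distillery>
  <distillery><name>Glenfiddich</name><region>Speyside</region></distillery>
</distilleries>
XML;

// True if $xml is well-formed and valid against the schema held in $xsd
function isValid(string $xml, string $xsd): bool
{
    libxml_use_internal_errors(true);   // collect libxml errors instead of printing warnings
    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml) && $doc->schemaValidateSource($xsd);
    libxml_clear_errors();
    return $ok;
}

echo isValid($valid, $xsd) ? "valid\n" : "invalid\n";
```

Deleting a region element, or breaking well-formedness, makes isValid() return false, which is the practical payoff of writing the schema down.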

Saturday, 10 February 2007

Pipes and Filters Architecture

Yahoo have recently launched Pipes, a visual programming environment for creating a mashup RSS feed from user inputs and available RSS sources. XML languages for defining pipelines are also emerging.

Several bloggers have written on the subject.
Issues with these languages include:
  • whether the pipeline itself is expressed in XML (and thus processable with XML tools)
  • whether non-XML data streams are allowed, for example where an intermediate file is not XML (e.g. Graphviz dot) or the final output is not XML (e.g. a GIF image)
The concept of a pipe, in which the output of one process is connected to the input of another, originates in the Unix operating system (the Unix pipe).

Ant is widely used in the Java community as a build tool, but it can also perform XML pipelining, for example by chaining its xslt tasks.

To use a pipe architecture, we need component filters to carry out standard transformations.
  • Dapper is a tool for scraping HTML pages to create an XML or RSS feed. The neat thing about this tool is that you can give it a number of similar pages and Dapper will try to infer which data items differ from page to page, and how to recognise each one. You then name the items you want to scrape and can form them into an HTML, XML or RSS feed.
  • RSS or Atom to PDF e.g. BBC Bristol Weather
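The pipes-and-filters idea can be sketched in PHP: each filter is a function whose output becomes the next filter's input, much as a Unix pipe connects processes. This is a minimal in-process sketch, not how Yahoo Pipes or any XML pipeline language actually works; the pipeline() helper, the filters and the tiny RSS sample are all hypothetical.

```php
<?php
// Pipes and filters sketch: each filter takes the previous stage's output as input.
function pipeline(callable ...$filters): callable
{
    return function ($input) use ($filters) {
        // Feed the input through each filter in turn
        return array_reduce(
            $filters,
            fn($data, callable $filter) => $filter($data),
            $input
        );
    };
}

// Hypothetical filters: parse RSS, extract item titles, sort them, join into text
$parse  = fn(string $rss): SimpleXMLElement => new SimpleXMLElement($rss);
$titles = function (SimpleXMLElement $feed): array {
    $out = [];
    foreach ($feed->channel->item as $item) {
        $out[] = (string) $item->title;
    }
    return $out;
};
$sort = function (array $lines): array {
    sort($lines);
    return $lines;
};
$join = fn(array $lines): string => implode("\n", $lines);

$titlesOnly = pipeline($parse, $titles, $sort, $join);

$rss = '<rss version="2.0"><channel><title>Demo</title>'
     . '<item><title>Rain later</title></item>'
     . '<item><title>Bright start</title></item>'
     . '</channel></rss>';
echo $titlesOnly($rss), "\n";   // "Bright start" then "Rain later"
```

Because each stage only sees its input, stages can be reused or reordered freely. Unlike a Unix pipe, though, each stage here hands a whole value to the next rather than streaming.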

Thursday, 8 February 2007

Coursework 2

Here is the specification for the second coursework. This is an individual assignment in which you will build a simple mashup based on GoogleEarth. It brings together the results of the workshops in which PHP and SimpleXML are used to transform RSS and generate kml, and in which a basic XML schema and data are developed.

The specification is available in HTML and Word formats.

Wednesday, 7 February 2007

Lecture and workshop week 15

In this workshop you will be preparing to create a dynamic overlay for GoogleEarth.

The key learning points are

  • GoogleEarth is extended with user defined overlays, either static or dynamic
  • the XML vocabulary is called kml
  • a minimal valid kml file needs very few elements
  • the file must be served with the MIME type application/vnd.google-earth.kml+xml
  • there are geocoding services which will translate a place name to its latitude and longitude
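A minimal kml file can be generated with SimpleXML, as in the sketch below. It assumes the KML 2.2 namespace (files written for the GoogleEarth of 2007 often used http://earth.google.com/kml/2.1 instead), and the minimalKml() function name is hypothetical. Note that kml coordinates put longitude before latitude, which is easy to get wrong when copying values from a geocoding service.

```php
<?php
// Sketch: build a minimal KML document containing one Placemark.
// Assumption: KML 2.2 namespace; older files used http://earth.google.com/kml/2.1.
const KML_NS = 'http://www.opengis.net/kml/2.2';

function minimalKml(string $name, float $lat, float $lon): string
{
    $kml = new SimpleXMLElement('<kml xmlns="' . KML_NS . '"/>');
    $placemark = $kml->addChild('Placemark');   // children inherit the KML namespace
    $placemark->addChild('name', htmlspecialchars($name));
    // KML coordinates are "longitude,latitude[,altitude]"
    $placemark->addChild('Point')->addChild('coordinates', "$lon,$lat");
    return $kml->asXML();
}

// When served over HTTP, send the KML MIME type so GoogleEarth opens the file
if (PHP_SAPI !== 'cli') {
    header('Content-Type: application/vnd.google-earth.kml+xml');
}
echo minimalKml('Bristol', 51.45, -2.59);   // approximate coordinates
```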

Friday, 2 February 2007

Workshop 2 - SimpleXML in PHP

Using the xpath function in SimpleXML in PHP is a bit tricky, so here is how to do the decoding:

Create an XML file called bbcCodes.xml, containing Place elements that each have a name and a code child,

and these PHP statements do the lookup:

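```php
<?php
// Look up the BBC location code for a place name passed in the request
$name = $_REQUEST["name"] ?? "";
// An apostrophe in the name would end the XPath string literal early, so drop it
$name = str_replace("'", "", $name);

$places = simplexml_load_file("bbcCodes.xml");

$codes = $places->xpath("//Place[name='$name']/code");

// xpath() returns an array of matches, so take the first one if there is any
print $codes ? $codes[0] : "Unknown place: " . htmlspecialchars($name);
```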


The script uses SimpleXML's xpath() method, which returns an array of SimpleXMLElement objects (since an XPath expression can match several elements), so you need to pick out the first one, assuming there is exactly one match.
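Since the contents of bbcCodes.xml are not reproduced here, the sketch below assumes a hypothetical layout (a Places root holding Place elements with name and code children, which is what the XPath expression in the lookup expects) and uses made-up placeholder codes rather than real BBC ids. It wraps the lookup in a function that returns null when a place is not listed, instead of failing on $codes[0].

```php
<?php
// Hypothetical contents of bbcCodes.xml; the codes are placeholders, not real BBC ids.
$xml = <<<'XML'
<Places>
  <Place><name>Bristol</name><code>0001</code></Place>
  <Place><name>Bath</name><code>0002</code></Place>
</Places>
XML;

// Return the code for $name, or null if the place is not listed
function bbcCode(SimpleXMLElement $places, string $name): ?string
{
    // Drop apostrophes so user input cannot break out of the XPath string literal
    $name = str_replace("'", '', $name);
    $codes = $places->xpath("//Place[name='$name']/code");
    // xpath() gives an array of matches (possibly empty), or false/null on error
    return $codes ? (string) $codes[0] : null;
}

$places = simplexml_load_string($xml);   // in the workshop: simplexml_load_file("bbcCodes.xml")
var_dump(bbcCode($places, 'Bath'));      // string(4) "0002"
var_dump(bbcCode($places, 'Paris'));     // NULL
```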

Lecture week 14 - XML and XPath

In this lecture I will discuss character encoding, an issue which arose from the workshop last week. It turns out that the essential problem here is the same as the one namespaces try to solve: how to mix data from multiple sources (here, in multiple languages).
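A small sketch of why declaring the encoding matters, assuming each source labels its encoding correctly in the XML declaration: the same place name arrives once as UTF-8 and once as ISO-8859-1, and because the parser converts both to UTF-8 (the encoding SimpleXML always hands back), the two values can be compared or merged directly. The place document itself is hypothetical.

```php
<?php
// The same name, "Café", encoded two different ways by two hypothetical sources
$utf8   = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><place><name>Caf\u{E9}</name></place>";
$latin1 = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><place><name>Caf\xE9</name></place>";

$a = simplexml_load_string($utf8);
$b = simplexml_load_string($latin1);

// The declared encoding tells the parser how to read the bytes; SimpleXML returns UTF-8
var_dump((string) $a->name === (string) $b->name);   // bool(true)
```

Without the encoding declaration, the single byte 0xE9 in the second document would not be valid UTF-8, which is the kind of failure seen in the workshop.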

Then we do a bit of revision on XML structures and well-formedness, introducing the XML diagrammer in QSEE.

Then I look at the basics of XPath, a language for selecting parts of an XML document.
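To try out the basics, here is a sketch running a few XPath expressions through SimpleXML's xpath() method against a small forecast document. The document and its element names are hypothetical; the expressions show an absolute path, a search anywhere in the document with //, predicates on an attribute and on a child's value, and selection by position.

```php
<?php
// A small hypothetical forecast document to try XPath expressions against
$xml = <<<'XML'
<forecast location="Bristol">
  <day name="Monday"><summary>Sunny</summary><max unit="C">12</max></day>
  <day name="Tuesday"><summary>Rain</summary><max unit="C">9</max></day>
</forecast>
XML;
$doc = simplexml_load_string($xml);

$queries = [
    '/forecast/day/summary',          // absolute path from the root
    '//max',                          // max elements anywhere in the document
    '//day[@name="Tuesday"]/summary', // predicate on an attribute
    '//day[max > 10]/@name',          // predicate comparing a child's value
    '/forecast/day[1]/summary',       // position (XPath counts from 1)
];
foreach ($queries as $q) {
    // Each match is a SimpleXMLElement; strval gives its text (or attribute value)
    $values = array_map('strval', $doc->xpath($q));
    echo str_pad($q, 32), implode(', ', $values), "\n";
}
```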

This leads into the continuation of last week's worksheet, extending the PHP script with the means to enter a place name and get the formatted forecast for that area.

Thursday, 1 February 2007

Workshop 2 - continuing with the weather feed.

Last week you wrote a PHP script using SimpleXML to fetch an RSS feed from the BBC and formatted a page to display the forecast which was embedded in the RSS.
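As a reminder of the shape of that script, here is a minimal sketch that turns the items of an RSS 2.0 feed into an HTML fragment. In the workshop the feed is fetched with simplexml_load_file() from the BBC; here a small hypothetical feed is inlined so the example runs offline, and the formatFeed() name is illustrative.

```php
<?php
// Turn the channel title and items of an RSS 2.0 feed into simple HTML
function formatFeed(SimpleXMLElement $rss): string
{
    $html = '<h1>' . htmlspecialchars((string) $rss->channel->title) . "</h1>\n";
    foreach ($rss->channel->item as $item) {
        // Escape text so characters such as & or < cannot break the page
        $html .= '<h2>' . htmlspecialchars((string) $item->title) . "</h2>\n";
        $html .= '<p>' . htmlspecialchars((string) $item->description) . "</p>\n";
    }
    return $html;
}

// Hypothetical feed content standing in for the BBC forecast
$rss = simplexml_load_string(
    '<rss version="2.0"><channel><title>Forecast for Bristol</title>'
    . '<item><title>Monday: Sunny</title><description>Max Temp: 12C</description></item>'
    . '</channel></rss>'
);
echo formatFeed($rss);
```

If a feed embeds HTML markup in its description, escaping it like this will show the tags as text, so you may prefer to output that element unescaped once you trust the source.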

You noted that the three feeds (The Weather Channel, BBC and Yahoo) handle detailed weather data in very different ways, which illustrates the point that merely using XML doesn't solve the problem of communicating complex data. We also ran into problems with namespaces and attributes in the Yahoo feed. However, the worksheet is about the BBC feed, so we will avoid this problem for the moment.
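For when you do come back to it, the namespace problem can be reproduced with a cut-down, hypothetical item in the style of the Yahoo feed, where weather values are attributes of an element in a separate namespace. The namespace URI below is an assumption about the yweather namespace, and the values are made up. The point is that SimpleXML only sees a namespaced element once you pass its URI to children(), while the element's unprefixed attributes are in no namespace and are read with a plain attributes() call.

```php
<?php
// Hypothetical cut-down item modelled on the Yahoo weather feed; values are made up
$ns = 'http://xml.weather.yahoo.com/ns/rss/1.0';   // assumed yweather namespace URI
$xml = '<rss version="2.0" xmlns:yweather="' . $ns . '"><channel><item>'
     . '<title>Conditions</title>'
     . '<yweather:condition text="Cloudy" temp="11"/>'
     . '</item></channel></rss>';
$rss = simplexml_load_string($xml);
$item = $rss->channel->item;

var_dump(isset($item->condition));              // bool(false): invisible without the namespace
$condition = $item->children($ns)->condition;   // select children in the yweather namespace
$attrs = $condition->attributes();              // unprefixed attributes have no namespace
echo $attrs['text'], ', ', $attrs['temp'], "\n";
```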

In the last part of the worksheet, you are asked to parameterise the script so it can be used for different locations, identified by name. This is a problem because the feeds are identified by an id internal to the BBC.

To solve this problem you can add your own data file containing pairs of place names and the corresponding BBC codes. This data could be held in any of several forms, such as a simple text file or a MySQL table, but for this part of the course you will create a small XML file to hold these pairs and then use a bit of XPath to find the matching record.

We will cover the basics of XPath in the lecture and how it is used in SimpleXML.

Next week we will be looking at creating more complex XML - kml files to create overlays for Google Earth. In preparation, please take a look at the introduction to GoogleEarth in this blog.