Showing posts with label RSS. Show all posts

Saturday, 10 February 2007

Pipes and Filters Architecture

Yahoo have recently launched Pipes, a visual programming environment for creating a mashup RSS feed from user inputs and available RSS sources. XML languages for defining pipelines are also emerging.

Here are some bloggers on the subject:
Issues with these languages include:
  • whether the pipeline itself is expressed in XML (and thus processable with XML tools)
  • whether non-XML data streams are allowed, for example where an intermediate file is non-XML (e.g. Graphviz dot) or the output is non-XML (e.g. a GIF image)
The concept of a pipe, in which the output of one process is connected to the input of another, originated in the Unix operating system - Unix Pipe
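The pipe idea can be sketched in a few lines of code. The example below is only an illustration in Python, not part of Yahoo Pipes or any XML pipeline language: the filter names (strip_blank, upper, number) and the pipeline helper are hypothetical. Each filter takes a stream of lines and yields a new stream, and the pipeline connects the output of one filter to the input of the next.

```python
from functools import reduce


def strip_blank(lines):
    """Filter: drop lines that are empty or only whitespace."""
    return (line for line in lines if line.strip())


def upper(lines):
    """Filter: convert each line to upper case."""
    return (line.upper() for line in lines)


def number(lines):
    """Filter: prefix each line with its position in the stream."""
    return (f"{i}: {line}" for i, line in enumerate(lines, start=1))


def pipeline(source, *filters):
    """Feed the output of each filter into the input of the next one."""
    return reduce(lambda stream, f: f(stream), filters, source)


result = list(pipeline(["sunny", "", "light rain"], strip_blank, upper, number))
print(result)
```

Because every filter has the same shape (a stream in, a stream out), filters can be reordered or swapped without changing the others, which is the main attraction of the architecture.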

Ant is widely used in the Java community as a build tool, but it can also perform XML pipelining.

To use a pipe architecture, we need component filters to carry out standard transformations.
  • Dapper is a tool for scraping HTML pages to create an XML or RSS feed. The neat thing about this tool is that you can give it a number of similar pages and Dapper will try to infer which data items differ from page to page, and how to recognise each item. You then name the items you want to scrape and can form these into an HTML, XML or RSS feed.
  • RSS or Atom to PDF e.g. BBC Bristol Weather

Thursday, 1 February 2007

Workshop 2 - continuing with the weather feed.

Last week you wrote a PHP script that used SimpleXML to fetch an RSS feed from the BBC and format a page displaying the forecast embedded in the RSS.
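As a reminder of the shape of that script, here is a rough equivalent sketched in Python's standard library rather than in PHP and SimpleXML. The feed text is a made-up sample, not real BBC output, and forecast_html is a hypothetical helper name. A real script would fetch the feed over HTTP instead of using an embedded string.

```python
import xml.etree.ElementTree as ET
from html import escape

# Made-up sample in the general shape of an RSS 2.0 feed (not real BBC data).
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Sample forecast feed</title>
    <item>
      <title>Saturday: sunny intervals</title>
      <description>Max Temp: 9C, Min Temp: 3C</description>
    </item>
    <item>
      <title>Sunday: light rain</title>
      <description>Max Temp: 8C, Min Temp: 4C</description>
    </item>
  </channel>
</rss>"""


def forecast_html(rss_text):
    """Turn the items of an RSS feed into a small HTML fragment."""
    channel = ET.fromstring(rss_text).find("channel")
    parts = [f"<h1>{escape(channel.findtext('title', default=''))}</h1>", "<ul>"]
    for item in channel.findall("item"):
        title = item.findtext("title", default="")
        description = item.findtext("description", default="")
        # Escape the text so feed content cannot inject markup into the page.
        parts.append(f"<li><b>{escape(title)}</b>: {escape(description)}</li>")
    parts.append("</ul>")
    return "\n".join(parts)


page = forecast_html(SAMPLE_RSS)
print(page)
```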

You noted that the three feeds (The Weather Channel, BBC and Yahoo) handle detailed weather data very differently, which illustrates the point that merely using XML doesn't solve the problems of communicating complex data. We also encountered problems with namespaces and attributes in the Yahoo feed. However, the worksheet is about the BBC feed, so we will avoid this problem for the moment.
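To show the kind of namespace and attribute problem meant here, the sketch below uses Python's standard library instead of PHP. The sample XML, the wx prefix and the namespace URI http://example.com/ns/weather are all invented for illustration and are not the real Yahoo feed. The point is that a namespaced element is not found by its plain name, and that its data sits in attributes rather than in element text.

```python
import xml.etree.ElementTree as ET

# Invented sample: the prefix and namespace URI are placeholders, not Yahoo's.
SAMPLE = """<rss version="2.0" xmlns:wx="http://example.com/ns/weather">
  <channel>
    <item>
      <title>Conditions for Bristol</title>
      <wx:condition text="Cloudy" temp="7"/>
    </item>
  </channel>
</rss>"""

# Map the prefix used in our queries to the namespace URI declared in the feed.
NS = {"wx": "http://example.com/ns/weather"}

item = ET.fromstring(SAMPLE).find("channel/item")

# Looking up the element without its namespace finds nothing.
plain = item.find("condition")

# With the namespace mapping, the element is found.
condition = item.find("wx:condition", NS)

# The values are attributes, so they are read with get(), not from element text.
text = condition.get("text")
temp = int(condition.get("temp"))
print(plain, text, temp)
```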

The last part of the worksheet asks you to parameterise the script so it can be used for different locations, identified by name. This is a problem because the feeds are identified by an id internal to the BBC.

To solve this problem you can add your own data file containing pairs of place names and the corresponding BBC codes. This data could be held in any of several forms, such as a simple text file or a MySQL table, but for this part of the course you will create a small XML file to hold these pairs and then use a bit of XPath to find the matching record.
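One possible shape for such a file and lookup is sketched below in Python's standard library, rather than the PHP and SimpleXML you will use in the workshop. The element names (places, place, name, code), the codes 0001 and 0002, and the bbc_code function are all hypothetical; real BBC ids will differ. The lookup uses an XPath predicate to select the place whose name child matches.

```python
import xml.etree.ElementTree as ET

# Hypothetical data file: element names and codes are invented for illustration.
PLACES_XML = """<places>
  <place><name>Bristol</name><code>0001</code></place>
  <place><name>Bath</name><code>0002</code></place>
</places>"""


def bbc_code(place_name, xml_text=PLACES_XML):
    """Return the code paired with place_name, or None if there is no match."""
    if "'" in place_name or '"' in place_name:
        # Quotes would break the XPath string literal built below.
        raise ValueError("place name must not contain quote characters")
    root = ET.fromstring(xml_text)
    # XPath predicate: select the <place> whose <name> child has this text.
    place = root.find(f"place[name='{place_name}']")
    return None if place is None else place.findtext("code")


print(bbc_code("Bath"))
```

The returned code can then be substituted into the feed address, so the rest of the script does not need to know about the lookup file.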

We will cover the basics of XPath in the lecture and how it is used in SimpleXML.

Next week we will be looking at creating more complex XML: KML files that create overlays for Google Earth. In preparation, please take a look at the introduction to Google Earth in this blog.

Tuesday, 23 January 2007

Workshop 1 Term 2 - RSS and PHP

A voice message (2 min 16 secs)

In this workshop we will continue working with PHP and the SimpleXML class, using this approach to transform data from an RSS feed (a weather feed from the BBC).

You will also compare three sources of data (the Weather Channel, Yahoo and the BBC) to identify differences in both the structure and the content of these data sources, and explore the reasons for these differences.

You will also be introduced to the notion of namespaces and the basics of location data, in preparation for work with Google Earth in the next workshop.

Monday, 22 January 2007

Lecture Week 14

In the lecture this Friday we will cover the following topics:
  • a recap of trees, XML and the SimpleXML interface in PHP
  • an overview of the schedule for this term
  • outline of the coursework for this term
  • introduction to the workshop on RSS
Attendance was very poor for the last three lectures of last term. I appreciate that you all put a lot of work into the assignment. Those who missed them will, we assume, have looked at (or even listened to) the missed lectures and the workshop sessions, which are all available from this blog.