- simple to complex
- taking the HTML source of the Periodic table
- loading it into the eXist database. eXist accepts the source even though it is not well-formed XML (missing quotes, bare angle brackets)
- writing a query in XQuery to generate the page:
  - find the HTML document with 'Periodic' in the title
  - find all the A tags
  - get the onmouseover attribute
  - use some string functions to extract the name and the image source from this string
  - sort by name
  - generate a div per tag
List of methods
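(: every onmouseover value on A tags in the document whose title contains 'Periodic' :)
for $item in data(/HTML[contains(.//TITLE,'Periodic')]//A/@onmouseover)
(: the method name sits between window.status=' and '; :)
let $name := lower-case(substring-before(substring-after($item,"window.status='"),"';"))
(: the image source sits between src=" and "> :)
let $pix := substring-before(substring-after($item,'src="'),'">')
where string-length($pix) > 0
order by $name
(: assumed shape for the generated div; the actual page markup is not shown here :)
return <div class="method"><img src="{$pix}" alt="{$name}"/> {$name}</div>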
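To see what the two string expressions do, here is a small worked sketch on a single value. The onmouseover text is made up for illustration (the real attribute may look different); only the window.status=' and src=" markers are taken from the query above.

```xquery
(: Hypothetical onmouseover value, not copied from the real page.
   Inside a double-quoted XQuery string, a literal " is written as "". :)
let $item := "window.status='Abstract'; return show('<img src=""img/abstract.png"">');"

(: text between window.status=' and '; , lower-cased :)
let $name := lower-case(
    substring-before(substring-after($item, "window.status='"), "';"))

(: text between src=" and "> :)
let $pix := substring-before(substring-after($item, 'src="'), '">')

return <result name="{$name}" pix="{$pix}"/>
```

For this sample value the query returns `<result name="abstract" pix="img/abstract.png"/>`. If either marker is missing, substring-after and substring-before return an empty string, which is why the main query filters on string-length($pix) > 0.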
In fact, instead of running a query against the raw HTML, I wrote a slightly different query to generate a simple XML file in which the basic data is stored in alphabetical order. Using an intermediate file also allowed me to correct a couple of typos in the method names, and of course it makes the page faster to generate. In addition, I've added the facility for a user to group methods and tag the group. Some links to Google Images and Wikipedia have been added too. A lot more could be done with this.
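A query that builds such an intermediate file might look roughly like the sketch below. It reuses the extraction from the query above; the element names methods, method, name and image are my own placeholders, not the structure of the actual file.

```xquery
(: Sketch only: the element names below are assumptions chosen for illustration. :)
<methods>{
    for $item in data(/HTML[contains(.//TITLE, 'Periodic')]//A/@onmouseover)
    let $name := lower-case(
        substring-before(substring-after($item, "window.status='"), "';"))
    let $pix := substring-before(substring-after($item, 'src="'), '">')
    (: skip A tags whose onmouseover carries no image :)
    where string-length($pix) > 0
    (: store the entries already sorted, so the page query need not sort again :)
    order by $name
    return
        <method>
            <name>{$name}</name>
            <image>{$pix}</image>
        </method>
}</methods>
```

Once the result is saved as a document in the database, the page query only has to walk the method elements and emit a div for each, and typos in the names can be fixed by editing the file directly.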
What would be nice now is to get the raw data, including the class names, as XML, so that it could be reorganised and extended without having to resort to scraping.