Last year as part of the JISC MOSAIC competition, I put together a script which allowed you to search the online UCAS catalogue using a course code, and get an XML response. The XML it returned was just a basic format which suited my purposes at the time, and in the comments I gave the following response to Alan Paull who mentioned XCRI:
I’m aware of the XCRi model and the XCRi-CAP work, and did wonder if I could output my scraped results in this format, but in the end decided for something quicker and dirtier for my purposes.
XCRI (eXchanging Course Related Information) is a JISC funded initiative to “establish a specification to support the exchange of course-related information”. This has established an XML specification intended to enable courses to be advertised or listed in a consistent manner – this is called ‘XCRI-CAP’ (Course Advertising Profile). A number of projects and institutions have implemented XCRI-CAP (a list of projects is available from the CETIS website).
The key thing for me about this approach is the idea that if all institutions (let’s say UK HE institutions, but XCRI-CAP is not sector specific) published their course catalogue following this specification, it would be a relatively simple matter to use, aggregate, disaggregate and reuse this data.
I’ve wanted to get back to this for a while, and finally got round to it, so you can now get results from the script in XCRI-CAP. I have to admit that I’ve a slight confusion as to what makes valid XCRI-CAP – I’ve run the results through the validator blogged by David Sherlock, and get a small number of warnings regarding the lack of ‘descriptions’ for each provider I list. However, the XCRI wiki entry for the provider element suggests that the Description is ‘optional’ (although it then says it ‘should’ be provided).
The script is at:
The script accepts four parameters described here:
- If left blank, results will be returned in the default XML format (not xcri-cap) – documented below
- If set to the value ‘xcri-cap’, the results will be returned in xcri-cap XML – see notes below. If there is an error, the script will fall back to the default XML format documented below
- Accepts a UCAS course code (4 alphanumeric characters), which is used to search the online UCAS catalogue
- Accepts a year in the format YYYY
- If no year is given, this is left blank
- UCAS supports searches against more than one catalogue at a time, to enable searching against the current and coming year. If left blank, as far as I can tell, this defaults to the catalogue for the current year (at time of writing, 2010)
- The UCAS website uses a session identifier in all URLs called the ‘stateID’
- If a stateID is supplied to the script, it will use it (unless it turns out to be invalid)
- If no stateID is supplied, or the stateID supplied is invalid, the script will obtain a new stateID
- If you are making repeated requests to the script, it would be ‘polite’ to take the stateID from the first response and reuse it in subsequent requests, so the script isn’t constantly starting new sessions on the UCAS website
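As a rough illustration of that ‘polite’ reuse, here is a minimal Python sketch. The query parameter names (`course_code`, `stateID`) and the `fetch` callable are placeholders of mine, not the script’s real interface, so swap in the real ones. It also assumes the response is in the default XML format, where the root element carries a `ucas_stateid` attribute (as in the recap further down).

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


def build_query(course_code, state_id=None):
    """Build a query string; the parameter names here are hypothetical."""
    params = {"course_code": course_code}
    if state_id:
        # Only send a stateID once we have one from an earlier response.
        params["stateID"] = state_id
    return urlencode(params)


def search_many(course_codes, fetch):
    """Look up several course codes, reusing the session between requests.

    `fetch` is any callable that takes a query string and returns the
    script's XML response as text (for example a thin wrapper around
    urllib.request.urlopen pointed at the script's URL).
    """
    state_id = None
    results = {}
    for code in course_codes:
        root = ET.fromstring(fetch(build_query(code, state_id)))
        # Keep whatever stateID the script reports back, so the next
        # request continues the same UCAS session if it is still valid.
        state_id = root.get("ucas_stateid") or state_id
        results[code] = root
    return results
```

Passing `fetch` in as an argument keeps the sketch independent of the script’s actual address, and makes it easy to try out with a stub before pointing it at the real thing.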
So a valid request to the script could be:
In terms of output, there are two formats, the default XML, and the XCRI-CAP XML.
I’m outputting a minimal amount of data, as I’ve limited myself to scraping only information from the UCAS catalogue search results page. This means I’m currently including only the following elements:
<identifier /> (I suspect I’ve got a problem here. I’m using the UCAS identifier, which I can’t really find any information about. From the XCRI wiki it looks like I need to be using a URI here)
<url /> (I’m using the URL for the UCAS page for the institution. This includes the stateID, because linking to the UCAS page requires a valid session. That isn’t ideal, as the link is only valid for a limited period of time [now amended to use a different URL to the UCAS web page which does not include stateID])
<identifier /> (I’m using the UCAS identifier for the course. Again, from the wiki it looks like I should be using a URI)
<url /> (I’m using the URL for the UCAS page for the course. This includes the stateID, because linking to the UCAS page requires a valid session. That isn’t ideal, as the link is only valid for a limited period of time)
I am looking at whether I can get more information, but adding to what I currently return would mean making further requests to the UCAS website, scraping other pages to supplement the basic information available on the search results page.
The default XML format is documented in my previous blog post, but just to recap:
<ucas_course_results course_code="R901" catalogue_year="2010" ucas_stateid="DtDdAozqXysV4GeQbRbhP3DxTGR2m-3eyl">
<institution code="P80" name="University of Portsmouth">
<name>Combined Modern Languages</name>
Note that you get the ucas_stateid returned, so it can be reused in future requests. Finally, if there are any errors, these will always be returned in the default XML format (even if you request xcri-cap format):
<ucas_course_results course_code="" catalogue_year="" ucas_stateid="">
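If you want to pick the default format apart in code, something like the following Python sketch should do. It only relies on the attributes shown above (`course_code`, `catalogue_year`, `ucas_stateid` on the root, `code` and `name` on each institution). The closing tags in the sample are my own addition to make it well-formed, and the function name is just illustrative. The snippets above don’t show how an error is described inside the response, so the sketch doesn’t try to detect one; it simply returns empty values when the attributes are empty.

```python
import xml.etree.ElementTree as ET

# Sample built from the attributes shown above; the self-closing
# institution element and the closing root tag are assumptions made
# purely so that the snippet parses.
SAMPLE = """<ucas_course_results course_code="R901" catalogue_year="2010" ucas_stateid="DtDdAozqXysV4GeQbRbhP3DxTGR2m-3eyl">
  <institution code="P80" name="University of Portsmouth" />
</ucas_course_results>"""


def summarise(xml_text):
    """Pull the root attributes and institution list out of a response."""
    root = ET.fromstring(xml_text)
    return {
        "course_code": root.get("course_code", ""),
        "catalogue_year": root.get("catalogue_year", ""),
        "state_id": root.get("ucas_stateid", ""),
        # iter() finds institution elements at any depth under the root.
        "institutions": [
            (inst.get("code"), inst.get("name"))
            for inst in root.iter("institution")
        ],
    }


summary = summarise(SAMPLE)
print(summary["state_id"], summary["institutions"])
```

The `state_id` value is the one to hang on to if you plan to make further requests.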