UCAS Course code lookup

While I was writing my entry for the JISC MOSAIC competition (which I will write up more thoroughly in a later post I promise – honest), one of the problems I encountered was retrieving details of courses and institutions from the UCAS website. Unfortunately UCAS don’t seem to provide a nice API to their catalogue of course/institution data. To extract the data I was going to have to scrape it out of their HTML pages. Even more unfortunately they require a session ID before you can successfully get back search results – this means you essentially have to start a session on the website and retrieve the session ID before you can start to do a search.

I hacked together something to get what I needed for the MOSAIC competition. However, I wasn’t the only person who had this problem: in a blog entry on his MOSAIC entry, Tony Hirst notes the same problem. At the time Tony asked if I would be making what I’d done available, and I was very happy to. Unfortunately, the way I’d done it meant I couldn’t expose just the UCAS course code search. I started to rewrite the code, but writing something that I could share with other people, with appropriate error checking and feedback, proved more challenging than my original dirty hack.

I’ve finally got round to it – it works as follows:

The service is at http://www.meanboyfriend.com/readtolearn/ucas_code_search?
The service currently accepts two parameters:

  • course_code
  • catalogue_year

The course_code parameter simply accepts a UCAS course code. I haven’t been able to find out what the course code format is restricted to, but it looks like it is a maximum of 4 alphanumeric characters, so this is what the script accepts. Assuming the code meets this criterion, the script passes it directly to the UCAS catalogue search. The UCAS catalogue doesn’t seem to care whether alpha characters are upper or lower case and treats them as equivalent. For some examples of UCAS codes, you can see this list provided by Dave Pattern. (See Addendum 2 for more information on UCAS course codes and JACS.)

The catalogue_year parameter takes the year in the format yyyy. If no value is given then the UCAS catalogue seems to default to the current year (2010 at the moment). If an invalid year is given the UCAS catalogue also seems to default to the current year. It seems that at most only two years are valid at a single time. However the script doesn’t check any of this: as long as it gets a four-digit year, it passes it on to the UCAS catalogue search.

An example is http://www.meanboyfriend.com/readtolearn/ucas_code_search/?course_code=R901&catalogue_year=2010
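As a rough sketch of how a client might build that request URL, the following Python applies the same loose checks described above before constructing the query string. The function name `build_lookup_url` is made up for illustration, and the minimum length of one character is my assumption (the post only states a maximum of four); the real script may behave differently.

```python
import re
from urllib.parse import urlencode

# Base URL taken from the example above; everything else is illustrative.
SERVICE_URL = "http://www.meanboyfriend.com/readtolearn/ucas_code_search/"


def build_lookup_url(course_code, catalogue_year=None):
    """Build a lookup URL (hypothetical helper, not part of the service)."""
    # Assumed check: 1 to 4 alphanumeric characters, either case.
    if not re.fullmatch(r"[A-Za-z0-9]{1,4}", course_code):
        raise ValueError("course_code must be 1 to 4 alphanumeric characters")
    params = {"course_code": course_code}
    if catalogue_year is not None:
        year = str(catalogue_year)
        # Only the shape is checked, mirroring the script: four digits.
        if not re.fullmatch(r"[0-9]{4}", year):
            raise ValueError("catalogue_year must be four digits (yyyy)")
        params["catalogue_year"] = year
    return SERVICE_URL + "?" + urlencode(params)


print(build_lookup_url("R901", 2010))
```

Leaving out catalogue_year simply omits the parameter, which lets the UCAS catalogue fall back to its own default year.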

The script’s output is xml of the form:

<ucas_course_results course_code="" catalogue_year="" ucas_stateid="">
<institution code="" name="">
<course_name>xxxx</course_name> (repeatable)

(I’ve made a slight change to the output structure since the original publication of this post)
(Finally I’ve added a couple of extra elements inst_ucas_url and course_ucas_url which provide links to the institution and course records on the UCAS website respectively)

<ucas_course_results course_code="" catalogue_year="" ucas_stateid="">
<institution code="" name="">
<inst_ucas_url>[URL for Institution record on UCAS website]</inst_ucas_url>
<course ucas_catalogue_id=""> (repeatable) (the ucas_catalogue_id is not currently populated; see Addendum 1)
<course_ucas_url>[URL for course record on UCAS website]</course_ucas_url>

For example:

<ucas_course_results course_code="R901" catalogue_year="2010" ucas_stateid="DtDdAozqXysV4GeQbRbhP3DxTGR2m-3eyl">
<institution code="P80" name="University of Portsmouth">
<course_name>Combined Modern Languages</course_name>

(I’ve made a slight change to the output structure since the original publication of this post)

<ucas_course_results course_code="R901" catalogue_year="2010" ucas_stateid="DtDdAozqXysV4GeQbRbhP3DxTGR2m-3eyl">
<institution code="P80" name="University of Portsmouth">
<course ucas_catalogue_id=""> (the ucas_catalogue_id is not currently populated; see Addendum 1)
<name>Combined Modern Languages</name>

The values passed to the script, and the StateID for the UCAS website, are returned in the response.

If there is an error at some point in the process, an error message will be included in the response in an <error> tag.
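To give an idea of how the response might be consumed, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The sample document is built from the example values above, but the closing tags and the exact nesting (course inside institution, name inside course) are my assumption, since the fragments in this post are abbreviated. The function name `parse_results` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed full form of the example response; nesting and closing tags are guesses.
SAMPLE = """<ucas_course_results course_code="R901" catalogue_year="2010"
    ucas_stateid="DtDdAozqXysV4GeQbRbhP3DxTGR2m-3eyl">
  <institution code="P80" name="University of Portsmouth">
    <course ucas_catalogue_id="">
      <name>Combined Modern Languages</name>
    </course>
  </institution>
</ucas_course_results>"""


def parse_results(xml_text):
    """Return (institution code, institution name, course name) tuples."""
    root = ET.fromstring(xml_text)
    # An <error> element anywhere in the response means the lookup failed.
    error = next(root.iter("error"), None)
    if error is not None:
        raise RuntimeError("UCAS lookup failed: " + (error.text or "").strip())
    rows = []
    for inst in root.iter("institution"):
        for course in inst.iter("course"):
            rows.append(
                (inst.get("code"), inst.get("name"), course.findtext("name", default=""))
            )
    return rows


for row in parse_results(SAMPLE):
    print(row)
```

Checking for <error> first keeps the failure case separate from an empty but valid result set.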

Addendum 1
The script relies on the HTML returned by UCAS remaining consistent. If this changes, my script will probably break.

Having done the hard work I’d be happy to offer alternative formats for the data returned by the script – just let me know in the comments. I’d also be happy to look at different XML structures for the data so again just leave a comment.

Something I should have mentioned in the original post: given the data returned by the script, you should be able to link to an institution on the UCAS website using a URL of the form:
http://search.ucas.com/cgi-bin/hsrun/search/search/StateId/<insert state ID from xml here>/HAHTpage/search.HsInstDetails.run?i=<insert institution code here>
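A small Python sketch of filling in that pattern follows. The function name `institution_url` is made up for illustration, and percent-encoding the two values is my own precaution rather than something the UCAS site is known to require.

```python
from urllib.parse import quote


def institution_url(state_id, institution_code):
    """Fill the URL pattern above with a state ID and institution code (hypothetical helper)."""
    return (
        "http://search.ucas.com/cgi-bin/hsrun/search/search/StateId/"
        + quote(state_id, safe="")
        + "/HAHTpage/search.HsInstDetails.run?i="
        + quote(institution_code, safe="")
    )


# Values taken from the example response earlier in the post.
print(institution_url("DtDdAozqXysV4GeQbRbhP3DxTGR2m-3eyl", "P80"))
```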

Since finishing this work last night I’ve realised that I’ve left out one important piece of data which is an identifier that would let you form a link to a specific course from a specific institution. I have slightly restructured the XML to leave a space for the ucas_catalogue_id in the XML. I’ll add this in as soon as I can.
This has now been added.

Addendum 2
I’ve just found quite a bit more detail on the format and structure of the UCAS ‘course codes’. UCAS now uses JACS (Joint Academic Coding System) for course codes (see JACS documentation from HESA). JACS codes consist of 4 characters, the first being an uppercase letter and the remaining three characters being digits. JACS codes are essentially hierarchical, with the first character representing a general subject area and the digits representing subdivisions (with increasing granularity). The codes in the UCAS catalogue are a mixture of JACS 1.7 and JACS 2.0 codes. A full listing of JACS v2.0 codes is available from HESA, and a listing of JACS v1.7 codes is available from UCAS as a pdf.

UCAS have an explanation of why and where they use both JACS v2.0 and JACS v1.7.

However because UCAS need to code courses which cover more than one subject area, they have rules for representing these courses while sticking to codes with a total length of 4 characters. These rules are summarised on the UCAS website, but a fuller description is available in pdf format. This last document is most interesting because it indicates how you might create the UCAS code from a HESA Student Record which could be of interest for future mashups.

The implications of all this for my script are relatively small as I currently assume that there is a 4 character alpha-numeric code. On the basis of this documentation I could refine this to check for 3 alpha-numeric characters followed by a single digit I guess – perhaps I will at some point.
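To make the difference between the two checks concrete, here is a short Python sketch that separates codes matching the plain JACS shape (one uppercase letter followed by three digits) from codes that only pass the looser 4 character alphanumeric test. The function name and the labels it returns are invented for illustration, the minimum length of one character is my assumption, and "GN14" is a made-up code rather than one I have checked against the UCAS catalogue.

```python
import re

# Plain JACS shape as described above: uppercase letter, then three digits.
JACS_PATTERN = re.compile(r"[A-Z][0-9]{3}")
# The looser check: up to 4 alphanumeric characters (minimum of 1 assumed).
LOOSE_PATTERN = re.compile(r"[A-Za-z0-9]{1,4}")


def classify_code(code):
    """Label a code as 'jacs', 'other' or 'invalid' (hypothetical helper)."""
    if JACS_PATTERN.fullmatch(code):
        return "jacs"
    if LOOSE_PATTERN.fullmatch(code):
        return "other"
    return "invalid"


for code in ("R901", "GN14", "R9011"):
    print(code, classify_code(code))
```

Codes in the "other" group would still be passed on to UCAS by the loose check, which matters because the rules for courses spanning more than one subject area produce codes that are not plain JACS codes.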

Finally it looks like UCAS and HESA are currently looking at JACS v3.0 which could introduce further changes I guess, although it looks unlikely that this will affect the code format, but rather the possible values, and maybe the meaning of some values. While this isn’t a problem for my script, it would mean that historical course codes from datasets such as MOSAIC could not be assumed to represent the same subject areas in the current UCAS course catalogue as they did when the data was recorded – which is, to say the least, a pain.

Addendum 3
A final set of changes (I hope):

  • The ucas_catalogue_id is now populated
  • Added inst_ucas_url element which contains the URL linking to the Institution record in the UCAS catalogue
  • Added course_ucas_url element which contains the URL linking to the Course record in the UCAS catalogue

10 thoughts on “UCAS Course code lookup”

  1. MOSAIC as a project is concerned with both the possibilities and the problems (of all types) relating to the use of library and learning activity data to enhance both the user experience and resource management functions – so this post provides some very welcome pointers for the final report to be written next month …

    And therefore (as per the previous post) everyone’s a winner 😉

    Keep it coming!

  2. Re:
    “Dealing with UCAS
    UCAS really don’t make it easy to get information out of their website on a machine-to-machine basis. I’ve done an entire post on scraping information from UCAS, which I’m not going to rehash here, but honestly if we are going to see people developing applications which help individuals build personalised learning pathways through Higher Education courses this has got to improve.”

    I strongly sympathise, having worked alongside UCAS on a few JISC-funded projects myself. I’m currently working with a range of projects to do with course advertising information (see: http://www.xcri.org/), which includes working with UCAS (somewhat) to encourage a service oriented approach to the whole range of UCAS vocabularies, including JACS and others related to both courses and applications. Progress is slow, but in the next two to three weeks we are hoping for a bit of a breakthrough. Please contact me if you want more info on the UCAS angle.

    Our eXchanging Course Related Information (XCRi) information model and XML schema, plus work we’ve done already on making course advertising information available may be of interest. Some of this work is out of the University of Huddersfield via the West Yorks Lifelong Learning Network; for contact details, please contact alan@alanpaull.co.uk.

  3. Thanks Alan – it’s very good to hear that there is work happening to make this information more accessible. I’m aware of the XCRi model and the XCRi-CAP work, and did wonder if I could output my scraped results in this format, but in the end decided for something quicker and dirtier for my purposes.

    I don’t know if this is something you are looking at with UCAS, but I’m not sure that their approach to using JACS codes to form UCAS codes is that helpful. It would be nicer if they recorded the full JACS codes (with a searchable index) for each course, rather than creating the short code in the way they do currently.

    I was wondering about linking from UCAS codes to (for example) learning objects in JORUM (which I understand use JACS) – which would require decoding the UCAS code to the constituent JACS codes (and of course the UCAS code is lossy from the original JACS codes)

  4. Do you think there’s ever a chance you might write a service that lets you get the entry requirements/course score for a given course/institution? Would be great to see this expanded.

  5. Hi Matt,

    You may have seen, I did update this service – documented at http://www.meanboyfriend.com/overdue_ideas/2010/03/ucas-course-code-lookup-take-two/

    However, it still doesn’t return the information you mention I’m afraid. There is a bit more detail in one of my comments at http://www.meanboyfriend.com/overdue_ideas/2010/03/read-to-learn-updated/#comments about this. Essentially I’m grabbing data from the UCAS search results – which only includes a minimal amount of data for each course and while there is some extra data on this page that I could scrape with a little extra effort, it doesn’t include the entry requirements etc. To get this information I’d have to go into the next level of the UCAS website.

    I’d need to go back and look, but I’m sure that it wouldn’t be a good idea to do this for all the courses returned against a single search – it would just take forever. However, what might be possible is adding a new function that returns the details for a single course/institution – would that be of interest still? If it is, I would be happy to see how difficult/easy it might be.

  6. Thanks for the reply. It’s a shame that the number of UCAS points required isn’t easier to get really.
