https://mojdigital.blog.gov.uk/2013/06/03/royal-mails-postcode-address-file-taming-the-beast/

Royal Mail’s Postcode Address File: taming the beast

Developer Dan Baker on drawing on the Royal Mail’s postcode database to help users find their local court or a legal adviser in their area.

When we took over the development of the Legal Services Commission’s legal aid checker website we also inherited access to the Royal Mail’s Postcode Address File. Rather than making an additional payment for a readymade external postcode finding service, like Postcode Anywhere, we decided to build our own in-house service utilising the PAF.

“The Royal Mail’s Postcode Address File (PAF®) is the most up-to-date and complete database in the UK, containing over 28 million addresses.” (Royal Mail website)

In the modern age of celebrity, where millions and billionaires are mentioned in every other article, it’s all too easy to forget just how big a million of anything actually is. I got a stark reminder of this when I looked at the raw data in the Royal Mail’s Postcode Address File. When you need to transfer and use this data in a database, it’s big!

There’s a 205-page programmers’ guide to get you started, but ultimately it comes down to understanding just what level of detail you require for the application that’s being developed. The PAF lists every single business and residential address in the UK, right up to old Mrs. Albertson at no. 52, but do we really need that level of detail?

In short, no!

The court finder and check if you can get legal aid applications, to name but two, will already have access to a database of ‘resources’, such as courts or legal advisers, along with their full address details and geographic coordinates. We’re just interested in locating which of these is in the vicinity of the user’s postcode, town or borough. This is where the postcode/address look-up service comes into play.

The PAF is split into a series of different text files that deal with streets, buildings, sub-buildings, organisations, towns and addresses (including postcodes), all laid out in a relational format to each other.

The ‘address’ file is where the 28 million addresses are stored and each new line looks something like this:

TN103AB1396419401539000001037000100000000000000110000000000000000000100000000S 1A

This basically lists the individual postcode along with a series of pointers to unique identifiers that go to make up an individual delivery address point.

If we were trying to deliver goods to someone’s door that information would be invaluable. As we’re only really interested in the general area of a town or postcode we can instantly strip away most of this excess data, to leave just the postcode and the town identifier. However, that still leaves us with 28 million lines of information – just shorter, more manageable lines.

We can reduce the size of the dataset by removing details relating to any postcodes in Scotland or Northern Ireland, because the services will only cover England and Wales.

In addition, each postcode can be repeated on average 15 times for the various delivery points it covers. For example, the postcode PE1 4QD covers numbers 1 to 43 on Acacia Avenue in Peterborough. If we only need to handle unique postcodes, rather than full addresses, we can make the dataset much smaller.

(We could make the dataset smaller still by only using the ‘outward postcode’ – the first half of any postcode, such as PE1 – rather than the full postcode, PE1 4QD. This would increase query response time but at a cost of absolute accuracy.)

At the end of all this consolidation the address file is a considerably more manageable 28,000 lines of data.

The ‘towns’ file is where around 40,000 ‘localities’ are stored, an example of which is below:

002344                             HUNTINGDON                    ST. NEOTS                      LITTLE BARFORD

Each line consists of a town ID followed by the name of the town it refers to, and may also include a ‘dependency’ (such as a suburb name) and ‘double dependency’ (neighbourhoods, or similarly small areas, by another name).

As with the address file this data can be rationalised, for our specific requirements, to filter out any locations in Scotland and Northern Ireland and, because we just don’t need that level of detail, the double dependencies.

The final piece of the puzzle, and most useful with regards to finding a user’s geographic location, is to utilise the Royal Mail’s Postzon™ data, which is supplied along with the PAF.

Postzon contains a list of around 1.7 million postcodes currently in use along with their easting/northing geographic coordinates (accurate to 100 metres) and a whole host of additional information about the country, county, district, and ward (to name but a few) that the postcode belongs to, an example of which is below:

AL1 1EZ1988025146020700E92000001E10000015E07000240E05004799E19000001E1800000601

Again the data is laid out in a relational format as a series of reference codes. Unlike PAF, where extra info is supplied in another text file by Royal Mail, the majority of the Postzon data is supplied by external data partners (most of it by Ordnance Survey and Office for National Statistics). To be able to get meaningful information from these numeric codes therefore requires several additional downloads from the data partners’ websites.

There’s not much you can do to consolidate this dataset, aside from removing the areas that the services don’t cover, as every possible postcode needs to return a result.

There are some anomalies in the data, where only a postcode is provided without the more useful geographic coordinates and area code information. In these cases it’s easier just to strip out the row entirely, rather than return NULL values later. Even doing that still leaves the dataset coming in at an unwieldy 1.5 million rows.

The only advice I can offer there is to split up the data into a series of smaller tables dealing with specific postcode ranges ([a-c], [d-f], etc…) or invest in a higher-spec database server. Some beasts were never meant to be tamed!

Hopefully I’ve managed to shed some light on the slightly daunting process of grappling with the Royal Mail’s datasets. Of course that’s only half the battle, the real work begins with programming the postcode look-up service, but maybe I’ll save blogging about that for another day.

In the meantime feel free to leave any feedback on your own experiences handling the PAF/Postzon or with any finished postcode look-up services you’ve worked on.

12 comments

  1. David Durant

    I've recently started working for GDS and this article is very interesting.

    Is this a data resource the MoJ has paid for or is it something that's freely available?

    If you've built a software component on top of this data would it be potentially possible for this to be reused by other areas in government?

    Thanks.

    Link to this comment Reply
    • Dan Baker

      Hi David,

      When we took over development work on the Legal Services Commission's various legal aid eligibility web services we inherited the use of PAF. However, I believe this resource costs around £6000, so it's certainly not free.

      We're currently using the postcode lookup service to return a latitude and longitude coordinate (along with some additional area detail) in JSON format ready for use by whichever services require it.

      We'd need to check the licensing agreement to see whether this could be utilised by other government bodies outside of the MOJ but my instinct tells me probably not.

      Cheers,
      Dan

      Link to this comment Reply
  2. Jeni Tennison

    Did you consider using the (open) CodePoint database from OS instead of (closed) PAF?

    Link to this comment Reply
    • Dan Baker

      Hi Jeni,

      Thanks for the comment.

      Due to inheriting the PAF data resource from the Legal Services Commission (which had already been paid for) we didn't look at any other postcode datasets at the time.

      Having just looked at the technical details the CodePoint database looks very similar in layout to the Postzon part of the Royal Mail's resource. However, the ability to search for a city, town, borough or area within a town, which is an important part of the services we've developed would be much less extensive than using it in conjunction with the Royal Mail's PAF.

      Cheers,
      Dan

      Link to this comment Reply
  3. Greg

    Love the level of detail given here - it appeals to the geek in me and I genuinely found it interesting.

    It's really useful and in the spirit of sharing (making connections across the organisation, etc).

    Link to this comment Reply
  4. Mihai

    Hi, Dan !
    How do you get the lat,long out of the postcode ? Currently I'm using a Google Maps API and I would like to develop a more accurate and in house solution based on the PAF ?

    Link to this comment Reply
  5. Tom Conlon

    The fact that the PAF still isn't free after all these (unlike other countries) is, especially now in 2016, frankly farcical.

    Open Government?

    I wonder how much the entire country (and bodies outside our country) lose out to this ridiculous situation.

    Link to this comment Reply
  6. Richard EB

    The ONS provide a free dataset of 2,576,354 postcodes with lat/long.

    It's called ONSPD and is available from https://geoportal.statistics.gov.uk/geoportal/catalog/content/filelist.page?redirect=Docs/Latest%20Products/ONSPD_NOV_2015_csv.zip&pos=0&cat=#LP_ONSPD_NOV_2015_csv.zip

    I imported it into an SQLite database (with no ROWID optimisation) and can perform 400,000 postcode->lat,long queries per second on a desktop PC.

    Link to this comment Reply
  7. aldavidson

    Hi Richard

    In our PostcodeInfo platform, we use data from the ONSPD's sister product (NSPL) as one part of our postcode info platform, to map postcodes to administrative geographies. We also use Ordnance Survey AddressBase to do the address lookups, and a couple of other Government datasets for the names of those administrative regions. The bulk of the work was not so much in making the data searchable, but retrieving and importing the data automatically, even when there is no API available. All the information is available on the Github repository at https://github.com/ministryofjustice/postcodeinfo - and it's all open-source, so feel free to take a look at how we handled it.

    Thanks,

    Al Davidson

    Link to this comment Reply

Leave a comment