• Matt Patterson

    Back in the day - parsing real-world fuzzy dates with Ruby

    About Matt Patterson

    Matt is a freelance web developer living in Berlin who has been building for the web for more than 10 years. A full-stack developer, he has worked on projects as varied as the critically acclaimed indie video game International Racing Squirrels, JavaScript prototypes of data visualisations showing the evolution of literary texts such as Wordsworth's poem The Prelude, and helping the UK government reboot its approach to the web as part of the GOV.UK Alpha and Beta team. With Sven Fuchs, he co-coaches the Ruby Monsters, a study group born out of Rails Girls Berlin.

    This talk

    Dates are easy, in the abstract. Then I started working on a project where I had to parse dates like 'mid 1930s' from large chunks of prose.

    Once I'd stopped gibbering, I realised that there's nothing wrong with a date like 'mid 1930s': people talk about dates with varying degrees of precision all the time. The question is, how do you parse them meaningfully?

    These kinds of dates, from almost-complete dates like 'January 2013' to very vague decade dates like 'circa 2000s', have several kinds of precision: the sureness of the date (definitely January 2013, or maybe January 2013); the possible range of the date (widening from January 2003, to 2003, to the 2000s); and whether the date represents a point in time (a single event that happened some time in the 2000s) or a span of time (manufactured during the 2000s).
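    As a rough illustration of those three kinds of precision, here is a minimal Ruby sketch (not the talk's implementation) that stores each parsed date as an earliest/latest range plus a certainty flag and a point-or-span kind. The `FuzzyDate` struct, the `parse_fuzzy` method and the handful of phrases it accepts are hypothetical, and treating 'mid 1930s' as 1934 to 1936 is an assumed convention. Whether a date is a point or a span can't be read off the phrase itself, so in this sketch the caller supplies it.

```ruby
require "date"

# Hypothetical value object for a fuzzy date: the range it could fall in,
# how sure we are of it, and whether it is a point or a span in time.
FuzzyDate = Struct.new(:earliest, :latest, :certainty, :kind) do
  def to_s
    "#{earliest}..#{latest} (#{certainty}, #{kind})"
  end
end

MONTHS = Date::MONTHNAMES.compact.map(&:downcase)

# Assumed split of a decade into early/mid/late years.
DECADE_PARTS = { "early" => 0..3, "mid" => 4..6, "late" => 7..9 }.freeze

# Parses a small, hypothetical subset of fuzzy date phrases. `kind` must
# come from context: "2000s" alone doesn't say whether it is a point or a span.
def parse_fuzzy(text, kind: :point)
  phrase = text.strip.downcase
  certainty = :certain
  if phrase.match?(/\A(circa|c\.)\s+/)
    certainty = :approximate
    phrase = phrase.sub(/\A(circa|c\.)\s+/, "")
  end

  earliest, latest =
    case phrase
    when /\A(#{MONTHS.join("|")}) (\d{4})\z/
      # A named month: the whole month is possible.
      year = $2.to_i
      month = MONTHS.index($1) + 1
      [Date.new(year, month, 1), Date.new(year, month, -1)]
    when /\A(\d{4})\z/
      # A bare year: any day in that year.
      year = $1.to_i
      [Date.new(year, 1, 1), Date.new(year, 12, 31)]
    when /\A(early|mid|late) (\d{3}0)s\z/
      # Part of a decade, using the assumed DECADE_PARTS split.
      years = DECADE_PARTS.fetch($1)
      decade = $2.to_i
      [Date.new(decade + years.first, 1, 1), Date.new(decade + years.last, 12, 31)]
    when /\A(\d{3}0)s\z/
      # A whole decade.
      decade = $1.to_i
      [Date.new(decade, 1, 1), Date.new(decade + 9, 12, 31)]
    else
      raise ArgumentError, "unrecognised date: #{text.inspect}"
    end

  FuzzyDate.new(earliest, latest, certainty, kind)
end

puts parse_fuzzy("January 2013")
# → 2013-01-01..2013-01-31 (certain, point)
puts parse_fuzzy("circa 2000s", kind: :span)
# → 2000-01-01..2009-12-31 (approximate, span)
puts parse_fuzzy("mid 1930s")
# → 1934-01-01..1936-12-31 (certain, point)
```

    Keeping the range, the certainty and the kind as separate fields means a vague date never has to be forced into a single, falsely precise `Date`.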

    In this talk I'll take you through what I discovered about how people write dates, how I went about parsing them, and what to do with them once you have a representation of them as data. We'll have particular fun addressing questions like 'Which is earlier, spring 1930 or mid 1930?' and 'Where does winter come?', and looking at the concrete things I was able to do with a bunch of woolly dates.
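    The 'spring 1930 or mid 1930' question only has an answer once you choose some conventions. The Ruby sketch below assumes northern-hemisphere meteorological seasons, 'mid <year>' meaning May to August, and 'winter <year>' starting in December of that year; none of these come from the talk. It orders phrases by the midpoint of their possible ranges, using hypothetical helpers `fuzzy_range`, `midpoint` and `overlaps?`, and shows that spring 1930 and mid 1930 overlap, so 'earlier' depends on the rule you pick.

```ruby
require "date"

# Assumed conventions (not from the talk): northern-hemisphere
# meteorological seasons, given as [month, day] start and end points.
SEASONS = {
  "spring" => [[3, 1], [5, 31]],
  "summer" => [[6, 1], [8, 31]],
  "autumn" => [[9, 1], [11, 30]],
}.freeze

# Hypothetical helper: turns a phrase into the Range of Dates it could mean.
def fuzzy_range(text)
  case text.strip.downcase
  when /\A(spring|summer|autumn) (\d{4})\z/
    season = $1
    year = $2.to_i
    start, finish = SEASONS.fetch(season)
    Date.new(year, *start)..Date.new(year, *finish)
  when /\Awinter (\d{4})\z/
    # Winter straddles the new year; this sketch assumes it starts in the
    # named year and runs to the last day of the following February.
    year = $1.to_i
    Date.new(year, 12, 1)..Date.new(year + 1, 2, -1)
  when /\Amid (\d{4})\z/
    # Assumed meaning of "mid <year>": May to August.
    year = $1.to_i
    Date.new(year, 5, 1)..Date.new(year, 8, 31)
  else
    raise ArgumentError, "unrecognised date: #{text.inspect}"
  end
end

# The day halfway through a range (rounded down).
def midpoint(range)
  range.begin + (range.end - range.begin).to_i / 2
end

# True if two date ranges share at least one day.
def overlaps?(a, b)
  a.begin <= b.end && b.begin <= a.end
end

phrases = ["winter 1930", "mid 1930", "spring 1930"]
sorted = phrases.sort_by { |phrase| midpoint(fuzzy_range(phrase)) }
p sorted
# → ["spring 1930", "mid 1930", "winter 1930"]
puts overlaps?(fuzzy_range("spring 1930"), fuzzy_range("mid 1930"))
# → true
```

    Ordering by earliest possible date instead of midpoint gives the same result for these three phrases, but the two rules can disagree when ranges have very different widths, as with '1932' and '1930s'.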