Back in the day - parsing real-world fuzzy dates with Ruby
About Matt Patterson
Dates are easy, in the abstract. Then I started working on a project where I had to parse dates like 'mid 1930s' from large chunks of prose.
Once I'd stopped gibbering I realised that there's nothing wrong with a date like 'mid 1930s' - people talk about dates with varying degrees of precision all the time. The question is, how do you meaningfully parse them?
These kinds of dates – from almost-complete dates like 'January 2013', to very vague decade dates like 'circa 2000s' – have several kinds of precision, including the sureness of the date (definitely January 2013, maybe January 2013), the possible range of the date (January 2003 – 2003 – 2000s), and whether a date represents a point in time (a single event that happened some time in the 2000s) or a span of time (manufactured during the 2000s).
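As a minimal sketch of how those three kinds of precision might be held as data: the `FuzzyDate` struct, its field names, and the `decade` helper below are all assumptions made for illustration, not the representation used in the talk.

```ruby
require "date"

# Hypothetical value object: every name here is illustrative only.
# earliest/latest bound the possible range, certain records sureness,
# and kind says whether this is a :point in time or a :span of time.
FuzzyDate = Struct.new(:earliest, :latest, :certain, :kind) do
  # Number of days the date could cover; smaller means more precise.
  def width_in_days
    (latest - earliest).to_i + 1
  end

  def span?
    kind == :span
  end
end

# Assumed helper: a decade runs from 1 January of its first year
# to 31 December of its tenth year.
def decade(start_year, certain: true, kind: :point)
  FuzzyDate.new(Date.new(start_year, 1, 1), Date.new(start_year + 9, 12, 31), certain, kind)
end

# 'January 2003': definite, a single event somewhere in that month.
january_2003 = FuzzyDate.new(Date.new(2003, 1, 1), Date.new(2003, 1, -1), true, :point)
# 'circa 2000s': same kind of event, but a wider range and less sure.
circa_2000s = decade(2000, certain: false)
# 'manufactured during the 2000s': the whole decade as a span.
made_in_2000s = decade(2000, kind: :span)
```

Keeping the range, the sureness, and the point-or-span distinction as separate fields means 'circa 2000s' and 'manufactured during the 2000s' can share the same bounds and still mean different things.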
In this talk I'll take you through what I discovered about how people write dates, how I went about parsing them, and what to do with them once you have a representation of them as data. We'll have particular fun addressing questions like 'Which is earlier, spring 1930 or mid 1930?' and 'Where does winter come?', and looking at the concrete things I was able to do with a bunch of woolly dates.
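To show why a question like 'Which is earlier, spring 1930 or mid 1930?' has no answer until you pick a convention, here is a rough sketch. The month ranges, the `PERIODS` table, and the `bounds` and `earlier` helpers are assumptions chosen for the example, not the rules the talk settles on.

```ruby
require "date"

# Assumed conventions, chosen only for illustration:
# northern-hemisphere meteorological spring (March to May) and
# "mid" meaning the middle third of the year (May to August).
PERIODS = {
  "spring" => [3, 5],
  "mid"    => [5, 8]
}.freeze

# Hypothetical helper: the [earliest, latest] days for e.g. "spring 1930".
def bounds(text)
  name, year = text.split
  first_month, last_month = PERIODS.fetch(name)
  # Day -1 is the last day of the month.
  [Date.new(year.to_i, first_month, 1), Date.new(year.to_i, last_month, -1)]
end

# Arrays compare element by element, so this orders by earliest day
# and breaks ties on the latest day.
def earlier(a, b)
  (bounds(a) <=> bounds(b)) <= 0 ? a : b
end

earlier("spring 1930", "mid 1930")
```

Under these assumptions spring 1930 starts first, but the two ranges overlap in May, and a different definition of 'mid' or a southern-hemisphere spring would change the answer. Winter is harder still, because under the same convention it straddles two calendar years.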