Ids. Oh ids.

Okay so it’s been a long time since the last update, around a month. Mainly it’s because I haven’t actually made much progress – vacation over Easter and then moving for another two weeks. I have made progress but only really starting this week.

One thing I’ve worked on is adding a view in the app where you can see all trips between one stop and another. I have that sort of working now, as shown in the video, but there’s a lot of work left polishing it and testing that the data shown is correct.

Something else I’ve been wanting to look into is getting a better understanding, by which I mean giving the app a better understanding, of the data it’s working with. Right now it only knows that there are stops and trips between them but it doesn’t understand that, for instance, there are things called cities and towns that those trips take you between. When I’m going to, say, Middelfart I don’t think of it as going to a train station called “Middelfart Station (Middelfart Kommune)” – I think of it as going to a city called Middelfart. And when I’m going to Trøjborg I don’t think of it as going to the bus stop “Tordenskjoldsgade/Herluf Trolles Gade (Aarhus)”, I think of it as going to a neighborhood called Trøjborg. The data I have though only knows about the bus stops and train stations, so I’ve spent some time building some more detailed information about where the stops are. That’s included, for instance, looking up the nearest address to each stop on Google, which sometimes tells you the area a stop is in. I’ve also extracted lists of villages, towns, cities, neighborhoods, etc. from OpenStreetMap. Along with the information I already have, at this point I think I have all the data I need for the app to understand the higher-level geographic concepts. I’m not going to use it quite yet but now I have it.

The last thing I’ve been working on is a hairy issue that I’m happy to finally have a solution to – but creating that solution has been hard going. It’s a subtle but important issue. It’s about what to call stops, stations, routes, and the other “entities” that you deal with in transit schedules. The app stores all the schedules in a file called a “bundle”. Over time the schedules change and when that happens the app receives a new bundle file and throws away the old, now obsolete, one.

All stops and routes have an id in the data I get from Rejseplanen. For instance, the stop id for Aarhus Rutebilstation is 000751000100G and the route id for Endelave - Snaptun is 21063_4. I throw those ids away though when I build the bundles that the app uses because they take up a lot of space. Instead I replace them with a simple number, so within the bundle the stop might be called 12620 and the route 247. Besides saving space, the numeric ids solve a bunch of other problems, which makes the app faster and lets it use less memory. So all’s well, right?
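To make that concrete, here is a minimal sketch in Python of what replacing the source ids with plain numbers can look like. It is an illustration under assumptions, not the app's actual bundle format: the second stop id and name are made up, and using a stop's list position as its number is just one simple way to do it.

```python
# Source ids as they might arrive in the raw data.
# The first id is from the post; the second is hypothetical.
source_stop_ids = ["000751000100G", "HYPOTHETICAL_STOP_ID"]
stop_names = ["Aarhus Rutebilstation", "Hypothetical Stop"]

# Assumption: a stop's numeric id is simply its position in the list.
numeric_id = {source_id: n for n, source_id in enumerate(source_stop_ids)}

# Inside the bundle a trip is then just a sequence of small integers...
trip = [numeric_id["000751000100G"], numeric_id["HYPOTHETICAL_STOP_ID"]]

# ...and looking up a stop's data is a plain list index, no string keys needed.
trip_names = [stop_names[n] for n in trip]
print(trip, trip_names)
```

The long string ids are only needed while building; everything stored afterwards refers to stops by number.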

No. Here’s the problem: the number for a given entity doesn’t stay the same across bundles – they get generated from scratch every time the schedule changes. So Aarhus Rutebilstation will be 12620 now but after the next update it might be 11394. That’s fine in most cases because you only use one bundle at a time – but “most cases” isn’t always. If for instance you want to save a shortcut in the app that takes you directly to departures from a particular stop I can’t store the numeric id because then the shortcut will break when you update and the ids change. I also can’t store the name of the stop because there’s no guarantee they won’t also change – names of things change. I can’t store the id from the original data either, the 000751000100G, because I’ve thrown that away. So what to do?
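The breakage can be reproduced in a few lines. This is a sketch under assumptions: it numbers stops in sorted order of their source ids (the real numbering scheme isn't described here), and the two extra stop ids are made up. A new stop showing up in the next schedule is enough to shift the numbers.

```python
def assign_numeric_ids(source_ids):
    # Assumption: numbers are handed out from scratch, in sorted order.
    return {s: n for n, s in enumerate(sorted(source_ids))}

# Old bundle: two stops (the second id is hypothetical).
old_bundle = assign_numeric_ids(["000751000100G", "000751000200G"])

# New bundle: same stops plus one hypothetical new stop that sorts first.
new_bundle = assign_numeric_ids(
    ["000650000100G", "000751000100G", "000751000200G"]
)

# A shortcut saved against the old bundle stores the numeric id.
saved_shortcut = old_bundle["000751000100G"]

# After the update, that number points at a different stop.
new_by_number = {n: s for s, n in new_bundle.items()}
print(saved_shortcut, new_by_number[saved_shortcut])
```

The saved number is still a valid id in the new bundle, which is what makes this nasty: nothing fails, the shortcut just quietly opens the wrong stop.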

The solution I’ve come up with keeps the numeric ids exactly as they are but introduces a new kind of id that is guaranteed to stay the same across updates. So now Aarhus Rutebilstation will have three ids. There’s the one it has in the original data, 000751000100G, which is fixed but takes up too much space. There’s the numeric one it has within a single bundle, which is fast to use and saves space but changes between bundles; that’s the one that is used for almost everything. And now there’s also a new one, which is less efficient to use but takes up very little space (like less than 0.1% of the other id) and never changes.
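How the stable ids are actually generated isn't covered here, so the following is only one possible scheme, sketched in Python to show how the three kinds of id could fit together. Everything in it is an assumption: the `StableIdRegistry` class, the idea of keeping a registry on the build side that never forgets an id it has handed out, and the per-bundle table from stable id to numeric id.

```python
class StableIdRegistry:
    """Hypothetical build-side registry: source id -> stable id, never reassigned."""

    def __init__(self):
        self.assigned = {}

    def stable_id(self, source_id):
        # Hand out the next unused number the first time an id is seen.
        if source_id not in self.assigned:
            self.assigned[source_id] = len(self.assigned)
        return self.assigned[source_id]


def build_bundle(source_ids, registry):
    # Numeric ids are regenerated from scratch for every bundle (assumed sorted order).
    numeric = {s: n for n, s in enumerate(sorted(source_ids))}
    # The bundle carries a small table: stable id -> this bundle's numeric id.
    return {registry.stable_id(s): numeric[s] for s in source_ids}


registry = StableIdRegistry()
old_bundle = build_bundle(["000751000100G", "000751000200G"], registry)

# A shortcut stores the stable id rather than the numeric one.
shortcut = registry.stable_id("000751000100G")

# Next schedule: a hypothetical new stop shifts the numeric ids.
new_bundle = build_bundle(
    ["000650000100G", "000751000100G", "000751000200G"], registry
)
print(old_bundle[shortcut], new_bundle[shortcut])
```

In this sketch the shortcut resolves to a different numeric id in each bundle but always to the same stop. The extra table lookup is the "less efficient to use" part, and it only happens when a shortcut is opened.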

The reason I’ve had to be really careful with this is that as soon as I start using this new kind of id I’m stuck with it – the whole point is that it doesn’t change so if I mess it up it’ll be really super hard to fix. So that’s taken a while. But I think I’m there now. Almost. Just one more test. Then I’ll be done. Almost.

With that problem solved I can get back to the main thing I want to implement: storing shortcuts to particular schedules on the main page. Easy access to the schedules you use most is essentially the whole point of the app so that’s a core feature and I’m really eager to make some progress on that – my goal is to have some version of that early next week.