Android gets faster

Last week was about getting back to work on the Android app – again. The week before was supposed to be too, but it didn’t happen; this week it did.

You can’t see much of a difference between the last screenshots and today’s, but a lot has happened under the surface. Showing the nearest stops is a lot faster, and instead of just the top 100 stops you get a list of all stops, ordered by distance, that you can scroll through. Is that useful? Not really. But it does mean that the infrastructure for that kind of thing is in place, which actually took a lot of work. The new implementation can also be tested. Before, if I messed up and broke the list, nothing would tell me. Now I can test that everything looks right, and if it breaks, alarms go off automatically.
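Ordering every stop by distance, rather than cutting off at the top 100, can be sketched in Python as below. This is not the app’s code (the app is Android, and the post doesn’t say how it computes distance); it assumes a haversine great-circle distance and an in-memory list of stops, and the `Stop` type and function names are made up for illustration. Because it is a plain, deterministic function, it is also easy to cover with the kind of automated test mentioned above.

```python
import math
from typing import NamedTuple

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


class Stop(NamedTuple):
    stop_id: str
    name: str
    lat: float
    lon: float


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def stops_by_distance(stops: list[Stop], lat: float, lon: float) -> list[Stop]:
    """All stops, nearest first, with no top-N cut-off."""
    return sorted(stops, key=lambda s: haversine_m(lat, lon, s.lat, s.lon))
```

With the three stops from the stops.txt excerpt further down and a position right next to Dokk1, `stops_by_distance` puts Dokk1 first, then Viborgvej/Frydenlunds Allé, then Brendstrupgårdsvej/Skejby Busvej. Sorting everything on each location update is O(n log n); for the roughly 64,000 stops in stops.txt an app would probably want a spatial index (a grid or k-d tree) instead.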

I also spent some time trying to compress the schedules even more. I’d noticed that the names of stops and routes overlap a lot; for instance, the word Kommune appears again and again. I played around with replacing common words with a much shorter code and then, when displaying a string, swapping the longer word back in place of the code so you can’t tell it had happened. It didn’t make enough of a difference to be worth the effort. Whether or not I manually compress the data, it gets zipped afterwards, and the manual pre-compression only gave a few percent improvement.
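The word-substitution idea can be sketched like this, together with a way to measure whether it still helps once zlib has had a go at the result. The codebook is hypothetical: only Kommune comes from the post, the second entry is invented, and the codes are control characters on the assumption that they never appear in real names.

```python
import zlib

# Hypothetical codebook. "Kommune" is the example from the post; "Station" is
# an invented second entry. Control characters are used as codes on the
# assumption that they never occur in real stop or route names.
CODEBOOK = {"Kommune": "\x01", "Station": "\x02"}
DECODE = {code: word for word, code in CODEBOOK.items()}


def encode(name: str) -> str:
    """Replace each codebook word with its one-character code."""
    for word, code in CODEBOOK.items():
        name = name.replace(word, code)
    return name


def decode(name: str) -> str:
    """Swap the full words back in, e.g. just before displaying a name."""
    for code, word in DECODE.items():
        name = name.replace(code, word)
    return name


def zipped_size(names: list[str], pre_encode: bool) -> int:
    """Bytes after zlib at level 9, with or without the manual substitution."""
    text = "\n".join(encode(n) if pre_encode else n for n in names)
    return len(zlib.compress(text.encode("utf-8"), 9))
```

Comparing `zipped_size(names, True)` with `zipped_size(names, False)` on the real list of names is one way to check the effect. A general-purpose compressor like zlib already replaces repeated strings with short back-references, which is likely why the manual step bought so little.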

This week I’ll work towards making it so that pressing a stop shows you the routes that visit it. I may also spend some more time on getting your current location, and possibly on the look of the entries in the list of stops.

A new laptop

This week I was meant to make improvements to the Android app, since I got far enough with compressing data last week. That’s not how it went.

My laptop is really struggling to run the heavy tools you need for Android development, and I had a much more powerful machine from my old project just lying around unused at home. At the same time I got back a really old MacBook that someone had borrowed. So I thought I’d set up the Intel NUC as my primary development machine and the Mac for iOS development. I spent a huge amount of time on that and on fiddling with my existing code so it would work with the new installations. But I’m done now and have a lot more processing power at my disposal. By the way, burglars, don’t get any ideas; it’s all safely Kensington’ed to a big radiator next to my desk.

Drunk on all this new processing power, I’ve switched from my “dumb” text editor to Android Studio for Android development and PyCharm for Python. That has taken, and will continue to take, some adjustment. But I needed to do some nontrivial debugging and wanted more powerful tools, which are now an option.

I did also have time to make progress on the Android code: it can now read the compressed scheduling data, but I didn’t get to the part where it displays it. What I did try was displaying bus stop locations on Google Maps, because I wanted a sense of how the data is supposed to look so that, when I display it in the Android app, I can tell if it looks right. It turns out that’s super easy: it’s like five clicks and then boom, there’s all the data nicely shown on a map. So now I know that some of my test data is from Mexico City.

This week I’ll keep working on the Android code. I might take a detour around OpenStreetMap too. The data I have covers all of Denmark, which is what I want, but it would be nice if I could extract smaller parts – like, say, just Aarhus. To do that I need to know what exactly Aarhus is, that is, the exact region that makes up Aarhus. OpenStreetMap is a huge database that contains, among much other interesting information, exactly that: which geographic region is considered to be Aarhus (as well as any other city). So I’ll probably spend some time extracting the data I need from there.
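Once a boundary polygon for Aarhus has been pulled out of OpenStreetMap, cutting the Denmark-wide data down to just Aarhus is mostly a point-in-polygon test per stop. A minimal ray-casting sketch is below. The polygon format (a single ring of `(lon, lat)` pairs) is an assumption, and real OpenStreetMap boundaries are often multipolygons with several rings and holes, which this does not handle.

```python
Point = tuple[float, float]  # (lon, lat), i.e. (x, y)


def point_in_polygon(point: Point, ring: list[Point]) -> bool:
    """Ray casting: a point is inside if a ray going east crosses the ring's
    edges an odd number of times. Handles one simple ring, no holes."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def stops_in_region(stops: list[tuple[str, float, float]], ring: list[Point]) -> list[str]:
    """IDs of the (stop_id, lat, lon) stops that fall inside the ring."""
    return [stop_id for stop_id, lat, lon in stops if point_in_polygon((lon, lat), ring)]
```

Checking every stop against a boundary with thousands of vertices is slow if done naively; a cheap bounding-box check before the full test is a common first filter.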


A processing pipeline

I’ve spent the last week crunching timetables, building a pipeline that takes the raw timetable data you get from the transit companies, processes and reorganizes it, and outputs a compressed version that contains the same information but is much smaller and easier to use. I’ve been making a lot of progress on that. It’s been a lot of running Python programs against a database, looking at the database, seeing something that doesn’t look quite right, tweaking the Python code, running it again, and so on.
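The file names that come up in the posts further down (stops.txt, routes.txt, stop_times.txt, trips.txt) match the GTFS format, so, assuming the feed is GTFS, the first step of a pipeline like this might read each table straight out of the zip without unpacking the 150 MB to disk. `read_gtfs_table` is a hypothetical name, and the sketch relies on each file starting with a header row, which GTFS requires but the excerpts further down leave out.

```python
import csv
import io
import zipfile
from typing import Iterator


def read_gtfs_table(zip_source, filename: str) -> Iterator[dict[str, str]]:
    """Yield the rows of one text file inside a GTFS zip as dicts keyed by header.

    zip_source can be a path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as zf, zf.open(filename) as raw:
        # utf-8-sig also strips a byte-order mark if the file starts with one;
        # newline="" is what the csv module expects.
        text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
        yield from csv.DictReader(text)
```

Because it is a generator, a 2,700,000-line stop_times.txt can be streamed row by row into the next processing step rather than held in memory all at once.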

At this point I can take the 16 MB of raw input I get – and that’s a zip file; uncompressed it’s 150 MB – and compress it down to a file that contains almost all the data but is only 6.4 MB uncompressed, 1.3 MB compressed, and much easier for the app to use. I’m pretty happy with that for now; it’s small enough that it’s reasonable to bundle it with an app. I can make it even smaller, and I will later on, but this week I’ll leave it and switch to working on the Android app. By the end of the week I should have an app that’s starting to be useful, except that it’ll show data that has bits missing and is slightly wrong. Then – possibly next week – I’ll go back and fix the data.

Transit data and relational databases

What makes working with the data I described in the last post manageable is that there are tools – excellent, free tools – for doing exactly that. A lot of the work in understanding the data is about cross-referencing: a line in one file contains an ID that identifies a line in another, which probably also contains IDs that identify more lines in more files.

An excellent tool for working with this kind of data is a relational database, so most of the time I’ve spent so far has gone into loading the data into one. That makes it much easier to look through the data than with the flat files.
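Loading one of these flat files into SQLite can look roughly like the sketch below. It assumes the text has a header row, stores every column as TEXT for simplicity, and pastes the table and column names into the SQL, so it should only be used on trusted files; `load_table` is a hypothetical helper, not part of any library.

```python
import csv
import io
import sqlite3


def load_table(conn: sqlite3.Connection, table: str, csv_text: str) -> None:
    """Create `table` from the CSV header row and insert every data row.

    All columns are TEXT, and the table/column names are pasted into the SQL,
    so only use this on trusted files.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    rows = (row for row in reader if row)  # skip blank lines
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
```

Adding indexes on the ID columns that link the files (for example `stop_times.trip_id` and `stop_times.stop_id`) is what lets the database follow those cross-references without scanning whole tables.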

What’s even better is that, if you load the data right, it understands the relationships between what’s in the different files and will let you ask questions, really involved ones too. Here, for instance, is the question “which bus routes have at least one stop at a stop with the name Dokk1?”:

It turns out that there are 20 and, as expected from the last post, 13 is on the list.
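A question like that can be expressed as a chain of joins from routes through trips and stop_times to stops. The sketch below is one way to write it, not necessarily the query used here; it assumes GTFS-style column names and loads only a handful of rows from the excerpts further down, so it finds just route 13 rather than all 20.

```python
import sqlite3

# Minimal schema with assumed GTFS-style column names.
SCHEMA = """
CREATE TABLE stops (stop_id TEXT PRIMARY KEY, stop_name TEXT);
CREATE TABLE routes (route_id TEXT PRIMARY KEY, route_short_name TEXT);
CREATE TABLE trips (trip_id TEXT PRIMARY KEY, route_id TEXT REFERENCES routes);
CREATE TABLE stop_times (trip_id TEXT REFERENCES trips,
                         stop_id TEXT REFERENCES stops,
                         departure_time TEXT);
"""

# Follow route -> trip -> stop time -> stop and keep routes whose stop name matches.
QUERY = """
SELECT DISTINCT r.route_short_name
FROM routes r
JOIN trips t       ON t.route_id = r.route_id
JOIN stop_times st ON st.trip_id = t.trip_id
JOIN stops s       ON s.stop_id  = st.stop_id
WHERE s.stop_name LIKE 'Dokk1%'
ORDER BY r.route_short_name
"""


def routes_at_dokk1(conn: sqlite3.Connection) -> list[str]:
    return [row[0] for row in conn.execute(QUERY)]


# A few rows taken from the excerpts further down.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO stops VALUES (?, ?)",
                 [("38600", "Dokk1. Europaplads"), ("38400", "Viborgvej/Frydenlunds Allé")])
conn.executemany("INSERT INTO routes VALUES (?, ?)", [("19308_3", "13"), ("19309_3", "1A")])
conn.executemany("INSERT INTO trips VALUES (?, ?)", [("39185170", "19308_3"), ("39185099", "19308_3")])
conn.executemany("INSERT INTO stop_times VALUES (?, ?, ?)", [("39185170", "38600", "6:34:00")])
```

The stop is actually named “Dokk1. Europaplads”, hence the `LIKE 'Dokk1%'` prefix match rather than an exact comparison.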

This is really useful for working with the data on my machine, and you might think you could just put a relational database in the mobile app and it would solve all your problems. That doesn’t work. When you load the data into the database it grows a lot, so it would be too big, and while it can answer really involved questions, it takes time: finding the list of buses took about 6 seconds and made my laptop whir up and sound like a vacuum cleaner. That wouldn’t work on a phone. So it’s a super useful tool, but it doesn’t solve everything, far from it.

What does scheduling data look like?

This week was all about taking the raw scheduling data, processing it, and putting it into a form that can be used by a mobile app. To give a sense of what that means, it may be useful to look at what the scheduling data looks like.

Below is a bus schedule I printed from the midttrafik website. I’ve highlighted some features of a particular bus departure: bus 13 visiting Dokk1 at 6.34 on weekdays.

This is simple for a human to understand. There’s a bus route called 13. This bus makes a number of trips every weekday, one of which starts at 6.10 from Vejlby Nord, passes 16 other stops before ending up at Frydenlund at 6.54. Along the way it visits a stop called Dokk1 at 6.34.

The data provided by the transit companies for machines to read, the data I need to use, looks very different. There, all the different concepts (routes, trips, stops, stop times, etc.) are split into separate flat text files. Let me give you an example. Here’s what the data about the stop Dokk1 looks like:


38400,"Viborgvej/Frydenlunds Allé",,"56.163513","10.174106",0
38600,"Dokk1. Europaplads",,"56.153984","10.212759",0
38700,"Brendstrupgårdsvej/Skejby Busvej",,"56.187187","10.171541",0

The stops are given in a file called stops.txt, which is simply a list with one line per stop, giving an ID number for the stop, its name, its geographic position, and various other bits of information. The current file contains around 64,000 stops.
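Reading these lines is a job for an ordinary CSV parser, since names are quoted and could contain commas. The sketch below parses header-less lines in the column order of the excerpt (ID, name, an empty field, latitude, longitude, then the rest); if the real file is GTFS it will have a header row, in which case `csv.DictReader` and column names are the safer choice. The `Stop` class and `parse_stops` are made up for illustration.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Stop:
    stop_id: str  # kept as a string so IDs like "03300" keep their leading zero
    name: str
    lat: float
    lon: float


def parse_stops(lines: str) -> list[Stop]:
    """Parse header-less stops.txt lines in the column order of the excerpt."""
    stops = []
    for row in csv.reader(io.StringIO(lines)):
        if not row:
            continue  # skip blank lines
        stop_id, name, _desc, lat, lon = row[:5]
        stops.append(Stop(stop_id, name, float(lat), float(lon)))
    return stops
```

The coordinates are quoted in the file, so the parser hands them back as strings and the explicit `float()` conversion is needed.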

The route, 13, lives in a separate file, routes.txt, which looks much like stops.txt:


19133_3,281,"775","",3,,
19308_3,281,"13","",3,,
19309_3,281,"1A","",3,,

Again, it gives an ID number for the route and its name. There are around 1,800 of those.

How do we know that bus 13 stops at Dokk1 at 6.34? That’s in another file, stop_times.txt, which is – you guessed it – a list of times buses stop at a particular stop, one per line, around 2,700,000 lines in total.


39185170,6:32:00,6:32:00,03300,21,0,0,""
39185170,6:34:00,6:34:00,38600,22,0,0,""
39185170,6:36:00,6:36:00,09200,23,0,0,""

This says that the bus arrives at the stop with ID 38600 at 6:34 and leaves again, also at 6:34. It doesn’t mention route 13, though, so how do we know it’s the right bus and not some other one that happens to be there at the same time? For that we have to use the trip ID given in the same line, 39185170. Trips are listed in another file – you guessed it, trips.txt – which I’m sure looks familiar:


19308_3,357,39185099,"Frydenlund","","0",,
19308_3,18,39185170,"Frydenlund/Fuglebakkevej","","0",,
19308_3,18,39185171,"Frydenlund/Fuglebakkevej","","0",,

What this says is that there’s a bus trip, 39185170, belonging to the route with ID 19308_3, which we know from before is route 13, that goes to Frydenlund/Fuglebakkevej. The Dokk1 entry at 6:34 that we found in stop_times.txt belongs to this trip, which tells us that it does indeed belong to route 13.
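The same chain of lookups, stop time to trip to route, can be done without a database by building dictionaries keyed on the IDs; it is also the lookup needed to show which routes visit a stop. The sketch uses only the excerpt rows above and assumes their column positions (trip ID first and stop ID fourth in stop_times.txt, route ID first and trip ID third in trips.txt, the short name third in routes.txt). The function names are hypothetical.

```python
import csv
import io


def parse_rows(text: str) -> list[list[str]]:
    """Header-less CSV lines -> lists of strings (IDs stay strings, so '03300' keeps its zero)."""
    return [row for row in csv.reader(io.StringIO(text)) if row]


# Rows copied from the excerpts above.
ROUTES = parse_rows('19133_3,281,"775","",3,,\n19308_3,281,"13","",3,,\n19309_3,281,"1A","",3,,\n')
TRIPS = parse_rows(
    '19308_3,357,39185099,"Frydenlund","","0",,\n'
    '19308_3,18,39185170,"Frydenlund/Fuglebakkevej","","0",,\n'
    '19308_3,18,39185171,"Frydenlund/Fuglebakkevej","","0",,\n'
)
STOP_TIMES = parse_rows(
    '39185170,6:32:00,6:32:00,03300,21,0,0,""\n'
    '39185170,6:34:00,6:34:00,38600,22,0,0,""\n'
    '39185170,6:36:00,6:36:00,09200,23,0,0,""\n'
)


def build_trip_routes(trips: list[list[str]], routes: list[list[str]]) -> dict[str, str]:
    """trip_id -> route short name, following trips.txt to routes.txt."""
    route_names = {r[0]: r[2] for r in routes}  # route_id -> "13", "1A", ...
    return {t[2]: route_names[t[0]] for t in trips}  # trip_id (col 3) -> name via route_id (col 1)


def routes_at_stop(stop_id: str, stop_times: list[list[str]], trip_routes: dict[str, str]) -> list[str]:
    """Sorted names of the routes with at least one stop time at stop_id."""
    return sorted({trip_routes[st[0]] for st in stop_times if st[3] == stop_id})
```

Scanning every stop time per query is fine for three lines but not for 2,700,000; a real version would precompute the stop-to-routes mapping once.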

Now, based on a handful of files like this – and there are more than the ones I’ve mentioned – we need to be able to answer questions like: I’m at some position, what’s the nearest bus stop? I’m at bus stop X, what’s the next bus that arrives? What’s the last bus that goes to stop Y today? What I’ll do is take the files as input but reorganize the data completely into a form that allows those questions to be answered efficiently. That’s what this week has been all about.
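One possible reorganization for the “next bus at stop X” question is to group departures by stop ahead of time and keep each stop’s list sorted, so a lookup is a binary search instead of a scan over millions of lines. This is a sketch of the idea, not the format the app uses. It ignores which days a trip runs (the trips.txt excerpt has a second column, presumably a service ID, that a real version would have to consult), and the function names are hypothetical. It does account for GTFS allowing times past 24:00:00 for trips that run after midnight.

```python
import bisect
from collections import defaultdict


def gtfs_time_to_seconds(t: str) -> int:
    """'6:34:00' -> seconds after midnight. GTFS allows hours >= 24 for trips
    that run past midnight, so this deliberately does not wrap around."""
    h, m, s = (int(part) for part in t.split(":"))
    return h * 3600 + m * 60 + s


def build_departure_index(stop_times, trip_routes):
    """stop_id -> list of (departure_seconds, route_name), sorted by time."""
    index = defaultdict(list)
    # Assumed stop_times.txt columns: trip_id, arrival, departure, stop_id, ...
    for trip_id, _arrival, departure, stop_id, *_rest in stop_times:
        index[stop_id].append((gtfs_time_to_seconds(departure), trip_routes[trip_id]))
    for departures in index.values():
        departures.sort()
    return dict(index)


def next_departure(index, stop_id, now_seconds):
    """First (time, route) at or after now_seconds, or None if there is none."""
    departures = index.get(stop_id, [])
    # (now,) sorts before every (now, route), so this finds the first time >= now.
    i = bisect.bisect_left(departures, (now_seconds,))
    return departures[i] if i < len(departures) else None
```

The same sorted lists also answer “last bus to stop Y today”: it is the final entry of that stop’s list once it has been filtered to the trips running today.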