The case of the spontaneously falling airship

Airships: Conquer the Skies
12 Jan 2016, 7:05 p.m.

In dev 8, game data is now loaded in from external files. As I expected, this is causing the occasional bug. In particular, some airship designs would consistently fall out of the sky. This needed to be fixed.

The first step in fixing any bug is to find a consistent way to reproduce the problem. If you can't reliably observe the problem, how can you be sure it's fixed? How can you study it? So I found a ship which reliably fell out of the sky: the "large bomber" AI ship design.

Half a minute into any combat, without fail, there would be a cry from the Suspendium chamber: "We need more coal, quick!" Coal would fail to arrive, and the ship would plummet to the ground and explode messily. The question now was why the coal wasn't getting delivered.

Job dispatch in the game works like this: Each module has jobs associated with it. A job can be operating the module, delivering resources like coal or water, or simply guarding the module. Jobs have different requirements for who can perform them - e.g. guard jobs need to be done by armed crew members - and different priorities. So as a Suspendium chamber starts to run out of coal, the priority of its "deliver coal" job starts rising.

Jobs are allocated by priority, and crew members can abandon one job if a significantly more important one comes along. So why wasn't the coal job for the Suspendium chamber being fulfilled? I couldn't quite figure it out by just looking at the ship. But certainly, I could see no crew member picking up any coal. So perhaps there was a bug in job allocation?
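To make the dispatch rules concrete, here is a minimal sketch of priority-based allocation with abandonment. It is not the game's actual code: the names `JobDispatch`, `Crew`, `Job`, `assign` and the `ABANDON_MARGIN` threshold are all made up for illustration, and the real rule for what counts as "significantly more important" is not described here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch of job dispatch; every name here is an assumption.
final class JobDispatch {
    record Crew(String name, boolean armed) {}
    record Job(String name, double priority, Predicate<Crew> requirement) {}

    // Assumed threshold: how much more important a new job must be
    // before a crew member drops the job they already hold.
    static final double ABANDON_MARGIN = 2.0;

    /** Hands out jobs in descending priority order; returns crew -> job. */
    static Map<Crew, Job> assign(List<Job> jobs, List<Crew> crew, Map<Crew, Job> current) {
        Map<Crew, Job> result = new HashMap<>(current);
        List<Job> byPriority = new ArrayList<>(jobs);
        byPriority.sort(Comparator.comparingDouble(Job::priority).reversed());

        for (Job job : byPriority) {
            if (result.containsValue(job)) continue; // already has someone assigned
            Crew chosen = null;
            for (Crew c : crew) {
                if (!job.requirement().test(c)) continue; // e.g. guard jobs need armed crew
                Job held = result.get(c);
                if (held == null) { chosen = c; break; } // idle crew are preferred
                boolean worthSwitching = job.priority() >= held.priority() + ABANDON_MARGIN;
                if (worthSwitching && chosen == null) chosen = c;
            }
            if (chosen != null) result.put(chosen, job);
        }
        return result;
    }

    public static void main(String[] args) {
        Crew sailor = new Crew("sailor", false);
        Job standBy = new Job("stand by", 0.5, c -> true);
        Job guard = new Job("guard", 1.0, Crew::armed);
        Job coal = new Job("deliver coal", 5.0, c -> true);

        Map<Crew, Job> result =
            assign(List.of(standBy, guard, coal), List.of(sailor), Map.of(sailor, standBy));
        System.out.println(result.get(sailor).name()); // deliver coal
    }
}
```

In this sketch the sailor drops the low-priority "stand by" job once the coal job's priority has climbed far enough above it, and the guard job stays unfilled because nobody armed is available.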

I created a quick new debug overlay that showed the jobs for the currently selected ship, their priorities, and whether they had someone assigned. After filtering out the low-importance ones like "stand in this module in case you're needed", I could observe what everyone was up to while the game was running.

And yes, the coal job appeared, and was assigned, but somehow never completed. I added more detail to the overlay to show the state of the crew member who had been given the job: where they were heading, and whether they were carrying anything. This showed me that the assigned crew member had walked to the coal store, but was just standing there, not picking up or delivering the coal.

Now that I had an idea which part of the process was going wrong, I stepped through the code for picking up resources, and found that the crew member had a little difficulty picking up the coal: the time required to complete the action was 2147483647 milliseconds. What an interesting number.

2147483647 is the largest number that fits into a signed 32-bit integer. Somewhere in the code, I was dividing a floating point number by zero, getting out infinity, and then converting it to an integer. Since infinity can't be represented in an integer field, the conversion went with the largest possible value.
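The effect is easy to reproduce in isolation. The sketch below assumes Java, where a float-to-int cast is defined to saturate rather than fail; the method name `actionMillis`, its parameters, and the 2000 ms base time are invented for the example and are not taken from the game.

```java
// Hypothetical reproduction of the symptom; names and numbers are assumptions.
final class PickupTime {
    /** Time to finish an action: base time scaled by the worker's efficiency. */
    static int actionMillis(float baseMillis, float workEfficiency) {
        // Floating point division by zero does not throw; it yields Infinity.
        float millis = baseMillis / workEfficiency;
        // Casting Infinity to int saturates at Integer.MAX_VALUE.
        return (int) millis;
    }

    public static void main(String[] args) {
        System.out.println(actionMillis(2000f, 1.0f)); // 2000
        System.out.println(actionMillis(2000f, 0.0f)); // 2147483647

        // Converted to days, the saturated value comes out a little under 25.
        double days = actionMillis(2000f, 0.0f) / 86_400_000.0;
        System.out.println(days);
    }
}
```

Because nothing throws along the way, the only visible symptom is an action that takes absurdly long to finish.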

And indeed: the crew member who was taking 2147483647 milliseconds - about 25 days - to pick up the coal was a guard. Guards aren't meant to do ship work, and have their work efficiency set to zero. The mistake was that the coal job accepted that guard as an assignee.

Before the change to loading data from external files, the code could refer to explicit crew types. After the change, it had to check properties of the crew type it was presented with, and it wasn't checking work efficiency.
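A property-based eligibility check along these lines might look like the following. This is only a sketch of the idea: `CrewType`, its fields, and the two `can...` methods are hypothetical, as are the example values for the sailor and the guard.

```java
// Hypothetical sketch of property-based job eligibility; not the game's code.
final class JobEligibility {
    // Stand-in for a crew type whose properties come from a data file.
    record CrewType(String name, boolean armed, float workEfficiency) {}

    /** Guard jobs need an armed crew member. */
    static boolean canGuard(CrewType type) {
        return type.armed();
    }

    /** Work jobs such as carrying coal need someone who can actually work. */
    static boolean canDoShipWork(CrewType type) {
        // This is the kind of check that was missing: without it, a crew type
        // with zero efficiency is accepted and then never finishes the job.
        return type.workEfficiency() > 0f;
    }

    public static void main(String[] args) {
        CrewType sailor = new CrewType("sailor", false, 1.0f);
        CrewType guard = new CrewType("guard", true, 0.0f);

        System.out.println(canDoShipWork(sailor)); // true
        System.out.println(canDoShipWork(guard));  // false
        System.out.println(canGuard(guard));       // true
    }
}
```

Checking the property rather than the crew type's name means a new crew type added in a data file gets the right behaviour without a code change.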

Once I fixed that, the bomber stopped falling out of the sky.