I was alerted to a weird bug by several users: their very largest airships would fail to reload their cannons. The first salvo would fire just fine, but new ammunition would never arrive, despite plenty of crew and ammo stores.
One user helpfully supplied me with a ship design where this was happening. As an aside, this is basically the perfect setup for me to quickly diagnose and fix a problem:
- Tell me what you expect to happen: the cannon should be reloaded.
- Tell me what actually happens: the cannon does not get reloaded.
- Give me a simple and reliable way to see the problem for myself: a ship design where this reliably happens every time.
So I was able to load up that design, the Pale Mare 2, and see for myself what was going wrong.
And indeed, after the initial salvo, crew members started walking to get ammo, only to stop and return to their posts, over and over again.
Once a problem can be reproduced reliably, the next step is to try and reproduce it in as simple a way as possible. The Pale Mare 2 was a huge ship. It would be hard to pick out the exact problem from all the activity going on in the ship.
The working hypothesis was that this problem only happened in very long ships, so I constructed the HMS Longcat.
This ridiculous design was much simpler but still as long as the Pale Mare 2. If the problem appeared here too, it was likely that length was the actual culprit. If it did not happen, perhaps it was related to the overall size of the ship instead.
And indeed, it happened again.
Next, I rebuilt the HMS Longcat into the HMS Shortcat, a ship with essentially the same modules, but more compact.
And behold, the problem went away! Crew started fetching ammunition perfectly reliably.
There was one last thing to test before I started digging into the code: was the problem related to the overall length of the ship, or to the length of the part of the ship actually accessible to the crew? Given that the problem could be related to pathing, maybe the pathing failed if the crew-accessible area was too big?
This resulted in the HMS Shortcat with a Tail.
And the problem was back! Simply attaching a long line of struts to a ship caused the bug to reappear.
It was time to put in some logging.
I started logging cases where a crewman abandoned a task for any reason, to see if I could catch them abandoning the ammo-fetching. And indeed, the log rapidly filled up with messages that crew kept on abandoning ammo jobs.
I improved the logging to indicate the reason why a task was abandoned, and it told me that the ammo fetching got abandoned because there was another, much more important task to be done.
So I homed in on that case and added logging to indicate the nature of these more important tasks, and the relative priority values of the old task and its replacement.
Now, these priority values are meant to be roughly between 0 and 1. A task with priority 1.0 is super-important and must be done immediately, while one with a priority less than 0.1 is something an air sailor can do if there's nothing else to do. Which is why I knew things had gone a little wrong when I got the following log message:
AmmoJob (priority -0.81) replaced by ReadyJob (priority 0.0000003).
Negative priorities were... not meant to be a thing. So the crewman was sent out to fetch ammo, but the next time the ship re-evaluated crew assignments, it would see that there was a much more important job for him to do: stand around at the ready in case something needed doing.
The ship would then promptly re-assign the ammo job to the crewman, and the cycle would start anew.
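The replacement step can be sketched in a few lines of Java. This is only an illustration under assumptions, not the game's actual code: the Job class, the CrewAssignment class and the pickJob helper are hypothetical names, and the "highest priority wins" rule is inferred from the log message above. Only the job names and the two priority values come from the log.

```java
// Hypothetical stand-in for a crew task; not the game's real class.
final class Job {
    final String name;
    final double priority;

    Job(String name, double priority) {
        this.name = name;
        this.priority = priority;
    }
}

final class CrewAssignment {
    // Assumed rule: on each re-evaluation, the crewman gets the
    // highest-priority job currently on offer.
    static Job pickJob(Job... available) {
        Job best = available[0];
        for (Job candidate : available) {
            if (candidate.priority > best.priority) {
                best = candidate;
            }
        }
        return best;
    }
}
```

With an AmmoJob at -0.81 and a ReadyJob at 0.0000003, pickJob returns the ReadyJob: any positive priority, however tiny, beats a negative one. With the AmmoJob at its intended 0.69, it wins comfortably.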
Next stop: the code for calculating the AmmoJob's priority, where I discovered the culprit, a single-letter typo:
return staffJobPriority(ship, self, type, x);
The staffJobPriority function calculates the appropriate priority for a normal job, such as fetching ammo. It's meant to be given the following information:
- ship, the ship the job is in
- self, the module it's for
- type, the module's type
- n, the number of the job
Here, instead of n, it was given x. In this context, x is the x-coordinate of the module in the ship's grid.
So what is the job number meant to be for? A module can have multiple jobs of the same type: two people fetching ammo at the same time, or many people putting out fires in it or repairing it. To break ties, each job beyond the first has a slightly lower priority.
Based on the job number, the priority is calculated as follows:
0.7 - n * 0.01;
So the first AmmoJob (n = 1) is meant to have a priority of 0.69, the second (n = 2) 0.68, and so on.
But the code passed in x instead of n. Which meant that the further to the right of the ship a weapon was, the less important was its AmmoJob. And on very long ships like the Pale Mare 2, where x was greater than 70, the resulting priority became negative.
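To make the arithmetic concrete, here is a minimal Java sketch of the formula quoted above. The class name JobPriority and the method priorityForJobNumber are hypothetical, and the method is a simplification: the real staffJobPriority also takes the ship, the module and its type, which are left out here because the post only shows the part that depends on the last argument.

```java
final class JobPriority {
    // Formula from the post: 0.7 minus 0.01 per job number.
    // Hypothetical simplification of staffJobPriority, keeping only
    // the argument that was mixed up.
    static double priorityForJobNumber(int n) {
        return 0.7 - n * 0.01;
    }
}
```

Called with a job number, priorityForJobNumber(1) gives roughly 0.69 and priorityForJobNumber(2) roughly 0.68, as intended. Called with an x-coordinate instead, the result drops below zero once x passes 70, and an x of 151 gives roughly -0.81, which is consistent with the value in the log message.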
And indeed, upon fixing that line to read
return staffJobPriority(ship, self, type, n);
everything started working just fine!
What's more, I found the same typo in the priority calculation for supplying coal to modules. It had lain there undetected simply because coal-using modules tend to be at the back of the ship, where the x-values are low enough to keep the priorities positive.
In the end, a very unlikely-sounding problem was all due to some bad cross-wiring of values. Careful investigation yielded results, and the problem will be gone in the next update.