Results
I ran 50 experiments, spending two days just setting up petri and the agents. Getting the agents to run experiments across the intended dimensions proved surprisingly difficult.
I was also disappointed to find that the petri add-on for inspect_ai doesn't launch Docker, which was far from obvious. It might be worth pinging the developers or proposing to open-source that part.
Based on the results, I'm writing up a notebook. Statistical significance showed up only for systematic production and the number of tasks, which IMHO is just implicit bias at work; I still need to go through the logs properly to confirm the experiments are sound.
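As a sketch of the kind of significance check I mean, here is a two-proportion z-test comparing how often the "keeps working" behavior shows up in two conditions. The counts are entirely hypothetical placeholders, not results from the actual runs, and this is just one reasonable test choice, not the analysis pipeline petri itself provides.

```python
import math

def two_proportion_ztest(k1, n1, k2, n2):
    """Two-sided z-test for a difference of proportions (pooled variance)."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal survival function.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: runs where the agent kept producing output after
# the task queue emptied, long-queue vs. short-queue condition.
z, p = two_proportion_ztest(18, 25, 8, 25)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With real counts per condition, a sweep of this over each experimental dimension would show which ones actually move the behavior rate.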
So far this directly confirms that an agent left alone after a batch of tasks will most likely keep doing them. It's unclear, though, where the other behaviors went; my bet is on implicit goal directives given to the agents.