Hey Rob,
I had something like that set up for a few months at work to churn through data by running a few commandlets in Unreal. I ended up making the commandlet fast enough that I didn’t need to parallelize, but the farm was working.
Here’s how I did it, and to answer your question more directly, I’ll list the requirements I see for a farm below.
I started by doing something similar to what you have: a MySQL DB set up to keep track of all of the work, with the clients grabbing the latest available job from the DB. It worked fine until I wanted different clients to do different things and pass data back and forth between them.
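If it helps, here’s roughly what that claim step can look like. This is just a sketch from memory, assuming the MySQLdb driver and a jobs table with status/claimed_by/claimed_at columns (my names, not necessarily what you’d want):

import socket
import MySQLdb  # any MySQL client library works the same way

conn = MySQLdb.connect(host="dbhost", user="farm", passwd="secret", db="farm")
cur = conn.cursor()
me = socket.gethostname()

# Claim one pending job in a single UPDATE so two clients can't grab the same row.
cur.execute(
    "UPDATE jobs SET status='running', claimed_by=%s, claimed_at=NOW() "
    "WHERE status='pending' ORDER BY id LIMIT 1",
    (me,))
conn.commit()

# Then read back whatever this machine just claimed.
cur.execute(
    "SELECT id, asset FROM jobs WHERE claimed_by=%s AND status='running'",
    (me,))
job = cur.fetchone()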
The pipeline was: run a commandlet on a piece of data, parse the output, and dump it to a database. It was for managing the assets and doing basic data mining.
Parsing that data easily became the choke point, as some of the files were over 100 megs of text. So I rewrote the system with Twisted in Python and set up a master server that managed all of the requested jobs and held on to temporary data passed between clients.
So it would go: client A logged in to the server as a crawler, grabbed the asset name, ran whatever it needed to run, then pickled the data and sent it up to the server, rinse and repeat. Then clients C, D, and E would grab that data, parse it, and throw it up into the DB.
The advantage of this was that clients C, D, and E didn’t need any software installed, just Python, so I could harness every available desktop I could get my hands on.
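To give you an idea of the Twisted side, here’s a stripped-down sketch of the master, assuming a netstring-framed protocol and an in-memory job list. The names (JobMaster, get_job, and so on) are placeholders, not what I actually shipped:

import pickle
from twisted.internet import reactor, protocol
from twisted.protocols.basic import NetstringReceiver

class JobMaster(NetstringReceiver):
    def stringReceived(self, data):
        msg = pickle.loads(data)  # e.g. {'cmd': 'get_job'} or {'cmd': 'result', 'payload': ...}
        if msg['cmd'] == 'get_job':
            # Hand a crawler the next asset name, or None if we're out of work.
            job = self.factory.jobs.pop(0) if self.factory.jobs else None
            self.sendString(pickle.dumps(job))
        elif msg['cmd'] == 'result':
            # Park the crawler's output until a parser client asks for it.
            self.factory.pending_results.append(msg['payload'])
        elif msg['cmd'] == 'get_result':
            result = self.factory.pending_results.pop(0) if self.factory.pending_results else None
            self.sendString(pickle.dumps(result))

class JobMasterFactory(protocol.Factory):
    protocol = JobMaster

    def __init__(self, assets):
        self.jobs = list(assets)
        self.pending_results = []

reactor.listenTCP(8123, JobMasterFactory(['AssetA', 'AssetB']))
reactor.run()

The parser clients only need Python and a socket on their end, which is what made it so easy to borrow random desktops.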
That’s the 5-second overview; now on to the requirements.
Need to log what files need to be worked on
Need to know what machine grabbed what file
Need to know at what time that machine grabbed that file, and have the job time out if the machine doesn’t report back
Ideally it’s nice to see which machine is doing what and to be able to cancel jobs remotely
That’s the barebones. Then you can get fancy: keep track of average times, figure out which machines are faster and throw larger files at them, develop a smarter client so that if you’re borrowing a common machine that’s suddenly needed for a dog and pony show you can kill the job remotely, and add machine pools. For example, we can have 10 dedicated machines that run all day, but at night we can turn on an extra 50 and have them shut off automatically.
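In table form, the barebones requirements above boil down to something like this. It’s MySQL-flavored, and the column names and the 30-minute timeout are just my guesses at what you’d want:

SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id          INT AUTO_INCREMENT PRIMARY KEY,
    asset       VARCHAR(255) NOT NULL,                         -- what file needs to be worked on
    status      ENUM('pending','running','done') DEFAULT 'pending',
    claimed_by  VARCHAR(64),                                   -- which machine grabbed it
    claimed_at  DATETIME                                       -- when it grabbed it
)
"""

# Run this periodically so jobs go back in the pool when a machine stops reporting back.
REQUEUE_STALE = """
UPDATE jobs
SET status='pending', claimed_by=NULL, claimed_at=NULL
WHERE status='running' AND claimed_at < NOW() - INTERVAL 30 MINUTE
"""

Everything fancier (average times, faster-machine scheduling, pools) hangs off the same table with a few extra columns.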
Hope that helps some,
Luiz