Game Job Finder...HELP!

I hope you’re ready to solve a problem.

Here we go. I’ve been working on a web app to increase my chances of getting a job for quite some time, but I’ve come to a halt in the design. So I’m asking you to help me come up with a solution. I’ve copied the next section from my website, so you can get an idea of the project and what not. Question to follow.

Game Job Finder
This is a personal project that I began working on in the Fall of 2010. It helps automate the process of finding a job in the video game industry. This project is still a work in progress.

Old Site | New Site(no graphics yet)

I was looking for jobs in the games industry and was having a rough time finding much of anything I was qualified for. I exhausted places like creativeheads.com and gamejobhunter.com pretty quickly and started looking at job pages on several hundred company sites each week to see if there were any new jobs. I’m not going to lie, this took a lot more time than I wanted to devote to mindlessly clicking links. So I decided to script it. First, I wrote a standalone script to search a list of webpages and open the pages where it found matches. It only took a minute to run, and saved me a few hours a week. I was re-learning Python at the same time and thought, “All these sites (creativeheads.com) make companies fill out the same information that’s on their company website all over again. What if I make this a web app to cut out the middle man?” So I began looking into Python’s web functionality. This was a major learning process, but I got the server and site running pretty quickly. As you can see, it’s very slow: it takes 30 seconds to get your results. I needed it to be faster, so I needed a database system. I chose Google App Engine after looking at all the systems, and I’m really glad I chose it. Although the site isn’t visually finished, I have achieved a massive speedup. Where a single search word took 30 seconds, I can now have 10 search words take under a tenth of a second. I’m still working on this, but I’ve hit a snag in the design. Hopefully a new version will be up soon!

So here is my issue. Right now I’m using regex to match keywords on the page. This limits me to non-Flash sites, but I’ve accepted that. What I can’t accept is writing a custom parser for each website on the list. In my mind, this is bad for many reasons, mainly #1 and #2.
[ol]
[li]a lot of work
[/li][li]have to update the parser every time the company upgrades their webpage
[/li][li]a lot of work
[/li][/ol]

For instance, Blizzard’s jobs page has an array at the top of the HTML that holds all the information for all the jobs. This in itself isn’t a huge problem; I just need to write a parser for it. The issue is writing a parser for each of the 400 websites.

It’s good in the sense of:
[ol]
[li]gives me more functionality
[/li][li]makes twitter, rss feeds, new jobs more of a possibility
[/li][li]gives me more search results
[/li][/ol]

Any opinions/thoughts on solutions to this, or do I just need to get really good at writing parsers?

Awesome! There is a real need for a game jobs meta search service. I’m not aware of a comparable, similarly specialized offering.

As for the dilemma of how to accommodate all of the variation present in the various individual jobs pages/systems, if you could generate some momentum, you could probably convince many game companies and recruiting services to do the work of integrating with your service themselves. In the meantime, you might consider just returning a minimum of information for matches on pages/systems that aren’t fully supported. For example, a marginally supported match result might look like the first returned search result in the example below.

search results for “script monkey”:

-keywords match found at “http://www.unsupportedGameCo.com”
-Script Monkey position at “http://www.supportedGameCo.com/jobs/101”
-Script Monkey position at “http://www.supportedGameCo.com/jobs/123”

In the case of an unsupported match, it would be up to the user to verify whether or not an opening actually existed, and if a valid position did exist, the user would likely need to manually dig around the page or service to find the job’s actual description.

This is a job for some sort of teachable parsing system- simple is fine.

You’re right in that you will NOT be able to create a custom parser for every site- it is pointless except for well-established formats or sites, and even then probably not worth it. I’d create parsers only to get info into your ‘decision engine’- i.e., the Twitter parser would just parse Twitter feeds (possibly following links?) to feed into the engine, NOT make decisions on what the job is. Same for RSS.

It wouldn’t be too difficult to drive your engine via regex. You could even just ‘sanitize’ the HTML and content into something more regex-able: strip the tags, remove all punctuation, etc.
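A minimal sketch of that sanitizing step, using only Python's standard library. The function name `sanitize` and the exact cleanup rules are assumptions made for illustration, not anything from the original project.

```python
import re
from html import unescape

def sanitize(html_text):
    """Reduce raw HTML to lowercase words separated by single spaces."""
    # Drop script and style blocks entirely, including their contents.
    text = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", " ", html_text)
    # Drop the remaining tags; [^>] also matches newlines, so tags that
    # span several lines are removed too.
    text = re.sub(r"<[^>]*>", " ", text)
    # Decode entities such as &amp; before stripping punctuation.
    text = unescape(text)
    # Replace anything that is not a word character or whitespace.
    text = re.sub(r"[^\w\s]", " ", text)
    # Collapse runs of whitespace and normalize case.
    return re.sub(r"\s+", " ", text).strip().lower()
```

One caveat with removing all punctuation: a term like "C++" collapses to "c", so keywords that depend on punctuation would need special handling before this step.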

From there, you just run the actual parser on your content, and tweak it to give you better and better results. I’ve never done this sort of thing, so I can’t say how it is best done- the teaching aspect is probably best handled by making the parsing engine data-driven and just tweaking the data/regexes. To ensure the quality of your tweaks, you’d probably want some sort of testing harness set up, so you can tweak and verify that you didn’t lose valid results.
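One way a data-driven engine plus harness could look, as a rough sketch. The rule names, patterns, and sample texts below are all made up for illustration; the point is only that the regexes live in data, and a fixed set of known-good samples catches tweaks that lose valid results.

```python
import re

# Hypothetical rule data: tweak these patterns rather than the code.
RULES = {
    "programmer": r"\b(programmer|software engineer|developer)\b",
    "artist": r"\b(artist|animator|modeler)\b",
}

def classify(text, rules=RULES):
    """Return the sorted names of every rule whose pattern matches."""
    return sorted(name for name, pattern in rules.items()
                  if re.search(pattern, text, re.IGNORECASE))

# Known-good samples: (page text, labels we expect the rules to produce).
SAMPLES = [
    ("Senior Gameplay Programmer wanted", ["programmer"]),
    ("Hiring a Character Artist and a Tools Developer", ["artist", "programmer"]),
    ("About our studio", []),
]

def run_harness(rules=RULES, samples=SAMPLES):
    """Return the samples whose labels no longer match expectations."""
    failures = []
    for text, expected in samples:
        got = classify(text, rules)
        if got != expected:
            failures.append((text, expected, got))
    return failures
```

After each tweak to `RULES`, an empty list from `run_harness()` means no previously valid result was lost.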

Ultimately this is one of those places where you can turn an otherwise incredible amount of software architecture into a scalable algorithm. The trade-off is that algorithms can be more difficult and should be tested very, very well.

Thanks for the responses, guys. I’m glad you confirmed my thoughts on creating a parsing system. I’ve been doing some research, mainly looking at how people are building their websites and trying to generate regex matches and what-not that will give me the best results. I’m not 100% sure what you mean by a teachable parsing system. Something along the lines of genetic algorithms? Or assigning values to regex matches (is it a link? +1; is it in the “jobs” div? +2; etc.) and sorting by the best score?
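The value-assigning idea could be sketched roughly like this with the standard library's `html.parser`. The class name `MatchScorer`, the base score of 1, and the rule that a "jobs" div is any div whose id or class contains "job" are all assumptions for illustration; only the +1 for links and +2 for the jobs div come from the question above.

```python
from html.parser import HTMLParser

VOID = {"br", "hr", "img", "input", "link", "meta"}  # tags with no closing tag

class MatchScorer(HTMLParser):
    """Score each keyword hit by the markup context it appears in."""

    def __init__(self, keyword):
        super().__init__()
        self.keyword = keyword.lower()
        self.stack = []   # one (tag, is_jobs_div) entry per open tag
        self.hits = []    # one (score, text) entry per keyword hit

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        marker = " ".join(v or "" for k, v in attrs if k in ("id", "class"))
        self.stack.append((tag, tag == "div" and "job" in marker.lower()))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag, if there is one.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if self.keyword not in data.lower():
            return
        score = 1                                   # assumed base score
        if any(tag == "a" for tag, _ in self.stack):
            score += 1                              # hit is inside a link
        if any(is_jobs for _, is_jobs in self.stack):
            score += 2                              # hit is inside a jobs div
        self.hits.append((score, " ".join(data.split())))

def score_page(html_text, keyword):
    """Return (score, text) hits for keyword, best score first."""
    scorer = MatchScorer(keyword)
    scorer.feed(html_text)
    scorer.close()
    return sorted(scorer.hits, reverse=True)
```

Because the weights are plain numbers, they can be tuned against sample pages without touching the parsing code.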

As for testing, I’ve written a base case for searching. Basically it takes 15 or so keywords I’ve gathered, runs a couple of times a day, and records how many matches were found. I’m using the most basic type of matching I can find right now, to hopefully give me the largest number of matches. This should give me an idea of how many matches I should be able to find, and if I mess up the searching, it will be something to compare against.
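Comparing a later run against that recorded baseline might look something like the sketch below. The function name, the dictionary layout (keyword to match count), and the 20% tolerance are assumptions, not part of the existing test.

```python
def find_regressions(baseline, current, tolerance=0.2):
    """Return keywords whose match count fell more than `tolerance` below baseline.

    Both arguments map keyword -> number of matches found in a run.
    """
    dropped = {}
    for keyword, expected in baseline.items():
        found = current.get(keyword, 0)
        if expected > 0 and found < expected * (1 - tolerance):
            dropped[keyword] = (expected, found)
    return dropped
```

A tolerance is used because job listings come and go, so counts will drift a little between runs even when the matching itself is unchanged.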

I’ve also rewritten the HTML parsing. I was parsing based on EOL characters, but after reviewing the sites’ source, I saw that a lot of pages have links spanning multiple lines, so I was missing perfectly good matches. I’ve turned it into a tree-based architecture, which I should have done in the beginning. It also allows me to filter out more things, such as <head> data, which makes my DB much smaller (especially for people still putting CSS in HTML files) and just makes more sense.
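For anyone following along, a tree-based parse of this kind could be sketched with the standard library's `html.parser`. This is not the poster's actual code; the `Node` and `TreeBuilder` names and the choice of skipped tags are assumptions for illustration.

```python
from html.parser import HTMLParser

SKIPPED = {"head", "script", "style"}                # subtrees dropped while parsing
VOID = {"br", "hr", "img", "input", "link", "meta"}  # tags with no closing tag

class Node:
    def __init__(self, tag, attrs=None, parent=None):
        self.tag, self.attrs, self.parent = tag, dict(attrs or []), parent
        self.children = []   # child Nodes and text strings, in document order

    def text(self):
        """All text under this node, with whitespace (and newlines) collapsed."""
        parts = [c if isinstance(c, str) else c.text() for c in self.children]
        return " ".join(" ".join(parts).split())

class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("document")
        self.current = self.root
        self.skip_depth = 0  # > 0 while inside a skipped subtree

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED:
            self.skip_depth += 1
        elif self.skip_depth == 0:
            node = Node(tag, attrs, self.current)
            self.current.children.append(node)
            if tag not in VOID:
                self.current = node

    def handle_endtag(self, tag):
        if tag in SKIPPED:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0:
            node = self.current
            while node is not None and node.tag != tag:
                node = node.parent
            if node is not None and node.parent is not None:
                self.current = node.parent  # close the nearest matching tag

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.current.children.append(data)

def parse(html_text):
    builder = TreeBuilder()
    builder.feed(html_text)
    builder.close()
    return builder.root

def iter_links(node):
    """Yield (href, link text) for every <a href=...> under node."""
    for child in node.children:
        if isinstance(child, Node):
            if child.tag == "a" and "href" in child.attrs:
                yield child.attrs["href"], child.text()
            yield from iter_links(child)
```

Since link text is read from the tree rather than line by line, a link whose text wraps across several source lines still comes out as a single match.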

I’m sure I’ll have more questions. I appreciate the help so far.

I wrote a search engine + crawler once, so I might be of help. Depending on how in-depth you want to get, your biggest concern is going to be your keyword weighting algorithm.

The way my system worked (for just the keyword weighting, which is all you should have to worry about) was to first parse through the full HTML of the site and weight the words based on their context in the HTML (is it in an <h1> tag, and so on). Then I would strip the HTML and do a second weighting pass. Then I would use those weights and the stripped content to do a keyword-pairing weighting pass. From there you add it all up (I stored it all separately) and add it to your database.
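Since the original was in PHP, here is only a loose Python sketch of the three passes. The tag weights, the use of plain word frequency for the second pass, and the reading of "keyword pairing" as counting adjacent word pairs are all assumptions. It also does not filter out script or style contents, which a real crawler would want to do.

```python
import re
from collections import Counter
from html.parser import HTMLParser

TAG_WEIGHTS = {"title": 5, "h1": 4, "h2": 3, "a": 2}  # assumed values

def words(text):
    return re.findall(r"[a-z0-9]+", text.lower())

class ContextWeigher(HTMLParser):
    """Pass 1: weight words by the heaviest weighted tag they sit inside."""

    def __init__(self):
        super().__init__()
        self.open_tags = []       # currently open tags that carry a weight
        self.context = Counter()  # word -> accumulated context weight
        self.plain = []           # every word in document order, markup stripped

    def handle_starttag(self, tag, attrs):
        if tag in TAG_WEIGHTS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        tokens = words(data)
        self.plain.extend(tokens)
        bonus = max((TAG_WEIGHTS[t] for t in self.open_tags), default=0)
        for word in tokens:
            self.context[word] += bonus

def weigh_page(html_text):
    """Run all three passes and keep their results separate."""
    weigher = ContextWeigher()
    weigher.feed(html_text)
    weigher.close()
    return {
        "context": weigher.context,                              # pass 1
        "frequency": Counter(weigher.plain),                     # pass 2
        "pairs": Counter(zip(weigher.plain, weigher.plain[1:])), # pass 3
    }

def keyword_score(weights, keyword):
    """Add the separately stored weights up for one search phrase."""
    tokens = words(keyword)
    score = sum(weights["context"][t] + weights["frequency"][t] for t in tokens)
    score += sum(weights["pairs"][pair] for pair in zip(tokens, tokens[1:]))
    return score
```

Keeping the three counters separate until query time means each pass can be re-weighted later without re-crawling the page.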

Not sure what you’re coding in (I did mine in PHP and MySQL), but if you want any help, just let me know.

Ok, but what kind of game artist? One of the first things you might want to think about is what area you want to specialize in. Is specialization absolutely necessary? No, it’s not…in fact the smaller game companies will probably be looking for artists that can do a lot of different things. If a small company has say, two artists, those artists are going to need to know how to do a lot of various art related tasks. But as companies get larger and have say, 20 or more artists, there’s probably going to be more specialization going on.

So that’s probably a good place for us to start…let’s take a look at the different types of game art specialties. These are some of what you’ll frequently see listed at game company websites under the Now Hiring or Jobs Available sections.