The Tech Art Aggregator

So for my personal project in my month off, I’m writing an aggregator for web content related to TAO and tech artistry. Mostly this is to work on my Python skills and fill in a gap in my knowledge (web-based systems and architecture), but ‘social media’ is also one area where TAO has been behind the curve.

Aggregating content technically isn’t the hard part: I can already pull from Twitter, WordPress, any RSS feed, Facebook, and LinkedIn. The question is, how do I choose what content to aggregate on each of these platforms, and what do I do with that content?

Is that question clear enough or does it need more information to be answered effectively?

ummm…have you seen this? http://www.youtube.com/watch?v=v2vpvEDS00o&feature=player_embedded

Hey Rob,

I’ve found that social media has a lot of noise, but if someone posts an item and that item is re-tweeted, reposted, etc., then it may be of interest to others.

LinkedIn attempts to show you relevant news stories using this method right now, by showing you items that were tagged by other users in your network along with the number of times each was ‘liked’ or suggested. If you have the iPhone LinkedIn app, you can view the “News” section to see what they are doing. The top news story for me right now shows 1,093 shares within my network, out to three degrees of connection.

I have found you really need a human element to help with filtering and content submission. You can also grab content automatically, but an internal voting and ranking system is likely the best way to ensure good links.

I wrote a vote-based content aggregation site before, but it was much broader. If you want a ranking system that doesn’t depend on user votes, I would recommend looking at Google’s PageRank (see the Wikipedia article on PageRank), but you will have to come up with a system to pull down stale content.
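For anyone who hasn’t looked at PageRank, here is a minimal power-iteration sketch in Python. The link graph is assumed to be a plain dict mapping each page to the pages it links to; the damping factor of 0.85 is the commonly cited default, and 50 iterations is an arbitrary choice. This is not the code from Chad’s old site, and it does nothing about removing stale content.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Minimal PageRank via power iteration.

    links maps each page to a list of pages it links to. Pages with no
    outgoing links ("dangling" pages) spread their rank evenly over all pages.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page starts with its share of the "random jump" probability.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: distribute its rank uniformly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


graph = {"a": ["b"], "b": ["c"], "c": ["a", "b"]}
ranks = pagerank(graph)
```

In this toy graph, page b, which is linked from both a and c, ends up with the highest rank. Swapping the toy graph for real links between aggregated sources is left open.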

If you want any help, I’d be happy to lend a hand, although I did all my stuff in PHP. Before I was a TA I was an SEO ‘engineer’ and wrote a bunch of tools and websites right up this alley.

Chad

Edit: Another thought. Depending on how you do your auto-aggregation, beware of bandwidth. I shut my site down when my ISP cut off my connection and launched a formal investigation against me. Apparently they think you are doing very bad things when you use up 850 GB of bandwidth in 3 weeks.
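To put that number in perspective, the snippet below converts 850 GB over three weeks into a daily volume and a sustained line rate. It assumes decimal units and traffic spread evenly over the period; the 50 KB average page size is a hypothetical figure for illustration only, since real pages vary widely.

```python
# Back-of-the-envelope numbers for "850 GB in 3 weeks".
# Assumptions: decimal units (1 GB = 10**9 bytes), traffic spread evenly,
# and a hypothetical average of 50 KB of HTML per fetched page.
total_bytes = 850 * 10**9
seconds = 21 * 24 * 60 * 60          # three weeks
assumed_page_bytes = 50 * 10**3      # hypothetical average page size

gb_per_day = total_bytes / 21 / 10**9
megabits_per_second = total_bytes * 8 / seconds / 10**6
pages_at_assumed_size = total_bytes // assumed_page_bytes

print(f"{gb_per_day:.1f} GB per day")
print(f"{megabits_per_second:.2f} Mbit/s sustained, around the clock")
print(f"{pages_at_assumed_size:,} pages at 50 KB each")
```

That works out to roughly 40.5 GB a day, or about 3.75 Mbit/s of continuous traffic, which is about 17 million pages at the assumed size. Plain text alone can add up to that once a crawler runs nonstop.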

For real quality you need an editor who knows the subject matter to choose articles and news.

Basing rankings on re-tweets or likes is only useful when the likers and re-tweeters are TAs themselves. If it’s everyone else, you might just end up with pages that are interesting for everyone else but not necessarily for TAs.

“Likes” can also work in a negative way. If the likers just aren’t knowledgeable about a topic, a good article on it will not even appear on your radar because it isn’t liked enough. There was an interesting article on Slashdot recently where some folks invented an 8-bit upscaling algorithm based on splines, which was really cool. But nobody else understood it, because to them HQ2x is already good enough, even though it works totally differently. Most people didn’t “like” this article because they didn’t understand it. Dumbness of the crowds… it’s a double-edged sword.

The other problem I have with likes is that they often don’t produce any real news. Lots of people like the same things, so that stuff gets posted everywhere anyway. If I’ve seen it on 10 websites already, it’s hardly news, and you can save your bandwidth and time. It’s the same reason newspapers suck nowadays: they buy all their stuff from AP rather than doing their own digging to come up with interesting, new content that sets them apart and gives them readership.
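One cheap way to avoid reposting what everyone has already seen is to collapse items that point at the same article and count how often each one shows up. The sketch below normalizes URLs before comparing them; the normalization rules and the set of tracking parameters are illustrative assumptions, and this will not catch the same wire story republished under different URLs.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of tracking parameters to ignore when comparing links.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content"}


def normalize_url(url):
    """Reduce superficially different URLs for the same article to one key."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/") or "/"
    # Drop tracking parameters and sort the rest so their order doesn't matter.
    query = urlencode(sorted(
        (k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING_PARAMS
    ))
    return urlunsplit((parts.scheme.lower(), host, path, query, ""))  # fragment dropped


def dedupe(items):
    """Keep the first item seen for each normalized URL and count repeats."""
    seen = {}
    for item in items:
        key = normalize_url(item["url"])
        if key in seen:
            seen[key]["seen_count"] += 1
        else:
            seen[key] = dict(item, seen_count=1)
    return list(seen.values())
```

A high `seen_count` is a hint that a link is already everywhere, so an editor or a ranking rule could treat it as old news rather than a reason to promote it.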

Now you could do a pre-selection based on Facebook/LinkedIn/etc. ranking, paired with user submissions, so content that is unpopular with the general internet populace but may be interesting for us TAs still gets posted.
Once a story is in the TAO system, rank it internally by how many clicks it gets. Each click on a news item extends how long it occupies the headlines before it slowly moves down into the archives.
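A minimal sketch of that click-based lifespan: each story gets a base time on the front page, and every click pushes its expiry further out. The two-day base lifespan and 30 minutes per click are hypothetical tuning values, and the `Story` class and `headlines` function are illustrative names, not part of any existing TAO code.

```python
import time

# Hypothetical tuning values: a story stays up for 2 days,
# and each click buys it another 30 minutes.
BASE_LIFESPAN = 2 * 24 * 3600
SECONDS_PER_CLICK = 30 * 60


class Story:
    def __init__(self, title, posted_at):
        self.title = title
        self.posted_at = posted_at  # Unix timestamp
        self.clicks = 0

    def click(self):
        self.clicks += 1

    def expires_at(self):
        # Every click extends the time the story stays in the headlines.
        return self.posted_at + BASE_LIFESPAN + self.clicks * SECONDS_PER_CLICK


def headlines(stories, now=None):
    """Split stories into live headlines (longest remaining life first) and archive."""
    now = time.time() if now is None else now
    live = [s for s in stories if s.expires_at() > now]
    archive = [s for s in stories if s.expires_at() <= now]
    live.sort(key=lambda s: s.expires_at(), reverse=True)
    return live, archive
```

Sorting live stories by expiry means less-clicked and older items drift down the page before dropping into the archive, which matches the "slowly moves down" behaviour described above.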

My current plan is to have it editor-centric and deploy a desktop app to anyone who wants to be an editor (I may make it web-based at some point). The aggregator would pull down content from a number of pre-defined sources: blogs, Twitter, Facebook, any RSS feeds, etc. The pre-defined sources can also be collectively edited by the editors. The editors could then ‘push’ any of the aggregated content into a number of feeds; we could have specific feeds for topics or sources, however we want to split it up.
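The flow described here (shared sources, pulled items waiting for review, editors pushing picks into topic feeds) can be sketched as a small in-memory model. All class and method names below are hypothetical, and fetching from each source is left out; a real app would persist this in a database.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    name: str
    kind: str  # e.g. "rss", "twitter", "facebook"
    url: str


@dataclass
class Item:
    source: Source
    title: str
    link: str


@dataclass
class Feed:
    topic: str
    items: list = field(default_factory=list)


class Aggregator:
    """Pulled items land in an inbox; editors push chosen ones into topic feeds."""

    def __init__(self):
        self.sources = []   # shared list that any editor can edit
        self.inbox = []     # aggregated items awaiting an editor
        self.feeds = {}     # topic name -> Feed

    def add_source(self, source):
        self.sources.append(source)

    def receive(self, item):
        # Called by whatever fetcher pulls content from a source.
        self.inbox.append(item)

    def push(self, item, topic):
        # An editor approves an item into a named feed, creating it on first use.
        feed = self.feeds.setdefault(topic, Feed(topic))
        if item not in feed.items:
            feed.items.append(item)
        if item in self.inbox:
            self.inbox.remove(item)
        return feed
```

Keeping the inbox separate from the feeds means nothing reaches readers until an editor has explicitly pushed it.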

I feel like we have a small enough community that this sort of manual editing would be good enough. I am not looking to take on or replace any ‘social media’ site, just provide some sort of integration for this community. We can, of course, use the much more sophisticated APIs of a site or service to try to pull down relevant content. Then we can avoid any complex algorithms on our side and rely on a simple ‘popularity’ keep-alive system like the one Kist mentioned.

Chad, what caused such gargantuan bandwidth use? I don’t plan to do anything with images, and I’m not sure how much text 850 GB would be…

My version had a crawler. I had ~500 sources to search, and I would ping these throughout the day, but I would also crawl outward from them to discover new content.

Since I had a shared host, I did all the crawling from my home machine.
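Two common ways to keep a feed fetcher’s bandwidth down are a minimum delay between requests to the same host and HTTP conditional GET, where the server answers 304 Not Modified with no body if a feed hasn’t changed since the last fetch. The sketch below uses only the Python standard library; the `PoliteFetcher` name and the 5-second default delay are assumptions, not how Chad’s crawler worked.

```python
import time
import urllib.error
import urllib.request
from urllib.parse import urlsplit


class PoliteFetcher:
    """Fetch URLs with a per-host delay and HTTP conditional GET."""

    def __init__(self, min_delay=5.0, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay  # assumed seconds between hits on one host
        self.clock = clock
        self.sleep = sleep
        self.last_hit = {}          # host -> time of last request
        self.validators = {}        # url -> (etag, last_modified)

    def wait_for_host(self, url):
        host = urlsplit(url).netloc
        last = self.last_hit.get(host)
        if last is not None:
            remaining = self.min_delay - (self.clock() - last)
            if remaining > 0:
                self.sleep(remaining)
        self.last_hit[host] = self.clock()

    def conditional_headers(self, url):
        # Send back the validators the server gave us last time, if any.
        etag, last_modified = self.validators.get(url, (None, None))
        headers = {}
        if etag:
            headers["If-None-Match"] = etag
        if last_modified:
            headers["If-Modified-Since"] = last_modified
        return headers

    def fetch(self, url):
        """Return the body as bytes, or None if the server says it is unchanged."""
        self.wait_for_host(url)
        request = urllib.request.Request(url, headers=self.conditional_headers(url))
        try:
            with urllib.request.urlopen(request, timeout=30) as response:
                self.validators[url] = (response.headers.get("ETag"),
                                        response.headers.get("Last-Modified"))
                return response.read()
        except urllib.error.HTTPError as err:
            if err.code == 304:  # Not Modified: nothing new to download
                return None
            raise
```

The clock and sleep functions are injectable so the throttling can be tested without waiting; a server that sends neither `ETag` nor `Last-Modified` simply gets a full fetch each time.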

Just a heads-up: there may be some strange posts on the forum and blog while I try out some ideas.

I’m still looking for good ideas on where this content should go. It’d be easy to put it on a Twitter feed and an RSS feed, but what I’d love to do is hook it up to the forums, so we could have a forum where links to the content are posted and discussion can go on there.

I also need to look at this forum/blog integration and see if it’s worth salvaging or scrapping.

And I’ll need to figure out some way to crack the vBulletin API… they don’t seem to have moved into the Web 2.0 age with their API.

Alright I’m officially going forward with this unless I get some better ideas. I’ll write up a more official post after I get some feedback. Here’s the idea:

Members can sign up to be ‘editors.’
Editors can add sources to a content feed (RSS/Atom, Twitter, and Facebook will be supported initially; more can be added as needed)
Editors can use an app (desktop, later on web and hopefully mobile) to view content from the feeds.
Editors can use the same app to ‘forward’ content to the tech-artists.org blog (which I’ve upgraded, fixed up, and disabled the VB integration plugin).
So members (and non-members) can subscribe to the TAO blog RSS to get an aggregated feed of all editor-approved content.
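In this setup the blog generates the public RSS feed, but as a sketch of what that aggregated feed boils down to, here is a minimal RSS 2.0 document built from editor-approved items with the standard library. The `build_rss` function, the item dict layout, and the example.com URLs are placeholders, not the blog’s actual output.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_rss(title, link, description, items):
    """Build a minimal RSS 2.0 document from editor-approved items.

    Each item is a dict with "title", "link" and a timezone-aware "published" datetime.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for item in items:
        entry = ET.SubElement(channel, "item")
        ET.SubElement(entry, "title").text = item["title"]
        ET.SubElement(entry, "link").text = item["link"]
        ET.SubElement(entry, "guid").text = item["link"]
        # RSS 2.0 uses RFC 822-style dates, which format_datetime produces.
        ET.SubElement(entry, "pubDate").text = format_datetime(item["published"])
    return ET.tostring(rss, encoding="unicode")


feed_xml = build_rss(
    "tech-artists.org",
    "https://example.com/blog",  # placeholder URL
    "Editor-approved tech art links",
    [{"title": "Hello", "link": "https://example.com/hello",
      "published": datetime(2020, 1, 1, tzinfo=timezone.utc)}],
)
```

Using the link as the `guid` keeps each item’s identity stable, so feed readers don’t show the same post twice if the feed is regenerated.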

Ideally, having a mobile or web app will allow a larger number of editors to do relatively little work in finding and posting content. It would be a convenient thing to do during a commute or on the toilet, you know, when you’re checking your other RSS feeds :wink:

Alright, basic system is working. I’m going to start using it to deploy content to the blog (a couple things have been deployed already). I’ll be tweaking the format of the blog to provide better re-direction, formatting, and features. Suggestions welcome. I’ll also be working this week on getting the app deployed to the server, rather than running on my home machine.

If you want to take a look at the actual app, ping me on IRC.

This is probably an obvious improvement, but is it possible to get some syntax highlighting/code auto-formatting that is more legible and easier to copy and paste?

Thanks for the feedback and ideas. Further discussion can move into the FAQ: http://tech-artists.org/forum/showthread.php?t=1770.