View in #pipelines on Slack
@ozzmeister00: anyone want to jam on asset databases for a bit?
@dhruvagovil: Sure go for it
@ozzmeister00: right now our asset db is a json file that contains a list of dicts describing our assets. This works, mostly, because we don’t have to change it that often, but we’re about to be in a situation where we might have to merge changes and additions to that json file from multiple sources working simultaneously. I’m worried that we’re going to run into more problems than I’m willing to solve if we have to constantly manage that merging.
my current thinking is to split each asset out into a separate file, and let the main database file just store links to each of those files but that feels… wrong somehow
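As an illustration, the split described above might look something like this: one file per asset plus a thin index that maps asset names to files (every name and path here is hypothetical):
```
import json
from pathlib import Path

# Hypothetical layout:
#   assets/index.json      -> {"hero_sword": "hero_sword.json", ...}
#   assets/hero_sword.json -> the full asset dict
ASSET_DIR = Path("assets")

def save_asset(asset):
    """Write one asset to its own file and register it in the index."""
    ASSET_DIR.mkdir(exist_ok=True)
    path = ASSET_DIR / "{}.json".format(asset["name"])
    path.write_text(json.dumps(asset, indent=2))

    index_path = ASSET_DIR / "index.json"
    index = json.loads(index_path.read_text()) if index_path.exists() else {}
    index[asset["name"]] = path.name
    index_path.write_text(json.dumps(index, indent=2, sort_keys=True))
```
Two people editing different assets now touch different files, though the index remains a single point of contention in merges.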
@jhultgre: We used to have a json describing all of our rigs. Most of that has been moved into shotgun now.
@instinct-vfx: you may want to look into moving that into a nosql db for faster queries if you split out. Relations should be way more efficient that way
plus you get queries without file hits
@dhruvagovil: Or just use postgres with a jsonb field
Gives you the advantages of SQL with the laxness of NoSQL
Unless this has to live inside your project repository. Generally I’d use an actual database to store all the data and do all queries etc.
If I need an immutable snapshot of the data for an engine or something, I’d do a snapshot dump into an sqlite db or something.
Generally I’d avoid JSON as the format because it’s hard to do queries without loading the whole thing in memory and parsing it.
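A minimal sketch of the postgres-with-jsonb idea using psycopg2 (the table, connection string, and fields are all made up):
```
import psycopg2
from psycopg2.extras import Json

conn = psycopg2.connect("dbname=pipeline")  # hypothetical database
cur = conn.cursor()
cur.execute(
    "CREATE TABLE IF NOT EXISTS assets (name text PRIMARY KEY, data jsonb)"
)

# Store the same dict you would have appended to the big JSON file.
cur.execute(
    "INSERT INTO assets (name, data) VALUES (%s, %s)",
    ("hero_sword", Json({"type": "prop", "lods": 3})),
)
conn.commit()

# Query inside the documents server-side; ->> extracts a field as text,
# so only the matching rows ever reach the client.
cur.execute("SELECT name FROM assets WHERE data ->> 'type' = %s", ("prop",))
print(cur.fetchall())
```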
@instinct-vfx: Yep, it escalates quickly if the file grows beyond a certain size.
@alex.fleming: I am currently refactoring a pipeline that was set up to have per-asset CSVs with a lot of the same data duplicated in Shotgun. It’s been a nightmare making sure the CSVs were synced properly in multiple locations. I would definitely recommend the real database route. I’m converting all of our CSV calls to Shotgun API calls. I don’t like it as much as when I had written a custom Django solution for asset tracking, but it’s so much better than file-based in my opinion.
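For reference, swapping a CSV read for a Shotgun API call looks roughly like this (the site URL, script credentials, and field names are placeholders):
```
import shotgun_api3

# Hypothetical script credentials, registered on the Shotgun site.
sg = shotgun_api3.Shotgun(
    "https://yourstudio.shotgunstudio.com",
    script_name="asset_tools",
    api_key="0123456789abcdef",
)

# Instead of parsing a per-asset CSV, ask the server for exactly the
# rows and fields you need.
assets = sg.find(
    "Asset",
    [["sg_asset_type", "is", "Prop"]],
    ["code", "description"],
)
```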
@bob.w: So @adam.pletcher would be the better person to speak to this, but we’ve had a lot of success using perforce attributes as our metadata database for assets. That solves one of the more annoying problems: keeping source control and the database in sync.
We use a local sqlite database as a cache so that we’re not constantly hammering perforce, but perforce does act as the arbiter of truth.
Granted this method doesn’t work if you’re not using perforce.
But I assume that other VCS systems have some kind of similar metadata storage mechanism.
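The Perforce mechanism in question is `p4 attribute`; a minimal sketch through subprocess (the attribute name and depot path are invented):
```
import subprocess

FILE = "//depot/assets/hero_sword.fbx"  # hypothetical depot path

# Attach a key/value pair to the file. By default the file must be open
# in your workspace; 'p4 attribute -f' targets submitted files instead.
subprocess.run(
    ["p4", "attribute", "-n", "asset_type", "-v", "prop", FILE],
    check=True,
)

# Read it back: 'fstat -Oa' includes attributes, which show up as
# attr-<name> fields in the tagged output.
out = subprocess.run(
    ["p4", "-ztag", "fstat", "-Oa", FILE],
    check=True, capture_output=True, text=True,
).stdout
print(out)
```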
@dhruvagovil: One thing to keep in mind is that sqlite isn’t great for concurrent reads/writes from multiple sources versus a server-based solution like postgres, but it’s still a lot better than a text-based solution. This is compounded if you’re storing the sqlite db on a fileshare, especially one with caching, because there are a lot of chances for race conditions.
@bob.w: Right, that’s why the sqlite is a local cache
Every user has their own, so single reader single writer
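That per-user cache might look something like this (the schema and the Perforce lookup are stand-ins):
```
import sqlite3

def fetch_attr_from_perforce(depot_path):
    """Stand-in for the real lookup, e.g. the fstat call sketched above."""
    raise NotImplementedError

# One cache file per user, so there's only ever a single reader/writer
# and none of the fileshare race conditions apply.
cache = sqlite3.connect("asset_cache.db")
cache.execute(
    "CREATE TABLE IF NOT EXISTS attrs (depot_path TEXT PRIMARY KEY, value TEXT)"
)

def asset_type(depot_path):
    """Return the cached value, deferring to Perforce on a miss."""
    row = cache.execute(
        "SELECT value FROM attrs WHERE depot_path = ?", (depot_path,)
    ).fetchone()
    if row:
        return row[0]
    value = fetch_attr_from_perforce(depot_path)
    cache.execute("INSERT OR REPLACE INTO attrs VALUES (?, ?)", (depot_path, value))
    cache.commit()
    return value
```
A real version also needs an invalidation rule (say, re-fetching when the file’s head revision changes), since Perforce, not the cache, is the arbiter of truth.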
@dhruvagovil: And make sqlalchemy your best friend
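Combining the two suggestions, a jsonb table behind SQLAlchemy can stay very small (1.x-style declarative mapping, with invented names):
```
from sqlalchemy import Column, String, create_engine
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()

class Asset(Base):
    __tablename__ = "assets"
    name = Column(String, primary_key=True)
    data = Column(JSONB)  # the old per-asset dict, stored verbatim

engine = create_engine("postgresql://user:pass@dbhost/pipeline")  # hypothetical
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

session = Session()
session.add(Asset(name="hero_sword", data={"type": "prop", "lods": 3}))
session.commit()

# .contains() on a JSONB column compiles to postgres's @> containment operator.
props = session.query(Asset).filter(Asset.data.contains({"type": "prop"})).all()
```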
@bob.w: But yeah, if the goal is to have a single database for all users, don’t go with sqlite, use postgres, mysql, or mssql.
Or some random nosql thing. I don’t know as much about those
@dhruvagovil: I recommend against NoSQL because it’s great to get started but it can really bite you later.
Especially now that postgres has JSON fields
@bob.w: That is what I’ve heard. And yeah, postgres with json sounds like a much better idea.
@instinct-vfx: We went multi-model DB with our internal asset management effort. For multiple reasons (mostly performance, because we love the stack, and because we plan to build on graph stuff later on)
standard nosql we only use for non-persistent data
e.g. we have a mongodb in our render-job generation system that gets JSON dicts for every render job and holds them until they’re submitted
@dhruvagovil: Yeah for stuff like that it makes sense
Especially since there’s not a lot of join ops etc going on
@instinct-vfx: yeah and we don’t query anything inside the documents. We use it as a high performance temporary json store heh
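That ephemeral-store pattern could be as simple as this with pymongo (the database, collection, and submit hook are made up):
```
from pymongo import MongoClient

# Documents are the raw render-job dicts; nothing inside them is ever
# queried, so mongo acts purely as a fast temporary JSON store.
jobs = MongoClient("mongodb://renderdb")["farm"]["pending_jobs"]

def queue_job(job_dict):
    """Park a render-job dict until submission."""
    return jobs.insert_one(job_dict).inserted_id

def submit_all(submit):
    """Drain the store: hand each held job to the farm, then drop it."""
    for doc in jobs.find():
        submit(doc)
        jobs.delete_one({"_id": doc["_id"]})
```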
@ozzmeister00: I’ll give postgres/json a peek and report back
thanks for your input everyone!