[Maya / Python] Performance & Memory

I’m performing a very heavy task that basically goes: for every mesh, for every color set (3 per mesh), shove the specified channels into a new color set by iterating over every vertex in the mesh. The reason for this, one of those kinds of reasons that earns a slow head shake with averted eye contact, is that Maya can’t paint vertex colors per channel. The shader I’m developing reads all four (RGBA) channels as separate pieces of information, such as how much shading occurs at each vertex, or specularity; the channels aren’t related to each other.

For the heavier meshes, while the script runs, it pushes my RAM well beyond capacity, Windows compresses it as it goes, and it takes forever. As I’m posting this, Maya is using over 6 GB of RAM that’s continuously being compressed just to keep its head above water.

Questions:
[ul]
[li]Why does Maya use less than 20% of my CPU? More specifically, what is the bottleneck here - I assume RAM speed?[/li]
[li]Why isn’t it releasing the memory as it goes through each loop, and can I force it to? What is being stored that has to be kept every iteration, other than the current index…? It shouldn’t use that much memory, right?[/li]
[/ul]

My script is very specific to what I’m currently working on and uses hard-coded values to save time. The description above also iterated through each color set; I’ve since modified the script in the hope of improving performance. It still takes a very long time and my system barely survives.

I’m very likely doing this wrong, but the point of this thread isn’t how to achieve what I’m doing; it’s specifically the questions listed above.

couple of Q’s

  1. Are you using API calls or just plain cmds? Usually for this kind of thing the API will be significantly faster than cmds
  2. Are you thinking about how to manage your loops? If you are generating a lot of long lists – instead of, say, using generators – you’ll be fragmenting memory like crazy.
  3. How big is the mesh in question? For counts in the low thousands, (1) and (2) should not matter. If you’re crawling at that size, it’s probably a Frankenstein loop that’s generating unneeded work. For scores of thousands of verts it will be slow, and cmds vs API times will start to dominate.

If you are really running out of RAM, that is the reason things go slow: you’re dropping out of electronic access times in real RAM into mechanical access times in virtual memory on disk. That’s death.

The way you do your loops will determine a lot about how much you have in memory at any given time and how long Python will wait before dropping things. What goes in there also matters. Code or at least pseudocode would make it much easier to diagnose.
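
To illustrate the list-versus-generator point, here’s a toy sketch (query_color, process, and vertex_names are hypothetical stand-ins):

# A list comprehension materializes every result in memory at once:
all_colors = [query_color(v) for v in vertex_names] # holds len(vertex_names) results simultaneously

# A generator expression yields one result at a time, so each item
# can be garbage collected as soon as you've consumed it:
for color in (query_color(v) for v in vertex_names):
    process(color)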

(edit) Also: you’re just doing isolated ops on each vertex? Or does processing one vert involve asking questions of others?

[QUOTE=Theodox;29428]couple of Q’s …[/QUOTE]

  1. Plain cmds. Yeah, I could do it with the API, but this was just meant as a quick and dirty method. I’ve also noticed I can do it inside Maya by setting the color set operation to ‘add’ instead of ‘over’, duplicating all 3 primary sets, and merging them into a new 4th set, which gets the same outcome without spending much time.
  2. I’ll just post the code at this point
  3. 5563 poly / 5721 vert

I’m interested to hear how I should be doing this where memory usage is concerned, since it will make good reference in the future.

Here is the code:

import maya.cmds as cmds

def cs_buildSet():
    for mesh in cmds.ls(sl = 1):
        cs_replaceValues(mesh)
    cmds.select(clear = 1)
    print "
Completed Successfully",

def cs_replaceValues(mesh):
    vtxCount = cmds.polyEvaluate(mesh, vertex = 1)
    
    existing = cmds.polyColorSet(mesh, q = 1, allColorSets = 1) or [] # query returns None if the mesh has no sets
    if not "tolerance" in existing: # make color set if it doesn't exist
        cmds.polyColorSet(mesh, colorSet = "tolerance", create = 1)
    
    for vertex in range(vtxCount-1):
        vertex = mesh + ".vtx[" + str(vertex) + "]" # build a vertex string, e.g. mesh.vtx[0]

        cmds.polyColorSet(mesh, colorSet = "shading", currentColorSet = 1) # set source color set
        value = cmds.polyColorPerVertex(vertex, q = 1, r = 1) # query red value
        cmds.polyColorSet(mesh, colorSet = "tolerance", currentColorSet = 1) # set destination color set
        cmds.polyColorPerVertex(vertex, r = value[0]) # apply to red
        
        cmds.polyColorSet(mesh, colorSet = "shading", currentColorSet = 1) # set source color set
        value = cmds.polyColorPerVertex(vertex, q = 1, g = 1) # query green value
        cmds.polyColorSet(mesh, colorSet = "tolerance", currentColorSet = 1) # set destination color set
        cmds.polyColorPerVertex(vertex, a = value[0]) # apply to alpha
        
        cmds.polyColorSet(mesh, colorSet = "outline", currentColorSet = 1) # set source color set
        value = cmds.polyColorPerVertex(vertex, q = 1, g = 1) # query green value
        cmds.polyColorSet(mesh, colorSet = "tolerance", currentColorSet = 1) # set destination color set
        cmds.polyColorPerVertex(vertex, g = value[0]) # apply to green
        
        cmds.polyColorSet(mesh, colorSet = "spec", currentColorSet = 1) # set source color set
        value = cmds.polyColorPerVertex(vertex, q = 1, b = 1) # query blue value
        cmds.polyColorSet(mesh, colorSet = "tolerance", currentColorSet = 1) # set destination color set
        cmds.polyColorPerVertex(vertex, b = value[0]) # apply to blue

Oh, I noticed some old leftover code from before I modified it hadn’t been fixed, and that was causing the enormous memory usage. But that still leaves the confusion of the processor doing so little. Is it executing on a single CPU thread? That’d be really bad.
Edit: I’ve done some research; there’s no multithreading in Maya, it requires a lot of workarounds.

Originally I had it taking all vertices and looping them with:


vertex = mesh + ".vtx[0:" + str(vertex) + "]"

When it should have been this:


vertex = mesh + ".vtx[" + str(vertex) + "]"

So every iteration was querying and storing data for a massive range of vertices instead of just one. Fixing that cleared up the RAM issue, but not the processing speed.

Yep – range syntax = evil most of the time! But the color set blend is the way to go, much faster than Python.

At my previous job we wrote a script to extract RGB channels for painting and re-assembly; I believe it was written using the Python API. However, Vertex Chameleon is now open source, so you can download and compile it. It offers quite a range of vertex color painting options.

BTW, that posted code is going to miss the last vertex in every mesh: range(x) goes from 0 to x-1.
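
To spell the off-by-one out (plain Python, nothing Maya-specific):

vtxCount = 5
print range(vtxCount - 1) # [0, 1, 2, 3] -- the last vertex, index 4, is never visited
print range(vtxCount)     # [0, 1, 2, 3, 4] -- what the loop should iterate over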

FWIW, Maya itself is single threaded. You can do offline jobs in other threads – but touching the scene and/or the UI is only on the main thread. It’s one of Maya’s biggest limitations. They’ve just begun moving dependency graph evaluation to multiple cores, but it’s not complete – and in any case, Python itself is also effectively single threaded because of the GIL.

Also, the way you have this written, I’m not sure what will happen if you have ‘hard edges’ in your vertex colors: like normals, you may have more than one color per vertex if the user has painted with face selections and the vertex-face option.
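
If that’s a concern, per-face-vertex colors can be queried through vtxFace components; a minimal, untested sketch (pSphere1 is a stand-in, and the wildcard syntax is from memory, so double-check it):

import maya.cmds as cmds

# A vertex can store a different color for each face that shares it.
# .vtxFace[v][f] addresses vertex v as seen from face f; the '*' wildcard
# expands to every face sharing that vertex.
face_colors = cmds.polyColorPerVertex('pSphere1.vtxFace[3][*]', q=True, rgb=True)
print face_colors # flat r,g,b list, one triple per face sharing vertex 3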

Here’s an untested thought experiment on how you could do this in cmds without anything too fancy but using generators to keep the memory load lower:


def get_colorset(mesh, setname):
    cmds.polyColorSet(mesh, colorSet = setname, currentColorSet = 1) 
    return iter(cmds.polyColorPerVertex(mesh + ".vtx[:]", q=True, rgb=True)) # iterator over the flat r,g,b list

def extract_channel(colorset, stride):
    """
    get the Nth component from a stream
    """
    for offset in range(stride):
        colorset.next()
    while True: # the source iterator's StopIteration ends this generator (Python 2 behavior)
        yield colorset.next()
        colorset.next()
        colorset.next()

def assemble( red_stream, green_stream, blue_stream):
    """
    put them back together
    """
    while True: # ends when the component streams are exhausted
        yield (red_stream.next(), green_stream.next(), blue_stream.next())
        

def merge_channels(model, red_set, green_set, blue_set):
    try:
        cmds.undoInfo(ock=1)
        red = extract_channel( get_colorset(model, red_set), 0)
        green = extract_channel( get_colorset(model, green_set), 1)
        blue = extract_channel( get_colorset(model, blue_set), 2)
        cmds.polyColorSet(model, colorSet = "output", create = 1)

        cmds.polyColorSet(model, colorSet = "output", currentColorSet = 1)
        
        for idx, rgb in enumerate(assemble(red, green, blue)):
            cmds.polyColorPerVertex(model + ".vtx[%i]" % idx, rgb= rgb)
    finally:
        cmds.undoInfo(cck=1)
    

merge_channels('pSphere1', 'colorSet', 'colorSet1', 'colorSet2')

It’s annoying that there’s no way to bulk set the values though.

great example, maybe a topic for your next blog post? :)

If you split the task up the right way, it’s worth trying multithreading again.

[QUOTE=Theodox;29436]FWIW, Maya itself is single threaded. You can do offline jobs in other threads – but touching the scene and/or the UI is only on the main thread. It’s one of Maya’s biggest limitations. They’ve just begun moving dependency graph evaluation to multiple cores, but it’s not complete – and in any case, Python itself is also effectively single threaded because of the GIL.

Also, the way you have this written, I’m not sure what will happen if you have ‘hard edges’ in your vertex colors: like normals, you may have more than one color per vertex if the user has painted with face selections and the vertex-face option.[/QUOTE]
According to the official Maya manual multithreading does not work, but if you can split the task up the right way, it works pretty well. I had a per-mesh, per-vertex task that ran on 12 cores (with the Maya UI up) and was about 10 times faster than single threading.

Can you elaborate on what you mean by “split task”?

In this case you’d still be stuck, because access to the contents of the Maya scene is only via the main thread: you’d have to repeatedly pause execution and use an executeInMainThreadWithResult() call to get safe access to the scene. It would only make sense if the calculations were very expensive compared to the scene access. If you could dump the relevant info into a thread-safe data structure first and parcel that out to the threads, it would work, though that’s going to be a memory hog in most cases.
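
That pattern looks roughly like this (a sketch, not production code; the half-brightness ‘calculation’ is just a stand-in):

import threading
from maya import cmds, utils

def crunch(mesh):
    # Reads from the scene have to bounce through the main thread:
    colors = utils.executeInMainThreadWithResult(
        cmds.polyColorPerVertex, mesh + '.vtx[:]', q=True, rgb=True)
    # The expensive pure-Python math can run here, in the worker thread:
    dimmed = [c * 0.5 for c in colors]
    # Writes also have to go through the main thread:
    def write():
        for i in xrange(0, len(dimmed), 3):
            cmds.polyColorPerVertex(mesh + '.vtx[%i]' % (i / 3),
                                    rgb=tuple(dimmed[i:i + 3]))
    utils.executeInMainThreadWithResult(write)

threading.Thread(target=crunch, args=('pSphere1',)).start()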

[QUOTE=Theodox;29468]In this case you’d still be stuck, because access to the contents of the Maya scene is only via the main thread … [/QUOTE]

That’s the orthodox, regular way, and I guess few people want to do something like that.
If you spend a little more time splitting the task up correctly, it doesn’t need to be that complex, just tricky.
When I get time, I will post the Maya multithreading (100% use of multiple CPU cores) script for heavy, splittable tasks here.

Are you farming it out to multiprocessing rather than threading? AFAIK that spins up separate copies of Maya unless they’ve finally fixed that.

[QUOTE=Theodox;29496]Are you farming it out to multiprocessing rather than threading? …[/QUOTE]

Doesn’t look like it has been fixed in 2016 yet. The best method I’ve seen for dealing with multiprocessing from within Maya involves using subprocess to launch mayapy, running the multiprocessing script from that instance, collecting the data, and then piping it back before closing the subprocess.
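
A rough sketch of that pattern (the mayapy path and worker.py are hypothetical placeholders):

import subprocess
import json

# Hypothetical path; depends on install location and Maya version.
MAYAPY = r'C:\Program Files\Autodesk\Maya2016\bin\mayapy.exe'

# worker.py is a hypothetical standalone script that initializes
# maya.standalone, runs the multiprocessing job, and prints JSON results.
proc = subprocess.Popen([MAYAPY, 'worker.py', 'scene_to_process.ma'],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = proc.communicate() # blocks until the worker exits
results = json.loads(out)     # data piped back from the subprocess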

The real bottleneck here is just the cmds.polyColorPerVertex calls. Having to call this command at least once per vertex is massively slow. Here is a little benchmark:


from itertools import izip
from functools import partial
from operator import add
import timeit
from maya.api import OpenMaya
from maya import cmds


def new_colorset(mesh, colorset):
    cmds.polyColorSet(mesh, create=True, colorSet=colorset)
    cmds.polyColorSet(mesh, colorSet=colorset, currentColorSet=True)


def get_colorset(mesh, colorset, **kwargs):

    cmds.polyColorSet(mesh, colorSet=colorset, currentColorSet=True)
    return cmds.polyColorPerVertex(mesh + '.vtx[:]', q=True, **kwargs)


def merge_colorsets(mesh, newset, colorsets):
    '''maya.cmds'''

    # Time rgb value adding and packing
    start = timeit.default_timer()
    num_verts = cmds.polyEvaluate(mesh, vertex=True)
    colorsets = (get_colorset(mesh, s, rgb=True) for s in colorsets)
    colors_per_vert = [[0, 0, 0] for i in xrange(num_verts)]
    for s in colorsets:
        for i, j in enumerate(xrange(0, num_verts * 3, 3)): # j steps through the flat r,g,b list
            colors_per_vert[i][0] += s[j]
            colors_per_vert[i][1] += s[j + 1]
            colors_per_vert[i][2] += s[j + 2]
    print 'pack and add color: %s seconds' % (timeit.default_timer() - start)

    # Time creating and applying vertex colors
    start = timeit.default_timer()
    new_colorset(mesh, newset)
    vtx = mesh + '.vtx[%d]'
    for i, rgb in enumerate(colors_per_vert):
        cmds.polyColorPerVertex(vtx % i, rgb=rgb)
    print 'apply color: %s seconds' % (timeit.default_timer() - start)


def api_merge_colorsets(mesh, newset, colorsets):
    '''maya.api.OpenMaya'''

    # sum doesn't work with OpenMaya.MColor objects, make our own that does
    sum_mcolors = partial(reduce, add)

    new_colorset(mesh, newset)
    dagpath = OpenMaya.MGlobal.getSelectionListByName(mesh).getDagPath(0)
    meshfn = OpenMaya.MFnMesh(dagpath)

    # Time rgba value adding and packing
    start = timeit.default_timer()
    default_color = OpenMaya.MColor((0, 0, 0, 0))
    colorsets = (meshfn.getVertexColors(s, default_color) for s in colorsets)
    colors = [sum_mcolors(vert_colors) for vert_colors in izip(*colorsets)]
    print 'pack and add color: %s seconds' % (timeit.default_timer() - start)

    # Time creating and applying vertex colors
    start = timeit.default_timer()
    meshfn.setVertexColors(colors, range(len(colors)))
    print 'apply color: %s seconds' % (timeit.default_timer() - start)


def setup_scene():
    cmds.file(new=True, force=True)
    cmds.polyPlane(name='colored_plane', w=24, h=24, sw=24, sh=24)
    mesh = 'colored_planeShape'
    new_colorset(mesh, 'redset')
    cmds.polyColorPerVertex(mesh, rgb=(1, 0, 0))
    new_colorset(mesh, 'greenset')
    cmds.polyColorPerVertex(mesh, rgb=(0, 1, 0))
    new_colorset(mesh, 'blueset')
    cmds.polyColorPerVertex(mesh, rgb=(0, 0, 1))


def benchmark():

    mesh = 'colored_planeShape'

    setup_scene()
    print 'maya.cmds merge vertex colorsets'
    print '================================'
    start = timeit.default_timer()
    merge_colorsets(
        mesh,
        newset='merged',
        colorsets=('redset', 'greenset', 'blueset')
    )
    print 'total: %s seconds\n' % (timeit.default_timer() - start)

    setup_scene()
    print 'maya.api merge vertex colorsets'
    print '==============================='
    start = timeit.default_timer()
    api_merge_colorsets(
        mesh,
        newset='merged',
        colorsets=('redset', 'greenset', 'blueset')
    )
    print 'total: %s seconds\n' % (timeit.default_timer() - start)


if __name__ == '__main__':
    benchmark()



I’ve tried to keep this fairly close to Theodox’s script, sans the clever use of generators, and added a maya.api version for comparison. Here are the results:


maya.cmds merge vertex colorsets
================================
pack and add color: 0.00642395019531 seconds
apply color: 103.396903038 seconds
total: 103.408151865 seconds

maya.api merge vertex colorsets
===============================
pack and add color: 0.00123691558838 seconds
apply color: 0.00776481628418 seconds
total: 0.0191380977631 seconds

These results are pretty impressive: we’ve improved our run time by ~5000x by using the lower-level maya.api. As you can see, virtually ALL the time in the maya.cmds version is spent on cmds.polyColorPerVertex calls. The real lesson here is that our handling of the data had little to do with the performance of our script; it had everything to do with a poorly implemented maya.cmds function.

Attempting to improve performance using multiprocessing or threading in this case would be premature. If we had a mesh with 200 million verts and 10 colorsets, this would change. Then we might attempt to chunk this large data set up and use multiprocessing.Pool to calculate chunks simultaneously. Even then, we might have better avenues to go down before trying multiprocessing. For example, we could use the MItMeshVertex iterator to build up our resulting colors without storing each colorset in memory, or use numpy datatypes.
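
For instance, the pack-and-add step in numpy might look like this (a sketch, assuming the flat [r, g, b, r, g, b, ...] lists returned by the rgb query):

import numpy as np

def merge_flat_colorsets(flat_colorsets):
    '''Sum several flat r,g,b lists into one (num_verts, 3) array.'''
    total = np.zeros(len(flat_colorsets[0]))
    for flat in flat_colorsets:
        total += np.asarray(flat) # vectorized add, no per-vertex Python loop
    return total.reshape(-1, 3)   # one (r, g, b) row per vertex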

I guess this could all be boiled down to a simple list:
[ul]
[li]maya.cmds[/li]
[li]maya.api[/li]
[li]numpy[/li]
[li]python multiprocessing/threading if you can break up your task[/li]
[li]c++[/li]
[/ul]

Start from the top and work your way down until you’ve solved your performance issues =).

[QUOTE=R.White;29498]Doesn’t look like it has been fixed in 2016 yet. The best method I’ve seen for dealing with multiprocessing from within Maya involves using subprocess to launch mayapy, running the multiprocessing script from that instance, collecting the data, and then piping it back before closing the subprocess.[/QUOTE]

You’re right. This way the user can use not only all the local CPU cores but also all the CPUs on the LAN for a splittable task (most per-mesh, per-vertex tasks can be split).
It is a kind of multi-subprocessing. I checked my old script: it did not “import multiprocessing”; the multiple subprocesses were launched by threading.
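
Roughly, the pattern would be that each thread just launches and waits on one external mayapy process, so the GIL never matters (a sketch of the shape, not the actual script; MAYAPY and worker.py are hypothetical stand-ins):

import threading
import subprocess

MAYAPY = r'C:\Program Files\Autodesk\Maya2016\bin\mayapy.exe' # hypothetical path

def run_chunk(chunk_id):
    # Each thread only babysits one external mayapy process; the real work
    # happens in the child process, so Python's GIL never gets in the way.
    subprocess.call([MAYAPY, 'worker.py', '--chunk', str(chunk_id)])

threads = [threading.Thread(target=run_chunk, args=(i,)) for i in range(12)]
for t in threads:
    t.start()
for t in threads:
    t.join()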