View in #code_tips_and_tricks on Slack
@theodox: so, fun find today.
I was looking at some old code:
def joint_chain(start, end):
    '''
    This will return a list of the chain of joints from start to end.
    '''
    chain = list()
    end_path = cmds.ls(end, long=True, r=True)[0]
    end_parts = end_path.split('|')
    for i in range(len(end_parts)):
        if chain:
            chain.append(end_parts[i])
        if end_parts[i] == start:
            chain = [end_parts[i]]
        if end_parts[i] == end:
            break
    return chain
and I thought “you look like you need a nice speedy generator!”
def joint_chain(start, end):
    def upwards():
        longstart, longend = cmds.ls(start, end, long=True)
        yield longend
        while longend and longend != longstart:
            longend, sep, ignore = longend.rpartition("|")
            yield longend
but… timeit disagrees:
timeit.timeit(lambda: old_joint_chain('joint3', 'joint15'), number=1000)
# Result: 0.1306009
timeit.timeit(lambda: new_joint_chain('joint3', 'joint15'), number=1000)
# Result: 0.2758734
@bob.w: well in the first one you’ve got one single call to split, and in the second you’re calling rpartition for each iteration.
@theodox: however if you change
longend, sep, ignore = longend.rpartition("|")
to
longend = longend.rpartition("|")[0]
then
print timeit.timeit(lambda: joint_chain('joint3', 'joint15'), number=1000)
0.0420124
@bob.w: wait really?
@alanweider: holy crap
@bob.w: So just unpacking was messing it up that bad
@theodox: I actually think it was allocating the names and then throwing them away
@bob.w: huh. It does introduce a new loop when you do an unpack.
@theodox: that cost all the time
@bob.w: For a while the vscode debugger was catching on the IndexError it raises to know it’s done unpacking.
@theodox: fun
@bob.w: Yeah, made the thing basically useless.
I wouldn’t have expected it to have that kind of performance impact though
That’s just nuts.
@theodox: another case of ‘don’t optimize on faith’
@bob.w: Truth.
@theodox: I’ve heard a lot of talks on the theme of ‘moving memory around is more expensive than doing things’; this is a prime example
@bob.w: So if I run the bad one, I get: # Result: 0.0001962666427317572 #
vs # Result: 0.00020699426048054193 #
For the improved one. So about the same
oh no wait, I see what I’m doing wrong.
There isn’t any call to upwards.
@theodox: hehe that makes it fast!
@bob.w: Okay, so I’m getting # Result: 0.02910402692828029 #
vs # Result: 0.02962724210254919 #
@theodox: hmm, did I mess up the test? That was the only change…
@bob.w:
import timeit
def joint_chain(start, end):
def upwards():
longstart, longend = [cmds.ls](http://cmds.ls)(start, end, long=True)
yield longend
while longend and longend != longstart:
#longend, sep, ignore = longend.rpartition("|")
longend = longend.rpartition("|")[0]
yield longend
return list(upwards())
print(timeit.timeit(lambda : joint_chain("joint3", "joint15"), number=1000))
@theodox: weird, now my results have converged too
We need a “red herring” emoticon
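A quick way to see why the two forms were never going to differ much: disassemble them. This standalone sketch needs no Maya; the exact opcodes vary by Python version, but the only real difference is an unpack plus a couple of extra stores.

import dis

def with_unpack(s):
    head, sep, ignore = s.rpartition("|")
    return head

def with_index(s):
    return s.rpartition("|")[0]

dis.dis(with_unpack)  # one rpartition call, then UNPACK_SEQUENCE and three stores
dis.dis(with_index)   # one rpartition call, then a single subscript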
@bob.w: But the original, non-generator: 0.0904211394535
vs generator: 0.0275231661685
@theodox: there’s still a delta for me, but it’s only about 4% not 3X
your machine is faster than mine, methinks 
@bob.w: Maybe. I honestly have no idea what is in this box
So the generator is still a win
at least for me
@theodox: but the ratio is consistent, the generator version is about 3x faster on both ends
@bob.w: yeah
The big difference in the profiler seems to be the call to ls; the recursive flag seems to be the killer. Goes from 0.023 to 0.084.
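A sketch of that measurement for anyone who wants to reproduce it. It assumes a running Maya session with the example joints in the scene; r is the short form of ls’s recursive flag, and the numbers in the comments are bob.w’s, not a promise.

from maya import cmds
import timeit

# bob.w's profiler numbers, for scale; your scene and machine will differ
print(timeit.timeit(lambda: cmds.ls('joint15', long=True, r=True), number=1000))  # ~0.084
print(timeit.timeit(lambda: cmds.ls('joint15', long=True), number=1000))          # ~0.023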
@theodox: interesting, that’s got to be maya internals
@bob.w: Yeah, I’m honestly not surprised that their recursive is slower than just messing with the strings directly.
@theodox: It should go through the dag on the back end though, not python string mongering
@bob.w: Exactly. It’s got to walk the nodes, extract the names, and push them out as strings. And it’s probably doing some actual logic to ensure it’s got everything correct.
Whereas just doing the string splits is us trusting Maya to have been correct the first time, and hoping it continues to play by the rules in the meantime.
@theodox: Here the 1990s threading model is our friend
@bob.w: hahaha, point
@theodox: I think the old version has a latent bug, since it reverts to short names which might not be unique. That rarely matters but when it does it’s a pain in the ass
@bob.w: True. Plus if you pass in long names at the start it will give you an empty list, as start won’t ever equal one of the parts.
The new one guarantees that all the comparisons are done on long names, which solves both problems.
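Both failure modes can be shown with plain strings; the paths here are made up.

# Two different joints can share the short name 'joint3' under different roots.
end_path = '|rigA|joint1|joint2|joint3'    # what cmds.ls(end, long=True) would return
parts = end_path.split('|')                # ['', 'rigA', 'joint1', 'joint2', 'joint3']

# The old code compares short names, so a long-name start never matches
# and the function falls through to an empty result:
print('|rigA|joint1' in parts)             # False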
@ldunham1: This was an interesting read!
I’m glad you guys are on here 
@theodox - not sure of the preference for rpartition over rsplit if you’re not using the separator? - I may have misread.
@bob.w: rpartition is always guaranteed to return 3 items, even if they’re empty strings. So it’s a bit more reliable in loops like that
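The shapes in question:

print('a|b|c'.rpartition('|'))   # ('a|b', '|', 'c')
print('abc'.rpartition('|'))     # ('', '', 'abc') -- still three items with no separator
print('abc'.rsplit('|', 1))      # ['abc'] -- one item; three-name unpacking would raise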
@ldunham1: Gotcha, which makes sense when unpacking.
But for the alternative that only uses the first index, rsplit would be more suitable (if it also had a performance improvement?)
I think I remember reading somewhere that unpacking in python is a little heavy/expensive. Will have to remember where
Ah no. That’s extended unpacking in python 3
In fact, I’m reading that unpacking should be generally quicker than accessing index by avoiding the calls to getitem etc
Although examples often use _ rather than dedicated variables for remainders
@theodox: rpartition is good for this application because it pops things off the end. split() would give you a list, and you’d have to either take the short names (the old code did that, but it might be a problem in some cases) or re-assemble the pieces (which would be slow)
@ldunham1: But with rsplit(‘|’, 1) doesn’t it just pop off the end too?
Or return a list with the original string if no separators found?
On mobile ATM, unable to check 
@theodox:
"a|b|c".rsplit("|") # Result: ['a', 'b', 'c'] #
"a|b|c".rsplit("|",1) # Result: ['a|b', 'c'] #
so yeah, that’s an improvement
OTOH it seems slower… interesting
yeah, same code runs about 25% faster with rpartition, though I’d not have guessed that
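A minimal harness for that comparison, if you want to try it yourself; the 25% figure is theodox’s, and the margin will depend on machine and Python version.

import timeit

s = '|root|joint1|joint2|joint3'
print(timeit.timeit(lambda: s.rpartition('|')[0], number=1000000))
print(timeit.timeit(lambda: s.rsplit('|', 1)[0], number=1000000))
# rpartition builds a fixed 3-tuple, rsplit builds a list;
# expect rpartition to edge it out by some margin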
of course ‘performance’ is not the right metric for this kind of code anyway, it’s just a fun thing to noodle on. Writing for clarity is more important for this kind of stuff
@ldunham1: Hmm that’s interesting!
I’ve used rsplit a fair amount when hierarchy crawling; considering how frequently it may have been used, squeezing a little more out of it (25% could definitely be noticeable) is low-hanging fruit 
Agreed, if we’re looking for performance, we wouldn’t be using python…
But with older, well-established code, sometimes small ‘optimisations’ like this can go a long way 
@theodox: looking for functions that return lists that only ever get iterated on, and returning generators instead is a nice way to find speed
@passerby: would that be a speedup though if you have to iterate every element, or if you have to do so more than once
in cases where you might be returning or breaking early it would definitely save memory though
@bob.w: Yeah, usually a generator is more about saving memory, and saving time if you don’t iterate the entire list.
import timeit

def joint_chain(start, end):
    def upwards():
        longstart, longend = cmds.ls(start, end, long=True)
        yield longend
        while longend and longend != longstart:
            #longend, sep, ignore = longend.rpartition("|")
            longend = longend.rpartition("|")[0]
            yield longend
    return list(upwards())

def joint_chain2(start, end):
    chain = []
    longstart, longend = cmds.ls(start, end, long=True)
    chain.append(longend)
    while longend and longend != longstart:
        longend = longend.rpartition('|')[0]
        chain.append(longend)
    return chain

print(timeit.timeit(lambda: joint_chain("joint3", "joint15"), number=1000))   # 0.0514771011074
print(timeit.timeit(lambda: joint_chain2("joint3", "joint15"), number=1000))  # 0.0484267352377
In this particular case, returning a list is only marginally faster.
The real speedup vs the original function was avoiding the ls call with recursive=True
@theodox: this is a slightly special case because the likelihood of a long list coming back is low. The ideal generator vs list case is one where you only want one item from the middle of the list and you can bail when you find it. Then you never have to allocate and re-allocate the list for all those other items
it’s not really memory vs perf: memory is perf if the lists are not trivially small
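A minimal sketch of that ideal case, with made-up rig paths: walk ancestors lazily and bail at the first hit, without ever allocating the full list.

def ancestors(path):
    # lazily yield each ancestor's full path, deepest first
    path = path.rpartition('|')[0]
    while path:
        yield path
        path = path.rpartition('|')[0]

path = '|root|rig_pelvis|spine|joint3'
# stop at the first ancestor whose short name starts with 'rig'
hit = next((p for p in ancestors(path) if p.rpartition('|')[2].startswith('rig')), None)
print(hit)  # '|root|rig_pelvis'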
@bob.w: Also true
if there’s half a chance of the list being huge, a generator is going to be a win. I tend towards them because it’s super easy to convert a generator into a collection. Going the other way isn’t really a thing.
Also once you can escape to the better version of Python, chaining generators with yield from becomes a wonderful feature.
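For instance, flattening several per-joint walks into one stream; upwards here is the same made-up ancestor walker sketched above.

def upwards(path):
    path = path.rpartition('|')[0]
    while path:
        yield path
        path = path.rpartition('|')[0]

def upwards_many(paths):
    for p in paths:
        yield from upwards(p)  # delegate to each sub-generator, no inner append loop

for p in upwards_many(['|a|b|c', '|x|y']):
    print(p)  # |a|b, |a, |x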
@theodox: ++