I was wondering if someone could explain this script to me and also suggest a cleaner way to write it. I have been using it instead of sort() because it gives me a more sensibly ordered list, but I can’t really wrap my head around how or why it works. Thanks for the help.
import re

def natural_sort_key(key):
    return [int(t) if t.isdigit() else t for t in re.split(r'(\d+)', key)]

def sorted_nicely(strings):
    """Sort strings the way humans are said to expect."""
    return sorted(strings, key=natural_sort_key)
sorted_nicely runs each item in ‘strings’ (a collection of strings) through natural_sort_key and compares the results rather than the original strings. So you need to see what natural_sort_key returns for a string.
This could be split up into a few lines:
splitOnAnyNumber = re.split(r'(\d+)', key)  # try this out in an interpreter
So you’d have:
'abcd'   => ['abcd']
'ab2c'   => ['ab', '2', 'c']
'ab23c'  => ['ab', '23', 'c']
'ab2c3f' => ['ab', '2', 'c', '3', 'f']
And so on.
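If you want to reproduce those results yourself, something like the following should do it. The sample strings are the ones above plus two extra ones ('2ab' and 'ab2') that I added to show what happens when the string starts or ends with a number.

```python
import re

# Splitting on a capturing group keeps the digit runs in the result.
for text in ["abcd", "ab2c", "ab23c", "ab2c3f", "2ab", "ab2"]:
    print(repr(text), "=>", re.split(r"(\d+)", text))

# A leading or trailing number leaves an empty string at that end:
# '2ab' => ['', '2', 'ab']
# 'ab2' => ['ab', '2', '']
```

Note that the pieces always alternate text, digits, text, and so on, starting with a text piece (which may be empty).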
The next part converts each piece of that split list into an int if it consists only of digits, and leaves it as a string otherwise:
convertedStrings = []
for piece in splitOnAnyNumber:
    if piece.isdigit():
        convertedStrings.append(int(piece))
    else:
        convertedStrings.append(piece)
Then that list (convertedStrings) is returned, and THAT is what is compared.
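To see what actually gets compared, you can print the key for a couple of strings. This is just a quick sketch; the file names are made up for illustration.

```python
import re

def natural_sort_key(key):
    # Same key function as in the question: digit runs become ints.
    return [int(t) if t.isdigit() else t for t in re.split(r"(\d+)", key)]

print(natural_sort_key("file2.txt"))   # ['file', 2, '.txt']
print(natural_sort_key("file10.txt"))  # ['file', 10, '.txt']
```

The middle elements are real ints, so 2 and 10 are compared as numbers rather than as the text '2' and '10'.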
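When lists are compared, they are compared item by item, and a list that is a prefix of another sorts first (so [1, 2] < [1, 2, 3]). In Python 2, ints are always ‘less than’ strings; in Python 3, comparing an int with a string raises a TypeError instead. That does not normally bite here, because the split always alternates text, number, text, and so on, starting with a (possibly empty) text piece, so ints only get compared with ints and strings with strings. The result is that numbers sort against each other by value (2 before 10), text pieces sort against each other as usual, and, all other things equal, a longer string sorts ‘after’ a shorter one.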
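Here is a small comparison of plain sorted() against the natural key, plus the list-comparison rule on its own. The names list is made-up sample data, and the last part assumes Python 3, where an int and a str cannot be ordered against each other directly.

```python
import re

def natural_sort_key(key):
    # Same key function as in the question.
    return [int(t) if t.isdigit() else t for t in re.split(r"(\d+)", key)]

names = ["file10.txt", "file2.txt", "file1.txt"]  # made-up sample data

print(sorted(names))                        # ['file1.txt', 'file10.txt', 'file2.txt']
print(sorted(names, key=natural_sort_key))  # ['file1.txt', 'file2.txt', 'file10.txt']

# Lists compare item by item; a prefix sorts before the longer list.
print([1, 2] < [1, 2, 3])                          # True
print(["file", 2, ".txt"] < ["file", 10, ".txt"])  # True, because 2 < 10

# In Python 3 an int and a str cannot be ordered directly.
try:
    1 < "a"
except TypeError as exc:
    print("TypeError:", exc)
```

The natural key sidesteps that TypeError for ordinary input because the int pieces always sit at the same positions in every key.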
Stuff like this is easiest to analyze in the interpreter, if you’re still unclear.
Wow, thank you so much Rob for the explanation. I was just about to say that it wasn’t making much sense to me, but then I tried it out line by line and I finally understood what was going on. And now I have a much better understanding of it.
I guess the question I have now is: what exactly is re.split(r'(\d+)', key) doing? What does the r'(\d+)' mean?
The ‘r’ character indicates a raw string. You can google ‘python raw string’ to read about it.
People often use raw strings when writing regex, because regex is a separate language of its own that lives inside a string. Normally, if you write '\tfoo', that means ‘<tab>foo’, whereas if you write r'\tfoo' (or '\\tfoo'), that means the literal characters backslash, t, f, o, o, which is what you want with regex. It sort of requires understanding how string escapes are parsed, which a lot of people get wrong.
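A quick way to convince yourself of the difference is to compare lengths in the interpreter. This is only an illustration; the variable names are mine.

```python
normal = "\tfoo"  # backslash-t is an escape: one tab character, then foo
raw = r"\tfoo"    # raw string: a backslash, then t, f, o, o

print(len(normal))            # 4
print(len(raw))               # 5
print(raw == "\\tfoo")        # True: same thing written with a doubled backslash
print(r"(\d+)" == "(\\d+)")   # True: the pattern from the question, both ways
```

So the r prefix does not change what the regex means; it just saves you from doubling every backslash.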
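So the ‘\d’ characters in regex mean something (any digit), the ‘+’ means ‘one or more of them’, and the parentheses make it a ‘group’. In this case the group does matter: because the pattern is a capturing group, re.split keeps the matched digits in the result list instead of throwing them away.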
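For what it’s worth, the parentheses do make a visible difference with re.split, which is easy to check by running the same split with and without them (the variable names are just for illustration):

```python
import re

without_group = re.split(r"\d+", "ab23c")
with_group = re.split(r"(\d+)", "ab23c")

print(without_group)  # ['ab', 'c']        the digits are discarded
print(with_group)     # ['ab', '23', 'c']  the digits are kept as their own piece
```

Without the group there would be no numbers left in the list to convert to ints, so the natural sort would not work.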
That’s the thing with regex: it is an entire other language that you tend never to use enough to remember. So I’d read up on it and not worry too much about understanding every pattern as you read it for now (everyone uses ‘cheat sheets’).
Alright, I definitely will check it out and read more into it. Thank you for shedding some light on this problem, because now I at least have a start and don’t feel as lost. Very much appreciated!