Help with splitting a file path

Hi guys,

I am pretty new at Python and I’ve hit a wall. I am reading a file path from an XML file and storing it in a variable. What I need is to separate the file path from the file name, and because the path contains "\t" it doesn’t return the desired result.

I’ve read about raw strings and they work perfectly if I just type in the path, but how do you transform a string from a variable into a raw string? I bet it is a simple thing but I cannot figure it out :frowning:

So here are a couple of examples of what’s happening:

this is with the normal string:

>>> pathFromXML = 'D:\Backburner testing\test animation 01\teapot01_.png'
>>> pathFromXML.split('\\')
['D:', 'Backburner testing\test animation 01\teapot01_.png']

this is with the raw string:

>>> pathFromXML = r'D:\Backburner testing\test animation 01\teapot01_.png'
>>> pathFromXML.split('\\')
['D:', 'Backburner testing', 'test animation 01', 'teapot01_.png']

So how would I transform the value of ‘pathFromXML’ into a raw string?

Thanks in advance!
Anton

hey there,

you should be able to use the os.path.basename() function

Thanks lkruel, but the result is the same as with the regular split.

The problem is with the path, as it contains the ‘\t’ character, so it is not seen as just a ‘\’. If it’s a raw string then the parsing works properly…

try:

import re
re.split(r'\\', re.sub(r'\t', r'\\t', pathFromXML))

Heh, I think this is why paths should always be stored with forward slashes :slight_smile:

I guess this should do,


import os
# always write \\ because the backslash is an escape character
pathFromXML = 'D:\\Backburner testing\\test animation 01\\teapot01_.png'
dirName = os.path.dirname(pathFromXML)
fileName = os.path.basename(pathFromXML)
print dirName,fileName

:)
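One thing worth keeping in mind: os.path only treats the backslash as a separator when the script actually runs on Windows. Here is a rough sketch using the standard-library ntpath module, which applies Windows path rules on any platform. The variable names are just made up for the example, and it assumes the string really contains backslashes (hence the raw literal):

```python
import ntpath  # the Windows flavour of os.path, importable on any platform

# Raw literal, so every backslash stays a backslash.
path_from_xml = r'D:\Backburner testing\test animation 01\teapot01_.png'

# Split into (directory, file name) using Windows path rules.
dir_name, file_name = ntpath.split(path_from_xml)

print(dir_name)
print(file_name)
```

On Windows itself, os.path.split() should behave the same way, since os.path is ntpath there.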

Unless the path is a directory ending in a backslash, raw strings are a great time saver. No need to escape anything; you can paste in paths from Explorer or wherever without adding double backslashes or changing them to forward slashes.

pathFromXML = r'D:\Backburner testing\test animation 01\teapot01_.png'

But the OP is already hip to that.
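To make the raw-string point a bit more concrete, here is a minimal sketch (all names are invented for the example). As far as I understand it, the r prefix only changes how the literal is read from the source code. The result is an ordinary string, which is why there is nothing to "convert" once a value already sits in a variable:

```python
raw_literal = r'D:\Backburner testing\test animation 01\teapot01_.png'
escaped_literal = 'D:\\Backburner testing\\test animation 01\\teapot01_.png'

# Both spellings produce the same ordinary string object.
print(raw_literal == escaped_literal)
print(type(raw_literal) is type(escaped_literal))

# A raw literal cannot end in a single backslash, so a trailing one
# has to be appended separately.
folder = r'D:\Backburner testing' + '\\'
print(folder)
```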

The problem is that I am reading that path from an XML file, where it is formatted with single backslashes; that’s what started the whole thing. So basically my question is how to convert a regular string stored in a variable to a raw one.

loocas:
Your solution is the one closest to what I need, but it works only with \t, which is great, but there are a lot of other escape characters like \n, \r etc.

What I got until now is this:


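import string
# Typed without the r prefix, so \t, \n, \x0b and \r become real control characters.
s = 'D:\Backburner testing\test animation 01\new\x0bzzz\reapot01_.png'
# Turn each whitespace control character back into its backslash escape
# ([:-1] skips the last entry of string.whitespace, the space in Python 2).
for r in string.whitespace[:-1]:
    s = s.replace(r, '\\%s' % repr(r)[2:-1])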

which works great, but I think you should be able to convert them more easily without using this workaround…
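A slightly more explicit take on the same workaround, as a sketch only. It spells out the control characters instead of slicing string.whitespace, because in Python 3 that constant starts with the space rather than ending with it. The mapping and the reescape name are assumptions made up for this example, and the approach cannot tell a tab that was really intended from one produced by an unescaped "\t":

```python
# Control characters mapped back to the escape sequences that produce them.
# The list is an assumption for this sketch; extend it if other escapes show up.
CONTROL_ESCAPES = {
    '\t': r'\t',
    '\n': r'\n',
    '\r': r'\r',
    '\x0b': r'\x0b',
    '\x0c': r'\x0c',
}


def reescape(text):
    """Replace control characters with their backslash escape sequences."""
    for char, escape in CONTROL_ESCAPES.items():
        text = text.replace(char, escape)
    return text


# Typed without the r prefix, so the escapes turn into control characters.
mangled = 'D:\\Backburner testing\test animation 01\new\x0bzzz\reapot01_.png'
restored = reescape(mangled)
print(restored)
```

This is only needed when the path was typed as a non-raw literal in the first place.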

This doesn’t really help, but with C# and MaxScript (version 2009 at least) you can use @“C:\blahblah.txt” which helps sometimes for some test hacks…

For real purposes we’ve just made a few helper functions like “stringToFilename” which eat everything and return a valid filename or some error codes if the file doesn’t exist yet or is not writeable etc… Also useful for stripping out “//depot/this_project_name” since all of our file storage paths are always relative to the game source data root folder.

I’m curious how escape characters got into your variable by reading a line from XML. If the entry in your XML looked something like this:

<filename>D:\Backburner testing\test animation 01\new\x0bzzz\reapot01_.png</filename>

… then reading it in with any XML parser (or even straight file readline, etc) would give you a string without escape characters in it. The only time those things need escaping is if you’re typing a non-raw literal string inside a script, or you’re typing it in IDLE or a similar command prompt.
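A quick way to check this, as a sketch: the XML text below is made up for the example (only the filename tag is borrowed from the entry above), and it is parsed with the standard-library minidom. The backslashes should come through as plain characters, with no tabs anywhere:

```python
from xml.dom import minidom

# Hypothetical XML content. The raw literal keeps the backslashes exactly
# as they would appear in a file on disk.
xml_text = r'<filename>D:\Backburner testing\test animation 01\teapot01_.png</filename>'

doc = minidom.parseString(xml_text)
path_value = doc.getElementsByTagName('filename')[0].firstChild.data

# repr() shows what is really stored: doubled backslashes, no \t escapes.
print(repr(path_value))

parts = path_value.split('\\')
print(parts)
```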

The way you typed that “s” path in the example above would definitely leave you with escaped characters, since it doesn’t have the “r” raw prefix. But you said it came from XML, and I’m curious how.

Are you able to post a lengthier code snippet that includes the XML parsing/reading bit?

Here is the code:


tagOutput = theLines.getElementsByTagName('Output')[0]
tagName = tagOutput.getElementsByTagName('Name')[0]
pathValue = tagName.firstChild.data

And when I print pathValue it returns: D:\Backburner testing\test animation 01\teapot01_.png

Using “print” on your value will never show double-backslash escapes, it shows the raw contents of the string. If there were escaped characters in it, the print result would be something like this:

D:\Backburner testing    est animation 01    eapot01_.png

… with gaps where the "\t" was interpreted as tabs. But instead you’re seeing the correct raw string:

D:\Backburner testing\test animation 01\teapot01_.png

It looks to me like everything is behaving as-expected. I might still be missing something, however.
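Here is a small sketch of the difference between print and repr for the two cases, with made-up variable names. One string holds literal backslashes, the other holds real tab characters because its literal was typed without the r prefix:

```python
literal_backslashes = r'D:\Backburner testing\test animation 01\teapot01_.png'
real_tabs = 'D:\\Backburner testing\test animation 01\teapot01_.png'

print(literal_backslashes)        # backslashes visible, no gaps
print(real_tabs)                  # gaps where the tabs are
print(repr(literal_backslashes))  # backslashes shown doubled
print(repr(real_tabs))            # tabs shown as \t

# Each real tab is one character where the literal version has two.
print(len(literal_backslashes) - len(real_tabs))
```

Checking repr() of the value is usually the quickest way to tell which of the two situations you are in.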

Hi Adam, I didn’t know that it would be displayed differently when printed. That’s great to know!

But what happens is that when I use any kind of split on the path read from the XML, whether it is “split” or “os.path”, it returns a list with 2 items in it: “D:” and “Backburner testing\test animation 01\teapot01_.png”. If you want I can upload the XML and a little .py file to play with.

Yeah, a little repro case would be helpful. I’d be glad to have a look.

I tried goofing around with the below code some more. Encoding can have its own weird results. But maybe someone can figure it out :wink:


How about:

myString = 'D:\Backburner testing\test animation 01\teapot01_.png'
print myString.encode("unicode_escape").replace("\\", "/")

# D://Backburner testing/test animation 01/teapot01_.png

So this embeds the escape chars, then flips all the slashes the other way. For some reason it adds a double forward slash at the beginning, so I suppose you could do something like:

print myString.encode("unicode_escape").replace("\\", "/").replace("//", "/")

And be done with it, but it seems a bit like overkill. Maybe there’s a better option with .encode()
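For anyone trying this on Python 3, here is a rough sketch of the same idea with invented variable names. There, encode() returns bytes, so the result is decoded back to text first. The double slash seems to come from the one genuine backslash after "D:", which unicode_escape writes out as two backslashes, so this version replaces doubled backslashes before single ones instead of collapsing "//" afterwards:

```python
# Typed without the r prefix, so the two \t sequences are real tab characters.
my_string = 'D:\\Backburner testing\test animation 01\teapot01_.png'

# In Python 3, encode() returns bytes, so decode back to str before replacing.
escaped = my_string.encode('unicode_escape').decode('ascii')

# Genuine backslashes come out doubled; handle those first, then the
# single backslashes that belong to re-created escapes such as \t.
forward = escaped.replace('\\\\', '/').replace('\\', '/')
print(forward)
```

It is still a guess at what the path was meant to be, so it only makes sense when the string came from a literal typed without the r prefix.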