python string vs. raw string chars

Started by ggnfs000, January 06, 2017, 12:01:10 PM


ggnfs000


This post may help in understanding some of the more advanced features of regular expressions in Python, so I decided to explain in more detail how a normal string declaration and a raw string declaration differ. The idea came from "The Quick Python Book" by Daryl L. Harms and Kenneth M. McDonald, after reading its sections titled
17.3. Regular expressions and raw strings
17.3.1. Raw strings to the rescue

In these two sections the authors raise important points about how Python declares strings and how Python's regular expressions interpret them, but in my opinion they mix the two topics, which makes both harder to understand. So in this post I look only at how Python interprets special characters when a string is declared, compare the result with a raw string declared with the same value, and leave regular expressions out completely.

In the following examples the string "asd\nasd" is declared both as a normal string and as a raw string, and the results are compared. The comparison is done from the Python console in two ways: 1. printing each variable with print, and 2. printing each variable as hex, showing the ASCII value of every character in the string.

I assigned the string "asd\nasd" to the variable a as a normal string and to the variable araw as a raw string:
a = "asd\nasd"
araw = r"asd\nasd"
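
A quick way to confirm that the two variables really hold different data is to compare their lengths. This is only a small sketch of my own, not something from the book; it uses print(...) with parentheses so that it also runs on Python 3:

```python
# Same two declarations as above.
a = "asd\nasd"      # normal string: \n becomes one newline character
araw = r"asd\nasd"  # raw string: backslash and n stay as two characters

print(len(a))       # 7
print(len(araw))    # 8
print(a == araw)    # False
```

The one-character difference is the whole story: the normal string stores a single newline where the raw string stores '\' followed by 'n'.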

As the output below shows, in the raw declaration the Python interpreter did no special-character conversion at all; it stored the string exactly as typed, including the characters '\' and 'n'.
In the normal declaration, however, \n is interpreted and converted to 0x0a (newline). This becomes obvious once each string is printed as hex.

Printing each variable with print shows that a holds the interpreted version and araw the uninterpreted version of the special characters:
>>> print(a)
asd
asd

>>> print(araw)
asd\nasd
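
If you want to see the difference without printing a real line break, repr() can help. The following is a sketch under the assumption that you are fine with Python's own escaped notation: repr() returns the string as it would have to be typed as a normal (non-raw) literal.

```python
a = "asd\nasd"
araw = r"asd\nasd"

# repr() shows each string in escaped form, quotes included.
print(repr(a))     # 'asd\nasd'   (one newline, shown as the escape \n)
print(repr(araw))  # 'asd\\nasd'  (a real backslash, shown doubled)
```

The doubled backslash in the second line is only notation; the string itself contains a single '\' character.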

Python's syntax for printing a string as hex is awkwardly long, but it works:
>>> ":".join("{:02x}".format(ord(c)) for c in a)
'61:73:64:0a:61:73:64'

What is 0a? It is the newline character. Remember that we declared the string as asd\nasd, so it is easy to see that \n was converted to the single character 0x0a when the normal string was declared.

>>> ":".join("{:02x}".format(ord(c)) for c in araw)
'61:73:64:5c:6e:61:73:64'

Here, instead of the newline seen in variable a, we see the hex values 5c and 6e, which are the ASCII codes of the characters '\' and 'n'. The raw string was declared with exactly the same text as the normal string, "asd\nasd", but the conversion to 0x0a did not take place, which is why we see the ASCII hex of the \ and n characters.
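
To avoid retyping the long hex expression, it can be wrapped in a small helper. The name to_hex below is my own invention for illustration, not a built-in. The same sketch also shows that a normal string with the backslash escaped by hand ("asd\\nasd") ends up identical to the raw string:

```python
def to_hex(s):
    """Return the code of each character in s as colon-separated hex."""
    return ":".join("{:02x}".format(ord(c)) for c in s)

araw = r"asd\nasd"
escaped = "asd\\nasd"   # normal string, backslash escaped manually

print(to_hex(araw))     # 61:73:64:5c:6e:61:73:64
print(to_hex(escaped))  # 61:73:64:5c:6e:61:73:64
print(araw == escaped)  # True
```

So the r prefix only changes how the literal is read from the source code; once declared, a raw string is an ordinary string.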





