Escaping Characters in Perl
2012-05-22
Introduction
I had been programming with Perl for many years before I actually took the time to understand what the rules are for escaping characters. The rules differ for 'single quoted strings', "double quoted strings", /regular expressions/ and [character classes]. This article will explain the escaping rules for each case.
Summary
Here is a summary of all the rules:
Usages | Characters Needing Escaping |
---|---|
Single Quoted Strings | ' - apostrophe/single quote; \ - backslash |
Double Quoted Strings | " - double quote; $ - dollar; @ - at symbol; \ - backslash |
Regular Expressions | ^ - caret; $ - dollar; . - period/full stop; ? - question mark; * - asterisk; + - plus; [ - left square bracket; (,) - parentheses/round brackets; | - pipe/vertical bar; \ - backslash. When they can be confused with a range: {,} - curly brackets. When entered directly into regex: @ - at symbol; / - forward slash |
Character Classes | [,] - square brackets; \ - backslash. Unless the first or last character in class: - - dash or hyphen. If the first character in class: ^ - caret |
Single Quoted Strings
Single Quotes (')
Within a single-quoted string, single quote characters must be escaped with a backslash (\').
my $str = 'You can\'t do this without a backslash.';
Result:
You can't do this without a backslash.
Backslashes (\)
Escaping a backslash with another backslash is optional, but allowed. Two consecutive backslashes are always interpreted as a single backslash. You must, however, escape a backslash if it is the last character in the string, because it would otherwise escape the closing quote.
my $str = 'Escaping a \ is optional (\\), except at the end \\';
Result:
Escaping a \ is optional (\), except at the end \
No other characters need to be escaped within a single quoted string.
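The rules above can be checked with a short script. This is only a sketch; the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Single-quoted strings: only \' and \\ are treated as escapes.
my $quote    = 'It\'s here';        # \' becomes '
my $optional = 'a\b';               # a lone backslash is kept as-is
my $doubled  = 'a\\b';              # \\ collapses to one backslash
my $trailing = 'c:\temp\\';         # the final backslash must be doubled
my $literal  = 'No $interp or \n';  # $ and \n stay literal

print "$_\n" for $quote, $optional, $doubled, $trailing, $literal;
```

Note that $optional and $doubled end up holding the same three characters, which is why the second backslash is optional in the middle of a string.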
Single Quoted Summary
Rule | Characters | Example | Result |
---|---|---|---|
Single quotes | ' | 'Didn\'t' | Didn't |
Backslashes (sometimes) | \ | 'c:\test\\' | c:\test\ |
Double Quoted Strings
Double-quoted strings behave quite differently from single-quoted strings in Perl. This may come as a surprise to someone familiar with other programming languages, such as JavaScript.
Double Quotes (")
To enter a double quote within a double-quoted string, you must escape it with a backslash:
my $str = "He said, \"Not a chance!\"";
Result:
He said, "Not a chance!"
Scalar and Array Variables ($, @)
Double quoted strings allow variable interpolation, so the '$' (scalar) and '@' (array) characters must be escaped to keep them from being treated as the start of a variable name:
my $cost = "25 items \@ \$1.90 each, 62% profit.";
Result:
25 items @ $1.90 each, 62% profit.
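To see what the escaping prevents, it helps to compare an interpolated string with an escaped one. The following is a minimal sketch; $price and @items are hypothetical variables introduced only for this comparison.

```perl
use strict;
use warnings;

my $price = 1.9;
my @items = ('pen', 'ink');

my $interpolated = "Price: $price, items: @items";    # variables are expanded
my $escaped      = "Price: \$price, items: \@items";  # sigils are kept literal
my $percent      = "62% profit";                      # % is not interpolated

print "$interpolated\n$escaped\n$percent\n";
```

Array elements are joined with a space when interpolated, so the first string contains "pen ink".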
Backslashes (\)
Double quoted strings also allow special characters to be inserted using escape sequences that begin with a backslash (for example, \n for a newline). A literal backslash must therefore always be escaped so that it is not interpreted as the start of an escape sequence.
my $str = "Create a single \\ with two backslashes";
Result:
Create a single \ with two backslashes
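A quick way to see the difference between an escape sequence and an escaped backslash is to compare string lengths. This is an illustrative sketch with made-up variable names.

```perl
use strict;
use warnings;

my $tab     = "a\tb";      # \t is a single tab character
my $literal = "a\\tb";     # \\ is one backslash, followed by a plain 't'
my $path    = "c:\\temp";  # without doubling, \t would become a tab

printf "%d %d\n", length($tab), length($literal);  # lengths 3 and 4
print "$path\n";
```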
Double Quoted Summary
Rule | Characters | Example | Result |
---|---|---|---|
Double quotes | " | "\"OK?\"" | "OK?" |
Variable characters | $, @ | "\$1.90 each" | $1.90 each |
Backslashes | \ | "c:\\files\\" | c:\files\ |
Regular Expressions
Escaping rules become more difficult with regular expressions because there are more characters that are used for special purposes. In total, there are eleven characters that must be escaped within a regular expression. The following table lists each character, along with its non-escaped use.
No. | Character | Non-Escaped Use |
---|---|---|
1 | ^ | Start of string or line |
2 | $ | End of string or line |
3 | . | Any single character |
4 | ? | Zero or one occurrence of previous character |
5 | * | Zero or more occurrences of previous character |
6 | + | One or more occurrences of previous character |
7 | [ | Bracketed character class |
8, 9 | (, ) | Group items |
10 | | | Alternation (OR operator) |
11 | \ | Escape next character |
The following characters must be escaped only when they can be confused with their special use case:
Character | Non-Escaped Use | Example |
---|---|---|
{, } | Range quantifier | {2,5} |
Finally, when entering a Perl regular expression directly, you must escape any characters that would be interpreted as variables or the end of expression.
Character | Non-Escaped Use |
---|---|
$, @ | Scalar or array variable |
/ | End of regex |
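When the text to match comes from a variable, the backslashes do not have to be added by hand. Perl's built-in quotemeta function, and the equivalent \Q...\E sequence inside a pattern, escape every non-word character. The sketch below compares the three approaches; the sample text is invented for illustration.

```perl
use strict;
use warnings;

my $text = 'Total (approx.): $3.50 + tax?';

# Hand-escaped: every metacharacter is preceded by a backslash.
my $by_hand = $text =~ /\(approx\.\): \$3\.50 \+ tax\?/ ? 1 : 0;

# quotemeta (or \Q...\E inside a pattern) adds the backslashes for us.
my $needle   = '(approx.): $3.50 + tax?';
my $quoted   = quotemeta $needle;
my $by_quote = $text =~ /$quoted/     ? 1 : 0;
my $by_Q     = $text =~ /\Q$needle\E/ ? 1 : 0;

print "$by_hand $by_quote $by_Q\n";
```

quotemeta escapes more than the eleven characters listed above (spaces and colons, for instance), which is harmless because an escaped non-word character simply matches itself.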
Forward-Slash Example (/)
The following example shows how to remove an HTML tag containing a forward slash. The forward slash must be escaped because it is entered directly into the regular expression.
my $str = 'Some text</p>';
$str =~ s/<\/p>//;
We could avoid having to escape the forward slash if we defined the regular expression as a variable first. This is because forward slashes do not need to be escaped in either single quoted or double quoted strings and only need to be escaped in regular expressions when entered directly:
my $str = 'Some text</p>';
my $regex = '</p>';
$str =~ s/$regex//;
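Another option, not covered above, is to pick a different delimiter for the substitution so that the forward slash is no longer special. A minimal sketch, with hypothetical variable names:

```perl
use strict;
use warnings;

my $str = 'Some text</p>';

# Copy $str, then strip the tag using alternative delimiters.
(my $with_braces = $str) =~ s{</p>}{};   # braces as delimiters
(my $with_hash   = $str) =~ s#</p>##;    # hash signs as delimiters

print "$with_braces|$with_hash\n";
```

Whichever character is chosen as the delimiter becomes the one that needs escaping inside the pattern.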
Back-Slash Example (\)
Searching for backslashes can get quite tricky, as the backslashes sometimes need to be escaped twice. For example, if we want to replace all backslashes with forward slashes, both the backslash and the forward slash must be escaped:
$str =~ s/\\/\//g; # Replace all \ with /
If we pre-defined the search string using a double-quoted string, we would have to double escape the backslash. The same substitution becomes:
my $regex = "\\\\"; # $regex contains \\
$str =~ s/$regex/\//g; # Replace all \ with /
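The double escaping can be compared against two alternatives: a single-quoted string, which needs the same four backslashes, and a pattern precompiled with qr//, which follows regex rules directly and so needs only two. This is a sketch under those assumptions; the sample path is made up.

```perl
use strict;
use warnings;

my $path = 'c:\temp\files';

my $from_double = "\\\\";   # double-quoted: four backslashes yield \\
my $from_single = '\\\\';   # single-quoted: also yields \\
my $compiled    = qr/\\/;   # qr// uses regex escaping rules directly

# Each copy of $path has its backslashes replaced with forward slashes.
(my $r1 = $path) =~ s/$from_double/\//g;
(my $r2 = $path) =~ s/$from_single/\//g;
(my $r3 = $path) =~ s/$compiled/\//g;

print "$r1\n$r2\n$r3\n";
```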
A Complex Example
Replacing $10.00 with £6.32 requires the following:
Condition | Find | Replace |
---|---|---|
Desired text | $10.00 | £6.32 |
Regex escaped | \$10\.00 | £6.32 |
Single-quoted string | '\$10\.00' | not possible |
Double-quoted string | "\\\$10\\.00" | "\x{a3}6.32" |
Implementing the double-quoted string option in Perl code:
my $str = 'It costs $10.00';
my $find = "\\\$10\\.00"; # contains \$10\.00
my $replace = "\x{a3}6.32"; # contains £6.32
binmode STDOUT, ":utf8";
print "$str\n";
print "find: $find\n";
print "replace: $replace\n";
$str =~ s/$find/$replace/;
print "$str\n";
Results in the following:
It costs $10.00
find: \$10\.00
replace: £6.32
It costs £6.32
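The same substitution can be written with less escaping by keeping the search text plain and letting \Q...\E quote it inside the pattern. This is an alternative sketch, not the approach used in the table above.

```perl
use strict;
use warnings;

my $str     = 'It costs $10.00';
my $find    = '$10.00';        # plain text, no regex escaping
my $replace = "\x{a3}6.32";    # pound sign via an escape sequence

# \Q...\E quotes the interpolated text so $ and . match literally.
$str =~ s/\Q$find\E/$replace/;

binmode STDOUT, ':encoding(UTF-8)';
print "$str\n";
```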
Character Classes
Character classes define a set of characters to be used within a regular expression. They are delimited with square brackets, such as [a-zA-F]. Even though character classes are found within regular expressions, they have different escaping rules than regular expressions.
Character Range (-)
The hyphen (-) character is used to indicate a range of characters. For example, [0-9] means any digit from 0 to 9. If you wish to create a character class that includes a hyphen, it should be escaped to avoid it being interpreted as a range.
A hyphen as the first character or last character in a character class does not need to be escaped, as it would not create a valid range. For example:
[0-9] # Any digit from 0 to 9
[0\-9] # '0', '-' or '9' characters
[-0-9] # '-' character or digit from 0 to 9
[0-9-] # '-' character or digit from 0 to 9
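These four classes can be tried against a sample string. The sketch below uses a hypothetical helper, matched_chars, that returns every character of the text matched by a given class.

```perl
use strict;
use warnings;

# Hypothetical helper: return the characters of $text matched by $class.
sub matched_chars {
    my ($class, $text) = @_;
    return join '', $text =~ /($class)/g;
}

my $sample = '0-5-9';

print matched_chars(qr/[0-9]/,  $sample), "\n";  # digits only
print matched_chars(qr/[0\-9]/, $sample), "\n";  # '0', '-' and '9', but not '5'
print matched_chars(qr/[-0-9]/, $sample), "\n";  # hyphen plus any digit
print matched_chars(qr/[0-9-]/, $sample), "\n";  # same set, hyphen last
```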
Set Inversion or Negation (^)
A caret (^) character at the beginning of a character class is used to invert the set. For example, [^0-9] means any non-digit character. If you wish to match a literal caret placed at the beginning of a character class, it must be escaped.
A caret anywhere other than the first character in a character class does not need to be escaped.
[^0-9] # Any non-digit
[\^0-9] # '^' character or any digit
[0-9^] # '^' character or any digit
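The effect of the caret's position can be seen by collecting the matches from a small sample. A minimal sketch, with an invented sample string:

```perl
use strict;
use warnings;

my $sample = 'a^1';

my @negated  = $sample =~ /[^0-9]/g;    # non-digits: 'a' and '^'
my @escaped  = $sample =~ /[\^0-9]/g;   # caret or digit: '^' and '1'
my @trailing = $sample =~ /[0-9^]/g;    # same set, caret not first

print "@negated | @escaped | @trailing\n";
```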
Character Class Delimiters ([, ])
The character class delimiters themselves should be escaped:
[\[\]] # Match any '[' or ']' character
The Escape Character (\)
The backslash escape character should be escaped:
[\\] # Match any '\' character
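Both rules can be combined in a single class. The following sketch counts how many characters of an invented sample string each class matches.

```perl
use strict;
use warnings;

my $sample = 'a[1]\\b';   # single-quoted, so the string is a[1]\b

my @brackets   = $sample =~ /[\[\]]/g;     # '[' and ']'
my @backslash  = $sample =~ /[\\]/g;       # the backslash
my @everything = $sample =~ /[\[\]\\]/g;   # all three characters

printf "%d %d %d\n", scalar @brackets, scalar @backslash, scalar @everything;
```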
Character Class Summary
The following table shows the characters that need to be escaped within a character class and their non-escaped usage:
Character | Non-Escaped Use |
---|---|
- | Character range, unless located at the beginning or end of the set |
^ | Invert or negate set (only if found at the beginning of set) |
[, ] | Start or end of character class |
\ | Escape character |