Match any unicode letter?
In .NET you can use \p{L} to match any letter. How can I do the same in Python? Namely, I want to match any uppercase, lowercase, and accented letters.
python regex character-properties
See: stackoverflow.com/questions/1832893/…
– Jeff Mercado
Jun 11 '11 at 7:08
You know that 'é' isn't a unicode string in 2.x, right?
– Ignacio Vazquez-Abrams
Jun 11 '11 at 7:46
Try r.match(u'é')
– Tim Pietzcker
Jun 11 '11 at 7:55
@Ignacio/Tim: Oh! Right. Forgot about that! Thanks :D It's a little confusing because it doesn't throw an error or anything either.
– mpen
Jun 11 '11 at 17:09
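The comments above identify the real problem: in Python 2, 'é' without the u prefix is a byte string, so a Unicode-aware class never sees a letter. A rough Python 3 analogue of the same pitfall, assuming the text is UTF-8 encoded, is a str pattern versus a bytes pattern (bytes patterns only understand ASCII, so \w there means [a-zA-Z0-9_]):

```python
import re

# str pattern: Unicode-aware by default in Python 3, so \w covers 'é'.
letter_str = re.compile(r'[^\W\d_]')

# bytes pattern: ASCII-only, so the UTF-8 bytes of 'é' are never "word" bytes.
letter_bytes = re.compile(rb'[^\W\d_]')

text = 'é'
encoded = text.encode('utf-8')  # b'\xc3\xa9': two non-ASCII bytes

print(letter_str.match(text) is not None)       # True
print(letter_bytes.match(encoded) is not None)  # False
```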
edited Feb 20 '15 at 16:36
asked Jun 11 '11 at 7:05 by mpen
1 Answer
Accepted answer (score 21):
Python's re module doesn't support Unicode properties such as \p{L} yet. But you can compile your regex using the re.UNICODE flag, and then the character class shorthand \w will match Unicode letters, too.
Since \w also matches digits and the underscore, you need to subtract those from your character class: [^\W\d_] will match any Unicode letter.
>>> import re
>>> r = re.compile(r'[^\W\d_]', re.U)
>>> r.match('x')
<_sre.SRE_Match object at 0x0000000001DBCF38>
>>> r.match(u'é')
<_sre.SRE_Match object at 0x0000000002253030>
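The session above is Python 2. A minimal Python 3 sketch of the same [^\W\d_] trick follows; there, str patterns are Unicode-aware by default, so re.UNICODE is redundant. The helper name letter_runs is hypothetical. Two caveats, stated as assumptions: the input should be precomposed (NFC), because a separate combining accent (category Mn) is not matched by \w; and since \w is built on str.isalnum(), the class can also admit non-decimal numeric characters such as '½'.

```python
import re

# "Word characters, minus digits, minus underscore" ~ letters.
# Python 3 str patterns are Unicode-aware, so no re.UNICODE flag is needed.
LETTERS = re.compile(r'[^\W\d_]+')

def letter_runs(text):
    """Return every maximal run of letters in text (hypothetical helper)."""
    return LETTERS.findall(text)

print(letter_runs('Crème brûlée 42_x'))  # ['Crème', 'brûlée', 'x']
```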
Clever, but it doesn't seem to work. See update. I copied that e off of en.wikipedia.org/wiki/List_of_Unicode_characters, it doesn't seem to recognize it.
– mpen
Jun 11 '11 at 7:44
It works perfectly, but 'é' is not a Unicode object, it's a string of bytes.
– Rosh Oxymoron
Jun 11 '11 at 7:48
Thanks guys! Darn unicode :) Causes nothing but problems.
– mpen
Jun 11 '11 at 17:10
@rosh try u'é'
– Seán Hayes
Mar 9 '17 at 19:55
^[a-zœéèâêçàñ ]+$
– Natim
Mar 30 at 14:08
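A hand-written class like ^[a-zœéèâêçàñ ]+$ only covers the characters listed in it, so uppercase and other accented letters fall through. If you need exactly the \p{L} set without a third-party regex engine, one option (a sketch, not the answer's method) is to check the Unicode general category with the standard unicodedata module: every category starting with "L" (Lu, Ll, Lt, Lm, Lo) is a letter. The names is_letter and HAND_WRITTEN are hypothetical.

```python
import re
import unicodedata

def is_letter(ch):
    # Same set as \p{L}: general categories Lu, Ll, Lt, Lm, Lo all start with 'L'.
    return unicodedata.category(ch).startswith('L')

# The hand-picked class from the comment above, for comparison.
HAND_WRITTEN = re.compile(r'^[a-zœéèâêçàñ ]+$')

print(is_letter('é'), is_letter('Ж'), is_letter('½'))  # True True False
print(HAND_WRITTEN.match('Über') is None)  # True: 'Ü' is not in the class
```

Unlike [^\W\d_], the category check rejects numeric characters such as '½' (category No), at the cost of testing one character at a time instead of using a single regex.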
edited Jun 11 '11 at 7:56
answered Jun 11 '11 at 7:09 by Tim Pietzcker