Match any unicode letter?























In .NET you can use \p{L} to match any letter. How can I do the same in Python? Namely, I want to match any uppercase, lowercase, and accented letters.






























  • See: stackoverflow.com/questions/1832893/…
    – Jeff Mercado
    Jun 11 '11 at 7:08

  • You know that 'é' isn't a unicode in 2.x, right?
    – Ignacio Vazquez-Abrams
    Jun 11 '11 at 7:46

  • Try r.match(u'é')
    – Tim Pietzcker
    Jun 11 '11 at 7:55

  • @Ignacio/Tim: Oh! Right. Forgot about that! Thanks :D It's a little confusing because it doesn't throw an error or anything either.
    – mpen
    Jun 11 '11 at 17:09






















python regex character-properties














edited Feb 20 '15 at 16:36

























asked Jun 11 '11 at 7:05









mpen





















1 Answer




















accepted










Python's re module doesn't support Unicode properties yet. But you can compile your regex using the re.UNICODE flag, and then the character class shorthand \w will match Unicode letters, too.

Since \w will also match digits, you then need to subtract those from your character class, along with the underscore:

[^\W\d_]

will match any Unicode letter.

>>> import re
>>> r = re.compile(r'[^\W\d_]', re.U)
>>> r.match('x')
<_sre.SRE_Match object at 0x0000000001DBCF38>
>>> r.match(u'é')
<_sre.SRE_Match object at 0x0000000002253030>
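For readers on Python 3, here is a minimal sketch of the same pattern. It assumes Python 3, where str patterns are Unicode-aware by default, so the re.UNICODE flag is optional and kept only to mirror the session above. The helper names is_letter and is_letter_by_category are hypothetical and not part of the original answer. The second helper is a cross-check that uses the standard unicodedata module and treats any general category starting with "L" as a letter, which is what \p{L} denotes.

```python
import re
import unicodedata

# Word characters minus digits and underscore, as in the answer above.
# re.UNICODE is redundant for str patterns in Python 3; kept for clarity.
LETTER = re.compile(r'[^\W\d_]', re.UNICODE)


def is_letter(ch):
    """Return True if the whole string ch is one character matched by LETTER."""
    return LETTER.fullmatch(ch) is not None


def is_letter_by_category(ch):
    """Return True if ch has a Unicode general category in the L* group."""
    return unicodedata.category(ch).startswith('L')


if __name__ == '__main__':
    # '\u00e9' is the precomposed character 'é'.
    for ch in ['x', '\u00e9', '5', '_']:
        print(repr(ch), is_letter(ch), is_letter_by_category(ch))
```

The two helpers agree on these four samples. They are not guaranteed to agree on every code point, so the category check is the closer analogue of \p{L} when exact semantics matter.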




























  • Clever, but it doesn't seem to work. See update. I copied that e off of en.wikipedia.org/wiki/List_of_Unicode_characters, it doesn't seem to recognize it.
    – mpen
    Jun 11 '11 at 7:44

  • It works perfectly, but 'é' is not a Unicode object, it's a string of bytes.
    – Rosh Oxymoron
    Jun 11 '11 at 7:48

  • Thanks guys! Darn unicode :) Causes nothing but problems.
    – mpen
    Jun 11 '11 at 17:10

  • @rosh try u'é'
    – Seán Hayes
    Mar 9 '17 at 19:55

  • ^[a-zœéèâêçàñ ]+$
    – Natim
    Mar 30 at 14:08
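The bytes-versus-Unicode point raised in these comments can be sketched as follows. This is an illustration in Python 3, not the Python 2 session from the thread: in Python 2 a plain 'é' literal was a byte string and u'é' a unicode object, while in Python 3 the same distinction is between bytes and str. The names text_letter and bytes_letter are made up for this example.

```python
import re

# str pattern: \w is Unicode-aware by default in Python 3.
text_letter = re.compile(r'[^\W\d_]')

# bytes pattern: \w only covers ASCII letters, digits and the underscore.
bytes_letter = re.compile(rb'[^\W\d_]')

text = '\u00e9'              # the character 'é'
raw = text.encode('utf-8')   # b'\xc3\xa9', two bytes, neither an ASCII letter

print(text_letter.match(text) is not None)    # str input: matches
print(bytes_letter.match(raw) is not None)    # encoded bytes: no match
print(bytes_letter.match(b'x') is not None)   # ASCII letter as bytes: matches

# Decoding the bytes back to str restores the Unicode-aware behaviour.
print(text_letter.match(raw.decode('utf-8')) is not None)
```

Unlike the Python 2 behaviour described above, where the mismatch failed silently, Python 3 raises a TypeError if a str pattern is applied to a bytes object, so the two pattern types have to be kept separate as shown.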



















edited Jun 11 '11 at 7:56

























answered Jun 11 '11 at 7:09









Tim Pietzcker












