Match any unicode letter?
In .NET you can use \p{L} to match any letter. How can I do the same in Python? Namely, I want to match any uppercase, lowercase, and accented letters.

edited Feb 20 '15 at 16:36

asked Jun 11 '11 at 7:05

mpen

See: stackoverflow.com/questions/1832893/…
– Jeff Mercado
Jun 11 '11 at 7:08

You know that 'é' isn't a unicode in 2.x, right?
– Ignacio Vazquez-Abrams
Jun 11 '11 at 7:46

Try r.match(u'é')
– Tim Pietzcker
Jun 11 '11 at 7:55

@Ignacio/Tim: Oh! Right. Forgot about that! Thanks :D It's a little confusing because it doesn't throw an error or anything either.
– mpen
Jun 11 '11 at 17:09

python regex character-properties
1 Answer
accepted

Python's re module doesn't support Unicode properties yet. But you can compile your regex using the re.UNICODE flag, and then the character class shorthand \w will match Unicode letters, too.

Since \w will also match digits, you need to then subtract those from your character class, along with the underscore:

[^\W\d_]

will match any Unicode letter.

>>> import re
>>> r = re.compile(r'[^\W\d_]', re.U)
>>> r.match('x')
<_sre.SRE_Match object at 0x0000000001DBCF38>
>>> r.match(u'é')
<_sre.SRE_Match object at 0x0000000002253030>
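
For readers on Python 3, here is a minimal sketch of the same pattern. It assumes a str pattern, for which \w and \d are Unicode-aware by default, so the re.U flag can be left out. The helper name only_letters is made up for illustration and is not part of the re module.

```python
import re

# [^\W\d_] reads as "a word character that is neither a digit nor an underscore".
# With a str pattern in Python 3, \w and \d are Unicode-aware by default,
# so re.UNICODE does not need to be passed explicitly.
LETTER = re.compile(r'[^\W\d_]')


def only_letters(text):
    """Return True if text is non-empty and consists only of matching characters.

    only_letters is a hypothetical helper name used for illustration.
    """
    return re.fullmatch(r'[^\W\d_]+', text) is not None


print(bool(LETTER.match('x')))        # ASCII letter
print(bool(LETTER.match('\u00e9')))   # é, written as an escape
print(bool(LETTER.match('5')))        # digit, rejected by \d
print(bool(LETTER.match('_')))        # underscore, rejected explicitly
print(only_letters('caf\u00e9'))      # letters only
print(only_letters('caf\u00e9_1'))    # contains an underscore and a digit
```

re.fullmatch is used in the helper so that the whole string has to consist of letters, rather than just its first character.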

edited Jun 11 '11 at 7:56

answered Jun 11 '11 at 7:09

Tim Pietzcker

Clever, but it doesn't seem to work. See update. I copied that e off of en.wikipedia.org/wiki/List_of_Unicode_characters, it doesn't seem to recognize it.
– mpen
Jun 11 '11 at 7:44

It works perfectly, but 'é' is not a Unicode object, it's a string of bytes.
– Rosh Oxymoron
Jun 11 '11 at 7:48

Thanks guys! Darn unicode :) Causes nothing but problems.
– mpen
Jun 11 '11 at 17:10

@rosh try u'é'
– Seán Hayes
Mar 9 '17 at 19:55

^[a-zœéèâêçàñ ]+$
– Natim
Mar 30 at 14:08
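
The bytes-versus-text point from the comments can be reproduced in Python 3 as well. This is a rough sketch, assuming UTF-8 encoded input. The variable names are illustrative only. A bytes pattern only knows ASCII letters, so the encoded 'é' is not matched until it has been decoded to str.

```python
import re

text = '\u00e9'               # é as text (str)
raw = text.encode('utf-8')    # the same character as two UTF-8 bytes

str_letter = re.compile(r'[^\W\d_]')      # str pattern: Unicode-aware \w and \d
bytes_letter = re.compile(rb'[^\W\d_]')   # bytes pattern: ASCII-only \w and \d

str_match = str_letter.match(text)                      # matches the letter
bytes_match = bytes_letter.match(raw)                   # no match on the raw bytes
decoded_match = str_letter.match(raw.decode('utf-8'))   # decode first, then match

print(str_match is not None)
print(bytes_match is not None)
print(decoded_match is not None)
```

As in the Python 2 case discussed above, nothing raises an error when the bytes fail to match. The result is simply None, which is why the problem is easy to overlook.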
