Python re.findall behaves weird
The source string is:
# Python 3.4.3
import re
s = r'abc123d, hello 3.1415926, this is my book'
and here is my pattern:
pattern = r'-?[0-9]+(\\.[0-9]*)?|-?\\.[0-9]+'
however, re.search gives me the correct result:
m = re.search(pattern, s)
print(m) # output: <_sre.SRE_Match object; span=(3, 6), match='123'>
re.findall just dumps out a list of empty strings:
L = re.findall(pattern, s)
print(L) # output: ['', '', '']
Why can't re.findall give me the expected list?
['123', '3.1415926']
python regex
edited Aug 10 '15 at 16:01 by Alan Moore
asked Aug 10 '15 at 8:33 by O'Skywalker
turn capturing group to non-capturing group.
– Avinash Raj
Aug 10 '15 at 8:36
@AvinashRaj, um.., if I remove that capturing group, even re.search gives me a None result
– O'Skywalker
Aug 10 '15 at 8:37
@stribizhev, it's not, '3.1415926' should be a float number in the result
– O'Skywalker
Aug 10 '15 at 8:38
@O'Skywalker Try to use a pattern like -?\d?\.?\d+
– Dmitry.Samborskyi
Aug 10 '15 at 8:39
2 Answers
import re
s = r'abc123d, hello 3.1415926, this is my book'
print(re.findall(r'-?[0-9]+(?:\.[0-9]*)?|-?\.[0-9]+', s))
You don't need to escape the backslash twice when you are using a raw string.
Output: ['123', '3.1415926']
Also, the return type will be a list of strings. If you want the results as integers and floats, use map:
import re, ast
s = r'abc123d, hello 3.1415926, this is my book'
print(list(map(ast.literal_eval, re.findall(r'-?[0-9]+(?:\.[0-9]*)?|-?\.[0-9]+', s))))
Output: [123, 3.1415926]
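A minimal sketch of the same conversion without ast, assuming every token comes from the number pattern above. The helper name to_number is hypothetical and only for illustration: it returns an int when the token has no decimal point and a float otherwise.

```python
import re

def to_number(token):
    # Hypothetical helper: float if the token contains a decimal point, else int.
    return float(token) if '.' in token else int(token)

s = r'abc123d, hello 3.1415926, this is my book'
tokens = re.findall(r'-?[0-9]+(?:\.[0-9]*)?|-?\.[0-9]+', s)
numbers = [to_number(t) for t in tokens]
print(numbers)  # [123, 3.1415926]
```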
answered Aug 10 '15 at 8:41 by vks
Although this regex is less efficient than mine, I admit the trick with ast is cool (although not required in the OP).
– Wiktor Stribiżew
Aug 10 '15 at 8:51
@stribizhev I read one of his comments ("@stribizhev, it's not, '3.1415926' should be a float number in the result"), so I included that in my answer :)
– vks
Aug 10 '15 at 8:53
you two are both geniuses, it's difficult for me to choose which one to accept. :)
– O'Skywalker
Aug 10 '15 at 8:53
@O'Skywalker nothing like genius :P ... just practice ... you will become an aficionado soon!
– vks
Aug 10 '15 at 8:56
You can also reduce the steps using the first character discrimination like this: (?=[-\d.])-?(?:\d+(?:\.\d*)?|\.\d+)
– Casimir et Hippolyte
May 6 '17 at 22:15
There are two things to note here:
- re.findall returns captured texts if the regex pattern contains capturing groups in it
- the r'\\.' part in your pattern matches two consecutive chars, \ and any char other than a newline.
See findall reference:
If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.
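The quoted rule can be seen in a small sketch. The sample text and the three patterns below are illustrative assumptions, not taken from the question: the same search is run with zero, one, and two capturing groups.

```python
import re

text = 'x=1, y=22'  # illustrative sample, not from the question

# No groups: findall returns the whole matches.
no_group = re.findall(r'\w=\d+', text)
print(no_group)    # ['x=1', 'y=22']

# One group: findall returns only what the group captured.
one_group = re.findall(r'\w=(\d+)', text)
print(one_group)   # ['1', '22']

# Two groups: findall returns a list of tuples, one item per group.
two_groups = re.findall(r'(\w)=(\d+)', text)
print(two_groups)  # [('x', '1'), ('y', '22')]
```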
Note that to make re.findall return just match values, you may usually
- remove redundant capturing groups (e.g. (a(b)c) -> abc)
- convert all capturing groups into non-capturing (that is, replace ( with (?:) unless there are backreferences that refer to the group values in the pattern (then see below)
- use re.finditer instead ([x.group() for x in re.finditer(pattern, s)])
In your case, findall returned all captured texts that were empty because you have \\ within the r'' string literal, which tries to match a literal backslash (\).
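A short sketch of this behaviour, assuming the question's pattern contained a doubled backslash (\\.) inside the raw string, as this answer describes. The variable names doubled and fixed are illustrative.

```python
import re

s = r'abc123d, hello 3.1415926, this is my book'

# Assumed form of the question's pattern: in a raw string, '\\.' means
# "a literal backslash followed by any character other than a newline".
doubled = r'-?[0-9]+(\\.[0-9]*)?|-?\\.[0-9]+'

# findall reports the capturing group, which never participates in a match here.
captured = re.findall(doubled, s)
print(captured)  # ['', '', '']

# finditer exposes the whole match and shows the float is split at the dot.
whole = [m.group() for m in re.finditer(doubled, s)]
print(whole)     # ['123', '3', '1415926']

# Single escape plus a non-capturing group gives the expected list.
fixed = r'-?[0-9]+(?:\.[0-9]*)?|-?\.[0-9]+'
numbers = re.findall(fixed, s)
print(numbers)   # ['123', '3.1415926']
```

The middle result suggests that switching to re.finditer alone would not be enough under this assumption; the escape has to be corrected as well.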
To match the numbers, you need to use
-?\d*\.?\d+
The regex matches:
-? - Optional minus sign
\d* - Optional digits
\.? - Optional decimal separator
\d+ - 1 or more digits.
Here is IDEONE demo:
import re
s = r'abc123d, hello 3.1415926, this is my book'
pattern = r'-?\d*\.?\d+'
L = re.findall(pattern, s)
print(L)  # output: ['123', '3.1415926']
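As a rough comparison, the sketch below runs this pattern and the alternation-based pattern from the other answer on a few edge cases. The sample string and variable names are assumptions chosen for illustration: a negative number, a number with no leading digit, and a number with a trailing dot.

```python
import re

sample = 'a -5 b .75 c 10. d'  # illustrative input, not from the question

simple = r'-?\d*\.?\d+'
alternation = r'-?[0-9]+(?:\.[0-9]*)?|-?\.[0-9]+'

simple_result = re.findall(simple, sample)
alternation_result = re.findall(alternation, sample)

print(simple_result)       # ['-5', '.75', '10']
print(alternation_result)  # ['-5', '.75', '10.']
```

On this input the two patterns differ only in whether a trailing decimal point is kept as part of the number.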
edited Apr 12 '18 at 9:52
answered Aug 10 '15 at 8:40 by Wiktor Stribiżew