Python re.findall behaves weird












The source string is:

# Python 3.4.3
import re
s = r'abc123d, hello 3.1415926, this is my book'

and here is my pattern:

pattern = r'-?[0-9]+(\\.[0-9]*)?|-?\\.[0-9]+'

However, re.search gives me the correct result:

m = re.search(pattern, s)
print(m) # output: <_sre.SRE_Match object; span=(3, 6), match='123'>

but re.findall just dumps out a list of empty strings:

L = re.findall(pattern, s)
print(L) # output: ['', '', '']

Why can't re.findall give me the expected list?

['123', '3.1415926']









  • Turn the capturing group into a non-capturing group.

    – Avinash Raj
    Aug 10 '15 at 8:36

  • @AvinashRaj, um.., if I remove that capturing group, even re.search gives me a None result

    – O'Skywalker
    Aug 10 '15 at 8:37

  • @stribizhev, it's not, '3.1415926' should be a float number in the result

    – O'Skywalker
    Aug 10 '15 at 8:38

  • @O'Skywalker Try to use a pattern like -?\d?\.?\d+

    – Dmitry.Samborskyi
    Aug 10 '15 at 8:39


















Tags: python, regex






edited Aug 10 '15 at 16:01 by Alan Moore

asked Aug 10 '15 at 8:33 by O'Skywalker













2 Answers

import re
s = r'abc123d, hello 3.1415926, this is my book'
print(re.findall(r'-?[0-9]+(?:\.[0-9]*)?|-?\.[0-9]+', s))

You don't need to escape the backslash twice when you are using a raw string.

Output: ['123', '3.1415926']

Also, the return type will be a list of strings. If you want integers and floats instead, use map:

import re, ast
s = r'abc123d, hello 3.1415926, this is my book'
print(list(map(ast.literal_eval, re.findall(r'-?[0-9]+(?:\.[0-9]*)?|-?\.[0-9]+', s))))

Output: [123, 3.1415926]
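As a possible alternative to ast.literal_eval, the matched strings could be converted with int and float directly. This is only a sketch: to_number and extract_numbers are hypothetical helper names, not part of the answer, and the pattern assumes the dot is escaped once (\.). One practical difference is that int accepts leading zeros such as '007', which ast.literal_eval rejects.

```python
import re

# Same pattern as in the answer, with the dot escaped once (assumption).
NUMBER_RE = re.compile(r'-?[0-9]+(?:\.[0-9]*)?|-?\.[0-9]+')


def to_number(text):
    """Hypothetical helper: return an int when possible, otherwise a float."""
    try:
        return int(text)
    except ValueError:
        return float(text)


def extract_numbers(s):
    """Hypothetical helper: find all numbers in s and convert them."""
    return [to_number(m) for m in NUMBER_RE.findall(s)]


print(extract_numbers(r'abc123d, hello 3.1415926, this is my book'))
# [123, 3.1415926]
```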






  • Although this regex is less efficient than mine, I admit the trick with ast is cool (although not required in the OP).

    – Wiktor Stribiżew
    Aug 10 '15 at 8:51

  • @stribizhev I read one of his comments ("@stribizhev, it's not, '3.1415926' should be a float number in the result"), so I included that in my answer :)

    – vks
    Aug 10 '15 at 8:53

  • You two are both geniuses, it's difficult for me to choose which one to accept. :)

    – O'Skywalker
    Aug 10 '15 at 8:53

  • @O'Skywalker nothing like genius :P ... just practice ... you will become an aficionado soon!

    – vks
    Aug 10 '15 at 8:56

  • You can also reduce the steps using first-character discrimination like this: (?=[-\d.])-?(?:\d+(?:\.\d*)?|\.\d+)

    – Casimir et Hippolyte
    May 6 '17 at 22:15
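The lookahead variant from the last comment can be compared against the answer's pattern on a few inputs. This is a rough sketch, assuming both patterns are written with their backslashes intact; the sample strings other than the question's are made up for illustration, and whether the lookahead is actually faster would need measuring.

```python
import re

# Pattern from the answer (dot escaped once) and the lookahead variant
# from the comment; both spellings are assumptions about the intended regex.
answer_pattern = re.compile(r'-?[0-9]+(?:\.[0-9]*)?|-?\.[0-9]+')
lookahead_pattern = re.compile(r'(?=[-\d.])-?(?:\d+(?:\.\d*)?|\.\d+)')

samples = [
    r'abc123d, hello 3.1415926, this is my book',  # string from the question
    'x -.5 y -42 z 5.',                            # illustrative extra input
    'a-b 7',                                       # illustrative extra input
]

for text in samples:
    # On these ASCII inputs both patterns return the same list of matches.
    print(answer_pattern.findall(text), lookahead_pattern.findall(text))
```

The lookahead (?=[-\d.]) only checks that the next character could start a number, so the engine can give up on most positions before trying either alternative.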



















There are two things to note here:

  • re.findall returns the captured texts, not the whole matches, if the regex pattern contains capturing groups

  • the r'\\.' part in your pattern matches two consecutive chars: a literal \ and any char other than a newline.

See the findall reference:

If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.

Note that to make re.findall return just the match values, you may usually

  • remove redundant capturing groups (e.g. (a(b)c) -> abc)

  • convert all capturing groups into non-capturing ones (that is, replace ( with (?:), unless there are backreferences that refer to the group values in the pattern (then see below)

  • use re.finditer instead ([x.group() for x in re.finditer(pattern, s)])
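The difference between these options can be seen in a small sketch. It assumes the question's pattern with the dot escaped once (\.); the variable names are illustrative only.

```python
import re

s = r'abc123d, hello 3.1415926, this is my book'

# With a capturing group, findall returns what the group captured.
# For '123' the optional group did not take part in the match, so it yields ''.
capturing = r'-?[0-9]+(\.[0-9]*)?|-?\.[0-9]+'
captured_only = re.findall(capturing, s)
print(captured_only)   # ['', '.1415926']

# With a non-capturing group, findall returns the whole matches.
non_capturing = r'-?[0-9]+(?:\.[0-9]*)?|-?\.[0-9]+'
whole_matches = re.findall(non_capturing, s)
print(whole_matches)   # ['123', '3.1415926']

# finditer gives match objects, so group() returns the whole match
# even when the pattern keeps its capturing group.
via_finditer = [m.group() for m in re.finditer(capturing, s)]
print(via_finditer)    # ['123', '3.1415926']
```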


In your case, findall returned captured texts that were all empty because you have \\ within the r'' string literal, which tries to match a literal \.

To match the numbers, you need to use

-?\d*\.?\d+

The regex matches:

  • -? - Optional minus sign

  • \d* - Optional digits

  • \.? - Optional decimal separator

  • \d+ - 1 or more digits.

Here is the IDEONE demo:

import re
s = r'abc123d, hello 3.1415926, this is my book'
pattern = r'-?\d*\.?\d+'
L = re.findall(pattern, s)
print(L)  # ['123', '3.1415926']
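The breakdown of the pattern can also be kept next to the regex itself with re.VERBOSE. The following is a sketch rather than part of the original answer: the name NUMBER and the second sample string are assumptions added for illustration.

```python
import re

# Same pattern as in the demo, spread over several lines with comments.
# In verbose mode, whitespace and '#' comments inside the pattern are ignored.
NUMBER = re.compile(r"""
    -?      # optional minus sign
    \d*     # optional digits before the decimal separator
    \.?     # optional decimal separator
    \d+     # one or more digits
""", re.VERBOSE)

print(NUMBER.findall(r'abc123d, hello 3.1415926, this is my book'))
# ['123', '3.1415926']

# Illustrative extra input: a trailing dot is not part of the match,
# because the pattern must end in at least one digit.
print(NUMBER.findall('x -.5 y -42 z 5.'))
# ['-.5', '-42', '5']
```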





answered Aug 10 '15 at 8:41 by vks (the answer using ast.literal_eval)








answered Aug 10 '15 at 8:40 by Wiktor Stribiżew, edited Apr 12 '18 at 9:52 (the answer explaining the findall behavior)





























