String split using regex with pattern present in text



























I have many strings that I need to split by commas. Example:



myString = r'test,Test,NEAR(this,that,DISTANCE=4),test again,"another test"'
myString = r'test,Test,FOLLOWEDBY(this,that,DISTANCE=4),test again,"another test"'


My desired output would be:



["test", "Test", "NEAR(this,that,DISTANCE=4)", "test again", '"another test"']  # list length = 5


I can't figure out how to keep the commas between "this,that,DISTANCE" in one item. I tried this:



import re
l = re.compile(r',').split(myString)  # matches all commas
l = re.compile(r'(?<!(),(?=))').split(myString)  # (negative lookbehind/lookahead) - no matches at all


Any ideas? Let's say the list of allowed "functions" is defined as:



f = ["NEAR","FOLLOWEDBY","AND","OR","MAX"]











































python regex token






asked Nov 22 '18 at 14:17









michal111






Possible duplicate of How to split by commas that are not within parentheses?

– Austin
Nov 22 '18 at 14:20

















2 Answers
































You may use



(?:\([^()]*\)|[^,])+


See the regex demo.



The (?:\([^()]*\)|[^,])+ pattern matches one or more occurrences of either a parenthesized substring with no ( or ) inside it, or any single character other than a comma.
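To make that explanation concrete, here is the same pattern written with re.VERBOSE so each alternative carries its own comment. This is a sketch of an equivalent formatting, not the answerer's code, and the sample string 'a,F(b,c),d' is made up. Note that the group is non-capturing ((?:...)); if it captured, re.findall would return the group's contents instead of the whole match.

```python
import re

# The one-line pattern from the answer, spelled out with re.VERBOSE
# (whitespace and '#' comments inside the pattern are ignored).
rx = re.compile(r"""
    (?:                 # non-capturing group, repeated one or more times:
        \( [^()]* \)    #   a (...) chunk containing no nested ( or )
      |                 #   or
        [^,]            #   any single character other than a comma
    )+
""", re.VERBOSE)

print(rx.findall('a,F(b,c),d'))  # ['a', 'F(b,c)', 'd']
```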



See the Python demo:



import re
rx = r"(?:\([^()]*\)|[^,])+"  # a (...) chunk without nested parens, or any non-comma char, repeated
s = 'test,Test,NEAR(this,that,DISTANCE=4),test again,"another test"'
print(re.findall(rx, s))
# => ['test', 'Test', 'NEAR(this,that,DISTANCE=4)', 'test again', '"another test"']
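One caveat: because [^()]* forbids parentheses inside the parenthesized chunk, the pattern handles only one level of nesting, so a nested call such as NEAR(a,MAX(b,c)) (a made-up input) is split at the inner comma. If nesting can occur, a plain character scan that counts parenthesis depth is one alternative. split_top_level below is a hypothetical helper, not part of either answer; it treats quotes as ordinary characters and, unlike re.findall, keeps empty items between consecutive commas.

```python
import re

def split_top_level(s, sep=','):
    """Split s on sep, skipping separators nested inside parentheses.

    Hypothetical helper: tracks parenthesis depth with a counter instead of
    using a regex, so nesting of any depth is handled. Quotes are not special.
    """
    parts, current, depth = [], [], 0
    for ch in s:
        if ch == '(':
            depth += 1
        elif ch == ')' and depth > 0:
            depth -= 1  # ignore an unbalanced ')' rather than going negative
        if ch == sep and depth == 0:
            # Top-level separator: finish the current item.
            parts.append(''.join(current))
            current = []
        else:
            current.append(ch)
    parts.append(''.join(current))  # the item after the last separator
    return parts


rx = r"(?:\([^()]*\)|[^,])+"
nested = 'x,NEAR(a,MAX(b,c)),y'
print(re.findall(rx, nested))   # ['x', 'NEAR(a', 'MAX(b,c))', 'y']
print(split_top_level(nested))  # ['x', 'NEAR(a,MAX(b,c))', 'y']
```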





answered Nov 22 '18 at 14:20

Wiktor Stribiżew

If you explicitly want to specify which strings count as functions, you need to build the regex dynamically. Otherwise, go with Wiktor's solution.

>>> import re
>>> functions = ["NEAR","FOLLOWEDBY","AND","OR","MAX"]
>>> funcs = '|'.join(r'{}\([^)]+\)'.format(f) for f in functions)
>>> regex = '({})|,'.format(funcs)
>>> # re.split also returns None (group not matched) and '' pieces; filter(None, ...) drops them
>>> myString1 = 'test,Test,NEAR(this,that,DISTANCE=4),test again,"another test"'
>>> list(filter(None, re.split(regex, myString1)))
['test', 'Test', 'NEAR(this,that,DISTANCE=4)', 'test again', '"another test"']
>>> myString2 = 'test,Test,FOLLOWEDBY(this,that,DISTANCE=4),test again,"another test"'
>>> list(filter(None, re.split(regex, myString2)))
['test', 'Test', 'FOLLOWEDBY(this,that,DISTANCE=4)', 'test again', '"another test"']
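A small refinement, offered as a sketch: wrapping the lines above in a function and passing each name through re.escape keeps the dynamic regex valid even if a function name ever contains regex metacharacters. split_outside_functions is a hypothetical name, and the input 'a,NEAR(b,c),FOO(d,e)' is made up. It also shows what the whitelist buys you: FOO is not in the list, so its arguments are split like ordinary items.

```python
import re

def split_outside_functions(s, functions):
    """Split s on commas, except inside NAME(...) calls for NAME in functions.

    Hypothetical helper based on the dynamic regex above; re.escape guards
    against names containing regex metacharacters. As in the original,
    [^)]+ supports only one level of parentheses.
    """
    funcs = '|'.join(r'{}\([^)]+\)'.format(re.escape(f)) for f in functions)
    regex = '({})|,'.format(funcs)
    # re.split inserts the captured call (or None when a plain comma matched)
    # and can leave '' pieces; filter(None, ...) removes both, which also
    # drops genuinely empty fields such as the middle of 'a,,b'.
    return list(filter(None, re.split(regex, s)))


functions = ["NEAR", "FOLLOWEDBY", "AND", "OR", "MAX"]
print(split_outside_functions('a,NEAR(b,c),FOO(d,e)', functions))
# ['a', 'NEAR(b,c)', 'FOO(d', 'e)']
```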





answered Nov 22 '18 at 14:26

timgeb
