Regex replace text but exclude when text is between specific tag



























I have the following string:



Lorem ipsum Test dolor sit amet, consetetur sadipscing elitr, sed diam nonumy <a href="http://Test.com/url">Test</a> eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd sed Test dolores et ea rebum. Stet clita kasd gubergren, no sea <a href="http://url.com">Test xyz</a> takimata sanctus est Lorem ipsum dolor sit amet.


Now I would like to replace the string 'Test' only outside of tags and not between an opening and a closing tag (e.g. replace it with '1234').



Lorem ipsum 1234 dolor sit amet, consetetur sadipscing elitr, sed diam nonumy <a href="http://Test.com/url">Test</a> eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd sed 1234 dolores et ea rebum. Stet clita kasd gubergren, no sea <a href="http://url.com">Test xyz</a> takimata sanctus est Lorem ipsum dolor sit amet.


I started with this regex: (?!<a[^>]*>)(Test)([^<])(?!</a>)



But two problems are not solved:




  1. The text 'Test' also gets replaced inside tags (e.g. in the href of <a href="http://Test.com/url">).

  2. If the text between the tags does not exactly match the searched text, it is also replaced (e.g. <a href="http://url">Test xyz</a>).


I hope someone has a solution to solve this problem.

















      regex
















      asked Sep 19 '12 at 10:44









      Weri
























          4 Answers














          Resurrecting this ancient question because it had a simple solution that wasn't mentioned.



          With all the disclaimers about using regex to parse html, here is a simple way to do it.



          Method for Perl / PCRE



          <a[^>]*>[^<]*</a(*SKIP)(*F)|Test





          General Solution



          <a[^>]*>[^<]*</a|(Test)


          In this version, the text to be replaced is captured in Group 1 and the replacement is performed by a simple callback or lambda.
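
As a rough sketch of that callback approach, here is how it might look in Python. The use of Python's re module and the name replace_with_callback are assumptions for illustration; they are not part of the original answer.

```python
import re

# First alternative consumes a whole <a ...>text</a link without capturing;
# the second alternative captures the word to replace in group 1.
PATTERN = re.compile(r'<a[^>]*>[^<]*</a|(Test)')


def replace_with_callback(text, replacement="1234"):
    # When group 1 did not participate, the match was a link: return it
    # unchanged. Otherwise the match is a free-standing "Test".
    return PATTERN.sub(
        lambda m: replacement if m.group(1) else m.group(0), text
    )


sample = 'Lorem Test dolor <a href="http://Test.com/url">Test</a> sed Test rebum'
print(replace_with_callback(sample))
# Lorem 1234 dolor <a href="http://Test.com/url">Test</a> sed 1234 rebum
```

The callback is what makes this portable: engines without (*SKIP)(*F) can still discard the link alternative by returning the full match.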






          Reference




          1. How to match pattern except in situations s1, s2, s3

          2. For code implementation see the code samples in How to match a pattern unless...
































          • The most important part for me was to know $replaced = preg_replace_callback( $regex, function($m) { if(empty($m[1])) return $m[0]; else return "Superman";}, $subject);. So I need to return m[0] if m[1] is empty. Really nice to know. Thank you!

            – mgutt
            Apr 4 '15 at 14:03


























          (?!<a[^>]*?>)(Test)(?![^<]*?</a>)


          Same as zb226's answer, but optimized with a lazy match.



          Also, using regexes on raw HTML is not recommended.
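
For comparison, a parser can restrict the replacement to text outside links. The following is only a minimal sketch using Python's standard html.parser module; it swaps the regex for a parser, the names LinkAwareReplacer and replace_via_parser are hypothetical, and malformed markup or unusual entities may not round-trip exactly.

```python
from html.parser import HTMLParser


class LinkAwareReplacer(HTMLParser):
    """Rebuilds the input, replacing `old` only in text outside <a> elements."""

    def __init__(self, old, new):
        # convert_charrefs=False reports entities such as &amp; separately,
        # so they can be written back unchanged.
        super().__init__(convert_charrefs=False)
        self.old, self.new = old, new
        self.depth = 0  # greater than 0 while inside an <a> element
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.depth += 1
        self.out.append(self.get_starttag_text())  # raw tag, attributes untouched

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "a" and self.depth:
            self.depth -= 1
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data if self.depth else data.replace(self.old, self.new))

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")


def replace_via_parser(html, old="Test", new="1234"):
    parser = LinkAwareReplacer(old, new)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(replace_via_parser(
    'Lorem Test dolor <a href="http://Test.com/url">Test <strong>dolor</strong></a> eirmod Test'
))
```

Because attribute values never reach handle_data, the Test inside the href is left alone even when the link contains nested tags.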

















          answered Sep 19 '12 at 11:48









          protist













          • I also added \b to match a word boundary: (?!<a[^>]*?>)(\bTest\b)(?![^<]*?</a>)

            – Weri
            Sep 19 '12 at 12:34











          • That should give the regex optimizer more to work with. It also should not adversely affect your matches, as long as _Test_, _Test, or Test_ are not in your document (and assuming you would not care to match them if they were).

            – protist
            Sep 19 '12 at 13:10













          • The lookahead before Test and the lazy match are meaningless. See my answer.

            – Adam
            Oct 25 '17 at 16:38


































          Answer



          Use



          (Test)(?!(.(?!<a))*</a>)


          Explanation



          Let me remind you of the meaning of some symbols:



          1) ?! is a negative lookahead. For example, r(?!d) matches every r that is not directly followed by a d.



          2) Therefore, do not start the pattern with a negative lookahead that has no character in front of it. On its own, (?!d) matches only empty positions that are not followed by a d, so it is meaningless for selecting text.



          3) A ? after a quantifier makes the match lazy. For example, in the string 123EEE the pattern .+E matches the whole string 123EEE. However, .+?E matches as few characters as needed, so it matches only 123E.
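
The three points above can be tried out quickly. This is just an illustrative sketch in Python (the choice of Python's re module is an assumption; other engines with lookahead support behave the same way for these patterns):

```python
import re

# 1) Negative lookahead: only the r that is not followed by d is replaced.
print(re.sub(r'r(?!d)', 'R', 'card carpet'))   # card caRpet

# 2) A bare lookahead matches empty positions, not characters.
print(re.sub(r'(?!d)', '|', 'ad'))             # |ad|

# 3) Greedy versus lazy quantifier.
print(re.match(r'.+E', '123EEE').group())      # 123EEE
print(re.match(r'.+?E', '123EEE').group())     # 123E
```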



          Answer:



          protist's answer is that you should use (?!<a[^>]*?>)(Test)(?![^<]*?</a>). Let me first explain how to make this shorter.



          As mentioned in 2), it is meaningless to put this lookahead before the match: wherever Test matches, the text at that position cannot also start with <a. So the following is equivalent to protist's answer:



          (Test)(?![^<]*?</a>)


          Also, since [^<]* cannot match a <, the lazy quantifier ? is superfluous, so it is also equivalent to



          (Test)(?![^<]*</a>)


          This matches every Test that is not followed by </a> without a < in between. This is why a Test that appears before or after any <a ...> .. </a> will be replaced.
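
To see the shortened regex at work, here is a small sketch in Python on a shortened version of the string from the question (Python's re module and the variable names are assumptions for illustration):

```python
import re

sample = ('Lorem ipsum Test dolor <a href="http://Test.com/url">Test</a> '
          'kasd sed Test dolores <a href="http://url.com">Test xyz</a> takimata')

# Match Test unless only non-< characters separate it from a closing </a>.
simple = re.compile(r'(Test)(?![^<]*</a>)')

print(simple.sub('1234', sample))
# Lorem ipsum 1234 dolor <a href="http://Test.com/url">Test</a> kasd sed 1234 dolores <a href="http://url.com">Test xyz</a> takimata
```

Note that the Test inside the href survives only because nothing but plain text sits between it and the closing </a>.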



          However, note that



          Lorem Test dolor <a href="http://Test.com/url">Test <strong>dolor</strong></a> eirmod


          would be changed to



          Lorem 1234 dolor <a href="http://1234.com/url">1234 <strong>dolor</strong></a> eirmod 


          In order to catch that you could change your regex to



          (Test)(?!(.(?!<a))*</a>)


          which does the following:




          Select every word Test that is not followed by a string ***</a> where each character in *** is not followed by <a.




          Note that the dot . is important (see 2)).



          Note that a lazy match like (Test)(?!(.(?!<a))*?</a>) is not relevant because nested links are illegal in HTML4 and HTML5 (something like <a href="#">..<a href="#">...</a>..</a>).
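
The difference between the two regexes shows up on a link that contains a nested tag. A short sketch in Python (the variable names are hypothetical; also note that . does not match newlines in Python unless re.DOTALL is set, so this assumes the link sits on a single line):

```python
import re

nested = ('Lorem Test dolor <a href="http://Test.com/url">'
          'Test <strong>dolor</strong></a> eirmod')

simple = re.compile(r'(Test)(?![^<]*</a>)')
tempered = re.compile(r'(Test)(?!(.(?!<a))*</a>)')

# The simple version is fooled by <strong> and rewrites the link.
print(simple.sub('1234', nested))
# Lorem 1234 dolor <a href="http://1234.com/url">1234 <strong>dolor</strong></a> eirmod

# The second version looks past other tags until the next <a or </a>.
print(tempered.sub('1234', nested))
# Lorem 1234 dolor <a href="http://Test.com/url">Test <strong>dolor</strong></a> eirmod
```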



          protist said




          Also, using regexes on raw HTML is not recommended.




          I agree with that. One problem is that the regex gives wrong results if a tag is not properly opened or closed. For example, the last regex above would change

          Lorem Test dolor Test <strong>dolor</strong></a> eirmod Test dolores sea Test takimata

          to

          Lorem Test dolor Test <strong>dolor</strong></a> eirmod 1234 dolores sea 1234 takimata














          edited Nov 14 '18 at 9:12

























          answered Oct 25 '17 at 16:38









          Adam













          • Great answer, worked perfect for me

            – Justin E. Samuels
            Oct 16 '18 at 22:38


































          This should do the trick:



          (<a[^>]*>)(Test)(?![^<]*</a>)


          Try it yourself on regexr.







          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Apr 17 '18 at 23:03

























          answered Sep 19 '12 at 11:24









          zb226



















          2














          Resurrecting this ancient question because it had a simple solution that wasn't mentioned.



          With all the usual disclaimers about using regex to parse HTML, here is a simple way to do it.



          Method for Perl / PCRE



          <a[^>]*>[^<]*</a(*SKIP)(*F)|Test


          demo



          General Solution



          <a[^>]*>[^<]*</a|(Test)


          In this version, the text to be replaced is captured in Group 1 and the replacement is performed by a simple callback or lambda.
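
          A minimal Python sketch of that callback approach follows. The sample string, the function name replace_outside_anchor and the replacement '1234' are assumptions for illustration. Like the pattern itself, it assumes the anchor text contains no nested tags.

```python
import re

# Hypothetical sample input, shortened from the question.
text = (
    'Lorem ipsum Test dolor <a href="http://Test.com/url">Test</a> '
    'sed Test dolores <a href="http://url.com">Test xyz</a> takimata'
)

# Left alternative: consume a whole <a ...>...</a element so nothing
# inside it can be matched again. Right alternative: capture a bare
# "Test" in group 1.
pattern = re.compile(r'<a[^>]*>[^<]*</a|(Test)')

def replace_outside_anchor(match):
    # Group 1 is None when the anchor alternative matched;
    # in that case put the matched text back unchanged.
    if match.group(1) is None:
        return match.group(0)
    return '1234'

result = pattern.sub(replace_outside_anchor, text)
print(result)
```

          The callback is what makes this portable: the engine only needs alternation and one capture group, so the same idea carries over to flavors without (*SKIP)(*F).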



          demo



          Reference




          1. How to match pattern except in situations s1, s2, s3

          2. For code implementation see the code samples in How to match a pattern unless...
































          • The most important part for me was to know $replaced = preg_replace_callback( $regex, function($m) { if(empty($m[1])) return $m[0]; else return "Superman";}, $subject);. So I need to return m[0] if m[1] is empty. Really nice to know. Thank you!

            – mgutt
            Apr 4 '15 at 14:03


























          edited May 23 '17 at 12:25









          Community











          answered May 15 '14 at 0:06









          zx81
































