Regex replace text but exclude when text is between specific tag
I have the following string:
Lorem ipsum Test dolor sit amet, consetetur sadipscing elitr, sed diam nonumy <a href="http://Test.com/url">Test</a> eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd sed Test dolores et ea rebum. Stet clita kasd gubergren, no sea <a href="http://url.com">Test xyz</a> takimata sanctus est Lorem ipsum dolor sit amet.
Now I would like to replace the string 'Test' (e.g. with '1234'), but only where it appears outside of tags and not between an opening and a closing tag. The result should be:
Lorem ipsum 1234 dolor sit amet, consetetur sadipscing elitr, sed diam nonumy <a href="http://Test.com/url">Test</a> eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd sed 1234 dolores et ea rebum. Stet clita kasd gubergren, no sea <a href="http://url.com">Test xyz</a> takimata sanctus est Lorem ipsum dolor sit amet.
I started with this regex: (?!<a[^>]*>)(Test)([^<])(?!</a>)
But two problems remain unsolved:
- The text 'Test' is also replaced inside tags (e.g. inside the href attribute in the string above).
- If the text between the tags does not exactly match the search text, it is also replaced (e.g. <a href="http://url">Test xyz</a>).
I hope someone has a solution to this problem.
regex
asked Sep 19 '12 at 10:44 by Weri
4 Answers
(?!<a[^>]*?>)(Test)(?![^<]*?</a>)
This is the same as zb226's answer, but optimized with a lazy match.
Also, using regexes on raw HTML is not recommended.
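As a quick check, the pattern above can be run against a shortened version of the string from the question. This is only a sketch in Python (the question does not name a language); the variable names and the shortened sample are assumptions made for illustration.

```python
import re

# Shortened sample based on the string from the question.
text = (
    'Lorem ipsum Test dolor <a href="http://Test.com/url">Test</a> eirmod '
    'sed Test dolores <a href="http://url.com">Test xyz</a> takimata.'
)

# The pattern from this answer, unchanged.
pattern = re.compile(r'(?!<a[^>]*?>)(Test)(?![^<]*?</a>)')

# Replace every match with the replacement text from the question.
result = pattern.sub('1234', text)
print(result)
```

For this input, only the two occurrences outside the links are replaced, because every other Test is followed by </a> with no < in between.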
answered Sep 19 '12 at 11:48 by protist
I also added \b to match a word boundary: (?!<a[^>]*?>)(\bTest\b)(?![^<]*?</a>)
– Weri
Sep 19 '12 at 12:34
That should give the regex optimizer more to work with. It also should not adversely affect your matches, as long as _Test_, _Test, or Test_ are not in your document (and assuming you would not care to match them if they were).
– protist
Sep 19 '12 at 13:10
The lookahead before Test and the lazy match are meaningless. See my answer.
– Adam
Oct 25 '17 at 16:38
Answer
Use
(Test)(?!(.(?!<a))*</a>)
Explanation
Let me remind you of the meaning of some symbols:
1) ?! is a negative lookahead. For example, r(?!d) selects every r that is not directly followed by a d.
2) A negative lookahead therefore only makes sense after something it can restrict. On its own, (?!d) merely asserts that the next character is not d; placed directly in front of a literal that can never start with d, it has no effect.
3) The ? can be used as a lazy match. For example, given
123EEE
the pattern .+E would select the whole string 123EEE. However, .+?E selects as few "any character" matches (.+) as needed, so it would only select 123E.
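The three points above can be tried out directly. The following is a minimal sketch in Python (the language is an assumption, and the sample strings for points 1 and 2 are made up for illustration):

```python
import re

# 1) r(?!d): every "r" that is not directly followed by "d".
#    In "rd ra rr" the r at index 0 is skipped; the others match.
starts = [m.start() for m in re.finditer(r'r(?!d)', 'rd ra rr')]

# 2) A lookahead directly in front of the literal changes nothing here:
#    text that starts with "Test" can never start with "<a".
sample = 'Test <a href="#">Test</a>'
with_lookahead = [m.start() for m in re.finditer(r'(?!<a[^>]*>)Test', sample)]
without_lookahead = [m.start() for m in re.finditer(r'Test', sample)]

# 3) Greedy vs. lazy quantifier on the string from the text.
greedy = re.match(r'.+E', '123EEE').group()  # takes as much as possible
lazy = re.match(r'.+?E', '123EEE').group()   # takes as little as possible

print(starts, with_lookahead == without_lookahead, greedy, lazy)
```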
Answer:
protist's answer is that you should use (?!<a[^>]*?>)(Test)(?![^<]*?</a>). Let me first explain how to make this shorter.
As mentioned in 2), it is meaningless to put this lookahead before the match, because text that starts with Test can never start with <a. So the following is equivalent to protist's answer:
(Test)(?![^<]*?</a>)
Also, since [^<] cannot run past the < of </a>, the lazy ? is superfluous, so it is also equivalent to
(Test)(?![^<]*</a>)
This selects every Test that is not followed by </a> without a < in between. This is why a Test that appears before or after any <a ...> .. </a> will be replaced.
However, note that
Lorem Test dolor <a href="http://Test.com/url">Test <strong>dolor</strong></a> eirmod
would be changed to
Lorem 1234 dolor <a href="http://1234.com/url">1234 <strong>dolor</strong></a> eirmod 
In order to catch that you could change your regex to
(Test)(?!(.(?!<a))*</a>)
which does the following:
Select every word Test that is not followed by a string ***</a> where each character in *** is not followed by <a.
Note that the dot . is important (see 2)).
Note that a lazy match like (Test)(?!(.(?!<a))*?</a>) is not relevant, because nested links (something like <a href="#">..<a href="#">...</a>..</a>) are illegal in HTML4 and HTML5.
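To see the difference between the two patterns, both can be applied to the nested example from above. This is a sketch in Python (an assumption, since the question names no language); the variable names are made up for this example.

```python
import re

html = ('Lorem Test dolor <a href="http://Test.com/url">'
        'Test <strong>dolor</strong></a> eirmod')

# Shorter pattern: the lookahead only scans up to the next "<".
short = re.compile(r'(Test)(?![^<]*</a>)')
# Extended pattern: the lookahead scans across other tags,
# but not past the start of another "<a".
nested = re.compile(r'(Test)(?!(.(?!<a))*</a>)')

short_result = short.sub('1234', html)
nested_result = nested.sub('1234', html)

print(short_result)
print(nested_result)
```

With the shorter pattern, the <strong> tag hides the closing </a> from the lookahead, so the occurrences inside the link are replaced as well. The extended pattern leaves them alone.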
protist said
Also, using regexes on raw HTML is not recommended.
I agree with that. One problem is that a regex misbehaves if a tag is not properly opened or closed. For example, the regex above would change
Lorem Test dolor Test <strong>dolor</strong></a> eirmod Test dolores sea Test takimata
to
Lorem Test dolor Test <strong>dolor</strong></a> eirmod 1234 dolores sea 1234 takimata
answered Oct 25 '17 at 16:38 by Adam (edited Nov 14 '18 at 9:12)
Great answer, worked perfect for me
– Justin E. Samuels
Oct 16 '18 at 22:38
This should do the trick:
(?!<a[^>]*>)(Test)(?![^<]*</a>)
Try it yourself on regexr.
answered Sep 19 '12 at 11:24 by zb226 (edited Apr 17 '18 at 23:03)
It is meaningless to put a lookahead before the match
– Adam
Apr 17 '18 at 22:43
@Adam That's of course correct, thanks for the heads up :)
– zb226
Apr 17 '18 at 23:04
Resurrecting this ancient question because it had a simple solution that wasn't mentioned.
With all the disclaimers about using regex to parse HTML, here is a simple way to do it.
Method for Perl / PCRE
<a[^>]*>[^<]*</a(*SKIP)(*F)|Test
demo
General Solution
<a[^>]*>[^<]*</a|(Test)
In this version, the text to be replaced is captured in Group 1 and the replacement is performed by a simple callback or lambda.
demo
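The callback approach can be sketched as follows in Python (an assumption; the answer itself does not fix a language). The helper name replace_outside_links and the sample string are made up for this example.

```python
import re

# Left alternative: a whole link, which is matched but left untouched.
# Right alternative: the text we want, captured in group 1.
PATTERN = re.compile(r'<a[^>]*>[^<]*</a|(Test)')

def replace_outside_links(text: str, replacement: str) -> str:
    """Replace 'Test' only where group 1 took part in the match."""
    def callback(m: re.Match) -> str:
        # Group 1 is None when the link alternative matched: keep it as is.
        return replacement if m.group(1) else m.group(0)
    return PATTERN.sub(callback, text)

sample = ('Lorem ipsum Test dolor <a href="http://Test.com/url">Test</a> '
          'eirmod <a href="http://url.com">Test xyz</a> sed Test.')
replaced = replace_outside_links(sample, '1234')
print(replaced)
```

The link alternative consumes each link in one piece, so the engine never gets a chance to try the Test alternative inside it.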
Reference
- How to match pattern except in situations s1, s2, s3
- For code implementation see the code samples in How to match a pattern unless...
answered May 15 '14 at 0:06 by zx81 (edited May 23 '17 at 12:25 by Community)
The most important part for me was to know $replaced = preg_replace_callback($regex, function($m) { if (empty($m[1])) return $m[0]; else return "Superman"; }, $subject);. So I need to return $m[0] if $m[1] is empty. Really nice to know. Thank you!
– mgutt
Apr 4 '15 at 14:03