java regex to retrieve link from text
I have an input String such as:
    String text = "Some content which contains link as &lt;A HREF=\"/relative-path/fruit.cgi?param1=abc&amp;param2=xyz\"&gt;URL Label&lt;/A&gt; and some text after it";
I want to convert this text to:
    Some content which contains link as http://www.google.com/relative-path/fruit.cgi?param1=abc&param2=xyz&myParam=pqr (URL Label) and some text after it
So here:
1) I want to replace the link tag with the plain link. If the tag contains a label, it should go in parentheses after the URL.
2) If the URL is relative, I want to prefix the base URL (http://www.google.com).
3) I want to append a parameter to the URL (&myParam=pqr).
I am having trouble retrieving the tag with its URL and label, and replacing it.
I wrote something like:
    public static void main(String[] args) {
        String text = "Some content which contains link as &lt;A HREF=\"/relative-path/fruit.cgi?param1=abc&amp;param2=xyz\"&gt;URL Label&lt;/A&gt; and some text after it";
        // decode the HTML entities; &amp; is handled last
        text = text.replaceAll("&lt;", "<");
        text = text.replaceAll("&gt;", ">");
        text = text.replaceAll("&amp;", "&");

        // this is not working
        Pattern p = Pattern.compile("href=\"(.*?)\"");
        Matcher m = p.matcher(text);
        String url = null;
        if (m.find()) {
            url = m.group(1);
        }
    }
    // helper method to append new query params once I have the url
    public static URI appendQueryParams(String uriToUpdate, String queryParamsToAppend) throws URISyntaxException {
        URI oldUri = new URI(uriToUpdate);
        String newQueryParams = oldUri.getQuery();
        if (newQueryParams == null) {
            newQueryParams = queryParamsToAppend;
        } else {
            newQueryParams += "&" + queryParamsToAppend;
        }
        URI newUri = new URI(oldUri.getScheme(), oldUri.getAuthority(),
                oldUri.getPath(), newQueryParams, oldUri.getFragment());
        return newUri;
    }
Edit1:
    Pattern p = Pattern.compile("HREF=\"(.*?)\"");
This works, but I want it to be case-insensitive: Href, HRef, href, hrEF, etc. should all match.
Also, how do I handle text that contains several URLs?
Edit2:
Some progress.
    Pattern p = Pattern.compile("href=\"(.*?)\"", Pattern.CASE_INSENSITIVE);
    Matcher m = p.matcher(text);
    String url = null;
    while (m.find()) {
        url = m.group(1);
        System.out.println(url);
    }
This handles the case of multiple URLs.
The last pending issue: how do I get hold of the label and replace the link tags in the original text with the URL and label?
Edit3:
By multiple URL cases, I mean that several URLs are present in the given text.
    String text = "Some content which contains link as &lt;A HREF=\"/relative-path/fruit.cgi?param1=abc&amp;param2=xyz\"&gt;URL Label&lt;/A&gt; and some text after it and another link &lt;A HREF=\"/relative-path/vegetables.cgi?param1=abc&amp;param2=xyz\"&gt;URL2 Label&lt;/A&gt; and some more text";
    Pattern p = Pattern.compile("href=\"(.*?)\"", Pattern.CASE_INSENSITIVE);
    Matcher m = p.matcher(text);
    String url = null;
    while (m.find()) {
        url = m.group(1); // this variable should contain the link URL
        url = appendBaseURI(url);
        // appendQueryParams returns a URI, so convert it back to a String
        url = appendQueryParams(url, "license=ABCXYZ").toString();
        System.out.println(url);
    }
java regex string url text
Start by converting the HTML entities with:
    import org.apache.commons.lang.StringEscapeUtils;
    String entities_decode = StringEscapeUtils.unescapeHtml(text);
– Pedro Lobito
Nov 22 '18 at 3:23
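If pulling in a library is not an option, the handful of entities in the sample input can be decoded by hand. The following is only a minimal sketch under that assumption: the class and method names are made up for illustration, and it covers just &lt;, &gt;, &quot; and &amp;, not the full HTML entity table that StringEscapeUtils handles. The one detail worth noting is that &amp; is decoded last, so an input like &amp;lt; ends up as the literal text &lt; rather than being decoded twice.

```java
public class EntityDecodeSketch {

    // Hypothetical helper: decodes only the four entities seen in the sample input.
    // String.replace works on literal text, so no regex escaping is needed.
    // "&amp;" goes last so that "&amp;lt;" becomes "&lt;" and not "<".
    static String decodeBasicEntities(String s) {
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        String text = "link as &lt;A HREF=\"/fruit.cgi?param1=abc&amp;param2=xyz\"&gt;URL Label&lt;/A&gt;";
        // prints: link as <A HREF="/fruit.cgi?param1=abc&param2=xyz">URL Label</A>
        System.out.println(decodeBasicEntities(text));
    }
}
```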
edited Nov 22 '18 at 3:12 by Kartik
asked Nov 22 '18 at 2:39 by Nik
4 Answers
    public static void main(String[] args) throws URISyntaxException {
        String text = "Some content which contains link as &lt;A HREF=\"/relative-path/fruit.cgi?param1=abc&amp;param2=xyz\"&gt;URL Label&lt;/A&gt; and some text after it and another link &lt;A HREF=\"/relative-path/vegetables.cgi?param1=abc&amp;param2=xyz\"&gt;URL2 Label&lt;/A&gt; and some more text";
        text = StringEscapeUtils.unescapeHtml4(text);
        Pattern p = Pattern.compile("<a href=\"(.*?)\">(.*?)</a>", Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(text);
        // the matcher keeps scanning the decoded string it was created with,
        // so reassigning text inside the loop does not disturb it
        while (m.find()) {
            text = text.replace(m.group(0), cleanUrlPart(m.group(1), m.group(2)));
        }
        System.out.println(text);
    }

    private static String cleanUrlPart(String url, String label) throws URISyntaxException {
        // prefix the base URL only when the link is relative
        if (!url.startsWith("http") && !url.startsWith("www")) {
            if (url.startsWith("/")) {
                url = "http://www.google.com" + url;
            } else {
                url = "http://www.google.com/" + url;
            }
        }
        // appendQueryParams is the helper from the question
        url = appendQueryParams(url, "myParam=pqr").toString();
        if (label != null && !label.isEmpty()) url += " (" + label + ")";
        return url;
    }
Output:
    Some content which contains link as http://www.google.com/relative-path/fruit.cgi?param1=abc&param2=xyz&myParam=pqr (URL Label) and some text after it and another link http://www.google.com/relative-path/vegetables.cgi?param1=abc&param2=xyz&myParam=pqr (URL2 Label) and some more text
Oh, I didn't see this and posted my own answer. I am just struggling with the replacing part. I will try to do it with my answer first, otherwise I will try yours. Thanks!
– Nik
Nov 22 '18 at 6:10
You can use Apache Commons Text's StringEscapeUtils to decode the HTML entities and then replaceAll, e.g.:
    import org.apache.commons.text.StringEscapeUtils;

    String text = "Some content which contains link as &lt;A HREF=\"/relative-path/fruit.cgi?param1=abc&amp;param2=xyz\"&gt;URL Label&lt;/A&gt; and some text after it";
    String output = StringEscapeUtils.unescapeHtml4(text).replaceAll("([^<]+).+\"(.*?)\">(.*?)<[^>]+>(.*)", "$1https://google.com$2&your_param ($3)$4");
    System.out.print(output);
    // Some content which contains link as https://google.com/relative-path/fruit.cgi?param1=abc&param2=xyz&your_param (URL Label) and some text after it
This is really sleek and will fit my required solution perfectly if it can handle the multiple-URL scenario. Also, I guess your solution assumes that the URL always has to be prefixed with google.com, which is not the case, as mentioned in point (2) of my question: I will add the base URI only if it is missing. Thanks for the answer though! I will try to expand on it.
– Nik
Nov 22 '18 at 5:00
Make the base URL dynamic as well.
– Pedro Lobito
Nov 22 '18 at 15:07
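On the point about adding the base only when it is missing: one possible way to do that without string checks is java.net.URI.resolve, which returns an absolute href unchanged and resolves a relative one against the base. This is a sketch, not part of any answer above; the class and method names are hypothetical. It assumes the base ends with a slash (a base with an empty path can resolve path-relative hrefs oddly) and that the href is a syntactically valid URI, otherwise URI.create throws IllegalArgumentException. Note also that a scheme-less "www.example.com/x" counts as relative here.

```java
import java.net.URI;

public class BaseUrlSketch {

    // Hypothetical helper: prefixes the base only when href is relative.
    static String withBase(String base, String href) {
        return URI.create(base).resolve(href).toString();
    }

    public static void main(String[] args) {
        String base = "http://www.google.com/";
        // relative link: the base is applied
        // prints: http://www.google.com/relative-path/fruit.cgi?param1=abc&param2=xyz
        System.out.println(withBase(base, "/relative-path/fruit.cgi?param1=abc&param2=xyz"));
        // absolute link: returned as-is
        // prints: https://example.org/a?b=c
        System.out.println(withBase(base, "https://example.org/a?b=c"));
    }
}
```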
// this is not working
Because your regex is case-sensitive.
Try:
    Pattern p = Pattern.compile("href=\"(.*?)\"", Pattern.CASE_INSENSITIVE);
Edit1:
To get the label, use Pattern.compile("(?<=>).*?(?=</a>)", Pattern.CASE_INSENSITIVE) and m.group(0).
Edit2:
To replace the tag (including the label) with your final string, use:
    text.replaceAll("(?i)<a href=\"(.*?)</a>", "new substring here")
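When the replacement only needs to rearrange what was matched, the captured groups can be referenced directly in the replacement string of replaceAll. The sketch below assumes the entities are already decoded, the href value is double-quoted, and href is the only attribute; the class and method names are illustrative. It rewrites every link in one call but does not modify the URL itself, which needs a loop with code between match and replacement.

```java
public class GroupReferenceSketch {

    // $1 is the href value and $2 the label captured by the pattern;
    // (?i) makes the tag and attribute names case-insensitive.
    static String inlineLinks(String html) {
        return html.replaceAll("(?i)<a\\s+href=\"(.*?)\">(.*?)</a>", "$1 ($2)");
    }

    public static void main(String[] args) {
        String text = "see <A HREF=\"/fruit.cgi?param1=abc\">URL Label</A> and <a href=\"/veg.cgi\">URL2 Label</a>";
        // prints: see /fruit.cgi?param1=abc (URL Label) and /veg.cgi (URL2 Label)
        System.out.println(inlineLinks(text));
    }
}
```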
Thanks. I just found this out and have edited the question accordingly.
– Nik
Nov 22 '18 at 2:47
So this doesn't answer your question? If not, what's the next issue?
– Kartik
Nov 22 '18 at 2:48
Three issues actually: 1) how do I handle multiple URL cases, 2) how do I get hold of the label, 3) once I have the URLs with the base URL prefixed and the parameter attached, how do I replace them in the original text?
– Nik
Nov 22 '18 at 2:50
1) What do you mean by multiple URL cases? Can you update your question with an example? 2) Updated the answer for the label. 3) Just like you replaced before, do the reverse, and oh, use replace instead of replaceAll.
– Kartik
Nov 22 '18 at 3:00
Edited. I did not understand the replace part. What do you mean by "like you replaced before"?
– Nik
Nov 22 '18 at 3:05
Almost there:
    public static void main(String[] args) throws URISyntaxException {
        String text = "Some content which contains link as &lt;A HREF=\"/relative-path/fruit.cgi?param1=abc&amp;param2=xyz\"&gt;URL Label&lt;/A&gt; and some text after it and another link &lt;A HREF=\"/relative-path/vegetables.cgi?param1=abc&amp;param2=xyz\"&gt;URL2 Label&lt;/A&gt; and some more text";
        text = StringEscapeUtils.unescapeHtml4(text);
        System.out.println(text);
        System.out.println("**************************************");
        // outer pattern: group 1 = attributes of the <a> tag, group 2 = link text
        Pattern patternTag = Pattern.compile("<a([^>]+)>(.+?)</a>", Pattern.CASE_INSENSITIVE);
        // inner pattern: group 1 = value of the href attribute
        Pattern patternLink = Pattern.compile("href=\"(.*?)\"", Pattern.CASE_INSENSITIVE);
        Matcher matcherTag = patternTag.matcher(text);
        while (matcherTag.find()) {
            String href = matcherTag.group(1); // href
            String linkText = matcherTag.group(2); // link text
            System.out.println("Href: " + href);
            System.out.println("Label: " + linkText);
            Matcher matcherLink = patternLink.matcher(href);
            String finalText = null;
            // only the first href inside the tag is used
            if (matcherLink.find()) {
                String link = matcherLink.group(1);
                System.out.println("Link: " + link);
                finalText = getFinalText(link, linkText);
            }
            System.out.println("***************************************");
            // replacing logic goes here
        }
        System.out.println(text);
    }

    public static String getFinalText(String link, String label) throws URISyntaxException {
        link = appendBaseURI(link);
        link = appendQueryParams(link, "myParam=ABCXYZ");
        return link + " (" + label + ")";
    }

    public static String appendQueryParams(String uriToUpdate, String queryParamsToAppend) throws URISyntaxException {
        URI oldUri = new URI(uriToUpdate);
        String newQueryParams = oldUri.getQuery();
        if (newQueryParams == null) {
            newQueryParams = queryParamsToAppend;
        } else {
            newQueryParams += "&" + queryParamsToAppend;
        }
        URI newUri = new URI(oldUri.getScheme(), oldUri.getAuthority(),
                oldUri.getPath(), newQueryParams, oldUri.getFragment());
        return newUri.toString();
    }

    public static String appendBaseURI(String url) {
        String baseURI = "http://www.google.com/";
        // drop the leading slash so it is not doubled when the base is prepended
        if (url.startsWith("/")) {
            url = url.substring(1);
        }
        if (url.startsWith(baseURI)) {
            return url;
        } else {
            return baseURI + url;
        }
    }
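For the "replacing logic goes here" step, one common option is Matcher.appendReplacement together with appendTail, which rebuilds the text in a single pass and lets arbitrary code run between match and replacement. What follows is a sketch under several assumptions rather than a drop-in fix: the entities are already decoded, href values are double-quoted and valid URIs, labels contain no nested tags, and the base ends with a slash. The class name, rewriteLinks and appendParam are hypothetical, and appendParam is a plain string-based stand-in for the appendQueryParams helper.

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkRewriteSketch {

    // group 1 = href value, group 2 = label; other attributes on the tag are tolerated
    private static final Pattern LINK = Pattern.compile(
            "<a\\s+[^>]*?href=\"([^\"]*)\"[^>]*>(.*?)</a>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Hypothetical helper: replaces each link tag with "url (label)".
    static String rewriteLinks(String html, String base, String extraParam) {
        Matcher m = LINK.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // absolute hrefs pass through resolve unchanged, relative ones get the base
            String url = URI.create(base).resolve(m.group(1)).toString();
            url = appendParam(url, extraParam);
            String label = m.group(2);
            String replacement = label.isEmpty() ? url : url + " (" + label + ")";
            // quoteReplacement stops '$' or '\' in the URL or label
            // from being interpreted as group references
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out); // copy whatever follows the last link
        return out.toString();
    }

    // Appends "param" with '?' or '&' as needed, keeping any #fragment at the end.
    static String appendParam(String url, String param) {
        int hash = url.indexOf('#');
        String fragment = hash >= 0 ? url.substring(hash) : "";
        String head = hash >= 0 ? url.substring(0, hash) : url;
        return head + (head.contains("?") ? "&" : "?") + param + fragment;
    }

    public static void main(String[] args) {
        String text = "a <A HREF=\"/x.cgi?p=1\">L1</A> b <a href=\"https://example.org/y#top\">L2</a> c";
        // prints: a http://www.google.com/x.cgi?p=1&myParam=pqr (L1) b https://example.org/y?myParam=pqr#top (L2) c
        System.out.println(rewriteLinks(text, "http://www.google.com/", "myParam=pqr"));
    }
}
```

StringBuffer is used because the StringBuilder overload of appendReplacement only exists from Java 9 onwards.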
answered Nov 22 '18 at 5:48 by Kartik
edited Nov 22 '18 at 13:04
answered Nov 22 '18 at 4:04 by Pedro Lobito
This is really sleek and will fit my required solution perfectly if it can handle the multiple URL scenarios. Also, I guess your solution assumes that the URL will always have to be prefixed with google.com, which is not the case as mentioned in point (2) of my question. I will add the base URI only if its missing. Thanks for the answer though! will try to expand on it.
– Nik
Nov 22 '18 at 5:00
make baseurl also dinamic.
– Pedro Lobito
Nov 22 '18 at 15:07
// this is not working
Because your regex is case-sensitive.
Try:
Pattern p = Pattern.compile("href=\"(.*?)\"", Pattern.CASE_INSENSITIVE);
Edit1:
To get the label, use Pattern.compile("(?<=>).*?(?=</a>)", Pattern.CASE_INSENSITIVE) and m.group(0).
Edit2:
To replace the tag (including the label) with your final string, use:
text.replaceAll("(?i)<a href=\"(.*?)</a>", "new substring here")
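For text with several links, the href and the label can be captured together by one case-insensitive pattern and read in a find() loop. This is only a sketch under assumptions: the class name AnchorFinder and the pattern are illustrative, attribute values are assumed to be double-quoted, and anchors are assumed not to be nested. A regex is not a general HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorFinder {
    // (?i) ignores case (HREF, Href, href ...); (?s) lets '.' span line breaks.
    // Group 1 is the href value, group 2 is the label between <a ...> and </a>.
    private static final Pattern ANCHOR = Pattern.compile(
            "(?is)<a\\b[^>]*?\\bhref\\s*=\\s*\"([^\"]*)\"[^>]*>(.*?)</a>");

    // Returns one {href, label} pair per anchor tag found in the text.
    public static List<String[]> findLinks(String text) {
        List<String[]> links = new ArrayList<>();
        Matcher m = ANCHOR.matcher(text);
        while (m.find()) { // each find() call moves on to the next tag
            links.add(new String[] { m.group(1), m.group(2) });
        }
        return links;
    }

    public static void main(String[] args) {
        String text = "First <A HREF=\"/relative-path/fruit.cgi?param1=abc&param2=xyz\">URL Label</A>"
                + " then <a href=\"/relative-path/vegetables.cgi\">URL2 Label</a>.";
        for (String[] link : findLinks(text)) {
            System.out.println(link[0] + " -> " + link[1]);
        }
    }
}
```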
Thanks. Just found out this. Have edited the question for the same.
– Nik
Nov 22 '18 at 2:47
So this doesn't answer your question? If not, what's the next issue?
– Kartik
Nov 22 '18 at 2:48
3 issues actually: 1) how do I handle multiple URL cases, 2) How do I get hold of label, 3) Once I have urls with base URL prefixed and parameter attached, how do I replace them in the original text.
– Nik
Nov 22 '18 at 2:50
1) What do you mean by multiple URL cases? Can you update your question with an example? 2) Updated the answer for the label. 3) Just like you replaced before, do the reverse, and oh, use replace instead of replaceAll.
– Kartik
Nov 22 '18 at 3:00
edited. I did not understand the replace part. What do you mean "like you replaced before" ?
– Nik
Nov 22 '18 at 3:05
edited Nov 22 '18 at 3:06
answered Nov 22 '18 at 2:45
Kartik
Almost there:
import java.net.URI;
import java.net.URISyntaxException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
// Apache Commons Text; commons-lang3 ships an equivalent (deprecated) class of the same name
import org.apache.commons.text.StringEscapeUtils;

public static void main(String[] args) throws URISyntaxException {
    String text = "Some content which contains link as <A HREF=\"/relative-path/fruit.cgi?param1=abc&param2=xyz\">URL Label</A> and some text after it and another link <A HREF=\"/relative-path/vegetables.cgi?param1=abc&param2=xyz\">URL2 Label</A> and some more text";
    // Decode HTML entities (&lt;, &gt;, &amp;, ...) in case the input arrives escaped
    text = StringEscapeUtils.unescapeHtml4(text);
    System.out.println(text);
    System.out.println("**************************************");
    // Group 1: everything inside the opening <a ...> tag, group 2: the link text
    Pattern patternTag = Pattern.compile("<a([^>]+)>(.+?)</a>", Pattern.CASE_INSENSITIVE);
    // Group 1: the value of the href attribute
    Pattern patternLink = Pattern.compile("href=\"(.*?)\"", Pattern.CASE_INSENSITIVE);
    Matcher matcherTag = patternTag.matcher(text);
    while (matcherTag.find()) {
        String href = matcherTag.group(1); // attributes of the tag, including href="..."
        String linkText = matcherTag.group(2); // link text
        System.out.println("Href: " + href);
        System.out.println("Label: " + linkText);
        Matcher matcherLink = patternLink.matcher(href);
        String finalText = null;
        if (matcherLink.find()) { // only the first href in the tag is used
            String link = matcherLink.group(1);
            System.out.println("Link: " + link);
            finalText = getFinalText(link, linkText);
        }
        System.out.println("***************************************");
        // replacing logic goes here
    }
    System.out.println(text);
}

// Builds the replacement text: absolute URL with the extra parameter, then the label in parentheses
public static String getFinalText(String link, String label) throws URISyntaxException {
    link = appendBaseURI(link);
    link = appendQueryParams(link, "myParam=ABCXYZ");
    return link + " (" + label + ")";
}

// Appends queryParamsToAppend to the URI's query string, keeping any existing parameters
public static String appendQueryParams(String uriToUpdate, String queryParamsToAppend) throws URISyntaxException {
    URI oldUri = new URI(uriToUpdate);
    String newQueryParams = oldUri.getQuery();
    if (newQueryParams == null) {
        newQueryParams = queryParamsToAppend;
    } else {
        newQueryParams += "&" + queryParamsToAppend;
    }
    URI newUri = new URI(oldUri.getScheme(), oldUri.getAuthority(),
            oldUri.getPath(), newQueryParams, oldUri.getFragment());
    return newUri.toString();
}

// Prefixes the base URI unless the URL already starts with it.
// Note: an absolute URL on a different host would still get the base prepended.
public static String appendBaseURI(String url) {
    String baseURI = "http://www.google.com/";
    if (url.startsWith("/")) {
        url = url.substring(1);
    }
    if (url.startsWith(baseURI)) {
        return url;
    } else {
        return baseURI + url;
    }
}
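The "replacing logic goes here" step could be filled in with Matcher.appendReplacement and appendTail, which rebuild the text while swapping each matched tag for its final string. The sketch below is an assumption-laden illustration rather than the asker's final code. The names LinkRewriter and rewrite are made up. It uses a single pattern for the href and the label, resolves the href with URI.resolve instead of appendBaseURI, and appends the parameter with plain string concatenation, so hrefs containing a #fragment or characters that are illegal in a URI are not handled.

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkRewriter {
    // Group 1: href value (double-quoted), group 2: label. Case-insensitive.
    private static final Pattern ANCHOR = Pattern.compile(
            "(?is)<a\\b[^>]*?\\bhref\\s*=\\s*\"([^\"]*)\"[^>]*>(.*?)</a>");

    // Replaces every <a href="...">label</a> with "absoluteUrl&extraParam (label)".
    public static String rewrite(String text, String baseUrl, String extraParam) {
        Matcher m = ANCHOR.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Relative hrefs pick up the base; absolute hrefs stay as they are.
            String url = URI.create(baseUrl).resolve(m.group(1)).toString();
            url += (url.contains("?") ? "&" : "?") + extraParam;
            String label = m.group(2).trim();
            String replacement = label.isEmpty() ? url : url + " (" + label + ")";
            // quoteReplacement keeps '$' and '\' in the URL from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out); // copy whatever follows the last match
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "See <A HREF=\"/relative-path/fruit.cgi?param1=abc&param2=xyz\">URL Label</A>"
                + " and <a href=\"/relative-path/vegetables.cgi\"></a>.";
        System.out.println(rewrite(text, "http://www.google.com/", "myParam=pqr"));
    }
}
```

Doing the replacement inside the find() loop avoids a second pass over the text with replace or replaceAll, and it works the same way for one link or many.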
answered Nov 22 '18 at 6:09
Nik
Start by converting the HTML entities with:
import org.apache.commons.lang.StringEscapeUtils;
String entities_decode = StringEscapeUtils.unescapeHtml(text);
– Pedro Lobito
Nov 22 '18 at 3:23