Scraping from a website based on a search
Objective
In Java, I want to get the text output of the search result at https://pin1yin1.com/#我是英国人
What I've tried so far
Using jsoup, I connect to the page with Jsoup.connect("https://pin1yin1.com/#%E6%88%91%E6%98%AF%E8%8B%B1%E5%9B%BD%E4%BA%BA").get();
(the percent-encoded sequences are the URL-encoded form of the Chinese characters)
Problem faced
When I run getAllElements() to see what it has scraped, the HTML is just that of the landing page, i.e. what the user sees before doing the search. It doesn't pick up any of the search result.
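A likely explanation, offered as a sketch rather than a confirmed diagnosis of this site: everything after `#` in a URL is the fragment, which clients do not send to the server as part of the HTTP request, and jsoup does not execute JavaScript. A request for the URL above would therefore fetch only the landing page. The snippet below uses the standard `java.net.URI` class to split the URL into the two parts; the class name `FragmentDemo` is made up for illustration.

```java
import java.net.URI;

// Hypothetical helper class for illustration; not part of jsoup or the site.
final class FragmentDemo {

    // The path is what the server uses to pick a resource.
    static String pathOf(String url) {
        return URI.create(url).getPath();
    }

    // The part after '#', percent-decoded; it is handled on the client side.
    static String fragmentOf(String url) {
        return URI.create(url).getFragment();
    }

    public static void main(String[] args) {
        String url = "https://pin1yin1.com/#%E6%88%91%E6%98%AF%E8%8B%B1%E5%9B%BD%E4%BA%BA";
        System.out.println("path:     " + pathOf(url));
        System.out.println("fragment: " + fragmentOf(url));
    }
}
```

The search text lives entirely in the fragment, so the server never sees it in this request.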
java jsoup
asked Nov 24 '18 at 18:18
Chris A
1 Answer
I checked the website; you can get the data from their REST API directly. Try the following:
Document doc = Jsoup.connect("https://pin1yin1.com/pinyin/convert/?c=%E6%88%91%E6%98%AF%E8%8B%B1%E5%9B%BD%E4%BA%BA").get();
The response is below:
<html>
<head></head>
<body>
{ "q": "我是英国人", "s": "我是英国人", "t": "我是英國人", "p":
["wo3","shi4","ying1","guo2","ren2"], "e": ["I; me; my","is; are; am; yes","British person"], "c": [1,1,3] }
</body></html>
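Because the endpoint returns JSON rather than an HTML page, a plain HTTP client is enough. The following is a minimal sketch using only the JDK (Java 11+), swapped in for jsoup here. It assumes the `/pinyin/convert/?c=` endpoint and the `"p"` field behave as in the response shown above. The class and method names are hypothetical, and the regex extraction is only a stand-in for a proper JSON library.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical client for illustration; endpoint and field names are taken
// from the answer above and may change on the site's side.
final class PinyinClient {
    private static final String ENDPOINT = "https://pin1yin1.com/pinyin/convert/?c=";
    // Matches the contents of the "p" array, allowing whitespace or newlines.
    private static final Pattern P_ARRAY =
            Pattern.compile("\"p\"\\s*:\\s*\\[(.*?)\\]", Pattern.DOTALL);
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    // Percent-encodes the search text as UTF-8 and appends it to the endpoint.
    static String buildUrl(String chinese) {
        return ENDPOINT + URLEncoder.encode(chinese, StandardCharsets.UTF_8);
    }

    // Performs the GET request and returns the raw response body.
    static String fetch(String chinese) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(buildUrl(chinese))).GET().build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString(StandardCharsets.UTF_8));
        return response.body();
    }

    // Pulls the quoted strings out of the "p" array; empty list if absent.
    static List<String> extractPinyin(String json) {
        List<String> syllables = new ArrayList<>();
        Matcher array = P_ARRAY.matcher(json);
        if (array.find()) {
            Matcher item = QUOTED.matcher(array.group(1));
            while (item.find()) {
                syllables.add(item.group(1));
            }
        }
        return syllables;
    }
}
```

A call such as `PinyinClient.extractPinyin(PinyinClient.fetch(text))` would then give the pinyin syllables as a list. If you stay with jsoup, `doc.body().text()` should give the same JSON string without the wrapper markup.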
answered Nov 24 '18 at 19:51
Aditya Narayan Dixit
Thank you! How did you find that information?
– Chris A
Nov 24 '18 at 19:55
Just by looking at the network calls on the website. If you found the answer useful, please consider upvoting and accepting. Thanks.
– Aditya Narayan Dixit
Nov 24 '18 at 19:58
I have limited Java experience. What are network calls and how does one find them?
– Chris A
Nov 24 '18 at 20:07
Inspect the page; a developer tools panel will open in the browser. Go to the Network tab. There you'll see the calls the website makes: API calls to the backend to fetch data, downloads of JS and HTML files, images, etc. @ChrisA
– Aditya Narayan Dixit
Nov 24 '18 at 20:12
Got it! Many thanks :)
– Chris A
Nov 24 '18 at 20:24