Python request missing part of the content
I'm scraping job content from a website (https://www.104.com.tw/job/?jobno=66wee). When I send the request, only part of the content in the 'p' element is returned. I want the entire div class="content" part.
My code:
import requests
from bs4 import BeautifulSoup

payload = {'jobno': '66wee'}
headers = {'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36'}
r = requests.get('https://www.104.com.tw/job/', params=payload, headers=headers)
soup = BeautifulSoup(r.text, 'html.parser')
contents = soup.findAll('div', {'class': 'content'})
description = contents[0].findAll('p')[0].text.strip()
print(description)
Result (the job description part is missing):
4. Develop tools and systems that optimize analysis process efficiency and report quality.ion tools.row and succeed in a cross screen era. Appier is formed by a passionate team of computer scientists and engineers with experience in AI, data analysis, distributed systems, and marketing. Our colleagues come from Google, Intel, Yahoo, as well as renowned AI research groups in Harvard University and Stanford University. Headquartered in Taiwan, Appier serves more than 500 global brands and agencies from offices in international markets including Singapore, Japan, Australia, Hong Kong, Vietnam, India, Indonesia and South Korea.
But the HTML code of this part is:
<div class="content">
<p>Appier is a technology company that makes it easy for businesses to use artificial intelligence to grow and succeed in a cross screen era. Appier is formed by a passionate team of computer scientists and engineers with experience in AI, data analysis, distributed systems, and marketing. Our colleagues come from Google, Intel, Yahoo, as well as renowned AI research groups in Harvard University and Stanford University. Headquartered in Taiwan, Appier serves more than 500 global brands and agencies from offices in international markets including Singapore, Japan, Australia, Hong Kong, Vietnam, India, Indonesia and South Korea.
<br>
<br>Job Description
<br>1. Perform data analysis to help Appier teams to answer business or operational questions.
<br>2. Interpret trends or patterns from complex data sets by using statistical and visualization tools.
<br>3. Conduct data analysis reports to illustrate the results and insight
<br>4. Develop tools and systems that optimize analysis process efficiency and report quality.</p>
python web-scraping beautifulsoup request web-crawler
You want the div text; the p text is just part of it.
– pguardiario Nov 21 at 2:34
Did you try any of the answers given?
– QHarr Dec 1 at 5:16
asked Nov 21 at 2:30 by Aimee Huang
3 Answers
import requests
from bs4 import BeautifulSoup
payload = {'jobno': '66wee'}
headers = {
'user-agent': 'Mozilla/5.0 (Macintosh Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36'}
r = requests.get('https://www.104.com.tw/job/',
params=payload, headers=headers)
soup = BeautifulSoup(r.text, 'html.parser')
contents = soup.findAll('div', {'class': 'content'})
for content in contents[0].findAll('p')[0].text.splitlines():
print(content)
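A note on why the splitlines() loop above produces separate lines: the <br> tags themselves contribute no text to .text; the newlines that splitlines() finds come from literal line breaks in the HTML source, as the question's snippet shows. A small self-contained sketch (inline markup standing in for the live page) illustrating where the newlines actually come from:

```python
from bs4 import BeautifulSoup

# Inline stand-in for the page's markup: literal "\n" before each <br>,
# exactly as in the question's HTML excerpt.
html = '<p>Intro\n<br>1. First point\n<br>2. Second point</p>'
soup = BeautifulSoup(html, 'html.parser')

# .text drops the <br> tags but keeps the literal newlines,
# so splitlines() recovers one entry per line of the source.
for line in soup.find('p').text.splitlines():
    print(line)
```

If the page served the same text without literal line breaks, this loop would print one long line.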
answered Nov 21 at 2:41 by taipei
You are accessing only the first p element with the second [0] index:
description = contents[0].findAll('p')[0].text.strip()
You should iterate over all the p elements:
description = ""
for p in contents[0].findAll('p'):
    description += p.text.strip()
print(description)
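Concatenating p.text.strip() pieces runs the lines together wherever a <br> separated them. If you want those line breaks preserved, BeautifulSoup's get_text accepts a separator string; a minimal sketch against inline markup (not the live page):

```python
from bs4 import BeautifulSoup

# Inline stand-in for the page's content div.
html = '<div class="content"><p>Intro<br>1. First point<br>2. Second point</p></div>'
soup = BeautifulSoup(html, 'html.parser')

# get_text('\n') inserts a newline between every text fragment,
# so the <br>-separated points land on separate lines even when the
# HTML source itself contains no literal newlines.
description = soup.find('div', {'class': 'content'}).get_text('\n').strip()
print(description)
```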
edited Nov 21 at 2:51, answered Nov 21 at 2:35 by Julian Peller
There is more within the first content class tag, but assuming you want just up to the end of point 4, i.e. the first child p tag, you can use a descendant combinator: a class selector for the parent element and an element selector for the child. Remove the p from the selector if you truly want everything.
import requests
from bs4 import BeautifulSoup
url = 'https://www.104.com.tw/job/?jobno=66wee'
res = requests.get(url)
soup = BeautifulSoup(res.content, "lxml")
s = soup.select_one('.content p').text
print(s)
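To see how the descendant combinator scopes the match, here is a minimal sketch with hypothetical markup (two divs, only one with class content), using the same select_one API:

```python
from bs4 import BeautifulSoup

# Hypothetical markup mirroring the page's structure.
html = '''
<div class="content"><p>First content paragraph</p></div>
<div class="other"><p>Unrelated paragraph</p></div>
'''
soup = BeautifulSoup(html, 'html.parser')

# '.content p' matches a <p> anywhere inside an element whose class is
# "content"; select_one returns only the first such match, so the
# paragraph inside div.other is never considered.
print(soup.select_one('.content p').text)
```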
answered Nov 21 at 7:16 by QHarr