Python request missing part of the content
I'm scraping job content from a website (https://www.104.com.tw/job/?jobno=66wee). When I send the request, only part of the content in the 'p' element is returned. I want the entire div class="content" part.
My code:
import requests
from bs4 import BeautifulSoup

payload = {'jobno': '66wee'}
headers = {'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36'}
r = requests.get('https://www.104.com.tw/job/', params=payload, headers=headers)
soup = BeautifulSoup(r.text, 'html.parser')
contents = soup.findAll('div', {'class': 'content'})
description = contents[0].findAll('p')[0].text.strip()
print(description)
Result (the job description part is missing):
4. Develop tools and systems that optimize analysis process efficiency and report quality.ion tools.row and succeed in a cross screen era. Appier is formed by a passionate team of computer scientists and engineers with experience in AI, data analysis, distributed systems, and marketing. Our colleagues come from Google, Intel, Yahoo, as well as renowned AI research groups in Harvard University and Stanford University. Headquartered in Taiwan, Appier serves more than 500 global brands and agencies from offices in international markets including Singapore, Japan, Australia, Hong Kong, Vietnam, India, Indonesia and South Korea.
But the HTML code of this part is:
<div class="content">
<p>Appier is a technology company that makes it easy for businesses to use artificial intelligence to grow and succeed in a cross screen era. Appier is formed by a passionate team of computer scientists and engineers with experience in AI, data analysis, distributed systems, and marketing. Our colleagues come from Google, Intel, Yahoo, as well as renowned AI research groups in Harvard University and Stanford University. Headquartered in Taiwan, Appier serves more than 500 global brands and agencies from offices in international markets including Singapore, Japan, Australia, Hong Kong, Vietnam, India, Indonesia and South Korea.
<br>
<br>Job Description
<br>1. Perform data analysis to help Appier teams to answer business or operational questions.
<br>2. Interpret trends or patterns from complex data sets by using statistical and visualization tools.
<br>3. Conduct data analysis reports to illustrate the results and insight
<br>4. Develop tools and systems that optimize analysis process efficiency and report quality.</p>
python web-scraping beautifulsoup request web-crawler
You want the div text; the p text is just part of it.
– pguardiario Nov 21 at 2:34
Did you try any of the answers given?
– QHarr Dec 1 at 5:16
asked Nov 21 at 2:30 by Aimee Huang
3 Answers
import requests
from bs4 import BeautifulSoup
payload = {'jobno': '66wee'}
headers = {
'user-agent': 'Mozilla/5.0 (Macintosh Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36'}
r = requests.get('https://www.104.com.tw/job/',
params=payload, headers=headers)
soup = BeautifulSoup(r.text, 'html.parser')
contents = soup.findAll('div', {'class': 'content'})
for content in contents[0].findAll('p')[0].text.splitlines():
print(content)
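A note on why the splitlines() loop above produces separate lines: the <br> tags themselves contribute no text to .text; the newlines that splitlines() finds come from literal line breaks in the HTML source, as the question's snippet shows. A small self-contained sketch (inline markup standing in for the live page) illustrating where the newlines actually come from:

```python
from bs4 import BeautifulSoup

# Inline stand-in for the page's markup: literal "\n" before each <br>,
# exactly as in the question's HTML excerpt.
html = '<p>Intro\n<br>1. First point\n<br>2. Second point</p>'
soup = BeautifulSoup(html, 'html.parser')

# .text drops the <br> tags but keeps the literal newlines,
# so splitlines() recovers one entry per line of the source.
for line in soup.find('p').text.splitlines():
    print(line)
```

If the page served the same text without literal line breaks, this loop would print one long line.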
answered Nov 21 at 2:41 by taipei
You are accessing only the first p element with the second [0] index:
description = contents[0].findAll('p')[0].text.strip()
You should iterate over all the p elements:
description = ""
for p in contents[0].findAll('p'):
    description += p.text.strip()
print(description)
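Concatenating p.text.strip() pieces runs the lines together wherever a <br> separated them. If you want those line breaks preserved, BeautifulSoup's get_text accepts a separator string; a minimal sketch against inline markup (not the live page):

```python
from bs4 import BeautifulSoup

# Inline stand-in for the page's content div.
html = '<div class="content"><p>Intro<br>1. First point<br>2. Second point</p></div>'
soup = BeautifulSoup(html, 'html.parser')

# get_text('\n') inserts a newline between every text fragment,
# so the <br>-separated points land on separate lines even when the
# HTML source itself contains no literal newlines.
description = soup.find('div', {'class': 'content'}).get_text('\n').strip()
print(description)
```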
edited Nov 21 at 2:51, answered Nov 21 at 2:35 by Julian Peller
There is more within the first content class tag, but assuming you want just up to the end of point 4, i.e. the first child p tag, you can use a descendant combinator: a class selector for the parent element and an element selector for the child. Remove the p from the selector if you truly want everything.
import requests
from bs4 import BeautifulSoup
url = 'https://www.104.com.tw/job/?jobno=66wee'
res = requests.get(url)
soup = BeautifulSoup(res.content, "lxml")
s = soup.select_one('.content p').text
print(s)
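To see how the descendant combinator scopes the match, here is a minimal sketch with hypothetical markup (two divs, only one with class content), using the same select_one API:

```python
from bs4 import BeautifulSoup

# Hypothetical markup mirroring the page's structure.
html = '''
<div class="content"><p>First content paragraph</p></div>
<div class="other"><p>Unrelated paragraph</p></div>
'''
soup = BeautifulSoup(html, 'html.parser')

# '.content p' matches a <p> anywhere inside an element whose class is
# "content"; select_one returns only the first such match, so the
# paragraph inside div.other is never considered.
print(soup.select_one('.content p').text)
```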
answered Nov 21 at 7:16 by QHarr