Scraping with Python and Selenium - how should I return a 'null' if element not present

Good Day,
I am a newbie to Python and Selenium and have searched for a solution for a while now. While some answers come close, I can't seem to find one that solves my problem. The snippet of my code that is giving me a problem is as follows:

for url in links:
    driver.get(url)
    company = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[2]/ul/li/div/div[1]/span""")
    date = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[1]/div[1]/div[2]/div/span""")
    title = driver.find_elements_by_xpath("""//*[@id="page-title"]/span""")
    urlinf = driver.current_url  # url info

    num_page_items = len(date)

    for i in range(num_page_items):
        df = df.append({'Company': company[i].text, 'Date': date[i].text, 'Title': title[i].text, 'URL': urlinf[i]}, ignore_index=True)

While this works when all elements are present (and I can see the output in the Pandas dataframe), if one of the elements doesn't exist (either 'date' or 'title') Python raises the error:

What I have tried thus far:

1) created a try/except (doesn't work)
2) tried if/else (if variable is not "")

I would like to insert "Null" when an element doesn't exist, so that the Pandas dataframe is populated with "Null" for the missing value.

Any assistance and guidance would be greatly appreciated.

EDIT 1:

I have tried the following:

for url in links:
    driver.get(url)
    try:
        company = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[2]/ul/li/div/div[1]/span""")
        date = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[1]/div[1]/div[2]/div/span""")
        title = driver.find_elements_by_xpath("""//*[@id="page-title"]/span""")
        urlinf = driver.current_url  # url info
    except:
        pass

    num_page_items = len(date)

    for i in range(num_page_items):
        df = df.append({'Company': company[i].text, 'Date': date[i].text, 'Title': title[i].text, 'URL': urlinf[i]}, ignore_index=True)

and:

for url in links:
    driver.get(url)
    try:
        company = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[2]/ul/li/div/div[1]/span""")
        date = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[1]/div[1]/div[2]/div/span""")
        title = driver.find_elements_by_xpath("""//*[@id="page-title"]/span""")
        urlinf = driver.current_url  # url info
    except (NoSuchElementException, ElementNotVisibleException, InvalidSelectorException):
        pass

    num_page_items = len(date)

    for i in range(num_page_items):
        df = df.append({'Company': company[i].text, 'Date': date[i].text, 'Title': title[i].text, 'URL': urlinf[i]}, ignore_index=True)

and:

for url in links:
    driver.get(url)
    try:
        company = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[2]/ul/li/div/div[1]/span""")
        date = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[1]/div[1]/div[2]/div/span""")
        title = driver.find_elements_by_xpath("""//*[@id="page-title"]/span""")
        urlinf = driver.current_url  # url info
    except:
        i = 'Null'
        pass

    num_page_items = len(date)

    for i in range(num_page_items):
        df = df.append({'Company': company[i].text, 'Date': date[i].text, 'Title': title[i].text, 'URL': urlinf[i]}, ignore_index=True)

I tried the same try/except at the point of appending to Pandas.

EDIT 2:

The error I get:

is attributed to the line:

edited Nov 22 '18 at 7:48

asked Nov 22 '18 at 5:12

qbbq

226

Can you show your attempts with the try/except? That is the best way to handle errors and ignore them if needed.

– Moshe Slavin
Nov 22 '18 at 6:57

I've tried quite a few iterations and overwrote each one when I found it didn't work, but I have added what I tried to my question.

– qbbq
Nov 22 '18 at 7:36

I'll take a look...

– Moshe Slavin
Nov 22 '18 at 10:10

I posted an answer; let me know if you need any other assistance!

– Moshe Slavin
Nov 22 '18 at 10:27

python selenium selenium-chromedriver screen-scraping


1 Answer

As your error shows, you have an IndexError.

To overcome that, add a try/except around the code that raises the error.
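As a rough illustration of why the placement matters: find_elements_by_xpath returns an empty list when nothing matches rather than raising, so the exception only shows up later, when that list is indexed. In the sketch below a plain empty list stands in for the Selenium result (an assumption made so the snippet runs without a browser).

```python
# An empty list stands in for a find_elements_by_xpath call that matched nothing.
title = []

# The lookup itself succeeded; it just found zero elements.
print(len(title))

# The failure happens only when a position in the list is accessed.
try:
    first_title = title[0]
except IndexError as exc:
    message = str(exc)

print(message)  # list index out of range
```

This is why wrapping only the find_elements calls in try/except has no effect: nothing is raised there.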

Also, you are using driver.current_url, which returns the URL as a single string.
But in your inner for loop you index it as if it were a list (urlinf[i]), which picks out one character of the URL; this may be the origin of your error.
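A small sketch of the difference, using a made-up URL (https://example.com/page is only a placeholder): indexing the string yields single characters, while using the variable directly yields the whole URL on every row.

```python
# Hypothetical value of driver.current_url; it is one string, not a list.
urlinf = "https://example.com/page"

# Indexing the string returns one character per position.
indexed = [urlinf[i] for i in range(2)]

# Using the string as-is gives the full URL for every row.
whole = [urlinf for _ in range(2)]

print(indexed)
print(whole)
```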

In your case try this:

for url in links:
    driver.get(url)
    company = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[2]/ul/li/div/div[1]/span""")
    date = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[1]/div[1]/div[2]/div/span""")
    title = driver.find_elements_by_xpath("""//*[@id="page-title"]/span""")
    urlinf = driver.current_url  # url info

    num_page_items = len(date)

    for i in range(num_page_items):
        try:
            df = df.append({'Company': company[i].text, 'Date': date[i].text, 'Title': title[i].text, 'URL': urlinf}, ignore_index=True)
        except IndexError:
            df.append(None)  # or df.append('Null')

Hope you find this helpful!
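If the goal is to keep the row and put "Null" in the missing column (as the question asks), one possible variation is to guard each lookup separately. This is only a sketch under stated assumptions: FakeElement and text_or_null are hypothetical names introduced here, and the sample values are invented so the snippet runs without Selenium.

```python
class FakeElement:
    """Stand-in for a Selenium WebElement; only the .text attribute is used."""

    def __init__(self, text):
        self.text = text


def text_or_null(elements, i, default="Null"):
    """Return elements[i].text, or the default when that position is missing."""
    try:
        return elements[i].text
    except IndexError:
        return default


# Invented results for one page: two companies and dates, but only one title.
company = [FakeElement("Acme Ltd"), FakeElement("Globex")]
date = [FakeElement("22 Nov 2018"), FakeElement("23 Nov 2018")]
title = [FakeElement("Annual report")]
urlinf = "https://example.com/page"

rows = []
# Loop over the longest list so that no element that was found gets dropped.
for i in range(max(len(company), len(date), len(title))):
    rows.append({
        "Company": text_or_null(company, i),
        "Date": text_or_null(date, i),
        "Title": text_or_null(title, i),
        "URL": urlinf,
    })

for row in rows:
    print(row)
```

The collected rows can then be turned into a dataframe in one step, for example with pandas.DataFrame(rows). Note that this only pads the end of a shorter list; if an element is missing in the middle of a page, the positions in the lists no longer line up, and the lookups would need to be done per item instead.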

answered Nov 22 '18 at 10:23

Moshe Slavin

1,3591721

This solution works! Thank you very much, I really appreciate it.

– qbbq
Nov 22 '18 at 11:27

Just as a matter of interest, I tried df.append('Null') and got this error message: TypeError: cannot concatenate object of type "<type 'str'>"; only pd.Series, pd.DataFrame, and pd.Panel (deprecated) objs are valid

– qbbq
Nov 22 '18 at 11:28

It's probably a pandas issue. Just use None.

– Moshe Slavin
Nov 22 '18 at 11:31
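For context on the TypeError above: as the message says, a bare string is not something pandas can concatenate onto a dataframe, so the "Null" placeholder has to go inside a row rather than be appended by itself. The sketch below uses invented sample values and pd.concat, because DataFrame.append was removed in pandas 2.0.

```python
import pandas as pd

# A dataframe that already holds one scraped row (invented values).
df = pd.DataFrame([{
    "Company": "Acme Ltd",
    "Date": "22 Nov 2018",
    "Title": "Annual report",
    "URL": "https://example.com/a",
}])

# The placeholder lives inside the row, under the column that was missing.
new_row = {
    "Company": "Globex",
    "Date": "23 Nov 2018",
    "Title": "Null",
    "URL": "https://example.com/b",
}

# pd.concat joins dataframes, so the row is wrapped in a one-row dataframe.
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)

print(df)
```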

Glad to help!!!

– Moshe Slavin
Nov 22 '18 at 11:33

Just an update to this: I decided to write directly to a CSV. However, with the original solution, the None / "Null" was creating a line break instead of setting the variable to "Null". As a result I have added the following: blank = "blank" and except IndexError: with open('results.csv', 'a') as f: f.write(blank). However, my data in the CSV is getting offset by the missing value. Would you suggest I create if statements in the loop to check whether the variable is ""?

– qbbq
Nov 26 '18 at 3:07
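One way to avoid the column offset described above is to always write a complete row and let the csv module fill the gaps. This is a sketch, not the approach used in the thread: the sample rows are invented, and an in-memory buffer stands in for the results.csv file so the snippet runs anywhere.

```python
import csv
import io

FIELDS = ["Company", "Date", "Title", "URL"]

# Invented rows; the second one has no "Title" key because nothing was found.
rows = [
    {"Company": "Acme Ltd", "Date": "22 Nov 2018", "Title": "Annual report",
     "URL": "https://example.com/a"},
    {"Company": "Globex", "Date": "23 Nov 2018",
     "URL": "https://example.com/b"},
]

# Stand-in for open('results.csv', 'a', newline='').
buffer = io.StringIO()

# restval is written for any field missing from a row, so columns stay aligned.
writer = csv.DictWriter(buffer, fieldnames=FIELDS, restval="Null")
writer.writeheader()
writer.writerows(rows)

output = buffer.getvalue()
print(output)
```

Because every line is written as a full record, a missing value becomes a "Null" cell in its own column instead of shifting the values that follow it.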
