Creating a python dataframe by parsing JSON API response























In this SO question, the OP is unable to scrape a table from a dynamically loaded website. While monitoring the web traffic with Chrome dev tools, I found an API request that returns a JSON string with the required info.



The following is the answer I wrote to extract the columns of interest from the API response.



Guide to columns of interest:





  1. Course Title = title


  2. Trainer = name (within trainers)


  3. Rating = rating


  4. Vendor = name (within vendors)


  5. IT Path = path_label (within paths)


  6. Skill Level = display (within difficulty)


  7. Course URL = concatenation of base with seoslug


The Vendors field has missing items, hence my use of an if statement in the assignment to vendors. I am not sure what the usual placeholder value is for missing string values in Python.



I use repeated list comprehensions, each looping over the JSON object data, where data = response.json().



I couldn't think of a way to remove these repeated loops and still have legible code.



I generate a dataframe by joining the lists in a dictionary and then converting with pandas.



I welcome any and all feedback please.





JSON response:



Example JSON dictionary within response. The response has a collection of such dictionaries.







Python 3



    import requests
    import pandas as pd


    def main():
        base = 'https://www.cbtnuggets.com/it-training/'
        response = requests.get('https://api.cbtnuggets.com/site-gateway/v1/all/courses/for/search?archive=false')

        data = response.json()

        titles = [item['title'] for item in data]
        trainers = [item['trainers'][0]['name'] for item in data]
        ratings = [item['rating'] for item in data]
        vendors = [item['vendors'][0]['display'] if len(item['vendors']) != 0 else 'N/A' for item in data]
        paths = [item['paths'][0]['path_label'] for item in data]
        skillLevel = [item['difficulty']['display'] for item in data]
        links = [base + item['seoslug'] for item in data]

        df = pd.DataFrame(
            {'Course Title': titles,
             'Trainer': trainers,
             'Rating': ratings,
             'Vendor': vendors,
             'IT Path': paths,
             'Skill Level': skillLevel,
             'Course URL': links
             })

        #print(df)
        df.to_csv(r'C:\Users\User\Desktop\Data.csv', sep=',', encoding='utf-8', index=False)

    if __name__ == "__main__":
        main()
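One row-oriented alternative to the parallel lists, offered only as a sketch: build one dictionary per course and collect them in a list. The helper name `course_to_row` and the sample record are hypothetical; only the key names and the base URL come from the code above.

```python
BASE = 'https://www.cbtnuggets.com/it-training/'


def course_to_row(item):
    """Flatten one course dictionary into a row keyed by column name."""
    return {
        'Course Title': item['title'],
        'Trainer': item['trainers'][0]['name'],
        'Rating': item['rating'],
        # Same fallback as the question's code for an empty vendors list.
        'Vendor': item['vendors'][0]['display'] if item['vendors'] else 'N/A',
        'IT Path': item['paths'][0]['path_label'],
        'Skill Level': item['difficulty']['display'],
        'Course URL': BASE + item['seoslug'],
    }


# Hypothetical record standing in for one element of response.json().
sample = [{
    'title': 'Course A', 'trainers': [{'name': 'Trainer 1'}], 'rating': 4.5,
    'vendors': [], 'paths': [{'path_label': 'Path 1'}],
    'difficulty': {'display': 'Beginner'}, 'seoslug': 'course-a',
}]

rows = [course_to_row(item) for item in sample]
print(rows[0]['Vendor'])      # N/A
print(rows[0]['Course URL'])  # https://www.cbtnuggets.com/it-training/course-a
```

A list of such dictionaries can be passed straight to pd.DataFrame(rows), which takes the column names from the keys, so the data is traversed once instead of seven times.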









python beginner python-3.x web-scraping














edited 10 hours ago

























asked 10 hours ago









QHarr

1969








    The only thing I'll say is that, if each publication on Code Review had a presentation as clear, complete and pleasant as yours, the overall quality of this site would be improved. Pretty nice code BTW.
    – Calak
    7 hours ago
























2 Answers













The only major thing I would change is that:

    if len(item['vendors']) != 0

is the same as:

    if item['vendors']

because an empty list is falsy, that is, it evaluates to False in a boolean context. If you want to try it out:

    a = []
    bool(a) # False
    b = [1,2,3]
    bool(b) # True
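
As a minimal sketch of the truthiness check in context, the snippet below applies it to two made-up course records (the sample values are hypothetical; only the key names come from the question). It also uses None rather than 'N/A' as the placeholder, on the assumption that the result ends up in pandas, which treats None in an object column as a missing value.

```python
# Hypothetical sample records; only the key names follow the question.
data = [
    {'title': 'Course A', 'vendors': [{'display': 'Vendor X'}]},
    {'title': 'Course B', 'vendors': []},  # no vendor listed
]

# An empty list is falsy, so no len() call is needed.
vendors = [item['vendors'][0]['display'] if item['vendors'] else None
           for item in data]

print(vendors)  # ['Vendor X', None]
```

Whether None or a visible marker such as 'N/A' is the better choice depends on how the CSV will be read later; None keeps the value recognisable as missing inside pandas.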


I would also be careful with what you have, because the lists you index with [0] might hold more than one entry, in which case you would miss the rest. This is the line that I am referring to:



    paths = [item['paths'][0]['path_label'] for item in data]
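
If every path label matters, one option is to join all labels into a single string. This is an assumption about the desired output, not something the question asks for, and the sample record is hypothetical.

```python
# Hypothetical record with two paths; key names follow the question.
item = {'paths': [{'path_label': 'Networking'}, {'path_label': 'Security'}]}

# Join every label instead of keeping only the first one.
all_paths = ', '.join(path['path_label'] for path in item['paths'])
first_path = item['paths'][0]['path_label']

print(all_paths)   # Networking, Security
print(first_path)  # Networking
```

The same pattern would apply to trainers and vendors, and it yields an empty string for an empty list, so no separate guard is needed in that case.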





answered 10 hours ago









Anthony Herrera

111
















  • Thank you for the feedback. Great point about vendors. And yes, I made an assumption, that the first value would suffice, for the other line you pointed out about dictionaries. +
    – QHarr
    10 hours ago


















I couldn't think of a way to remove these repeated loops and still have legible code.




There is a way:



    titles, trainers, ratings, vendors, paths, skillLevel, links = zip(*((
        item['title'],
        item['trainers'][0]['name'],
        item['rating'],
        # keep the question's guard: some courses have an empty vendors list
        item['vendors'][0]['display'] if item['vendors'] else 'N/A',
        item['paths'][0]['path_label'],
        item['difficulty']['display'],
        base + item['seoslug']
    ) for item in data))


I haven't tested this, so you should.
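
For anyone who wants to try the idea without calling the API, here is a self-contained sketch of the same zip(*...) transposition. The two sample records are hypothetical and only mirror the keys used in the question; the vendors guard is carried over from the question's code, and `skill_levels` is simply a snake_case spelling of the question's `skillLevel`.

```python
base = 'https://www.cbtnuggets.com/it-training/'

# Hypothetical sample records shaped like the fields the question uses.
data = [
    {'title': 'Course A', 'trainers': [{'name': 'Trainer 1'}], 'rating': 4.5,
     'vendors': [{'display': 'Vendor X'}], 'paths': [{'path_label': 'Path 1'}],
     'difficulty': {'display': 'Beginner'}, 'seoslug': 'course-a'},
    {'title': 'Course B', 'trainers': [{'name': 'Trainer 2'}], 'rating': 4.0,
     'vendors': [], 'paths': [{'path_label': 'Path 2'}],
     'difficulty': {'display': 'Advanced'}, 'seoslug': 'course-b'},
]

# One tuple per course, built in a single pass over data.
rows = [(
    item['title'],
    item['trainers'][0]['name'],
    item['rating'],
    item['vendors'][0]['display'] if item['vendors'] else 'N/A',
    item['paths'][0]['path_label'],
    item['difficulty']['display'],
    base + item['seoslug'],
) for item in data]

# zip(*rows) transposes rows into columns; each column is a tuple.
titles, trainers, ratings, vendors, paths, skill_levels, links = zip(*rows)

print(vendors)   # ('Vendor X', 'N/A')
print(links[1])  # https://www.cbtnuggets.com/it-training/course-b
```

Two caveats apply: the columns come back as tuples rather than lists, and the unpacking raises a ValueError when data is empty, because zip then yields nothing to unpack.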

















answered 10 hours ago









Reinderien

1,143515
















  • Thank you. I will test tonight/tomorrow morning and feedback.
    – QHarr
    9 hours ago

















