Creating a python dataframe by parsing JSON API response
In this SO question the OP is unable to scrape a table from a dynamically loaded website. In monitoring the web traffic, via Chrome dev tools, I found that there is an API request made that returns a JSON string with the required info.
The following is the answer I wrote to extract the columns of interest from the API response.
Guide to columns of interest:
Course Title = title
Trainer = name (within trainers)
Rating = rating
Vendor = name (within vendors)
IT Path = path_label (within paths)
Skill Level = display (within difficulty)
Course URL = concatenation of base with seoslug
The Vendors field has missing items, hence my use of an if statement in the assignment to vendors. I am not sure what the usual placeholder value is for missing string values in Python.
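On the placeholder question: a common convention is to use None for a missing value rather than a sentinel string such as 'N/A', and pandas treats None in an object column as missing. A minimal sketch of that idea, assuming a hypothetical helper named first_vendor and made-up sample items shaped like the fields described above (neither appears in the original code):

```python
def first_vendor(item, default=None):
    """Return the display name of the first vendor, or `default` if there is none."""
    # `or []` covers a missing key, a None value, and an empty list alike
    vendors = item.get('vendors') or []
    return vendors[0]['display'] if vendors else default


# Made-up sample items for illustration only; not taken from the real API response.
sample = [
    {'title': 'Course A', 'vendors': [{'display': 'Vendor X'}]},
    {'title': 'Course B', 'vendors': []},
]

vendors = [first_vendor(item) for item in sample]
print(vendors)
```

Passing default='N/A' keeps the behaviour of the original comprehension if a visible marker is preferred in the CSV output.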
I use repeated list comprehensions in loops over the JSON object data, where data = response.json().
I couldn't think of a way to remove these repeated loops and still have legible code.
I generate a dataframe by joining the lists in a dictionary and then converting with pandas.
I welcome any and all feedback please.
JSON response:
Example JSON dictionary within response. The response has a collection of such dictionaries.
Python 3
import requests
import pandas as pd

def main():
    base = 'https://www.cbtnuggets.com/it-training/'
    response = requests.get('https://api.cbtnuggets.com/site-gateway/v1/all/courses/for/search?archive=false')
    data = response.json()
    titles = [item['title'] for item in data]
    trainers = [item['trainers'][0]['name'] for item in data]
    ratings = [item['rating'] for item in data]
    vendors = [item['vendors'][0]['display'] if len(item['vendors']) != 0 else 'N/A' for item in data]
    paths = [item['paths'][0]['path_label'] for item in data]
    skillLevel = [item['difficulty']['display'] for item in data]
    links = [base + item['seoslug'] for item in data]
    df = pd.DataFrame(
        {'Course Title': titles,
         'Trainer': trainers,
         'Rating': ratings,
         'Vendor': vendors,
         'IT Path': paths,
         'Skill Level': skillLevel,
         'Course URL': links
         })
    #print(df)
    df.to_csv(r'C:\Users\User\Desktop\Data.csv', sep=',', encoding='utf-8', index=False)

if __name__ == "__main__":
    main()
python beginner python-3.x web-scraping
asked 10 hours ago by QHarr (edited 10 hours ago)
The only thing I'll say is that, if each publication on Code Review had a presentation as clear, complete and pleasant as yours, the overall quality of this site would be improved. Pretty nice code BTW.
– Calak
7 hours ago
2 Answers
The only major thing I would change is that:
    if len(item['vendors']) != 0
is the same as:
    if item['vendors']
because an empty list is falsy, that is, it evaluates to False in a boolean context. If you want to try it out:
    a = []
    bool(a) # False
    b = [1,2,3]
    bool(b) # True
I would also be careful with the lines that keep only the first element: those lists of dictionaries that you are indexing might have more than one entry, in which case you would miss the rest. This is the line that I am referring to:
    paths = [item['paths'][0]['path_label'] for item in data]
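One way to avoid dropping the extra entries is to join every label into a single string per course. This is only a sketch: the helper name all_path_labels, the separator, and the sample items are assumptions made for illustration, not part of the original code or the real API response.

```python
def all_path_labels(item, sep=', '):
    """Join every path_label in item['paths'] instead of keeping only the first."""
    return sep.join(path['path_label'] for path in item.get('paths', []))


# Made-up sample items for illustration only.
sample = [
    {'paths': [{'path_label': 'Networking'}, {'path_label': 'Security'}]},
    {'paths': [{'path_label': 'Cloud'}]},
    {'paths': []},
]

paths = [all_path_labels(item) for item in sample]
print(paths)
```

An item with no paths yields an empty string here rather than raising an IndexError, which the [0] indexing would do.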
answered 10 hours ago by Anthony Herrera
Thank you for the feedback. Great point about vendors. And yes, I made an assumption, that the first value would suffice, for the other line you pointed out about dictionaries. +
– QHarr
10 hours ago
> I couldn't think of a way to remove these repeated loops and still have legible code.
There is a way:
    titles, trainers, ratings, vendors, paths, skillLevel, links = zip(*((
        item['title'],
        item['trainers'][0]['name'],
        item['rating'],
        # keep the question's fallback so an empty vendors list does not raise IndexError
        item['vendors'][0]['display'] if item['vendors'] else 'N/A',
        item['paths'][0]['path_label'],
        item['difficulty']['display'],
        base + item['seoslug']
    ) for item in data))
I haven't tested this, so you should.
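A possible alternative to the zip approach is to build one dictionary per course in a single pass, which keeps each column name next to the expression that fills it. The sketch below assumes a hypothetical helper named to_row and made-up sample items that mirror the keys used in the question; it is not the original author's code.

```python
BASE = 'https://www.cbtnuggets.com/it-training/'


def to_row(item, base=BASE):
    """Map one course dictionary from the response to one output row."""
    vendors = item['vendors']
    return {
        'Course Title': item['title'],
        'Trainer': item['trainers'][0]['name'],
        'Rating': item['rating'],
        'Vendor': vendors[0]['display'] if vendors else 'N/A',
        'IT Path': item['paths'][0]['path_label'],
        'Skill Level': item['difficulty']['display'],
        'Course URL': base + item['seoslug'],
    }


# Made-up sample items for illustration only; the real response is not reproduced here.
sample = [
    {'title': 'Course A', 'trainers': [{'name': 'Trainer One'}], 'rating': 4.5,
     'vendors': [{'display': 'Vendor X'}], 'paths': [{'path_label': 'Path P'}],
     'difficulty': {'display': 'Beginner'}, 'seoslug': 'course-a'},
    {'title': 'Course B', 'trainers': [{'name': 'Trainer Two'}], 'rating': 4.0,
     'vendors': [], 'paths': [{'path_label': 'Path Q'}],
     'difficulty': {'display': 'Advanced'}, 'seoslug': 'course-b'},
]

rows = [to_row(item) for item in sample]
print(rows[1]['Vendor'])
```

Since pandas accepts a list of dictionaries, pd.DataFrame(rows) should then produce the same columns as the dictionary of lists in the question.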
answered 10 hours ago by Reinderien
Thank you. I will test tonight/tomorrow morning and feedback.
– QHarr
9 hours ago