Selecting DataFrame rows: why is the result filled with NaN values?
I have a dataset from which I would like to select only the rows whose submission date is later than '2018/11/14 01:26PM'.
The code below is what I have so far, but all the other columns in the result get populated with NaN. What am I doing wrong?
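from datetime import datetime
# %I, not %H: strptime only applies %p when the hour is parsed with the 12-hour %I
d = datetime.strptime('2018-11-14 01:26PM', '%Y-%m-%d %I:%M%p')
data[data['submission_date'] > d]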
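Separately from the NaN problem, note that in Python's `strptime` the `%p` directive only affects the parsed hour when the hour is read with `%I` (12-hour clock). With `%H` it is silently ignored, so the cutoff ends up in the morning. A small standard-library check (the variable names are just for illustration):

```python
from datetime import datetime

# With %H (24-hour clock), strptime ignores %p,
# so "01:26PM" is parsed as 01:26 in the morning.
as_24h = datetime.strptime('2018-11-14 01:26PM', '%Y-%m-%d %H:%M%p')

# With %I (12-hour clock), %p is honoured and the time becomes 13:26.
as_12h = datetime.strptime('2018-11-14 01:26PM', '%Y-%m-%d %I:%M%p')

print(as_24h)  # 2018-11-14 01:26:00
print(as_12h)  # 2018-11-14 13:26:00
```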
Data sample below:
ID Name submission_date
12 Mike 2018-11-14 01:26PM
13 Mark 2018-11-14 02:00PM
14 Taylor 2018-11-14 03:26PM
14 Taylor 2018-11-15 03:26PM
python pandas dataframe
edited Nov 21 at 13:31 by jez
asked Nov 20 at 19:31 by mark
1 Answer
I know almost nothing about pandas, but, using your question as a learning exercise, I found the following pattern. When data.columns is initialized with a flat list, which creates an Index object, all is well:
import numpy
import pandas

data = pandas.DataFrame( numpy.random.randn( 5, 2 ) ) # random values, so your output will differ
data.columns=[ 'one', 'two' ]
print( data )
# Output:
# one two
# 0 -1.242567 0.430084
# 1 -1.125710 -0.342616
# 2 -0.514284 0.479382
# 3 0.108649 -0.789272
# 4 1.489155 0.842427
criterion = data[ 'one' ] > 0 # NB: criterion.shape is (5,): it is one-dimensional
print( data[ criterion ] )
# Output:
# one two
# 3 0.108649 -0.789272
# 4 1.489155 0.842427
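To make the difference visible without eyeballing the printed frame, one can inspect the column object itself and the type that `data['one']` returns. A small, self-contained sketch of the flat case (random data again, so the values will differ from the output above):

```python
import numpy
import pandas

data = pandas.DataFrame(numpy.random.randn(5, 2))
data.columns = ['one', 'two']  # flat list -> plain Index

print(isinstance(data.columns, pandas.MultiIndex))  # False
print(type(data['one']).__name__)                   # Series

criterion = data['one'] > 0
print(criterion.shape)                              # (5,)

# A 1-D boolean Series selects whole rows, so no NaN appears
print(data[criterion])
```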
However, if I change the dimensionality of the column structure (creating a MultiIndex), then I can recreate the NaN syndrome you describe:
data.columns = [ [ 'one', 'two' ] ] # note the double-nesting: this creates a one-level MultiIndex
print(data) # it "looks" identical to how it did before...
# Output:
# one two
# 0 -1.242567 0.430084
# 1 -1.125710 -0.342616
# 2 -0.514284 0.479382
# 3 0.108649 -0.789272
# 4 1.489155 0.842427
criterion = data[ 'one' ] > 0 # but this criterion.shape is now (5,1): it's two-dimensional...
print( data[ criterion ] )
# Output:
# one two
# 0 NaN NaN
# 1 NaN NaN
# 2 NaN NaN
# 3 0.108649 NaN
# 4 1.489155 NaN
So the outcome depends on a (superficially invisible) detail of your DataFrame's column structure. It's very surprising to me that there was no warning or exception when you performed the slicing, and I can't imagine any context in which the NaN-ridden result would be the sensible, expected outcome.
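The behaviour follows from two pandas rules, sketched below with the same random-data setup. First, with a one-level MultiIndex, selecting a label cannot drop the only level, so `data['one']` returns a one-column DataFrame rather than a Series. Second, indexing a DataFrame with a boolean DataFrame is treated as `DataFrame.where`, which masks cell by cell after aligning on column labels; column `two` has no counterpart in the mask, so it is masked entirely.

```python
import numpy
import pandas

data = pandas.DataFrame(numpy.random.randn(5, 2))
data.columns = [['one', 'two']]  # nested list -> one-level MultiIndex

print(isinstance(data.columns, pandas.MultiIndex))  # True
print(data.columns.nlevels)                         # 1

# With a one-level MultiIndex, selecting a label returns a one-column DataFrame
sub = data['one']
print(type(sub).__name__, sub.shape)                # DataFrame (5, 1)

criterion = sub > 0

# A boolean DataFrame as the key behaves like DataFrame.where: cells are kept
# where the aligned mask is True and become NaN elsewhere.
result = data[criterion]
print(result.equals(data.where(criterion)))         # True

# Column 'two' is absent from the mask, so every cell in it is NaN
print(result.iloc[:, 1].isna().all())               # True
```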
Anyway, the problem can be circumvented by reshaping the array you're using to index your data, so that its shape is (5,) again:
print( data[ criterion.values.flatten() ] ) # back to sanity
# Output:
# one two
# 3 0.108649 -0.789272
# 4 1.489155 0.842427
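Reshaping the mask is one option. The sketch below (same random-data setup, rebuilt so it runs on its own) shows a few equivalent ways to turn the one-column boolean DataFrame back into a one-dimensional row selector. `.to_numpy().ravel()` is the more recent spelling of `.values.flatten()`, while `.iloc[:, 0]` keeps the mask as a Series aligned on the row index.

```python
import numpy
import pandas

data = pandas.DataFrame(numpy.random.randn(5, 2))
data.columns = [['one', 'two']]  # one-level MultiIndex, as above
criterion = data['one'] > 0      # boolean DataFrame of shape (5, 1)

# Option 1: flatten the mask into a 1-D NumPy boolean array
rows_a = data[criterion.to_numpy().ravel()]

# Option 2: take the mask's single column as a boolean Series
rows_b = data[criterion.iloc[:, 0]]

# Option 3: the same Series, passed explicitly to .loc for row selection
rows_c = data.loc[criterion.iloc[:, 0]]

print(rows_a.equals(rows_b) and rows_b.equals(rows_c))  # True
```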
However, if you don't need any particular MultiIndex behavior provided by your existing column structure, then the more elegant solution (indicated by your comment) may simply be to reassign data.columns so that it is a flat list to start with.
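Applied to the question's own sample, a sketch under these assumptions: the column names are exactly those in the data sample, the frame's columns are a one-level MultiIndex (as the NaN output suggests), and the dates are stored as strings in the format shown. `get_level_values(0)` is one way to flatten a one-level MultiIndex; reassigning a flat list of names, as the asker did, works just as well. The cutoff is written in 24-hour time, so 01:26PM becomes 13:26.

```python
import pandas

# Hypothetical reconstruction of the question's sample, deliberately given
# nested column names so that it has a one-level MultiIndex.
data = pandas.DataFrame(
    [
        [12, 'Mike', '2018-11-14 01:26PM'],
        [13, 'Mark', '2018-11-14 02:00PM'],
        [14, 'Taylor', '2018-11-14 03:26PM'],
        [14, 'Taylor', '2018-11-15 03:26PM'],
    ]
)
data.columns = [['ID', 'Name', 'submission_date']]

# Diagnose and flatten: a one-level MultiIndex makes data['submission_date']
# a DataFrame, which is what produces the NaN-filled result.
if isinstance(data.columns, pandas.MultiIndex) and data.columns.nlevels == 1:
    data.columns = data.columns.get_level_values(0)

# Parse the strings as datetimes; %I (12-hour clock) lets %p take effect
data['submission_date'] = pandas.to_datetime(
    data['submission_date'], format='%Y-%m-%d %I:%M%p'
)

cutoff = pandas.Timestamp('2018-11-14 13:26')
result = data[data['submission_date'] > cutoff]
print(result)
```

Mike's row is excluded because its timestamp equals the cutoff rather than exceeding it.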
edited Nov 23 at 3:13
answered Nov 20 at 20:07 by jez
I renamed the columns with single square brackets, and it works as you mentioned above. Very helpful.
– mark, Nov 20 at 20:22
NB: digging a little deeper, it seems that nested lists of columns create (and whatever routine created your DataFrame might otherwise have set up) a MultiIndex object instead of an Index in data.columns. I don't know enough yet to know what a MultiIndex is capable of, but you might want to make sure that you're not throwing away some essential functionality that it provides.
– jez, Nov 20 at 20:52