Efficiently Reorder DataFrame of Lists/Pairings
I have an efficiency question. I have a dataframe whose cells are lists. Each list holds a value and a string describing that value (I assumed a list would be the easiest way to keep each pairing together when sorting). Within each row, I need to reorder the lists so that the highest value is on the left and the lowest on the right. I have found a solution, but since I am a newer programmer, I would like to know whether there is a quicker way to do this without iterating through the index. Any feedback is welcome. The only requirement is that the final result is a dataframe in which each value is immediately followed by its string descriptor (the descriptor could sit in its own adjacent column; it doesn't need to be in a list).
Starting DF:
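import pandas as pd
import numpy as np

# Two rows of [value, descriptor] pairs; the shorter row is padded with NaN.
# np.nan is used because the np.NaN alias was removed in NumPy 2.0.
master_stop = pd.DataFrame([[[56, 'Support'], [58, 'MA']],
                            [[24.4, 'Support'], [23.3, 'MA'], [25, 'MA']]],
                           ['Symbol_1', 'Symbol_2']).fillna(np.nan)
master_stop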
Out[2]:
                        0           1         2
Symbol_1    [56, Support]    [58, MA]       NaN
Symbol_2  [24.4, Support]  [23.3, MA]  [25, MA]
Sorting Method That I'm Looking to Improve:
def sort_df():
    # Sort each row's cells in descending order and write them back in place.
    for index in master_stop.index:
        master_stop.loc[index] = master_stop.loc[index].sort_values(ascending=False).values
Sorted DF:
sort_df()
master_stop
Out[3]:
                 0                1           2
Symbol_1  [58, MA]    [56, Support]         NaN
Symbol_2  [25, MA]  [24.4, Support]  [23.3, MA]
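The last requirement (a descriptor in its own adjacent column) can be met without list cells at all. The following is a minimal sketch, not a tested drop-in for the real data: it assumes the pairings are available in long form, and the names `records`, `long_df`, `wide`, `rank` and the `value_0`/`label_0` column labels are all hypothetical.

```python
import pandas as pd

# Hypothetical long-form input: one record per (symbol, value, descriptor).
records = [
    ('Symbol_1', 56, 'Support'),
    ('Symbol_1', 58, 'MA'),
    ('Symbol_2', 24.4, 'Support'),
    ('Symbol_2', 23.3, 'MA'),
    ('Symbol_2', 25, 'MA'),
]
long_df = pd.DataFrame(records, columns=['symbol', 'value', 'label'])

# Order by symbol, then by value from highest to lowest.
long_df = long_df.sort_values(['symbol', 'value'], ascending=[True, False])

# Number the entries within each symbol: 0 is the highest value.
long_df['rank'] = long_df.groupby('symbol').cumcount()

# Spread the ranks across columns, keeping value and label separate.
wide = long_df.pivot(index='symbol', columns='rank', values=['value', 'label'])

# Interleave the columns so each value is directly followed by its label.
n_slots = int(long_df['rank'].max()) + 1
wide = wide[[(kind, slot) for slot in range(n_slots) for kind in ('value', 'label')]]
wide.columns = [f'{kind}_{slot}' for kind, slot in wide.columns]
```

Keeping the number and the descriptor in separate columns means the sort works on plain numeric data rather than on Python lists. Whether it is faster than the loop depends on the data size, so it is worth timing on the real dataframe.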
python pandas
asked Nov 22 '18 at 19:12 by Whip
edited Nov 22 '18 at 19:41
1 Answer
Using stack, sort_values, sort_index and unstack can do the job. Not in one line, but if you do
master_stack = master_stop.stack().sort_index(level=0, ascending=[True])
master_stop = (pd.Series(data=master_stack.sort_values(ascending=False)
                                          .sort_index(level=0, ascending=[True]).values,
                         index=master_stack.index)
               .unstack())
then master_stop will be sorted as expected
                 0                1           2
Symbol_1  [58, MA]    [56, Support]         NaN
Symbol_2  [25, MA]  [24.4, Support]  [23.3, MA]
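The per-row ordering can also be spelled out explicitly with groupby and cumcount, which avoids relying on how sort_index treats its ascending argument. This is a minimal sketch of that alternative, not the code from the answer above; the names `records`, `long_df`, `slot` and `sorted_stop` are hypothetical, and it assumes every non-missing cell is a `[value, descriptor]` list.

```python
import pandas as pd

master_stop = pd.DataFrame(
    [[[56, 'Support'], [58, 'MA']],
     [[24.4, 'Support'], [23.3, 'MA'], [25, 'MA']]],
    index=['Symbol_1', 'Symbol_2'],
)

# Flatten to one record per non-missing cell: (row label, numeric value, pair).
records = [
    (label, pair[0], pair)
    for label, row in zip(master_stop.index, master_stop.to_numpy())
    for pair in row
    if isinstance(pair, list)  # skips the NaN/None padding cells
]
long_df = pd.DataFrame(records, columns=['row', 'value', 'pair'])

# Highest values first; a stable sort keeps ties in their original order.
long_df = long_df.sort_values('value', ascending=False, kind='mergesort')

# Position of each pair within its own row: 0 is the highest value.
long_df['slot'] = long_df.groupby('row', sort=False).cumcount()

# Back to wide form, restoring the original row order.
sorted_stop = (long_df.pivot(index='row', columns='slot', values='pair')
               .reindex(master_stop.index))
```

Rows that have fewer pairs than the widest row end up padded with NaN on the right, matching the layout shown above.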
This solution works for the two symbols, but when I run it AFTER the code below, which increases the number of instances in master_stop, I get a wild unsorted DF (sorry, I don't know how to indent code in comments).
for i in range(100):
    master_stop.loc[i,0] = [100,'Support']
    master_stop.loc[i,1] = [102,'MA']
– Whip
Nov 22 '18 at 20:24
@Whip indeed, sorry. I fixed my error by adding sort_index; see the edited code. You can also have a look at groupby, but I think using sort_index will be faster if you have a lot of rows in your original dataframe.
– Ben.T
Nov 22 '18 at 20:57
Thank you! I can indeed confirm that your code is an improvement. My original code using an additional 100 entries ran at approx. 144ms, while yours is running at 88ms, providing a substantial improvement. Before I accept your answer, I plan on leaving the question open a bit longer in case anybody else has any unique alternative solutions.
– Whip
Nov 22 '18 at 21:27
@Whip good :) and I would guess that the gain in time will increase with the number of rows.
– Ben.T
Nov 22 '18 at 21:37
Quick follow-up question! Why did you have to put []s around the second 'True' in the second line when you call sort_index? I notice that in the first master_stack line, the []s around True didn't make a difference in the output, but in the second line having the brackets around [True] makes a big difference. I'm guessing it's function-specific, since sort_values didn't require []s around the False... but I couldn't find anything in the pandas documentation.
– Whip
Nov 23 '18 at 17:28
answered Nov 22 '18 at 19:55 by Ben.T
edited Nov 22 '18 at 20:56