Efficiently Reorder DataFrame of Lists/Pairings



























I have an efficiency question. Essentially, I have a DataFrame filled with lists. Each list contains a value and a string describing that value (I assumed a list would be the easiest way to keep the pairings together while sorting). I need to reorder each row separately so that the highest value is on the left and the lowest value is on the right. I have found a solution, but since I am a newer programmer, I wanted to know whether there is a quicker way of doing this without iterating through the index. Please feel free to give any feedback you have. My only requirement is that the final result is a DataFrame in which each value is immediately followed by its string descriptor (the descriptor can be in its own adjacent column; it doesn't need to be in a list).



Starting DF:



import pandas as pd
import numpy as np
master_stop = pd.DataFrame([[[56, 'Support'], [58, 'MA']],
                            [[24.4, 'Support'], [23.3, 'MA'], [25, 'MA']]],
                           index=['Symbol_1', 'Symbol_2']).fillna(np.nan)
master_stop

Out[2]:
                        0          1         2
Symbol_1    [56, Support]   [58, MA]       NaN
Symbol_2  [24.4, Support] [23.3, MA]  [25, MA]


Sorting Method That I'm Looking to Improve:



def sort_df():
    # Sort each row of the global master_stop in descending order, one row at a time
    for index in master_stop.index:
        master_stop.loc[index] = master_stop.loc[index].sort_values(ascending=False).values


Sorted DF:



sort_df()
master_stop
Out[3]:
                 0                1           2
Symbol_1  [58, MA]    [56, Support]         NaN
Symbol_2  [25, MA]  [24.4, Support]  [23.3, MA]









      python pandas

      asked Nov 22 '18 at 19:12 by Whip; edited Nov 22 '18 at 19:41
























          1 Answer
          Using stack, sort_values, sort_index and unstack can do the job. It is not a one-liner, but if you do



          master_stack = master_stop.stack().sort_index(level=0,ascending=[True])
          master_stop = (pd.Series(data = master_stack.sort_values(ascending=False).sort_index(level=0,ascending=[True]).values,
          index = master_stack.index)
          .unstack())


          then master_stop will be sorted as expected



                           0                1           2
          Symbol_1  [58, MA]    [56, Support]         NaN
          Symbol_2  [25, MA]  [24.4, Support]  [23.3, MA]
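The answer's chain can also be written one step at a time, which makes the idea easier to follow: flatten to one entry per cell, rank everything largest first, regroup by row, and write the ranked cells back into the original slots. This is a sketch rather than Ben.T's exact code; the function name `sort_rows_via_stack` is hypothetical, and it uses `sort_index`'s `sort_remaining=False` argument in place of `ascending=[True]` to keep the ranking inside each row.

```python
import numpy as np
import pandas as pd


def sort_rows_via_stack(df):
    """Hypothetical helper: reorder every row so the largest [value, label] pair comes first.

    Assumes each non-missing cell is a [value, label] list, so Python
    compares cells by value first.
    """
    # Long format: one entry per cell, indexed by (row label, column position)
    stacked = df.stack()
    # The slots to fill, in row-then-column order
    slots = stacked.sort_index().index
    # Rank every cell in the whole frame, largest first
    ranked = stacked.sort_values(ascending=False)
    # Regroup by row label only; sort_remaining=False stops pandas from also
    # sorting by column position, which would undo the ranking
    ranked = ranked.sort_index(level=0, sort_remaining=False)
    # Pair the ranked cells with the slots by position, then go back to wide
    return pd.Series(ranked.to_numpy(), index=slots).unstack()


master_stop = pd.DataFrame(
    [[[56, 'Support'], [58, 'MA']],
     [[24.4, 'Support'], [23.3, 'MA'], [25, 'MA']]],
    index=['Symbol_1', 'Symbol_2'],
).fillna(np.nan)
print(sort_rows_via_stack(master_stop))
```

Because Python compares lists element by element, `[58, 'MA']` sorts above `[56, 'Support']`; if two values tie, the labels decide the order.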





          answered Nov 22 '18 at 19:55 by Ben.T; edited Nov 22 '18 at 20:56


























          • This solution works for the two symbols, but when I run it AFTER the code below, which adds more rows to master_stop, I get a wildly unsorted DF (sorry, I don't know how to format code in comments): for i in range(100): master_stop.loc[i,0] = [100,'Support']; master_stop.loc[i,1] = [102,'MA']

            – Whip
            Nov 22 '18 at 20:24
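To check results on a bigger frame like the one in this comment, a small harness can help. It is a sketch: `make_test_frame` and `rows_descending` are hypothetical helpers, not part of the question or answer, and the extra rows use string labels such as `'Extra_0'` instead of the comment's integer labels so that the index holds a single type.

```python
import pandas as pd


def make_test_frame(n_extra):
    """Hypothetical helper: the question's two rows plus n_extra unsorted rows.

    The extra rows mirror the comment's [100, 'Support'], [102, 'MA'] pairs.
    """
    rows = [[[56, 'Support'], [58, 'MA']],
            [[24.4, 'Support'], [23.3, 'MA'], [25, 'MA']]]
    labels = ['Symbol_1', 'Symbol_2']
    for i in range(n_extra):
        rows.append([[100, 'Support'], [102, 'MA']])
        labels.append(f'Extra_{i}')
    # Shorter rows are padded with missing values on the right
    return pd.DataFrame(rows, index=labels)


def rows_descending(df):
    """Hypothetical check: True for each row whose values never increase left to right.

    Assumes non-missing cells are [value, label] lists; missing cells are skipped.
    """
    def is_descending(row):
        vals = [cell[0] for cell in row if isinstance(cell, list)]
        return all(a >= b for a, b in zip(vals, vals[1:]))

    return df.apply(is_descending, axis=1)
```

Running `rows_descending(result).all()` after any sorting approach gives a quick yes/no answer for the whole frame.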













          • @Whip indeed, sorry; I fixed my error by adding sort_index (see the edited code). You could also have a look at groupby, but I think sort_index will be faster if you have a lot of rows in your original dataframe.

            – Ben.T
            Nov 22 '18 at 20:57
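One way to read the groupby suggestion is `groupby(level=0).transform` on the stacked Series. This is an assumption about what was meant, not code from the thread, and `sort_rows_via_groupby` is a hypothetical name. Whether it beats the sort_index version depends on the data; no timing is claimed here.

```python
import numpy as np
import pandas as pd


def sort_rows_via_groupby(df):
    """Hypothetical helper: rank the cells of each row with groupby().transform."""
    # Long format: one entry per cell, indexed by (row label, column position)
    stacked = df.stack()
    # For each row label, sort its cells largest first and return a bare array,
    # so transform writes them back into the group's original slots by position
    # (returning a Series would realign on the index and undo the sort)
    ranked = stacked.groupby(level=0).transform(
        lambda cells: cells.sort_values(ascending=False).to_numpy()
    )
    return ranked.unstack()
```

It can be called directly on `master_stop` from the question.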








          • Thank you! I can confirm that your code is an improvement. With an additional 100 entries, my original code ran in approx. 144 ms, while yours runs in 88 ms, a substantial improvement. Before I accept your answer, I plan to leave the question open a bit longer in case anybody has other alternative solutions.

            – Whip
            Nov 22 '18 at 21:27
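To repeat this kind of comparison on other data, a small `timeit` wrapper is enough. This is a sketch under the assumption that each approach is wrapped in a function that takes a frame; `time_sorter` is a hypothetical name, and no timings are reported here.

```python
import timeit


def time_sorter(sorter, df, repeat=5, number=10):
    """Hypothetical helper: best average seconds per call of sorter on a fresh copy of df.

    Copying first keeps in-place sorters (like the question's loop) from being
    timed on already-sorted data; the copy cost is included in every timing.
    """
    timer = timeit.Timer(lambda: sorter(df.copy()))
    # repeat() returns the total time of `number` calls for each of `repeat` runs
    return min(timer.repeat(repeat=repeat, number=number)) / number
```

For example, `time_sorter(my_sorter, frame)` returns seconds per call, where `my_sorter` stands for any function that accepts a DataFrame. The 144 ms and 88 ms figures in the comment come from the commenter's own setup.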











          • @Whip good :) and I would guess that the gain in time will increase with the number of rows.

            – Ben.T
            Nov 22 '18 at 21:37
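If the number of rows keeps growing, another option (not from the thread) is to use the layout the question already allows: move each descriptor into its own column and sort every row at once with NumPy's `np.argsort(..., axis=1)` and `np.take_along_axis`. A minimal sketch, assuming every non-missing cell is a two-element `[value, label]` list; `sort_pairs_numpy` and the `value_j`/`label_j` column names are hypothetical. Splitting the cells still takes one Python pass, so whether it is faster in practice would need measuring.

```python
import numpy as np
import pandas as pd


def sort_pairs_numpy(df):
    """Hypothetical helper: sort each row by value with NumPy, no per-row pandas call.

    Assumes each non-missing cell is a [value, label] list. Returns a frame with
    columns value_0, label_0, value_1, label_1, ... (names are illustrative).
    """
    cells = df.to_numpy()
    # One pass over the cells to split them into a float array and a label array
    values = np.array(
        [[c[0] if isinstance(c, list) else np.nan for c in row] for row in cells],
        dtype=float,
    )
    labels = np.array(
        [[c[1] if isinstance(c, list) else None for c in row] for row in cells],
        dtype=object,
    )
    # Negate for descending order; NaN stays NaN and argsort puts it last
    order = np.argsort(-values, axis=1, kind='stable')
    sorted_values = np.take_along_axis(values, order, axis=1)
    sorted_labels = np.take_along_axis(labels, order, axis=1)
    # Interleave so each value column is immediately followed by its label column
    out = pd.DataFrame(index=df.index)
    for j in range(values.shape[1]):
        out[f'value_{j}'] = sorted_values[:, j]
        out[f'label_{j}'] = sorted_labels[:, j]
    return out
```

The sort itself is a single vectorized call over the whole array rather than one pandas call per row.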











          • Quick follow-up question! Why did you have to put []s around the second True in the second line, where you call sort_index? In the first master_stack line the []s around True didn't make a difference to the output, but in the second line having the brackets around [True] makes a big difference. I'm guessing it's function-specific, since sort_values didn't require []s around the False... but I couldn't find anything in the pandas documentation.

            – Whip
            Nov 23 '18 at 17:28
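The thread doesn't answer this, but a likely explanation is the `sort_remaining` argument of `sort_index`. With `level=0` and the default `sort_remaining=True`, pandas also sorts by the other index level (the original column position), which puts each row back in column order and undoes `sort_values`. Passing `ascending` as a list appears to skip that extra sort in pandas' implementation, which would explain the difference, but that looks like an implementation detail; `sort_remaining=False` requests the level-only sort explicitly. In the first line the stacked index is already in row-then-column order, so either form gives the same result there. The toy Series below is hypothetical and only illustrates the difference.

```python
import pandas as pd

# Hypothetical stacked Series: (row label, column position) -> value
s = pd.Series(
    [3, 1, 2, 9, 7],
    index=pd.MultiIndex.from_tuples(
        [('A', 0), ('A', 1), ('A', 2), ('B', 0), ('B', 1)]
    ),
)
ranked = s.sort_values(ascending=False)  # 9, 7, 3, 2, 1 across both rows

# Default sort_remaining=True also sorts by level 1, restoring column order
by_row_and_column = ranked.sort_index(level=0)

# Sorting by level 0 alone keeps the descending order inside each row
by_row_only = ranked.sort_index(level=0, sort_remaining=False)

print(by_row_and_column.tolist())  # [3, 1, 2, 9, 7]
print(by_row_only.tolist())        # [3, 2, 1, 9, 7]
```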












