How to count occurrences of true positives using pandas or numpy?
I have two columns, Prediction and GroundTruth.
I want to get a count of true positives as a series using either numpy or pandas.

For example, my data is:

Prediction  GroundTruth
True        True
True        False
True        True
False       True
False       False
True        True

I want a list with the following output:

tp_list = [1, 1, 2, 2, 2, 3]

Is there a one-liner way to do this in numpy or pandas?

Currently, this is my solution:

tp = 0
tp_list = []
for p, g in zip(data.Prediction, data.GroundTruth):
    if p and g:  # TP case
        tp = tp + 1
    tp_list.append(tp)
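For reference, here is a self-contained sketch that reproduces the example data and the baseline loop (the DataFrame construction and the `df` variable name are assumptions for illustration; note the append sits outside the if, so every row records the running total):

```python
import pandas as pd

# Reconstruct the example data from the question.
df = pd.DataFrame({
    "Prediction":  [True, True, True, False, False, True],
    "GroundTruth": [True, False, True, True, False, True],
})

# Baseline loop: increment on a true positive, but append the
# running total for every row.
tp = 0
tp_list = []
for p, g in zip(df.Prediction, df.GroundTruth):
    if p and g:  # TP case
        tp = tp + 1
    tp_list.append(tp)

print(tp_list)  # → [1, 1, 2, 2, 2, 3]
```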









python pandas numpy

edited Nov 21 at 2:36 by RafaelC
asked Nov 21 at 2:18 by Raashid
3 Answers
To get a running count (i.e., a cumulative sum) of true positives, i.e., rows where both Prediction and GroundTruth are True, the solution is a modification of @RafaelC's answer:

(df['Prediction'] & df['GroundTruth']).cumsum()
0    1
1    1
2    2
3    2
4    2
5    3
dtype: int64

(df['Prediction'] & df['GroundTruth']).cumsum().tolist()
[1, 1, 2, 2, 2, 3]

answered Nov 21 at 2:29 by Peter Leimbigler
If you want to know how many True predictions are actually True (true positives), use

(df['Prediction'] & df['GroundTruth']).cumsum()

0    1
1    1
2    2
3    2
4    2
5    3
dtype: int64

(thanks @Peter Leimbigler for chiming in)

If you want to know how many predictions you got right overall, just compare the columns and use cumsum:

(df['Prediction'] == df['GroundTruth']).cumsum()

which outputs

0    1
1    1
2    2
3    2
4    3
5    4
dtype: int64

You can always get a list by using .tolist():

(df['Prediction'] == df['GroundTruth']).cumsum().tolist()

[1, 1, 2, 2, 3, 4]

edited Nov 21 at 2:29, answered Nov 21 at 2:20 by RafaelC
• df['Prediction'] == df['GroundTruth'] gives true positives plus true negatives, basically the accuracy score before dividing by the number of predictions. The running count of true positives is given by df['Prediction'] & df['GroundTruth']. – Peter Leimbigler, Nov 21 at 2:22

• If the ground truth is False and the prediction is False, what do you call this outcome? The predictor correctly classified a negative instance. It's a true negative. – Peter Leimbigler, Nov 21 at 2:25

• @RafaelC, the distinction between "correct prediction" and "correct prediction of a negative outcome" is important :) en.wikipedia.org/wiki/Confusion_matrix – Peter Leimbigler, Nov 21 at 2:30

• Thanks for chiming in, guys :) Got the point. – RafaelC, Nov 21 at 2:31

• Thanks a ton, guys! You are too fast. I posted this question right before boarding a flight and had an answer after boarding! – Raashid, Nov 21 at 13:42
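As the comments note, the same elementwise boolean logic extends to all four confusion-matrix cells. A minimal sketch, assuming boolean columns and a df built from the question's example data (the variable names here are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Prediction":  [True, True, True, False, False, True],
    "GroundTruth": [True, False, True, True, False, True],
})

pred, truth = df["Prediction"], df["GroundTruth"]

# Each cell is an elementwise boolean combination; .sum() counts the Trues.
tp = (pred & truth).sum()    # predicted True,  actually True
tn = (~pred & ~truth).sum()  # predicted False, actually False
fp = (pred & ~truth).sum()   # predicted True,  actually False
fn = (~pred & truth).sum()   # predicted False, actually True

print(tp, tn, fp, fn)  # → 3 1 1 1
```

Note that tp + tn here (3 + 1) matches the final value of the == cumsum above, which counts all correct predictions.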



















Maybe you can use all:

df.all(axis=1).cumsum().tolist()
Out[156]: [1, 1, 2, 2, 2, 3]

numpy solution:

np.cumsum(np.all(df.values, axis=1))
Out[159]: array([1, 1, 2, 2, 2, 3], dtype=int32)

answered Nov 21 at 2:48 by W-B
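If the data already lives in plain numpy arrays rather than a DataFrame, the same idea needs no pandas at all. A sketch under that assumption (the array names are illustrative):

```python
import numpy as np

prediction   = np.array([True, True, True, False, False, True])
ground_truth = np.array([True, False, True, True, False, True])

# Elementwise AND marks the true positives; cumsum turns the
# boolean mask into a running count.
tp_running = np.cumsum(prediction & ground_truth)

print(tp_running.tolist())  # → [1, 1, 2, 2, 2, 3]
```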




