Compute how many samples have been improved, according to a minimum threshold or confidence interval, in a...
I have the following dataframe:



ID      VAL1    VAL2
Q2241   0.3333  0.3353
Q2242   0.5     0.5
Q2243   0.3333  0.3333
Q2244   0.2137  0.4792
Q2245   0.1429  0.2
Q2246   0.5     0.5
Q2247   0.4167  0.6667
Q2248   1       1
Q2249   0.125   0.0909
Q2250   0.2     0.2
Q2251   0.325   0.2667
Q2252   0.1667  0.2
Q2253   0.3333  0.25
Q2254   0.45    0.8333
Q2255   0.3333  0.5
Q2256   1       1
Q2257   0.5     0.51
Q2258   0.3929  0.3333
Q2259   0.3611  0.625


Is there a way to correctly compute the number of samples (IDs) for which VAL2 is significantly higher or lower than VAL1 in a given dataframe?
I'm looking for something like a t-test, a measure that produces results like the following example:



Win  Tie  Loss
 64   36   137


where:




Win: number of IDs where VAL2 is higher than VAL1 beyond some confidence threshold
Tie: number of IDs where VAL2 ≈ VAL1 (no significant difference, e.g. within 0.0001)
Loss: number of IDs where VAL2 is lower than VAL1 beyond some confidence threshold
      python dataframe statistics difference
      asked Nov 24 '18 at 10:28
Belkacem Thiziri

      69111




      69111
























          1 Answer
    import pandas as pd  # assumes df already holds the ID/VAL1/VAL2 table

    tol = 0.0001
    # Count a win/loss only when the gap exceeds the tolerance; otherwise it's a tie.
    win = (df.VAL2 > (df.VAL1 + tol)).sum()
    loss = (df.VAL2 < (df.VAL1 - tol)).sum()
    tie = ((df.VAL1 - df.VAL2).abs() <= tol).sum()

    result = pd.DataFrame([{'Win': win, 'Tie': tie, 'Loss': loss}])
    print(result)
    # Loss Tie Win
    # 0 4 6 9
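The same thresholded counting can be checked without pandas; here is a plain-Python sketch over the 19 (VAL1, VAL2) pairs transcribed from the question's table:

```python
# (VAL1, VAL2) pairs transcribed from the question's table.
rows = [
    (0.3333, 0.3353), (0.5, 0.5), (0.3333, 0.3333), (0.2137, 0.4792),
    (0.1429, 0.2), (0.5, 0.5), (0.4167, 0.6667), (1, 1),
    (0.125, 0.0909), (0.2, 0.2), (0.325, 0.2667), (0.1667, 0.2),
    (0.3333, 0.25), (0.45, 0.8333), (0.3333, 0.5), (1, 1),
    (0.5, 0.51), (0.3929, 0.3333), (0.3611, 0.625),
]

tol = 0.0001
# Summing booleans counts the rows that satisfy each condition.
win = sum(v2 > v1 + tol for v1, v2 in rows)
tie = sum(abs(v1 - v2) <= tol for v1, v2 in rows)
loss = sum(v2 < v1 - tol for v1, v2 in rows)
print(win, tie, loss)  # 9 6 4
```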
• Thanks @Ghilas BELHADJ, I already tried something like that, but I was wondering whether there is a specific statistical method for this kind of statistic, some standard test like the t-test?

  – Belkacem Thiziri
  Nov 24 '18 at 12:31

• There is a list of software implementations at the bottom of the Wikipedia page. So basically, you can use scipy.stats.ttest_ind in Python.

  – Ghilas BELHADJ
  Nov 24 '18 at 12:37

• The t-test evaluates the significance of the difference between two distributions, but it does not tell you how many samples are significantly different. A better solution would combine the significance probability given by the t-test with something else to count those samples. Do you think computing a t-test value for each line of my dataframe would make sense?

  – Belkacem Thiziri
  Nov 24 '18 at 12:51

• Probably not, but I'll let you know if I find something.

  – Ghilas BELHADJ
  Nov 24 '18 at 13:38

• Okay, thanks. I'll discuss that with my advisors and then leave a comment here.

  – Belkacem Thiziri
  Nov 24 '18 at 13:41
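Since VAL1 and VAL2 are paired per ID, the test the comments gesture at would be the paired variant, scipy.stats.ttest_rel. Note it tests whether the *average* difference is significant, not how many individual IDs differ, which is the limitation raised above. A minimal standard-library sketch of what it computes, on the question's sample rows (the stdlib implementation is illustrative, not the answer's code):

```python
import math

# Paired (VAL1, VAL2) samples from the question, one tuple per ID.
pairs = [
    (0.3333, 0.3353), (0.5, 0.5), (0.3333, 0.3333), (0.2137, 0.4792),
    (0.1429, 0.2), (0.5, 0.5), (0.4167, 0.6667), (1, 1),
    (0.125, 0.0909), (0.2, 0.2), (0.325, 0.2667), (0.1667, 0.2),
    (0.3333, 0.25), (0.45, 0.8333), (0.3333, 0.5), (1, 1),
    (0.5, 0.51), (0.3929, 0.3333), (0.3611, 0.625),
]

def paired_t(pairs):
    """Return (t statistic, degrees of freedom) for a paired t-test."""
    diffs = [b - a for a, b in pairs]
    n = len(diffs)
    mean = sum(diffs) / n
    # Sample variance of the differences (Bessel's correction, n - 1).
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    t = mean / math.sqrt(var / n)
    return t, n - 1

t, dof = paired_t(pairs)
print(t, dof)  # t > 0 here: VAL2 is higher than VAL1 on average
```

A p-value would then come from the t distribution with `dof` degrees of freedom, which is what `scipy.stats.ttest_rel` returns directly.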
          answered Nov 24 '18 at 11:20
Ghilas BELHADJ