Split multiple pandas dataframes according to thresholds and produce a count of binary classes between...



























I have a series of dataframes, each containing a year's worth of continuous rainfall data and a binary flag indicating whether or not a flood occurred. For each rainfall interval (defined by thresholds), I want to count the days on which a flood did and did not occur. My data looks a bit like this:



date        ppt    fld
01/01/2016  0.23   0
02/01/2016  1.6    0
03/01/2016  10.5   1
04/01/2016  25.4   1
05/01/2016  0.3    0
06/01/2016  6.5    1
07/01/2016  11.2   1
08/01/2016  5.5    0
...


I have applied the following code to split a single dataframe using a mask:



# Column name 'ppt' follows the sample data shown above
mask5 = df['ppt'] >= 5
ppt5 = df[~mask5]    # under 5 mm
ppt5p = df[mask5]    # 5 mm and over

mask10 = ppt5p['ppt'] >= 10
ppt10 = ppt5p[~mask10]    # 5-10 mm
ppt10p = ppt5p[mask10]    # 10 mm and over

mask20 = ppt10p['ppt'] >= 20
ppt20 = ppt10p[~mask20]    # 10-20 mm
ppt20p = ppt10p[mask20]    # 20 mm and over


I then used the following to produce counts for each interval:



print(ppt5['fld'].value_counts())    # under 5 mm
print(ppt10['fld'].value_counts())   # 5-10 mm
print(ppt20['fld'].value_counts())   # 10-20 mm
print(ppt20p['fld'].value_counts())  # 20 mm and over


Which produces the following:



0.0    3
1.0    0
Name: SzT, dtype: int64
0.0    1
1.0    1
Name: SzT, dtype: int64
0.0    0
1.0    2
Name: SzT, dtype: int64
0.0    0
1.0    1
Name: SzT, dtype: int64


This tells me that no floods occurred on any of the three days with less than 5 mm of rain; of the two days with 5-10 mm, one had a flood and one did not; a flood occurred on both days with 10-20 mm; and a flood occurred on the one day with over 20 mm. Great stuff.
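For reference, the same four-band tally can probably be produced in a single pass rather than with nested masks. The following is only a sketch, assuming the column names `ppt` and `fld` from the sample rows shown earlier; the `ppt_band` column and the band labels are made up for illustration. It uses `pd.cut` with `right=False` so that each band includes its lower threshold, matching the `>=` comparisons in the masks.

```python
import numpy as np
import pandas as pd

# The eight sample rows from the question (column names assumed: 'ppt', 'fld').
df = pd.DataFrame({
    "ppt": [0.23, 1.6, 10.5, 25.4, 0.3, 6.5, 11.2, 5.5],
    "fld": [0, 0, 1, 1, 0, 1, 1, 0],
})

# Band edges: [-inf, 5), [5, 10), [10, 20), [20, inf).
# right=False closes each band on the left, like the >= masks.
bins = [-np.inf, 5, 10, 20, np.inf]
labels = ["<5mm", "5-10mm", "10-20mm", ">=20mm"]  # hypothetical label names

df["ppt_band"] = pd.cut(df["ppt"], bins=bins, labels=labels, right=False)

# Rows are rainfall bands, columns are the flood flag (0 or 1).
counts = pd.crosstab(df["ppt_band"], df["fld"])
print(counts)
```

On these eight rows the table should agree with the counts listed above: three non-flood days under 5 mm, one of each for 5-10 mm, two flood days for 10-20 mm, and one flood day at 20 mm and over.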



However, I have 20 dataframes to process this way. Does anyone have ideas for how I might speed this up or do it more efficiently?



Thanks so much
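One possible direction, sketched here under assumptions rather than offered as a definitive answer: wrap the binning in a small function and apply it to every dataframe held in a dict, then stack the results. The names `flood_counts`, `BINS`, `LABELS`, `site_a` and `site_b` are hypothetical, and the two tiny frames stand in for the 20 real ones. The columns are assumed to be called `ppt` and `fld` as in the sample data.

```python
import numpy as np
import pandas as pd

# Hypothetical band definitions; edges match the 5/10/20 mm thresholds.
BINS = [-np.inf, 5, 10, 20, np.inf]
LABELS = ["<5mm", "5-10mm", "10-20mm", ">=20mm"]


def flood_counts(df, ppt_col="ppt", fld_col="fld"):
    """Count flood / no-flood days per rainfall band for one dataframe."""
    # right=False makes each band include its lower edge (like >= masks).
    band = pd.cut(df[ppt_col], bins=BINS, labels=LABELS, right=False)
    table = pd.crosstab(band, df[fld_col])
    # Reindex so every band and both flag values appear, even with zero days.
    return table.reindex(index=LABELS, columns=[0, 1], fill_value=0)


# Stand-ins for the real dataframes (made-up keys, rows taken from the sample).
frames = {
    "site_a": pd.DataFrame({"ppt": [0.23, 1.6, 10.5, 25.4], "fld": [0, 0, 1, 1]}),
    "site_b": pd.DataFrame({"ppt": [0.3, 6.5, 11.2, 5.5], "fld": [0, 1, 1, 0]}),
}

# One table per frame, stacked under an outer index level named 'frame'.
summary = pd.concat(
    {name: flood_counts(df) for name, df in frames.items()},
    names=["frame", "ppt_band"],
)
print(summary)
```

Keeping the thresholds in one list means the bands can be changed in a single place, and the `reindex` step keeps every per-frame table the same shape so they stack cleanly.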










      python pandas
















      asked Nov 23 '18 at 15:33









      SHV_la























