Drop rows based on float length in Python












2















I have a DataFrame with zip codes, among other things. The data, as a sample, looks like this:



     Zip    Item1     Item2   Item3
78264.0 pan elephant blue
73909.0 steamer panda yellow
2602.0 pot rhino orange
59661.0 fork zebra green
861893.0 sink ocelot red
77892.0 spatula doggie brown


Some of these zip codes are invalid, having either too many or too few digits. I'm trying to remove those rows that have an invalid number of characters/digits (seven characters in this case, because I am checking length based on str() and the .0 is included in there). The following lengths loop:



zips = mydata.iloc[:,0].astype(str)
lengths =
for i in zips:
lengths.append(len(i))


produces a series (not to be confused with Series, although maybe it is--I'm new at Python) of zip code character lengths for each row. I am then trying to subset the DataFrame based on the information from the lengths variable. I tried a couple of different ways; this following was the latest version:



for i in lengths.index(i):
if mydata.iloc[i:,0] != 7:
mydata.iloc[i:,0].drop()


Naturally, this fails, with a ValueError: '44114.0' is not in list error. Can anyone give some advice as to how to do what I'm trying to accomplish?










share|improve this question

























  • if the zip codes are stored as floats, containing too few digits might be because they start with 0 (which is valid for zip codes), but those leading 0s get dropped for floats

    – Henry Woody
    Nov 26 '18 at 2:36













  • can you post a snippet of your data for example

    – Henry Woody
    Nov 26 '18 at 2:37











  • Correct. It's part of what I factored into my analysis. It's how I received the data, and it's part of why I now need to fix that. The data is damaged, but it is what it is.

    – Yehuda
    Nov 26 '18 at 2:37
















2















I have a DataFrame with zip codes, among other things. The data, as a sample, looks like this:



     Zip    Item1     Item2   Item3
78264.0 pan elephant blue
73909.0 steamer panda yellow
2602.0 pot rhino orange
59661.0 fork zebra green
861893.0 sink ocelot red
77892.0 spatula doggie brown


Some of these zip codes are invalid, having either too many or too few digits. I'm trying to remove those rows that have an invalid number of characters/digits (seven characters in this case, because I am checking length based on str() and the .0 is included in there). The following lengths loop:



zips = mydata.iloc[:,0].astype(str)
lengths =
for i in zips:
lengths.append(len(i))


produces a series (not to be confused with Series, although maybe it is--I'm new at Python) of zip code character lengths for each row. I am then trying to subset the DataFrame based on the information from the lengths variable. I tried a couple of different ways; this following was the latest version:



for i in lengths.index(i):
if mydata.iloc[i:,0] != 7:
mydata.iloc[i:,0].drop()


Naturally, this fails, with a ValueError: '44114.0' is not in list error. Can anyone give some advice as to how to do what I'm trying to accomplish?










share|improve this question

























  • if the zip codes are stored as floats, containing too few digits might be because they start with 0 (which is valid for zip codes), but those leading 0s get dropped for floats

    – Henry Woody
    Nov 26 '18 at 2:36













  • can you post a snippet of your data for example

    – Henry Woody
    Nov 26 '18 at 2:37











  • Correct. It's part of what I factored into my analysis. It's how I received the data, and it's part of why I now need to fix that. The data is damaged, but it is what it is.

    – Yehuda
    Nov 26 '18 at 2:37














2












2








2








I have a DataFrame with zip codes, among other things. The data, as a sample, looks like this:



     Zip    Item1     Item2   Item3
78264.0 pan elephant blue
73909.0 steamer panda yellow
2602.0 pot rhino orange
59661.0 fork zebra green
861893.0 sink ocelot red
77892.0 spatula doggie brown


Some of these zip codes are invalid, having either too many or too few digits. I'm trying to remove those rows that have an invalid number of characters/digits (seven characters in this case, because I am checking length based on str() and the .0 is included in there). The following lengths loop:



zips = mydata.iloc[:,0].astype(str)
lengths =
for i in zips:
lengths.append(len(i))


produces a series (not to be confused with Series, although maybe it is--I'm new at Python) of zip code character lengths for each row. I am then trying to subset the DataFrame based on the information from the lengths variable. I tried a couple of different ways; this following was the latest version:



for i in lengths.index(i):
if mydata.iloc[i:,0] != 7:
mydata.iloc[i:,0].drop()


Naturally, this fails, with a ValueError: '44114.0' is not in list error. Can anyone give some advice as to how to do what I'm trying to accomplish?










share|improve this question
















I have a DataFrame with zip codes, among other things. The data, as a sample, looks like this:



     Zip    Item1     Item2   Item3
78264.0 pan elephant blue
73909.0 steamer panda yellow
2602.0 pot rhino orange
59661.0 fork zebra green
861893.0 sink ocelot red
77892.0 spatula doggie brown


Some of these zip codes are invalid, having either too many or too few digits. I'm trying to remove those rows that have an invalid number of characters/digits (seven characters in this case, because I am checking length based on str() and the .0 is included in there). The following lengths loop:



zips = mydata.iloc[:,0].astype(str)
lengths =
for i in zips:
lengths.append(len(i))


produces a series (not to be confused with Series, although maybe it is--I'm new at Python) of zip code character lengths for each row. I am then trying to subset the DataFrame based on the information from the lengths variable. I tried a couple of different ways; this following was the latest version:



for i in lengths.index(i):
if mydata.iloc[i:,0] != 7:
mydata.iloc[i:,0].drop()


Naturally, this fails, with a ValueError: '44114.0' is not in list error. Can anyone give some advice as to how to do what I'm trying to accomplish?







python-3.x pandas dataframe subset






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 26 '18 at 2:44







Yehuda

















asked Nov 26 '18 at 2:33









YehudaYehuda

189113




189113













  • if the zip codes are stored as floats, containing too few digits might be because they start with 0 (which is valid for zip codes), but those leading 0s get dropped for floats

    – Henry Woody
    Nov 26 '18 at 2:36













  • can you post a snippet of your data for example

    – Henry Woody
    Nov 26 '18 at 2:37











  • Correct. It's part of what I factored into my analysis. It's how I received the data, and it's part of why I now need to fix that. The data is damaged, but it is what it is.

    – Yehuda
    Nov 26 '18 at 2:37



















  • if the zip codes are stored as floats, containing too few digits might be because they start with 0 (which is valid for zip codes), but those leading 0s get dropped for floats

    – Henry Woody
    Nov 26 '18 at 2:36













  • can you post a snippet of your data for example

    – Henry Woody
    Nov 26 '18 at 2:37











  • Correct. It's part of what I factored into my analysis. It's how I received the data, and it's part of why I now need to fix that. The data is damaged, but it is what it is.

    – Yehuda
    Nov 26 '18 at 2:37

















if the zip codes are stored as floats, containing too few digits might be because they start with 0 (which is valid for zip codes), but those leading 0s get dropped for floats

– Henry Woody
Nov 26 '18 at 2:36







if the zip codes are stored as floats, containing too few digits might be because they start with 0 (which is valid for zip codes), but those leading 0s get dropped for floats

– Henry Woody
Nov 26 '18 at 2:36















can you post a snippet of your data for example

– Henry Woody
Nov 26 '18 at 2:37





can you post a snippet of your data for example

– Henry Woody
Nov 26 '18 at 2:37













Correct. It's part of what I factored into my analysis. It's how I received the data, and it's part of why I now need to fix that. The data is damaged, but it is what it is.

– Yehuda
Nov 26 '18 at 2:37





Correct. It's part of what I factored into my analysis. It's how I received the data, and it's part of why I now need to fix that. The data is damaged, but it is what it is.

– Yehuda
Nov 26 '18 at 2:37












3 Answers
3






active

oldest

votes


















1














You can write this more concisely using Pandas filtering rather than loops and ifs.



Here is an example:



valid_zips = mydata[mydata.astype(str).str.len() == 7]


or



zip_code_upper_bound = 100000
valid_zips = mydata[mydata < zip_code_upper_bound]


assuming fractional numbers are not included in your set. Note that the first example will remove shorter zips, while the second will leave them in, which you might want as they could have had leading zeros.



Sample output:



With df defined as (from your example):



        Zip    Item1     Item2   Item3
0 78264.0 pan elephant blue
1 73909.0 steamer panda yellow
2 2602.0 pot rhino orange
3 59661.0 fork zebra green
4 861893.0 sink ocelot red
5 77892.0 spatula doggie brown


Using the following code:



df[df.Zip.astype(str).str.len() == 7]


The result is:



       Zip    Item1     Item2   Item3
0 78264.0 pan elephant blue
1 73909.0 steamer panda yellow
3 59661.0 fork zebra green
5 77892.0 spatula doggie brown





share|improve this answer


























  • I do these index functions in R all the time, but I'm not familiar enough with Python's index functions yet. I'm going to be referencing this question many times in the future. Where can I find the documentation for Python bracket indexing and slicing?

    – Yehuda
    Nov 26 '18 at 2:56











  • The slice notation is shorthand for using .where which accepts a boolean series of the same shape and returns rows for which the series is True, you can think of it as getting rows for which the condition (in brackets) is true. Here is a long page on indexing and selecting in Pandas: pandas.pydata.org/pandas-docs/stable/indexing.html

    – Henry Woody
    Nov 26 '18 at 3:03





















1














Using str.len



df[df.iloc[:,0].astype(str).str.len()!=7]
A
1 1.222222
2 1.222200


dput :



df=pd.DataFrame({'A':[1.22222,1.222222,1.2222]})





share|improve this answer































    0














    See if this works



    df1 = df['ZipCode'].astype(str).map(len)==5






    share|improve this answer























      Your Answer






      StackExchange.ifUsing("editor", function () {
      StackExchange.using("externalEditor", function () {
      StackExchange.using("snippets", function () {
      StackExchange.snippets.init();
      });
      });
      }, "code-snippets");

      StackExchange.ready(function() {
      var channelOptions = {
      tags: "".split(" "),
      id: "1"
      };
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function() {
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled) {
      StackExchange.using("snippets", function() {
      createEditor();
      });
      }
      else {
      createEditor();
      }
      });

      function createEditor() {
      StackExchange.prepareEditor({
      heartbeatType: 'answer',
      autoActivateHeartbeat: false,
      convertImagesToLinks: true,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: 10,
      bindNavPrevention: true,
      postfix: "",
      imageUploader: {
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      },
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      });


      }
      });














      draft saved

      draft discarded


















      StackExchange.ready(
      function () {
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53474062%2fdrop-rows-based-on-float-length-in-python%23new-answer', 'question_page');
      }
      );

      Post as a guest















      Required, but never shown

























      3 Answers
      3






      active

      oldest

      votes








      3 Answers
      3






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes









      1














      You can write this more concisely using Pandas filtering rather than loops and ifs.



      Here is an example:



      valid_zips = mydata[mydata.astype(str).str.len() == 7]


      or



      zip_code_upper_bound = 100000
      valid_zips = mydata[mydata < zip_code_upper_bound]


      assuming fractional numbers are not included in your set. Note that the first example will remove shorter zips, while the second will leave them in, which you might want as they could have had leading zeros.



      Sample output:



      With df defined as (from your example):



              Zip    Item1     Item2   Item3
      0 78264.0 pan elephant blue
      1 73909.0 steamer panda yellow
      2 2602.0 pot rhino orange
      3 59661.0 fork zebra green
      4 861893.0 sink ocelot red
      5 77892.0 spatula doggie brown


      Using the following code:



      df[df.Zip.astype(str).str.len() == 7]


      The result is:



             Zip    Item1     Item2   Item3
      0 78264.0 pan elephant blue
      1 73909.0 steamer panda yellow
      3 59661.0 fork zebra green
      5 77892.0 spatula doggie brown





      share|improve this answer


























      • I do these index functions in R all the time, but I'm not familiar enough with Python's index functions yet. I'm going to be referencing this question many times in the future. Where can I find the documentation for Python bracket indexing and slicing?

        – Yehuda
        Nov 26 '18 at 2:56











      • The slice notation is shorthand for using .where which accepts a boolean series of the same shape and returns rows for which the series is True, you can think of it as getting rows for which the condition (in brackets) is true. Here is a long page on indexing and selecting in Pandas: pandas.pydata.org/pandas-docs/stable/indexing.html

        – Henry Woody
        Nov 26 '18 at 3:03


















      1














      You can write this more concisely using Pandas filtering rather than loops and ifs.



      Here is an example:



      valid_zips = mydata[mydata.astype(str).str.len() == 7]


      or



      zip_code_upper_bound = 100000
      valid_zips = mydata[mydata < zip_code_upper_bound]


      assuming fractional numbers are not included in your set. Note that the first example will remove shorter zips, while the second will leave them in, which you might want as they could have had leading zeros.



      Sample output:



      With df defined as (from your example):



              Zip    Item1     Item2   Item3
      0 78264.0 pan elephant blue
      1 73909.0 steamer panda yellow
      2 2602.0 pot rhino orange
      3 59661.0 fork zebra green
      4 861893.0 sink ocelot red
      5 77892.0 spatula doggie brown


      Using the following code:



      df[df.Zip.astype(str).str.len() == 7]


      The result is:



             Zip    Item1     Item2   Item3
      0 78264.0 pan elephant blue
      1 73909.0 steamer panda yellow
      3 59661.0 fork zebra green
      5 77892.0 spatula doggie brown





      share|improve this answer


























      • I do these index functions in R all the time, but I'm not familiar enough with Python's index functions yet. I'm going to be referencing this question many times in the future. Where can I find the documentation for Python bracket indexing and slicing?

        – Yehuda
        Nov 26 '18 at 2:56











      • The slice notation is shorthand for using .where which accepts a boolean series of the same shape and returns rows for which the series is True, you can think of it as getting rows for which the condition (in brackets) is true. Here is a long page on indexing and selecting in Pandas: pandas.pydata.org/pandas-docs/stable/indexing.html

        – Henry Woody
        Nov 26 '18 at 3:03
















      1












      1








      1







      You can write this more concisely using Pandas filtering rather than loops and ifs.



      Here is an example:



      valid_zips = mydata[mydata.astype(str).str.len() == 7]


      or



      zip_code_upper_bound = 100000
      valid_zips = mydata[mydata < zip_code_upper_bound]


      assuming fractional numbers are not included in your set. Note that the first example will remove shorter zips, while the second will leave them in, which you might want as they could have had leading zeros.



      Sample output:



      With df defined as (from your example):



              Zip    Item1     Item2   Item3
      0 78264.0 pan elephant blue
      1 73909.0 steamer panda yellow
      2 2602.0 pot rhino orange
      3 59661.0 fork zebra green
      4 861893.0 sink ocelot red
      5 77892.0 spatula doggie brown


      Using the following code:



      df[df.Zip.astype(str).str.len() == 7]


      The result is:



             Zip    Item1     Item2   Item3
      0 78264.0 pan elephant blue
      1 73909.0 steamer panda yellow
      3 59661.0 fork zebra green
      5 77892.0 spatula doggie brown





      share|improve this answer















      You can write this more concisely using Pandas filtering rather than loops and ifs.



      Here is an example:



      valid_zips = mydata[mydata.astype(str).str.len() == 7]


      or



      zip_code_upper_bound = 100000
      valid_zips = mydata[mydata < zip_code_upper_bound]


      assuming fractional numbers are not included in your set. Note that the first example will remove shorter zips, while the second will leave them in, which you might want as they could have had leading zeros.



      Sample output:



      With df defined as (from your example):



              Zip    Item1     Item2   Item3
      0 78264.0 pan elephant blue
      1 73909.0 steamer panda yellow
      2 2602.0 pot rhino orange
      3 59661.0 fork zebra green
      4 861893.0 sink ocelot red
      5 77892.0 spatula doggie brown


      Using the following code:



      df[df.Zip.astype(str).str.len() == 7]


      The result is:



             Zip    Item1     Item2   Item3
      0 78264.0 pan elephant blue
      1 73909.0 steamer panda yellow
      3 59661.0 fork zebra green
      5 77892.0 spatula doggie brown






      share|improve this answer














      share|improve this answer



      share|improve this answer








      edited Nov 26 '18 at 2:57

























      answered Nov 26 '18 at 2:43









      Henry WoodyHenry Woody

      4,6123927




      4,6123927













      • I do these index functions in R all the time, but I'm not familiar enough with Python's index functions yet. I'm going to be referencing this question many times in the future. Where can I find the documentation for Python bracket indexing and slicing?

        – Yehuda
        Nov 26 '18 at 2:56











      • The slice notation is shorthand for using .where which accepts a boolean series of the same shape and returns rows for which the series is True, you can think of it as getting rows for which the condition (in brackets) is true. Here is a long page on indexing and selecting in Pandas: pandas.pydata.org/pandas-docs/stable/indexing.html

        – Henry Woody
        Nov 26 '18 at 3:03





















      • I do these index functions in R all the time, but I'm not familiar enough with Python's index functions yet. I'm going to be referencing this question many times in the future. Where can I find the documentation for Python bracket indexing and slicing?

        – Yehuda
        Nov 26 '18 at 2:56











      • The slice notation is shorthand for using .where which accepts a boolean series of the same shape and returns rows for which the series is True, you can think of it as getting rows for which the condition (in brackets) is true. Here is a long page on indexing and selecting in Pandas: pandas.pydata.org/pandas-docs/stable/indexing.html

        – Henry Woody
        Nov 26 '18 at 3:03



















      I do these index functions in R all the time, but I'm not familiar enough with Python's index functions yet. I'm going to be referencing this question many times in the future. Where can I find the documentation for Python bracket indexing and slicing?

      – Yehuda
      Nov 26 '18 at 2:56





      I do these index functions in R all the time, but I'm not familiar enough with Python's index functions yet. I'm going to be referencing this question many times in the future. Where can I find the documentation for Python bracket indexing and slicing?

      – Yehuda
      Nov 26 '18 at 2:56













      The slice notation is shorthand for using .where which accepts a boolean series of the same shape and returns rows for which the series is True, you can think of it as getting rows for which the condition (in brackets) is true. Here is a long page on indexing and selecting in Pandas: pandas.pydata.org/pandas-docs/stable/indexing.html

      – Henry Woody
      Nov 26 '18 at 3:03







      The slice notation is shorthand for using .where which accepts a boolean series of the same shape and returns rows for which the series is True, you can think of it as getting rows for which the condition (in brackets) is true. Here is a long page on indexing and selecting in Pandas: pandas.pydata.org/pandas-docs/stable/indexing.html

      – Henry Woody
      Nov 26 '18 at 3:03















      1














      Using str.len



      df[df.iloc[:,0].astype(str).str.len()!=7]
      A
      1 1.222222
      2 1.222200


      dput :



      df=pd.DataFrame({'A':[1.22222,1.222222,1.2222]})





      share|improve this answer




























        1














        Using str.len



        df[df.iloc[:,0].astype(str).str.len()!=7]
        A
        1 1.222222
        2 1.222200


        dput :



        df=pd.DataFrame({'A':[1.22222,1.222222,1.2222]})





        share|improve this answer


























          1












          1








          1







          Using str.len



          df[df.iloc[:,0].astype(str).str.len()!=7]
          A
          1 1.222222
          2 1.222200


          dput :



          df=pd.DataFrame({'A':[1.22222,1.222222,1.2222]})





          share|improve this answer













          Using str.len



          df[df.iloc[:,0].astype(str).str.len()!=7]
          A
          1 1.222222
          2 1.222200


          dput :



          df=pd.DataFrame({'A':[1.22222,1.222222,1.2222]})






          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered Nov 26 '18 at 2:38









          Wen-BenWen-Ben

          116k83369




          116k83369























              0














              See if this works



              df1 = df['ZipCode'].astype(str).map(len)==5






              share|improve this answer




























                0














                See if this works



                df1 = df['ZipCode'].astype(str).map(len)==5






                share|improve this answer


























                  0












                  0








                  0







                  See if this works



                  df1 = df['ZipCode'].astype(str).map(len)==5






                  share|improve this answer













                  See if this works



                  df1 = df['ZipCode'].astype(str).map(len)==5







                  share|improve this answer












                  share|improve this answer



                  share|improve this answer










                  answered Nov 26 '18 at 2:37









                  Ken DekalbKen Dekalb

                  317112




                  317112






























                      draft saved

                      draft discarded




















































                      Thanks for contributing an answer to Stack Overflow!


                      • Please be sure to answer the question. Provide details and share your research!

                      But avoid



                      • Asking for help, clarification, or responding to other answers.

                      • Making statements based on opinion; back them up with references or personal experience.


                      To learn more, see our tips on writing great answers.




                      draft saved


                      draft discarded














                      StackExchange.ready(
                      function () {
                      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53474062%2fdrop-rows-based-on-float-length-in-python%23new-answer', 'question_page');
                      }
                      );

                      Post as a guest















                      Required, but never shown





















































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown

































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown







                      Popular posts from this blog

                      404 Error Contact Form 7 ajax form submitting

                      How to know if a Active Directory user can login interactively

                      TypeError: fit_transform() missing 1 required positional argument: 'X'