Drop rows based on float length in Python

I have a DataFrame with zip codes, among other things. The data, as a sample, looks like this:

     Zip    Item1     Item2   Item3

 78264.0      pan  elephant    blue

 73909.0  steamer     panda  yellow

  2602.0      pot     rhino  orange

 59661.0     fork     zebra   green

861893.0     sink    ocelot     red

 77892.0  spatula    doggie   brown

Some of these zip codes are invalid, having either too many or too few digits. I'm trying to remove those rows that have an invalid number of characters/digits (seven characters in this case, because I am checking length based on str() and the .0 is included in there). The following lengths loop:

zips = mydata.iloc[:,0].astype(str)

lengths = 

for i in zips:

    lengths.append(len(i))

produces a series (not to be confused with Series, although maybe it is--I'm new at Python) of zip code character lengths for each row. I am then trying to subset the DataFrame based on the information from the lengths variable. I tried a couple of different ways; this following was the latest version:

for i in lengths.index(i):

    if mydata.iloc[i:,0] != 7:

        mydata.iloc[i:,0].drop()

Naturally, this fails, with a ValueError: '44114.0' is not in list error. Can anyone give some advice as to how to do what I'm trying to accomplish?

edited Nov 26 '18 at 2:44

asked Nov 26 '18 at 2:33

Yehuda

189113

if the zip codes are stored as floats, containing too few digits might be because they start with 0 (which is valid for zip codes), but those leading 0s get dropped for floats

– Henry Woody
Nov 26 '18 at 2:36

can you post a snippet of your data for example

– Henry Woody
Nov 26 '18 at 2:37

Correct. It's part of what I factored into my analysis. It's how I received the data, and it's part of why I now need to fix that. The data is damaged, but it is what it is.

– Yehuda
Nov 26 '18 at 2:37

add a comment |

I have a DataFrame with zip codes, among other things. The data, as a sample, looks like this:

     Zip    Item1     Item2   Item3

 78264.0      pan  elephant    blue

 73909.0  steamer     panda  yellow

  2602.0      pot     rhino  orange

 59661.0     fork     zebra   green

861893.0     sink    ocelot     red

 77892.0  spatula    doggie   brown

zips = mydata.iloc[:,0].astype(str)

lengths = 

for i in zips:

    lengths.append(len(i))

for i in lengths.index(i):

    if mydata.iloc[i:,0] != 7:

        mydata.iloc[i:,0].drop()

Naturally, this fails, with a ValueError: '44114.0' is not in list error. Can anyone give some advice as to how to do what I'm trying to accomplish?

edited Nov 26 '18 at 2:44

asked Nov 26 '18 at 2:33

Yehuda

189113

if the zip codes are stored as floats, containing too few digits might be because they start with 0 (which is valid for zip codes), but those leading 0s get dropped for floats

– Henry Woody
Nov 26 '18 at 2:36

can you post a snippet of your data for example

– Henry Woody
Nov 26 '18 at 2:37

Correct. It's part of what I factored into my analysis. It's how I received the data, and it's part of why I now need to fix that. The data is damaged, but it is what it is.

– Yehuda
Nov 26 '18 at 2:37

add a comment |

I have a DataFrame with zip codes, among other things. The data, as a sample, looks like this:

     Zip    Item1     Item2   Item3

 78264.0      pan  elephant    blue

 73909.0  steamer     panda  yellow

  2602.0      pot     rhino  orange

 59661.0     fork     zebra   green

861893.0     sink    ocelot     red

 77892.0  spatula    doggie   brown

zips = mydata.iloc[:,0].astype(str)

lengths = 

for i in zips:

    lengths.append(len(i))

for i in lengths.index(i):

    if mydata.iloc[i:,0] != 7:

        mydata.iloc[i:,0].drop()

Naturally, this fails, with a ValueError: '44114.0' is not in list error. Can anyone give some advice as to how to do what I'm trying to accomplish?

edited Nov 26 '18 at 2:44

asked Nov 26 '18 at 2:33

Yehuda

189113

I have a DataFrame with zip codes, among other things. The data, as a sample, looks like this:

     Zip    Item1     Item2   Item3

 78264.0      pan  elephant    blue

 73909.0  steamer     panda  yellow

  2602.0      pot     rhino  orange

 59661.0     fork     zebra   green

861893.0     sink    ocelot     red

 77892.0  spatula    doggie   brown

zips = mydata.iloc[:,0].astype(str)

lengths = 

for i in zips:

    lengths.append(len(i))

for i in lengths.index(i):

    if mydata.iloc[i:,0] != 7:

        mydata.iloc[i:,0].drop()

Naturally, this fails, with a ValueError: '44114.0' is not in list error. Can anyone give some advice as to how to do what I'm trying to accomplish?

python-3.x pandas dataframe subset

edited Nov 26 '18 at 2:44

asked Nov 26 '18 at 2:33

Yehuda

189113

edited Nov 26 '18 at 2:44

asked Nov 26 '18 at 2:33

Yehuda

189113

edited Nov 26 '18 at 2:44

asked Nov 26 '18 at 2:33

Yehuda

189113

asked Nov 26 '18 at 2:33

Yehuda

189113

asked Nov 26 '18 at 2:33

Yehuda

189113

if the zip codes are stored as floats, containing too few digits might be because they start with 0 (which is valid for zip codes), but those leading 0s get dropped for floats

– Henry Woody
Nov 26 '18 at 2:36

can you post a snippet of your data for example

– Henry Woody
Nov 26 '18 at 2:37

Correct. It's part of what I factored into my analysis. It's how I received the data, and it's part of why I now need to fix that. The data is damaged, but it is what it is.

– Yehuda
Nov 26 '18 at 2:37

add a comment |

if the zip codes are stored as floats, containing too few digits might be because they start with 0 (which is valid for zip codes), but those leading 0s get dropped for floats

– Henry Woody
Nov 26 '18 at 2:36

can you post a snippet of your data for example

– Henry Woody
Nov 26 '18 at 2:37

Correct. It's part of what I factored into my analysis. It's how I received the data, and it's part of why I now need to fix that. The data is damaged, but it is what it is.

– Yehuda
Nov 26 '18 at 2:37

if the zip codes are stored as floats, containing too few digits might be because they start with 0 (which is valid for zip codes), but those leading 0s get dropped for floats

– Henry Woody
Nov 26 '18 at 2:36

can you post a snippet of your data for example

– Henry Woody
Nov 26 '18 at 2:37

Correct. It's part of what I factored into my analysis. It's how I received the data, and it's part of why I now need to fix that. The data is damaged, but it is what it is.

– Yehuda
Nov 26 '18 at 2:37

add a comment |

3 Answers
3

active

oldest

votes

You can write this more concisely using Pandas filtering rather than loops and ifs.

Here is an example:

valid_zips = mydata[mydata.astype(str).str.len() == 7]

zip_code_upper_bound = 100000

valid_zips = mydata[mydata < zip_code_upper_bound]

assuming fractional numbers are not included in your set. Note that the first example will remove shorter zips, while the second will leave them in, which you might want as they could have had leading zeros.

Sample output:

With df defined as (from your example):

        Zip    Item1     Item2   Item3

0   78264.0      pan  elephant    blue

1   73909.0  steamer     panda  yellow

2    2602.0      pot     rhino  orange

3   59661.0     fork     zebra   green

4  861893.0     sink    ocelot     red

5   77892.0  spatula    doggie   brown

Using the following code:

df[df.Zip.astype(str).str.len() == 7]

The result is:

       Zip    Item1     Item2   Item3

0  78264.0      pan  elephant    blue

1  73909.0  steamer     panda  yellow

3  59661.0     fork     zebra   green

5  77892.0  spatula    doggie   brown

edited Nov 26 '18 at 2:57

answered Nov 26 '18 at 2:43

Henry Woody

4,6123927

I do these index functions in R all the time, but I'm not familiar enough with Python's index functions yet. I'm going to be referencing this question many times in the future. Where can I find the documentation for Python bracket indexing and slicing?

– Yehuda
Nov 26 '18 at 2:56

The slice notation is shorthand for using .where which accepts a boolean series of the same shape and returns rows for which the series is True, you can think of it as getting rows for which the condition (in brackets) is true. Here is a long page on indexing and selecting in Pandas: pandas.pydata.org/pandas-docs/stable/indexing.html

– Henry Woody
Nov 26 '18 at 3:03

add a comment |

Using str.len

df[df.iloc[:,0].astype(str).str.len()!=7]

          A

1  1.222222

2  1.222200

dput :

df=pd.DataFrame({'A':[1.22222,1.222222,1.2222]})

answered Nov 26 '18 at 2:38

Wen-Ben

116k83369

add a comment |

See if this works

df1 = df['ZipCode'].astype(str).map(len)==5

answered Nov 26 '18 at 2:37

Ken Dekalb

317112

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53474062%2fdrop-rows-based-on-float-length-in-python%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

You can write this more concisely using Pandas filtering rather than loops and ifs.

Here is an example:

valid_zips = mydata[mydata.astype(str).str.len() == 7]

zip_code_upper_bound = 100000

valid_zips = mydata[mydata < zip_code_upper_bound]

Sample output:

With df defined as (from your example):

        Zip    Item1     Item2   Item3

0   78264.0      pan  elephant    blue

1   73909.0  steamer     panda  yellow

2    2602.0      pot     rhino  orange

3   59661.0     fork     zebra   green

4  861893.0     sink    ocelot     red

5   77892.0  spatula    doggie   brown

Using the following code:

df[df.Zip.astype(str).str.len() == 7]

The result is:

       Zip    Item1     Item2   Item3

0  78264.0      pan  elephant    blue

1  73909.0  steamer     panda  yellow

3  59661.0     fork     zebra   green

5  77892.0  spatula    doggie   brown

edited Nov 26 '18 at 2:57

answered Nov 26 '18 at 2:43

Henry Woody

4,6123927

I do these index functions in R all the time, but I'm not familiar enough with Python's index functions yet. I'm going to be referencing this question many times in the future. Where can I find the documentation for Python bracket indexing and slicing?

– Yehuda
Nov 26 '18 at 2:56

The slice notation is shorthand for using .where which accepts a boolean series of the same shape and returns rows for which the series is True, you can think of it as getting rows for which the condition (in brackets) is true. Here is a long page on indexing and selecting in Pandas: pandas.pydata.org/pandas-docs/stable/indexing.html

– Henry Woody
Nov 26 '18 at 3:03

add a comment |

You can write this more concisely using Pandas filtering rather than loops and ifs.

Here is an example:

valid_zips = mydata[mydata.astype(str).str.len() == 7]

zip_code_upper_bound = 100000

valid_zips = mydata[mydata < zip_code_upper_bound]

Sample output:

With df defined as (from your example):

        Zip    Item1     Item2   Item3

0   78264.0      pan  elephant    blue

1   73909.0  steamer     panda  yellow

2    2602.0      pot     rhino  orange

3   59661.0     fork     zebra   green

4  861893.0     sink    ocelot     red

5   77892.0  spatula    doggie   brown

Using the following code:

df[df.Zip.astype(str).str.len() == 7]

The result is:

       Zip    Item1     Item2   Item3

0  78264.0      pan  elephant    blue

1  73909.0  steamer     panda  yellow

3  59661.0     fork     zebra   green

5  77892.0  spatula    doggie   brown

edited Nov 26 '18 at 2:57

answered Nov 26 '18 at 2:43

Henry Woody

4,6123927

I do these index functions in R all the time, but I'm not familiar enough with Python's index functions yet. I'm going to be referencing this question many times in the future. Where can I find the documentation for Python bracket indexing and slicing?

– Yehuda
Nov 26 '18 at 2:56

The slice notation is shorthand for using .where which accepts a boolean series of the same shape and returns rows for which the series is True, you can think of it as getting rows for which the condition (in brackets) is true. Here is a long page on indexing and selecting in Pandas: pandas.pydata.org/pandas-docs/stable/indexing.html

– Henry Woody
Nov 26 '18 at 3:03

add a comment |

You can write this more concisely using Pandas filtering rather than loops and ifs.

Here is an example:

valid_zips = mydata[mydata.astype(str).str.len() == 7]

zip_code_upper_bound = 100000

valid_zips = mydata[mydata < zip_code_upper_bound]

Sample output:

With df defined as (from your example):

        Zip    Item1     Item2   Item3

0   78264.0      pan  elephant    blue

1   73909.0  steamer     panda  yellow

2    2602.0      pot     rhino  orange

3   59661.0     fork     zebra   green

4  861893.0     sink    ocelot     red

5   77892.0  spatula    doggie   brown

Using the following code:

df[df.Zip.astype(str).str.len() == 7]

The result is:

       Zip    Item1     Item2   Item3

0  78264.0      pan  elephant    blue

1  73909.0  steamer     panda  yellow

3  59661.0     fork     zebra   green

5  77892.0  spatula    doggie   brown

edited Nov 26 '18 at 2:57

answered Nov 26 '18 at 2:43

Henry Woody

4,6123927

You can write this more concisely using Pandas filtering rather than loops and ifs.

Here is an example:

valid_zips = mydata[mydata.astype(str).str.len() == 7]

zip_code_upper_bound = 100000

valid_zips = mydata[mydata < zip_code_upper_bound]

Sample output:

With df defined as (from your example):

        Zip    Item1     Item2   Item3

0   78264.0      pan  elephant    blue

1   73909.0  steamer     panda  yellow

2    2602.0      pot     rhino  orange

3   59661.0     fork     zebra   green

4  861893.0     sink    ocelot     red

5   77892.0  spatula    doggie   brown

Using the following code:

df[df.Zip.astype(str).str.len() == 7]

The result is:

       Zip    Item1     Item2   Item3

0  78264.0      pan  elephant    blue

1  73909.0  steamer     panda  yellow

3  59661.0     fork     zebra   green

5  77892.0  spatula    doggie   brown

edited Nov 26 '18 at 2:57

answered Nov 26 '18 at 2:43

Henry Woody

4,6123927

edited Nov 26 '18 at 2:57

answered Nov 26 '18 at 2:43

Henry Woody

4,6123927

answered Nov 26 '18 at 2:43

Henry Woody

4,6123927

answered Nov 26 '18 at 2:43

Henry Woody

4,6123927

I do these index functions in R all the time, but I'm not familiar enough with Python's index functions yet. I'm going to be referencing this question many times in the future. Where can I find the documentation for Python bracket indexing and slicing?

– Yehuda
Nov 26 '18 at 2:56

The slice notation is shorthand for using .where which accepts a boolean series of the same shape and returns rows for which the series is True, you can think of it as getting rows for which the condition (in brackets) is true. Here is a long page on indexing and selecting in Pandas: pandas.pydata.org/pandas-docs/stable/indexing.html

– Henry Woody
Nov 26 '18 at 3:03

add a comment |

I do these index functions in R all the time, but I'm not familiar enough with Python's index functions yet. I'm going to be referencing this question many times in the future. Where can I find the documentation for Python bracket indexing and slicing?

– Yehuda
Nov 26 '18 at 2:56

The slice notation is shorthand for using .where which accepts a boolean series of the same shape and returns rows for which the series is True, you can think of it as getting rows for which the condition (in brackets) is true. Here is a long page on indexing and selecting in Pandas: pandas.pydata.org/pandas-docs/stable/indexing.html

– Henry Woody
Nov 26 '18 at 3:03

I do these index functions in R all the time, but I'm not familiar enough with Python's index functions yet. I'm going to be referencing this question many times in the future. Where can I find the documentation for Python bracket indexing and slicing?

– Yehuda
Nov 26 '18 at 2:56

The slice notation is shorthand for using .where which accepts a boolean series of the same shape and returns rows for which the series is True, you can think of it as getting rows for which the condition (in brackets) is true. Here is a long page on indexing and selecting in Pandas: pandas.pydata.org/pandas-docs/stable/indexing.html

– Henry Woody
Nov 26 '18 at 3:03

add a comment |

Using str.len

df[df.iloc[:,0].astype(str).str.len()!=7]

          A

1  1.222222

2  1.222200

dput :

df=pd.DataFrame({'A':[1.22222,1.222222,1.2222]})

answered Nov 26 '18 at 2:38

Wen-Ben

116k83369

add a comment |

Using str.len

df[df.iloc[:,0].astype(str).str.len()!=7]

          A

1  1.222222

2  1.222200

dput :

df=pd.DataFrame({'A':[1.22222,1.222222,1.2222]})

answered Nov 26 '18 at 2:38

Wen-Ben

116k83369

add a comment |

Using str.len

df[df.iloc[:,0].astype(str).str.len()!=7]

          A

1  1.222222

2  1.222200

dput :

df=pd.DataFrame({'A':[1.22222,1.222222,1.2222]})

answered Nov 26 '18 at 2:38

Wen-Ben

116k83369

Using str.len

df[df.iloc[:,0].astype(str).str.len()!=7]

          A

1  1.222222

2  1.222200

dput :

df=pd.DataFrame({'A':[1.22222,1.222222,1.2222]})

answered Nov 26 '18 at 2:38

Wen-Ben

116k83369

answered Nov 26 '18 at 2:38

Wen-Ben

116k83369

answered Nov 26 '18 at 2:38

Wen-Ben

116k83369

answered Nov 26 '18 at 2:38

Wen-Ben

116k83369

add a comment |

See if this works

df1 = df['ZipCode'].astype(str).map(len)==5

answered Nov 26 '18 at 2:37

Ken Dekalb

317112

add a comment |

See if this works

df1 = df['ZipCode'].astype(str).map(len)==5

answered Nov 26 '18 at 2:37

Ken Dekalb

317112

add a comment |

See if this works

df1 = df['ZipCode'].astype(str).map(len)==5

answered Nov 26 '18 at 2:37

Ken Dekalb

317112

See if this works

df1 = df['ZipCode'].astype(str).map(len)==5

answered Nov 26 '18 at 2:37

Ken Dekalb

317112

answered Nov 26 '18 at 2:37

Ken Dekalb

317112

answered Nov 26 '18 at 2:37

Ken Dekalb

317112

answered Nov 26 '18 at 2:37

Ken Dekalb

317112

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

ANJ9ehNyyXKgfqrWims0,n5My3 eX8desuxkIml 3LvI6iIhskWr NF,YxRgOn73hdkO5t4nshe

搜尋此網誌

Tukukkk