Pandas str.split() not working in for loop (jupyter)

I am working with a Pandas DataFrame of sports scores which contains a Series 'Score'. All items in this Series contain both teams' scores in a single string, separated by a hyphen, with no spaces, for example

('25-7', '6-2', ...)

I am attempting to split each value into 2 separate lists: left_score and right_score using Jupyter notebook. I have used the str.split('-') method for Series, which is supposed to convert each string into a list such that my scores would be

['25','7'], ['6','2']

However, when I run this it executes, but it does not recognize the hyphen, and returns the entire string as index 0.

I have tried using '-' and "-" with no difference. I also tried using a for loop and using the Python core str.split(). The core function works on a standalone string in Jupyter as expected, but when run in a loop, it again returns the entire string as the only element.

I've tried accessing the strings within the Series directly as well, and the function still fails. The following should return '25', but it returns '25-7'.

dataframe_name.Score.str.split("-").str[0][0]

Really enjoying working with Pandas and DataFrames, but the syntax is proving a challenge - any thoughts appreciated.

EDIT: Adding sample code as requested. Note this is across multiple Jupyter cells, but I am executing them in sequence.

In[1]:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

df = pd.read_csv('./file_name.csv', sep='\t')

df.head(3)

Out[1]:

df
_    Score
0    25-7
1    6-2
2    4-4
In[2]:

# Thanks to user Pygo, I attempted the suggested solution to no avail:
df['Score'].str.split('-',n=1,expand=False).values.tolist()

Out[2]:

[['25-7'],
['6-2'],
['4-4'],
... ]

Jupyter Notebook version 5.5.0

Anaconda version 5.2.0

Python version 3.6.5

Pandas version 0.23.0

Numpy version 1.14.3

Is it possible there is a version or reference conflict?

EDIT2:

I tried iterating through each letter in the string to perform the split manually, and have now discovered that .join() and += are not working inside for loops either. Where would I look for a Pandas and/or core string malfunction in Jupyter Notebook loops?

edited Nov 22 '18 at 20:32

asked Nov 22 '18 at 5:23

TL_BoD


Can you share the snippet code?

– Gaurav Neema
Nov 22 '18 at 5:27

Can you share sample data from your DataFrame?

– AI_Learning
Nov 22 '18 at 5:31

python pandas for-loop split jupyter-notebook


2 Answers

We can use the split function to split the Score column at every "-". The n parameter is set to 1, as each string contains at most one separator. The expand parameter is False, so the result is a Series of lists rather than a DataFrame with one column per piece.

Example DataFrame:

df
   Score
0   25-7
1    6-2
2  19-22

Expected result, using str.split + values.tolist():

df['Score'].str.split('-', n=1, expand=False).values.tolist()
[['25', '7'], ['6', '2'], ['19', '22']]
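
For the original goal of two separate lists, a minimal sketch might use expand=True instead. It assumes every value contains exactly one ASCII hyphen and that both sides are integers; the names parts, left_score and right_score are illustrative only.

```python
import pandas as pd

# Same example data as above.
df = pd.DataFrame({"Score": ["25-7", "6-2", "19-22"]})

# expand=True returns a DataFrame with one column per piece
# instead of a Series of lists.
parts = df["Score"].str.split("-", n=1, expand=True)

# Column 0 holds the left score, column 1 the right score.
left_score = parts[0].astype(int).tolist()
right_score = parts[1].astype(int).tolist()

print(left_score)   # [25, 6, 19]
print(right_score)  # [7, 2, 22]
```

If a value does not contain the separator, column 1 will hold a missing value for that row and astype(int) will raise, which makes this form a quick check that the split actually happened.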

Hope this helps, given the minimal information provided.

answered Nov 22 '18 at 7:58

pygo


Thanks for this, I tried and it returned the same result as before - I included as an example in my updated code above. Wondering if it is related to a version issue between my libraries.

– TL_BoD
Nov 22 '18 at 16:27

@TL_BoD, there should not be an issue: I checked this on Python 3.6.1 (pandas 0.21.0, numpy 1.13.1) and Python 3.7 (pandas 0.23.3, numpy 1.15.0) without any problems, using the Python shell on a standard Linux machine.

– pygo
Nov 22 '18 at 17:06

Can you check the result of df.dtypes?

– pygo
Nov 22 '18 at 17:09

No.          int64
Date        object
Location    object
Winner      object
Score       object
homewin       bool
dtype: object

– TL_BoD
Nov 22 '18 at 17:31

Good Luck @TL_BoD.

– pygo
Nov 22 '18 at 18:17


The Series that I was attempting to split at the - character was failing my troubleshooting condition, if letter == '-'. I realized that the data in my Series used the other kind of hyphen (an em dash or en dash rather than the plain keyboard hyphen; one is a "wide" character where the other is a "normal" character), so the separator I typed never matched. In Jupyter, these look indistinguishable. If there is a trick to discerning these within the notebook, I would love to learn it!
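
One possible way to tell such characters apart inside a notebook, offered as a sketch rather than a definitive answer: the built-in ascii() shows non-ASCII characters as escapes, and the standard unicodedata module reports their names. The sample values and the helper names describe_non_digits and normalize_dashes below are hypothetical.

```python
import re
import unicodedata

# Hypothetical sample: the first value uses an en dash (U+2013),
# the second an ASCII hyphen-minus (U+002D).
scores = ["25\u20137", "6-2"]


def describe_non_digits(text):
    """Return (character, code point, Unicode name) for each non-digit character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if not ch.isdigit()
    ]


for s in scores:
    # ascii() escapes anything outside ASCII, so look-alikes become visible.
    print(ascii(s), describe_non_digits(s))

# Common dash look-alikes: hyphen, non-breaking hyphen, figure dash,
# en dash, em dash, horizontal bar, and the minus sign.
DASHES = re.compile("[\u2010\u2011\u2012\u2013\u2014\u2015\u2212]")


def normalize_dashes(text):
    """Replace dash look-alikes with the ASCII hyphen-minus."""
    return DASHES.sub("-", text)


cleaned = [normalize_dashes(s).split("-") for s in scores]
print(cleaned)  # [['25', '7'], ['6', '2']]
```

Once the offending character is known, the same replacement could presumably be applied to the whole column before splitting, or the split could be done on that character directly.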

answered Nov 23 '18 at 22:46

TL_BoD
