Pandas str.split() not working in for loop (jupyter)
I am working with a Pandas DataFrame of sports scores that contains a Series 'Score'. Every item in this Series holds both teams' scores in a single string, separated by a hyphen with no spaces, for example
('25-7', '6-2', ...)
I am attempting to split each value into two separate lists, left_score and right_score, in a Jupyter notebook. I have used the Series method str.split('-'), which is supposed to convert each string into a list so that my scores would be
['25','7'], ['6','2']
However, when I run this it executes but does not recognize the hyphen, and returns the entire string as index 0.
I have tried using '-' and "-" with no difference. I also tried using a for loop with the core Python str.split(). The core function works on a standalone string in Jupyter as expected, but when run in a loop it again returns the entire string as the only element.
I've tried accessing the strings within the Series directly as well, and the function still fails. The following should return '25', but it returns '25-7'.
dataframe_name.Score.str.split("-").str[0][0]
Really enjoying working with Pandas and DataFrames, but the syntax is proving a challenge - any thoughts appreciated.
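For reference, the chained expression above can be read one step at a time. This is only a sketch on a small stand-in DataFrame (the data here is an assumption, typed with plain ASCII hyphens, not the real CSV): str.split gives a Series of lists, .str[0] takes the first element of each list, and the trailing [0] looks up the row labelled 0.

```python
import pandas as pd

# Hypothetical stand-in for the real data, typed with plain ASCII hyphens.
df = pd.DataFrame({"Score": ["25-7", "6-2", "4-4"]})

parts = df["Score"].str.split("-")  # Series of lists, one list per row
firsts = parts.str[0]               # first element of each list
first_of_row0 = firsts[0]           # value at index label 0

print(parts.tolist())
print(first_of_row0)
```

On data like this the chain behaves as the question expects, which suggests looking at the contents of the column rather than at the syntax.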
EDIT: Adding sample code as requested. Note this is across multiple Jupyter cells, but I am executing them in sequence.
In[1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.read_csv('./file_name.csv', sep='\t')
df.head(3)
Out[1]:
df
_ Score
0 25-7
1 6-2
2 4-4
In[2]:
# Thanks to user Pygo, I attempted the suggested solution to no avail:
df['Score'].str.split('-',n=1,expand=False).values.tolist()
Out[2]:
[['25-7'],
['6-2'],
['4-4'],
... ]
- Jupyter Notebook version 5.5.0
- Anaconda version 5.2.0
- Python version 3.6.5
- Pandas version 0.23.0
- Numpy version 1.14.3
Is it possible there is a version or reference conflict?
EDIT2:
I tried iterating through each letter in the string to perform the split manually, and have now discovered that .join() and += are not working inside of for loops either. Where would I look for a Pandas and/or core string malfunction in Jupyter Notebook loops?
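One thing worth ruling out before suspecting the loop itself: a manual split like the one described depends entirely on the comparison against the separator. The sketch below uses a hypothetical helper, split_manually, and a made-up second input containing an en dash (U+2013) to show how += keeps working while the comparison never matches.

```python
def split_manually(text, sep="-"):
    """Split text at the first occurrence of sep, one character at a time."""
    left, right = "", ""
    seen_sep = False
    for ch in text:
        if not seen_sep and ch == sep:
            seen_sep = True   # separator found; later characters go right
        elif seen_sep:
            right += ch
        else:
            left += ch
    return left, right

ascii_result = split_manually("25-7")          # plain ASCII hyphen
en_dash_result = split_manually("25\u20137")   # en dash, looks similar on screen

print(ascii_result)
print(en_dash_result)
```

If the second case resembles what the notebook shows (everything ends up on the left), the loop and += are fine and the separator in the data is probably not the character being compared against.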
python pandas for-loop split jupyter-notebook
Can you share the code snippet?
– Gaurav Neema
Nov 22 '18 at 5:27
Can you share sample data from your DataFrame?
– AI_Learning
Nov 22 '18 at 5:31
asked Nov 22 '18 at 5:23, edited Nov 22 '18 at 20:32
TL_BoD
2 Answers
We can use the str.split function to split the Score column at every "-". The n parameter is set to 1 because each string contains at most one separator. The expand parameter is False, so the result is a Series of lists rather than a DataFrame with one column per piece.
Example DataFrame:
df
Score
0 25-7
1 6-2
2 19-22
Expected result, using str.split + values.tolist():
df['Score'].str.split('-', n=1, expand=False).values.tolist()
[['25', '7'], ['6', '2'], ['19', '22']]
Hope this helps, given the minimal information provided.
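Since the question asks for two separate lists, a variation with expand=True may be closer to the goal. This is a sketch on the same example data, assuming plain ASCII hyphens and purely numeric scores; the names left_score and right_score come from the question, while split_cols is just an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({"Score": ["25-7", "6-2", "19-22"]})

# expand=True returns a DataFrame with one column per piece (labelled 0 and 1).
split_cols = df["Score"].str.split("-", n=1, expand=True)

# Convert each column to integers and pull it out as a plain Python list.
left_score = split_cols[0].astype(int).tolist()
right_score = split_cols[1].astype(int).tolist()

print(left_score)
print(right_score)
```

The astype(int) step raises an error if a row did not split cleanly, which can be a useful early signal that the separator in the data is not what it appears to be.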
Thanks for this, I tried and it returned the same result as before - I included as an example in my updated code above. Wondering if it is related to a version issue between my libraries.
– TL_BoD
Nov 22 '18 at 16:27
@TL_BoD, there should not be an issue: I checked this on Python 3.6.1 (pandas='0.21.0', numpy='1.13.1') and 3.7 (pandas='0.23.3', numpy='1.15.0') without any problems, using the Python shell on a standard Linux machine.
– pygo
Nov 22 '18 at 17:06
Can you check the df.dtypes result?
– pygo
Nov 22 '18 at 17:09
No. int64
Date object
Location object
Winner object
Score object
homewin bool
dtype: object
– TL_BoD
Nov 22 '18 at 17:31
Good Luck @TL_BoD.
– pygo
Nov 22 '18 at 18:17
The Series that I was attempting to split at the - character was failing my troubleshooting condition, if letter == '-', for every character. I realized that the data in my Series contained the other kind of dash (an em dash or en dash rather than the plain hyphen I was typing; one is a "wide" character where the other is a "normal" character). In Jupyter these look indistinguishable - if there is a trick to discerning them within the notebook, I would love to learn it!
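On telling such characters apart inside the notebook, one approach is to look at code points instead of glyphs. The sketch below uses made-up sample strings (an en dash, an em dash and an ASCII hyphen are assumptions, since the post does not say which dash the file contained); describe_chars and DASHES are hypothetical names. It relies on the built-in ascii() and the standard-library unicodedata module, then normalizes a handful of common dash look-alikes before splitting.

```python
import unicodedata
import pandas as pd

# Hypothetical data: en dash (U+2013), em dash (U+2014), ASCII hyphen-minus.
scores = pd.Series(["25\u20137", "6\u20142", "4-4"])

# ascii() escapes every non-ASCII character, so look-alikes become visible.
escaped = ascii(scores[0])

def describe_chars(text):
    """Return (character, code point, Unicode name) for each character."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in text]

described = describe_chars(scores[0])

# Rows that contain any character outside the ASCII range.
non_ascii = scores[scores.str.contains(r"[^\x00-\x7F]")]

# Replace a few common dash look-alikes with the ASCII hyphen, then split.
DASHES = "[\u2010\u2011\u2012\u2013\u2014\u2015\u2212]"
cleaned = scores.str.replace(DASHES, "-", regex=True)
parts = cleaned.str.split("-", n=1).tolist()

print(escaped)
print(described)
print(parts)
```

The list in DASHES is not exhaustive; checking the names reported by describe_chars on the real column shows which character actually needs replacing.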
First answer (str.split with n=1): answered Nov 22 '18 at 7:58 by pygo
Second answer (different dash character in the data): answered Nov 23 '18 at 22:46 by TL_BoD