Pandas str.split() not working in for loop (jupyter)












I am working with a Pandas DataFrame of sports scores which contains a Series 'Score'. Every item in this Series contains both teams' scores in a single string, separated by a hyphen, with no spaces, for example



('25-7', '6-2', ...)


I am attempting to split each value into 2 separate lists: left_score and right_score using Jupyter notebook. I have used the str.split('-') method for Series, which is supposed to convert each string into a list such that my scores would be



['25','7'], ['6','2']


However, when I run this it executes, but it does not recognize the hyphen, and returns the entire string as index 0.



I have tried using '-' and "-" with no difference. I also tried using a for loop and using the Python core str.split(). The core function works on a standalone string in Jupyter as expected, but when run in a loop, it again returns the entire string as the only element.



I've tried accessing the strings within the Series directly as well, and the function still fails. The following should return '25', but it returns '25-7'.



dataframe_name.Score.str.split("-").str[0][0]


Really enjoying working with Pandas and DataFrames, but the syntax is proving a challenge - any thoughts appreciated.



EDIT: Adding sample code as requested. Note this is across multiple Jupyter cells, but I am executing them in sequence.



In[1]:



import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

df = pd.read_csv('./file_name.csv', sep='\t')

df.head(3)


Out[1]:



df
_ Score
0 25-7
1 6-2
2 4-4


In[2]:



# Thanks to user Pygo, I attempted the suggested solution to no avail:
df['Score'].str.split('-',n=1,expand=False).values.tolist()


Out[2]:



[['25-7'],
['6-2'],
['4-4'],
... ]



  • Jupyter Notebook version 5.5.0

  • Anaconda version 5.2.0

  • Python version 3.6.5

  • Pandas version 0.23.0

  • Numpy version 1.14.3


Is it possible there is a version or reference conflict?



EDIT2:



I tried iterating through each character in the string to perform the split manually, and have now discovered that .join() and += are not working inside of for loops either. Where would I look for a Pandas and/or core string malfunction in Jupyter Notebook loops?







python pandas for-loop split jupyter-notebook














edited Nov 22 '18 at 20:32 by TL_BoD

asked Nov 22 '18 at 5:23 by TL_BoD








  • Can you share the code snippet? (2 upvotes)

    – Gaurav Neema
    Nov 22 '18 at 5:27

  • Can you share sample data from your dataframe?

    – AI_Learning
    Nov 22 '18 at 5:31


























2 Answers
We can use the split function to split the Score column at every "-". The n parameter is set to 1 because each string contains at most one separator. The expand parameter is set to False, so the result is a Series of lists rather than a DataFrame with one column per piece.

Example DataFrame:

df
Score
0 25-7
1 6-2
2 19-22

Expected result, using str.split + values.tolist():

df['Score'].str.split('-', n=1, expand=False).values.tolist()
[['25', '7'], ['6', '2'], ['19', '22']]

Hope this helps, given the minimal information provided.
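For the left_score and right_score lists the question asks for, a minimal sketch along the same lines is shown here. It assumes the separator really is a plain ASCII hyphen; the sample DataFrame is hypothetical, and expand=True is used instead of expand=False so that each piece lands in its own column.

```python
import pandas as pd

# Hypothetical sample data mirroring the Score column from the question.
df = pd.DataFrame({"Score": ["25-7", "6-2", "19-22"]})

# expand=True returns a DataFrame with one column per piece (labelled 0 and 1).
parts = df["Score"].str.split("-", n=1, expand=True)

# left_score / right_score are the names used in the question.
df["left_score"] = parts[0].astype(int)
df["right_score"] = parts[1].astype(int)

left_score = df["left_score"].tolist()
right_score = df["right_score"].tolist()

print(left_score)   # [25, 6, 19]
print(right_score)  # [7, 2, 22]
```

If the split silently fails (for example because the separator is not the character being split on), parts would have only one column and parts[1] would raise a KeyError, which makes the problem visible early.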







answered Nov 22 '18 at 7:58 by pygo













  • Thanks for this. I tried it and it returned the same result as before; I included it as an example in my updated code above. Wondering if it is related to a version issue between my libraries.

    – TL_BoD
    Nov 22 '18 at 16:27

  • @TL_BoD, there should not be an issue: I checked this on Python 3.6.1 (pandas 0.21.0, numpy 1.13.1) and 3.7 (pandas 0.23.3, numpy 1.15.0) without any problems, using the Python shell on a standard Linux machine.

    – pygo
    Nov 22 '18 at 17:06

  • Can you check the df.dtypes result?

    – pygo
    Nov 22 '18 at 17:09

  • No.: int64, Date: object, Location: object, Winner: object, Score: object, homewin: bool (dtype: object)

    – TL_BoD
    Nov 22 '18 at 17:31

  • Good luck @TL_BoD. (1 upvote)

    – pygo
    Nov 22 '18 at 18:17



















The Series that I was attempting to split at the - character was failing my troubleshooting condition, if letter == '-'. I realized that the data in my Series used the other kind of hyphen (an en dash or em dash rather than the plain keyboard hyphen; one is a "wide" character where the other is a "normal" character). In Jupyter, these look indistinguishable. If there is a trick to discerning them within the notebook, I would love to learn it!
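One possible way to tell look-alike dashes apart inside a notebook is to print their code points and Unicode names. The sketch below assumes the stray character is an en dash (U+2013); the actual character in the original data is not stated, and the sample Series, the helper describe_non_digits and the DASHES pattern are all hypothetical names introduced for illustration.

```python
import unicodedata

import pandas as pd

# Hypothetical data: the first value uses an en dash (U+2013),
# the second a plain ASCII hyphen-minus (U+002D).
scores = pd.Series(["25\u20137", "6-2"])


def describe_non_digits(text):
    """Return (character, code point, Unicode name) for each non-digit character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if not ch.isdigit()
    ]


for value in scores:
    # ascii() escapes every non-ASCII character, so a look-alike dash
    # shows up as \u2013 instead of rendering like a hyphen.
    print(ascii(value), describe_non_digits(value))

# A character class covering the ASCII hyphen and several common look-alikes
# (hyphen, non-breaking hyphen, figure dash, en dash, em dash, minus sign).
# A pattern longer than one character is treated as a regular expression.
DASHES = "[-\u2010\u2011\u2012\u2013\u2014\u2212]"
parts = scores.str.split(DASHES, n=1, expand=True)
print(parts)
```

Splitting on a character class is only one option; replacing the look-alike characters with "-" once, right after loading the file, would keep the rest of the notebook unchanged.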







answered Nov 23 '18 at 22:46 by TL_BoD





























