Pandas dataframe self-dependency in data to fill a column












-1















I have dataframe with data as:



enter image description here



The value of "relation" is determined from the codeid. Leather has "codeid"=11 which is already appeared against bag, so in relation we put the value bag.

Same happens for shoes.



ToDo: Fill the value of "relation", by putting check on codeid in terms of dataframes. Any help would be appreciated.



Edit: Same codeid e.g. 11 can appear > twice. But the "relation" can have only value as bag because bag is the first one to have codeid=11. i have updated the picture as well.










share|improve this question




















  • 3





    Will the codes appear only twice? And should one take the name of the first appearance of the code only?

    – Franco Piccolo
    Oct 30 '18 at 11:13
















-1















I have dataframe with data as:



enter image description here



The value of "relation" is determined from the codeid. Leather has "codeid"=11 which is already appeared against bag, so in relation we put the value bag.

Same happens for shoes.



ToDo: Fill the value of "relation", by putting check on codeid in terms of dataframes. Any help would be appreciated.



Edit: Same codeid e.g. 11 can appear > twice. But the "relation" can have only value as bag because bag is the first one to have codeid=11. i have updated the picture as well.










share|improve this question




















  • 3





    Will the codes appear only twice? And should one take the name of the first appearance of the code only?

    – Franco Piccolo
    Oct 30 '18 at 11:13














-1












-1








-1








I have dataframe with data as:



enter image description here



The value of "relation" is determined from the codeid. Leather has "codeid"=11 which is already appeared against bag, so in relation we put the value bag.

Same happens for shoes.



ToDo: Fill the value of "relation", by putting check on codeid in terms of dataframes. Any help would be appreciated.



Edit: Same codeid e.g. 11 can appear > twice. But the "relation" can have only value as bag because bag is the first one to have codeid=11. i have updated the picture as well.










share|improve this question
















I have dataframe with data as:



enter image description here



The value of "relation" is determined from the codeid. Leather has "codeid"=11 which is already appeared against bag, so in relation we put the value bag.

Same happens for shoes.



ToDo: Fill the value of "relation", by putting check on codeid in terms of dataframes. Any help would be appreciated.



Edit: Same codeid e.g. 11 can appear > twice. But the "relation" can have only value as bag because bag is the first one to have codeid=11. i have updated the picture as well.







python pandas






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Oct 30 '18 at 11:19







frozen shine

















asked Oct 30 '18 at 11:08









frozen shinefrozen shine

5719




5719








  • 3





    Will the codes appear only twice? And should one take the name of the first appearance of the code only?

    – Franco Piccolo
    Oct 30 '18 at 11:13














  • 3





    Will the codes appear only twice? And should one take the name of the first appearance of the code only?

    – Franco Piccolo
    Oct 30 '18 at 11:13








3




3





Will the codes appear only twice? And should one take the name of the first appearance of the code only?

– Franco Piccolo
Oct 30 '18 at 11:13





Will the codes appear only twice? And should one take the name of the first appearance of the code only?

– Franco Piccolo
Oct 30 '18 at 11:13












3 Answers
3






active

oldest

votes


















1














If want only first dupe value to last duplicated use transform with first and then set NaN values by loc with duplicated:



df = pd.DataFrame({'id':[1,2,3,4,5],
'name':list('brslp'),
'codeid':[11,12,13,11,13]})

df['relation'] = df.groupby('codeid')['name'].transform('first')
print (df)
id name codeid relation
0 1 b 11 b
1 2 r 12 r
2 3 s 13 s
3 4 l 11 b
4 5 p 13 s




#get first duplicated values of codeid
print (df['codeid'].duplicated(keep='last'))
0 True
1 False
2 True
3 False
4 False
Name: codeid, dtype: bool

#get all duplicated values of codeid with inverting boolenam mask by ~ for unique rows
print (~df['codeid'].duplicated(keep=False))
0 False
1 True
2 False
3 False
4 False
Name: codeid, dtype: bool

#chain boolen mask together
print (df['codeid'].duplicated(keep='last') | ~df['codeid'].duplicated(keep=False))
0 True
1 True
2 True
3 False
4 False
Name: codeid, dtype: bool




#replace True values by mask by NaN 
df.loc[df['codeid'].duplicated(keep='last') |
~df['codeid'].duplicated(keep=False), 'relation'] = np.nan
print (df)
id name codeid relation
0 1 b 11 NaN
1 2 r 12 NaN
2 3 s 13 NaN
3 4 l 11 b
4 5 p 13 s





share|improve this answer


























  • Could you kindly explain the code, as it seems to be working but it's not working at my end

    – frozen shine
    Oct 30 '18 at 11:39











  • @frozenshine - Can you explain more why not working? Problem in sample data or in real?

    – jezrael
    Oct 30 '18 at 11:40











  • testing your logic on real data. the last statement is making all values NaN not just the first ones.

    – frozen shine
    Oct 30 '18 at 11:43













  • @frozenshine - hmmm, so real data are different like sample data, is possible add more rows, create minimal, complete, and verifiable example ?

    – jezrael
    Oct 30 '18 at 11:46











  • No the data is exactly following same pattern that I showed. I only need to figure out why only np.nan line is making all rows as "nan".

    – frozen shine
    Oct 30 '18 at 11:57



















0














I think you want to do something like this:



import pandas as pd
df = pd.DataFrame([['bag', 11, 'null'],
['shoes', 12, 'null'],
['shopper', 13, 'null'],
['leather', 11, 'bag'],
['plastic', 13, 'shoes']], columns = ['name', 'codeid', 'relation'])

def codeid_analysis(rows):
if rows['codeid'] == 11:
rows['relation'] = 'bag'
elif rows['codeid'] == 12:
rows['relation'] = 'shirt' #for example. You should put what you want here
elif rows['codeid'] == 13:
rows['relation'] = 'pants' #for example. You should put what you want here
return rows

result = df.apply(codeid_analysis, axis = 1)
print(result)





share|improve this answer
























  • Thanks but unfortunately, the question showed only sample data, and real data is quiet big. Cant use manual if and else. :(

    – frozen shine
    Oct 30 '18 at 11:58



















0














It is not the optimal solution since it is costly to your memory, but here is my try. df1 is created in order to hold the null values of the relation column, since it seems that nulls are the first occurrence. After some cleaning, the two dataframes are merged to provide into one.



import pandas as pd
df = pd.DataFrame([['bag', 11, 'null'],
['shoes', 12, 'null'],
['shopper', 13, 'null'],
['leather', 11, 'bag'],
['plastic', 13, 'shopper'],
['something',13,""]], columns = ['name', 'codeid', 'relation'])

df1=df.loc[df['relation'] == 'null'].copy()#create a df with only null values in relation
df1.drop_duplicates(subset=['name'], inplace=True)#drops the duplicates and retains the first entry
df1=df1.drop("relation",axis=1)#drop the unneeded column

final_df=pd.merge(df, df1, left_on='codeid', right_on='codeid')#merge the two dfs on the columns names





share|improve this answer























    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53062984%2fpandas-dataframe-self-dependency-in-data-to-fill-a-column%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    3 Answers
    3






    active

    oldest

    votes








    3 Answers
    3






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    1














    If want only first dupe value to last duplicated use transform with first and then set NaN values by loc with duplicated:



    df = pd.DataFrame({'id':[1,2,3,4,5],
    'name':list('brslp'),
    'codeid':[11,12,13,11,13]})

    df['relation'] = df.groupby('codeid')['name'].transform('first')
    print (df)
    id name codeid relation
    0 1 b 11 b
    1 2 r 12 r
    2 3 s 13 s
    3 4 l 11 b
    4 5 p 13 s




    #get first duplicated values of codeid
    print (df['codeid'].duplicated(keep='last'))
    0 True
    1 False
    2 True
    3 False
    4 False
    Name: codeid, dtype: bool

    #get all duplicated values of codeid with inverting boolenam mask by ~ for unique rows
    print (~df['codeid'].duplicated(keep=False))
    0 False
    1 True
    2 False
    3 False
    4 False
    Name: codeid, dtype: bool

    #chain boolen mask together
    print (df['codeid'].duplicated(keep='last') | ~df['codeid'].duplicated(keep=False))
    0 True
    1 True
    2 True
    3 False
    4 False
    Name: codeid, dtype: bool




    #replace True values by mask by NaN 
    df.loc[df['codeid'].duplicated(keep='last') |
    ~df['codeid'].duplicated(keep=False), 'relation'] = np.nan
    print (df)
    id name codeid relation
    0 1 b 11 NaN
    1 2 r 12 NaN
    2 3 s 13 NaN
    3 4 l 11 b
    4 5 p 13 s





    share|improve this answer


























    • Could you kindly explain the code, as it seems to be working but it's not working at my end

      – frozen shine
      Oct 30 '18 at 11:39











    • @frozenshine - Can you explain more why not working? Problem in sample data or in real?

      – jezrael
      Oct 30 '18 at 11:40











    • testing your logic on real data. the last statement is making all values NaN not just the first ones.

      – frozen shine
      Oct 30 '18 at 11:43













    • @frozenshine - hmmm, so real data are different like sample data, is possible add more rows, create minimal, complete, and verifiable example ?

      – jezrael
      Oct 30 '18 at 11:46











    • No the data is exactly following same pattern that I showed. I only need to figure out why only np.nan line is making all rows as "nan".

      – frozen shine
      Oct 30 '18 at 11:57
















    1














    If want only first dupe value to last duplicated use transform with first and then set NaN values by loc with duplicated:



    df = pd.DataFrame({'id':[1,2,3,4,5],
    'name':list('brslp'),
    'codeid':[11,12,13,11,13]})

    df['relation'] = df.groupby('codeid')['name'].transform('first')
    print (df)
    id name codeid relation
    0 1 b 11 b
    1 2 r 12 r
    2 3 s 13 s
    3 4 l 11 b
    4 5 p 13 s




    #get first duplicated values of codeid
    print (df['codeid'].duplicated(keep='last'))
    0 True
    1 False
    2 True
    3 False
    4 False
    Name: codeid, dtype: bool

    #get all duplicated values of codeid with inverting boolenam mask by ~ for unique rows
    print (~df['codeid'].duplicated(keep=False))
    0 False
    1 True
    2 False
    3 False
    4 False
    Name: codeid, dtype: bool

    #chain boolen mask together
    print (df['codeid'].duplicated(keep='last') | ~df['codeid'].duplicated(keep=False))
    0 True
    1 True
    2 True
    3 False
    4 False
    Name: codeid, dtype: bool




    #replace True values by mask by NaN 
    df.loc[df['codeid'].duplicated(keep='last') |
    ~df['codeid'].duplicated(keep=False), 'relation'] = np.nan
    print (df)
    id name codeid relation
    0 1 b 11 NaN
    1 2 r 12 NaN
    2 3 s 13 NaN
    3 4 l 11 b
    4 5 p 13 s





    share|improve this answer


























    • Could you kindly explain the code, as it seems to be working but it's not working at my end

      – frozen shine
      Oct 30 '18 at 11:39











    • @frozenshine - Can you explain more why not working? Problem in sample data or in real?

      – jezrael
      Oct 30 '18 at 11:40











    • testing your logic on real data. the last statement is making all values NaN not just the first ones.

      – frozen shine
      Oct 30 '18 at 11:43













    • @frozenshine - hmmm, so real data are different like sample data, is possible add more rows, create minimal, complete, and verifiable example ?

      – jezrael
      Oct 30 '18 at 11:46











    • No the data is exactly following same pattern that I showed. I only need to figure out why only np.nan line is making all rows as "nan".

      – frozen shine
      Oct 30 '18 at 11:57














    1












    1








    1







    If want only first dupe value to last duplicated use transform with first and then set NaN values by loc with duplicated:



    df = pd.DataFrame({'id':[1,2,3,4,5],
    'name':list('brslp'),
    'codeid':[11,12,13,11,13]})

    df['relation'] = df.groupby('codeid')['name'].transform('first')
    print (df)
    id name codeid relation
    0 1 b 11 b
    1 2 r 12 r
    2 3 s 13 s
    3 4 l 11 b
    4 5 p 13 s




    #get first duplicated values of codeid
    print (df['codeid'].duplicated(keep='last'))
    0 True
    1 False
    2 True
    3 False
    4 False
    Name: codeid, dtype: bool

    #get all duplicated values of codeid with inverting boolenam mask by ~ for unique rows
    print (~df['codeid'].duplicated(keep=False))
    0 False
    1 True
    2 False
    3 False
    4 False
    Name: codeid, dtype: bool

    #chain boolen mask together
    print (df['codeid'].duplicated(keep='last') | ~df['codeid'].duplicated(keep=False))
    0 True
    1 True
    2 True
    3 False
    4 False
    Name: codeid, dtype: bool




    #replace True values by mask by NaN 
    df.loc[df['codeid'].duplicated(keep='last') |
    ~df['codeid'].duplicated(keep=False), 'relation'] = np.nan
    print (df)
    id name codeid relation
    0 1 b 11 NaN
    1 2 r 12 NaN
    2 3 s 13 NaN
    3 4 l 11 b
    4 5 p 13 s





    share|improve this answer















    If want only first dupe value to last duplicated use transform with first and then set NaN values by loc with duplicated:



    df = pd.DataFrame({'id':[1,2,3,4,5],
    'name':list('brslp'),
    'codeid':[11,12,13,11,13]})

    df['relation'] = df.groupby('codeid')['name'].transform('first')
    print (df)
    id name codeid relation
    0 1 b 11 b
    1 2 r 12 r
    2 3 s 13 s
    3 4 l 11 b
    4 5 p 13 s




    #get first duplicated values of codeid
    print (df['codeid'].duplicated(keep='last'))
    0 True
    1 False
    2 True
    3 False
    4 False
    Name: codeid, dtype: bool

    #get all duplicated values of codeid with inverting boolenam mask by ~ for unique rows
    print (~df['codeid'].duplicated(keep=False))
    0 False
    1 True
    2 False
    3 False
    4 False
    Name: codeid, dtype: bool

    #chain boolen mask together
    print (df['codeid'].duplicated(keep='last') | ~df['codeid'].duplicated(keep=False))
    0 True
    1 True
    2 True
    3 False
    4 False
    Name: codeid, dtype: bool




    #replace True values by mask by NaN 
    df.loc[df['codeid'].duplicated(keep='last') |
    ~df['codeid'].duplicated(keep=False), 'relation'] = np.nan
    print (df)
    id name codeid relation
    0 1 b 11 NaN
    1 2 r 12 NaN
    2 3 s 13 NaN
    3 4 l 11 b
    4 5 p 13 s






    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Oct 30 '18 at 12:03

























    answered Oct 30 '18 at 11:19









    jezraeljezrael

    329k23270349




    329k23270349













    • Could you kindly explain the code, as it seems to be working but it's not working at my end

      – frozen shine
      Oct 30 '18 at 11:39











    • @frozenshine - Can you explain more why not working? Problem in sample data or in real?

      – jezrael
      Oct 30 '18 at 11:40











    • testing your logic on real data. the last statement is making all values NaN not just the first ones.

      – frozen shine
      Oct 30 '18 at 11:43













    • @frozenshine - hmmm, so real data are different like sample data, is possible add more rows, create minimal, complete, and verifiable example ?

      – jezrael
      Oct 30 '18 at 11:46











    • No the data is exactly following same pattern that I showed. I only need to figure out why only np.nan line is making all rows as "nan".

      – frozen shine
      Oct 30 '18 at 11:57



















    • Could you kindly explain the code, as it seems to be working but it's not working at my end

      – frozen shine
      Oct 30 '18 at 11:39











    • @frozenshine - Can you explain more why not working? Problem in sample data or in real?

      – jezrael
      Oct 30 '18 at 11:40











    • testing your logic on real data. the last statement is making all values NaN not just the first ones.

      – frozen shine
      Oct 30 '18 at 11:43













    • @frozenshine - hmmm, so real data are different like sample data, is possible add more rows, create minimal, complete, and verifiable example ?

      – jezrael
      Oct 30 '18 at 11:46











    • No the data is exactly following same pattern that I showed. I only need to figure out why only np.nan line is making all rows as "nan".

      – frozen shine
      Oct 30 '18 at 11:57

















    Could you kindly explain the code, as it seems to be working but it's not working at my end

    – frozen shine
    Oct 30 '18 at 11:39





    Could you kindly explain the code, as it seems to be working but it's not working at my end

    – frozen shine
    Oct 30 '18 at 11:39













    @frozenshine - Can you explain more why not working? Problem in sample data or in real?

    – jezrael
    Oct 30 '18 at 11:40





    @frozenshine - Can you explain more why not working? Problem in sample data or in real?

    – jezrael
    Oct 30 '18 at 11:40













    testing your logic on real data. the last statement is making all values NaN not just the first ones.

    – frozen shine
    Oct 30 '18 at 11:43







    testing your logic on real data. the last statement is making all values NaN not just the first ones.

    – frozen shine
    Oct 30 '18 at 11:43















    @frozenshine - hmmm, so real data are different like sample data, is possible add more rows, create minimal, complete, and verifiable example ?

    – jezrael
    Oct 30 '18 at 11:46





    @frozenshine - hmmm, so real data are different like sample data, is possible add more rows, create minimal, complete, and verifiable example ?

    – jezrael
    Oct 30 '18 at 11:46













    No the data is exactly following same pattern that I showed. I only need to figure out why only np.nan line is making all rows as "nan".

    – frozen shine
    Oct 30 '18 at 11:57





    No the data is exactly following same pattern that I showed. I only need to figure out why only np.nan line is making all rows as "nan".

    – frozen shine
    Oct 30 '18 at 11:57













    0














    I think you want to do something like this:



    import pandas as pd
    df = pd.DataFrame([['bag', 11, 'null'],
    ['shoes', 12, 'null'],
    ['shopper', 13, 'null'],
    ['leather', 11, 'bag'],
    ['plastic', 13, 'shoes']], columns = ['name', 'codeid', 'relation'])

    def codeid_analysis(rows):
    if rows['codeid'] == 11:
    rows['relation'] = 'bag'
    elif rows['codeid'] == 12:
    rows['relation'] = 'shirt' #for example. You should put what you want here
    elif rows['codeid'] == 13:
    rows['relation'] = 'pants' #for example. You should put what you want here
    return rows

    result = df.apply(codeid_analysis, axis = 1)
    print(result)





    share|improve this answer
























    • Thanks but unfortunately, the question showed only sample data, and real data is quiet big. Cant use manual if and else. :(

      – frozen shine
      Oct 30 '18 at 11:58
















    0














    I think you want to do something like this:



    import pandas as pd
    df = pd.DataFrame([['bag', 11, 'null'],
    ['shoes', 12, 'null'],
    ['shopper', 13, 'null'],
    ['leather', 11, 'bag'],
    ['plastic', 13, 'shoes']], columns = ['name', 'codeid', 'relation'])

    def codeid_analysis(rows):
    if rows['codeid'] == 11:
    rows['relation'] = 'bag'
    elif rows['codeid'] == 12:
    rows['relation'] = 'shirt' #for example. You should put what you want here
    elif rows['codeid'] == 13:
    rows['relation'] = 'pants' #for example. You should put what you want here
    return rows

    result = df.apply(codeid_analysis, axis = 1)
    print(result)





    share|improve this answer
























    • Thanks but unfortunately, the question showed only sample data, and real data is quiet big. Cant use manual if and else. :(

      – frozen shine
      Oct 30 '18 at 11:58














    0












    0








    0







    I think you want to do something like this:



    import pandas as pd
    df = pd.DataFrame([['bag', 11, 'null'],
    ['shoes', 12, 'null'],
    ['shopper', 13, 'null'],
    ['leather', 11, 'bag'],
    ['plastic', 13, 'shoes']], columns = ['name', 'codeid', 'relation'])

    def codeid_analysis(rows):
    if rows['codeid'] == 11:
    rows['relation'] = 'bag'
    elif rows['codeid'] == 12:
    rows['relation'] = 'shirt' #for example. You should put what you want here
    elif rows['codeid'] == 13:
    rows['relation'] = 'pants' #for example. You should put what you want here
    return rows

    result = df.apply(codeid_analysis, axis = 1)
    print(result)





    share|improve this answer













    I think you want to do something like this:



    import pandas as pd
    df = pd.DataFrame([['bag', 11, 'null'],
    ['shoes', 12, 'null'],
    ['shopper', 13, 'null'],
    ['leather', 11, 'bag'],
    ['plastic', 13, 'shoes']], columns = ['name', 'codeid', 'relation'])

    def codeid_analysis(rows):
    if rows['codeid'] == 11:
    rows['relation'] = 'bag'
    elif rows['codeid'] == 12:
    rows['relation'] = 'shirt' #for example. You should put what you want here
    elif rows['codeid'] == 13:
    rows['relation'] = 'pants' #for example. You should put what you want here
    return rows

    result = df.apply(codeid_analysis, axis = 1)
    print(result)






    share|improve this answer












    share|improve this answer



    share|improve this answer










    answered Oct 30 '18 at 11:30









    SnedecorSnedecor

    674




    674













    • Thanks but unfortunately, the question showed only sample data, and real data is quiet big. Cant use manual if and else. :(

      – frozen shine
      Oct 30 '18 at 11:58



















    • Thanks but unfortunately, the question showed only sample data, and real data is quiet big. Cant use manual if and else. :(

      – frozen shine
      Oct 30 '18 at 11:58

















    Thanks but unfortunately, the question showed only sample data, and real data is quiet big. Cant use manual if and else. :(

    – frozen shine
    Oct 30 '18 at 11:58





    Thanks but unfortunately, the question showed only sample data, and real data is quiet big. Cant use manual if and else. :(

    – frozen shine
    Oct 30 '18 at 11:58











    0














    It is not the optimal solution since it is costly to your memory, but here is my try. df1 is created in order to hold the null values of the relation column, since it seems that nulls are the first occurrence. After some cleaning, the two dataframes are merged to provide into one.



    import pandas as pd
    df = pd.DataFrame([['bag', 11, 'null'],
    ['shoes', 12, 'null'],
    ['shopper', 13, 'null'],
    ['leather', 11, 'bag'],
    ['plastic', 13, 'shopper'],
    ['something',13,""]], columns = ['name', 'codeid', 'relation'])

    df1=df.loc[df['relation'] == 'null'].copy()#create a df with only null values in relation
    df1.drop_duplicates(subset=['name'], inplace=True)#drops the duplicates and retains the first entry
    df1=df1.drop("relation",axis=1)#drop the unneeded column

    final_df=pd.merge(df, df1, left_on='codeid', right_on='codeid')#merge the two dfs on the columns names





    share|improve this answer




























      0














      It is not the optimal solution since it is costly to your memory, but here is my try. df1 is created in order to hold the null values of the relation column, since it seems that nulls are the first occurrence. After some cleaning, the two dataframes are merged to provide into one.



      import pandas as pd
      df = pd.DataFrame([['bag', 11, 'null'],
      ['shoes', 12, 'null'],
      ['shopper', 13, 'null'],
      ['leather', 11, 'bag'],
      ['plastic', 13, 'shopper'],
      ['something',13,""]], columns = ['name', 'codeid', 'relation'])

      df1=df.loc[df['relation'] == 'null'].copy()#create a df with only null values in relation
      df1.drop_duplicates(subset=['name'], inplace=True)#drops the duplicates and retains the first entry
      df1=df1.drop("relation",axis=1)#drop the unneeded column

      final_df=pd.merge(df, df1, left_on='codeid', right_on='codeid')#merge the two dfs on the columns names





      share|improve this answer


























        0












        0








        0







        It is not the optimal solution since it is costly to your memory, but here is my try. df1 is created in order to hold the null values of the relation column, since it seems that nulls are the first occurrence. After some cleaning, the two dataframes are merged to provide into one.



        import pandas as pd
        df = pd.DataFrame([['bag', 11, 'null'],
        ['shoes', 12, 'null'],
        ['shopper', 13, 'null'],
        ['leather', 11, 'bag'],
        ['plastic', 13, 'shopper'],
        ['something',13,""]], columns = ['name', 'codeid', 'relation'])

        df1=df.loc[df['relation'] == 'null'].copy()#create a df with only null values in relation
        df1.drop_duplicates(subset=['name'], inplace=True)#drops the duplicates and retains the first entry
        df1=df1.drop("relation",axis=1)#drop the unneeded column

        final_df=pd.merge(df, df1, left_on='codeid', right_on='codeid')#merge the two dfs on the columns names





        share|improve this answer













        It is not the optimal solution since it is costly to your memory, but here is my try. df1 is created in order to hold the null values of the relation column, since it seems that nulls are the first occurrence. After some cleaning, the two dataframes are merged to provide into one.



        import pandas as pd
        df = pd.DataFrame([['bag', 11, 'null'],
        ['shoes', 12, 'null'],
        ['shopper', 13, 'null'],
        ['leather', 11, 'bag'],
        ['plastic', 13, 'shopper'],
        ['something',13,""]], columns = ['name', 'codeid', 'relation'])

        df1=df.loc[df['relation'] == 'null'].copy()#create a df with only null values in relation
        df1.drop_duplicates(subset=['name'], inplace=True)#drops the duplicates and retains the first entry
        df1=df1.drop("relation",axis=1)#drop the unneeded column

        final_df=pd.merge(df, df1, left_on='codeid', right_on='codeid')#merge the two dfs on the columns names






        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Nov 22 '18 at 20:16









        JoPapou13JoPapou13

        913




        913






























            draft saved

            draft discarded




















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53062984%2fpandas-dataframe-self-dependency-in-data-to-fill-a-column%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            404 Error Contact Form 7 ajax form submitting

            How to know if a Active Directory user can login interactively

            TypeError: fit_transform() missing 1 required positional argument: 'X'