How do I unnest (explode) a column in a pandas DataFrame?

I have the following DataFrame where one of the columns is an object (list type cell):

df=pd.DataFrame({'A':[1,2],'B':[[1,2],[1,2]]})

df

Out[458]: 

   A       B

0  1  [1, 2]

1  2  [1, 2]

My expected output is:

What should I do to achieve this?

edited Dec 10 '18 at 15:06

asked Nov 9 '18 at 2:19

W-B

102k73163

2

Related, unnesting strings: stackoverflow.com/q/48197234/4909087
– coldspeed
Nov 12 '18 at 12:00

Couple of related posts: here, here, here, here, ...
– Cleb
2 days ago

python pandas dataframe


5 Answers

+100

As a user of both R and Python, I have seen this type of question a couple of times.

In R, the tidyr package provides a function called unnest for this. In Python (pandas), at the time of writing, there is no built-in function for this type of question.

Object-type columns generally make the data hard to transform with pandas functions. When I receive data like this, the first thing that comes to mind is to 'flatten', or unnest, the column.
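For readers on a newer pandas: releases from 0.25 onward ship `DataFrame.explode`, which covers the single-column case directly. A minimal sketch, assuming pandas 0.25 or later and the question's two-row frame:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [1, 2]]})

# Each list element becomes its own row; the other columns are repeated.
# The original index labels are repeated as well (0, 0, 1, 1).
out = df.explode('B')

# Optional: renumber the rows 0..n-1.
out_reset = out.reset_index(drop=True)
print(out_reset)
```

The exploded column keeps object dtype, so a cast such as `out['B'].astype(int)` may be needed afterwards. The methods below remain useful on older versions or when more control is required.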

Method 1: apply + pd.Series (easy to understand, but not recommended in terms of performance)

df.set_index('A').B.apply(pd.Series).stack().reset_index(level=0).rename(columns={0:'B'})

Out[463]: 

   A  B

0  1  1

1  1  2

0  2  1

1  2  2

Method 2: repeat with the DataFrame constructor, re-creating the DataFrame (good performance, but awkward with multiple columns)

pd.DataFrame({'A':df.A.repeat(df.B.str.len()),'B':np.concatenate(df.B.values)})

Out[465]: 

   A  B

0  1  1

0  1  2

1  2  1

1  2  2

Method 2.1: suppose that besides A we also have columns A.1, ..., A.n. With Method 2 above we would have to re-create each of those columns one by one.

Solution: unnest the single list column, then join or merge back on the index.

s=pd.DataFrame({'B':np.concatenate(df.B.values)},index=df.index.repeat(df.B.str.len()))

s.join(df.drop(columns='B'),how='left')

Out[477]: 

   B  A

0  1  1

0  2  1

1  1  2

1  2  2

If you need the column order exactly the same as before, add reindex at the end.

s.join(df.drop(columns='B'),how='left').reindex(columns=df.columns)
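To make the join-on-index idea concrete, here is a self-contained sketch. The extra column `A1` and the uneven list lengths are assumptions added for illustration; they are not part of the original question.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 'A' and 'A1' are ordinary columns, 'B' holds lists.
df = pd.DataFrame({'A': [1, 2], 'A1': ['x', 'y'], 'B': [[1, 2], [3, 4, 5]]})

lengths = df['B'].str.len()  # number of elements in each list: 2, 3

# Flatten 'B' and label every element with the index of the row it came from.
s = pd.DataFrame({'B': np.concatenate(df['B'].values)},
                 index=df.index.repeat(lengths))

# Join the untouched columns back on that repeated index, then restore
# the original column order.
out = s.join(df.drop(columns='B'), how='left').reindex(columns=df.columns)
print(out)
```

Because only the list column is rebuilt, the same three lines work no matter how many ordinary columns the frame has.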

Method 3: re-create the rows with a list comprehension

pd.DataFrame([[x] + [z] for x, y in df.values for z in y],columns=df.columns)

Out[488]: 

   A  B

0  1  1

1  1  2

2  2  1

3  2  2

If there are more than two columns:

s=pd.DataFrame([[x] + [z] for x, y in zip(df.index,df.B) for z in y])

s.merge(df,left_on=0,right_index=True)

Out[491]: 

   0  1  A       B

0  0  1  1  [1, 2]

1  0  2  1  [1, 2]

2  1  1  2  [1, 2]

3  1  2  2  [1, 2]

Method 4: using reindex or loc

df.reindex(df.index.repeat(df.B.str.len())).assign(B=np.concatenate(df.B.values))

Out[554]: 

   A  B

0  1  1

0  1  2

1  2  1

1  2  2



#df.loc[df.index.repeat(df.B.str.len())].assign(B=np.concatenate(df.B.values))
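One behaviour worth knowing about the repeat-based methods: a row whose list is empty is repeated zero times, so it disappears from the result. A small sketch, using a made-up three-row frame:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the middle row holds an empty list.
df = pd.DataFrame({'A': [1, 2, 3], 'B': [[1, 2], [], [3]]})

lengths = df['B'].str.len()  # 2, 0, 1

# Row 1 is repeated zero times, so A == 2 does not appear in the output.
# (The empty list may also make NumPy upcast the flattened values to float.)
out = (df.reindex(df.index.repeat(lengths))
         .assign(B=np.concatenate(df['B'].values)))
print(out)
```

If those rows have to be kept, one option is to replace empty lists with a placeholder such as `[np.nan]` before unnesting.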

Method 5: when the lists only contain unique values:

df=pd.DataFrame({'A':[1,2],'B':[[1,2],[3,4]]})

from collections import ChainMap

d = dict(ChainMap(*map(dict.fromkeys, df['B'], df['A'])))

pd.DataFrame(list(d.items()),columns=df.columns[::-1])

Out[574]: 

   B  A

0  1  1

1  2  1

2  3  2

3  4  2
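Why the uniqueness condition matters: the list elements become dictionary keys, so a value that occurs in more than one list can only be kept once. A sketch with a deliberately overlapping frame (the data is invented for illustration):

```python
from collections import ChainMap

import pandas as pd

# The value 2 occurs in both lists, which violates the uniqueness assumption.
df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [2, 3]]})

d = dict(ChainMap(*map(dict.fromkeys, df['B'], df['A'])))
out = pd.DataFrame(list(d.items()), columns=df.columns[::-1])

# Four (A, B) pairs went in, but only three rows come out: ChainMap looks
# a key up in the first mapping that contains it, so B == 2 keeps A == 1
# and the pair (A=2, B=2) is silently dropped.
print(len(out))
```

The row order of the result is not asserted here, since it depends on how ChainMap iterates in the Python version being used.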

Method 6: using numpy for high performance:

newvalues=np.dstack((np.repeat(df.A.values,list(map(len,df.B.values))),np.concatenate(df.B.values)))

pd.DataFrame(data=newvalues[0],columns=df.columns)

   A  B

0  1  1

1  1  2

2  2  1

3  2  2

Method 7: using the standard-library itertools functions cycle and chain. A pure-Python solution, just for fun:

from itertools import cycle,chain

l=df.values.tolist()

l1=[list(zip([x[0]], cycle(x[1])) if len([x[0]]) > len(x[1]) else list(zip(cycle([x[0]]), x[1]))) for x in l]

pd.DataFrame(list(chain.from_iterable(l1)),columns=df.columns)

   A  B

0  1  1

1  1  2

2  2  1

3  2  2

Special case (two columns of type object)

df=pd.DataFrame({'A':[1,2],'B':[[1,2],[3,4]],'C':[[1,2],[3,4]]})

df

Out[592]: 

   A       B       C

0  1  [1, 2]  [1, 2]

1  2  [3, 4]  [3, 4]

Self-defined function

def unnesting(df, explode):

    idx=df.index.repeat(df[explode[0]].str.len())

    df1=pd.concat([pd.DataFrame({x:np.concatenate(df[x].values)} )for x in explode],axis=1)

    df1.index=idx

    return df1.join(df.drop(columns=explode),how='left')



unnesting(df,['B','C'])

Out[609]: 

   B  C  A

0  1  1  1

0  2  2  1

1  3  3  2

1  4  4  2
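The function above takes the repeat counts from the first column in explode, so it quietly assumes that every listed column has the same list length in each row. A hedged sketch of a variant that checks this up front; the name `unnesting_checked` and the error message are my own, not part of the answer:

```python
import numpy as np
import pandas as pd


def unnesting_checked(df, explode):
    """Hypothetical variant of unnesting() that validates list lengths first."""
    # One column of list lengths per exploded column.
    lengths = df[explode].apply(lambda col: col.str.len())
    # In every row, all exploded columns must agree with the first one.
    if not lengths.eq(lengths.iloc[:, 0], axis=0).all().all():
        raise ValueError("list lengths differ between columns: %s" % explode)
    idx = df.index.repeat(lengths.iloc[:, 0])
    df1 = pd.concat(
        [pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode],
        axis=1)
    df1.index = idx
    return df1.join(df.drop(columns=explode), how='left')


good = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [3, 4]], 'C': [[1, 2], [3, 4]]})
result = unnesting_checked(good, ['B', 'C'])
print(result)
```

With mismatched lengths the check raises a ValueError before any flattening happens, which tends to be easier to debug than a misaligned result.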

Summary:

I am using pandas and Python functions for this type of question. If you are worried about the speed of the above solutions, check user3483203's answer, since it uses numpy, and numpy is faster most of the time. I recommend Cython and numba if speed matters in your case.

edited Dec 26 '18 at 15:24

Jack Moody

442419

answered Nov 9 '18 at 2:20

W-B

102k73163


+100

Option 1

If all of the sublists in the other column are the same length, numpy can be an efficient option here:

vals = np.array(df.B.values.tolist())    

a = np.repeat(df.A, vals.shape[1])



pd.DataFrame(np.column_stack((a, vals.ravel())), columns=df.columns)

Option 2

If the sublists have different length, you need an additional step:

vals = df.B.values.tolist()

rs = [len(r) for r in vals]    

a = np.repeat(df.A, rs)



pd.DataFrame(np.column_stack((a, np.concatenate(vals))), columns=df.columns)
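For completeness, here is Option 2 as a self-contained run on lists of different lengths. The three-row frame is an assumption chosen for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with sublists of different lengths.
df = pd.DataFrame({'A': [1, 2, 3], 'B': [[1, 2], [1, 2, 3], [1]]})

vals = df['B'].tolist()
rs = [len(r) for r in vals]            # repeat counts: 2, 3, 1
a = np.repeat(df['A'].to_numpy(), rs)  # 1 1 2 2 2 3

# Stack the repeated keys next to the flattened list values.
out = pd.DataFrame(np.column_stack((a, np.concatenate(vals))),
                   columns=df.columns)
print(out)
```

Note that np.column_stack builds a single array, so both columns end up with one common dtype; with mixed types (numbers next to strings, say) the values would be converted to that common type.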

Option 3

I took a shot at generalizing this to flatten N columns and tile M columns. I'll work on making it more efficient later:

df = pd.DataFrame({'A': [1,2,3], 'B': [[1,2], [1,2,3], [1]],

                   'C': [[1,2,3], [1,2], [1,2]], 'D': ['A', 'B', 'C']})

   A          B          C  D

0  1     [1, 2]  [1, 2, 3]  A

1  2  [1, 2, 3]     [1, 2]  B

2  3        [1]     [1, 2]  C

def unnest(df, tile, explode):

    vals = df[explode].sum(1)

    rs = [len(r) for r in vals]

    a = np.repeat(df[tile].values, rs, axis=0)

    b = np.concatenate(vals.values)

    d = np.column_stack((a, b))

    return pd.DataFrame(d, columns = tile +  ['_'.join(explode)])



unnest(df, ['A', 'D'], ['B', 'C'])

    A  D B_C

0   1  A   1

1   1  A   2

2   1  A   1

3   1  A   2

4   1  A   3

5   2  B   1

6   2  B   2

7   2  B   3

8   2  B   1

9   2  B   2

10  3  C   1

11  3  C   1

12  3  C   2

Functions

def wen1(df):

    return df.set_index('A').B.apply(pd.Series).stack().reset_index(level=0).rename(columns={0: 'B'})



def wen2(df):

    return pd.DataFrame({'A':df.A.repeat(df.B.str.len()),'B':np.concatenate(df.B.values)})



def wen3(df):

    s = pd.DataFrame({'B': np.concatenate(df.B.values)}, index=df.index.repeat(df.B.str.len()))

    return s.join(df.drop(columns='B'), how='left')



def wen4(df):

    return pd.DataFrame([[x] + [z] for x, y in df.values for z in y],columns=df.columns)



def chris1(df):

    vals = np.array(df.B.values.tolist())

    a = np.repeat(df.A, vals.shape[1])

    return pd.DataFrame(np.column_stack((a, vals.ravel())), columns=df.columns)



def chris2(df):

    vals = df.B.values.tolist()

    rs = [len(r) for r in vals]

    a = np.repeat(df.A.values, rs)

    return pd.DataFrame(np.column_stack((a, np.concatenate(vals))), columns=df.columns)

Timings

import pandas as pd

import matplotlib.pyplot as plt

import numpy as np

from timeit import timeit



res = pd.DataFrame(

       index=['wen1', 'wen2', 'wen3', 'wen4', 'chris1', 'chris2'],

       columns=[10, 50, 100, 500, 1000, 5000, 10000],

       dtype=float

)



for f in res.index:

    for c in res.columns:

        df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [1, 2]]})

        df = pd.concat([df]*c)

        stmt = '{}(df)'.format(f)

        setp = 'from __main__ import df, {}'.format(f)

        res.at[f, c] = timeit(stmt, setp, number=50)



ax = res.div(res.min()).T.plot(loglog=True)

ax.set_xlabel("N")

ax.set_ylabel("time (relative)")

Performance

[Image: log-log plot of relative time against N for the functions above]

edited Nov 9 '18 at 4:15

answered Nov 9 '18 at 2:35

user3483203

30.3k82354


One alternative is to apply the meshgrid recipe over the rows of the columns to unnest:

import numpy as np

import pandas as pd





def unnest(frame, explode):

    def mesh(values):

        return np.array(np.meshgrid(*values)).T.reshape(-1, len(values))



    data = np.vstack([mesh(row) for row in frame[explode].values])

    return pd.DataFrame(data=data, columns=explode)





df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [1, 2]]})

print(unnest(df, ['A', 'B']))  # base

print()



df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [3, 4]], 'C': [[1, 2], [3, 4]]})

print(unnest(df, ['A', 'B', 'C']))  # multiple columns

print()



df = pd.DataFrame({'A': [1, 2, 3], 'B': [[1, 2], [1, 2, 3], [1]],

                   'C': [[1, 2, 3], [1, 2], [1, 2]], 'D': ['A', 'B', 'C']})



print(unnest(df, ['A', 'B']))  # uneven length lists

print()

print(unnest(df, ['D', 'B']))  # different types

print()

Output

answered Dec 1 '18 at 1:31

Daniel Mesejo

14.3k11027


My 5 cents:

df[['B', 'B2']] = pd.DataFrame(df['B'].values.tolist())



pd.concat([df[['A', 'B']], df[['A', 'B2']].rename(columns={'B2': 'B'})],

          ignore_index=True)

and another 5

df[['B1', 'B2']] = pd.DataFrame([*df['B']]) # if values.tolist() is too boring



(pd.wide_to_long(df.drop(columns='B'), 'B', 'A', '')

 .reset_index(level=1, drop=True)

 .reset_index())

both resulting in the same

edited Dec 11 '18 at 3:50

answered Dec 11 '18 at 2:05

ayorgo

1,110514


-1

Something I would not really recommend (but it at least works in this case):

df=pd.concat([df]*2).sort_index()

it=iter(df['B'].tolist()[0]+df['B'].tolist()[0])

df['B']=df['B'].apply(lambda x:next(it))

concat + sort_index + iter + apply + next.

Now:

print(df)

Is:

If you care about the index:

df=df.reset_index(drop=True)

Now:

print(df)

Is:

answered Nov 9 '18 at 2:40

U9-Forward

13.3k21237


5 Answers
5

active

oldest

votes

5 Answers
5

active

oldest

votes

+100

As an user with both R and python, I have seen this type of question a couple of times.

In R, they have the built-in function from package tidyr called unnest. But in Python(pandas) there is no built-in function for this type of question.

Method 1
apply + pd.Series (easy to understand but in terms of performance not recommended . )

df.set_index('A').B.apply(pd.Series).stack().reset_index(level=0).rename(columns={0:'B'})

Out[463]: 

   A  B

0  1  1

1  1  2

0  2  1

1  2  2

Method 2 using repeat with DataFrame constructor , re-create your dataframe (good at performance, not good at multiple columns )

df=pd.DataFrame({'A':df.A.repeat(df.B.str.len()),'B':np.concatenate(df.B.values)})

df

Out[465]: 

   A  B

0  1  1

0  1  2

1  2  1

1  2  2

Method 2.1 for example besides A we have A.1 .....A.n. If we still use the method(Method 2) above it is hard for us to re-create the columns one by one .

Solution : join or merge with the index after 'unnest' the single columns

s=pd.DataFrame({'B':np.concatenate(df.B.values)},index=df.index.repeat(df.B.str.len()))

s.join(df.drop('B',1),how='left')

Out[477]: 

   B  A

0  1  1

0  2  1

1  1  2

1  2  2

If you need the column order exactly the same as before, add reindex at the end.

s.join(df.drop('B',1),how='left').reindex(columns=df.columns)

Method 3 recreate the list

pd.DataFrame([[x] + [z] for x, y in df.values for z in y],columns=df.columns)

Out[488]: 

   A  B

0  1  1

1  1  2

2  2  1

3  2  2

If more than two columns

s=pd.DataFrame([[x] + [z] for x, y in zip(df.index,df.B) for z in y])

s.merge(df,left_on=0,right_index=True)

Out[491]: 

   0  1  A       B

0  0  1  1  [1, 2]

1  0  2  1  [1, 2]

2  1  1  2  [1, 2]

3  1  2  2  [1, 2]

Method 4 using reindex or loc

df.reindex(df.index.repeat(df.B.str.len())).assign(B=np.concatenate(df.B.values))

Out[554]: 

   A  B

0  1  1

0  1  2

1  2  1

1  2  2



#df.loc[df.index.repeat(df.B.str.len())].assign(B=np.concatenate(df.B.values))

Method 5 when the list only contains unique values:

df=pd.DataFrame({'A':[1,2],'B':[[1,2],[3,4]]})

from collections import ChainMap

d = dict(ChainMap(*map(dict.fromkeys, df['B'], df['A'])))

pd.DataFrame(list(d.items()),columns=df.columns[::-1])

Out[574]: 

   B  A

0  1  1

1  2  1

2  3  2

3  4  2

Method 6 using numpy for high performance:

newvalues=np.dstack((np.repeat(df.A.values,list(map(len,df.B.values))),np.concatenate(df.B.values)))

pd.DataFrame(data=newvalues[0],columns=df.columns)

   A  B

0  1  1

1  1  2

2  2  1

3  2  2

Method 7 : using base function itertools cycle and chain: Pure python solution just for fun

from itertools import cycle,chain

l=df.values.tolist()

l1=[list(zip([x[0]], cycle(x[1])) if len([x[0]]) > len(x[1]) else list(zip(cycle([x[0]]), x[1]))) for x in l]

pd.DataFrame(list(chain.from_iterable(l1)),columns=df.columns)

   A  B

0  1  1

1  1  2

2  2  1

3  2  2

Special case (two columns type object)

df=pd.DataFrame({'A':[1,2],'B':[[1,2],[3,4]],'C':[[1,2],[3,4]]})

df

Out[592]: 

   A       B       C

0  1  [1, 2]  [1, 2]

1  2  [3, 4]  [3, 4]

Self-def function

def unnesting(df, explode):

    idx=df.index.repeat(df[explode[0]].str.len())

    df1=pd.concat([pd.DataFrame({x:np.concatenate(df[x].values)} )for x in explode],axis=1)

    df1.index=idx

    return df1.join(df.drop(explode,1),how='left')



unnesting(df,['B','C'])

Out[609]: 

   B  C  A

0  1  1  1

0  2  2  1

1  3  3  2

1  4  4  2

Summary :

edited Dec 26 '18 at 15:24

Jack Moody

442419

answered Nov 9 '18 at 2:20

W-B

102k73163

add a comment |

+100

As an user with both R and python, I have seen this type of question a couple of times.

In R, they have the built-in function from package tidyr called unnest. But in Python(pandas) there is no built-in function for this type of question.

Method 1
apply + pd.Series (easy to understand but in terms of performance not recommended . )

df.set_index('A').B.apply(pd.Series).stack().reset_index(level=0).rename(columns={0:'B'})

Out[463]: 

   A  B

0  1  1

1  1  2

0  2  1

1  2  2

Method 2 using repeat with DataFrame constructor , re-create your dataframe (good at performance, not good at multiple columns )

df=pd.DataFrame({'A':df.A.repeat(df.B.str.len()),'B':np.concatenate(df.B.values)})

df

Out[465]: 

   A  B

0  1  1

0  1  2

1  2  1

1  2  2

Method 2.1 for example besides A we have A.1 .....A.n. If we still use the method(Method 2) above it is hard for us to re-create the columns one by one .

Solution : join or merge with the index after 'unnest' the single columns

s=pd.DataFrame({'B':np.concatenate(df.B.values)},index=df.index.repeat(df.B.str.len()))

s.join(df.drop('B',1),how='left')

Out[477]: 

   B  A

0  1  1

0  2  1

1  1  2

1  2  2

If you need the column order exactly the same as before, add reindex at the end.

s.join(df.drop('B',1),how='left').reindex(columns=df.columns)

Method 3 recreate the list

pd.DataFrame([[x] + [z] for x, y in df.values for z in y],columns=df.columns)

Out[488]: 

   A  B

0  1  1

1  1  2

2  2  1

3  2  2

If more than two columns

s=pd.DataFrame([[x] + [z] for x, y in zip(df.index,df.B) for z in y])

s.merge(df,left_on=0,right_index=True)

Out[491]: 

   0  1  A       B

0  0  1  1  [1, 2]

1  0  2  1  [1, 2]

2  1  1  2  [1, 2]

3  1  2  2  [1, 2]

Method 4 using reindex or loc

df.reindex(df.index.repeat(df.B.str.len())).assign(B=np.concatenate(df.B.values))

Out[554]: 

   A  B

0  1  1

0  1  2

1  2  1

1  2  2



#df.loc[df.index.repeat(df.B.str.len())].assign(B=np.concatenate(df.B.values))

Method 5 when the list only contains unique values:

df=pd.DataFrame({'A':[1,2],'B':[[1,2],[3,4]]})

from collections import ChainMap

d = dict(ChainMap(*map(dict.fromkeys, df['B'], df['A'])))

pd.DataFrame(list(d.items()),columns=df.columns[::-1])

Out[574]: 

   B  A

0  1  1

1  2  1

2  3  2

3  4  2

Method 6 using numpy for high performance:

newvalues=np.dstack((np.repeat(df.A.values,list(map(len,df.B.values))),np.concatenate(df.B.values)))

pd.DataFrame(data=newvalues[0],columns=df.columns)

   A  B

0  1  1

1  1  2

2  2  1

3  2  2

Method 7 : using base function itertools cycle and chain: Pure python solution just for fun

from itertools import cycle,chain

l=df.values.tolist()

l1=[list(zip([x[0]], cycle(x[1])) if len([x[0]]) > len(x[1]) else list(zip(cycle([x[0]]), x[1]))) for x in l]

pd.DataFrame(list(chain.from_iterable(l1)),columns=df.columns)

   A  B

0  1  1

1  1  2

2  2  1

3  2  2

Special case (two columns type object)

df=pd.DataFrame({'A':[1,2],'B':[[1,2],[3,4]],'C':[[1,2],[3,4]]})

df

Out[592]: 

   A       B       C

0  1  [1, 2]  [1, 2]

1  2  [3, 4]  [3, 4]

Self-def function

def unnesting(df, explode):

    idx=df.index.repeat(df[explode[0]].str.len())

    df1=pd.concat([pd.DataFrame({x:np.concatenate(df[x].values)} )for x in explode],axis=1)

    df1.index=idx

    return df1.join(df.drop(explode,1),how='left')



unnesting(df,['B','C'])

Out[609]: 

   B  C  A

0  1  1  1

0  2  2  1

1  3  3  2

1  4  4  2

Summary :

edited Dec 26 '18 at 15:24

Jack Moody

442419

answered Nov 9 '18 at 2:20

W-B

102k73163

add a comment |

+100

As an user with both R and python, I have seen this type of question a couple of times.

In R, they have the built-in function from package tidyr called unnest. But in Python(pandas) there is no built-in function for this type of question.

Method 1
apply + pd.Series (easy to understand but in terms of performance not recommended . )

df.set_index('A').B.apply(pd.Series).stack().reset_index(level=0).rename(columns={0:'B'})

Out[463]: 

   A  B

0  1  1

1  1  2

0  2  1

1  2  2

Method 2 using repeat with DataFrame constructor , re-create your dataframe (good at performance, not good at multiple columns )

df=pd.DataFrame({'A':df.A.repeat(df.B.str.len()),'B':np.concatenate(df.B.values)})

df

Out[465]: 

   A  B

0  1  1

0  1  2

1  2  1

1  2  2

Method 2.1 for example besides A we have A.1 .....A.n. If we still use the method(Method 2) above it is hard for us to re-create the columns one by one .

Solution : join or merge with the index after 'unnest' the single columns

s=pd.DataFrame({'B':np.concatenate(df.B.values)},index=df.index.repeat(df.B.str.len()))

s.join(df.drop('B',1),how='left')

Out[477]: 

   B  A

0  1  1

0  2  1

1  1  2

1  2  2

If you need the column order exactly the same as before, add reindex at the end.

s.join(df.drop('B',1),how='left').reindex(columns=df.columns)

Method 3 recreate the list

pd.DataFrame([[x] + [z] for x, y in df.values for z in y],columns=df.columns)

Out[488]: 

   A  B

0  1  1

1  1  2

2  2  1

3  2  2

If more than two columns

s=pd.DataFrame([[x] + [z] for x, y in zip(df.index,df.B) for z in y])

s.merge(df,left_on=0,right_index=True)

Out[491]: 

   0  1  A       B

0  0  1  1  [1, 2]

1  0  2  1  [1, 2]

2  1  1  2  [1, 2]

3  1  2  2  [1, 2]

Method 4 using reindex or loc

df.reindex(df.index.repeat(df.B.str.len())).assign(B=np.concatenate(df.B.values))

Out[554]: 

   A  B

0  1  1

0  1  2

1  2  1

1  2  2



#df.loc[df.index.repeat(df.B.str.len())].assign(B=np.concatenate(df.B.values))

Method 5 when the list only contains unique values:

df=pd.DataFrame({'A':[1,2],'B':[[1,2],[3,4]]})

from collections import ChainMap

d = dict(ChainMap(*map(dict.fromkeys, df['B'], df['A'])))

pd.DataFrame(list(d.items()),columns=df.columns[::-1])

Out[574]: 

   B  A

0  1  1

1  2  1

2  3  2

3  4  2

Method 6 using numpy for high performance:

newvalues=np.dstack((np.repeat(df.A.values,list(map(len,df.B.values))),np.concatenate(df.B.values)))

pd.DataFrame(data=newvalues[0],columns=df.columns)

   A  B

0  1  1

1  1  2

2  2  1

3  2  2

Method 7 : using base function itertools cycle and chain: Pure python solution just for fun

from itertools import cycle,chain

l=df.values.tolist()

l1=[list(zip([x[0]], cycle(x[1])) if len([x[0]]) > len(x[1]) else list(zip(cycle([x[0]]), x[1]))) for x in l]

pd.DataFrame(list(chain.from_iterable(l1)),columns=df.columns)

   A  B

0  1  1

1  1  2

2  2  1

3  2  2

Special case (two columns type object)

df=pd.DataFrame({'A':[1,2],'B':[[1,2],[3,4]],'C':[[1,2],[3,4]]})

df

Out[592]: 

   A       B       C

0  1  [1, 2]  [1, 2]

1  2  [3, 4]  [3, 4]

Self-def function

def unnesting(df, explode):

    idx=df.index.repeat(df[explode[0]].str.len())

    df1=pd.concat([pd.DataFrame({x:np.concatenate(df[x].values)} )for x in explode],axis=1)

    df1.index=idx

    return df1.join(df.drop(explode,1),how='left')



unnesting(df,['B','C'])

Out[609]: 

   B  C  A

0  1  1  1

0  2  2  1

1  3  3  2

1  4  4  2

Summary :

edited Dec 26 '18 at 15:24

Jack Moody

442419

answered Nov 9 '18 at 2:20

W-B

102k73163

As an user with both R and python, I have seen this type of question a couple of times.

In R, they have the built-in function from package tidyr called unnest. But in Python(pandas) there is no built-in function for this type of question.

Method 1
apply + pd.Series (easy to understand but in terms of performance not recommended . )

df.set_index('A').B.apply(pd.Series).stack().reset_index(level=0).rename(columns={0:'B'})

Out[463]: 

   A  B

0  1  1

1  1  2

0  2  1

1  2  2

Method 2 using repeat with DataFrame constructor , re-create your dataframe (good at performance, not good at multiple columns )

df=pd.DataFrame({'A':df.A.repeat(df.B.str.len()),'B':np.concatenate(df.B.values)})

df

Out[465]: 

   A  B

0  1  1

0  1  2

1  2  1

1  2  2

Method 2.1 for example besides A we have A.1 .....A.n. If we still use the method(Method 2) above it is hard for us to re-create the columns one by one .

Solution : join or merge with the index after 'unnest' the single columns

s=pd.DataFrame({'B':np.concatenate(df.B.values)},index=df.index.repeat(df.B.str.len()))

s.join(df.drop('B',1),how='left')

Out[477]: 

   B  A

0  1  1

0  2  1

1  1  2

1  2  2

If you need the column order exactly the same as before, add reindex at the end.

s.join(df.drop('B',1),how='left').reindex(columns=df.columns)

Method 3 recreate the list

pd.DataFrame([[x] + [z] for x, y in df.values for z in y],columns=df.columns)

Out[488]: 

   A  B

0  1  1

1  1  2

2  2  1

3  2  2

If more than two columns

s=pd.DataFrame([[x] + [z] for x, y in zip(df.index,df.B) for z in y])

s.merge(df,left_on=0,right_index=True)

Out[491]: 

   0  1  A       B

0  0  1  1  [1, 2]

1  0  2  1  [1, 2]

2  1  1  2  [1, 2]

3  1  2  2  [1, 2]

Method 4 using reindex or loc

df.reindex(df.index.repeat(df.B.str.len())).assign(B=np.concatenate(df.B.values))

Out[554]: 

   A  B

0  1  1

0  1  2

1  2  1

1  2  2



#df.loc[df.index.repeat(df.B.str.len())].assign(B=np.concatenate(df.B.values))

Method 5 when the list only contains unique values:

df=pd.DataFrame({'A':[1,2],'B':[[1,2],[3,4]]})

from collections import ChainMap

d = dict(ChainMap(*map(dict.fromkeys, df['B'], df['A'])))

pd.DataFrame(list(d.items()),columns=df.columns[::-1])

Out[574]: 

   B  A

0  1  1

1  2  1

2  3  2

3  4  2

Method 6 using numpy for high performance:

newvalues=np.dstack((np.repeat(df.A.values,list(map(len,df.B.values))),np.concatenate(df.B.values)))

pd.DataFrame(data=newvalues[0],columns=df.columns)

   A  B

0  1  1

1  1  2

2  2  1

3  2  2

Method 7 : using base function itertools cycle and chain: Pure python solution just for fun

from itertools import cycle,chain

l=df.values.tolist()

l1=[list(zip([x[0]], cycle(x[1])) if len([x[0]]) > len(x[1]) else list(zip(cycle([x[0]]), x[1]))) for x in l]

pd.DataFrame(list(chain.from_iterable(l1)),columns=df.columns)

   A  B

0  1  1

1  1  2

2  2  1

3  2  2

Special case (two columns type object)

df=pd.DataFrame({'A':[1,2],'B':[[1,2],[3,4]],'C':[[1,2],[3,4]]})

df

Out[592]: 

   A       B       C

0  1  [1, 2]  [1, 2]

1  2  [3, 4]  [3, 4]

Self-def function

def unnesting(df, explode):

    idx=df.index.repeat(df[explode[0]].str.len())

    df1=pd.concat([pd.DataFrame({x:np.concatenate(df[x].values)} )for x in explode],axis=1)

    df1.index=idx

    return df1.join(df.drop(explode,1),how='left')



unnesting(df,['B','C'])

Out[609]: 

   B  C  A

0  1  1  1

0  2  2  1

1  3  3  2

1  4  4  2

Summary :

edited Dec 26 '18 at 15:24

Jack Moody

442419

answered Nov 9 '18 at 2:20

W-B

102k73163

edited Dec 26 '18 at 15:24

Jack Moody

442419

edited Dec 26 '18 at 15:24

Jack Moody

442419

edited Dec 26 '18 at 15:24

Jack Moody

442419

answered Nov 9 '18 at 2:20

W-B

102k73163

answered Nov 9 '18 at 2:20

W-B

102k73163

answered Nov 9 '18 at 2:20

W-B

102k73163

add a comment |

+100

Option 1

If all of the sublists in the other column are the same length, numpy can be an efficient option here:

vals = np.array(df.B.values.tolist())    

a = np.repeat(df.A, vals.shape[1])



pd.DataFrame(np.column_stack((a, vals.ravel())), columns=df.columns)

Option 2

If the sublists have different length, you need an additional step:

vals = df.B.values.tolist()

rs = [len(r) for r in vals]    

a = np.repeat(df.A, rs)



pd.DataFrame(np.column_stack((a, np.concatenate(vals))), columns=df.columns)

Option 3

I took a shot at generalizing this to work to flatten N columns and tile M columns, I'll work later on making it more efficient:

df = pd.DataFrame({'A': [1,2,3], 'B': [[1,2], [1,2,3], [1]],

                   'C': [[1,2,3], [1,2], [1,2]], 'D': ['A', 'B', 'C']})

   A          B          C  D

0  1     [1, 2]  [1, 2, 3]  A

1  2  [1, 2, 3]     [1, 2]  B

2  3        [1]     [1, 2]  C

def unnest(df, tile, explode):

    vals = df[explode].sum(1)

    rs = [len(r) for r in vals]

    a = np.repeat(df[tile].values, rs, axis=0)

    b = np.concatenate(vals.values)

    d = np.column_stack((a, b))

    return pd.DataFrame(d, columns = tile +  ['_'.join(explode)])



unnest(df, ['A', 'D'], ['B', 'C'])

    A  D B_C

0   1  A   1

1   1  A   2

2   1  A   1

3   1  A   2

4   1  A   3

5   2  B   1

6   2  B   2

7   2  B   3

8   2  B   1

9   2  B   2

10  3  C   1

11  3  C   1

12  3  C   2

Functions

def wen1(df):

    return df.set_index('A').B.apply(pd.Series).stack().reset_index(level=0).rename(columns={0: 'B'})



def wen2(df):

    return pd.DataFrame({'A':df.A.repeat(df.B.str.len()),'B':np.concatenate(df.B.values)})



def wen3(df):

    s = pd.DataFrame({'B': np.concatenate(df.B.values)}, index=df.index.repeat(df.B.str.len()))

    return s.join(df.drop('B', 1), how='left')



def wen4(df):

    return pd.DataFrame([[x] + [z] for x, y in df.values for z in y],columns=df.columns)



def chris1(df):

    vals = np.array(df.B.values.tolist())

    a = np.repeat(df.A, vals.shape[1])

    return pd.DataFrame(np.column_stack((a, vals.ravel())), columns=df.columns)



def chris2(df):

    vals = df.B.values.tolist()

    rs = [len(r) for r in vals]

    a = np.repeat(df.A.values, rs)

    return pd.DataFrame(np.column_stack((a, np.concatenate(vals))), columns=df.columns)

Timings

import pandas as pd

import matplotlib.pyplot as plt

import numpy as np

from timeit import timeit



res = pd.DataFrame(

       index=['wen1', 'wen2', 'wen3', 'wen4', 'chris1', 'chris2'],

       columns=[10, 50, 100, 500, 1000, 5000, 10000],

       dtype=float

)



for f in res.index:

    for c in res.columns:

        df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [1, 2]]})

        df = pd.concat([df]*c)

        stmt = '{}(df)'.format(f)

        setp = 'from __main__ import df, {}'.format(f)

        res.at[f, c] = timeit(stmt, setp, number=50)



ax = res.div(res.min()).T.plot(loglog=True)

ax.set_xlabel("N")

ax.set_ylabel("time (relative)")

Performance

enter image description here

edited Nov 9 '18 at 4:15

answered Nov 9 '18 at 2:35

user3483203

30.3k82354

One alternative is to apply the meshgrid recipe over the rows of the columns to unnest:

import numpy as np
import pandas as pd


def unnest(frame, explode):
    def mesh(values):
        # Cross product of the entries in one row, one combination per output row.
        return np.array(np.meshgrid(*values)).T.reshape(-1, len(values))

    # np.vstack needs a sequence; passing a bare generator is deprecated.
    data = np.vstack([mesh(row) for row in frame[explode].values])
    return pd.DataFrame(data=data, columns=explode)


df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [1, 2]]})
print(unnest(df, ['A', 'B']))  # base
print()

df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [3, 4]], 'C': [[1, 2], [3, 4]]})
print(unnest(df, ['A', 'B', 'C']))  # multiple columns
print()

df = pd.DataFrame({'A': [1, 2, 3], 'B': [[1, 2], [1, 2, 3], [1]],
                   'C': [[1, 2, 3], [1, 2], [1, 2]], 'D': ['A', 'B', 'C']})

print(unnest(df, ['A', 'B']))  # uneven length lists
print()
print(unnest(df, ['D', 'B']))  # different types
print()

Output
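To see what the meshgrid step contributes, it can help to run it on a single row in isolation. A minimal sketch, assuming a row made of one scalar and one list; the `np.atleast_1d` wrapping and the names `row` and `pairs` are additions for illustration, not part of the answer's code:

```python
import numpy as np


def mesh(values):
    # Wrap scalars as one-element arrays so every input is 1-D.
    arrays = [np.atleast_1d(v) for v in values]
    # Stack the grids and reshape so that each output row is one combination.
    return np.array(np.meshgrid(*arrays)).T.reshape(-1, len(arrays))


row = [2, [1, 2, 3]]  # e.g. A == 2 and B == [1, 2, 3]
pairs = mesh(row)
print(pairs)
```

Each element of the list is paired with the scalar, giving one output row per list element. When two list columns are passed, the result is their cross product for that row, not an element-wise pairing.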

answered Dec 1 '18 at 1:31

Daniel Mesejo

14.3k11027

My 5 cents:

df[['B', 'B2']] = pd.DataFrame(df['B'].values.tolist())

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
pd.concat([df[['A', 'B']], df[['A', 'B2']].rename(columns={'B2': 'B'})],
          ignore_index=True)

And another 5 (starting again from the original df):

df[['B1', 'B2']] = pd.DataFrame([*df['B']]) # if values.tolist() is too boring

(pd.wide_to_long(df.drop(columns='B'), 'B', 'A', '')
 .reset_index(level=1, drop=True)
 .reset_index())

Both give the same result.
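Both snippets hard-code two elements per list. A minimal sketch of the same widen-then-stack idea for any fixed list length, using `melt` instead of naming each column by hand; the three-element sample frame and the names `wide` and `long_df` are assumptions made for this example:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2, 3], [4, 5, 6]]})

# Spread each list over columns B0, B1, B2, aligned on the original index.
wide = pd.DataFrame(df['B'].tolist(), index=df.index).add_prefix('B')

# Stack the wide columns back into a single B column.
long_df = (df[['A']].join(wide)
           .melt(id_vars='A', value_name='B')
           .drop(columns='variable')
           .sort_values('A', kind='mergesort')  # stable: keeps list order within each A
           .reset_index(drop=True))
print(long_df)
```

With lists of unequal length the wide frame would be padded with missing values, so this sketch is only meant for the equal-length case.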

edited Dec 11 '18 at 3:50

answered Dec 11 '18 at 2:05

ayorgo

1,110514

-1

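Something I wouldn't really recommend (it works in this case only because every list has two elements and all the lists are identical):

df = pd.concat([df]*2).sort_index()
it = iter(df['B'].tolist()[0] + df['B'].tolist()[0])
df['B'] = df['B'].apply(lambda x: next(it))

That is: concat + sort_index + iter + apply + next.

Now print(df) shows the unnested frame, with each original index label repeated.

If you care about the index:

df = df.reset_index(drop=True)

Now print(df) shows the same rows with a fresh 0 to 3 index.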
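The same repeat-the-rows-then-fill idea can be made less fragile by taking the repeat counts from the list lengths rather than hard-coding a doubling. A minimal sketch, assuming a sample frame with lists of different lengths; `lengths` and `out` are names introduced here:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [[1, 2], [1, 2, 3], [1]]})

# Number of elements in each list, i.e. how often each row must be repeated.
lengths = df['B'].str.len()

# Repeat every row once per list element, then overwrite B with the
# flattened values, which come out in the same row order.
out = df.loc[df.index.repeat(lengths)].copy()
out['B'] = np.concatenate(df['B'].values)
out = out.reset_index(drop=True)
print(out)
```

This sketch assumes no list is empty; a row with an empty list would simply disappear from the result.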

answered Nov 9 '18 at 2:40

U9-Forward

13.3k21237
