Parallel excel sheet read from dask

Hello. All the examples of using dask that I have come across so far
involve multiple CSV files in a folder being read with dask's read_csv
call.

If I am given an xlsx file with multiple tabs, can I use anything
in dask to read them in parallel?

P.S. I am using pandas 0.19.2 with Python 2.7.

asked Jun 20 '17 at 13:47

schuler

Your best bet is to write a function that reads one tab (taking the tab ID as input) and to look into dask's delayed function. Do you want to process all the tabs as a single data-frame?

– mdurant
Jun 20 '17 at 14:50

This notebook may be of interest: gist.github.com/mrocklin/e7b7b3a65f2835cda813096332ec73ca

– MRocklin
Jun 21 '17 at 4:50

python-2.7 dask

2 Answers

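A simple example:

import dask
import dask.dataframe as dd
import pandas as pd
fn = 'my_file.xlsx'
parts = [dask.delayed(pd.read_excel)(fn, i, **other_options) for i in range(number_of_sheets)]  # one lazy read per sheet
df = dd.from_delayed(parts, meta=parts[0].compute())

This assumes that you provide the "other options" needed to extract the data (which must be uniform across sheets) and that you want to make a single master data-frame out of the set. Note that meta=parts[0].compute() reads the first sheet eagerly in order to obtain the column layout.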

Note that I don't know the internals of the excel reader, so how parallel the reading/parsing part would be is uncertain, but subsequent computations once the data are in memory would definitely be.

answered Jun 21 '17 at 14:55

mdurant

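For those using Python 3.6:

# Reading the file using dask
import dask
import dask.dataframe as dd
import pandas as pd

excel_file = 'my_file.xlsx'  # placeholder path; substitute your own workbook
# Wrap the pandas reader so that the sheet is only read when needed
parts = dask.delayed(pd.read_excel)(excel_file, sheet_name=0, usecols=[1, 2, 7])
df = dd.from_delayed([parts])

print(df.head())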

I'm seeing a 50% speed increase on load on an i7, 16GB, 5th-gen machine.

answered Nov 23 '18 at 11:25

zorze
