Python multiprocessing.Pool slower than sequential execution
I'm trying to write a program that operates on a long list of elements (called training_set in the code example). Each row of the list contains two numbers that have to be looked up in another list called IDs: my program iterates over training_set's rows and, for each row, finds the two corresponding numbers in IDs and then performs some more computation (not shown in the code).



With sequential execution this takes about 300 s. Since each row of training_set is independent of the others, I tried to parallelize the computation by splitting the input across the CPU cores with multiprocessing.Pool.
However, the parallelized version is slower than the sequential one.
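For reference, a minimal Pool.map round-trip looks like this (square and the input list are toy stand-ins, not part of the real program):

```python
from multiprocessing import Pool

def square(x):
    # Toy worker function: runs in a worker process for each input element.
    return x * x

if __name__ == "__main__":
    with Pool(2) as pool:
        # map() splits the iterable across the workers and preserves input order.
        print(pool.map(square, [1, 2, 3]))  # prints [1, 4, 9]
```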



import copy
import csv
import multiprocessing
from multiprocessing import Pool

num_procs = multiprocessing.cpu_count()

with open("training_set.txt", "r") as f:
    reader = csv.reader(f)
    training_set = list(reader)
training_set = [element[0].split(" ") for element in training_set]

with open("node_information.csv", "r") as f:
    reader = csv.reader(f)
    node_info = list(reader)
IDs = [element[0] for element in node_info]

batch_size = len(training_set) // num_procs
inputs = []

# split the list into batches to feed to the different worker processes
for i in range(num_procs):
    if i == num_procs - 1:
        inputs.append(list(training_set[i * batch_size:]))
    else:
        inputs.append(list(training_set[i * batch_size:(i + 1) * batch_size]))

def init(IDs):
    global identities
    identities = copy.deepcopy(IDs)

def analyze_pairs(partialList):
    pairsSet = copy.deepcopy(partialList)
    for i in range(len(pairsSet)):
        source = pairsSet[i][0]  # an edge ID
        target = pairsSet[i][1]  # an edge ID
        # find the indices matching the source and target IDs
        index_source = identities.index(source)
        index_target = identities.index(target)
        # ***additional computation***

if __name__ == '__main__':
    pool = Pool(num_procs, initializer=init, initargs=(IDs,))
    training_features = pool.map(analyze_pairs, inputs)
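The batch-splitting loop can also be written as a small helper that guarantees every row lands in exactly one batch (a sketch; chunk is a hypothetical name, not from the original code):

```python
def chunk(seq, n):
    # Split seq into n contiguous batches whose sizes differ by at most one,
    # so no trailing rows are dropped.
    size, extra = divmod(len(seq), n)
    batches, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        batches.append(seq[start:end])
        start = end
    return batches
```

With this, `inputs = chunk(training_set, num_procs)` would replace the manual slicing loop.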


I'm not showing the rest of the loop body (at the end of analyze_pairs()) because the problem persists even if I remove that code, so the problem doesn't reside there.
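Independent of the pooling question, each identities.index(...) call scans the whole list; building a dictionary once per worker (for example inside init()) makes every lookup O(1). A sketch, assuming the IDs are unique (build_index is a hypothetical helper, not from the original code):

```python
def build_index(ids):
    # Map each ID to its position so repeated lookups avoid linear scans.
    return {identity: pos for pos, identity in enumerate(ids)}

# Example: id_to_pos["b"] is 1, replacing the linear scan ids.index("b").
id_to_pos = build_index(["a", "b", "c"])
```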



I know there are already many questions on this topic, but I couldn't find a solution for my case.
I don't think the parallelism here introduces more overhead than speedup, because the input of each worker is large (on an 8-thread CPU, each worker should take at least ~35 s) and there is no explicit message passing. I also tried using copy.deepcopy to make sure each worker operates on a separate list (although it shouldn't matter, since each worker only reads from the list), but it didn't help.
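One way to check where the time actually goes is to time each phase separately; a sketch using time.perf_counter (timed is a hypothetical helper, not from the original code):

```python
import time

def timed(label, fn, *args):
    # Run fn(*args), report the wall-clock time, and pass the result through.
    t0 = time.perf_counter()
    result = fn(*args)
    print(f"{label}: {time.perf_counter() - t0:.2f}s")
    return result
```

For example, timing `timed("pool.map", pool.map, analyze_pairs, inputs)` against the sequential loop would show whether the slowdown is in the workers or in setup.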

What could the problem be? Thanks in advance.
python multiprocessing threadpool
asked Nov 25 '18 at 22:16
gnigni

  • you might be loading the files multiple times which could take a long time; maybe put the file loading code into the __main__?

    – Sam Mason
    Nov 25 '18 at 23:27
  • @SamMason I tried but it didn't help, anyway thank you for your answer ;)

    – gnigni
    Nov 25 '18 at 23:41
  • my guess is still around file loading time; have you tried adding print statements at various points in the code to figure out where your time is going?

    – Sam Mason
    Nov 26 '18 at 21:51