Is continually writing to a file detrimental to the performance of a program?
Imagine a parallel "high performance program" that reads in files: each process performs a task on the input data, then writes its output for that task to a single shared output file, before repeating the procedure.
In terms of performance, is it inefficient to write the outputs to a file as each process finishes a task?
Would it be more efficient to store the results in an array and write the array to an output file at the end?
c parallel-processing io hpc
You could try creating an MVCE for both scenarios and do some measurements (e.g. use a performance profiler and/or a memory profiler and/or some Stopwatch instances).
– Uwe Keim
Nov 22 '18 at 22:18
Please do not add "Thank you" to questions.
– Uwe Keim
Nov 22 '18 at 22:18
The answer depends on context. It can be either way. Code it both ways and benchmark them to see.
– Craig Estey
Nov 22 '18 at 22:24
If your platform supports memory-mapped files, you don't have to choose.
– EOF
Nov 22 '18 at 23:28
I read the first paragraph and now I'm imagining 65,536 processes each taking its turn to write to the single shared output file. Even with far fewer processes in play this is going to look pretty much like a serial program, and it's not a good outline design.
– High Performance Mark
Nov 23 '18 at 8:32
asked Nov 22 '18 at 22:17 by HCF3301, edited Nov 22 '18 at 22:39
2 Answers
This is a problem where the full disk I/O bandwidth has to be exploited without stalling the client processes or threads. If the standard C library calls are used, writes go through an in-memory buffer that is flushed when it fills (or at each newline, for line-buffered streams) or when fflush() is called. If the data is not too big, storing the results in an array and writing it to the file at the end is efficient, so the performance-demanding task will not suffer I/O delays.
answered Nov 23 '18 at 2:43 by anand
Files are usually slower than RAM. However, how much slower? If it's less than a 1% slowdown, most people would not care. If it's a 50% slowdown, some people would still not care.
As always with performance, measure it one way and the other, and then decide whether the difference is significant. The decision will usually depend on factors which are highly application-specific.
answered Nov 22 '18 at 22:22 by anatolyg