Concatenate specific number of files
I have a bunch of files named uv_set_XXXXXXXX, where the 8 Xs stand for the date in the usual year, month, day format (YYYYMMDD). Imagine I have 325 files of this type. I would like to concatenate them in groups of 50 files, so in the end I have 7 files (6 files of 50 and 1 of 25).
I have been thinking of using cat, but I can't see an option to select a number of files from a list. I could do this with Python, but I am wondering whether some Unix command-line utility does it more directly.
Thanks.
Tags: bash, command, cat
asked Nov 19 at 12:25
David
Do you want the file names to be ordered by date (YYYYMMDD) before splitting into groups of 50?
– stack0114106
Nov 19 at 13:21
@stack0114106: no, it's not necessary at this moment. Thank you.
– David
Nov 19 at 13:30
2 Answers
Accepted answer
With GNU parallel you can use the following command:
parallel -n50 "cat {} > out{#}" ::: uv_set_*
This merges the first 50 files into out1, the next 50 into out2, and so on: -n50 passes at most 50 file names per job, and {#} expands to the sequential job number.
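Worth noting: the glob uv_set_* expands in lexicographic order, which for zero-padded YYYYMMDD names is also chronological order, so the groups come out date-ordered anyway. If GNU parallel is not installed, the same grouping can be done with a plain bash loop over array slices; a minimal sketch (the out naming simply mirrors the parallel command above):
files=(uv_set_*)    # the glob expands in sorted, i.e. date, order
n=1
for ((i = 0; i < ${#files[@]}; i += 50)); do
    cat "${files[@]:i:50}" > "out$((n++))"    # each slice holds up to 50 file names
done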
answered Nov 19 at 12:40
Socowi
I would just break down and do this in Awk.
awk 'FNR==1 && (i++ % 50 == 0) {   # at the first line of every 50th input file...
    if (NR > 1) close(p)           # ...close the previous destination file
    p = "dest_" ++j                # ...and switch to the next one
}
{ print > p }' uv_set_????????
This creates files dest_1 through dest_7, the first 6 concatenating 50 input files each and the last the remaining 25. (The counter uses i++ rather than ++i so that the test succeeds for the very first file and dest_1 is opened before the first print.)
Closing the previous file is necessary because the system only allows Awk a limited number of open file handles (though the limit is typically higher than 7, so it is probably not important in your example).
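If you are curious what that limit is on your system, the shell can report it (a quick check, not needed for the solution itself):
ulimit -n    # prints the soft limit on open file descriptors per process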
Thinking-out-loud dept., just to prevent anyone else from wasting time repeating this dead end.
You could use xargs -L 50 cat to concatenate 50 files at a time, but there is no simple way to pass in a new redirection for standard output for each invocation. You could try to hack your way around that with something like
# XXX Do not use: incomplete
printf '%s\n' uv_set_???????? |
xargs -L 50 sh -c 'cat "$@" > ... something' _
but I can't come up with an elegant way to have a different something each time.
answered Nov 19 at 12:44, edited Nov 19 at 12:50
tripleee

Maybe you could redirect to > starting_at_"$1".
– Socowi
Nov 19 at 12:50
Yeah, I was toying with exactly that idea, but it simply sucked too much.
– tripleee
Nov 19 at 12:50
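For the record, here is a sketch of that naming idea (the starting_at_ prefix is just illustrative, and it assumes the file names contain no spaces or newlines, since xargs splits its input on whitespace):
printf '%s\n' uv_set_???????? |
xargs -L 50 sh -c 'cat "$@" > "starting_at_$1"' _
# each batch of up to 50 files is concatenated into a file named after
# the first file in the batch, e.g. starting_at_uv_set_20181119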