Speed up find command/alternates
I am trying to speed up the script below, which reports disk usage. The `time find` line in the "Summary by User" section is the bottleneck. The script runs on directories that hold 6-7 TB of data and takes 16-18 hours; I want it to finish in under 8 hours. Can someone please suggest alternative ways to write this script?
# disk_check.csh takes a directory name as a mandatory first argument and an optional -<num> or -verbose as a second argument.
# Ex1: disk_check <dir_name>          - Reports the disk usage per user and the total disk consumption
# Ex2: disk_check <dir_name> -verbose - Along with the above, also lists all files by size in the given directory
# Ex3: disk_check <dir_name> -<num>   - Similar to Ex2, but reports only the top <num> files by size in the given directory
if ($#argv == 0) then
    echo " Error : Dir path missing"
    echo " Syntax : disk_check <dir-name> <-verbose>"
    echo " -verbose gives a list of all files per individual sorted by size"
    exit 1
endif
set cwd = $argv[1]
if ($cwd =~ "-help") then
    echo " Error : Dir path missing"
    echo " Syntax : disk_check <dir-name> <-verbose>"
    echo " -verbose gives a list of all files per individual sorted by size"
    exit 0
endif
if ($#argv > 1) then
    set opt = $argv[2]
    #echo "opt : $opt"
endif
if ( -d $cwd ) then
    # The field numbers below match the labels only when the last line of df
    # output starts at the size column (for example, when df wraps a long
    # filesystem name onto its own line); otherwise $1 is the filesystem name.
    set ava = `df -h $cwd | tail -1 | awk '{print $1}'`
    set tot = `df -h $cwd | tail -1 | awk '{print $2}'`
    set ad = `df -h $cwd | tail -1 | awk '{print $3}'`
    set pcu = `df -h $cwd | tail -1 | awk '{print $4}'`
    echo ""
    echo "Summary for dir ${cwd}: $tot Used (${pcu})"
    echo "-----------------------------------------------------------------------------"
    echo " Total Volume $ava"
    echo " Available on disk $ad "
    echo " Percentage used $pcu"
    echo ""
    echo "Summary by User:"
    printf "%sUser%15sSize%10sCount\n" ""
    echo "---------------------------------------------"
    # Slow line: walks every file once and sums size and file count per owner.
    time find $cwd -type f -printf "%u %s\n" | awk '{user[$1]+=$2;count[$1]++}; END{ for( i in user) printf "%s%-13s%5s%-0.2f%s%5s%7s\n","", i, "", user[i]/1024**3,"GB", "", count[i]}' | sort -nk2 -r
    if ($#argv > 1) then
        if ($opt =~ "-verbose") then
            echo "\nDetail, Sorted by size"
            printf " User%15sFile%15sSize\n" ""
            echo "---------------------------------------------------"
            # Second full traversal; hidden files and directories are skipped here.
            find $cwd -type f -not -path '*/.*' -printf "%-13u | %-50p | %-10s \n" | sort -nk5 -r
        endif
    endif
endif
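The aggregation half of the slow pipeline can be checked on its own, without walking the filesystem. Below is a minimal sketch in POSIX sh (not csh), assuming input lines in the `user size` form that `find -printf "%u %s\n"` emits. The function name `summarize_by_user` and the sample users and sizes are made up for illustration, and the column layout is simplified from the script's.

```shell
#!/bin/sh
# Hypothetical helper (not part of disk_check.csh): reads "user size" lines
# on stdin and prints one line per user with total size in GB and file count,
# largest total first.
summarize_by_user() {
    awk '{ bytes[$1] += $2; count[$1]++ }
         END {
             for (u in bytes)
                 printf "%-13s %10.2f GB %7d\n", u, bytes[u] / (1024 * 1024 * 1024), count[u]
         }' |
        sort -k2,2nr
}

# Made-up sample input standing in for the output of find -printf "%u %s\n".
printf '%s\n' 'alice 1073741824' 'bob 536870912' 'alice 1073741824' | summarize_by_user
```

With this sample the `alice` line (two files, 2 GB in total) sorts ahead of the `bob` line. Timing the `find` on its own with its output sent to `/dev/null`, separately from this step, may help show whether the traversal or the aggregation dominates the run time.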
linux shell unix
asked 6 mins ago
user186743