Speed up find command/alternates












I am trying to speed up the script below, which reports disk usage. The line in bold (wrapped in `**`) containing the find command is the bottleneck. The script runs on directories holding 6-7 TB of data and takes 16-18 hours, but I want it to finish in under 8 hours. Can someone suggest alternative ways to write this script?



# disk_check.csh takes a directory name as a mandatory argument and an optional -<num> or -verbose as a second argument.
# Ex1: disk_check <dir_name>           Reports the disk usage per user and the total disk consumption
# Ex2: disk_check <dir_name> -verbose  As Ex1, and also lists all files in the given directory by size
# Ex3: disk_check <dir_name> -<num>    Like Ex2, but reports only the top <num> files by size


if ($#argv == 0) then
    echo " Error : Dir path missing"
    echo " Syntax : disk_check <dir-name> <-verbose>"
    echo " -verbose gives a list of all files per individual sorted by size"
    exit 0
endif

set cwd = $argv[1]
if ($cwd =~ "-help") then
    echo " Error : Dir path missing"
    echo " Syntax : disk_check <dir-name> <-verbose>"
    echo " -verbose gives a list of all files per individual sorted by size"
    exit 0
endif

if ($#argv > 1) then
    set opt = $argv[2]
    #echo "opt : $opt"
endif

if ( -d $cwd ) then

    # The labels printed below match these fields only when the last line of df
    # output starts at the size column, as happens when df wraps a long filesystem name.
    set ava = `df -h $cwd | tail -1 | awk '{print $1}'`
    set tot = `df -h $cwd | tail -1 | awk '{print $2}'`
    set ad = `df -h $cwd | tail -1 | awk '{print $3}'`
    set pcu = `df -h $cwd | tail -1 | awk '{print $4}'`

    echo ""
    echo "Summary for dir ${cwd}: $tot Used (${pcu})"
    echo "-----------------------------------------------------------------------------"
    echo " Total Volume $ava"
    echo " Available on disk $ad "
    echo " Percentage used $pcu"
    echo ""
    echo "Summary by User:"
    printf "%sUser%15sSize%10sCount\n" ""
    echo "---------------------------------------------"

    **time find $cwd -type f -printf "%u %s\n" | awk '{user[$1]+=$2;count[$1]++}; END{ for( i in user) printf "%s%-13s%5s%-0.2f%s%5s%7s\n","", i, "", user[i]/1024**3,"GB", "", count[i]}'| sort -nk2 -r**


    if ($#argv > 1) then
        if ($opt =~ "-verbose") then
            echo "\nDetail, Sorted by size"
            printf " User%15sFile%15sSize\n" ""
            echo "---------------------------------------------------"
            find $cwd -type f -not -path '*/.*' -printf "%-13u | %-50p | %-10s \n" | sort -nk5 -r
        endif
    # closes: if ($#argv > 1)
    endif

# closes: if ( -d $cwd )
endif
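
One thing visible in the script as posted: with -verbose, the directory tree is walked twice, once for the per-user summary and once for the per-file listing. The following is a minimal sketch, not a tested replacement, of walking the tree a single time and reusing the saved listing for both reports. It is written in POSIX sh rather than csh and assumes GNU find (the original already relies on -printf). The function names scan_tree, summary_by_user and files_by_size are hypothetical, and paths containing tabs or newlines would break the listing format. It does not change how long a single walk takes, so on its own it may not be enough to reach the 8-hour target.

```shell
#!/bin/sh
# Sketch only: hypothetical helper names, GNU find assumed (uses -printf).

# Walk the tree once and store "user<TAB>size<TAB>path" for every regular file.
scan_tree() {
    # $1 = directory to scan, $2 = listing file to write
    find "$1" -type f -printf '%u\t%s\t%p\n' > "$2"
}

# Per-user total size (in units of 1024^3 bytes) and file count, largest first.
summary_by_user() {
    # $1 = listing file produced by scan_tree
    awk -F '\t' '
        { bytes[$1] += $2; count[$1]++ }
        END {
            for (u in bytes)
                printf "%-13s %10.2fGB %7d\n", u, bytes[u] / (1024 ^ 3), count[u]
        }' "$1" | sort -k2,2nr
}

# Per-file detail, largest first, skipping any path that contains "/."
# (the same paths the original excludes with -not -path '*/.*').
files_by_size() {
    # $1 = listing file produced by scan_tree
    awk -F '\t' '$3 !~ /\/\./' "$1" | sort -t "$(printf '\t')" -k2,2nr
}
```

A possible use would be to call scan_tree on the directory with a scratch listing file, then summary_by_user on that file, and files_by_size on it only when -verbose is given. Keeping the listing on disk also means the top <num> files can be taken from files_by_size with head, without another scan.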








      linux shell unix





      asked 6 mins ago by user186743
