Enabling dynamic allocation for Spark in YARN mode












This question is similar to an earlier one, but that one has no answer.



I am trying to enable dynamic allocation for Spark in YARN mode. I have an 11-node cluster with 1 master node and 10 worker nodes. I am following the links below for instructions:



For setup in YARN:
http://spark.apache.org/docs/latest/running-on-yarn.html#configuring-the-external-shuffle-service



Config variables that need to be set in spark-defaults.conf: https://spark.apache.org/docs/latest/configuration.html#dynamic-allocation
https://spark.apache.org/docs/latest/configuration.html#shuffle-behavior



I have also referred to the link below and a few other resources:
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-dynamic-allocation.html#spark.dynamicAllocation.testing



Here are the steps I am following:





  1. Setting up config variables in spark-defaults.conf.
    The part of my spark-defaults.conf related to dynamic allocation and the shuffle service is:



    spark.dynamicAllocation.enabled=true
    spark.shuffle.service.enabled=true
    spark.shuffle.service.port=7337



  2. Making changes in yarn-site.xml



    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>spark_shuffle</value>
    </property>
    <property>
      <name>yarn.nodemanager.auxservices.spark_shuffle.class</name>
      <value>org.apache.spark.network.yarn.YarnShuffleService</value>
    </property>
    <property>
      <name>yarn.nodemanager.recovery.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>yarn.application.classpath</name>
      <value> $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*,$HADOOP_MAPRED_HOME/share/hadoop/common/*,$HADOOP_MAPRED_HOME/share/hadoop/common/lib/*,$HADOOP_MAPRED_HOME/share/hadoop/hdfs/*,$HADOOP_MAPRED_HOME/share/hadoop/hdfs/lib/*,$HADOOP_MAPRED_HOME/share/hadoop/yarn/*,$HADOOP_MAPRED_HOME/share/hadoop/yarn/lib/*,$HADOOP_MAPRED_HOME/share/hadoop/tools/*,$HADOOP_MAPRED_HOME/share/hadoop/tools/lib/*,$HADOOP_MAPRED_HOME/share/hadoop/client/*,$HADOOP_MAPRED_HOME/share/hadoop/client/lib/*,/home/hadoop/spark/common/network-yarn/target/scala-2.11/spark-2.2.2-SNAPSHOT-yarn-shuffle.jar </value>
    </property>


    These steps are replicated on all worker nodes, i.e. spark-defaults.conf has the values above and yarn-site.xml has these properties. I have made sure that /home/hadoop/spark/common/network-yarn/target/scala-2.11/spark-2.2.2-SNAPSHOT-yarn-shuffle.jar exists on all worker nodes.



  3. Then I run $SPARK_HOME/sbin/start-shuffle-service.sh on the worker nodes and the master node. On the master node, I restart YARN using stop-yarn.sh and then start-yarn.sh.


  4. Then I run yarn node -list -all to see the worker nodes, but no nodes are listed.



  5. When I disable the property



    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>spark_shuffle</value>
    </property>


    I can see all the worker nodes as normal, so it seems the shuffle service is not properly configured.
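
One thing that may be worth ruling out is the property name in step 2. The Spark "Running on YARN" documentation spells the class property yarn.nodemanager.aux-services.spark_shuffle.class, with a hyphen, whereas the fragment in step 2 has auxservices. The following is a minimal Python sketch, not an official tool, that parses a yarn-site.xml fragment and reports mismatches of that kind. The names read_properties, check_shuffle_config and SAMPLE are made up for illustration, and SAMPLE only reproduces the first two properties from step 2, wrapped in a <configuration> element so it parses on its own.

```python
import xml.etree.ElementTree as ET

# Property names as spelled in the Spark "Running on YARN" documentation.
AUX_SERVICES = "yarn.nodemanager.aux-services"
SHUFFLE_CLASS_KEY = "yarn.nodemanager.aux-services.spark_shuffle.class"
SHUFFLE_CLASS = "org.apache.spark.network.yarn.YarnShuffleService"


def read_properties(xml_text):
    """Return a {name: value} dict from Hadoop-style <configuration> XML."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name:
            props[name] = value
    return props


def check_shuffle_config(props):
    """Return a list of readable problems; an empty list means the names match."""
    problems = []
    services = [s.strip() for s in props.get(AUX_SERVICES, "").split(",") if s.strip()]
    if "spark_shuffle" not in services:
        problems.append(f"{AUX_SERVICES} does not list spark_shuffle")
    if props.get(SHUFFLE_CLASS_KEY) != SHUFFLE_CLASS:
        problems.append(f"{SHUFFLE_CLASS_KEY} is missing or not set to {SHUFFLE_CLASS}")
    # Flag near-miss keys such as 'auxservices' (no hyphen).
    for name in props:
        if name.endswith("spark_shuffle.class") and name != SHUFFLE_CLASS_KEY:
            problems.append(f"unexpected property name: {name}")
    return problems


# Hypothetical sample: the first two properties from step 2, as posted.
SAMPLE = """
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>spark_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.auxservices.spark_shuffle.class</name>
    <value>org.apache.spark.network.yarn.YarnShuffleService</value>
  </property>
</configuration>
"""

if __name__ == "__main__":
    for problem in check_shuffle_config(read_properties(SAMPLE)):
        print(problem)
```

To check a real file, the same functions could be given the contents of yarn-site.xml from each worker node instead of SAMPLE. A clean result only means the property names match the documentation; it does not show that the NodeManager can actually load the shuffle jar.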












      apache-spark yarn dynamic-memory-allocation






      edited Nov 24 '18 at 13:18









      user10465355

      asked Nov 24 '18 at 12:40









      Divay Darshan DD
