Puppet: Recovering from failed run























Consider the following code:



file { '/etc/systemd/system/docker.service.d/http-proxy.conf':
  ensure  => 'present',
  owner   => 'root',
  group   => 'root',
  mode    => '644',
  content => '[Service]
Environment="HTTP_PROXY=http://10.0.2.2:3128"
Environment="HTTPS_PROXY=http://10.0.2.2:3128"
',
  notify  => Exec['daemon-reload'],
  require => Package['docker-ce'],
}

exec { 'daemon-reload':
  command     => 'systemctl daemon-reload',
  path        => '/sbin',
  refreshonly => true,
}

service { 'docker':
  ensure    => 'running',
  subscribe => File['/etc/systemd/system/docker.service.d/http-proxy.conf'],
  require   => Exec['daemon-reload'],
}


I would like to edit a systemd service's configuration. In this instance it is the environment for Docker, but it could be any other unit.



Since a systemd unit file has been changed, systemctl daemon-reload must be run for the new configuration to be picked up.



Running puppet apply fails:



Notice: Compiled catalog for puppet-docker-test.<redacted> in environment production in 0.18 seconds
Notice: /Stage[main]/Main/File[/etc/systemd/system/docker.service.d/http-proxy.conf]/ensure: defined content as '{md5}dace796a9904d2c5e2c438e6faba2332'
Error: /Stage[main]/Main/Exec[daemon-reload]: Failed to call refresh: Could not find command 'systemctl'
Error: /Stage[main]/Main/Exec[daemon-reload]: Could not find command 'systemctl'
Notice: /Stage[main]/Main/Service[docker]: Dependency Exec[daemon-reload] has failures: true
Warning: /Stage[main]/Main/Service[docker]: Skipping because of failed dependencies
Notice: Applied catalog in 0.15 seconds


The cause is immediately obvious: systemctl lives in /bin, not /sbin as configured. However, after fixing this, running puppet apply again neither restarts the service nor runs systemctl daemon-reload:



Notice: Compiled catalog for puppet-docker-test.<redacted> in environment production in 0.19 seconds
Notice: Applied catalog in 0.16 seconds


Apparently, this happens because there were no changes to the file resource (it was already applied on the failed run), so the refresh event that would have triggered the daemon-reload, and in turn the service restart, never fires again.



To force Puppet to reload and restart the service, I could change the contents of the file on disk, or change the content in the Puppet code, but it feels like I'm missing some better way of doing this.



How can I better recover from such a scenario? Or, how can I write Puppet code that doesn't have this issue?
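For reference, the intended fix is a one-line change to the exec's path (a sketch; `/bin:/usr/bin` is an assumption meant to cover both merged- and split-/usr layouts):

```puppet
exec { 'daemon-reload':
  command     => 'systemctl daemon-reload',
  path        => '/bin:/usr/bin',  # systemctl lives here, not in /sbin
  refreshonly => true,
}
```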




































puppet systemd
asked Nov 19 at 21:15 by Thiago Vinicius
























1 Answer






























Puppet does not provide a mechanism for resuming a failed run. Doing so would not make much sense to me: for a resumption to produce a different result, the machine state would have to have changed since the failure, and a machine-state change made outside Puppet potentially invalidates the catalog that was being applied.



The agent does, by default, send run reports to the master, so in the event of a failed run you should be able to determine from those what went wrong. If you don't want to scour reports to figure out how to recover from each failed run, however, you could consider putting together a recovery script.



          For example, you know that any failure may have caused a daemon-reload to be missed, and it's harmless to perform one when it isn't required, so just put that in your script. You might also put in a restart of each service under management. Basically, you're looking for anything that has non-trivial refresh behavior (Execs and Services are the main ones that come to my mind).
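Such a script might look like the following (a sketch; the service list and the `--go` flag are assumptions, not from the post — without `--go` it only prints what it would do):

```shell
#!/bin/sh
# Recovery sketch: re-run the refresh actions a failed Puppet run may
# have skipped. Prints commands by default; pass --go to execute them.
set -eu

DRY=1
[ "${1:-}" = "--go" ] && DRY=0

run() {
  if [ "$DRY" -eq 1 ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

# A daemon-reload is harmless when none is pending, so always do one.
run systemctl daemon-reload

# Restart every service whose unit files Puppet manages (example list).
for svc in docker; do
  run systemctl restart "$svc"
done
```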



          It occurs to me that if you're extra clever then it might be possible to put that in the form of one or more Puppet classes, and to determine on the master whether the last run for the target node failed, so as to apply the recovery.
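A sketch of what that could look like (the class and resource names are hypothetical; how the master decides to include the class, e.g. via a fact or an ENC flag, is left out):

```puppet
# Hypothetical one-shot recovery class: unlike a refreshonly exec,
# these resources act on every run in which the class is included.
class profile::recover_failed_run {
  exec { 'recovery-daemon-reload':
    command => 'systemctl daemon-reload',
    path    => '/bin:/usr/bin',
  }

  # Restart services with non-trivial refresh behavior after the reload.
  service { 'docker':
    ensure    => 'running',
    subscribe => Exec['recovery-daemon-reload'],
  }
}
```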



          As for avoiding the problem in the first place, I can only suggest testing, testing, and more testing. To that end, if you don't have dedicated machines on which to test Puppet updates, then at least select a small number of normal machines that get updates first, so that any problems that arise are closely contained.






answered Nov 19 at 23:07 by John Bollinger