Q-learning vs dynamic programming
Is the classic Q-learning algorithm, using a lookup table (instead of function approximation), equivalent to dynamic programming?
machine-learning dynamic-programming reinforcement-learning q-learning
asked Aug 17 '16 at 5:16 by D_Wills, edited Nov 22 '18 at 16:01 by nbro
3 Answers
Dynamic Programming is an umbrella term covering many algorithms, whereas Q-Learning is one specific algorithm. So, no, they are not the same.

Even if by Dynamic Programming you mean Value Iteration or Policy Iteration specifically, they are still not the same. Those are "planning" methods: you have to give them the transition and reward functions, and they iteratively compute a value function and an optimal policy.

Q-Learning is a model-free reinforcement learning method. It is called "model-free" not because it avoids machine learning models, but because it neither requires nor uses a model of the environment (the MDP) to obtain an optimal policy. There are also "model-based" methods. These, unlike Dynamic Programming methods, learn a model rather than being handed one; and, unlike model-free methods, they do not throw samples away after updating their value estimates, but instead use them to reconstruct the transition and reward functions for better performance.

Model-based methods therefore combine model-free learning with planning: they can achieve results comparable to model-free methods (such as Q-Learning) with fewer samples, and without requiring a model up front the way Dynamic Programming methods (Value/Policy Iteration) do.

– mimoralea (answered Nov 16 '17 at 15:48)
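To make the contrast concrete, here is a minimal sketch (not from the original answer; the MDP description P[s][a] = [(prob, next_state, reward), ...], the env.reset()/env.step() interface, and all parameter values are illustrative assumptions) showing what each method needs as input:

```python
import random
from collections import defaultdict

GAMMA = 0.9

def value_iteration(P, n_states, n_actions, n_sweeps=1000):
    """Planning / DP: requires the full model P (transition probabilities and rewards).
    Terminal-state handling is omitted for brevity."""
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(n_sweeps):
        for s in range(n_states):
            for a in range(n_actions):
                # Full (expected) backup over every possible next state.
                Q[s][a] = sum(p * (r + GAMMA * max(Q[s2]))
                              for p, s2, r in P[s][a])
    return Q

def q_learning(env, n_actions, n_episodes=5000, alpha=0.1, eps=0.1):
    """Model-free RL: requires only sampled transitions (s, a, r, s')."""
    Q = defaultdict(lambda: [0.0] * n_actions)   # the lookup table
    for _ in range(n_episodes):
        s, done = env.reset(), False
        while not done:
            # Epsilon-greedy action selection from the current table.
            a = (random.randrange(n_actions) if random.random() < eps
                 else max(range(n_actions), key=lambda i: Q[s][i]))
            s2, r, done = env.step(a)            # one sample, no probabilities needed
            # Sample backup toward r + gamma * max_a' Q(s', a').
            target = r + (0.0 if done else GAMMA * max(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q
```

Value iteration sweeps every state-action pair using the known probabilities; Q-learning only ever touches the table entries for transitions it has actually sampled.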
From Sutton & Barto's book (Reinforcement Learning: An Introduction, chapter 4):

"The term dynamic programming (DP) refers to a collection of algorithms that can be used to compute optimal policies given a perfect model of the environment as a Markov decision process (MDP). Classical DP algorithms are of limited utility in reinforcement learning both because of their assumption of a perfect model and because of their great computational expense, but they are still important theoretically."

So, although both share the same working principles (whether in their tabular Reinforcement Learning/Dynamic Programming or their approximated RL/DP forms), the key difference between classic DP and classic RL is that the former assumes the model is known. Concretely, this means knowing the transition probabilities (the probability of moving from state s to state s' given action a) and the expected immediate reward function.

RL methods, by contrast, only require access to a set of samples, collected either online or offline depending on the algorithm.

Of course, there are hybrid methods that sit between RL and DP, for example those that learn a model from the samples and then use that model in the learning process.

NOTE: besides denoting a set of mathematical optimization techniques related to RL, the term Dynamic Programming is also used for a general algorithmic pattern, as pointed out in one of the comments. The fundamentals are the same in both cases, but the term can mean different things depending on the context.

– Pablo EM (answered Aug 17 '16 at 8:24)
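Written as update rules (standard textbook notation, not quoted from the answer), the same difference shows up directly: the DP backup takes an expectation under the known model p(s', r | s, a), while Q-learning replaces that expectation with a single sampled transition and a step size α:

```latex
% Value iteration (DP): full backup using the known model p(s', r | s, a)
Q(s,a) \;\leftarrow\; \sum_{s',\,r} p(s', r \mid s, a)\,
    \bigl[\, r + \gamma \max_{a'} Q(s', a') \,\bigr]

% Q-learning (model-free RL): sample backup from one observed transition (s, a, r, s')
Q(s,a) \;\leftarrow\; Q(s,a) + \alpha
    \bigl[\, r + \gamma \max_{a'} Q(s', a') - Q(s,a) \,\bigr]
```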
If you use Q-learning in an offline setup (as in AlphaGo, for example), then it is equivalent to dynamic programming. The difference is that Q-learning can also be used in an online setup.

– Don Reba (answered Aug 17 '16 at 5:49)
No, it is not equivalent - you cannot compare these two things that way. Dynamic programming is a very general algorithmic pattern that is used in many algorithms. Your algorithm may make use of a dynamic programming strategy, but it certainly is not dynamic programming. – cel, Aug 17 '16 at 6:16

@cel, you have to consider the context. With respect to RL, it is safe to assume that DP refers to value or policy iteration. – Don Reba, Aug 17 '16 at 11:22
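As an illustration of the online/offline distinction mentioned in this answer (my own sketch, not from the answer itself; Q is assumed to be a collections.defaultdict mapping each state to a list of action values, and env and policy are hypothetical objects), the very same tabular update can be driven either by a pre-collected batch of transitions or by transitions as they are experienced:

```python
def q_update(Q, s, a, r, s2, done, alpha=0.1, gamma=0.9):
    # One tabular Q-learning update on the lookup table Q.
    target = r + (0.0 if done else gamma * max(Q[s2]))
    Q[s][a] += alpha * (target - Q[s][a])

def offline_q_learning(Q, batch, passes=10):
    # "Offline": sweep repeatedly over a fixed batch of (s, a, r, s2, done) tuples.
    for _ in range(passes):
        for s, a, r, s2, done in batch:
            q_update(Q, s, a, r, s2, done)

def online_q_learning(Q, env, policy, n_steps=10_000):
    # "Online": update from each transition as the agent experiences it.
    s = env.reset()
    for _ in range(n_steps):
        a = policy(Q, s)
        s2, r, done = env.step(a)
        q_update(Q, s, a, r, s2, done)
        s = env.reset() if done else s2
```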