Sample-based Learning Methods

This course is part of the Reinforcement Learning Specialization

Instructors: Martha White, Adam White

35,630 already enrolled

5 modules

Gain insight into a topic and learn the fundamentals.

4.8

(1,247 reviews)

Intermediate level

Recommended experience

Flexible schedule

2 weeks at 10 hours a week

Learn at your own pace

90%

Most learners liked this course

Details to know

Shareable certificate

Add to your LinkedIn profile

Assessments

5 assignments

Taught in English

Build your subject-matter expertise

This course is part of the Reinforcement Learning Specialization

When you enroll in this course, you'll also be enrolled in this Specialization.

There are 5 modules in this course

In this course, you will learn about several algorithms that can learn near-optimal policies through trial-and-error interaction with the environment, that is, from the agent’s own experience. Learning from actual experience is striking because it requires no prior knowledge of the environment’s dynamics, yet can still attain optimal behavior. We will cover intuitively simple but powerful Monte Carlo methods, and temporal difference learning methods including Q-learning. We will wrap up this course by investigating how we can get the best of both worlds: algorithms that combine model-based planning (similar to dynamic programming) and temporal difference updates to radically accelerate learning.

By the end of this course you will be able to:
- Understand Temporal-Difference learning and Monte Carlo as two strategies for estimating value functions from sampled experience
- Understand the importance of exploration, when using sampled experience rather than dynamic programming sweeps within a model
- Understand the connections between Monte Carlo and Dynamic Programming and TD
- Implement and apply the TD algorithm, for estimating value functions
- Implement and apply Expected Sarsa and Q-learning (two TD methods for control)
- Understand the difference between on-policy and off-policy control
- Understand planning with simulated experience (as opposed to classic planning strategies)
- Implement a model-based approach to RL, called Dyna, which uses simulated experience
- Conduct an empirical study to see the improvements in sample efficiency when using Dyna

Welcome to the second course in the Reinforcement Learning Specialization: Sample-Based Learning Methods, brought to you by the University of Alberta, Onlea, and Coursera. In this pre-course module, you'll be introduced to your instructors, and get a flavour of what the course has in store for you. Make sure to introduce yourself to your classmates in the "Meet and Greet" section!

What's included

2 videos, 2 readings, 1 discussion prompt

This week you will learn how to estimate value functions and optimal policies, using only sampled experience from the environment. This module represents our first step toward incremental learning methods that learn from the agent’s own interaction with the world, rather than a model of the world. You will learn about on-policy and off-policy methods for prediction and control, using Monte Carlo methods, which use sampled returns. You will also be reintroduced to the exploration problem, but more generally in RL, beyond bandits.
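
As a minimal sketch of Monte Carlo prediction from sampled returns, the Python below averages first-visit returns for each state. The function name `first_visit_mc_prediction`, the episode format, and the two-state example episodes are assumptions made for illustration; they are not taken from the course materials.

```python
from collections import defaultdict


def first_visit_mc_prediction(episodes, gamma=1.0):
    """Estimate V(s) by averaging first-visit returns.

    Each episode is a list of (state, reward) pairs, where reward is the
    reward received on the transition out of that state (assumed format).
    """
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for episode in episodes:
        # Record the first time step at which each state appears.
        first_visit = {}
        for t, (state, _) in enumerate(episode):
            first_visit.setdefault(state, t)
        g = 0.0
        # Walk backwards so the return G_t is accumulated incrementally.
        for t in range(len(episode) - 1, -1, -1):
            state, reward = episode[t]
            g = reward + gamma * g
            if first_visit[state] == t:
                returns_sum[state] += g
                returns_count[state] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}


# Hypothetical episodes over two states, "A" and "B".
episodes = [
    [("A", 0.0), ("B", 1.0)],
    [("A", 0.0), ("B", 0.0)],
    [("B", 1.0)],
]
values = first_visit_mc_prediction(episodes)
print(values)
```

Note that the estimate for a state only changes once an episode has finished, because the full return is needed before it can be averaged in.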

What's included

11 videos, 3 readings, 1 assignment, 1 programming assignment, 1 discussion prompt

11 videos (total 58 minutes)

What is Monte Carlo? (6 minutes)
Using Monte Carlo for Prediction (6 minutes)
Using Monte Carlo for Action Values (2 minutes)
Using Monte Carlo methods for generalized policy iteration (2 minutes)
Solving the Blackjack Example (3 minutes)
Epsilon-soft policies (5 minutes)
Why does off-policy learning matter? (4 minutes)
Importance Sampling (4 minutes)
Off-Policy Monte Carlo Prediction (5 minutes)
Emma Brunskill: Batch Reinforcement Learning (12 minutes)
Week 1 Summary (3 minutes)

3 readings (total 90 minutes)

Module 1 Learning Objectives (10 minutes)
Weekly Reading (40 minutes)
Chapter Summary (40 minutes)

1 assignment (total 30 minutes)

Graded Quiz (30 minutes)

1 programming assignment (total 5 minutes)

Blackjack (5 minutes)

1 discussion prompt (total 10 minutes)

Comparing on-policy and off-policy learning (10 minutes)

This week, you will learn about one of the most fundamental concepts in reinforcement learning: temporal difference (TD) learning. TD learning combines some of the features of both Monte Carlo and Dynamic Programming (DP) methods. TD methods are similar to Monte Carlo methods in that they can learn from the agent’s interaction with the world, and do not require knowledge of the model. TD methods are similar to DP methods in that they bootstrap, and thus can learn online, without waiting until the end of an episode. You will see how TD can learn more efficiently than Monte Carlo, due to bootstrapping. For this module, we first focus on TD for prediction, and discuss TD for control in the next module. This week, you will implement TD to estimate the value function for a fixed policy, in a simulated domain.
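
To make the bootstrapping idea concrete, here is a rough sketch of one-step TD prediction, often written TD(0), for a fixed policy. The helper names `td0_update` and `run_random_walk`, the 5-state chain, and the step size are illustrative assumptions; this is not the simulated domain used in the course assignment.

```python
import random


def td0_update(values, state, reward, next_state, alpha, gamma, done):
    """Apply one TD(0) update to values[state] in place; return the TD error."""
    # Bootstrap from the current estimate of the next state (0 if terminal).
    bootstrap = 0.0 if done else values[next_state]
    td_error = reward + gamma * bootstrap - values[state]
    values[state] += alpha * td_error
    return td_error


def run_random_walk(num_episodes=500, alpha=0.05, gamma=1.0, seed=0):
    """Evaluate an equiprobable left/right policy on a 5-state chain.

    Stepping off the right end gives reward 1, off the left end reward 0.
    """
    rng = random.Random(seed)
    values = [0.5] * 5  # one estimate per non-terminal state
    for _ in range(num_episodes):
        state = 2  # start in the middle of the chain
        while True:
            next_state = state + rng.choice((-1, 1))
            done = next_state < 0 or next_state > 4
            reward = 1.0 if next_state > 4 else 0.0
            # The update happens at every step, not at the end of the episode.
            td0_update(values, state, reward, next_state, alpha, gamma, done)
            if done:
                break
            state = next_state
    return values


print(run_random_walk())
```

The update is applied after every transition, which is what lets the estimates change online rather than only once an episode's return is known.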

What's included

6 videos, 2 readings, 1 assignment, 1 programming assignment, 1 discussion prompt

6 videos (total 36 minutes)

What is Temporal Difference (TD) learning? (4 minutes)
Rich Sutton: The Importance of TD Learning (6 minutes)
The advantages of temporal difference learning (5 minutes)
Comparing TD and Monte Carlo (5 minutes)
Andy Barto and Rich Sutton: More on the History of RL (12 minutes)
Week 2 Summary (2 minutes)

2 readings (total 50 minutes)

Module 2 Learning Objectives (10 minutes)
Weekly Reading (40 minutes)

1 assignment (total 30 minutes)

Practice Quiz (30 minutes)

1 programming assignment (total 180 minutes)

Policy Evaluation with Temporal Difference Learning (180 minutes)

1 discussion prompt (total 10 minutes)

Should we care about TD in the brain? (10 minutes)

This week, you will learn about using temporal difference learning for control, as a generalized policy iteration strategy. You will see three different algorithms based on bootstrapping and Bellman equations for control: Sarsa, Q-learning and Expected Sarsa. You will see some of the differences between the methods for on-policy and off-policy control, and that Expected Sarsa is a unified algorithm for both. You will implement Expected Sarsa and Q-learning, on Cliff World.
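
The three algorithms differ mainly in the target they bootstrap toward. The sketch below contrasts those targets for a single transition: Sarsa uses the value of the next action actually taken, Q-learning uses the maximum next action value, and Expected Sarsa uses the expectation under the policy. The function names, the epsilon-greedy target policy, and the numbers are assumptions chosen for illustration.

```python
def epsilon_greedy_probs(q_row, epsilon):
    """Action probabilities of an epsilon-greedy policy (ties split evenly)."""
    n = len(q_row)
    best = max(q_row)
    greedy = [a for a, q in enumerate(q_row) if q == best]
    probs = [epsilon / n] * n
    for a in greedy:
        probs[a] += (1.0 - epsilon) / len(greedy)
    return probs


def sarsa_target(reward, gamma, q_next, next_action):
    # On-policy: bootstrap from the action the agent actually takes next.
    return reward + gamma * q_next[next_action]


def q_learning_target(reward, gamma, q_next):
    # Off-policy: bootstrap from the greedy action, whatever is taken next.
    return reward + gamma * max(q_next)


def expected_sarsa_target(reward, gamma, q_next, epsilon):
    # Bootstrap from the expected next value under the target policy.
    probs = epsilon_greedy_probs(q_next, epsilon)
    return reward + gamma * sum(p * q for p, q in zip(probs, q_next))


# Hypothetical transition: reward -1, two actions available in the next state.
q_next = [1.0, 3.0]
print(sarsa_target(-1.0, 0.9, q_next, next_action=0))
print(q_learning_target(-1.0, 0.9, q_next))
print(expected_sarsa_target(-1.0, 0.9, q_next, epsilon=0.2))
print(expected_sarsa_target(-1.0, 0.9, q_next, epsilon=0.0))
```

With `epsilon=0.0` the target policy is greedy and the Expected Sarsa target coincides with the Q-learning target, which is one way to see Expected Sarsa as covering both the on-policy and off-policy cases.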

What's included

9 videos, 3 readings, 1 assignment, 1 programming assignment, 1 discussion prompt

9 videos (total 29 minutes)

Sarsa: GPI with TD (4 minutes)
Sarsa in the Windy Grid World (3 minutes)
What is Q-learning? (3 minutes)
Q-learning in the Windy Grid World (3 minutes)
How is Q-learning off-policy? (4 minutes)
Expected Sarsa (3 minutes)
Expected Sarsa in the Cliff World (3 minutes)
Generality of Expected Sarsa (1 minute)
Week 3 Summary (2 minutes)

3 readings (total 90 minutes)

Module 3 Learning Objectives (10 minutes)
Weekly Reading (40 minutes)
Chapter Summary (40 minutes)

1 assignment (total 30 minutes)

Practice Quiz (30 minutes)

1 programming assignment (total 180 minutes)

Q-Learning and Expected SARSA (180 minutes)

1 discussion prompt (total 10 minutes)

How can we use off-policy for learning multiple goals? (10 minutes)

Up until now, you might think that learning with and without a model are two distinct, and in some ways, competing strategies: planning with Dynamic Programming versus sample-based learning via TD methods. This week we unify these two strategies with the Dyna architecture. You will learn how to estimate the model from data and then use this model to generate hypothetical experience (a bit like dreaming) to dramatically improve sample efficiency compared to sample-based methods like Q-learning. In addition, you will learn how to design learning systems that are robust to inaccurate models.
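
One way to picture the Dyna idea is the tabular sketch below, which interleaves a Q-learning update from real experience, a model update, and a few planning updates replayed from the learned model. The 4-state chain environment, the function name `dyna_q`, and all hyperparameters are hypothetical choices for illustration; this is not the course's assignment code, and it omits the Dyna-Q+ exploration bonus.

```python
import random


def dyna_q(num_episodes=30, planning_steps=10, alpha=0.5, gamma=0.95,
           epsilon=0.1, seed=0):
    """Tabular Dyna-Q on a hypothetical 4-state chain with the goal at state 3."""
    rng = random.Random(seed)
    n_states, actions = 4, (0, 1)  # action 0 = left, 1 = right
    q = {(s, a): 0.0 for s in range(n_states) for a in actions}
    model = {}  # (state, action) -> (reward, next_state, done)

    def step(state, action):
        # Deterministic toy environment: reward 1 only on reaching the goal.
        next_state = max(0, state - 1) if action == 0 else state + 1
        done = next_state == n_states - 1
        return (1.0 if done else 0.0), next_state, done

    def choose_action(state):
        # Epsilon-greedy with random tie-breaking.
        if rng.random() < epsilon:
            return rng.choice(actions)
        best = max(q[(state, a)] for a in actions)
        return rng.choice([a for a in actions if q[(state, a)] == best])

    def q_update(state, action, reward, next_state, done):
        bootstrap = 0.0 if done else max(q[(next_state, a)] for a in actions)
        q[(state, action)] += alpha * (
            reward + gamma * bootstrap - q[(state, action)])

    for _ in range(num_episodes):
        state, done = 0, False
        while not done:
            action = choose_action(state)
            reward, next_state, done = step(state, action)
            q_update(state, action, reward, next_state, done)    # direct RL
            model[(state, action)] = (reward, next_state, done)  # model learning
            for _ in range(planning_steps):                      # planning
                s, a = rng.choice(list(model))
                r, s2, d = model[(s, a)]
                q_update(s, a, r, s2, d)
            state = next_state
    return q, model


q, model = dyna_q()
print(q[(2, 1)], q[(2, 0)])
```

Setting `planning_steps=0` turns this sketch into plain one-step Q-learning, which gives a simple baseline for comparing how many real environment steps each variant needs.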

What's included

11 videos, 4 readings, 2 assignments, 1 programming assignment, 1 discussion prompt

11 videos (total 46 minutes)

What is a Model? (4 minutes)
Comparing Sample and Distribution Models (2 minutes)
Random Tabular Q-planning (3 minutes)
The Dyna Architecture (5 minutes)
The Dyna Algorithm (5 minutes)
Dyna & Q-learning in a Simple Maze (5 minutes)
What if the model is inaccurate? (3 minutes)
In-depth with changing environments (5 minutes)
Drew Bagnell: self-driving, robotics, and Model Based RL (7 minutes)
Week 4 Summary (1 minute)
Congratulations! (2 minutes)

4 readings (total 130 minutes)

Module 4 Learning Objectives (10 minutes)
Weekly Reading (40 minutes)
Chapter Summary (40 minutes)
Text Book Part 1 Summary (40 minutes)

2 assignments (total 90 minutes)

Practice Assessment (45 minutes)
Replacement Practice Assignment (45 minutes)

1 programming assignment (total 180 minutes)

Dyna-Q and Dyna-Q+ (180 minutes)

1 discussion prompt (total 10 minutes)

Compare Planning and Reasoning (10 minutes)

Earn a career certificate

Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.

Instructors

Instructor ratings

4.7 (222 ratings)

Martha White

University of Alberta

4 courses, 105,463 learners

Adam White

University of Alberta

4 courses, 105,463 learners

Offered by

University of Alberta

Alberta Machine Intelligence Institute

Learner reviews

4.8

1,247 reviews

5 stars: 82.29%
4 stars: 13.22%
3 stars: 2.80%
2 stars: 0.64%
1 star: 1.04%

Reviewed on Apr 13, 2020

Well done. Follows Reinforcement Learning (Sutton/Barto) closely and explains topics well. Graded notebooks are invaluable in understanding the material well.

Reviewed on Feb 27, 2020

It was good in substance but there is plenty of issues with the automated grader. You spend most time dealing with the letter not on actual learning of the matter.

Reviewed on Mar 13, 2022

The videos are very clear and do a good job explaining the material from the textbook. The assignments are relevant and just right in terms of length and difficulty.
