A note up front: I got it wrong earlier. I should have been watching the Fall 2017 offering of the course, but watched the Spring one instead.
A neural network controls a virtual robot by imitating human motion.
Distributional shift causes supervised learning to fail in imitation learning: the trained policy makes small mistakes, drifts into states the expert never visited, and its errors compound from there.
The human expert says "turn left!!!": this is step 3 of the DAgger loop, where the expert labels the states the learned policy actually visits (a sketch of the full loop is below).
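For context, here is the DAgger loop from the lecture as a minimal Python sketch. `train_policy`, `run_policy`, and `expert_label` are hypothetical placeholders I made up to show the structure, not a real API:

```python
# Minimal DAgger loop (sketch; the three callables are hypothetical
# placeholders, not a real library API).

def dagger(expert_data, train_policy, run_policy, expert_label, n_iters=10):
    dataset = list(expert_data)              # (observation, action) pairs
    policy = train_policy(dataset)           # step 1: supervised learning on human data
    for _ in range(n_iters):
        observations = run_policy(policy)    # step 2: run the policy, record what it sees
        labels = [expert_label(o) for o in observations]  # step 3: human labels ("turn left!!!")
        dataset += list(zip(observations, labels))        # step 4: aggregate the datasets
        policy = train_policy(dataset)       # retrain on the aggregated data
    return policy
```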
We don't want the average of the two expert behaviors (say, swerving left vs. swerving right around a tree). When the actions are discrete, a softmax output handles this well, since it can put probability mass on both modes.
However, with continuous actions and a single Gaussian output, the network collapses to the average of the modes (a tiny demonstration follows).
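A tiny numpy illustration of the mode-averaging problem (my own example, not from the lecture): if the expert turns left (-1) half the time and right (+1) the other half, a mean-squared-error fit outputs 0, i.e. drive straight into the tree, while a categorical head keeps both modes.

```python
import numpy as np

# Expert demonstrations at the same observation: half turn left, half right.
actions = np.array([-1.0, +1.0] * 50)

# A single-Gaussian (MSE) fit collapses to the mean: "go straight".
print(actions.mean())            # 0.0 -- neither expert mode

# A categorical head over discrete actions keeps both modes.
left_prob = np.mean(actions < 0)
print({"left": left_prob, "right": 1 - left_prob})   # {'left': 0.5, 'right': 0.5}
```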
One solution:
add a noise input to the network, so the same observation can map to different sampled actions; this turns the policy into an implicit density model (see the sketch after this list).
The defect: implicit density models are harder to train.
He recommends looking at VAEs, GANs, and Stein variational gradient descent, which are three methods for training implicit density models.
Upside: capable of mimicking a distribution of essentially any shape.
Downside: much more complex to implement.
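A minimal PyTorch sketch of the noise-input idea, my own construction rather than the lecture's code: concatenate random noise to the observation, so a deterministic network induces a multimodal action distribution.

```python
import torch
import torch.nn as nn

class ImplicitPolicy(nn.Module):
    """Maps (observation, noise) -> action; sampling fresh noise at
    test time yields a multimodal action distribution."""
    def __init__(self, obs_dim, act_dim, noise_dim=8, hidden=64):
        super().__init__()
        self.noise_dim = noise_dim
        self.net = nn.Sequential(
            nn.Linear(obs_dim + noise_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs):
        z = torch.randn(obs.shape[0], self.noise_dim)  # the noise input
        return self.net(torch.cat([obs, z], dim=-1))

policy = ImplicitPolicy(obs_dim=4, act_dim=2)
obs = torch.zeros(5, 4)
print(policy(obs))   # five different actions for the same observation
```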
The second net samples conditioned on the output of the first net. (This is the autoregressive discretization trick: discretize one action dimension at a time, sample it, and feed the sample into the net for the next dimension; see the sketch below.)
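A hedged sketch of autoregressive discretization for a two-dimensional action, again my own construction (`N_BINS` and the two heads are illustrative choices):

```python
import torch
import torch.nn as nn

N_BINS = 11  # discretize each action dimension into 11 bins

class AutoregressivePolicy(nn.Module):
    def __init__(self, obs_dim, hidden=64):
        super().__init__()
        # First net: p(a1 | obs)
        self.head1 = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, N_BINS))
        # Second net: p(a2 | obs, a1) -- samples conditioned on the first net
        self.head2 = nn.Sequential(nn.Linear(obs_dim + 1, hidden), nn.ReLU(),
                                   nn.Linear(hidden, N_BINS))

    def sample(self, obs):
        a1 = torch.distributions.Categorical(logits=self.head1(obs)).sample()
        inp = torch.cat([obs, a1.float().unsqueeze(-1)], dim=-1)
        a2 = torch.distributions.Categorical(logits=self.head2(inp)).sample()
        return a1, a2

policy = AutoregressivePolicy(obs_dim=4)
print(policy.sample(torch.zeros(3, 4)))
```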
It's time for a case study.
This one is a human hiker with three GoPros mounted on his head (left, center, right); the three views provide automatically labeled turn-right / go-straight / turn-left data for trail following.
The robot: 300 bucks.
The game controller: 100 bucks.
Notation: c for cost, r for reward (reward is just the negative of cost).
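In symbols, just restating the definitions above: maximizing total reward is the same problem as minimizing total cost.

```latex
r(s_t, a_t) = -c(s_t, a_t), \qquad
\max_{\pi}\ \mathbb{E}\Big[\sum_{t} r(s_t, a_t)\Big]
\ \Longleftrightarrow\
\min_{\pi}\ \mathbb{E}\Big[\sum_{t} c(s_t, a_t)\Big]
```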
You know, there may be a little bit of a cultural difference here: Americans like to believe life is for reward, while maybe Russians behave more pessimistically, hence the cost convention in the optimal-control literature.
(laughter)
Reinforcement learning in computer science is essentially the same problem as optimal control and dynamic programming in the controls literature, just under different names and conventions.