Zhejiang Province Peak Discipline: Statistics, Zhejiang Gongshang University

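“Digital+” and Zhijiang Statistics Forum (Lecture 18): Announcement of an online lecture at our school by Professor Li Yuqiang of East China Normal University on October 27. Published: 2022-10-24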

Title: Minimax Weight Learning for Absorbing MDPs

Speaker: Li Yuqiang

Time: Thursday, October 27, 2022, 14:00-15:00

Venue: Room 644, Comprehensive Building

Tencent Meeting: 184-369-637

Abstract: Reinforcement learning policy evaluation problems are often modeled as finite- or infinite-horizon MDPs, but this is often unrealistic in practice. In this paper, we study off-policy policy estimation for absorbing MDPs. Based on the Minimax Weight Learning (MWL) algorithm, we propose an algorithm, MWLA, that directly estimates the importance ratio of the state-action measure when the behavior policy is unknown, under the assumption that the data are collected as i.i.d. episodes. We investigate the Mean Square Error (MSE) bound of the MWLA method. In the episodic taxi environment, we show that the MWLA method achieves a lower MSE as the number of episodes and the truncation length increase, significantly improving the accuracy of policy evaluation.

This talk is based on joint work with Fengying Li and Xianyi Wu.

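About the speaker: Li Yuqiang is a professor and doctoral supervisor at the School of Statistics, East China Normal University, and director of the editorial office of the journal 《应用概率统计》 (Chinese Journal of Applied Probability and Statistics). His main research interests include the theory of stochastic processes and its applications, and reinforcement learning. He has led more than ten research projects, funded by sources including the National Natural Science Foundation of China, the Natural Science Foundation of Shanghai, and the Shanghai Municipal Education Commission's key program for scientific research and innovation. He has published more than 30 papers in journals including Stochastic Processes and Their Applications, Bernoulli, Science China-Mathematics, and Journal of Applied Probability, and his results have been cited by dozens of scholars in China and abroad, including Professor Gorostiza, a member of the Mexican Academy of Sciences.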
