A survey of online learning methods: Thompson sampling and other methods

doi:10.15960/j.cnki.issn.1007-6093.2017.04.006

Operations Research Transactions

A survey on online learning methods: Thompson sampling and others

HE Simai¹,*, JIN Yujia², WANG Hua², GE Dongdong³

1. School of Information Management and Engineering, Shanghai University of Finance and Economics, Shanghai 200433, China
2. School of Mathematical Sciences, Fudan University, Shanghai 200433, China
3. Research Institute for Interdisciplinary Sciences, Shanghai University of Finance and Economics, Shanghai 200433, China

Received: 2017-08-30; Online: 2017-12-15; Published: 2017-12-15

Abstract

This paper surveys recent research results, major theories, and algorithms in the field of online learning. Because the field is broad, we aim to introduce readers to the principles and ideas behind its basic algorithms. We start from the most standard models and algorithm designs, then extend to a more general presentation of the latest developments in the area.

We first take the Multi-Armed Bandit problem, the standard online optimization model, as an example. We then discuss Thompson Sampling and Upper Confidence Bound algorithms, presenting and analyzing their main ideas and the newest theoretical results, and further discuss extensions and applications of Thompson Sampling in more complicated real-world online learning scenarios. Finally, the paper briefly introduces online convex optimization, an effective and well-known framework for solving the Multi-Armed Bandit problem and other applied problems.

Key words: online learning, multi-armed bandit, Thompson sampling, upper confidence bound, contextual multi-armed bandit, online convex optimization

HE Simai, JIN Yujia, WANG Hua, GE Dongdong. A survey on online learning methods: Thompson sampling and others[J]. Operations Research Transactions, doi: 10.15960/j.cnki.issn.1007-6093.2017.04.006.
