A survey on online learning methods: Thompson sampling and others

HE Simai, JIN Yujia, WANG Hua, GE Dongdong

doi:10.15960/j.cnki.issn.1007-6093.2017.04.006

Operations Research Transactions, 2017, Vol. 21, Issue 4: 84-102

1. School of Information Management and Engineering, Shanghai University of Finance and Economics, Shanghai 200433, China
2. School of Mathematical Sciences, Fudan University, Shanghai 200433, China
3. Research Institute for Interdisciplinary Sciences, Shanghai University of Finance and Economics, Shanghai 200433, China

Received date: 2017-08-30

Online published: 2017-12-15

Abstract

This paper surveys recent research results, major theories, and algorithms in the field of online learning. Because the topic is broad, we aim to introduce readers to the principles behind the basic algorithms and ideas. We start from the most standard models and algorithm designs, then extend to a more general presentation of the latest developments in the area.

We first take the standard online optimization model, the multi-armed bandit problem, as an example. We then discuss Thompson sampling and upper confidence bound algorithms, presenting their main ideas and the newest theoretical results, and further discuss extensions and applications of Thompson sampling in more complicated real-world online learning scenarios. Finally, the paper gives a brief introduction to online convex optimization, an effective and well-known framework for solving the multi-armed bandit problem and other applied problems.
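For readers unfamiliar with the setting, the following is a minimal illustrative sketch of Thompson sampling on a Bernoulli multi-armed bandit. It is not taken from the paper: the function name `thompson_sampling`, the uniform Beta(1, 1) prior, the arm means, and the horizon are all assumptions chosen for illustration.

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Run Beta-Bernoulli Thompson sampling; return the pull count per arm.

    true_means, horizon, and seed are illustrative inputs (assumptions),
    not values from the surveyed paper.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [0] * n_arms  # observed rewards of 1 per arm
    failures = [0] * n_arms   # observed rewards of 0 per arm
    pulls = [0] * n_arms

    for _ in range(horizon):
        # Draw one sample per arm from its Beta posterior
        # (Beta(1, 1) prior, i.e. uniform, is assumed here).
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Play the arm whose sampled mean is largest.
        arm = max(range(n_arms), key=samples.__getitem__)

        # Simulate a Bernoulli reward from the (hidden) true mean.
        reward = 1 if rng.random() < true_means[arm] else 0

        # Update the posterior counts of the arm that was played.
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls


pulls = thompson_sampling([0.2, 0.5, 0.8], horizon=2000)
print(pulls)
```

Under these assumptions, most pulls should concentrate on the arm with the highest mean as the posteriors sharpen, which is the exploration-exploitation behavior the bandit model is meant to capture.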

Key words: online learning; multi-armed bandit; Thompson sampling; upper confidence bound; contextual multi-armed bandit; online convex optimization

Cite this article

HE Simai, JIN Yujia, WANG Hua, GE Dongdong. A survey on online learning methods: Thompson sampling and others[J]. Operations Research Transactions, 2017, 21(4): 84-102. DOI: 10.15960/j.cnki.issn.1007-6093.2017.04.006