Operations Research Transactions, 2017, Vol. 21, Issue 4: 84-102
DOI: https://doi.org/10.15960/j.cnki.issn.1007-6093.2017.04.006
A survey on online learning methods: Thompson sampling and others
Received date: 2017-08-30
Online published: 2017-12-15
This paper surveys recent research results, major theories, and algorithms in the field of online learning. Because the topic is broad, we aim to introduce readers to the principles and ideas behind the basic algorithms. We start from the most standard models and algorithm designs and extend to a more general presentation of the latest developments in the area.
We first take the standard online optimization model, the Multi-Armed Bandit problem, as an example. We then discuss Thompson Sampling and Upper Confidence Bound algorithms, presenting and analyzing their main ideas and the newest theoretical results, and we further discuss extensions and applications of Thompson Sampling in more complicated real-world online learning scenarios. Finally, the paper briefly introduces online convex optimization, an effective and well-known framework for solving the Multi-Armed Bandit problem and other application problems.
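To make the central idea concrete, the following is a minimal sketch of Thompson Sampling on a simulated Bernoulli Multi-Armed Bandit. It is an illustration only and is not taken from the paper: the function name `thompson_sampling`, the uniform Beta(1, 1) prior, the arm means, and the horizon are all assumptions chosen for the example.

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Beta-Bernoulli Thompson Sampling on a simulated bandit (illustrative sketch).

    true_means: hypothetical success probability of each arm (unknown to the learner).
    horizon:    number of rounds to play.
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k  # rewards of 1 observed per arm
    failures = [0] * k   # rewards of 0 observed per arm
    pulls = [0] * k

    for _ in range(horizon):
        # Draw one plausible mean per arm from its Beta posterior.
        # Assumption: a uniform Beta(1, 1) prior on every arm.
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(k)
        ]
        # Play the arm whose sampled mean is largest.
        arm = max(range(k), key=lambda a: samples[a])

        # Simulate a Bernoulli reward from the chosen arm.
        reward = 1 if rng.random() < true_means[arm] else 0

        # Update the posterior counts of the chosen arm only.
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls


pulls = thompson_sampling([0.2, 0.5, 0.8], horizon=2000)
print(pulls)
```

Under these assumptions, the arm with the highest mean should receive most of the pulls as the posteriors concentrate. An Upper Confidence Bound algorithm would replace the posterior sampling step with a deterministic index, the empirical mean plus an exploration bonus.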
HE Simai, JIN Yujia, WANG Hua, GE Dongdong. A survey on online learning methods: Thompson sampling and others[J]. Operations Research Transactions, 2017, 21(4): 84-102. DOI: 10.15960/j.cnki.issn.1007-6093.2017.04.006