
Print ISSN: 0974-7710 | Online ISSN: 0974-7729


International Journal of Web Applications

Bandit Based Pure Exploration of Greedy Learning
Tian Lin, Jian Li, Wei Chen
Tsinghua University, Beijing, China; Microsoft Research, Beijing, China
Abstract: Many combinatorial optimization problems admit greedy algorithms. We study online learning for stochastic greedy algorithms. First, we propose the quasi-greedy regret as a learning metric, which compares online performance against that of the offline greedy algorithm. We then develop a two-stage online greedy algorithm with semi-bandit feedbacks, which uses multi-armed-bandit pure exploration at each level of the greedy process, and show that it performs well under this regret metric. The framework permits a general class of combinatorial structures and reward functions that admit greedy solutions. Finally, we evaluate the proposed algorithms empirically on instances drawn from other problems.
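To make the idea concrete, here is a minimal sketch (not the authors' exact algorithm) of bandit pure exploration inside a greedy loop: at each greedy level, every remaining candidate arm is sampled under a fixed exploration budget, and the algorithm commits to the empirically best arm before moving to the next level. The function names, the `pull` reward oracle, and the budget parameter are all illustrative assumptions.

```python
import random


def greedy_with_pure_exploration(candidates, pull, levels, budget_per_arm=100):
    """Two-stage greedy learning sketch (illustrative, not the paper's algorithm).

    At each greedy level: (1) pure exploration -- sample the stochastic
    marginal reward of every remaining candidate `budget_per_arm` times;
    (2) exploitation -- commit to the arm with the highest empirical mean.
    `pull(solution, arm)` is an assumed reward oracle returning one noisy
    sample of the marginal reward of adding `arm` to `solution`.
    """
    solution = []
    remaining = list(candidates)
    for _ in range(levels):
        empirical_means = {}
        for arm in remaining:
            samples = [pull(solution, arm) for _ in range(budget_per_arm)]
            empirical_means[arm] = sum(samples) / budget_per_arm
        best = max(remaining, key=lambda a: empirical_means[a])  # commit greedily
        solution.append(best)
        remaining.remove(best)
    return solution


# Hypothetical usage: Bernoulli arms with hidden means; with a large enough
# budget the algorithm tends to pick high-mean arms first.
true_means = {"a": 0.9, "b": 0.5, "c": 0.1}
noisy_pull = lambda sol, arm: 1.0 if random.random() < true_means[arm] else 0.0
```

With semi-bandit feedback, only the sampled arms' rewards are observed at each level, which is what the per-arm sampling loop above models.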
Keywords: Combinatorial Optimization, Greedy Algorithms, Learning Metrics
DOI: https://doi.org/10.6025/ijwa/2023/15/1/1-11
Full Text: PDF (2.41 MB) | Downloads: 51
References:

[1] Audibert, J.-Y. & Bubeck, S. (2010) Best arm identification in multi-armed bandits. In: COLT.
[2] Audibert, J.-Y., Bubeck, S. & Lugosi, G. (2011) Minimax policies for combinatorial prediction games. arXiv preprint arXiv:1105.4871.
[3] Auer, P., Cesa-Bianchi, N. & Fischer, P. (2002) Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47, 235–256.
[4] Auer, P., Cesa-Bianchi, N., Freund, Y. & Schapire, R.E. (2002) The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32, 48–77 [DOI: 10.1137/S0097539701398375].
[5] Björner, A. & Ziegler, G.M. (1992) Introduction to greedoids. Matroid Applications, 40, 284–357.
[6] Bubeck, S. & Cesa-Bianchi, N. (2012) Regret analysis of stochastic and nonstochastic multi-armed bandit problems. arXiv preprint arXiv:1204.5721.
[7] Bubeck, S., Munos, R. & Stoltz, G. (2011) Pure exploration in finitely armed and continuous-armed bandits. Theoretical Computer Science, 412, 1832–1852 [DOI: 10.1016/j.tcs.2010.12.059].
[8] Cesa-Bianchi, N. & Lugosi, G. (2012) Combinatorial bandits. Journal of Computer and System Sciences, 78, 1404–1422 [DOI: 10.1016/j.jcss.2012.01.001].
[9] Chen, S., Lin, T., King, I., Lyu, M.R. & Chen, W. (2014) Combinatorial pure exploration of multi-armed bandits. In: NIPS.
[10] Chen, W., Wang, Y. & Yuan, Y. (2013) Combinatorial multi-armed bandit: General framework and applications. In: ICML.
[11] Chvatal, V. (1979) A greedy heuristic for the set-covering problem. Mathematics of Operations Research, 4, 233–235 [DOI: 10.1287/moor.4.3.233].
[12] Gabillon, V., Kveton, B., Wen, Z., Eriksson, B. & Muthukrishnan, S. (2013) Adaptive submodular maximization in bandit setting. In: NIPS.
[13] Gai, Y., Krishnamachari, B. & Jain, R. (2010) Learning multiuser channel allocations in cognitive radio networks: A combinatorial multi-armed bandit formulation. In: DySPAN. IEEE Publications, 1–9 [DOI: 10.1109/DYSPAN.2010.5457857].
[14] Garivier, A. & Cappé, O. (2011) The KL-UCB algorithm for bounded stochastic bandits and beyond. arXiv preprint arXiv:1102.2490.
[15] Helman, P., Moret, B.M.E. & Shapiro, H.D. (1993) An exact characterization of greedy structures. SIAM Journal on Discrete Mathematics, 6, 274–283 [DOI: 10.1137/0406021].
[16] Kalyanakrishnan, S., Tewari, A., Auer, P. & Stone, P. (2012) PAC subset selection in stochastic multi-armed bandits. In: ICML.
[17] Kempe, D., Kleinberg, J. & Tardos, É. (2003) Maximizing the spread of influence through a social network. In: SIGKDD [DOI: 10.1145/956750.956769].
[18] Korte, B. & Lovász, L. (1984) Greedoids and linear objective functions. SIAM Journal on Algebraic Discrete Methods, 5, 229–238 [DOI: 10.1137/0605024].
[19] Kruskal, J.B. (1956) On the shortest spanning subtree of a graph and the traveling salesman problem. Proceedings of the American Mathematical Society, 7, 48–50 [DOI: 10.1090/S0002-9939-1956-0078686-7].
[20] Kveton, B., Wen, Z., Ashkan, A., Eydgahi, H. & Eriksson, B. (2014) Matroid bandits: Fast combinatorial optimization with learning. arXiv preprint arXiv:1403.5045.
[21] Kveton, B., Wen, Z., Ashkan, A. & Szepesvari, C. (2014) Tight regret bounds for stochastic combinatorial semi-bandits. arXiv preprint arXiv:1410.0949.
[22] Lai, T.L. & Robbins, H. (1985) Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6, 4–22 [DOI: 10.1016/0196-8858(85)90002-8].
[23] Lin, T., Abrahao, B., Kleinberg, R., Lui, J. & Chen, W. (2014) Combinatorial partial monitoring game with linear feedback and its applications. In: ICML.
[24] Mirzasoleiman, B., Badanidiyuru, A., Karbasi, A., Vondrak, J. & Krause, A. (2015) Lazier than lazy greedy. In: Proceedings of the AAAI Conference on Artificial Intelligence, 29 [DOI: 10.1609/aaai.v29i1.9486].
[25] Prim, R.C. (1957) Shortest connection networks and some generalizations. Bell System Technical Journal, 36, 1389–1401 [DOI: 10.1002/j.1538-7305.1957.tb01515.x].
[26] Streeter, M. & Golovin, D. (2009) An online algorithm for maximizing submodular functions. In: NIPS.


Copyright © 2010 dline.info