References

1: S. Singh and D. Bertsekas. Reinforcement learning for dynamic channel allocation in cellular telephone systems. In Michael C. Mozer, Michael I. Jordan, and Thomas Petsche, editors, Advances in Neural Information Processing Systems, volume 9, page 974. The MIT Press, 1997.
2: R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour. Policy gradient methods for reinforcement learning with function approximation. In S. A. Solla, T. K. Leen, and K.-R. Müller, editors, Advances in Neural Information Processing Systems, volume 12. The MIT Press, 2000.
3: G. Tesauro. Neurogammon wins computer olympiad. Neural Computation, 1(3):321-323, 1989.
4: V. R. Konda and J. N. Tsitsiklis. Actor-critic algorithms. In S. A. Solla, T. K. Leen, and K.-R. Müller, editors, Advances in Neural Information Processing Systems, volume 12. The MIT Press, 2000.

Dirk Ormoneit
Tue Sep 5 16:37:23 PDT 2000