
Synthesis of a Control System Based on a Reinforcement Learning Algorithm

Authors: Devyatkin D.D., Yurchenkov A.V. Published: 15.03.2026

DOI:

Category: Mathematics and Mechanics | Chapter: Mathematical Simulation, Numerical Methods and Software Packages
Keywords: discrete-continuous control, Q-learning, reinforcement learning, linear systems

Abstract

This article develops a control strategy based on a reinforcement learning algorithm for a continuous system and compares it with the classical method of discrete-continuous control. Discrete-continuous control extends classical sampled-data methods by allowing the control signal to vary within the sampling interval, which improves accuracy; however, it requires knowledge of the system parameters, limiting its applicability under uncertainty. As a more modern and adaptive alternative, a data-driven method based on the off-policy Q-learning algorithm is considered. This method requires neither a priori model identification nor precise knowledge of the system parameters, since it learns directly from measured data. It is shown that the sequence of gain coefficients converges and that each element of the sequence stabilizes the closed-loop system. The developed control algorithm exhibits robustness. Numerical simulations were carried out for a double-integrator system, confirming the effectiveness of both methods. Additionally, an experiment was conducted to evaluate the impact of measurement noise on model performance. A comparative analysis of the two algorithms is presented. The practical implementation was done in Python using the open-source libraries NumPy, SciPy, Matplotlib, and Seaborn.

Please cite this article in English as:

Devyatkin D.D., Yurchenkov A.V. Synthesis of a control system based on a reinforcement learning algorithm. Herald of the Bauman Moscow State Technical University, Series Natural Sciences, 2026, no. 1 (124), pp. 32--50 (in Russ.). EDN: YKJZQV

References

[1] Johnson C.D., Abdel-Haleem M. Optimal discrete-continuous control for the linear-quadratic regulator problem. Proc. 28th Southeastern Symposium on System Theory, 1996, pp. 184--188. DOI: https://doi.org/10.1109/SSST.1996.493495

[2] Cheng S., Quilodran-Casas C., Ouala S., et al. Machine learning with data assimilation and uncertainty quantification for dynamical systems: a review. IEEE/CAA J. Autom. Sin., 2023, vol. 10, iss. 6, pp. 1361--1387. DOI: https://doi.org/10.1109/JAS.2023.123537

[3] Mylnikov L.A., Gergel N.A., Kychkin A.V., et al. Dynamic prediction model in control systems of technic processes with inertia. Vestnik PNIPU. Elektrotekhnika, informatsionnye tekhnologii, sistemy upravleniya [PNRPU Bulletin. Electrical Engineering, Information Technology, Control Systems], 2018, no. 26, pp. 77--91 (in Russ.). EDN: XUEDGP

[4] Van Waarde H.J., Eising J., Trentelman H.L., et al. Data informativity: a new perspective on data-driven analysis and control. IEEE Trans. Autom. Control, 2020, vol. 65, iss. 11, pp. 4753--4768. DOI: https://doi.org/10.1109/TAC.2020.2966717

[5] Berberich J., Romer A., Scherer C., et al. Robust data-driven state-feedback design. Proc. ACC, 2020, pp. 1532--1538. DOI: https://doi.org/10.23919/ACC45564.2020.9147320

[6] Dorfler F., Coulson J., Markovsky I. Bridging direct and indirect data-driven control formulations via regularizations and relaxations. IEEE Trans. Autom. Control, 2023, vol. 68, iss. 2, pp. 883--897. DOI: https://doi.org/10.1109/TAC.2022.3148374

[7] Yang Y., Guo Z., Xiong H., et al. Data-driven robust control of discrete-time uncertain linear systems via off-policy reinforcement learning. IEEE Trans. Neural Netw. Learn. Syst., 2019, vol. 30, iss. 12, pp. 3735--3747. DOI: https://doi.org/10.1109/TNNLS.2019.2897814

[8] Johnson C.D. A new discrete-time state model for linear dynamical systems with continuously-varying control/disturbance inputs. Proc. 26th Southeastern Symposium on System Theory, 1994, pp. 523--527. DOI: https://doi.org/10.1109/SSST.1994.287821

[9] Ogata K. Discrete-time control systems. Prentice-Hall, 1987.

[10] Kuo B.C. Digital control systems. Oxford Univ. Press, 1992.

[11] Dorato P., Levis A.H. Optimal linear regulators: the discrete-time case. IEEE Trans. Autom. Control, 1971, vol. 16, iss. 6, pp. 613--620. DOI: https://doi.org/10.1109/TAC.1971.1099832

[12] Sutton R.S., Barto A.G. Reinforcement learning: an introduction. MIT Press, 2018.

[13] Watkins C.J.C.H., Dayan P. Q-learning. Mach. Learn., 1992, vol. 8, no. 3, pp. 279--292. DOI: https://doi.org/10.1007/BF00992698

[14] Willems J.C., Rapisarda P., Markovsky I., et al. A note on persistency of excitation. Syst. Control Lett., 2005, vol. 54, iss. 4, pp. 325--329. DOI: https://doi.org/10.1016/j.sysconle.2004.09.003

[15] Bradtke S.J., Ydstie B.E., Barto A.G. Adaptive linear quadratic control using policy iteration. Proc. ACC, 1994, vol. 3, pp. 3475--3479. DOI: https://doi.org/10.1109/ACC.1994.735224

[16] Lopez V.G., Alsalti M., Muller M.A. Efficient off-policy Q-learning for data-based discrete-time LQR problems. IEEE Trans. Autom. Control, 2023, vol. 68, iss. 5, pp. 2922--2933. DOI: https://doi.org/10.1109/TAC.2023.3235967

[17] Lubanovic B. Introducing Python. O’Reilly Media, 2019.

[18] Barry P. Head first Python. O’Reilly Media, 2016.

[19] Idris I. NumPy beginner’s guide. Packt Publ., 2015.

[20] Nunez-Iglesias J., van der Walt S., Dashnow H. Elegant SciPy. O’Reilly Media, 2017.

[21] Abdrakhmanov M.I. Python. Vizualizatsiya dannykh [Python. Data visualization]. Devpractice.ru Publ., 2020.