Reinforcement Learning Seminar Ⅰ
Info
Text editor: $\text{TeXmacs}$
Email: [yli340@uic.edu](mailto:yli340@uic.edu)
From Physics to Control Theory
State of Newton’s system (deterministic):
$$s(t) = \begin{bmatrix}x(t)\\ \dot x(t)\end{bmatrix}.$$
State-space form of Newton’s 2nd Law:
$$\begin{align*}\dot s(t) &= \begin{bmatrix}\dot x(t)\\ \ddot x(t)\end{bmatrix}\\ &= \begin{bmatrix}0 & 1\\ 0 & 0\end{bmatrix}\begin{bmatrix}x(t)\\ \dot x(t)\end{bmatrix} + \begin{bmatrix}0 & 0\\ 0 & 1\end{bmatrix}\begin{bmatrix}0\\ u\end{bmatrix}\\ &= A\, s(t) + B\begin{bmatrix}0\\ u\end{bmatrix}.\end{align*}$$
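This state-space system can be simulated directly. A minimal sketch, assuming unit mass and forward-Euler integration (both are assumptions, not from the notes):

```python
import numpy as np

# Double-integrator dynamics: s' = A s + B [0, u]^T (unit mass assumed).
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0, 0.0],
              [0.0, 1.0]])

def step(s, u, dt=0.01):
    """One forward-Euler step of the state-space system."""
    return s + dt * (A @ s + B @ np.array([0.0, u]))

# Constant force u = 1 from rest: position grows roughly as t^2 / 2.
s = np.array([0.0, 0.0])
for _ in range(100):          # integrate for t = 1 second
    s = step(s, 1.0)
x, v = s                      # v should be ~1.0 after one second
```

Forward Euler is only first-order accurate; it is used here just to make the continuous-time equation concrete.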
What’s a linear transformation?
$$\mathscr A(x) = \begin{bmatrix}\mathscr A e_1 & \mathscr A e_2\end{bmatrix}x.$$
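The formula says a linear map is fully determined by its action on the basis vectors: stacking $\mathscr A e_1, \mathscr A e_2$ as columns gives the matrix of the map. A small sketch using a 90° rotation as the example map (my choice of example, not from the notes):

```python
import numpy as np

def rotate90(v):
    """A linear map on R^2: rotate a vector 90 degrees counterclockwise."""
    x, y = v
    return np.array([-y, x])

# Columns of the matrix are the images of the basis vectors.
e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
M = np.column_stack([rotate90(e1), rotate90(e2)])

# The matrix reproduces the map on an arbitrary vector.
v = np.array([2.0, 3.0])
assert np.allclose(M @ v, rotate90(v))
```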
Group, Flow, Lie Group
General form of the state dynamics:
$$\dot s(t) = f(s(t), u(t)).$$
From Deterministic to Stochastic
Discrete-time systems only:
Without Controller:
$\dot s(t)=f(s(t)) \rightarrow s_{t+1} = f(s_t)$
Stochastic dynamic system:
$s_{t+1} = f(s_t) \rightarrow s_{t+1} \sim p(\cdot \mid s_t)$
Markov System (memoryless):
$s_{t+1} \sim p(\cdot \mid s_0,\cdots,s_t) = p(\cdot \mid s_t)$
With Controller:
$s_{t+1}=f(s_t,u_t)\rightarrow s_{t+1}\sim p(\cdot \mid s_t,a_t)$
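The controlled stochastic Markov system above can be sketched with a finite transition kernel $p(s' \mid s, a)$ stored as a table. The specific numbers are hypothetical, chosen only to illustrate sampling:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy transition kernel p(s' | s, a): 2 states, 2 actions.
# P[s, a] is the distribution over the next state (rows sum to 1).
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # transitions from state 0 under a=0, a=1
    [[0.5, 0.5], [0.1, 0.9]],   # transitions from state 1 under a=0, a=1
])

def sample_next_state(s, a):
    """Draw s_{t+1} ~ p(. | s_t, a_t); only the current state matters (Markov)."""
    return rng.choice(2, p=P[s, a])

s = 0
trajectory = [s]
for t in range(5):
    a = int(rng.integers(2))     # uniformly random controller, just for the demo
    s = sample_next_state(s, a)
    trajectory.append(s)
```

The memoryless property shows up in `sample_next_state`: the next-state distribution depends only on `(s, a)`, never on earlier states.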
Introduction to RL
Interact with environment
Logic -> Supervised -> RL
Safe RL
Sim2Real
Next Seminar: MDP
Goal: maximize the cumulative reward
$\pi: S\rightarrow\Delta(A)$
$r: S\times A\rightarrow[0,1]$
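The two objects above, a stochastic policy $\pi: S \rightarrow \Delta(A)$ and a bounded reward $r: S \times A \rightarrow [0,1]$, can be represented as tables in the finite case. A minimal sketch with hypothetical probability and reward values:

```python
import numpy as np

rng = np.random.default_rng(0)

# pi: S -> Delta(A), one probability distribution over actions per state.
pi = np.array([
    [0.7, 0.3],   # action distribution in state 0
    [0.2, 0.8],   # action distribution in state 1
])

# r: S x A -> [0, 1], a bounded reward table (values are illustrative).
r = np.array([
    [0.0, 1.0],
    [0.5, 0.2],
])

def act(s):
    """Sample a ~ pi(. | s)."""
    return rng.choice(2, p=pi[s])

# Accumulate reward over a short rollout in a fixed state
# (the transition dynamics are omitted here; see the MDP seminar).
total = sum(r[0, act(0)] for _ in range(10))
```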