State-Action Paradigm - Bernhard Pfann, CFA

In Reinforcement Learning ("RL"), we use the following terminology:

- *States* $s \in S$ that an agent can be in. If all states are observed, we also know $S$, the set of all possible states. ^045824
- *Actions* $a \in A$ that the agent can take to get from $s \mapsto s^\prime$. ^d84ec0
- *Transitions* $T(s,a,s^\prime)$ are the probabilities of going from $s \mapsto s^\prime$ by taking action $a$. They can be expressed as a conditional probability, and for every fixed pair $(s,a)$ they sum to one over the successor states $s^\prime$. ^c23f52
  $$ T(s,a,s^\prime)= \mathbf P(s^\prime | s,a) \qquad \sum_{s^\prime \in S} T(s,a,s^\prime)=1 \quad \forall \, s \in S, \, a \in A $$
- *Rewards* $R(s_t)$ are based on the state; the agent receives them for desired behavior (the reward could also be based on a state-action combination). ^01e643
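
To make the terminology concrete, here is a minimal Python sketch of a tiny state-action model. Everything in it is an illustrative assumption rather than part of the original note: the state names (`low`, `high`), the action names (`wait`, `invest`), the probabilities, the reward values, and the helper functions `is_valid_transition_model` and `step`. It also assumes the reward is paid for the state the agent arrives in.

```python
import random

# Hypothetical two-state, two-action example; all names and numbers are made up.
STATES = ["low", "high"]
ACTIONS = ["wait", "invest"]

# T[(s, a)] maps each successor state s' to P(s' | s, a).
T = {
    ("low", "wait"): {"low": 0.9, "high": 0.1},
    ("low", "invest"): {"low": 0.4, "high": 0.6},
    ("high", "wait"): {"low": 0.3, "high": 0.7},
    ("high", "invest"): {"low": 0.1, "high": 0.9},
}

# State-based reward R(s); assumed here to be received on arrival in s.
R = {"low": 0.0, "high": 1.0}


def is_valid_transition_model(T, states, actions, tol=1e-9):
    """Check that, for every (s, a), the probabilities over s' are
    non-negative and sum to 1."""
    for s in states:
        for a in actions:
            probs = T[(s, a)]
            if any(p < 0 for p in probs.values()):
                return False
            if abs(sum(probs.values()) - 1.0) > tol:
                return False
    return True


def step(s, a, rng):
    """Sample a successor s' from P(. | s, a) and return (s', R(s'))."""
    successors = list(T[(s, a)].keys())
    weights = list(T[(s, a)].values())
    s_next = rng.choices(successors, weights=weights, k=1)[0]
    return s_next, R[s_next]


if __name__ == "__main__":
    print(is_valid_transition_model(T, STATES, ACTIONS))
    rng = random.Random(0)  # seeded only to make runs repeatable
    print(step("low", "invest", rng))
```

Storing the transitions as one dictionary per $(s,a)$ pair mirrors the conditional probability $\mathbf P(s^\prime | s,a)$ directly: each inner dictionary is a distribution over successor states, which is why the normalization check runs per pair and not over the whole table.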