Q(s, a): the expected return from starting in state s and taking action a at time t.
r(s, a): the reward received in state s for taking action a.
maxQ(s’, a’): the maximum expected return over the next state-action pairs (s’, a’). Computing it requires finding the action a’ that maximizes Q(s’, a’).
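These three terms are commonly combined in a tabular Q-learning update, where Q(s, a) is nudged toward the target r(s, a) + γ · maxQ(s’, a’). The sketch below is one minimal way to write that, not the author's implementation. It assumes a dictionary-based Q-table, a learning rate `alpha` and a discount factor `gamma` (neither appears in the text above), and the hypothetical helper names `best_next_action` and `q_update`. Unseen state-action pairs default to 0.0, and terminal states are not handled.

```python
from typing import Dict, Hashable, List, Tuple

# Hypothetical tabular Q-table: maps (state, action) -> estimated return Q(s, a).
QTable = Dict[Tuple[Hashable, Hashable], float]


def best_next_action(q: QTable, next_state: Hashable,
                     actions: List[Hashable]) -> Tuple[Hashable, float]:
    """Find the a' that maximizes Q(s', a') and return it with that maximum value."""
    best_a = max(actions, key=lambda a: q.get((next_state, a), 0.0))
    return best_a, q.get((next_state, best_a), 0.0)


def q_update(q: QTable, state: Hashable, action: Hashable, reward: float,
             next_state: Hashable, actions: List[Hashable],
             alpha: float = 0.1, gamma: float = 0.9) -> float:
    """One Q-learning step: move Q(s, a) toward r(s, a) + gamma * maxQ(s', a').

    alpha (learning rate) and gamma (discount factor) are assumed parameters.
    """
    _, max_next = best_next_action(q, next_state, actions)
    target = reward + gamma * max_next
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (target - old)
    return q[(state, action)]


if __name__ == "__main__":
    q: QTable = {("s1", "left"): 2.0, ("s1", "right"): 5.0}
    print(best_next_action(q, "s1", ["left", "right"]))  # ('right', 5.0)
    # target = 1.0 + 0.9 * 5.0 = 5.5; new Q = 0.0 + 0.5 * (5.5 - 0.0) = 2.75
    print(q_update(q, "s0", "right", 1.0, "s1", ["left", "right"],
                   alpha=0.5, gamma=0.9))  # 2.75
```

Splitting out `best_next_action` keeps the "find the a’ that maximizes it" step visible on its own; a real agent would also pick actions with some exploration policy (for example ε-greedy), which is outside this sketch.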