In general, for Q-Learning to converge to the optimal Q-values...
A. It is necessary that every state-action pair is visited infinitely often.
B. It is necessary that the learning rate α (weight given to new samples) is decreased to 0 over time.
C. It is necessary that the discount γ is less than 0.5.
D. It is necessary that actions get chosen according to .
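
For experimenting with these conditions, here is a minimal sketch of tabular Q-learning on a toy problem. Everything in it is an assumption made for illustration and not part of the question: the two-state MDP, the slip probability, the discount of 0.8, the uniformly random behaviour policy, and the step size α = n^(-0.6), where n counts visits to the current state-action pair. That step size shrinks to 0 while its sum diverges and the sum of its squares stays finite, which is the usual stochastic-approximation requirement. The learned values are compared against Q-values computed by value iteration.

```python
import random

GAMMA = 0.8        # discount, deliberately larger than 0.5
STATES = (0, 1)
ACTIONS = (0, 1)   # 0 = stay, 1 = move to the other state
SLIP = 0.1         # probability that the opposite action is executed


def transition_probs(s, a):
    """Return {next_state: probability} for the hypothetical toy MDP."""
    intended = s if a == 0 else 1 - s
    return {intended: 1.0 - SLIP, 1 - intended: SLIP}


def reward(s_next):
    """Reward 1 for landing in state 1, otherwise 0."""
    return 1.0 if s_next == 1 else 0.0


def value_iteration(sweeps=500):
    """Compute the optimal Q-values from the known model."""
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(sweeps):
        q = {
            (s, a): sum(
                p * (reward(s2) + GAMMA * max(q[(s2, b)] for b in ACTIONS))
                for s2, p in transition_probs(s, a).items()
            )
            for s in STATES
            for a in ACTIONS
        }
    return q


def q_learning(steps=200_000, seed=0):
    """Tabular Q-learning from sampled transitions only."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    visits = {key: 0 for key in q}
    s = 0
    for _ in range(steps):
        # Uniformly random behaviour policy: every pair keeps being visited.
        a = rng.choice(ACTIONS)
        intended = s if a == 0 else 1 - s
        s_next = intended if rng.random() >= SLIP else 1 - intended

        # Per-pair step size n^(-0.6): decays to 0, but slowly enough
        # that the sum of step sizes still diverges.
        visits[(s, a)] += 1
        alpha = visits[(s, a)] ** -0.6

        target = reward(s_next) + GAMMA * max(q[(s_next, b)] for b in ACTIONS)
        q[(s, a)] += alpha * (target - q[(s, a)])
        s = s_next
    return q


if __name__ == "__main__":
    q_star = value_iteration()
    q_hat = q_learning()
    for key in sorted(q_star):
        print(key, round(q_star[key], 3), round(q_hat[key], 3))
```

Note that the behaviour policy here never looks at the Q-table: Q-learning is off-policy, so the data can come from any policy that keeps trying every action in every state. Swapping in a constant step size or a policy that ignores one action is a quick way to see how the estimates change.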