The main parameters of Q-learning and Q(lambda) are:
Lambda (λ) is the eligibility trace decay rate. The greater it is,
the longer the sequence of state-action pairs whose values are updated.
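To get a feel for how lambda controls the length of that sequence, here
is a rough sketch. It assumes the usual trace decay, where the trace of
a pair visited k steps ago is scaled by (gamma * lambda)^k, with gamma
being the discount rate. The function name and the 0.01 cut-off are
only illustrative choices, not part of any library.

```python
import math

def effective_trace_length(gamma, lam, threshold=0.01):
    """Steps until a trace scaled by (gamma * lam)**k is at or below threshold.

    Assumes the trace decays by gamma * lam on every step; the threshold
    is an arbitrary cut-off for "small enough to ignore".
    """
    decay = gamma * lam
    if decay <= 0.0:
        return 0               # no trace: only the current pair is updated
    if decay >= 1.0:
        return math.inf        # the trace never decays
    return math.ceil(math.log(threshold) / math.log(decay))

print(effective_trace_length(0.99, 0.5))  # 7
print(effective_trace_length(0.99, 0.9))  # 40
```

So under these assumptions lambda = 0.5 touches only a handful of
recent pairs per step, while lambda = 0.9 reaches back several times
further.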
Alpha (α) is the learning rate - how far the state-action value moves
towards the new reward plus the discounted value of the next
state-action pair. The greater alpha, the more the state-action value
tends towards new information. High values of alpha make learning
faster, but the agent may end up receiving slightly lower rewards.
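The role of alpha is easiest to see in the one-step Q-learning update
itself. The following is a minimal sketch of that update for a single
state-action pair; the function and argument names are made up for
illustration.

```python
def q_update(q_sa, reward, max_q_next, alpha, gamma):
    """One-step Q-learning update for a single state-action value.

    q_sa       -- current estimate Q(s, a)
    reward     -- reward received after taking a in s
    max_q_next -- max over a' of Q(s', a') in the next state
    """
    target = reward + gamma * max_q_next      # the "new information"
    return q_sa + alpha * (target - q_sa)     # move a fraction alpha towards it

# Same experience, different learning rates:
print(q_update(0.0, 1.0, 0.0, alpha=0.95, gamma=0.99))  # 0.95
print(q_update(0.0, 1.0, 0.0, alpha=0.05, gamma=0.99))  # 0.05
```

With alpha = 1 the old estimate is replaced by the target outright,
with alpha = 0 nothing is learned at all.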
The discount rate parameter (gamma) describes how far-sighted the
agent is. Small values of gamma (close to zero) make the agent give
immediate events higher significance. If gamma = 0, only the immediate
reward matters. If gamma = 1, the agent takes a timeless view of
rewards: a reward at any future step counts as much as one now. If
gamma = 0.5, a reward now is worth twice as much as the same reward in
the next time step. For practical purposes we want gamma close to, but
not quite, 1.
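The gamma cases above can be checked with a few lines of arithmetic.
This is only a sketch: discounted_return and horizon are illustrative
names, and 1 / (1 - gamma) is the usual rule of thumb for how many
steps ahead the agent effectively looks.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards[k] * gamma**k, computed from the last reward backwards."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total

def horizon(gamma):
    """Rule-of-thumb effective horizon, valid for 0 <= gamma < 1."""
    return 1.0 / (1.0 - gamma)

rewards = [1.0, 1.0, 1.0]
print(discounted_return(rewards, 0.0))  # 1.0  - only the now matters
print(discounted_return(rewards, 0.5))  # 1.75 - each step is worth half the previous
print(discounted_return(rewards, 1.0))  # 3.0  - all steps count equally
print(horizon(0.99))                    # about 100 steps
```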
A reasonable way to choose RL parameters might be:
1) to choose a high value of alpha for the first iterations and
decrease it over time (from 0.95 at the beginning to 0.05 at the end).
2) to pick a high value (let's say 0.99) for gamma - to take a long
view of future rewards.
3) to pick a moderate value of lambda (like 0.5) - first, to
accelerate learning and second, to stay within the computer's memory
limits.
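As a sketch of point 1, the decreasing alpha could be implemented as a
simple schedule. A linear decay is assumed here purely for
illustration (nothing above fixes the shape of the decay), and the
names alpha_schedule, GAMMA and LAMBDA are made up.

```python
GAMMA = 0.99    # point 2: long view of future rewards
LAMBDA = 0.5    # point 3: moderate trace length

def alpha_schedule(step, total_steps, start=0.95, end=0.05):
    """Linearly decay alpha from start (step 0) to end (last step).

    The linear shape is an assumption; an exponential or 1/t decay
    would fit the same description.
    """
    if total_steps <= 1:
        return end
    frac = min(max(step / (total_steps - 1), 0.0), 1.0)  # clamp to [0, 1]
    return start + frac * (end - start)

total = 1000
for step in (0, 500, 999):
    print(step, round(alpha_schedule(step, total), 3))
```

The training loop would then call alpha_schedule(step, total) once per
iteration and pass the result, together with GAMMA and LAMBDA, to the
update rule.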
Please give me your opinion and let's discuss it.
Thanks,
Uri.