TESTAR iv4xr reinforcement learning

Reinforcement Learning framework

The Reinforcement Learning (RL) framework of TESTAR allows users to develop and configure smarter action-selection mechanisms on top of the State Model by automatically calculating and assigning reward values and learning from previously executed transitions.

The RL framework consists of the following modules:

  • PolicyFactory to select an action from a list of executable abstract actions. Users can implement their own customized policies, but TESTAR offers a set of policies by default (an illustrative sketch follows this list):
    GreedyPolicy: selects the action with the highest Q-value.
    EpsilonGreedyPolicy: applies the greedy policy with probability one minus Epsilon and selects a random action with probability Epsilon.
    EpsilonGreedyAndBoltzmannDistributedExplorationPolicy: applies the greedy policy with probability one minus Epsilon and the Boltzmann-distributed exploration policy with probability Epsilon.
    BoltzmannDistributedExplorationPolicy: selects an action with a probability that is a function of the Q-value and the temperature. After an action is executed, the temperature decays according to the decay rate.
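
As an illustration of how such a policy works, here is a minimal epsilon-greedy sketch in plain Java. The AbstractAction type, its qValue field, and the applyPolicy method are hypothetical stand-ins for illustration, not TESTAR's actual State Model interfaces:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// Minimal epsilon-greedy sketch; AbstractAction and its qValue field
// are hypothetical stand-ins for the State Model types.
public class EpsilonGreedySketch {

    static class AbstractAction {
        final String id;
        final double qValue; // Q-value learned for this action so far
        AbstractAction(String id, double qValue) { this.id = id; this.qValue = qValue; }
    }

    private final double epsilon;
    private final Random random = new Random();

    EpsilonGreedySketch(double epsilon) { this.epsilon = epsilon; }

    // With probability epsilon, explore by picking a random action;
    // otherwise exploit by picking the action with the highest Q-value.
    AbstractAction applyPolicy(List<AbstractAction> actions) {
        if (random.nextDouble() < epsilon) {
            return actions.get(random.nextInt(actions.size()));
        }
        return actions.stream()
                .max(Comparator.comparingDouble(a -> a.qValue))
                .orElseThrow();
    }
}
```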

  • RewardFunctionFactory to define the reward for performing an action, which depends on the testing goal. Users can implement their own customized reward functions, but TESTAR offers a set of reward functions by default (a sketch of the first one follows this list):
    CounterBasedRewardFunction: the reward is 1 divided by (the number of times the action has been executed before + 1). The more often an action has been executed, the lower the reward.
    WidgetTreeBasedRewardFunction: compares the actionable widgets of the previous and current states and gives a higher reward when the widget trees differ more between the two states.
    ImageRecognitionBasedRewardFunction: compares the screenshots taken before and after executing an action and gives a higher reward when there are more visual changes between the previous and current states.
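
To make the counter-based idea concrete, the sketch below computes reward = 1 / (times executed before + 1). The in-memory counter map keyed by action id is an assumption made for illustration; TESTAR itself tracks execution counts through the State Model:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a counter-based reward: 1 / (times executed before + 1).
// The executionCount map is illustrative, not the framework's storage.
public class CounterBasedRewardSketch {

    private final Map<String, Integer> executionCount = new HashMap<>();

    double getReward(String actionId) {
        // merge returns the updated count, i.e. times-before + 1.
        int count = executionCount.merge(actionId, 1, Integer::sum);
        // First execution: 1/1 = 1.0; second: 1/2 = 0.5; and so on,
        // so frequently executed actions receive ever lower rewards.
        return 1.0 / count;
    }
}
```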

  • QFunctionFactory to calculate the Q-values of the Q-function algorithms after performing an action. Users can implement their own customized Q-functions, but TESTAR offers two by default (a sketch of both updates follows):
    QlearningFunction: Q(S, A) ← Q(S, A) + α [R + γ · max_a Q(S′, a) − Q(S, A)]
    SarsaQFunction: Q(S, A) ← Q(S, A) + α [R + γ · Q(S′, A′) − Q(S, A)]
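
Both rules share the same temporal-difference form and differ only in the target term: Q-learning bootstraps from the best action available in the next state S′ (off-policy), while SARSA uses the Q-value of the action A′ actually selected next (on-policy). A plain-Java sketch of both updates, with illustrative parameter names rather than the framework's actual fields:

```java
// Sketch of the two update rules described above; alpha (learning rate),
// gamma (discount factor) and the q arguments are illustrative names.
public class QUpdateSketch {

    // Q-learning: the target uses the maximum Q-value of the next state.
    static double qLearning(double q, double reward, double maxQNext,
                            double alpha, double gamma) {
        return q + alpha * (reward + gamma * maxQNext - q);
    }

    // SARSA: the target uses the Q-value of the action actually
    // selected in the next state.
    static double sarsa(double q, double reward, double qNext,
                        double alpha, double gamma) {
        return q + alpha * (reward + gamma * qNext - q);
    }
}
```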

Figure: TESTAR RL framework diagram (TESTAR_RL_diagram)

LabRecruits

The TESTAR tool contains an example of a customized protocol, labrecruits_testar_reinforcement_learning, that uses the RL framework with the LabRecruits game.
The test.settings file of the protocol allows users to enable the RL framework and select between the existing modules, changing the default random action-selection mechanism to the RL calculation on top of the model.

```
#################################################################
# Reinforcement learning settings
#################################################################
StateModelReinforcementLearningEnabled = true
RewardFunction = CounterBasedRewardFunction
Policy = EpsilonGreedyPolicy
QFunction = SarsaQFunction
Alpha = 1.0
Gamma = 0.99
DefaultValue = 0.0
DefaultReward = 0.0
Epsilon = 0.7
MaxQValue = 1.0
DecayRate = 0.0001
Temperature = 1.0
```
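
In terms of the formulas above, Alpha is the learning rate α and Gamma the discount factor γ; Epsilon is the exploration probability used by the epsilon-greedy policies; and Temperature and DecayRate parameterize the Boltzmann-distributed exploration policy. DefaultValue and DefaultReward appear to initialize the Q-value and reward of transitions that have not been visited yet, and MaxQValue to bound the learned Q-values, though the exact semantics are defined in the protocol implementation.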