CbExploreAdfGreedy#

With an epsilon greedy exploration policy, the best action is chosen with probability \(1 - \epsilon\), and the remaining \(\epsilon\) is distributed equally across all actions.

In practice, this means that even if the optimal action is presented every time, it will only be selected with probability \((1 - \epsilon) + \epsilon / \text{numActions}\).

So, if \(\epsilon\) is \(0.2\), there are 5 actions being presented, and the optimal action is the 0th one, the resulting probability distribution would be: \([0.84, 0.04, 0.04, 0.04, 0.04]\)
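The calculation above can be sketched in a few lines. This is an illustrative helper, not part of any library API; the function name and signature are hypothetical.

```python
def epsilon_greedy_pmf(num_actions: int, epsilon: float, best_action: int = 0) -> list[float]:
    """Build the epsilon greedy probability distribution over actions.

    Every action receives epsilon / num_actions; the best action
    additionally receives the remaining 1 - epsilon probability mass.
    """
    base = epsilon / num_actions
    pmf = [base] * num_actions
    pmf[best_action] += 1.0 - epsilon
    return pmf

# Example from the text: epsilon = 0.2, 5 actions, optimal action is index 0.
print(epsilon_greedy_pmf(5, 0.2))
```

Note that the distribution always sums to 1, and the optimal action's probability is \((1 - \epsilon) + \epsilon / \text{numActions}\) rather than simply \(1 - \epsilon\).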

Configuration#

Typename: CbExploreAdfGreedy

| Property | Type | Default value |
|----------|------|---------------|
| cbAdf | reduction (CbAdf) | {"typename": "CbAdf"} |
| epsilon | number | 0.05 |
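A configuration using the properties above might look like the following. This is a hedged sketch based only on the typenames and defaults listed in the table; the exact surrounding config format may differ.

```json
{
  "typename": "CbExploreAdfGreedy",
  "epsilon": 0.1,
  "cbAdf": { "typename": "CbAdf" }
}
```

Omitting `epsilon` would fall back to the default of 0.05.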

Types#