A history is a sequence of \((X^t, Y^t)\) pairs. A policy \(\pi\) is a map from histories to \(\mathcal{X}\). Given history \(H\), let \(\mathcal{F}(H)\) be the set of \(Z\) that are consistent with \(H\). Define \(T(\pi,Z)\) to be the number of tests run under policy \(\pi\) when the state is \(Z\).
Extensions
- Feedback -- for example, could be \(X^t \cdot Z\) itself.
- Decision space -- for example, could choose weights to apply.
- Error tolerance -- cost of incorrect label.
- Noise -- could incorporate false positives or negatives.