Active Learning in Machine Learning

Active Learning

Active learning is a special case of semi-supervised machine learning in which a learning algorithm can interactively query a user (or another information source) to obtain the desired outputs for new data points. In the statistics literature it is sometimes also called optimal experimental design.

There are situations in which unlabeled data is abundant but manual labeling is expensive. In such a scenario, the learning algorithm can actively query the user for labels. This type of iterative supervised learning is called active learning.
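The iterative process described above can be sketched as a small pool-based loop. This is only a minimal illustration under stated assumptions: the one-dimensional threshold model, the function names (fit_threshold, active_learning_loop), the simulated oracle, and the "closest to the boundary" selection rule are all hypothetical choices made for the example, not part of any particular library.

```python
def fit_threshold(labeled):
    """Fit a toy 1-D classifier: the midpoint between the largest labeled
    negative and the smallest labeled positive.
    Assumes at least one example of each class is already labeled."""
    negatives = [x for x, y in labeled if y == 0]
    positives = [x for x, y in labeled if y == 1]
    return (max(negatives) + min(positives)) / 2.0


def active_learning_loop(pool, oracle, seed_labeled, budget):
    """Repeatedly retrain, pick one unlabeled point, and ask the oracle for its label."""
    labeled = list(seed_labeled)
    pool = list(pool)
    for _ in range(budget):
        if not pool:
            break
        threshold = fit_threshold(labeled)
        # Selection rule (an assumption for this sketch): query the
        # unlabeled point closest to the current decision boundary.
        query = min(pool, key=lambda x: abs(x - threshold))
        pool.remove(query)
        labeled.append((query, oracle(query)))  # the expensive labeling step
    return fit_threshold(labeled), labeled


# Simulated "user": the true boundary is at 0.62 (hypothetical value).
def oracle(x):
    return int(x >= 0.62)


pool = [i / 100 for i in range(1, 100)]
seed = [(0.0, 0), (1.0, 1)]
threshold, labeled = active_learning_loop(pool, oracle, seed, budget=8)
```

With only eight label requests out of 99 unlabeled points, the learned threshold ends up close to the true boundary, which is the kind of label saving active learning aims for.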

Query Strategies

Algorithms for determining which data points should be labeled can be organized into a number of different categories:

Uncertainty Sampling

Label those points for which the current model is least certain about what the correct output should be.
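As a rough illustration of uncertainty sampling, the sketch below scores each unlabeled point from the class probabilities a model predicts for it. The three scores shown (least confidence, margin, and entropy) are common ways to quantify "least certain"; the function names and the example probabilities are assumptions made for this sketch.

```python
import math


def least_confidence(probs):
    """1 minus the probability of the most likely class (higher = more uncertain)."""
    return 1.0 - max(probs)


def margin(probs):
    """Gap between the two most likely classes (smaller = more uncertain)."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second


def entropy(probs):
    """Shannon entropy of the predicted distribution (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def most_uncertain(pool_probs, score=entropy):
    """Index of the pool point with the highest uncertainty score."""
    return max(range(len(pool_probs)), key=lambda i: score(pool_probs[i]))


# Hypothetical predicted class probabilities for three unlabeled points.
pool_probs = [
    [0.95, 0.03, 0.02],  # confident prediction
    [0.40, 0.35, 0.25],  # nearly undecided
    [0.60, 0.30, 0.10],
]
picked = most_uncertain(pool_probs)
```

Because a small margin means high uncertainty, margin-based selection would pass the negated margin as the score, for example most_uncertain(pool_probs, lambda p: -margin(p)).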

Other common query strategies include:

1. Query by committee: A variety of models are trained on the current labeled data and vote on the output for unlabeled data. Label those points for which the ‘committee’ disagrees the most.

2. Expected model change: Label those points that would most change the current model.

3. Expected error reduction: Label those points that would most reduce the model’s generalization error.

4. Variance reduction: Label those points that would minimize output variance, which is one of the components of error.

5. Balance exploration and exploitation: The choice of examples to label is seen as a dilemma between exploration and exploitation over the data space representation. This strategy manages the trade-off by modeling the active learning problem as a contextual bandit problem.
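Query by committee, the first strategy in the list above, can be sketched by measuring disagreement with vote entropy: the entropy of the distribution of labels the committee members vote for. This is one possible disagreement measure, not the only one. The committee of three threshold classifiers and the names vote_entropy and most_disputed are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Entropy of the committee's vote distribution (0 when all members agree)."""
    n = len(votes)
    counts = Counter(votes)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def most_disputed(pool, committee):
    """Return the pool point on which the committee disagrees the most."""
    def disagreement(x):
        return vote_entropy([member(x) for member in committee])
    return max(pool, key=disagreement)


# Hypothetical committee: three threshold classifiers that differ in
# where they place the decision boundary.
committee = [
    lambda x: int(x >= 0.4),
    lambda x: int(x >= 0.5),
    lambda x: int(x >= 0.7),
]
pool = [0.1, 0.45, 0.9]
query = most_disputed(pool, committee)
```

Here all members agree on 0.1 and on 0.9, so the point 0.45, where the votes split two to one, is the one selected for labeling.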