In this experiment we measure how the size of the training set influences POS tagger performance. Seven different POS taggers are each trained on the first 1,000, 2,000, and 3,000 sentences of the NLTK WSJ sample, and on all of its sentences. Their performance is then measured on the GUM corpus.
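The setup can be sketched as a small learning-curve loop. This is only an illustration, not the code used in the experiment: the function names (`training_subsets`, `train_baseline`, `learning_curve`) are hypothetical, a simple most-frequent-tag baseline stands in for the seven taggers, and toy sentences stand in for the corpora. In practice the training sentences would presumably come from something like `nltk.corpus.treebank.tagged_sents()` and the test sentences from GUM.

```python
from collections import Counter, defaultdict


def training_subsets(sentences, sizes=(1000, 2000, 3000)):
    """Yield (n, first n sentences) for each size, then the full set."""
    for n in sizes:
        if n < len(sentences):
            yield n, sentences[:n]
    yield len(sentences), sentences


def train_baseline(tagged_sents):
    """Stand-in tagger (assumption): most frequent tag per word,
    falling back to the overall most frequent tag for unseen words."""
    per_word = defaultdict(Counter)
    all_tags = Counter()
    for sent in tagged_sents:
        for word, tag in sent:
            per_word[word][tag] += 1
            all_tags[tag] += 1
    default = all_tags.most_common(1)[0][0]
    lexicon = {w: c.most_common(1)[0][0] for w, c in per_word.items()}
    return lambda word: lexicon.get(word, default)


def accuracy(tag_word, tagged_sents):
    """Token-level accuracy of a word -> tag function."""
    total = correct = 0
    for sent in tagged_sents:
        for word, gold in sent:
            total += 1
            correct += tag_word(word) == gold
    return correct / total if total else 0.0


def learning_curve(train_sents, test_sents, sizes=(1000, 2000, 3000)):
    """Map each training-set size to accuracy on the fixed test set."""
    return {
        n: accuracy(train_baseline(subset), test_sents)
        for n, subset in training_subsets(train_sents, sizes)
    }


# Toy data (made up for illustration; not WSJ or GUM).
train = [
    [("dogs", "NNS"), ("chase", "VBP"), ("cats", "NNS")],
    [("birds", "NNS"), ("sing", "VBP")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
]
test = [[("the", "DT"), ("birds", "NNS"), ("sing", "VBP"), ("dogs", "NNS")]]

curve = learning_curve(train, test, sizes=(1, 2))
print(curve)  # {1: 0.5, 2: 0.75, 3: 1.0}
```

Keeping the test set fixed while only the training prefix grows means that any change in accuracy can be attributed to the amount of training data rather than to a different evaluation sample.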

