Test suites


Test suites lie at the heart of psycholinguistic evaluation. The items in a test suite are given as input to a language model, and the resulting surprisal values are used to assess the model's performance. Each test suite is typically designed to probe a particular grammatical phenomenon.
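As a rough illustration of how surprisal values can be turned into a performance score, the sketch below computes surprisal as the negative base-2 log probability and scores a suite as the share of items where the unexpected condition is more surprising than the expected one. The scoring rule, the function names (`surprisal`, `suite_accuracy`), and the probabilities are assumptions made for illustration; this page does not specify how the reported percentages are computed.

```python
import math


def surprisal(prob: float) -> float:
    """Surprisal in bits: the negative base-2 log of a probability."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(prob)


def suite_accuracy(items: list[tuple[float, float]]) -> float:
    """Fraction of items whose prediction holds.

    Each item is a pair (expected_surprisal, unexpected_surprisal),
    measured at the critical region. The assumed prediction is that the
    unexpected condition yields strictly higher surprisal.
    """
    if not items:
        raise ValueError("test suite has no items")
    hits = sum(1 for expected, unexpected in items if unexpected > expected)
    return hits / len(items)


# Hypothetical model probabilities for four items (not real data).
items = [
    (surprisal(0.5), surprisal(0.125)),    # prediction holds
    (surprisal(0.25), surprisal(0.5)),     # prediction fails
    (surprisal(0.25), surprisal(0.0625)),  # prediction holds
    (surprisal(0.5), surprisal(0.25)),     # prediction holds
]

accuracy = suite_accuracy(items)
print(f"{accuracy:.2%}")  # prints "75.00%"
```

Under this assumed rule, a suite-level percentage like those in the table below would be such a per-model score averaged over the evaluated models.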

Browse the available test suites in the table below, or add a new test suite by creating one interactively or uploading one as a .json file.

Available test suites
| Name | Reference | Models evaluated | Average performance | Tags |
| | "Wilcox E. Levy R. & Futrell R. (2019). Hierarchical representation in neural language models: Suppression and recovery of expectations." | 7 / 7 | 85.71% | Center Embedding |
| | "Wilcox E. Levy R. & Futrell R. (2019). Hierarchical representation in neural language models: Suppression and recovery of expectations." | 7 / 7 | 72.45% | Center Embedding |
| | | 7 / 7 | 92.50% | Long-Distance Dependencies |
| | No published reference | 7 / 7 | 66.43% | Long-Distance Dependencies |
| | "Wilcox E. Levy R. & Futrell R. (2019). What Syntactic Structures block Dependencies in RNN Language Models?" Wilcox et al. 2018 | 7 / 7 | 61.22% | Long-Distance Dependencies |
| | "Wilcox E. Levy R. & Futrell R. (2019). What Syntactic Structures block Dependencies in RNN Language Models?" Wilcox et al. 2018 | 7 / 7 | 50.34% | Long-Distance Dependencies |
| | "Wilcox E. Levy R. & Futrell R. (2019). What Syntactic Structures block Dependencies in RNN Language Models?" | 7 / 7 | 55.36% | Long-Distance Dependencies |
| | "Wilcox E. Levy R. Morita T. & Futrell R. (2018). What do RNN Language Models Learn about Filler-Gap Dependencies?" | 7 / 7 | 77.98% | Long-Distance Dependencies |
| | "Wilcox E. Levy R. Morita T. & Futrell R. (2018). What do RNN Language Models Learn about Filler-Gap Dependencies?" | 7 / 7 | 54.17% | Long-Distance Dependencies |
| | "Wilcox E. Levy R. Morita T. & Futrell R. (2018). What do RNN Language Models Learn about Filler-Gap Dependencies?" | 7 / 7 | 69.05% | Long-Distance Dependencies |
| | "Futrell R. Wilcox E. Morita T. Qian P. Ballesteros M. & Levy R. (2019). Neural language models as psycholinguistic subjects: Representations of syntactic state." | 7 / 7 | 65.82% | Garden-Path Effects |
| | "Futrell R. Wilcox E. Morita T. Qian P. Ballesteros M. & Levy R. (2019). Neural language models as psycholinguistic subjects: Representations of syntactic state." | 7 / 7 | 67.35% | Garden-Path Effects |
| | | 7 / 7 | 46.99% | Licensing |
| | "Marvin R. & Linzen T. (2018). Targeted syntactic evaluation of language models." | 7 / 7 | 40.98% | Licensing |
| | "Marvin R. & Linzen T. (2018). Targeted syntactic evaluation of language models." | 7 / 7 | 36.47% | Licensing |
| | "Marvin R. & Linzen T. (2018). Targeted syntactic evaluation of language models." | 7 / 7 | 33.83% | Licensing |
| | "Futrell R. Wilcox E. Morita T. Qian P. Ballesteros M. & Levy R. (2019). Neural language models as psycholinguistic subjects: Representations of syntactic state." | 7 / 7 | 79.17% | Garden-Path Effects |
| | "Futrell R. Wilcox E. Morita T. Qian P. Ballesteros M. & Levy R. (2019). Neural language models as psycholinguistic subjects: Representations of syntactic state." | 7 / 7 | 66.67% | Garden-Path Effects |
| | "Futrell R. Wilcox E. Morita T. Qian P. Ballesteros M. & Levy R. (2019). Neural language models as psycholinguistic subjects: Representations of syntactic state." | 7 / 7 | 95.24% | Garden-Path Effects |
| | "Futrell R. Wilcox E. Morita T. Qian P. Ballesteros M. & Levy R. (2019). Neural language models as psycholinguistic subjects: Representations of syntactic state." | 7 / 7 | 91.07% | Garden-Path Effects |
| | "Marvin R. & Linzen T. (2018). Targeted syntactic evaluation of language models." | 7 / 7 | 39.10% | Agreement |
| | "Marvin R. & Linzen T. (2018). Targeted syntactic evaluation of language models." | 7 / 7 | 61.65% | Agreement |
| | "Marvin R. & Linzen T. (2018). Targeted syntactic evaluation of language models." | 7 / 7 | 62.41% | Agreement |
| | "Marvin R. & Linzen T. (2018). Targeted syntactic evaluation of language models." | 7 / 7 | 20.30% | Licensing |
| | "Marvin R. & Linzen T. (2018). Targeted syntactic evaluation of language models." | 7 / 7 | 36.84% | Licensing |
| | "Marvin R. & Linzen T. (2018). Targeted syntactic evaluation of language models." | 7 / 7 | 16.54% | Licensing |
| | "Marvin R. & Linzen T. (2018). Targeted syntactic evaluation of language models." | 7 / 7 | 49.62% | Licensing |
| | "Marvin R. & Linzen T. (2018). Targeted syntactic evaluation of language models." | 7 / 7 | 15.79% | Licensing |
| | "Marvin R. & Linzen T. (2018). Targeted syntactic evaluation of language models." | 7 / 7 | 45.86% | Licensing |
| | "Futrell R. Wilcox E. Morita T. Qian P. Ballesteros M. & Levy R. (2019). Neural language models as psycholinguistic subjects: Representations of syntactic state." | 7 / 7 | 81.99% | Gross Syntactic State |
| | "Futrell R. Wilcox E. Morita T. Qian P. Ballesteros M. & Levy R. (2019). Neural language models as psycholinguistic subjects: Representations of syntactic state." | 7 / 7 | 78.88% | Gross Syntactic State |
| | "Futrell R. Wilcox E. Morita T. Qian P. Ballesteros M. & Levy R. (2019). Neural language models as psycholinguistic subjects: Representations of syntactic state." | 7 / 7 | 86.96% | Gross Syntactic State |
| | "Futrell R. Wilcox E. Morita T. Qian P. Ballesteros M. & Levy R. (2019). Neural language models as psycholinguistic subjects: Representations of syntactic state." | 7 / 7 | 75.78% | Gross Syntactic State |