The Reverse Turing Test for Evaluating Interpretability Methods on Unknown Tasks



The Turing Test evaluates a computer program’s ability to mimic human behaviour. The Reverse Turing Test, conversely, evaluates a human’s ability to mimic machine behaviour in a forward prediction task. We propose to use the Reverse Turing Test to evaluate the quality of interpretability methods. It improves on previous experimental protocols for human evaluation of interpretability methods by (a) including a training phase and (b) masking the task. Together, these two features enable us to evaluate models independently of their quality, in a way that is not biased by the participants’ previous exposure to the task. We present a human evaluation of LIME on five NLP tasks using a Latin Square design, and we analyze the effect of masking the task in forward prediction experiments. Additionally, we demonstrate a fundamental limitation of LIME and show how this limitation harms human forward prediction on some NLP tasks.
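Two ideas in the abstract can be made concrete with a small sketch: a Latin Square design, in which every condition appears exactly once in each row and each column, and forward prediction, in which a participant is scored on how often their guess matches the model’s output (not the gold label). The code below is a generic illustration under stated assumptions, not the authors’ protocol: it assumes rows are participant groups and columns are presentation positions, uses a simple cyclic construction, and the task names and the helpers `cyclic_latin_square` and `forward_prediction_accuracy` are hypothetical.

```python
from typing import List, Sequence


def cyclic_latin_square(conditions: Sequence[str]) -> List[List[str]]:
    """Build an n x n cyclic Latin square.

    Row i is the condition list rotated left by i positions, so every
    condition occurs exactly once in each row and once in each column.
    (Assumed interpretation: row = participant group, column = position.)
    """
    n = len(conditions)
    return [[conditions[(i + j) % n] for j in range(n)] for i in range(n)]


def forward_prediction_accuracy(human: Sequence[int], model: Sequence[int]) -> float:
    """Fraction of items where the participant's guess equals the model's output.

    This scores agreement with the model, which is what "mimicking machine
    behaviour" means here; it does not compare against gold labels.
    """
    if len(human) != len(model) or not human:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(h == m for h, m in zip(human, model)) / len(human)


# Hypothetical placeholders standing in for five NLP tasks.
tasks = ["task_A", "task_B", "task_C", "task_D", "task_E"]

if __name__ == "__main__":
    # Print the task order assigned to each (hypothetical) participant group.
    for group, order in enumerate(cyclic_latin_square(tasks)):
        print(group, order)
    # Example: the participant matches the model on 2 of 4 items.
    print(forward_prediction_accuracy([1, 0, 1, 1], [1, 1, 1, 0]))
```

A cyclic square is the simplest valid construction; it balances how often each task appears in each position, though it does not balance which task immediately precedes which (a Williams design would be needed for that).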

Original language	English
Publication date	2020
Number of pages	13
Publication status	Published - 2020
Event	NeurIPS 2020 Workshop on Human And Model in the Loop Evaluation and Training Strategies, online, 11 Dec 2020 → …



ID: 258400558

Department of Computer Science