The Reverse Turing Test for Evaluating Interpretability Methods on Unknown Tasks

Research output: Contribution to conference, Paper, Research

The Turing Test evaluates a computer program’s ability to mimic human behaviour. The Reverse Turing Test, conversely, evaluates a human’s ability to mimic machine behaviour in a forward prediction task. We propose to use the Reverse Turing Test to evaluate the quality of interpretability methods. The Reverse Turing Test improves on previous experimental protocols for human evaluation of interpretability methods by a) including a training phase, and b) masking the task. Combined, these two changes enable us to evaluate models independently of their quality, in a way that is unbiased by the participants' previous exposure to the task. We present a human evaluation of LIME across five NLP tasks in a Latin Square design and analyze the effect of masking the task in forward prediction experiments. Additionally, we demonstrate a fundamental limitation of LIME and show how this limitation is detrimental to human forward prediction in some NLP tasks.
Original language: English
Publication date: 2020
Number of pages: 13
Publication status: Published - 2020
Event: NeurIPS 2020 Workshop on Human And Model in the Loop Evaluation and Training Strategies - ONLINE
Duration: 11 Dec 2020 → …

Conference

Conference: NeurIPS 2020 Workshop on Human And Model in the Loop Evaluation and Training Strategies
City: ONLINE
Period: 11/12/2020 → …

ID: 258400558