The Reverse Turing Test for Evaluating Interpretability Methods on Unknown Tasks

Research output: Contribution to conference › Paper › Research

Standard

The Reverse Turing Test for Evaluating Interpretability Methods on Unknown Tasks. / Gonzalez, Ana Valeria; Søgaard, Anders.

2020. Paper presented at NeurIPS 2020 Workshop on Human And Model in the Loop Evaluation and Training Strategies, ONLINE.

Research output: Contribution to conference › Paper › Research

Harvard

Gonzalez, AV & Søgaard, A 2020, 'The Reverse Turing Test for Evaluating Interpretability Methods on Unknown Tasks', Paper presented at NeurIPS 2020 Workshop on Human And Model in the Loop Evaluation and Training Strategies, ONLINE, 11/12/2020.

APA

Gonzalez, A. V., & Søgaard, A. (2020). The Reverse Turing Test for Evaluating Interpretability Methods on Unknown Tasks. Paper presented at NeurIPS 2020 Workshop on Human And Model in the Loop Evaluation and Training Strategies, ONLINE.

Vancouver

Gonzalez AV, Søgaard A. The Reverse Turing Test for Evaluating Interpretability Methods on Unknown Tasks. 2020. Paper presented at NeurIPS 2020 Workshop on Human And Model in the Loop Evaluation and Training Strategies, ONLINE.

Author

Gonzalez, Ana Valeria ; Søgaard, Anders. / The Reverse Turing Test for Evaluating Interpretability Methods on Unknown Tasks. Paper presented at NeurIPS 2020 Workshop on Human And Model in the Loop Evaluation and Training Strategies, ONLINE. 13 p.

Bibtex

@conference{ed482133c6b946a6ba83eaddcde19fb7,
title = "The Reverse Turing Test for Evaluating Interpretability Methods on Unknown Tasks",
abstract = "The Turing Test evaluates a computer program{\textquoteright}s ability to mimic human behaviour. The Reverse Turing Test, reversely, evaluates a human{\textquoteright}s ability to mimic machine behaviour in a forward prediction task. We propose to use the Reverse Turing Test to evaluate the quality of interpretability methods. The Reverse Turing Test improves on previous experimental protocols for human evaluation of interpretability methods by a) including a training phase, and b) masking the task, which, combined, enables us to evaluate models independently of their quality, in a way that is unbiased by the participants' previous exposure to the task. We present a human evaluation of LIME across five NLP tasks in a Latin Square design and analyze the effect of masking the task in forward prediction experiments. Additionally, we demonstrate a fundamental limitation of LIME and show how this limitation is detrimental for human forward prediction in some NLP tasks.",
author = "Gonzalez, {Ana Valeria} and Anders S{\o}gaard",
year = "2020",
language = "English",
note = "NeurIPS 2020 Workshop on Human And Model in the Loop Evaluation and Training Strategies ; Conference date: 11-12-2020",

}

RIS

TY - CONF

T1 - The Reverse Turing Test for Evaluating Interpretability Methods on Unknown Tasks

AU - Gonzalez, Ana Valeria

AU - Søgaard, Anders

PY - 2020

Y1 - 2020

N2 - The Turing Test evaluates a computer program’s ability to mimic human behaviour. The Reverse Turing Test, reversely, evaluates a human’s ability to mimic machine behaviour in a forward prediction task. We propose to use the Reverse Turing Test to evaluate the quality of interpretability methods. The Reverse Turing Test improves on previous experimental protocols for human evaluation of interpretability methods by a) including a training phase, and b) masking the task, which, combined, enables us to evaluate models independently of their quality, in a way that is unbiased by the participants' previous exposure to the task. We present a human evaluation of LIME across five NLP tasks in a Latin Square design and analyze the effect of masking the task in forward prediction experiments. Additionally, we demonstrate a fundamental limitation of LIME and show how this limitation is detrimental for human forward prediction in some NLP tasks.

AB - The Turing Test evaluates a computer program’s ability to mimic human behaviour. The Reverse Turing Test, reversely, evaluates a human’s ability to mimic machine behaviour in a forward prediction task. We propose to use the Reverse Turing Test to evaluate the quality of interpretability methods. The Reverse Turing Test improves on previous experimental protocols for human evaluation of interpretability methods by a) including a training phase, and b) masking the task, which, combined, enables us to evaluate models independently of their quality, in a way that is unbiased by the participants' previous exposure to the task. We present a human evaluation of LIME across five NLP tasks in a Latin Square design and analyze the effect of masking the task in forward prediction experiments. Additionally, we demonstrate a fundamental limitation of LIME and show how this limitation is detrimental for human forward prediction in some NLP tasks.

M3 - Paper

T2 - NeurIPS 2020 Workshop on Human And Model in the Loop Evaluation and Training Strategies

Y2 - 11 December 2020

ER -
