# Do Multi-Hop Question Answering Systems Know How to Answer the Single-Hop Sub-Questions?

## Method

1. In which movie did Schwarzenegger play a New York police detective?

2. In which year did Guns N Roses do a promo for the movie *End of Days*?

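- First, each source question is decomposed into several substrings by predicting breakpoints.
- Second, the substrings are post-processed to generate two sub-questions. The answers to the sub-questions are extracted from the paragraphs using a few heuristics.
- Finally, the generated candidate evaluation instances are sent to human annotators for verification.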
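The first step (cutting a question at predicted breakpoints) can be illustrated with a minimal sketch. It assumes the breakpoints are already available as token indices. The function name `split_at_breakpoints` and the index convention are hypothetical and not taken from the paper, and the breakpoint predictor itself is not shown.

```python
from typing import List


def split_at_breakpoints(tokens: List[str], breakpoints: List[int]) -> List[List[str]]:
    """Split a token list into consecutive spans.

    Assumption (not from the paper): each breakpoint is the index of the
    first token of a new span. Out-of-range or duplicate indices are ignored.
    """
    spans = []
    start = 0
    for bp in sorted(set(breakpoints)):
        if start < bp < len(tokens):
            spans.append(tokens[start:bp])
            start = bp
    spans.append(tokens[start:])
    return spans


# Toy usage with placeholder tokens and arbitrary breakpoints.
toy_tokens = ["w0", "w1", "w2", "w3", "w4"]
toy_spans = split_at_breakpoints(toy_tokens, [2, 4])
print(toy_spans)
```

The resulting spans would still need the post-processing step to become well-formed sub-questions.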

## Experiments

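A predicted answer is accepted directly if its $F_1$ score is greater than 0.8. It is also accepted if its $F_1$ score is greater than 0.6 and either the gold answer span contains the predicted answer or the predicted answer contains the gold answer.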
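The acceptance rule above can be sketched as follows. This sketch assumes a token-level F1 over lowercased, whitespace-split tokens and plain substring containment. The notes do not say how F1 or containment is computed, and the names `token_f1` and `is_accepted` are hypothetical.

```python
from collections import Counter


def token_f1(prediction: str, gold: str) -> float:
    """Token-level F1 over lowercased, whitespace-split tokens (an assumption)."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def is_accepted(prediction: str, gold: str) -> bool:
    """Accept if F1 > 0.8, or if F1 > 0.6 and one answer contains the other."""
    f1 = token_f1(prediction, gold)
    if f1 > 0.8:
        return True
    pred_norm = prediction.lower().strip()
    gold_norm = gold.lower().strip()
    contains = bool(pred_norm) and bool(gold_norm) and (
        pred_norm in gold_norm or gold_norm in pred_norm
    )
    return f1 > 0.6 and contains


print(is_accepted("the film End of Days", "End of Days"))
print(is_accepted("1998", "1999"))
```

In the first call the F1 score falls between the two thresholds, so the containment check decides the outcome. In the second call there is no token overlap at all.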

The baselines are the open-source models CogQA, DFGN, and DecompRC.

Failure rate of CogQA under PM: $(6.1+16.5+3.4)/(40.9+6.1+16.5+3.4) \times 100\% = 38.86\%$
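The arithmetic can be checked with a few lines of Python. The percentages are taken from the formula above. The helper name `failure_rate` and the grouping of the inputs into one "success" share and a list of "failure" shares are assumptions made for illustration.

```python
from typing import Sequence


def failure_rate(success_share: float, failure_shares: Sequence[float]) -> float:
    """Failure shares divided by all shares in the formula, as a percentage."""
    failed = sum(failure_shares)
    return failed / (success_share + failed) * 100


# Values from the CogQA PM formula above.
cogqa_pm_rate = failure_rate(40.9, [6.1, 16.5, 3.4])
print(f"{cogqa_pm_rate:.2f}%")
```

The same helper could be applied to the other baselines once their percentages are filled in.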

After analyzing the model failure cases, we observe a common phenomenon that there is a high similarity between the words in the second sub-question and the words near the answer in the context. The model has learned to answer multi-hop question by local pattern matching, instead of going through the multiple reasoning steps. For the example presented in Figure 1, the model may locate the answer “1999” for the multi-hop question by matching the surrounding words “Guns N Roses” in the second sub-question. Despite answering the multi-hop question correctly, the model fails to identify the answer of the first sub-question which it is expected to retrieve as a multi-hop QA system.