I'd like to introduce DrQA, a neural-network question-answering system that Facebook released last month.

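DrQA is an AI system for answering general-knowledge questions. It uses Wikipedia as its knowledge source and works in two stages. First, a document retriever picks out the Wikipedia articles most relevant to the question. Then an RNN-based document reader searches their paragraphs for the span of text that answers it. The reader is trained on question-answer datasets such as SQuAD. At the risk of oversimplifying, it aims to do much the same kind of processing that IBM's Watson did when it won the quiz show Jeopardy!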
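To make the retriever stage concrete: the DrQA paper describes its document retriever as TF-IDF matching over hashed unigram and bigram features, with no learned parameters. The sketch below shows that idea in plain Python on a three-article toy corpus. It is not DrQA's code. The class `TfidfRetriever`, its method `top_docs`, the bucket count, and the smoothed-IDF weighting are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter
from zlib import crc32

N_BUCKETS = 2 ** 20  # assumed hash size for this toy example


def hashed_ngrams(text):
    """Count lower-cased unigrams and bigrams, hashed into N_BUCKETS buckets."""
    tokens = re.findall(r"\w+", text.lower())
    grams = tokens + [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return Counter(crc32(g.encode("utf-8")) % N_BUCKETS for g in grams)


class TfidfRetriever:
    """Toy retriever: ranks documents by cosine similarity of TF-IDF vectors."""

    def __init__(self, docs):
        self.titles = list(docs)
        counts = [hashed_ngrams(docs[t]) for t in self.titles]
        self.df = Counter()  # number of documents containing each bucket
        for c in counts:
            self.df.update(c.keys())
        self.vectors = [self._weigh(c) for c in counts]

    def _weigh(self, counts):
        """Log-scaled TF times smoothed IDF, normalised to unit length."""
        n = len(self.titles)
        vec = {}
        for bucket, tf in counts.items():
            idf = math.log((n + 1) / (self.df[bucket] + 1)) + 1
            vec[bucket] = math.log1p(tf) * idf
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        return {b: w / norm for b, w in vec.items()}

    def top_docs(self, query, k=5):
        """Return the k best (title, score) pairs for the query."""
        q = self._weigh(hashed_ngrams(query))
        scored = [
            (sum(w * vec.get(b, 0.0) for b, w in q.items()), title)
            for title, vec in zip(self.titles, self.vectors)
        ]
        scored.sort(reverse=True)
        return [(title, score) for score, title in scored[:k]]


docs = {
    "The Matrix": "The Matrix is a 1999 science fiction film written and directed by The Wachowskis.",
    "Blade Runner": "Blade Runner is a 1982 science fiction film directed by Ridley Scott.",
    "Ottoman Empire": "At the beginning of the 17th century the Ottoman Empire contained 32 provinces.",
}
retriever = TfidfRetriever(docs)
print(retriever.top_docs("Who wrote the film the Matrix", k=2))
```

A setting of `k=5` would mirror the "Retrieving top 5 docs" lines in the logs below. The real system ranks the entire Wikipedia dump with its own weighting, so its Doc Score values are presumably not comparable to the toy scores here.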

By the way, people who are good at quizzes are not the same as people who are good at studying.

In fact, most of the questions used on TV quiz shows are recycled from earlier shows.
The reason is that writing quiz questions is extremely difficult.

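A quiz is not simply a test of how much you know or how deeply you know it. What matters is that a quiz question basically can't be answered until you hear the answer, and that once you do hear it, everyone reacts with "Ah, of course!"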

Suppose, for example, we put this question to an ordinary audience:

"What is the name of the research lab founded by Douglas Engelbart?"

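Most ordinary people don't even know who Douglas Engelbart, the inventor of the mouse, was, let alone the name of his lab. The answer is the "Augmentation Research Center," but hearing the question and hearing the answer leave them equally in the dark.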

A good quiz question has to be moderately deep and about something moderately well known, both at once. For example:

"Tokugawa Ieyasu is known for the verse 'If the cuckoo will not sing, wait until it sings.' What does Oda Nobunaga do with a cuckoo that will not sing?"

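A buzzer-quiz question is always written so that anyone who hears it through to the end can answer it. The answer here is "kill it," but a question that can only be answered after hearing every last word is not a good buzzer question. Buzzer quizzes became a TV staple above all because of their speed and the surprise of "You got it from just that clue?"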

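By comparison, DrQA, just released by Facebook AI Research, has a considerable advantage because it doesn't have to race anyone to the buzzer. And the answer to a quiz question can always be found by looking it up; otherwise it wouldn't be a quiz.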

I tried the released pretrained model on my own machine.
It can answer questions like these:

# How many provinces did the Ottoman Empire contain in the 17th century?
>>> process("How many provinces did the Ottoman empire contain in the 17th century?")
07/28/2017 08:39:40 AM: [ Processing 1 queries... ]
07/28/2017 08:39:40 AM: [ Retrieving top 5 docs... ]
07/28/2017 08:39:41 AM: [ Reading 769 paragraphs... ]
07/28/2017 08:39:42 AM: [ Processed 1 queries in 1.8194 (s) ]
Top Predictions:
+------+--------+----------------+--------------+-----------+
| Rank | Answer | Doc | Answer Score | Doc Score |
+------+--------+----------------+--------------+-----------+
| 1 | 32 | Ottoman Empire | 1.1848e+07 | 392.97 |
+------+--------+----------------+--------------+-----------+

Contexts:
[ Doc = Ottoman Empire ]
During the 16th and 17th centuries, at the height of its power under the reign of Suleiman the Magnificent, the Ottoman Empire was a multinational, multilingual empire controlling much of Southeast Europe, Western Asia, the Caucasus, North Africa, and the Horn of Africa. At the beginning of the 17th century the empire contained 32 provinces and numerous vassal states. Some of these were later absorbed into the Ottoman Empire, while others were granted various types of autonomy during the course of centuries.
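The very large Answer Score values are consistent with how the DrQA paper describes answer selection. Each paragraph token gets a start score and an end score. The best span from token i to token i' is chosen subject to i ≤ i' ≤ i + 15. Unnormalized exponentials are used instead of softmax probabilities, so that spans from different paragraphs (769 of them in the run above) can be compared directly. Below is a minimal sketch of that decoding step, assuming the reader has already produced per-token start and end logits. `best_span` and the toy logits are hypothetical.

```python
import math


def best_span(paragraphs, max_span=15):
    """Return (score, paragraph_index, start, end) for the best answer span.

    `paragraphs` is a list of (start_logits, end_logits) pairs, one logit per
    token. A span (i, j) is allowed only if i <= j <= i + max_span, and it is
    scored with unnormalised exponentials exp(start_i) * exp(end_j), which keeps
    scores comparable across paragraphs (a per-paragraph softmax would not).
    """
    best = (-math.inf, -1, -1, -1)
    for p, (start_logits, end_logits) in enumerate(paragraphs):
        for i, s in enumerate(start_logits):
            for j in range(i, min(i + max_span + 1, len(end_logits))):
                score = math.exp(s) * math.exp(end_logits[j])
                if score > best[0]:
                    best = (score, p, i, j)
    return best


paragraphs = [
    ([0.1, 2.0, 0.3], [0.2, 0.1, 1.5]),            # weak paragraph
    ([0.0, 9.0, 1.0, 0.5], [0.0, 0.3, 8.0, 0.1]),  # strong paragraph
]
score, p, i, j = best_span(paragraphs)
print(p, i, j, f"{score:.4e}")  # 1 1 2 2.4155e+07
```

Because the scores are exponentials rather than probabilities, a value such as 1e+07 does not mean "more than certain"; it is only meaningful relative to other candidate spans.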

Let's try another one.

# Who wrote the film The Matrix?
>>> process("Who wrote the film the Matrix")
07/28/2017 08:41:39 AM: [ Processing 1 queries... ]
07/28/2017 08:41:39 AM: [ Retrieving top 5 docs... ]
07/28/2017 08:41:39 AM: [ Reading 458 paragraphs... ]
07/28/2017 08:41:40 AM: [ Processed 1 queries in 1.1882 (s) ]
Top Predictions:
+------+----------------+------------+--------------+-----------+
| Rank | Answer | Doc | Answer Score | Doc Score |
+------+----------------+------------+--------------+-----------+
| 1 | The Wachowskis | The Matrix | 8.6811e+07 | 140.84 |
+------+----------------+------------+--------------+-----------+

Contexts:
[ Doc = The Matrix ]
The Matrix is a 1999 science fiction film written and directed by The Wachowskis, starring Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss, Hugo Weaving, and Joe Pantoliano. It depicts a dystopian future in which reality as perceived by most humans is actually a simulated reality called "the Matrix", created by sentient machines to subdue the human population, while their bodies' heat and electrical activity are used as an energy source. Computer programmer "Neo" learns this truth and is drawn into a rebellion against the machines, which involves other people who have been freed from the "dream world".

It correctly identifies the Wachowskis.
One more while we're at it:

# Who wrote the film Blade Runner?
>>> process("Who wrote the film the Blade runner")
07/28/2017 08:44:09 AM: [ Processing 1 queries... ]
07/28/2017 08:44:09 AM: [ Retrieving top 5 docs... ]
07/28/2017 08:44:09 AM: [ Reading 203 paragraphs... ]
07/28/2017 08:44:09 AM: [ Processed 1 queries in 0.5196 (s) ]
Top Predictions:
+------+----------------+--------------+--------------+-----------+
| Rank | Answer | Doc | Answer Score | Doc Score |
+------+----------------+--------------+--------------+-----------+
| 1 | Philip K. Dick | Ridley Scott | 2.1647e+09 | 293.14 |
+------+----------------+--------------+--------------+-----------+

Contexts:
[ Doc = Ridley Scott ]
After a year working on the film adaptation of "Dune", and following the sudden death of his brother Frank, Scott signed to direct the film version of Philip K. Dick's novel "Do Androids Dream of Electric Sheep?". Starring Harrison Ford, "Blade Runner" was a commercial disappointment in cinemas in 1982, but is now regarded as a classic. In 1991 Scott's notes were used by Warner Brothers to create a rushed director's cut which removed the main character's voiceover and made a number of other small changes, including to the ending. Later Scott personally supervised a digital restoration of "Blade Runner" and approved what was called "The Final Cut". This version was released in Los Angeles, New York City and Toronto cinemas on 5 October 2007, and as an elaborate DVD release in December 2007. Today, "Blade Runner" is ranked by many critics as one of the most important and influential science fiction films ever made, partly thanks to its much imitated portraits of a future cityscape. It is often discussed along with William Gibson's novel "Neuromancer" as initiating the cyberpunk genre. Scott has described "Blade Runner" as his "most complete and personal film".

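It says Philip K. Dick wrote it. That isn't exactly wrong, but Dick wrote the original novel; the screenplay for the film was written by Hampton Fancher and David Peoples. Since "wrote" is ambiguous, let's ask more specifically who wrote the script.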

# Who wrote the script for the film Blade Runner?
>>> process("Who wrote scinario of the film the Blade runner")
07/28/2017 08:46:40 AM: [ Processing 1 queries... ]
07/28/2017 08:46:40 AM: [ Retrieving top 5 docs... ]
07/28/2017 08:46:41 AM: [ Reading 203 paragraphs... ]
07/28/2017 08:46:41 AM: [ Processed 1 queries in 0.5803 (s) ]
Top Predictions:
+------+--------------+---------------------------+--------------+-----------+
| Rank | Answer | Doc | Answer Score | Doc Score |
+------+--------------+---------------------------+--------------+-----------+
| 1 | Ridley Scott | Blade Runner (soundtrack) | 2.4642e+13 | 286.8 |
+------+--------------+---------------------------+--------------+-----------+

Now the answer is Ridley Scott, the director. You could argue that a director has a hand in the script, so let's make the question even more explicit.

# Who are the screenwriters of the film Blade Runner?
>>> process("Who are screenplay of the film the Blade runner")
07/28/2017 08:49:44 AM: [ Processing 1 queries... ]
07/28/2017 08:49:44 AM: [ Retrieving top 5 docs... ]
07/28/2017 08:49:45 AM: [ Reading 203 paragraphs... ]
07/28/2017 08:49:45 AM: [ Processed 1 queries in 0.5648 (s) ]
Top Predictions:
+------+-----------------------------------+--------------+--------------+-----------+
| Rank | Answer | Doc | Answer Score | Doc Score |
+------+-----------------------------------+--------------+--------------+-----------+
| 1 | Hampton Fancher and David Peoples | Blade Runner | 2.2611e+05 | 479.01 |
+------+-----------------------------------+--------------+--------------+-----------+

Now, at last, Hampton Fancher and David Peoples come up.

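From this quick trial, the current pretrained DrQA model seems to behave like this: it will certainly answer something, but it won't tell you whether you asked the right question. Even so, being able to answer questions at this level is quite remarkable.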

DrQA's internal structure is also surprisingly simple. Take a look at the reader's source code (https://github.com/facebookresearch/DrQA/blob/master/drqa/reader/rnn_reader.py).
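Here is a rough guide to what that file does. The reader encodes the paragraph tokens and the question with stacked bidirectional RNNs. It collapses the question's hidden states into one vector with a learned self-attention. It then scores every paragraph token as a possible answer start and end with two bilinear terms of the form p_i W q. The sketch below reproduces only those last two steps in plain Python, with small hand-written vectors standing in for RNN outputs. `merge_question`, `bilinear_scores` and all the numbers are hypothetical, and the real code uses PyTorch layers with separate learned matrices for start and end.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def merge_question(q_states, w):
    """Self-attentive merge: weight each question state q_j by softmax(w . q_j)."""
    weights = softmax([sum(a * b for a, b in zip(w, q)) for q in q_states])
    dim = len(q_states[0])
    return [sum(wt * q[k] for wt, q in zip(weights, q_states)) for k in range(dim)]


def bilinear_scores(p_states, W, q):
    """Score every paragraph token p_i with the bilinear form p_i^T W q."""
    Wq = [sum(row[c] * q[c] for c in range(len(q))) for row in W]
    return [sum(a * b for a, b in zip(p, Wq)) for p in p_states]


# Hypothetical 2-dimensional "RNN outputs" for a 2-token question
# and a 3-token paragraph.
q_states = [[1.0, 0.0], [0.0, 1.0]]
p_states = [[0.1, 0.2], [2.0, 1.0], [0.3, 0.1]]

q = merge_question(q_states, w=[0.0, 0.0])  # zero weights -> plain average
W_start = [[1.0, 0.0], [0.0, 1.0]]          # identity, for illustration only
start_scores = bilinear_scores(p_states, W_start, q)
print(start_scores)  # token 1 scores highest as an answer start
```

An end-score vector would be computed the same way with a second matrix, and the two are then combined by a span decoder.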

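Individual programming techniques, and even the ability to build theories, are rapidly losing their value as a competitive advantage in the age of AI.
What matters from now on is not owning data but "how you combine the data you have (or that is publicly available) with other data."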

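DrQA, for example, was trained by combining the full text of Wikipedia with English question-answer datasets.
The same thing could probably be done in Japanese. If so, the Japanese-language and social-studies questions that the Todai Robot project ("Torobo-kun") struggled with might be solved with ease.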
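According to the DrQA paper, some datasets provide only question-answer pairs (CuratedTREC, WebQuestions and WikiMovies). These were combined with Wikipedia through distant supervision. Articles are retrieved for each question, paragraphs without an exact match of the known answer are discarded, and the matching positions become the training spans. Below is a simplified sketch of that matching step, assuming the candidate paragraphs have already been retrieved. `make_examples` is hypothetical, and the paper's additional length filter, named-entity filter and question-overlap ranking are left out.

```python
import re


def make_examples(question, answer, paragraphs):
    """Build span-labelled training examples from a (question, answer) pair.

    A paragraph is kept only if it contains the answer as an exact,
    case-sensitive match that is not part of a longer word or number;
    every match is recorded as a (start, end) character span.
    """
    pattern = re.compile(r"(?<!\w)" + re.escape(answer) + r"(?!\w)")
    examples = []
    for text in paragraphs:
        spans = [(m.start(), m.end()) for m in pattern.finditer(text)]
        if spans:  # paragraphs without an exact match are discarded
            examples.append({"question": question, "context": text, "spans": spans})
    return examples


paragraphs = [
    "At the beginning of the 17th century the empire contained 32 provinces.",
    "It was a multinational, multilingual empire.",
    "In 1932 the committee had 32 members.",
]
examples = make_examples(
    "How many provinces did the Ottoman empire contain in the 17th century?",
    "32",
    paragraphs,
)
for ex in examples:
    print(ex["spans"], ex["context"])
```

The third paragraph is kept even though it has nothing to do with provinces, while the "32" inside "1932" is correctly skipped. The omitted filters are meant to reduce noisy matches like this.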

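Social-studies questions work to some extent, too, though not always correctly. The first answer below is wrong: the US president in 1996 was Bill Clinton, not George W. Bush, and the retrieved passage about India's foreign relations never mentions 1996. The second answer, Barack Obama for 2015, is correct.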

>>> process('Who is the President of the USA in 1996?')
07/28/2017 08:57:24 AM: [ Processing 1 queries... ]
07/28/2017 08:57:24 AM: [ Retrieving top 5 docs... ]
07/28/2017 08:57:25 AM: [ Reading 557 paragraphs... ]
07/28/2017 08:57:25 AM: [ Processed 1 queries in 1.3134 (s) ]
Top Predictions:
+------+----------------+----------------------------+--------------+-----------+
| Rank | Answer | Doc | Answer Score | Doc Score |
+------+----------------+----------------------------+--------------+-----------+
| 1 | George W. Bush | Foreign relations of India | 1.0137e+07 | 61.3 |
+------+----------------+----------------------------+--------------+-----------+

Contexts:
[ Doc = Foreign relations of India ]
However, India has not signed the CTBT, or the Nuclear Non-Proliferation Treaty, claiming the discriminatory nature of the treaty that allows the five declared nuclear countries of the world to keep their nuclear arsenal and develop it using computer simulation testing. Prior to its nuclear testing, India had pressed for a comprehensive destruction of nuclear weapons by all countries of the world in a time-bound frame. This was not favoured by the United States and by certain other countries. Presently, India has declared its policy of "no-first use of nuclear weapons" and the maintenance of a "credible nuclear deterrence". The USA, under President George W. Bush has also lifted most of its sanctions on India and has resumed military co-operation. Relations with USA have considerably improved in the recent years, with the two countries taking part in joint naval exercises off the coast of India and joint air exercises both in India as well as in the United States.

>>> process('Who is the President of the USA in 2015?')
07/28/2017 08:57:33 AM: [ Processing 1 queries... ]
07/28/2017 08:57:33 AM: [ Retrieving top 5 docs... ]
07/28/2017 08:57:34 AM: [ Reading 438 paragraphs... ]
07/28/2017 08:57:34 AM: [ Processed 1 queries in 1.1038 (s) ]
Top Predictions:
+------+--------------+-----------------+--------------+-----------+
| Rank | Answer | Doc | Answer Score | Doc Score |
+------+--------------+-----------------+--------------+-----------+
| 1 | Barack Obama | USA Freedom Act | 2.8428e+05 | 65.138 |
+------+--------------+-----------------+--------------+-----------+

Contexts:
[ Doc = USA Freedom Act ]
It was signed into law on the same day by US President Barack Obama stated that, "After a needless delay and inexcusable lapse in important national security authorities, my administration will work expeditiously to ensure our national security professionals again have the full set of vital tools they need to continue protecting the country," .

This is how far AI has come.
Fascinating, isn't it?