More than 300 teams from around the world participating in the ASC18 Student Supercomputer Challenge will take on machine reading comprehension in the coming months, a highly demanding artificial intelligence task set by Microsoft. Each ASC18 team will independently develop a machine reading comprehension and question answering model with the CNTK deep learning framework, and train it on a dataset called MS MARCO using the latest supercomputing technology, in an attempt to make the machine answer questions more accurately.
Enabling a machine to read natural-language text and answer questions about it is one of the core difficulties of artificial intelligence (AI), and of today's intelligent voice interaction and human-machine dialogue systems. In general, people can easily summarize an article after reading it through, identifying its characters, setting, sequence of events and so on. Research on machine reading comprehension aims to give a computer reading ability equal to a human's, i.e., to make the computer read an article and then answer any question related to the information in it. This ability is easy for a human but hard for a computer. For quite a long time, research on natural language processing was based on sentence-level reading comprehension. For instance, the computer is given a sentence and made to identify its subject, object, verb, attribute, adverbial and complement, as well as the characters, events and so on that it describes. Comprehension of long texts, however, has long been a difficulty for the field, because it involves higher-level issues such as coherence between sentences, context and inference.
At present, leading AI experts and scholars at Microsoft, Carnegie Mellon University and Stanford University are working on this complicated task. If the goal is realized, today's weak AI will take a big step towards strong AI. Stanford University maintains a text comprehension challenge called SQuAD (Stanford Question Answering Dataset). On its most recent leaderboard, the R-NET model submitted on January 3, 2018 by the Natural Language Computing team at Microsoft Research Asia achieved an EM score of 82.650 (EM, or Exact Match, means that the predicted answer matches the real answer completely). This was the highest score on the list and the first to exceed the human score of 82.304.
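The Exact Match idea described above can be illustrated with a short Python sketch. This is not the official SQuAD or ASC18 scoring code. The normalization steps (lowercasing, removing punctuation and English articles, collapsing whitespace) and the function names `normalize_answer`, `exact_match` and `em_score` are assumptions chosen for illustration.

```python
import re
import string


def normalize_answer(text: str) -> str:
    """Normalize an answer string before comparison.

    Assumed normalization (not taken from the article): lowercase,
    drop punctuation, drop English articles, collapse whitespace.
    """
    text = text.lower()
    punctuation = set(string.punctuation)
    text = "".join(ch for ch in text if ch not in punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, references: list[str]) -> bool:
    """Return True if the prediction equals any reference after normalization."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(ref) for ref in references)


def em_score(predictions: list[str], references: list[list[str]]) -> float:
    """Percentage of questions whose prediction exactly matches a reference."""
    if not predictions or len(predictions) != len(references):
        raise ValueError("predictions and references must be non-empty and aligned")
    hits = sum(exact_match(p, refs) for p, refs in zip(predictions, references))
    return 100.0 * hits / len(predictions)
```

Under these assumptions, a reported score such as 82.650 would mean that roughly 82.65 percent of the predicted answers matched a reference answer exactly. For example, `em_score(["Paris", "London"], [["paris"], ["Berlin"]])` counts one match out of two questions.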
Moreover, judging from the ASC18 contest question already issued, the contest will use MS MARCO (Microsoft MAchine Reading COmprehension), a more difficult machine reading comprehension and question answering dataset. The dataset was built from real data collected from Bing and Cortana and consists of 100,000 questions, 1 million paragraphs and more than 200,000 document links. In the preliminary round of ASC18, Microsoft will provide part of the dataset for training models. In the final round, Microsoft will provide a brand-new test set for the contestants to tackle. Meanwhile, to help the students get started and understand the contest question, Microsoft will also provide CNTK-based baseline code and relevant papers as references.
The final judging criterion for the ASC18 AI contest question is the machine reading comprehension accuracy of each team's trained model, so team members must master both the algorithmic characteristics of machine reading comprehension and question answering and Microsoft's CNTK deep learning framework within two months. Because the contest dataset is large, fully exploiting the computing potential of different hardware becomes the key to winning. The ASC18 AI contest question requires each team to develop its own machine reading comprehension model, use the latest supercomputing technology to speed up training and improve accuracy, and, above all, verify the trained model against a dataset of real questions. This is undoubtedly a "super challenge" for contestants who are still undergraduates.
The ASC Student Supercomputer Challenge, initiated by China, is the world's largest supercomputing contest for college students. Launched in 2012, it is now in its seventh year and has grown steadily in influence. To date, the ASC contest has attracted more than 5,500 young participants from around the world, on more than 1,100 teams in total.