(Room: South America, World Forum, The Hague)
- 9:00-9:15 Welcome (chair: Jose)
- 9:15-10:15 Invited speaker 1: Katja Hofmann
- 10:15-10:30 ***Morning break***
- 10:30-12:30 Morning paper session (chair: Claes)
- 12:30-14:00 ***Lunch***
- 14:00-15:00 Invited speaker 2: Julian Togelius (chair: Claes)
- 15:00-16:00 Afternoon paper session (chair: Jordi/Ricardo)
- 16:00-16:15 ***Afternoon break***
- 16:15-17:00 Panel: "How to Benchmark Gradual and Guided Learning: Designing the Right Learning Tasks to Teach the Right Skills"
- 17:00-17:30 Demo: Martin Poliak: "GoodAI's School for AI and AI Roadmap Institute"
- 17:30-18:00 Open discussion: research challenges, continuation, future initiatives, journal issue(s)
- "Project Malmo: enabling AI experimentation in Minecraft"
Katja Hofmann, Microsoft Research, Cambridge, UK.
Abstract: As research and technological development in artificial intelligence progress, questions around AI evaluation become more pressing. How can we evaluate and compare a wide range of competing or complementary AI techniques? How can we ensure that experiments can be replicated - ensuring the long-term success of our research field? And how can we make sure that what we measure in some way reflects "intelligence"?
In this talk I present Project Malmo, which enables AI experimentation on top of the computer game Minecraft. I outline the capabilities of this platform, and how it can facilitate new approaches to AI evaluation.
- "Games as test-beds for AI methods and the GVG competition"
Julian Togelius, New York University, NY, USA.
Abstract: Artificial General Intelligence is not about performing a single task well, but about performing a very large number of tasks well - in particular, tasks the agent has never seen before. Games are extremely well suited to benchmarking AI because of their speed and accessibility, and because they are in effect designed to test human cognitive capacities. This applies to video games as well as board games, with the difference that video games require more diverse skills to play well. I will compare three different general game playing frameworks: the General Game Playing Competition, the Arcade Learning Environment, and the General Video Game AI Competition (GVGAI). I will go into some depth about GVGAI and explain why I think it is the best existing benchmark for Artificial General Intelligence, how we plan to make it better and how you can use it in your research.
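As a toy illustration of the cross-task evaluation idea running through these talks, the sketch below runs a fresh copy of one learning agent on several task instances and reports a per-task score. The Task and Agent interfaces here are hypothetical, invented for this example; they do not reflect the actual APIs of Malmo, ALE or GVGAI.

```python
import random
import statistics

# Illustrative cross-task evaluation harness: one agent design, many task
# instances, aggregated per-task performance. All classes are hypothetical.

class BanditTask:
    """A trivial 'task': repeatedly pick one of several arms; each arm pays
    out reward 1 with its own fixed probability, else 0."""
    def __init__(self, probs, seed=0):
        self.probs = probs
        self.rng = random.Random(seed)

    def step(self, action):
        return 1.0 if self.rng.random() < self.probs[action] else 0.0

class GreedyAgent:
    """Tracks a running value estimate per arm and picks the current best.
    Optimistic initial estimates encourage trying each arm early on."""
    def __init__(self, n_actions):
        self.totals = [1.0] * n_actions  # optimistic prior reward sum
        self.counts = [1] * n_actions    # prior pull count (avoids /0)

    def act(self):
        return max(range(len(self.totals)),
                   key=lambda a: self.totals[a] / self.counts[a])

    def observe(self, action, reward):
        self.totals[action] += reward
        self.counts[action] += 1

def evaluate(agent_factory, tasks, episodes=200):
    """Run a fresh agent on each task; return the per-task mean reward."""
    scores = []
    for task in tasks:
        agent = agent_factory()
        rewards = []
        for _ in range(episodes):
            a = agent.act()
            r = task.step(a)
            agent.observe(a, r)
            rewards.append(r)
        scores.append(statistics.mean(rewards))
    return scores

tasks = [BanditTask([0.1, 0.9], seed=s) for s in range(5)]
scores = evaluate(lambda: GreedyAgent(2), tasks)
print([round(s, 2) for s in scores])
```

The point of the harness is the one the talks make: the agent is scored on task instances it was not built for, and the per-task scores (rather than a single number) expose how robust it is across the task family.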
"How to Benchmark Gradual and Guided Learning: Designing the Right Learning Tasks to Teach the Right Skills"
Each panel member will have about 3-5 minutes to say a few words about the general panel topic, and then the audience will be invited to participate and put questions to the panel.
Call for Papers
The aim of this workshop is to bring to bear the expertise of a diverse set of researchers on progress in the evaluation of general-purpose AI systems. To date, most AI systems have been tested on specific tasks. However, to be considered truly intelligent, a system should exhibit enough flexibility to learn how to perform a wide variety of tasks, some of which may not be known until after the system is deployed. This workshop will examine formalisations, methodologies and test benches for evaluating the numerous aspects of this type of general AI system. More specifically, we are interested in theoretical or experimental research focused on the development of concepts, tools and clear metrics to characterise and measure the intelligence, and other cognitive abilities, of general AI agents.
We are interested in questions such as: Can the various tasks and benchmarks in AI provide a general basis for the evaluation and comparison of a broad range of such systems? Can there be a theory of tasks, or of cognitive abilities, that enables a more direct comparison and characterisation of AI systems? How does the specificity of an AI agent relate to how fast it can approach optimal performance?
We welcome regular papers, demo papers about benchmarks or tools, and position papers, and encourage discussion of a broad list of topics, including (non-exhaustively):
- Analysis and comparisons of AI benchmarks and competitions. Lessons learnt.
- Proposals for new general tasks, evaluation environments, workbenches and general AI development platforms.
- Theoretical or experimental accounts of the space of tasks, abilities and their dependencies.
- Evaluation of development in robotics and other autonomous agents, and cumulative learning in general learning systems.
- Tasks and methods for evaluating: transfer learning, cognitive growth, structural self-modification and self-programming.
- Evaluation of social, verbal and other general abilities in multi-agent systems, video games and artificial social ecosystems.
- Evaluation of autonomous systems: cognitive architectures and multi-agent systems versus general components: machine learning techniques, SAT solvers, planners, etc.
- Unified theories for evaluating intelligence and other cognitive abilities, independently of the kind of subject (humans, animals or machines): universal psychometrics.
- Analysis of reward aggregation and utility functions, environment properties (Markov, ergodic, etc.) in the characterisation of reinforcement learning tasks.
- Methods supporting automatic generation of tasks and problems with systematically introduced variations.
- Better understanding of the characterisation of task requirements and difficulty (energy, time, trials needed, etc.), beyond algorithmic complexity.
- Evaluation of AI systems using generalised cognitive tests for humans. Computer models taking IQ tests. Psychometric AI.
- Application of (algorithmic) information theory, game theory, theoretical cognition and theoretical evolution for the definition of metrics of cognitive abilities.
- Adaptation of evaluation tools from comparative psychology and psychometrics to AI: item response theory, adaptive testing, hierarchical factor analysis.
- Characterisation and evaluation of artificial personalities.
- Evaluation methods for multi-resolutional perception in AI systems and agents.
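The reward-aggregation topic above can be made concrete with a small sketch (all numbers are illustrative, not from any benchmark): the same pair of reward streams can rank in opposite orders under cumulative discounted return and under long-run average reward, so the aggregation rule is part of the task definition.

```python
def discounted_return(rewards, gamma=0.9):
    """Cumulative discounted return: sum of gamma^t * r_t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

def average_reward(rewards):
    """Long-run mean of the reward stream."""
    return sum(rewards) / len(rewards)

# Agent A earns a lot early, then nothing; agent B earns a steady trickle.
stream_a = [2.0] * 5 + [0.0] * 95
stream_b = [0.5] * 100

disc_a, disc_b = discounted_return(stream_a), discounted_return(stream_b)
avg_a, avg_b = average_reward(stream_a), average_reward(stream_b)

# The aggregation rule, not the environment, decides which agent "wins".
print(disc_a > disc_b)   # True: front-loaded rewards dominate under discounting
print(avg_a < avg_b)     # True: the steady stream dominates under averaging
```

Here A's discounted return (about 8.19) beats B's (about 5.0), while B's average reward (0.5) beats A's (0.1) - one reason the characterisation of reinforcement learning tasks has to pin down how rewards are aggregated.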
- Workshop paper submissions: June 1, 2016 (extended to June 12, 2016)
(Note for authors of papers submitted to the main ECAI program:
- If your paper has been accepted as a short paper, the long version can be considered for the workshop. If you let us know of your intention by email by June 12, we may grant you a few extra days.
- If your paper has been rejected, it can be submitted to the workshop. If you let us know of your intention by email by June 12 and attach your ECAI reviews, we may grant you a few extra days.)
- Workshop paper notifications: June 28, 2016 (early registration deadline July 5 for ECAI)
- Final submission of workshop program and materials: July 15, 2016
- Workshop date: August 30, 2016
The workshop will begin with a short presentation, followed by four sessions (two in the morning, two in the afternoon) of around 80 minutes each, with breaks between them. Technical sessions will consist of a keynote talk followed by short paper presentations, devoting a substantial share of time to discussion and interaction. The demo session will present real platforms and ways to evaluate AI systems on several tasks in these platforms. The discussion session will include a panel and a more open discussion of the research challenges around the workshop topics, continuation of the workshop, future initiatives, etc.
Submission of Papers
We welcome submissions describing work in progress as well as more mature work related to AI evaluation.
Submitted papers must be formatted according to the camera-ready style for ECAI'16, and submitted electronically in PDF format through https://www.easychair.org/conferences/?conf=egpai2016.
Papers (technical, demo or position) are allowed a maximum of eight (8) pages. An additional page containing the list of references is allowed.
Authorship is not anonymous (single-blind review). Papers will be reviewed by the program committee.
- FraMoTEC: Modular Task-Environment Construction Framework for Evaluating Adaptive Control Systems: Thorstur Thorarensen, Kristinn R. Thorisson, Jordi Bieger and Jona S. Sigurdardottir
- The Post-Modern Homunculus: Eric Neufeld and Sonje Finnestad
- PAGI World: A Physically Realistic, General-Purpose Simulation Environment for Developmental AI Systems: John Licato and Selmer Bringsjord
- A Dynamic Intelligence Test Framework for Evaluating AI Agents: Nader Chmait, Yuan-Fang Li, David Dowe and David Green
- Playing Atari Games with Deep Reinforcement Learning and Human Checkpoint Replay: Ionel-Alexandru Hosu and Traian Rebedea
- Historical account of computer models solving IQ test problems: Fernando Martínez-Plumed, José Hernández-Orallo, Ute Schmid, Michael Siebers and David L. Dowe
- Expert and Corpus-Based Evaluation of a 3-Space Model of Conceptual Blending: Donny Hurley, Yalemisew Abgaz, Hager Ali and Diarmuid O'Donoghue
- DisCSP-Netlogo: an open-source framework in NetLogo for implementation and evaluation of distributed constraints: Ionel Muscalagiu, Horia Popa and José Vidal
- Evaluation of General-Purpose Artificial Intelligence: Why, What & How: Jordi Bieger, Kristinn R. Thorisson and Bas Steunebrink
Presentation and publication
Authors of accepted papers will be asked to prepare a presentation (short or long) during the workshop.
Pre-proceedings containing all accepted papers will be provided electronically on the workshop web page. The final workshop proceedings will be distributed electronically together with the ECAI conference proceedings. According to a late notice from the ECAI organisation, papers will not be included on the USB sticks given out at the registration desk but will instead be published on the ECAI conference website.
After the workshop, a special journal issue is being considered (journal to be decided), to which contributing authors would be invited to submit a paper.
- Jordi Bieger, CADIA, Reykjavik University.
- Angelo Cangelosi, Plymouth University.
- David L. Dowe, Monash University.
- Devdatt Dubhashi, Chalmers.
- Helgi P. Helgason, Activity Stream.
- Sean B. Holden, Cambridge University.
- Jan Koutnik, IDSIA.
- Edward Keedwell, Exeter University.
- Frans A. Oliehoek, University of Amsterdam.
- Henri Prade, IRIT, Université Paul Sabatier.
- Ute Schmid, Bamberg University.
- Bas Steunebrink, IDSIA.
- Peter Sunehag, Google DeepMind.
- Joel Veness, Google DeepMind.
- Pei Wang, Temple University.
Competitions that aim at general-purpose AI systems:
Benchmarks and platforms:
Papers, tutorials and books: