2010 KDD Cup Competition Educational Data Mining Challenge https://pslcdatashop.web.cmu.edu/KDDCup/ CALL FOR PARTICIPATION The KDD Cup is the annual Data Mining and Knowledge Discovery competition in which some of the best data mining teams in the world compete to solve an important practical data mining problem. This year 15,000 USD in cash prizes and travel support will be provided thanks to our sponsors: Facebook, Elsevier, and the Pittsburgh Science of Learning Center. THIS YEAR'S CHALLENGE How generally or narrowly do students learn? How quickly or slowly? Will the rate of improvement vary between students? What does it mean for one problem to be similar to another? It might depend on whether the knowledge required for one problem is the same as the knowledge required for another. But is it possible to infer the knowledge requirements of problems directly from student performance data, without human analysis of the tasks? This year's challenge asks you to predict student performance on mathematical problems from logs of student interaction with Intelligent Tutoring Systems. This task presents significant technical challenges, has practical importance, and is scientifically interesting. TECHNICAL CHALLENGES In terms of technical challenges, we mention just a few: - The data matrix is sparse: not all students are given every problem, and some problems have only 1 or 2 students who completed each item. So, the contestants need to exploit relationships among problems to bring to bear enough data to hope to learn. - There is a strong temporal dimension to the data: students improve over the course of the school year, students must master some skills before moving on to others, and incorrect responses to some items lead to incorrect assumptions in other items. So, contestants must pay attention to temporal relationships as well as conceptual relationships among items. - Which problems a given student sees is determined in part by student choices or past success history: e.g., students only see remedial problems if they are having trouble with the non-remedial problems. So, contestants need to pay attention to causal relationships in order to avoid selection bias. SCIENTIFIC AND PRACTICAL IMPORTANCE From a practical perspective, improved models could be saving millions of hours of students' time (and effort) in learning algebra. These models should both increase achievement levels and reduce time needed. Focusing on just the latter, for the .5 million students that spend about 50 hours per year with Cognitive Tutors for mathematics, let's say these optimizations can reduce time to mastery by at least 10%. One experiment showed the time reduction was about 15% (Cen et al. 2007). That's 5 hours per student, or 2.5 million student hours per year saved. And this .5 million is less than 5% of all algebra-studying students in the US. If we include all algebra students (20x) and the grades 6-11 for which there are Carnegie Learning and Assistment applications (5x), that brings our rough estimate to 250 million student hours per year saved! In that time, students can be moving on in math and science or doing other things they enjoy. From a scientific viewpoint, the ability to achieve low prediction error on unseen data is evidence that the learner has accurately discovered the underlying factors which make items easier or harder for students. Knowing these factors is essential for the design of high-quality curricula and lesson plans (both for human instructors and for automated tutoring software). So you, the contestants, have the potential to influence lesson design, improving retention, increasing student engagement, reducing wasted time, and increasing transfer to future lessons. Currently K-12 education is extremely focused on assessment. The No Child Left Behind act has put incredible pressure on schools to "teach to the test", meaning that a significant amount of time is spent preparing and taking standardized tests. Much of the time spent drilling for and taking these tests is wasted from the point of view of deep learning (long-term retention, transfer, and desire for future learning); so any advances which allow us to reduce the role of standardized tests hold the promise of increasing deep learning. To this end, a model which accurately predicts long-term future performance as a byproduct of day-to-day tutoring could augment or replace some of the current standardized tests: this idea is called "assistment", from the goal of assessing performance while simultaneously assisting learning. Previous work has suggested that assistment is indeed possible: e.g., an appropriate analysis of 8th-grade tutoring logs can predict 10th-grade standardized test performance as well as 8th-grade standardized test results can predict 10th-grade standardized test performance (Feng, Heffernan, & Koedinger, 2009). But it is far from clear what the best prediction methods are; so, the contestants' algorithms may provide insights that allow important improvements in assistment. IMPORTANT DATES - March 15 - Call for participation - April 1 – Web site opens for registration - April 15 – Competition begins (Updated!) - June 1 - Competition ends For more information, please visit the official KDDCup 2010 Competition website: https://pslcdatashop.web.cmu.edu/KDDCup/