[edm-announce] Call for Participation: 2010 KDD CUP

  • From: John Stamper <john@xxxxxxxxxxx>
  • To: edm-announce@xxxxxxxxxxxxx
  • Date: Wed, 7 Apr 2010 20:36:52 -0400

2010 KDD Cup Competition
Educational Data Mining Challenge
https://pslcdatashop.web.cmu.edu/KDDCup/

CALL FOR PARTICIPATION

The KDD Cup is the annual Data Mining and Knowledge Discovery
competition in which some of the best data mining teams in the
world compete to solve an important practical data mining
problem.

This year 15,000 USD in cash prizes and travel support will be
provided thanks to our sponsors: Facebook, Elsevier, and the
Pittsburgh Science of Learning Center.


THIS YEAR'S CHALLENGE

How generally or narrowly do students learn? How quickly or
slowly? Will the rate of improvement vary between students? What
does it mean for one problem to be similar to another? It might
depend on whether the knowledge required for one problem is the
same as the knowledge required for another. But is it possible to
infer the knowledge requirements of problems directly from
student performance data, without human analysis of the tasks?

This year's challenge asks you to predict student performance on
mathematical problems from logs of student interaction with
Intelligent Tutoring Systems. This task presents significant
technical challenges, has practical importance, and is
scientifically interesting.


TECHNICAL CHALLENGES

In terms of technical challenges, we mention just a few:
 - The data matrix is sparse: not all students are given every
   problem, and some problems have been completed by only 1 or 2
   students. So, contestants need to exploit relationships among
   problems to bring enough data to bear to hope to learn.
 - There is a strong temporal dimension to the data: students
   improve over the course of the school year, students must
   master some skills before moving on to others, and incorrect
   responses to some items lead to incorrect assumptions in
   other items. So, contestants must pay attention to temporal
   relationships as well as conceptual relationships among
   items.
 - Which problems a given student sees is determined in part by
   student choices or past success history: e.g., students only
   see remedial problems if they are having trouble with the
   non-remedial problems. So, contestants need to pay attention
   to causal relationships in order to avoid selection bias.
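
As a rough illustration of the sparsity point above, the following
sketch counts how many distinct students attempted each problem and
what fraction of the student-by-problem matrix is filled. The log
rows, the (student, problem) layout, and the function names are all
hypothetical; they are not the competition's data format.

```python
from collections import defaultdict

# Hypothetical interaction log: one (student, problem) pair per attempt.
# This is illustrative data only, not the competition's format.
log = [
    ("s1", "p1"), ("s2", "p1"),
    ("s1", "p2"),
    ("s1", "p3"), ("s2", "p3"), ("s3", "p3"),
]

def students_per_problem(rows):
    """Return the number of distinct students seen for each problem."""
    seen = defaultdict(set)
    for student, problem in rows:
        seen[problem].add(student)
    return {problem: len(students) for problem, students in seen.items()}

def density(rows):
    """Return the fraction of student-by-problem cells that have data."""
    students = {student for student, _ in rows}
    problems = {problem for _, problem in rows}
    filled = len(set(rows))
    return filled / (len(students) * len(problems))

counts = students_per_problem(log)
fill = density(log)
```

Problems with very low counts are the ones for which a model would
have to borrow information from related problems.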


SCIENTIFIC AND PRACTICAL IMPORTANCE

From a practical perspective, improved models could save
millions of hours of students' time (and effort) in learning
algebra. These models should both increase achievement levels and
reduce the time needed. Focusing on just the latter, for the 0.5
million students who spend about 50 hours per year with
Cognitive Tutors for mathematics, let's say these optimizations
can reduce time to mastery by at least 10%. (One experiment showed
a time reduction of about 15%; Cen et al. 2007.) At 10%, that's 5
hours per student, or 2.5 million student hours per year
saved. And this 0.5 million is less than 5% of all
algebra-studying students in the US. If we include all algebra
students (20x) and the grades 6-11 for which there are Carnegie
Learning and Assistment applications (5x), that brings our rough
estimate to 250 million student hours per year saved! In that
time, students can be moving on in math and science or doing
other things they enjoy.
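
The back-of-the-envelope estimate above can be reproduced in a few
lines. The figures are the ones quoted in the text (using the
conservative 10% reduction, not the 15% experimental result); the
variable names are ours and purely illustrative.

```python
# Figures quoted in the announcement; variable names are illustrative.
students = 500_000            # students using Cognitive Tutors for mathematics
hours_per_student = 50        # approximate hours per student per year
reduction_percent = 10        # assumed reduction in time to mastery

# Hours saved per student per year: 10% of 50 hours.
saved_per_student = hours_per_student * reduction_percent / 100

# Hours saved per year across the current 0.5 million students.
saved_current = students * saved_per_student

# Scale up: all US algebra students (20x), then grades 6-11 (5x).
all_algebra_factor = 20
grade_range_factor = 5
saved_total = saved_current * all_algebra_factor * grade_range_factor

print(f"{saved_per_student:.0f} hours per student")
print(f"{saved_current:,.0f} student hours per year (current users)")
print(f"{saved_total:,.0f} student hours per year (scaled estimate)")
```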

From a scientific viewpoint, the ability to achieve low
prediction error on unseen data is evidence that the learner has
accurately discovered the underlying factors which make items
easier or harder for students. Knowing these factors is essential
for the design of high-quality curricula and lesson plans (both
for human instructors and for automated tutoring software). So
you, the contestants, have the potential to influence lesson
design, improving retention, increasing student engagement,
reducing wasted time, and increasing transfer to future lessons.

Currently K-12 education is extremely focused on assessment. The
No Child Left Behind Act has put incredible pressure on schools
to "teach to the test", meaning that a significant amount of time
is spent preparing for and taking standardized tests. Much of the
time spent drilling for and taking these tests is wasted from the
point of view of deep learning (long-term retention, transfer,
and desire for future learning); so any advances which allow us
to reduce the role of standardized tests hold the promise of
increasing deep learning.

To this end, a model which accurately predicts long-term future
performance as a byproduct of day-to-day tutoring could augment
or replace some of the current standardized tests: this idea is
called "assistment", from the goal of assessing performance while
simultaneously assisting learning. Previous work has suggested
that assistment is indeed possible: e.g., an appropriate analysis
of 8th-grade tutoring logs can predict 10th-grade standardized
test performance as well as 8th-grade standardized test results
can predict 10th-grade standardized test performance (Feng,
Heffernan, & Koedinger, 2009). But it is far from clear what the
best prediction methods are; so, the contestants' algorithms may
provide insights that allow important improvements in assistment.


IMPORTANT DATES
 - March 15 - Call for participation
 - April 1 - Web site opens for registration
 - April 15 - Competition begins (Updated!)
 - June 1 - Competition ends

For more information, please visit the official KDDCup 2010 Competition website:

https://pslcdatashop.web.cmu.edu/KDDCup/
