[edm-discuss] EDM workshop proposal @ICALT07

  • From: "Silvia Viola" <sr.viola@xxxxxxxxx>
  • To: edm-announce@xxxxxxxxxxxxx, edm-discuss@xxxxxxxxxxxxx
  • Date: Tue, 19 Dec 2006 05:36:22 +0100

*** Apologies for cross-posting ***

Dear Colleagues,

Together with Joe Beck (and with strong support from Ryan Baker!), we are trying
to organize a third workshop on Educational Data Mining at ICALT 07.
The workshop would be devoted to investigating the different points of view
of the various approaches within Educational Data Mining, in particular
machine learning/data mining and psychometrics/statistics approaches.
If accepted, the workshop will be held in Niigata, Japan, on July 18-20, 2007.
The deadline for submitting the proposal is December 20th, 2006 (we apologize
for sending this so close to the deadline).
We would like to invite any of you who may be interested to join the
workshop's Program Committee (PC). If so, we kindly ask you to confirm
your availability as soon as possible to sr.viola@xxxxxxxxx and
The workshop proposal follows below.

Moreover, we would greatly appreciate your comments and feedback on the
proposal.

We would also appreciate your recommendations of people (such as your
students, co-workers, etc.) to invite to the PC.

Thanks and regards,
Silvia Viola

*Methodologies for Educational Data Mining*

*Motivations and backgrounds*

The recent spread of interactive learning environments has made it possible
to collect huge amounts of data. These data are quite heterogeneous, including
web log files, interaction logs, text and dialogue data, time series data,
social network data, and human judgment and observation data, and they vary
widely in scale, grain size, and spatial and temporal resolution. Often, it
is also appropriate to analyze these types of data in combination with more
traditional sorts of educational data, such as data from standardized tests
and questionnaires.

Though the many types of educational data often differ considerably from one
another, they provide multiple types of insight into a single domain or
context and, above all, share the potential to reveal unexpected and useful
knowledge about learners and/or the learning process, if correctly and
coherently analyzed. As in many other scientific domains, educational data
is now generated faster than researchers can analyze it, and drawing
coherent and meaningful profiles/models of the learning process, or of other
educational processes, is now a central research challenge when dealing
with educational data.

Developing methods to mine the complex data that we can collect on
educational situations requires new approaches that build upon techniques
from a combination of areas, including statistics, psychometrics, machine
learning, and scientific computing. In particular, educational data
has characteristics such as:

- heterogeneity: data come from multiple sources and are expressed on
different scales

- data collected, and inferences drawn, at multiple grain sizes,
sometimes simultaneously

- collection in non-experimental settings, requiring causal
modelling and the set-up of post-hoc quasi-experiments

- the need for models that are accurate, interpretable, and

These characteristics make many of the analyses we would like to perform,
using methods popular in other domains, intractable, or require the use of
more than one method, at more than one level, to fully characterize the data.

The entire knowledge extraction process is influenced by these properties of
educational data, including the data cleaning and preprocessing steps (e.g.,
[12]), the translation of intuitive properties of the data into formal
distance and/or proximity measures, and the interpretation of outcomes.
Moreover, it is difficult to determine in advance which types of models will
be interpretable and will add to our scientific understanding of the
phenomena being studied.

In addition to the difficulties imposed by the nature of the data, the
student learning process itself complicates matters. Our methods need to be
able to detect new patterns in student strategy and cognition that may not
have been relevant before the advent of modern educational technology; in
addition, we need to validate that the patterns found persist over time and
generalize across educational contexts. Furthermore, student behaviour is
non-stationary: it may change over time, or change in response to the
system's behaviour. More recently, attempts to automate, fully or partially,
the analysis of such data in order to embed it in personalized and adaptive
e-learning environments (e.g., [13]) have raised these problems anew, along
with new ones.

In recent years, a number of approaches to dealing with educational data
have been proposed by different communities, including machine learning
(e.g., [11]), data mining and pattern classification (e.g., [7, 9, 17]),
psychometric techniques, item response theory and Rasch models, ACT-R,
optimization (e.g., [2]), graph theory (e.g., [6]), geometry-based and
systems-theory-based approaches, many others, and combinations of them.
All these approaches share, in different ways and to different degrees,
the feature of *learning from data*: "in the absence of first-principles
models, such readily available data can be used to derive models by
estimating useful relationships between a system's variables (i.e. unknown
input-output dependencies)" [4].

As Educational Data Mining grows and matures as a field, it becomes
increasingly important to understand our methodological premises and
assumptions. This workshop aims to provide a forum for discussing the
methodological aspects of analyzing educational data. It therefore aims to
bring together researchers from different backgrounds and communities, and
to attract newcomers to these topics.

This workshop would be the third of a set of workshops organized in 2007 by
the International Working Group for Educational Data Mining
(http://www.educationaldatamining.org), in coordination with several of the
world's most prestigious conferences. Since 2000, six workshops have been
held in this or related areas. In 2007, sister workshops will be held at
User Modeling (Greece) and (pending approval) Artificial Intelligence in
Education (USA). The sister workshops will focus on a wide range of aspects
of Educational Data Mining, including the methodological relationships
between educational data mining and ubiquitous data mining, tools for
educational data mining, data integration, computational aspects of data
mining, and applications of educational data mining. This workshop, by
contrast, will be devoted to the discussion and study of methods for
educational data mining, in particular the links between machine
learning/data mining methods and statistical/psychometric methods.

The aim of this workshop, therefore, is to address the methodological
challenges of analyzing educational data, in particular:

- What are the benefits and drawbacks of different approaches to mining
educational data? What do all the most useful methods have in common?

- What are the commonalities and differences between machine learning and
data mining approaches and statistics/psychometrics approaches for dealing
with educational data?

- Can these approaches support each other and/or be combined? How?

- How can we integrate multiple scales, data sources, and grain sizes in a
coherent fashion?

- How can we deal with multiple dependencies in data and/or unknown
distributions? How should we view sample size when data are not independent?

- Are there case studies comparing the effectiveness of different methods
for analyzing educational data?

*Target audience*

The workshop aims to bring together researchers and practitioners from a
variety of backgrounds. We expect participants from research areas
including user modelling and profiling, data mining and machine learning,
statistics, psychometrics, psychology, computer science and engineering,
education, and evaluation. In particular, we hope that this workshop will
build links between the educational data mining community, the broader
community of researchers who attend ICALT, and the community of researchers
attending the International Meeting of the Psychometric Society (IMPS),
which takes place one week before ICALT, also in Japan. Hence, this workshop
has the potential to bring new people to ICALT.

*Workshop format*

10 min: presentation of the workshop, of the challenges, and of the issues
raised at the sister workshops

45 min: paper presentations

5 min: break

60 min: discussion and conclusions

*Expected results*

The previous workshops had an average attendance of about 30 people, so we
expect a similar turnout for this workshop.

Expected results include:

- greater understanding of the state of the art in the fields related to
educational data mining

- a presentation of interesting and challenging case studies

Thus, participants are expected to leave the workshop not only with a
better understanding and awareness of the field, but also with ideas for
future research in the area. Moreover, we hope the workshop will serve as a
forum for sharing knowledge and experience, and as an opportunity for other
ICALT attendees working on educational research or educational technologies
to learn about this emerging field.

*References*

[1] Agresti, A.: Categorical Data Analysis, 2nd edition, Wiley, New York

[2] Bradley, P. S., Fayyad, U. M., and Mangasarian, O. L.: "Mathematical
Programming for Data Mining: Formulations and Challenges", INFORMS J. on
Computing, 11(3), 1999, pp. 217-238

[3] Casella, G. and Berger, R. L.: Statistical Inference, 2nd edition,
Duxbury Press, 2001

[4] Cherkassky, V. and Mulier, F.: Learning from Data, Wiley, New York, 1998

[5] Cox, D. R. and Wermuth, N.: Multivariate Dependencies, Chapman & Hall

[6] De Leeuw, J. and Michailidis, G.: "Graph Layout Techniques and
Multidimensional Data Analysis", in Game Theory, Optimal Stopping,
Probability and Statistics: Papers in Honor of Thomas S. Ferguson,
F. T. Bruss and L. Le Cam (eds), IMS Lecture Notes-Monograph Series,
pp. 219-248, 2000

[7] Duda, R. O., Hart, P. E., and Stork, D. G.: Pattern Classification,
2nd edition, Wiley, New York, 2001

[8] Gibbons, J. D. and Chakraborti, S.: Nonparametric Statistical Inference,
3rd edition, Marcel Dekker, New York, 1992

[9] Han, J. and Kamber, M.: Data Mining: Concepts and Techniques, 2nd edition,
Morgan Kaufmann, San Diego, 2005

[10] Michailidis, G. and De Leeuw, J.: "The Gifi System of Descriptive
Multivariate Analysis", Statistical Science, 13, 1998, pp. 307-336

[11] Mitchell, T.: Machine Learning, McGraw-Hill, 1997

[12] Pyle, D.: Data Preparation for Data Mining, Morgan Kaufmann,
San Diego, 1999

[13] Romero, C. and Ventura, S. (eds): Data Mining in E-learning, WIT Press,
Spain, 2006

[14] Vapnik, V.: The Nature of Statistical Learning Theory, Springer, Berlin

[15] Vapnik, V.: Statistical Learning Theory, Wiley, New York, 1998

[16] Whittaker, J.: Graphical Models in Applied Multivariate Statistics,
Wiley, New York, 1990

[17] Witten, I. H. and Frank, E.: Data Mining, 2nd edition, Morgan Kaufmann,
San Diego, 2005

Silvia Rita Viola
