Date: Mon, 26 Apr 2010 15:07:39 -0700
From: Richard Hake <rrhake@xxxxxxxxxxxxx>
Reply-To: Net-Gold@xxxxxxxxxxxxxxx
To: AERA-L@xxxxxxxxxxxxxxxxx
Cc: Net-Gold@xxxxxxxxxxxxxxx
Subject: [Net-Gold] Seventeen Statements by Gold-Standard Skeptics

If you reply to this encyclopedic (87 kB) post, please don't hit the reply button unless you prune the copy of this post that may appear in your reply down to a few relevant lines; otherwise the entire already archived post may be needlessly resent to subscribers.

***************************************
ABSTRACT: Andy Rudd, in an EdResMeth post of 6 Apr 2010 titled "Cause and Effect," wrote: "Today I dealt with a doctoral student who was adamantly opposed to the idea that causal relationships can be studied using non experimental designs. . . . . . I am curious what others think about the use of non experimental designs to study causal relationships if it is not possible to use an experimental or quasi-experimental design." This initiated an 18-post thread of diverse comments on the student's opinion, accessible at <http://tinyurl.com/y4um3g3>. Rudd's student may have been influenced by the fact that the "Randomized Control Trial" has been enthroned by the U.S. Dept. of Education (USDE, 2008) and Mosteller & Boruch (2002) as the "gold standard" for demonstrating causality in education research. For consideration by Rudd's student and others, herewith are SEVENTEEN STATEMENTS BY GOLD-STANDARD SKEPTICS: (1) American Educational Research Association; (2) American Evaluation Association; (3) Hugh Burkhardt & Alan Schoenfeld; (4) Tom Cook & Monique Payne; (5) Margaret Eisenhart & Lisa Towne; (6) European Evaluation Society; (7) Richard Hake; (8) Burke Johnson; (9) Annette Lareau & Pamela Barnhouse Walters; (10) Joseph Maxwell; (11) National Education Association; (12) Dennis Phillips; (13) Barbara Schneider, Martin Carnoy, Jeremy Kilpatrick, William Schmidt, & Richard Shavelson;
(14) Michael Scriven; (15) Mack Shelley, Larry Yore, and Brian Hand; (16) Deborah Stipek; (17) Carol Weiss.
***************************************

Andy Rudd (2010), in his EdResMeth post "Cause and Effect," wrote: "Today, I dealt with a doctoral student who was adamantly opposed to the idea that causal relationships can be studied using non experimental designs. I tried to explain to him that while there are much stronger designs, e.g., a randomized design, it is still possible to study causal relationships with non experimental designs. This student was upset that I would suggest something so outlandish. I am curious what others of you think about the use of non experimental designs to study causal relationships if it is not possible to use an experimental or quasi-experimental design." Rudd's post initiated an 18-post thread of diverse comments on the student's opinion on 6-7 April 2010, accessible to EdResMeth subscribers at <http://tinyurl.com/y4um3g3>. Rudd's student may have been influenced by the fact that the "Randomized Control Trial" has been enthroned by the U.S. Dept. of Education (USDE, 2008) and Mosteller & Boruch (2002) as the "gold standard" for demonstrating causality in education research.
For consideration by Rudd's student and others, herewith are SEVENTEEN STATEMENTS BY GOLD-STANDARD SKEPTICS [my CAPS; my inserts at ". . . . . . [[insert]]. . . . ."]:

*************************************
1. AMERICAN EDUCATIONAL RESEARCH ASSOCIATION [AERA (2003)]: "We urge you. . . . [[Rod Paige, Secretary of Education]]. . . . . . . to modify the language for a 'Proposed Priority' to be used for 'any appropriate programs in the Department of Education' in FY 2004 or later. While we appreciate the value of experimental designs as an evaluation method, WE BELIEVE THAT A JUDGMENT OF 'BEST,' AS SPECIFIED IN THE PROPOSED LANGUAGE, DOES NOT ADEQUATELY ACCOUNT FOR OTHER METHODS OF EVALUATION THAT MIGHT BE AS OR MORE APPROPRIATE DEPENDING ON THE SPECIFIC EDUCATION PROGRAM. We are concerned that the proposed priorities for application of scientifically based evaluation methods (1) invoke an uncommonly narrow definition of evaluation as used in the government and in the field, and (2) make no reference to the standards for scientifically valid education evaluation adopted in the legislation creating the Institute of Education Sciences (IES)."

*************************************
2. AMERICAN EVALUATION ASSOCIATION [AEA (2003)]: "[RCTs] are not the only studies capable of generating understandings of causality. In medicine, causality has been conclusively shown in some instances without RCTs, for example, in linking smoking to lung cancer and infested rats to bubonic plague. The secretary's proposal would elevate experimental over quasi-experimental, observational, single-subject, and other designs which are sometimes more feasible and equally valid. RCTs ARE NOT ALWAYS BEST FOR DETERMINING CAUSALITY AND CAN BE MISLEADING. RCTs examine a limited number of isolated factors that are neither limited nor isolated in natural settings. The complex nature of causality and the multitude of actual influences on outcomes render RCTs less capable of discovering causality than designs sensitive to local culture and conditions and open to unanticipated causal factors. RCTs should sometimes be ruled out for reasons of ethics. For example, assigning experimental subjects to educationally inferior or medically unproven treatments, or denying control group subjects access to important instructional opportunities or critical medical intervention, is not ethically acceptable even when RCT results might be enlightening. Such studies would not be approved by Institutional Review Boards overseeing the protection of human subjects in accordance with federal statute. In some cases, data sources are insufficient for RCTs. Pilot, experimental, and exploratory education, health, and social programs are often small enough in scale to preclude use of RCTs as an evaluation methodology, however important it may be to examine causality prior to wider implementation." **NOTE: See the reference "AEA (2003)" in the REFERENCES list for the "Not AEA Statement" [Lipsey (2003)] signed by 8 prominent AEA members.**

*************************************
3. HUGH BURKHARDT & ALAN SCHOENFELD (2003, p. 9) in "Improving Educational Research: Toward a More Useful, More Influential, and Better-Funded Enterprise": ". . . . it is essential for the research community to delineate the many good ways of doing high-quality research, and then live up to the standards it sets. SCIENCE ADVANCES BY TESTING HYPOTHESES FROM ALL CREDIBLE VIEWPOINTS, NOT BY APPLYING PREDETERMINED METHODS (e.g., RANDOMIZED CONTROLLED TRIALS) INDEPENDENT OF CONTEXT.
The goal is to provide rigorous, evidence-based warrants for one's claims; the idea is to match the method(s) with the issue at hand, and to only draw conclusions warranted by each method or the methods in combination [see, e.g., Schoenfeld (2002), National Research Council (2002). . . . [[referenced here as Shavelson & Towne (2002)]]. . . . .]."

*************************************
4. TOM COOK & MONIQUE PAYNE (2002, p. 174) in "Objecting to the Objections to Using Random Assignment in Educational Research": "In some quarters, particularly medical ones, the randomized experiment is considered the causal 'gold standard.' IT IS CLEARLY NOT THAT IN EDUCATIONAL CONTEXTS, given the difficulties with implementing and maintaining randomly created groups, with the sometimes incomplete implementation of treatment particulars, with the borrowing of some treatment particulars by control group units, and with the limitations to external validity that often follow from how the random assignment is achieved."

*************************************
5. MARGARET EISENHART & LISA TOWNE (2003) in "Contestation and Change in National Policy on 'Scientifically Based' Education Research" [see that article for references other than Shavelson & Towne (2002)]: "Recent federal education policies (e.g., the No Child Left Behind [NCLB] Act of 2001 [NCLB, 2001] and the Education Sciences Reform Act [ESRA] of 2002 [ESRA, 2002]) have generated considerable debate among education researchers. . . . . . . Much of this public debate has turned on two questions: 'What constitutes "scientifically based" research in education?' and 'Is scientifically based research the only or the best approach to meaningful studies of educational phenomena?' In response to a request from the National Educational Research Policy and Priorities Board (NERPPB), a National Research Council (NRC) committee took up the first question in late 2000. . . . . . . . In the spring of 2002, the committee published its report, SRE (NRC, 2002). . . . . [[referred to as Shavelson & Towne (2002) in this post]]. . . . . , WHICH ARGUED FOR A POSTPOSITIVIST APPROACH. . . . .[[see, e.g., Phillips & Burbules (2000)]]. . . . TO SCIENTIFICALLY BASED RESEARCH IN EDUCATION, INCLUDING A RANGE OF RESEARCH DESIGNS (EXPERIMENTAL, CASE STUDY, ETHNOGRAPHIC, SURVEY) AND MIXED METHODS (QUALITATIVE AND QUANTITATIVE) DEPENDING ON THE RESEARCH QUESTIONS UNDER INVESTIGATION. Although SRE recognized the legitimacy and importance of 'nonscientific' ways of knowing for education research (pp. 26, 74-76), the report attempted a broad, inclusive answer to the first question and did not address the second question in any detail."

*************************************
6. EUROPEAN EVALUATION SOCIETY <http://www.europeanevaluation.org> [EES (2007)] in "The Importance of a Methodologically Diverse Approach to Impact Evaluation - Specifically with Respect to Development Aid and Development Interventions," Nijkerk, The Netherlands, December 2007; quoted in Donaldson (2009): "The EES, consistent with its mission to promote the 'theory, practice, and utilization of high quality evaluation,' notes the current interest in improving impact evaluation and assessment (IE) with respect to development and development aid. EES HOWEVER DEPLORES ONE PERSPECTIVE CURRENTLY BEING STRONGLY ADVOCATED: THAT THE BEST OR ONLY RIGOROUS AND SCIENTIFIC WAY OF DOING SO IS THROUGH RANDOMIZED CONTROLLED TRIALS (RCTs). . . . . . ."

*************************************
7. RICHARD HAKE (2008a) in "Randomized Trials (was Can Pre-to-posttest Gains Gauge Course Effectiveness?)": "ABSTRACT: In a recent post 'Can Pre-to-posttest Gains Gauge Course Effectiveness? #2'. . . . [[Hake (2008d)]]. . . . , I wrote: 'These [pre/post studies. . . . .[[demonstrating about a two-standard-deviation superiority in average normalized gains <g> for "Interactive Engagement" over "Traditional" passive-student lecture courses - Hake (1998a,b; 2002, 2008h)]]. . . . have been carried out on many different instructors, in many different institutions, using many different texts, and working with many different types of student populations from rural high schools to Harvard.' In response, AERA-D's Jeremy Miles (2008) asked 'WERE THESE RANDOMIZED TRIALS?' THE SHORT ANSWER IS 'NO.' The long answer explains that: (a) RANDOMIZED CONTROL TRIALS (RCTs) ARE ALMOST IMPOSSIBLE TO CARRY OUT IN UNDERGRADUATE PHYSICS EDUCATION RESEARCH, and (b) CAREFUL NON-RCT RESEARCH CAN ESTABLISH CAUSALITY TO A REASONABLE DEGREE - as argued by Shadish, Cook, & Campbell; Shavelson & Towne; Schneider, Carnoy, Kilpatrick, Schmidt, & Shavelson; and Michael Scriven." (A minimal sketch of the <g> computation appears just after statement 8 below.)

*************************************
8. BURKE JOHNSON (2010) in EdResMeth post "Re: Cause and Effect": ". . . if one wants to search for causation of the scientific/nomological type (which I believe was assumed in the original question. . . . [[Rudd (2010)]]. . . . ), then I teach that randomized experiments are the best (when they are possible and no moderator variable has been excluded), and I make the points Bruce just made. . . . .[[Thompson (2010): "If you use (a) regression discontinuity designs, or (b) create a control group using propensity scores, I think you can come reasonably close to a true experiment."]]. . . . However, note that experiments are best for what Don Campbell called local molar causation or what Shadish, Cook, and Campbell. . . . .[[(2002)]]. . . more recently call descriptive causation. . . . [[(pp. 9-12)]]. . . . . Experiments are weaker on demonstrating complex processes or what Shadish, Cook, and Campbell call explanatory causation. . . . [[(pp. 9-12)]]. . . . Qualitative research can be very useful (e.g., grounded theory) for generating evidence of explanatory causation. Mixed research is especially interested in connecting the two (descriptive and explanatory causation) because both are important. Also, there are many variables in the world that we cannot actively manipulate and we must still search for causes; scientists do not give up; making a dogmatic claim that the choice is either (a) an experiment or (b) nothing does not suffice. Many entire disciplines must deal with this situation of not being able to conduct experiments on many of their topics/variables of interest (e.g., archaeology, sociology, economics, political science, epidemiology, astronomy). In these cases, one has to do the best one can, and there are many strategies that can be used to provide some warrant for assertions of causation in the absence of experimentation, as scholars in these disciplines will readily explain. Again, I SUGGEST THAT MAKING A BINARY CLAIM THAT EITHER AN EXPERIMENT MUST BE DONE (WHICH IS THE BEST SINGLE METHOD) OR ONE CAN HAVE ZERO EVIDENCE OF CAUSATION IS, SCIENTIFICALLY AND PRACTICALLY SPEAKING, PROVINCIAL. A mixed research standpoint tends to replace thinking in binary terms with thinking synechistically (i.e., in terms of continua)."

*************************************
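For readers unfamiliar with the average normalized gain <g> invoked in statement 7 above, here is a minimal sketch of its computation in Python, using the definition of Hake (1998a): <g> = (%post - %pre)/(100% - %pre), i.e., the class-average gain expressed as a fraction of the maximum possible gain. The pre/post class averages below are hypothetical, chosen only so that the resulting gains echo the survey averages <<g>> = 0.23 (traditional) and <<g>> = 0.48 (interactive engagement) reported in Hake (1998a); the function name "normalized_gain" is illustrative, not taken from any reference above.

def normalized_gain(pre_pct, post_pct):
    """Average normalized gain <g> = (%post - %pre) / (100 - %pre),
    computed from class-average percentage scores on a diagnostic test."""
    if pre_pct >= 100.0:
        raise ValueError("pretest average must lie below the 100% ceiling")
    return (post_pct - pre_pct) / (100.0 - pre_pct)

# Hypothetical class averages (percent correct), for illustration only:
g_trad = normalized_gain(40.0, 54.0)  # (54-40)/(100-40) = 0.23, "traditional"
g_ie = normalized_gain(40.0, 69.0)    # (69-40)/(100-40) = 0.48, "interactive engagement"
print("traditional <g> = %.2f, interactive engagement <g> = %.2f" % (g_trad, g_ie))

*************************************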
9. ANNETTE LAREAU & PAMELA BARNHOUSE WALTERS (2010) in "What Counts as Credible Research?": "It is a critical moment in educational policy. The Obama administration has renewed emphasis on educational policy and No Child Left Behind is up for renewal. But in the current debate, there has not been sufficient discussion of a crucial piece of educational debates: what kinds of research should be considered to be acceptable? In recent years, RANDOMIZED-CONTROLLED TRIALS WERE ELEVATED TO THE POSITION AS THE 'GOLD STANDARD' FOR EDUCATIONAL RESEARCH. WE BELIEVE THIS POSITION TO BE HIGHLY PROBLEMATIC. As the debate about education begins to pick up speed, it is important to broaden the definition of legitimate educational research. . . . . . . We suggest that federal department of education decision makers need to acknowledge that there are many different research questions in education, and that different research questions call for different methods. There needs to be a realistic and critical assessment of the limits of randomized-controlled trials and the relatively narrow forms of knowledge that can be gained from their use (Phillips, 2009). Investigations that address a rich range of questions that fall outside the realm of randomized controlled trials need to be supported as well, such as the mechanisms through which parents influence children's schooling experiences, the micro-interactional patterns that build trust among school personnel, or political and organizational impediments to reform."

*************************************
10. JOSEPH MAXWELL (2004, abstract) in "Causal Explanation, Qualitative Research, and Scientific Inquiry in Education": "A National Research Council report, 'Scientific Research in Education'. . . . . . [[Shavelson & Towne (2002)]]. . . . . , has elicited considerable criticism from the education research community. . . . .[[see, e.g., Educational Researcher (2002), Eisenhart & Towne (2003)]]. . . ., but this criticism has not focused on a key assumption of the report - its HUMEAN, REGULARITY CONCEPTION OF CAUSALITY. IT IS ARGUED THAT THIS CONCEPTION, WHICH ALSO UNDERLIES OTHER ARGUMENTS FOR 'SCIENTIFICALLY-BASED RESEARCH,' IS NARROW AND PHILOSOPHICALLY OUTDATED, AND LEADS TO A MISREPRESENTATION OF THE NATURE AND VALUE OF QUALITATIVE RESEARCH FOR CAUSAL EXPLANATION. An alternative, realist approach to causality. . . . .[[Campbell (1988), House (1991), Pawson & Tilley (1997), Pawson (2006)]]. . . . . is presented that supports the scientific legitimacy of using qualitative research for causal investigation, reframes the arguments for experimental methods in educational research, and can support a more productive collaboration between qualitative and quantitative researchers."

*************************************
11. NATIONAL EDUCATION ASSOCIATION [NEA (2003)]: "The NEA STRONGLY ENDORSES the National Research Council's study, 'Scientific Research in Education'. . . . [[SHAVELSON & TOWNE (2002)]]. . . . . AND RECOGNIZES THIS TO BE THE 'GOLD STANDARD' in terms of selecting methodology that is most appropriate for the question presented, rather than framing the question to fit the methodology. If a federal regulation were to reward or even tacitly endorse the latter approach, we would no longer have true evidence-based education initiatives. We also strongly agree with the comments of both the American Educational Research Association. . . .[[see above]]. . . . and the National Education Knowledge Industry Association on this point."

*************************************
12. DENNIS PHILLIPS (2009, p. 178) in "A Quixotic quest? Philosophical issues in assessing the quality of education research": "As the examples above illustrate, a narrow view of the nature of science, and crucially of its methods, is fostered if too much attention is paid to what the philosopher Hans Reichenbach. . . . . [[<http://en.wikipedia.org/wiki/Hans_Reichenbach>]]. . . . termed the 'context of justification' and too little notice is given to the vitally important 'context of discovery.' This distinction is heuristically valuable, but it is too crude to be taken as marking an absolute dichotomy. In practice, ideas are often tested as they are formulated, leading to many of them quickly being discarded as unworthy. There are not two temporally distinct processes occurring, as a crude understanding of Reichenbach's distinction might suggest, but one complex one in which probing, hypothesis formation, critiquing, and testing are intermingled, as the case of William Harvey amply illustrates. . . . .[['to convince his scientific peers that blood circulates in arteries and veins and is pumped by the heart']]. . . . . Crude as it is, however, the discovery/justification distinction is extremely helpful when applied to the recent debates concerning the use of the so-called 'gold standard' in education research. Thus those who insist that *the* criterion to use in identifying scientifically rigorous educational research is whether or not the study in question used randomized controlled field trials or experiments (RFTs), or quasi-experimental designs that approximate them, are guilty of focusing on only one-half of Reichenbach's categorization of the logic of science. RFT methodology is well suited to throw light only on the *justificatory* issue - that is, whether or not it can be claimed that a treatment actually caused (produced) a desired effect. This focus is, of course, an important one, but taken by itself (and it is often put forward by itself) IT EGREGIOUSLY MISREPRESENTS THE NATURE OF SCIENTIFIC INQUIRY. For what is omitted are the vital steps leading up to the *initial discovery or production* of the treatment (or program or hypothesis) whose claim of effectiveness is being subjected to justificatory investigation by means of the RFT. It is often in this 'phase' of discovery where scientists display their creative genius, their range and depth of background knowledge, their 'opportunism,' their ability to 'do their damnedest.'"

*************************************
13. BARBARA SCHNEIDER, MARTIN CARNOY, JEREMY KILPATRICK, WILLIAM SCHMIDT, & RICHARD SHAVELSON (2007, p. 117) in "Estimating Causal Effects Using Experimental and Observational Designs: A Think Tank White Paper": "As the NRC's Committee on Scientific Research on Education makes clear in 'Scientific Research in Education'. . . . .[[Shavelson & Towne (2002)]]. . . ., the question of causal effects is but one of three general questions that drive research. This report has focused on how to establish that there is an effect (i.e., 'Is there a systematic effect?'). What has been less emphasized are the two other questions identified by the NRC: (1) 'What is happening?' (i.e., what is occurring in a particular context, usually documented through thick description); and (2) 'Why or how is it happening?' (i.e., what mechanisms are producing the effect that is observed?). These two questions are central to the design of experiments and their usefulness.
They are also important for developing theories of cognition, learning, and social and emotional development. A PROGRAM OF EVALUATION BUILT ON A SOLID FOUNDATION OF CLOSELY LINKED RESEARCH USING A VARIETY OF METHODS IS NEEDED TO ESTABLISH THE BASIS FOR RELIABLE AND ENDURING KNOWLEDGE ABOUT THE EFFECTS OF EDUCATIONAL INNOVATIONS."

*************************************
14. MICHAEL SCRIVEN (2008) in "A Summative Evaluation of RCT Methodology: & An Alternative Approach to Causal Research": "Along with the attempt to redefine the concepts of - or at least the acceptable ways to establish - evidence and causation, the RCT campaign also involves the less-remarked parallel effort, going back further, to redefine the concept of an experiment. In standard scientific usage, experiments are just carefully constrained explorations, and the RCT is simply a special case of these. To call the RCT the only 'true experiment' is part of an attempt at redefinition that distorts the original and continuing usage, and excludes experiments designed to test many simple hypotheses about - or simple efforts to find out - what happens if we do *this*. This effort at persuasive redefinition is allied with an implicit denigration of the so-called 'quasi-experimental' designs, which are in fact perfectly respectable experiments, only 'quasi' with respect to the one respect in which they have less control over one possible way of excluding one type of alternative explanation. But in other respects, equally important in the practical business of selecting appropriate designs to get definite answers in the given circumstances, they are often massively superior, e.g., with respect to the number of subjects required in order to achieve useful results; the extent to which they avoid intrusion into a natural course of events that it may be very important not to disturb; their cost, not just in money terms but in terms of other important values, etc. Of particular importance, THE COMMONLY ACCEPTED IMPLICATION OF THE 'QUASI' TERMINOLOGY - THAT THE CONCLUSIONS FROM THEM WILL BE LESS SECURE - IS, AS ARGUED BELOW, CATEGORICALLY FALSE. It is based on an abstract concept of proof or certainty that ignores the practical process and standards used by working scientists and engineers - and by historians and judges in courts of law, and by everyone when acting as real people facing crucial decisions - all of whose approaches are treated with more respect in the present paper. . . . .
SUMMATIVE PROPOSITIONS:
A. *The RCT design is a theoretical construct of considerable interest, BUT IT HAS ESSENTIALLY ZERO PRACTICAL APPLICATION TO THE FIELD OF HUMAN AFFAIRS.* It is important to be clear that a true RCT study has to be (at least) double-blind, as are all sound pharmacological studies, whereas the applications in public health, education, social services, law enforcement, etc., that are currently advocated as RCTs are neither double-blind nor even single-blind, but 'zero-blind.' Such studies are of course open to the unintended explanation of their results by appeal to the Hawthorne effect or its converse. . . . .[[the "John Henry Effect"]]. . . ., since it's usually easy for members of the experimental and control groups to work out which one they are in.
HENCE THE COMMON ARGUMENT THAT THE RCT DESIGNS BEING ADVOCATED IN AREAS LIKE EDUCATION, PUBLIC HEALTH, INTERNATIONAL AID, LAW ENFORCEMENT, ETC., HAVE THE (UNIQUE) ADVANTAGE OF 'ELIMINATING ALL SPURIOUS EXPLANATIONS' IS COMPLETELY INVALID. It was careless to suppose that randomization of subject allocation would compensate for the failure to blind the subjects (as in single-blind studies), let alone the failure to blind the treatment dispensers, a.k.a. service providers (the requirement that distinguishes the double-blind study). The RCT banner in the applied human sciences is in fact being flown over pseudo-RCTs. This failing is not the result of carelessness, but of the almost complete impossibility, at least within the constraints of the usual protocols governing experimentation with human subjects, of arranging for even single-blind conditions. . . . .
G. *THE REAL 'GOLD STANDARD' FOR CAUSAL CLAIMS IS THE SAME ULTIMATE STANDARD AS FOR ALL SCIENTIFIC CLAIMS; IT IS CRITICAL OBSERVATION.*. . . . . Causation can be directly observed, in lab or home or field, usually as one of many contextually embedded observations, such as lead being melted by heating a crucible, eggs being fried in a pan, or a hawk taking a pigeon. And causation can also be inferred from non-causal direct observations with no experimentation, as by the forensic pathologist performing an autopsy to determine the cause of death."

*************************************
15. MACK SHELLEY, LARRY YORE, AND BRIAN HAND (2009b) in "Education Research Meets the 'Gold Standard': Evaluation, Research Methods, and Statistics after No Child Left Behind": ". . . . . Unfortunately, it appears as if the Gold Standard for research practice (randomized control trials, RCTs) is based on the stage 3 drug trial, or medical model, without duly recognizing the stage 1 and stage 2 trials necessitated by rarity of disease, risks, development of problem space, availability of related technologies or innovations, and costs. . . . . . SOME INITIAL AND CURRENT INTERPRETATIONS OF THE GOLD STANDARD HAVE PRIVILEGED A SINGLE APPROACH AND TYPE OF EVIDENCE REGARDLESS OF THE DEVELOPMENT OF THE PROBLEM SPACE, SPECIFIC RESEARCH QUESTION, AVAILABLE TECHNOLOGIES AND INSTRUMENTATION, AND COST AND ETHICAL CONSIDERATIONS. If such interpretations of this policy exclusively privilege RCT and quantitative evidence, it would disregard high-quality, qualitative research approaches and other contemporary approaches and, thus, the evidence flowing from such inquiries. Such an oversight would not fully recognize education as a social science that utilizes (a) epistemologies and methods that involve both hypothetico-deductive inquiry and normal hierarchical development and (b) inductive, nonexperimental inquiries that insert new theoretical discourses alongside existing ones (Yore & Lerman, 2008)." . . . .[[but regarding "new theoretical discourses" see the insightful "Expanded Social Scientist's Bestiary: A Guide to Fabled Threats To, and Defenses of, Naturalistic Social Science" by philosopher D.C. Phillips (2000)]]. . . . .

*************************************
16. DEBORAH STIPEK (2005) in "Scientifically Based Practice: It's About More Than Improving the Quality of Research": ". . . .
the administration is also recommending significant changes in the way education researchers do business. According to the Institute of Education Sciences' director, Grover J. 'Russ' Whitehurst, the focus of research should be on identifying effective teaching practices. Borrowing from the field of medicine, the federal government has also put its faith, and its money, in a particular methodology - randomized field trials. This methodology is considered to be more rigorous than any other used in education research, and it allows causal conclusions that no other method can boast. Also concerned with the quality and reputation of education research, the National Research Council Committee on Scientific Principles in Education Research. . . . [[see, e.g., Shavelson & Towne (2002)]]. . . . offers a somewhat different set of recommendations. The committee suggests that the fit between the method and the questions being asked is more important than the particular method. Its recommendations focus primarily on the culture of education research - the need to foster a greater commitment to objectivity, high standards of scientific inquiry, replication, and the free flow of constructive critique. Yet a third set of recommendations is well articulated in two documents - one. . . .[[NAE (1999)]]. . . . issued by the National Academy of Education. . . . .[[<http://www.naeducation.org/>]]. . . . ., and another by the National Research Council (Strategic Education Research Partnership, SERP). . . .[[see Donovan & Pellegrino (2003)]]. . . . These reports promote, as the administration does, research that focuses on the problems of practice. Their recommendations differ from the administration's strategy in several important ways, however. First, they encourage research in what Donald Stokes. . . .[[Stokes (1997)]]. . . . calls Pasteur's Quadrant - research on practical problems that develops, at the same time, general principles that can guide future research and practice. The reports suggest particular qualities of research that they claim will be more useful for improving education practice. They recommend, for example, research that is embedded in practice and that involves collaborations between researchers and practitioners. . . .[[see, e.g., Kelly (2003)]]. . . . Unlike the traditional linear model of 'research-into-practice,' their view of productive research and development involves moving back and forth between research and practice. Innovations are developed by researchers collaborating with practitioners. They are tried out in classrooms, refined or developed by practitioners in their schools and classrooms, and then systematically studied by researchers. The link between research and practice is assumed to be complex, reciprocal, and dynamic. Productive use of research findings at the policy level also requires many judgment calls. A policy found to be effective in one context is not necessarily effective in another, and there are often many details related to the original conditions of the research that need to be attended to when applying findings in new contexts. Consider the example of class-size reduction in California. A large, random-assignment study in Tennessee demonstrating the benefits of reducing class sizes to about 15 students was used to support a policy of reducing class size to 20 in California. But unlike in Tennessee, where trained teachers were in good supply, in California there was a serious teacher shortage.
Because crucial variables related to the context of the study were ignored, the implementation of this very costly policy in California may have done more harm than good, at least for children in the low-income communities that could not compete for the limited supply of trained and experienced teachers. Another example is a random-assignment study of the High/Scope preschool intervention in Ipsilanti. . . .[[sic, it's Ypsilanti in Michigan, see, e.g., <http://evidencebasedprograms.org/wordpress/?page_id=65>]]. . . ., cited repeatedly as support for preschool education. True, the study has demonstrated impressive and long-term effects of a preschool experience, but the devil is in the details. Many of the preschool programs that were spawned by this compelling research evidence look nothing like the Ipsilanti program. . . .[[sic, it's "Ypsilanti program"]]. . . . . It is very likely that many of the preschool programs based on this research do not give anything close to the same advantages seen in the original High/Scope program. These examples illustrate the complexity of making evidence-based policy decisions. Researchers will need to make sure that they communicate clearly what contextual variables and details of the intervention or program are necessary to achieve positive results. And policymakers will need either training or assistance to make judgments about the implications of research findings for their local context. . . . . The bottom line is that education researchers, like educational practitioners, are being asked to approach their work differently from how they did in the past. We are being challenged to impose high standards of scientific rigor on ourselves, to focus on problems of practice, and to develop sustained collaborations with practitioners. If the resources needed to do this kind of research become available (they currently are not), we should be able to live up to the challenge. BUT UNTIL MANY OTHER INSTITUTIONAL CHANGES OCCUR, AND THE ORGANIZATIONAL STRUCTURES TO SUPPORT EVIDENCE-BASED PRACTICE ARE DEVELOPED, RESEARCH FINDINGS, HOWEVER CLEAR AND USEFUL, WILL HAVE A FEATHER'S WEIGHT ON TEACHING AND STUDENT LEARNING IN THE NATION'S SCHOOLS. We do need to improve the quality and relevance of education research, but that's not all we need to do." Deborah Stipek. . . . .[[<http://ed.stanford.edu/suse/faculty/displayRecord.php?suid=stipek>]]. . . . . is the dean of the Stanford University School of Education.

*************************************
17. CAROL WEISS (2002) in "What to Do until the Random Assigner Comes": "The contributions to this volume have largely been appreciations of random assignment and its many virtues. I agree that it is ideal for purposes of establishing causality (I'd better if I don't want to be thrown out of this merry company) because it shows that the intervention was in fact responsible for the observed effects. BUT THERE ARE CIRCUMSTANCES WHEN RANDOM ASSIGNMENT IS VERY DIFFICULT, IF NOT IMPOSSIBLE, TO IMPLEMENT. One of those circumstances arises when the goal of the intervention is to change *not* the individuals but the community itself. Many such programs are currently in existence, programs that aim to 'revitalize,' 'transform,' or 'develop' the community - in the United States, in Europe with the European Community's 'social funds,' and in the developing countries.
Ultimately, the purpose of the intervention is to improve the well-being of the residents, but the intervention is not directed at individual residents so much as the conditions and workings of the neighborhood. The obvious solution to the difficulty with randomizing individuals is to randomize communities, that is, to assign communities randomly to program and control conditions. . . . . However, at the community level randomization faces three almost intractable problems: (a) small numbers, (b) funders' insistence on control of selection, and (c) variability across sites." (A minimal simulation illustrating Weiss's "small numbers" problem follows below.)
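To put rough numbers on Weiss's problem (a), here is a minimal power simulation in Python, assuming a hypothetical program effect of half a standard deviation of the between-community outcome distribution; the effect size, variability, and community counts are all invented for illustration and are not taken from Weiss or from any study cited above.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def power(n_per_arm, effect=0.5, sd=1.0, alpha=0.05, reps=4000):
    """Fraction of simulated community-randomized trials in which a
    two-sample t-test on community-level mean outcomes rejects at alpha."""
    hits = 0
    for _ in range(reps):
        control = rng.normal(0.0, sd, n_per_arm)    # community-level means
        program = rng.normal(effect, sd, n_per_arm) # shifted by the program
        if stats.ttest_ind(program, control).pvalue < alpha:
            hits += 1
    return hits / reps

for n in (4, 8, 16, 64):
    print("%3d communities per arm: power ~ %.2f" % (n, power(n)))

With the four or eight communities per arm typical of such initiatives, the simulated power stays well under 20%, while on the order of 64 communities per arm are needed to reach the conventional 80%; Weiss's problems (b) and (c) only make matters worse.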
Richard Hake
Honorary Member, Curmudgeon Lodge of Deventer, The Netherlands
President, PEdants for Definitive Academic References which Recognize the Invention of the Internet (PEDARRII)
<rrhake@xxxxxxxxxxxxx>
<http://www.physics.indiana.edu/~hake>
<http://www.physics.indiana.edu/~sdi>
<http://HakesEdStuff.blogspot.com>
<http://iub.academia.edu/RichardHake>

"In science education, there is almost nothing of proven efficacy." - Grover Whitehurst, RCT apostle and former director of the USDE's Institute of Education Sciences, as quoted by Sharon Begley (2004b)

"Physics educators have led the way in developing and using objective tests to compare student learning gains in different types of courses, and chemists, biologists, and others are now developing similar instruments. These tests provide convincing evidence that students assimilate new knowledge more effectively in courses including active, inquiry-based, and collaborative learning, assisted by information technology, than in traditional courses." - Wood & Gentile (2003)

"It is fruitful to view scientists as making convincing cases, cases that appeal to a wide variety of evidence. This assessment of scientific cases is called the 'platinum standard'." - Dennis Phillips (2006)

"I can't resist sharing a correlation/causation comic to this thread: <http://xkcd.com/552/>" - Sharon Osborn Popp (2010)

REFERENCES [TinyURLs courtesy <http://tinyurl.com/create.php>. All URLs accessed on 26 April 2010. The formatting is not commonly employed, but should be. It employs a blend of the *best* formatting features from the style manuals of the AIP (American Institute of Physics <http://www.aip.org/pubservs/style/4thed/toc.html>), APA (American Psychological Association <http://apastyle.apa.org/>), and CSE (Council of Science Editors <http://www.councilscienceeditors.org/publications/style.cfm>).]

AEA. 2003. American Evaluation Association, response to the U.S. Department of Education's "Scientifically Based Evaluation Methods: Studies capable of determining causality"; online at <http://www.eval.org/doestatement.htm>. NOTE: Some prominent AEA members disagreed with the above statement and issued a "Not AEA Statement" [Lipsey (2003)]: "This statement is intended to support the. . . .[[USDE's]]. . . definition and associated preference for the use of such designs for outcome evaluation when they are applicable. It is also intended to provide a counterpoint to the statement submitted by the AEA leadership as the Association's position on this matter. The generalized opposition to use of experimental and quasi-experimental methods evinced in the AEA statement is unjustified, speciously argued, and represents neither the methodological norms in the evaluation field nor the views of the large segment of the AEA membership with significant experience conducting experimental and quasi-experimental evaluations of program effects." The statement was signed by Leonard Bickman, Robert F. Boruch, Thomas D. Cook, David S. Cordray, Gary Henry, Mark W. Lipsey, Peter H. Rossi, & Lee Sechrest.

AERA. 2003. American Educational Research Association, letter to the Honorable Rod Paige, Secretary of Education; online at <http://www.eval.org/doeaera.htm>.

Begley, S. 2004a. "The Best Ways to Make Schoolchildren Learn? We Just Don't Know," Wall Street Journal, 10 December, page B1; online to Wall Street Journal subscribers (and possibly others) at <http://tinyurl.com/26bmsn4>. I thank Keith Tipton for bringing this article and its sequel [Begley (2004b)] to my attention.

Begley, S. 2004b.
"To Improve Education, We Need Clinical Trials To Show What Works," Wall Street Journal, 17 December, page B1; online to Wall Street Journal subscribers at <http://tinyurl.com/34v4uss> and to discussion-list followers in the APPENDIX of Hake (2005a). See also Begley (2004a).] Bernhardt, P.C. 2008. Re: Randomized Trials (was Can Pre-to-posttest Gains Gauge Course Effectiveness?)," TIPS post of 22 Oct 2008 05:22:44-0700; online on the OPEN! TIPS archives at <http://tinyurl.com/5a7hzk>. Burkhardt, H. & A.H. Schoenfeld. 2003. "Improving Educational Research: Toward a More Useful, More Influential, and Better-Funded Enterprise," Educational Researcher 32(9): 3-14; online to subscribers at <http://www.aera.net/publications/?id=401>. Campbell, D.T. 1988. "Methodology and epistemology for social science: Selected papers (S. Overman, ed.). Chicago: University of Chicago Press, publisher's information at <http://tinyurl.com/25xpu9p>. Amazon.com information at <http://tinyurl.com/2dshbfa>. An expurgated Google Book Preview is online at <http://tinyurl.com/22jm3c4>. Christensen, L.B. , R.B. Johnson, & L.A. Turner. 2010. "Research Methods, Design, and Analysis," 11th edition. Allyn and Bacon. Amazon.com information at <http://tinyurl.com/y5yoxnp>. Notes for the 3rd edition are at <http://www.sagepub.com/bjohnsonstudy/index.htm>. Cook, T.D. & M.R. Payne. 2002. "Objecting to the Objections to Using Random Assignment in Educational Research" in Mosteller & Boruch (2002). Cronbach, L.J., S.R. Ambron, S.M. Dornbusch, R.D. Hess, R.C. Hornik, D.C. Phillips, D.F. Walker, and S.S. Weiner. 1980. "Toward reform of program evaluation." Jossey Bass. Amazon.com information at <http://tinyurl.com/y42s7ra>. Crouch, C.H. & E. Mazur. 2001. "Peer Instruction: Ten years of experience and results," Am. J. Phys. 69: 970-977; online at <http://tinyurl.com/sbys4>. DeHaan, R.L. 2005. "The Impending Revolution in Undergraduate Science Education," Journal of Science Education and Technology 14(2): 253-269. The abstract, online at <http://tinyurl.com/ymwwe3>. reads: "There is substantial evidence. . . . .[[little, if any from RCT's]]. . . . .that scientific teaching in the sciences, i.e. teaching that employs instructional strategies that encourage undergraduates to become actively engaged in their own learning, can produce levels of understanding, retention and transfer of knowledge that are greater than those resulting from traditional lecture/lab classes. But widespread acceptance by university faculty of new pedagogies and curricular materials still lies in the future. In this essay we review recent literature that sheds light on the following questions: (1) What has evidence from education research and the cognitive sciences told us about undergraduate instruction and student learning in the sciences? (2) What role can undergraduate student research play in a science curriculum? (3) What benefits does information technology have to offer? (4)What changes are needed in institutions of higher learning to improve science teaching? We conclude that widespread promotion and adoption of the elements of scientific teaching by university science departments could have profound effects in promoting a scientifically literate society and a reinvigorated research enterprise." Donaldson, S., T.C. Christie, & M.M. Mark, eds., 2009. "What counts as credible evidence in applied research and evaluation?" Sage, publisher's information at <http://www.sagepub.com/booksProdDesc.nav?prodId=Book231785&;>. 
Amazon.com information at <http://tinyurl.com/ygtt6gs>; note the "Look Inside" feature. An expurgated Google Book Preview is online at <http://tinyurl.com/y2ezueg>. See also the "Credible Evidence in Evaluation" website at <http://sites.google.com/site/credibleevidence/Home>, with these headings: Contents, Reviews, About the Editors, Key Features, Buy the Book, Free Resources, Training, & Contact. Chapter 1, "In Search of the Blueprint for an Evidence-Based Global Society," is online at <http://www.cgu.edu/PDFFiles/sbos/Donaldson_Credible_Evidence_1.pdf> (193 kB).

Donaldson, S. 2009. "A Practitioner's Guide for Gathering Credible Evidence in the Evidence-Based Global Society," Epilogue in Donaldson et al. (2009, pp. 239-251); online at <http://www.cgu.edu/PDFFiles/sbos/Donaldson_Credible_Evidence_Epilogue.pdf> (172 kB).

Donovan, M.S. & J. Pellegrino, eds. 2003. "Learning and Instruction: A SERP Research Agenda," Academies Press; online at <http://books.nap.edu/catalog/10858.html>.

Educational Researcher. 2002. "Theme Issue on Scientific Research and Education" 31(8); online to subscribers at <http://www.aera.net/publications/?id=438>.

EES. 2007. European Evaluation Society <http://www.europeanevaluation.org>, EES Statement: "The Importance of a Methodologically Diverse Approach to Impact Evaluation - Specifically with Respect to Development Aid and Development Interventions." Nijkerk, The Netherlands: December; quoted in Donaldson (2009).

Eisenhart, M. & L. Towne. 2003. "Contestation and Change in National Policy on 'Scientifically Based' Education Research," Educational Researcher 32(7): 31-38; online to subscribers as a 176 kB pdf at <http://edr.sagepub.com/cgi/reprint/32/7/31>: "In this article, we examine the definitions of 'scientifically based research' in education that have appeared in recent national legislation and policy. These definitions, now written into law in the No Child Left Behind Act of 2001 and the Education Sciences Reform Act of 2002, and the focus of [Shavelson & Towne (2002)], are being used to affect decisions about the future of education programs and the direction of education research."

English, L.D. 2008. "Handbook of international research in mathematics education," 2nd edition. Routledge; publisher's information at <http://tinyurl.com/3szytm>. Amazon.com information at <http://tinyurl.com/y5uf5tq>. An expurgated Google Book Preview is online at <http://tinyurl.com/y3euscn>.

Feuer, M.J., L. Towne, & R.J. Shavelson. 2002a. "Scientific Culture and Educational Research," Educational Researcher 31(8): 4-14; online to subscribers at <http://edr.sagepub.com/cgi/reprint/31/8/4>.

Feuer, M.J., L. Towne, & R.J. Shavelson. 2002b. Comments on responses to Shavelson & Towne (2002) in Educational Researcher (2002) - see also Feuer et al. (2002a).

Hake, R.R. 1998a. "Interactive-engagement vs traditional methods: A six-thousand-student survey of mechanics test data for introductory physics courses," Am. J. Phys. 66: 64-74; online at <http://www.physics.indiana.edu/~sdi/ajpv3i.pdf> (84 kB). [A Google search for "Interactive-engagement vs traditional methods" (with the quotes) netted 7,060 hits on 24 April 2010.]

Hake, R.R. 1998b. "Interactive-engagement methods in introductory mechanics courses," online at <http://www.physics.indiana.edu/~sdi/IEM-2b.pdf> (108 kB). A crucial companion paper to Hake (1998a).

Hake, R.R. 2002. "Lessons from the physics education reform effort," Ecology and Society 5(2): 28; online at <http://www.ecologyandsociety.org/vol5/iss2/art28/>.
For an update on six of the lessons on "interactive engagement" see Hake (2007).

Hake, R.R. 2004. "Direct Instruction Suffers a Setback in California - Or Does It?" contributed to the 129th National AAPT meeting in Sacramento, CA, 1-5 August 2004; online at <http://www.physics.indiana.edu/~hake/DirInstSetback-041104f.pdf> (420 kB).

Hake, R.R. 2005a. "Re: 'To Improve Education, We Need Clinical Trials To Show What Works,'" AERA-L post of 10 Jan 2005 16:01:05 -0800; online at <http://tinyurl.com/yzjz5vp>. The APPENDIX contains a copy of Begley (2004b) as allowed by the "fair use" provision for copyrighted material under section 107 of U.S. Copyright Law - see, e.g., <http://www.law.cornell.edu/uscode/17/107.shtml>.

Hake, R.R. 2005b. "Should Randomized Control Trials Be the Gold Standard of Educational Research?" online on the OPEN! AERA-L archives at <http://tinyurl.com/ybcexn8>. Post of 5 Apr 2005 20:28:30 -0700 to AERA-C, AERA-D, AERA-G, AERA-H, AERA-J, AERA-K, AERA-L, AP-Physics, ASSESS, Biopi-L, Chemed-L, EvalTalk, Math-Learn, Phys-L, Physhare, PhysLrnR, STLHE-L, & TIPS.

Hake, R.R. 2005c. "Scientifically Based Practice," online on the OPEN! AERA-L archives; post of 15 Apr 2005 17:01:00-0700. The APPENDIX contains a copy of Stipek (2005) in accord with the "fair use" provision for copyrighted material under section 107 of U.S. Copyright Law. See also Hake (2005a,b).

Hake, R.R. 2005d. "The Physics Education Reform Effort: A Possible Model for Higher Education?" online at <http://www.physics.indiana.edu/~hake/NTLF42.pdf> (100 kB). This is a slightly edited version of an article that was (a) published in the National Teaching and Learning Forum 15(1), December, online to subscribers at <http://www.ntlf.com/FTPSite/issues/v15n1/physics.htm> (if your institution doesn't subscribe, then it should), and (b) disseminated by the "Tomorrow's Professor" list <http://ctl.stanford.edu/Tomprof/postings.html> as Msg. 698 on 14 Feb 2006.

Hake, R.R. 2005e. "Will the No Child Left Behind Act Promote Direct Instruction of Science?" Am. Phys. Soc. 50: 851; APS March Meeting, Los Angeles, CA, 21-25 March; online at <http://www.physics.indiana.edu/~hake/WillNCLBPromoteDSI-3.pdf> (256 kB). The abstract reads: "The No Child Left Behind (NCLB) Act requires testing in science achievement starting in 2007. Will such testing tend to propagate California's Direct Science Instruction (DSI) [Hake (2004)] throughout the entire nation? After discussing the evidence for the superiority of 'interactive engagement' or 'guided inquiry' methods over DSI in conceptually difficult areas of science, I indicate seven reasons why NCLB might promote DSI, and one reason - possible *effective* intervention by the National Research Council - why it might not."

Hake, R.R. 2006. "Possible Palliatives for the Paralyzing Pre/Post Paranoia that Plagues Some PEP's" [PEP's = Psychologists, Education Specialists, and Psychometricians], Journal of MultiDisciplinary Evaluation, Number 6, November; online at <http://survey.ate.wmich.edu/jmde/index.php/jmde_1/article/view/41/50>. This even despite the admirable anti-alliteration advice at psychologist Donald Zimmerman's site <http://mypage.direct.ca/z/zimmerma/> to "Always assiduously and attentively avoid awful, awkward, atrocious, appalling, artificial, affected alliteration."

Hake, R.R. 2007. "Six Lessons From the Physics Education Reform Effort," Latin American Journal of Physics Education; online at <http://journal.lapen.org.mx/sep07/HAKE%20Final.pdf> (124 kB).

Hake, R.R. 2008a.
"Randomized Trials (was Can Pre-to-posttest Gains Gauge Course Effectiveness?)" online on the OPEN! AERA-D archives at <http://tinyurl.com/yc3dg6z>. Post of 21 Oct 2008 17:03:30-0700 to AERA-D, ASSESS, EdResMeth, EvalTalk, PhysLrnR, POD, and TIPS. See also Hake (2008b). Hake, R.R. 2008b. "Randomized Trials - ADDENDUM" [response to Bernhardt (2008)], online on the OPEN! AERA-D archives at <http://tinyurl.com/yhxcqbq>. Post of 22 Oct 2008 11:34:36-0700 to AERA-D, ASSESS, EdResMeth, EvalTalk, PhysLrnR, POD, and TIPS. Hake, R.R. 2008c. "Can Pre-to-posttest Gains Gauge Course Effectiveness?" online on the OPEN! AERA-D archives at <http://tinyurl.com/6a393m>. Post of 18 Oct 2008 12:05:53-0700 to AERA-D, ASSESS, EdStat-L, EdResMeth, EvalTalk, PhysLrnR, and POD. Hake, R.R. 2008d. "Can Pre-to-posttest Gains Gauge Course Effectiveness? #2," online on the OPEN! AERA-D archives at <http://tinyurl.com/27d2bwt>. Post of 19 Oct 2008 16:08:08-0700 to AERA-D, ASSESS, EdResMeth, EvalTalk, PhysLrnR, & POD. Most of academia is either oblivious or dismissive of pre/post testing demonstrations of causality in education research, but see Stokstad (2001), DeHaan (2005), Wood & Gentile (2003), Michael (2006), and Hake (1998a,b; 2002; 2005d,e; 2006; 2007; 2008c,e,f,g,h; 2010a,b); Hake, R.R. 2008e. "Can Pre-to-posttest Gains Gauge Course Effectiveness? #2," online on the OPEN! POD archives at <http://tinyurl.com/2emx4e8>. Post of 20 Oct 2008 10:17:40-0700 to EvalTalk, PhysLrnR, & POD. Contains Ed Nuhfer's cogent comment on Dennis Roberts's vacuous statement ". . . when we try to use gain scores (whatever form) to decide about course effectiveness, one is in a funk as to being able to know what effectiveness is a result of." Shortly thereafter Roberts kicked me of his EdStat list. Hake, R.R. 2008f. "Can Pre-to-posttest Gains Gauge Course Effectiveness? #3," online on the OPEN! AERA-D archives at <http://tinyurl.com/2bb6u3y>. Post of 24 Oct 2008 17:44:44 -0700 to AERA-D. The abstract and link to the full post was transmitted to ASSESS, EdStat-L, EdResMeth, EvalTalk, & POD. Therein I wrote: "In my opinion, one should treat Bill Becker's discussion of assessment in areas outside his own field of economics with caution." Hake, R.R. 2008g. "Can Pre-to-posttest Gains Gauge Course Effectiveness? #3 - ADDENDUM," online on the OPEN! AERA-D archives at <http://tinyurl.com/29h3nbv>. Post of 27 Oct 2008 12:45:40 -0700 to AERA-D. The abstract and link to the full post was transmitted to ASSESS, EdStat-L, EdResMeth, EvalTalk, & POD. The abstract reads [see that post for references other than Hake (1998a, 2002)]:"Bill Becker's (2001) criticisms of my survey of introductory physics courses [Hake (1998a)] were shown to be problematic in the section 'Criticisms of the Survey' of 'Lessons from the physics education reform effort' [Hake (2002)]. But Becker, in most of his more recent criticisms [Becker (2004, 2008)] of Hake (1998a) has essentially replayed his earlier statements, essentially ignoring the counters to his criticism contained in Hake (2002) - I give herewith six examples. In my opinion: (a) such non-recognition of counter arguments hardly serves as a model for 'The Scholarship of Teaching and Learning in Higher Education' [Becker & Andrews (2004)]; (b) like biology [Klymkowsky et al. (2003)] and engineering [Smith et al. (2005)], economics education might have something to learn from physics education research [Simkins & Maier (2008)]." Hake, R.R. 2008h. 
"Design-Based Research in Physics Education Research: A Review," in Kelly, Lesh, & Baek (2008)]. A pre-publication version of that chapter is online at <http://www.physics.indiana.edu/~hake/DBR-Physics3.pdf> (1.1 MB). The abstract reads: "In this chapter I argue that some physics education research (PER) is design-based research (DBR) and that an important DBR-like facet of PER, the pre/post testing movement, has the potential to improve drastically the effectiveness of undergraduate instruction generally, the education of pre-service teachers in particular, and, as a net result, the education of the general population." Hake, R.R. 2010a. "Should We Measure Change? Yes!" online at <http://www.physics.indiana.edu/~hake/MeasChangeS.pdf> (2.5 MB) and as ref. 43 at <http://www.physics.indiana.edu/~hake>. To appear as a chapter in "Evaluation of Teaching and Student Learning in Higher Education" [Hake (in preparation)]. The abstract reads (slightly updated): "Formative pre/post testing is being successfully employed to improve the effectiveness of courses in undergraduate astronomy, biology, chemistry, economics, engineering, geoscience, mathematics, and physics. But such testing is still anathema to many members of the psychology-education-psychometric (PEP) community. I argue that this irrational bias impedes a much needed enhancement of student learning in higher education. I then review the development of diagnostic multiple-choice tests of higher-level learning; normalized gain and ceiling effects; the documented two-sigma superiority of interactive engagement (IE) to traditional passive-student pedagogy in the conceptually difficult subject of Newtonian mechanics; the probable neuronal basis for such superiority; education's lack of a community map; higher education's resistance to change and its related failure to improve the public schools; and, finally, why we should be concerned with student learning."A severely truncated version is online at Hake (2006). Hake, R.R. 2010b. "Re: Quality Research in Literacy and Science Education: International Perspectives and Gold Standards," online on the OPEN! AERA-L archives at <http://tinyurl.com/yhhbu72>. Post of 22 Feb 2010 14:04:43-0800 to AERA-L and Net-Gold. The abstract was sent to various discussion list and also appears at <http://hakesedstuff.blogspot.com/2010/02/re-quality-research-in-literacy-and.html> with a provision for comments. In the abstract I wrote: " . . . . the authors contributing to [this book] appear to be either dismissive or oblivious of physics education research, *inconsistent* with the generally positive opinions of most observers. . . . .[[e.g., Stokstad (2001), DeHaan (2005), Wood & Gentile (2003), Michael (2006)]]. . . . .For example, Millar and Osborne make the following erroneous claims (paraphrasing): "No standard or commonly agreed outcome measures exist for any major topic. Published assessment tools such as the 'Force Concept Inventory' have not been subjected to the kind of rigorous scrutiny of factorial structure and content validity that would be standard practice for measures of attainment or learning outcome in other subject areas." House, E.R. 1991. "Realism in research," Educational Researcher 20(6): 2-9, 25; online to subscribers at <http://edr.sagepub.com/cgi/reprint/20/6/2>. Howe, K.R. 2009a. Educational Researcher 38(6): 428-440; online to subscribers at <http://edr.sagepub.com/cgi/reprint/38/6/428>. 
This article is in a section titled "Epistemology, Methodology, and Education Sciences," online to subscribers at <http://edr.sagepub.com/content/vol38/issue6/>, that also contains responses to Howe from Eric Bredo, R. Burke Johnson, and Linda C. Tillman, plus comments on those responses by Howe (2009b).

Howe, K.R. 2009b. "Straw Makeovers, Dogmatic Holism, and Interesting Conversation," response to the comments by Bredo, Johnson, and Tillman, Educational Researcher 38(6): 463-466; online to subscribers at <http://edr.sagepub.com/cgi/reprint/38/6/463>.

Johnson, R.B. 2001. "Toward a New Classification of Nonexperimental Quantitative Research," Educational Researcher 30(2): 3-13; online to subscribers as a 1.1 MB pdf at <http://tinyurl.com/25nenq3>.

Johnson, R.B. 2010. "Re: Cause and Effect," EdResMeth post of 6 Apr 2010 15:21:56-0500; online at <http://tinyurl.com/235aedh>. To access the archives of EdResMeth one needs to subscribe, but that takes only a few minutes by clicking on <http://listserv.uconn.edu/edresmeth-l.html> and then clicking on "Join or leave the list (or change settings)." If you're busy, then subscribe using the "NOMAIL" option under "Miscellaneous." Then, as a subscriber, you may access the archives and/or post messages at any time, while receiving NO MAIL from the list! See also Christensen, Johnson, & Turner (2010) and Johnson (2001).

Kelly, A.E. 2003. "Research as Design," Educational Researcher 32(1): 3-4; online to subscribers at <http://www.aera.net/publications/?id=393>. See also Kelly, Lesh, & Baek (2008).

Kelly, A.E., R.A. Lesh, & J.Y. Baek. 2008. "Handbook of Design Research Methods in Education: Innovations in Science, Technology, Engineering, and Mathematics Learning and Teaching." Routledge; publisher's information at <http://tinyurl.com/4eazqs>. Amazon.com information at <http://tinyurl.com/5n4vvo>.

Lareau, A. & P.B. Walters. 2010. "What Counts as Credible Research?" Teachers College Record, 01 March; online at <http://www.susanohanian.org/show_research.php?id=343>. See also Walters, Lareau, & Ranis (2009).

Lipsey, M. 2003. "NOT the AEA statement on Scientifically Based Evaluation," EvalTalk post of 3 Dec 2003 13:22:10-0600; online at <http://tinyurl.com/y5v2fg9>. To access the archives of EvalTalk one needs to subscribe, but that takes only a few minutes by clicking on <http://bama.ua.edu/archives/evaltalk.html> and then clicking on "Join or leave the list (or change settings)." If you're busy, then subscribe using the "NOMAIL" option under "Miscellaneous." Then, as a subscriber, you may access the archives and/or post messages at any time, while receiving NO MAIL from the list!

Mark, M. 2009. "Credible Evidence," in Donaldson et al. (2009, pp. 214-238); portions are accessible at the Google Book Preview of Donaldson et al. (2009) at <http://tinyurl.com/y2ezueg>, including most of Mark's discussion on pp. 221-232 of Scriven's (2009) *hypothetical* pre/post test demonstration of causality. In my opinion it would have been more relevant to the real world of evaluation if Mark had discussed the pre/post-test experiments [Hake (1998a,b), Crouch & Mazur (2001), Mazur (2010)] which approximate an actualization of Scriven's hypothetical example.

Maxwell, J.A. 2004. "Causal Explanation, Qualitative Research, and Scientific Inquiry in Education," Educational Researcher 33(2): 3-11; online to subscribers at <http://edr.sagepub.com/cgi/reprint/33/2/3>.

Mazur, E. 2010. "Confessions of a Converted Lecturer," talk at the University of Maryland on 11 November 2009.
Mazur, E. 2010. "Confessions of a Converted Lecturer," talk at the University of Maryland on 11 November 2009. The abstract reads: "I thought I was a good teacher until I discovered my students were just memorizing information rather than learning to understand the material. Who was to blame? The students? The material? I will explain how I came to the agonizing conclusion that the culprit was neither of these. It was my teaching that caused students to fail! I will show how I have adjusted my approach to teaching and how it has improved my students' performance significantly." That talk is now on YouTube at <http://www.youtube.com/watch?v=WwslBPj8GgI> (click on the view number to see a graph of "Total Views" vs time); the abstract, slides, and references - sometimes obscured in the YouTube video - are at <http://tinyurl.com/ybc53jw> as a 4 MB pdf. As of 26 April 2010 10:20:00-0700 Eric's talk had been viewed 18,093 times on YouTube, up from 12,800 on 16 March 2010. In contrast, serious articles in the education literature, often read only by the author and a few cloistered academic specialists, usually create tsunamis in educational practice equivalent to those produced by a pebble dropped into the Pacific Ocean.
Michael, J. 2006. "Where's the evidence that active learning works?" Advances in Physiology Education 30: 159-167; online at <http://tinyurl.com/ykzp7lt>. The abstract reads: "Calls for reforms in the ways we teach science at all levels, and in all disciplines, are widespread. The effectiveness of the changes being called for, employment of student-centered, active learning pedagogy, is now well supported by evidence. The relevant data. . . . [[little, if any, from RCTs]]. . . . . have come from a number of different disciplines that include the learning sciences, cognitive psychology, and educational psychology. There is a growing body of research within specific scientific teaching communities that supports and validates the new approaches to teaching that have been adopted. These data are reviewed, and their applicability to physiology education is discussed. Some of the inherent limitations of research about teaching and learning are also discussed."
Miles, J. 2001. "Research Methods and Statistics." Crucial Publishers. Amazon.com information at <http://www.amazon.co.uk/exec/obidos/ASIN/1903337151/jeremymiles>.
Miles, J. 2008. "Re: Can Pre-to-posttest Gains Gauge Course Effectiveness? #2," AERA-D post of 19 Oct 2008 20:39:41-0700; online on the OPEN! AERA-D archives at <http://tinyurl.com/29ekn4o>. Jeremy Miles manages a Psychology Research Methods Wiki at <http://www.researchmethodsinpsychology.com/wiki/index.php?title=Main_Page> based on his book "Research Methods and Statistics" [Miles (2001)].
Mosteller, F. & R. Boruch, eds. 2002. "Evidence Matters: Randomized Trials in Education Research." Brookings Institution. Amazon.com information at <http://tinyurl.com/59gp6o>.
NAE. 1999. National Academy of Education report "Recommendations Regarding Research Priorities: An Advisory Report to the National Educational Research Policy and Priorities Board," online at <http://www.naeducation.org/Research_Priorities_Publication.pdf> (217 kB).
NEA. 2003. National Education Association, letter to the Honorable Rod Paige, Secretary of Education, at <http://www.eval.org/doe.nearesponse.pdf> (88 kB).
Pawson, R. & N. Tilley. 1997. "Realistic Evaluation." Sage; publisher's information at <http://www.sagepub.com/booksProdDesc.nav?prodId=Book205276>. Amazon.com information at <http://tinyurl.com/22kqar9>. Note the searchable "Look Inside" feature. An overview by Tilley, presented at the Founding Conference of the Danish Evaluation Society, September 2000, is at <http://tinyurl.com/26rjcs2> as a 53 kB pdf.
Pawson, R. 2006. "Evidence-Based Policy: A Realist Perspective." Sage; publisher's information at <http://www.sagepub.com/booksProdDesc.nav?prodId=Book227875>. Amazon.com information at <http://tinyurl.com/24ppavv>. Note the searchable "Look Inside" feature.
Phillips, D.C. 2000. "Expanded Social Scientist's Bestiary: A Guide to Fabled Threats To, and Defenses of, Naturalistic Social Science." Rowman & Littlefield; information at <http://tinyurl.com/ycmlvy>. The late Paul Meehl <http://en.wikipedia.org/wiki/Paul_E._Meehl> wrote: "Should be required reading for all Ph.D. candidates in social science. It is a mind clearing analysis of the highest order, prophylactic and curative of the numerous methodological and substantive ills that afflict us. It is especially needed today when the 'positivist-bashers' are using the Vienna Circle's mistakes and Kuhn's exaggerations for obscurantist purposes."
Phillips, D.C. & N.C. Burbules. 2000. "Postpositivism and Educational Research." Rowman & Littlefield; publisher's information at <http://tinyurl.com/yncvls>. Amazon.com information at <http://tinyurl.com/yelju39>. See especially "Mistaken accounts of positivism," pp. 11-14.
Phillips, D.C. 2006. "A guide for the perplexed: Scientific educational research, methodolatry, and the gold versus platinum standards," Educational Research Review 1(1): 15-26; an abstract, online at <http://tinyurl.com/yhztz6w>, reads: ". . . . the main discussion focusses upon the end of this continuum where there are located the recent attempts to restore rigor to educational research by using the so-called 'gold standard' of randomized field trials. It is argued that. . . .[[this misrepresents the nature of science]]. . . . ., and some examples are briefly mentioned in order to convey the point that IT IS FRUITFUL TO VIEW SCIENTISTS AS MAKING CONVINCING CASES, cases that appeal to a wide variety of evidence. This assessment of scientific cases is called the 'PLATINUM STANDARD'." See also Phillips (2009).
Phillips, D.C. 2009. "A Quixotic quest? Philosophical issues in assessing the quality of education research," in Walters et al. (2009, pp. 163-195); accessible via Amazon's "Look Inside" feature at <http://tinyurl.com/yyc8jd9>.
Popkewitz, T.S. 2004. "Is the National Research Council Committee's report on Scientific Research in Education scientific? On trusting the manifesto," Qualitative Inquiry 10(1): 62-78; an abstract is online at <http://qix.sagepub.com/cgi/content/abstract/10/1/62>.
Popp, S.O. 2010. EdResMeth post of 7 Apr 2010 07:42:42-0700; online at <http://tinyurl.com/29d6jzt>.
Rudd, A. 2010. "Cause and Effect," EdResMeth post of 6 Apr 2010 13:44:43-0400; online at <http://tinyurl.com/y7ffx54>. For directions on accessing the EdResMeth archives see the Johnson (2010) entry above.
"Estimating Causal Effects Using Experimental and Observational Designs: A Think Tank White Paper" AERA, publisher's information and FREE download at <http://www.aera.net/publications/Default.aspx?menu_id=46&id=3360>. Schoenfeld, A.H. 2002. "Research methods in (mathematics) education." in L.D. English, ed. "Handbook of international research in mathematics education," (pp. 467-488). Erlbaum. Amazon.com information at <http://tinyurl.com/y6ssmn3>. See also English (2008). For those who can read between the lines the Google Book Preview <http://tinyurl.com/y3euscn>of English (2008) contained the following pages of Schoenfeld's article when I examined it on 24 April 2010: pp. 467-469, 471, 473-476, 481-485, 489-492, 496, 498-501, 506-507, 510-514. But which pages can and cannot be seen may depend on the circumstances. At <http://tinyurl.com/6nl27k> Google states: "Many of the books you can preview on Google Books are still in copyright, and are displayed with the permission of publishers and authors. You can browse these 'limited preview' titles just as you would in a bookstore, but you won't be able to see more pages than the copyright holder has made available. When you've accessed the maximum number of pages allowed for a book, any remaining pages will be omitted from your preview. Scriven, M. 2008. "A Summative Evaluation of RCT Methodology: & An Alternative Approach to Causal Research" Journal of Multidisciplinary Evaluation 5(9): 11-24; online at <http://survey.ate.wmich.edu/jmde/index.php/jmde_1/article/view/160/186>. Scriven, M. 2009. "Demythologizing Causation and Evidence" in Donaldson et al. (2009, pp. 134-152). Two points: (1) Scriven gives a hypothetical pre/post test demonstration of causality which has been discussed by Mark (2009). But, in my opinion, it would have been more relevant to the real world of evaluation if Mark had discussed the pre/post-test experiments [Hake (1998a,b), Crouch & Mazur (2001), Mazur (2010)] which approximate an actualization of Scriven's hypothetical example. (2) Scriven's reference to "A critical appraisal of the case against using experiments to assess school (or community) effects" [Cook (2001)] states that it's online at <http://www.hoover.org/publications/ednext/3399216.html>. But that URL yields a redirect to <http://educationnext.org/> where a search for "Cook" yields only the non-scholarly popularization "Sciencephobia: Why education rejects randomized experiments," Education Next 1(3): 62-68 (2001) at <http://educationnext.org/sciencephobia/>. For the non-Hooverized article "A critical appraisal of the case against using experiments to assess school (or community) effects" [Cook (2001)] as originally written by Cook click on <http://media.hoover.org/documents/ednext20013unabridged_cook.pdf> (131 kB) Scriven, M. 2010. "Rethinking Evaluation Methodology," Journal of MultiDisciplinary Evaluation 6(13), online at <http://survey.ate.wmich.edu/jmde/index.php/jmde_1/article/view/264/253>. Shadish, W.R., T.D. Cook, & D.T. Campbell. 2002. "Experimental and Quasi-Experimental Designs for Generalized Causal Inference." Amazon.com information at <http://www.amazon.com/dp/0395615569/?tag=katlin-20> A goldmine of references to social-science research. Portions of the book are online at <http://depts.washington.edu/methods/readings/Shadish.pdf> (1.8 MB). Shavelson, R.J. & L. Towne, eds., 2002. "Scientific Research in Education" (SRE), National Academy Press; online at <http://www.nap.edu/catalog/10236.html>. 
Shavelson, R.J. & L. Towne, eds. 2002. "Scientific Research in Education" (SRE). National Academy Press; online at <http://www.nap.edu/catalog/10236.html>. On page 114 it is stated: "IN SOME SETTINGS, WELL-CONTROLLED QUASI-EXPERIMENTS MAY HAVE GREATER 'EXTERNAL VALIDITY' - GENERALIZABILITY TO OTHER PEOPLE, TIMES, AND SETTINGS - THAN EXPERIMENTS WITH COMPLETELY RANDOM ASSIGNMENT (Cronbach et al., 1980; Weiss, 1998)." Among members of the Academy's "Committee on Scientific Principles for Education Research" that authored SRE were (aside from Shavelson & Towne): Robert Boruch, Jere Confrey, Robert DeHaan, Margaret Eisenhart, Eugene Garcia, Norman Hackerman, Eric Hanushek, Ellen Condliffe Lagemann, Dennis Phillips, and Carol Weiss. See also: (a) the Educational Researcher (2002) theme issue 31(8), online to subscribers at <http://edr.sagepub.com/content/vol31/issue8/>, carrying responses to SRE and a reply to the responses by Feuer, Towne, & Shavelson (2002b); (b) Eisenhart & Towne's (2003) review of definitions of "scientifically based research"; (c) Maxwell's (2004) critique of SRE's emphasis on quantitative research as the sole warrant for causality; (d) Towne, Wise, & Winters' (2004) sequel to SRE, "Advancing Scientific Research in Education"; (e) Popkewitz's (2004) "Is the National Research Council Committee's report on Scientific Research in Education scientific?"; and (f) Howe's (2009a,b) discussion of SRE as "the new scientific orthodoxy." Shavelson & Towne (2002, pp. 3-5) wrote: "The Committee argued that ALL THE SCIENCES, INCLUDING SCIENTIFIC EDUCATIONAL RESEARCH, SHARED A SET OF EPISTEMOLOGICAL OR FUNDAMENTAL GUIDING PRINCIPLES, and that all scientific endeavors should: (a) pose significant questions that can be investigated empirically, (b) link research to relevant theory, (c) use methods that permit direct investigation of the questions, (d) provide a coherent and explicit chain of reasoning, (e) attempt to yield findings that replicate and generalize across studies, and (f) disclose research data and methods to enable and encourage professional scrutiny and critique."
Shelley, M.C., L.D. Yore, & B. Hand, eds. 2009a. "Quality Research in Literacy and Science Education: International Perspectives and Gold Standards." Springer; publisher's information at <http://www.springerlink.com/content/g2447682464446x2/>. Amazon.com information at <http://tinyurl.com/yf7efra>; note the searchable "Look Inside" feature. Barnes & Noble information at <http://tinyurl.com/y8n9pe9>. An expurgated (teaser) version is online as a Google "book preview" at <http://tinyurl.com/yddphh3>. For a lukewarm review see Hake (2010b).
Shelley, M.C., L.D. Yore, & B. Hand. 2009b. "Education Research Meets the 'Gold Standard': Evaluation, Research Methods, and Statistics after No Child Left Behind," Chapter 1, pages 3-18, in Shelley et al. (2009a). Surprisingly, the Google book preview of Shelley et al. (2009a) at <http://tinyurl.com/yddphh3> contains all of pages 3-15. To see this, use the ">" at the top of the first page to go to page vi and then click on chapter 1.
Stipek, D. 2005. "Scientifically Based Practice: It's About More Than Improving the Quality of Research," Education Week 24(28): 33-34; online to discussion-list followers in the APPENDIX of Hake (2005c) at <http://tinyurl.com/29almt3>, as allowed by the "fair use" provision for copyrighted material under section 107 of U.S. Copyright Law - see, e.g., <http://www.law.cornell.edu/uscode/17/107.shtml>.
Stokes, D.E. 1997. "Pasteur's Quadrant: Basic Science and Technological Innovation." Brookings Institution Press; publisher's information at <http://www.brookings.edu/press/Books/1997/pasteur.aspx>. Amazon.com information at <http://tinyurl.com/lto97>.
Stokstad, E. 2001. "Reintroducing the Intro Course," Science 293: 1608-1610, 31 August; an abstract is online at <http://www.sciencemag.org/cgi/content/summary/293/5535/1608>. Stokstad wrote: "Physicists are out in front in measuring how well students learn the basics, as science educators incorporate hands-on activities in hopes of making the introductory course a beginning rather than a finale."
Thompson, B. 2010. "Re: Cause and Effect," EdResMeth post of 6 Apr 2010 13:00:55-0500; online at <http://tinyurl.com/yb2t3o9>. Thompson wrote: "If you use (a) regression discontinuity designs. . . . . .[[see, e.g., Shadish et al. (2002, pp. 207-245)]]. . . ., or (b) create a control group using propensity scores. . . .[[op. cit., p. 122 & pp. 161-165]]. . . . I think you can come reasonably close to a true experiment." For Scriven's dour view of the phrase "true experiment" see Gold Standard Skeptic Statement #14 in this post. For an illustrative sketch of propensity-score matching, see the postscript at the end of this reference list.
Towne, L., L.L. Wise, & T.M. Winters, eds. 2004. "Advancing Scientific Research in Education." National Academies Press; online at <http://www.nap.edu/catalog.php?record_id=11112>.
USDE. 2005. "Scientifically Based Evaluation Methods; Notice," Federal Register 70(15), 25 January, Part II, Dept. of Education; online as a 111 kB pdf at <http://tinyurl.com/y4w3ygm>.
USDE. 2008. U.S. Dept. of Education, "What Works Clearinghouse Evidence Standards For Reviewing Studies, Version 1.0, Revised May 2008," online at <http://ies.ed.gov/ncee/wwc/pdf/wwc_version1_standards.pdf> (147 kB). See also USDE (2005).
Walters, P.B., A. Lareau, & S. Ranis, eds. 2009. "Education research on trial: Policy reform and the call for scientific rigor." Routledge; publisher's information at <http://www.routledge.com/books/details/9780415989893/>. Amazon.com information at <http://tinyurl.com/yyc8jd9>. Note the searchable "Look Inside" feature.
Weiss, C.H. 1997. "Evaluation: Methods for studying programs and policies." Prentice Hall, 2nd edition. Amazon.com information at <http://tinyurl.com/2bv6239>.
Weiss, C.H. 2002. "What to Do until the Random Assigner Comes," in Mosteller & Boruch (2002). See also Weiss (1997).
Wood, W.B. & J.M. Gentile. 2003. "Teaching in a research context," Science 302: 1510, 28 November; online to subscribers at <http://www.sciencemag.org/cgi/reprint/302/5650/1510.pdf>. A summary is online to all at <http://www.sciencemag.org/cgi/content/summary/302/5650/1510>.
Yore, L.D. & S. Lerman. 2008. "Metasynthesis of qualitative research studies in mathematics and science education" (editorial), International Journal of Science and Mathematics Education 6(2): 217-223. The first page is online at <http://www.springerlink.com/content/m2201uq34245670g/>.
Yore, L.D. & P. Boscol. 2009. "Why 'Gold Standard' Needs Another 's': Results from the Gold Standard(s) in Science and Literacy Education Research Conference," Chapter 2, pages 17-39, in Shelley et al. (2009a). On 25 April 2010 I was able to access this entire article, except for pages 18 & 19, at the Google book preview of Shelley et al. (2009a) at <http://tinyurl.com/yddphh3>. Which pages can and cannot be seen may depend on the circumstances; Google's "limited preview" policy, quoted in the Schoenfeld (2002) entry above, is stated at <http://tinyurl.com/6nl27k>, which adds: "You can order full copies of any book using the 'Get this book' links to the side of the preview page." See also Yore & Lerman (2008).
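A methodological postscript for readers intrigued by Thompson's (2010) mention of propensity scores above: the idea, roughly, is to estimate each subject's probability of receiving the treatment from observed covariates and then match treated to untreated subjects with similar probabilities, approximating the covariate balance that randomization would have produced [see Shadish et al. (2002, pp. 161-165)]. The toy sketch below, in Python, is merely illustrative and is not any cited author's actual procedure; the simulated data and variable names are my own invention.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Simulated observational data: students self-select into an
# interactive-engagement (IE) course partly on prior preparation,
# so a naive treated-vs-untreated comparison is confounded.
n = 1000
prep = rng.normal(0, 1, n)                       # covariate: prior preparation
p_treat = 1 / (1 + np.exp(-prep))                # better-prepared students choose IE more often
treated = rng.random(n) < p_treat
gain = 0.3 * treated + 0.2 * prep + rng.normal(0, 0.3, n)  # true IE effect = 0.3

# Step 1: estimate propensity scores from the observed covariate(s).
X = prep.reshape(-1, 1)
ps = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]

# Step 2: match each treated subject to the untreated subject with
# the nearest propensity score (1-to-1, with replacement).
t_idx = np.where(treated)[0]
c_idx = np.where(~treated)[0]
matches = c_idx[np.abs(ps[c_idx][None, :] - ps[t_idx][:, None]).argmin(axis=1)]

# Step 3: compare outcomes within matched pairs.
naive = gain[treated].mean() - gain[~treated].mean()   # confounded estimate
matched = (gain[t_idx] - gain[matches]).mean()         # closer to the true 0.3
print(f"naive difference:   {naive:.2f}")
print(f"matched difference: {matched:.2f}")

The naive difference overstates the effect because the treated group was better prepared to begin with; matching on the propensity score recovers something close to the true effect, provided, and this is the usual caveat, that all confounders are actually observed.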