What Counts as Credible Evidence in Applied Research and Evaluation Practice? by Stewart I. Donaldson
The evaluation field continues to engage in paradigm wars that involve heated debates over which approaches and methodologies produce the most reliable results to support evidence-based policy-making. Somewhat regrettably, the commendable goal of enhanced rigour in evaluation research has been hijacked by a focus on a narrow set of experimental methods (randomized controlled trials, or RCTs), which have been proclaimed as the ‘gold standard’ by their proponents. This trend has been boosted by calls for unambiguous measurements of results, impacts and cost-efficiency from policy-makers and bureaucrats struggling to make policy choices and undertake programs under increasing resource constraints. On the other side, the reaction from the proponents of more qualitative methodologies and participatory approaches to evaluation has been strong, even emotional at times. As a professional evaluator, I’ve witnessed these brawls first hand.

This book makes an excellent contribution to the debate through a balanced presentation of the issues and by letting the different sides make their respective cases. The authors in the book include a number of leading scholars and practitioners in the field. The perspective is North American (all authors work in the US or Canada) and draws heavily on experiences from education and social services. Although my own work pertains to evaluating international development programs, I found the discussion in the book on what constitutes credible evidence very valuable.

In the two introductory chapters the editors frame the debate in the context of a search for an evidence-based society and how this has played out in the use of experimental and non-experimental designs for collecting evidence. Quite didactically, they place the debate within the broader scientific paradigms of social inquiry, including logical positivism, post-positivism, constructivism and related thinking, and pragmatism. They also tentatively place the chapter authors along these paradigmatic axes. The following nine chapters are divided into two sections arguing, respectively, for experimental and non-experimental routes to credible evidence. The trajectory of the chapters moves from the most hard-core case for experimental designs to an argument for the credibility of image-based research in evaluation. The chapters in between tend to take a conciliatory approach, to the extent that the last of the chapters in the experimental section could have been moved to the non-experimental section.

Part II, entitled ‘Experimental Approaches as a Route to Credible Evidence,’ contains four chapters. In the first, Gary Henry argues for high-quality policy and program impact evaluations as a necessity for providing solid evidence for policy choices, linking the matter to democratic theory and the need to detect and debunk bad policies. His view assumes that there actually are ways to objectively define what produces desirable outcomes, and what such outcomes are. He acknowledges types of bias in evaluation research, but sees them in technical rather than political terms. In the chapter that follows, Leonard Bickman and Stephanie Reich assess the credibility, reliability and validity of RCTs, concluding that while there are threats to the validity (especially external validity) of RCTs, they can still be seen to be amongst the most credible designs available to evaluators. Despite this overall conclusion, Bickman and Reich acknowledge that there are other, non-experimental approaches to establishing causality (many natural science disciplines, such as geology, astronomy, engineering and subfields of medicine, base their research on non-experimental designs). In the social sciences, they particularly highlight the program theory, theory-driven or pattern-matching method, recognizing that such other approaches are needed to supplement RCTs, which can only answer a very limited number of questions.

The last chapter in this part of the book, by George Julnes and Debra Rog, introduces the concept of actionable evidence. The authors assert that for evidence to be useful, it should not only be credible but also actionable, defined as adequate and appropriate for guiding actions in targeted real-world contexts. The lengthy chapter takes as its starting point the question of relating the choice of methods to the questions that stakeholders want addressed. They proceed to outline a multitude of evaluation tasks (borrowing from Carol Weiss) and then consider the implications of the tasks for methodology. Another way of framing the evaluation questions relates to the level of conclusion and the different levels of causal questions in impact evaluation, including whether the evaluation seeks to provide an aggregate description, disaggregation for causal analysis, or an inferential analysis of the underlying constructs and causal mechanisms. They state that, “experimental methods are argued as appropriate for strengthening impact-evaluation conclusions, but the value of these methods is dependent on the level of conclusions being addressed” (p. 104). Summarizing the discussion on the relationships between questions and methods, Julnes and Rog express their view that, while particular questions call for quantitative designs, there is substantial territory open to other designs. This summary leads the authors to consider the contextual factors that affect the adequacy and appropriateness of alternative methods, including policy context and the nature of the phenomena studied. They then discuss how to judge the adequacy of methods for providing the evidence that is needed to address the stakeholder questions identified, and when it is appropriate to use particular methods for causal analysis, taking into account constraints posed by factors internal to the program, evaluation capacity and political constraints, as well as ethical considerations.
The discussion is nuanced and fair. Julnes and Rog conclude by affirming the primacy of the evaluation stakeholder questions in influencing the types of evidence needed. They caution against “simple frameworks that drive method choice in a somewhat automatic fashion” and instead wish to support “more informed judgments on method choice in political public policy environments” (p. 128). To me, this is one of the strongest, most thoughtful and balanced chapters in the book. Therefore, it also deserves its place in the middle, bridging the quantitative and qualitative parts.

Another chapter in this section of the book, preceding the chapter discussed above, by Russell Gersten and John Hitchcock focuses on the role of the What Works Clearinghouse, established in 2002 by the U.S. Department of Education. The chapter is descriptive and possibly useful to education researchers and evaluators, but did not raise my own somewhat biased interest.

Part III on ‘Nonexperimental Approaches for Building Credible Evidence’ consists of another five chapters, starting with one of the grand old men of evaluation, Michael Scriven. His chapter, ‘Demythologizing Causation and Evidence,’ is written with flair in lively language. Like Julnes and Rog before him, Scriven enlists other sciences—from mathematical physics and geology to anthropology, ethnography and epidemiology—to demonstrate how experimental methods are but one of the many approaches to analyse causation. He writes, “much of the world of science, suffused with causal claims, runs along very well with the usual high standards of evidence, but without RCTs” (p. 136). Breezing through the origins of causal concepts, the cognitive process of causal inference vs. observation, and the level of evidential certainty required for scientific, legal and practical purposes, he then addresses the alleged supremacy of RCTs as well as “other contenders.”  His myth-busting position is that “(i) the attempted takeover of the terms evidence and cause is partly inspired by the false dichotomy between experiment and quasi-experiment, and (ii) the whole effort is closely analogous to the attempted annexation of the concept of significance by statistically significant” (p. 151).

The following chapter by Jennifer Greene discusses evidence as ‘proof’ vs. evidence as ‘inkling.’ Her premise is that evaluation is both influenced by its political, organizational and sociocultural contexts and serves to shape them. Consequently, “evaluation is not a bystander or neutral player in the debates that often surround it, but rather an active contributor to those debates and to the institutions that house them” (p. 153). Her chapter attempts to demonstrate how the present discourse assumes that ‘evidence’ can make social systems ‘efficient and effective’ and how these assumptions convey a particular view of human phenomena and of the responsibilities of government in democratic societies. The argument is a useful antidote to the positivistic view presented by Henry earlier in the book. Her vision of evidence is not one of providing the truth, or neat and tidy small answers to small questions. Rather, in Greene’s view evidence must provide a “window into the messy complexity of human experience; evidence that accounts for history, culture, and context; evidence that respects difference in perspective and values; evidence about experiences in addition to consequences; evidence about the responsibilities of government, not just the responsibilities of its citizens; evidence with the potential for democratic inclusion and legitimization of multiple voices—evidence not as proof but as inkling” (p. 166).

Sharon Rallis starts her chapter by telling the story of how she first encountered evaluation when she was teaching in a federally funded summer program that was subject to an evaluation. The evaluators insisted on holding on to their plan to assess the program against a single outcome, with no regard to the important associated benefits that the program had in bolstering the self-esteem of the participating students. Furthermore, the evaluation, based on a quasi-experimental design, deprived half of the students of the chance to participate in an important part of the program, which Rallis and other program colleagues felt was unfair. While the evaluators claimed that their work was scientific and rigorous, Rallis pondered the missing piece and came to the conclusion that it was ‘probity’: goodness and moral soundness. Consequently, she began to study evaluation with a commitment to making evaluations useful for the program personnel and participants. This chapter elaborates on her vision of evaluation with probity and moral reasoning, grounded in nonconsequentialist theories. She explains: “The evidence we collect looks quite different from that of our colleagues who measure outcomes. Our aim is not to cast judgment … but to discover what happened and what the experience meant to the program participants. We hope that our discoveries can lead to improving the program and thus the well-being of the participants” (pp. 174-175). Rather than RCTs, evaluation done with these principles borrows tools from fields such as ethnography, phenomenology and sociolinguistics/semiotics. She presents a case from an evaluation and needs assessment of an HIV/AIDS education and prevention program that provides some unexpected insights into the participants’ experiences. She asserts that this work is rigorous because “it is grounded in theory and previous research and in moral principles of justice and caring” (p. 178).

Sandra Mathison in her chapter ‘Seeing Is Believing’ explores the credibility of image-based research and evaluation, as one form of evidence to establish and represent truth and value. Like the part III authors before her, she emphasizes how the credibility of evidence and the knowledge thus created is contingent on experiences, perception and social conventions. Image-based research uses images in three ways: (i) as data or evidence; (ii) as an elicitation device to collect other data, and (iii) as a representation of knowledge (p. 184). Mathison posits four considerations for establishing the credibility of image-based research: (1) quality of the research design, (2) attention to context, (3) adequacy of the image from multiple perspectives, and (4) the contribution images make to new knowledge (p. 188).

The last chapter in part III, by Thomas Schwandt, is entitled ‘Toward a Practical Theory of Evidence for Evaluation’ and it functions as a kind of recap of what has come before; as such, it could equally well have been placed in part IV on conclusions. This is another rich chapter that goes to the heart of the debate about what we mean by evidence: “…information helpful in forming a conclusion or judgment. Framed in a more rigorous epistemological perspective, evidence means information bearing on whether a belief or proposition is true or false, valid or invalid, warranted or unsupported. At present, we face some difficulty and confusion with understanding the term evidence in evaluation because it is often taken to be synonymous with the term evidence-based” (p. 199). He first problematizes the term evidence-based as being narrowly interpreted to mean only a specific kind of finding regarding causal efficacy. Secondly, Schwandt argues that evidence cannot serve as a secure and infallible base or foundation for action. Furthermore, he emphasizes that, as an aspect of policy making, evaluation must consider ethics. Schwandt concludes that “deciding the question of what constitutes credible evidence is not the same as deciding the question of what constitutes credible evaluation … However necessary, developing credible evidence in evaluation is not sufficient for establishing the credibility of an evaluation” (p. 209). He further asserts that method choice alone does not determine what is credible and convincing evidence. He calls for framing evidence in a practical-theoretical way that is concerned with the character and ethics of evidence and the contexts in which evidence is used.

In the final part of the book, Melvin Mark summarizes the different perspectives of the book with the aim of changing the terms of the debate. He concludes: “Extensive and continued discussion of the relative merits and credibility of RCTs versus other methods would have limited capacity to move forward our understanding and our practice. … by changing the terms of the debate, we may be able to improve understandings of deeply entrenched disagreements; move toward a common ground where such can be found; better understand the disagreements that remain; allow, at least in select places, credible evidence as part of the conversation; and enhance the capacity of stakeholders to make sensible decisions rather than be bewildered by our disagreement or draw allegiances based on superficial considerations” (pp. 237-8). Certainly a deserving goal. The book ends with an epilogue by Stewart Donaldson that attempts to provide a practitioner’s guide for gathering credible evidence in the evidence-based global society.

I thoroughly enjoyed reading this book and although much of the debate around epistemology, approaches and methods is familiar to someone educated and working in evaluation and applied social science research, the way it is framed in this book is truly enlightening. For a thoughtful reader, it becomes evident that the truth—as almost always in such debates—is somewhere in the middle. All of the approaches and methodologies have merit when used appropriately and in appropriate contexts. Both experimental and non-experimental methods can be rigorous, but both can also have serious flaws with regard to internal and external validity, relevance and appropriateness. The old saw about everything looking like a nail when you have only a hammer in your tool box is true here as well. The take-home lesson is that, instead of allowing methods to dictate one’s evaluation questions and designs, one should choose one’s methods according to the questions one wants answered.

Shooting America

On Friday, the 14th of December 2012, a young gunman walked into Sandy Hook elementary school in Newtown, Connecticut, and proceeded to shoot and kill 26 people, including 20 children aged 6 and 7. This horrific incident was one of the worst mass murders even in the violent history of America. Unfortunately, however, it wasn’t entirely isolated. In fact, this was the third incident of a similar kind to take place in the United States in 2012 alone. The other recent cases are the movie theatre shooting in Aurora, Colorado, on the 20th of July 2012, in which twelve people were killed and 58 injured; and the shooting at a Sikh temple in Oak Creek, Wisconsin, on August 5, in which a white supremacist killed six people, possibly mistaking the Sikhs for Muslims. Six of the worst mass shootings in America have taken place since 2007.

Could this finally be a turning point—the straw that broke the camel’s back, in the words of Sen. Dianne Feinstein of California—in the American attitude towards gun violence? There are signs of outrage that seem unprecedented; after all, the horror of the slaughter of these little children is just too painful, too hard even to imagine. But don’t hold your breath.

There is much focus on the mental health of the shooter and on providing increased security in schools. This is understandable. However, I do not see this primarily as a mental health issue. This is not meant to belittle the importance of psychological factors or to deny the importance of mental illness as an explanatory factor. America’s mental health care system is clearly broken and there seem to be extraordinarily many sick people who turn to violence. However, there are crazy people everywhere, but in most other places they can be stopped before they are able to commit mass murder. Just recently, a Chinese nutcase attacked a school in his own country. Armed only with a knife, he was able to injure a number of people but not to kill anyone before he was apprehended.

In this latest case in Connecticut, the assailant was in possession of three advanced pieces of weaponry: Glock and Sig Sauer handguns and an AR-15 Bushmaster semi-automatic rifle. He used this firepower to enter the school and then to kill the kids and their teachers. It is worth noting that he had apparently obtained these guns from his mother, a gun enthusiast, who was the first victim of his killing spree.

What surprises me is that relatively few Americans have arrived at the inevitable conclusion that access to high-powered firearms is itself a threat. While a few people like Feinstein and Dan Gross of the anti-gun violence Brady Campaign have systematically brought up the issue of stronger gun control, the debate still seems to be up in the air. Although most pro-gun politicians have had the sense to lie low since Friday’s tragedy, a number of gun activists, sensing a threat to their God-given right to carry any weapon, have again decided that attack is the best defense. People like John Lott, author of More Guns, Less Crime, have been seen on a number of TV talk shows peddling the claim that wherever stricter gun controls have been put into effect, murder rates have gone up (would someone please look at the statistics behind this implausible claim). Citing the Aurora example, Lott asserts that the shooter actually chose that particular movie theatre because it didn’t allow guns. There were other theatres closer to the murderer’s home but, Lott implies, there might have been armed people there who would have returned fire, so the shooter was afraid of attacking them. Just imagine a gunfight in a crowded and dark movie theatre.

Another creep, Philip Van Cleave, president of the sinister-sounding Virginia Citizens Defense League, actually had the temerity to tell The Washington Post on the Sunday after the Newtown massacre that guns are fun. Defending people’s wish to own semi-automatic weapons like the AR-15 (implicated in the three latest massacres mentioned above), he was quoted by the newspaper as saying: “I could ask you why should anyone want a Ferrari? [Bushmasters] are absolutely a blast to shoot with. They’re fast. They’re accurate. … Guns are fun, and some of them are much more cool than others.” Stunningly insensitive as they are, Van Cleave’s views may not be that rare amongst Americans.

Who exactly Lott, Van Cleave and their ilk envision could have been better armed to fight back in the Sandy Hook elementary school in Newtown is not clear to me. The young teachers who died with their students? Or perhaps the children themselves? Employing armed guards and arming teachers, movie goers or citizens in general so that they can return fire when a deranged person starts shooting at them is a dystopian vision that few of us, I imagine, would cherish. And would all of us really want to be trained in handling guns?

Maybe I’m wrong. Maybe many Americans do see this as a desirable way of protecting oneself and one’s liberties. After all, this is a country where large groups of people arm themselves in preparation against a takeover by the socialists in the Federal Government (or even, gasp, the United Nations). Yet it would seem inevitable that an even larger number of people walking and driving around with concealed weapons would result in those guns being used when things heat up in, say, a traffic jam or a supermarket line. And the risk of innocent bystanders getting hurt in these altercations seems high.

This happened on Friday, August 24, 2012, when a man shot his former co-worker near the Empire State Building in Manhattan. The police killed the assailant immediately following the incident, in the process injuring eight innocent bystanders with ricocheting bullets. And remember, these were highly trained law enforcement officers who are experts in handling firearms.

“Guns don’t kill people; people kill people,” has long been the rallying call of the supporters of the National Rifle Association (NRA), a powerful pro-gun lobby. Regretfully, that’s just not true. There will always be nuts who want to kill people. However, their ability to do so is significantly increased by the availability of guns. Later the same evening, the brother of the Empire State Building victim was interviewed by CNN. His opinion: Don’t turn this into a referendum on firearms; if the killer hadn’t had a gun, he could have used a “baseball bat or whatever.” Perhaps so. People who are inclined to commit premeditated murder do find a way to do so. A baseball bat would do the trick, but it just might be more difficult to carry it concealed to the spot, then un-shield it rapidly when the intended victim is in sight, and to wield it to his head—all on a crowded city street. The intended victim might have a higher chance of escape, too, when he sees the batter approaching. In this particular case, the victim was also physically larger than the assailant, which might have made a difference in a fight without guns.

A more important point is that premeditated murder is, by definition, usually targeted towards a specific individual against whom the would-be murderer bears a grudge. Such murders occur in all countries.

There are two very obvious cases where the availability of guns does create a much larger hazard. One, as we have seen, is indiscriminate mass murder by a lunatic. An individual’s ability to massacre a large number of people is directly correlated with the availability of guns, especially powerful assault weapons. In this year’s cases, the madmen’s ability to kill was multiplied by the availability of such semi-automatic weapons, with clips holding more than 100 rounds of ammunition reducing the need to reload. Such weapons and clips have no legitimate use in private hands, as they are by no stretch of the imagination needed for hunting or target practice. Their only purpose is to enable the killing of as many people as possible in close combat.

The other case that so obviously speaks against having guns around is that the majority of killings in the US happen between family members and people who know each other. Only a small fraction of these are premeditated murders. Most are either accidents (every year many people shoot themselves or their dear ones accidentally when fondling their beloved guns) or happen when arguments between spouses, friends or colleagues heat up and a loaded gun happens to be handy. Guns kill people, even when people don’t intend to do so.

In the December 2012 issue of The Atlantic, which appeared on newsstands just before the Connecticut massacre, Jeffrey Goldberg argues that it is too late to install any further gun controls in America. There are already some 280-300 million guns owned by private citizens in America and each year this number increases by more than 4 million. These are of course stupendous figures, given that the total population of the USA is just around 311 million. I have recently heard that slightly less than half of American adults own a gun. Simple arithmetic thus implies that these people have multiple guns at home. Goldberg draws the conclusion that it is no longer possible to regulate the situation through democratic means and that, therefore, it would be better to give more guns to law-abiding citizens so that they can defend themselves. This is a saddening view, although it does have a certain logic. Yet, by the same logic it would be futile to attempt to address any similar issue that involves an advanced situation, including nuclear disarmament. His solution would also bring back the Wild West in which disputes were settled with six-shooters.

Except that in the Wild West, guns were quite strictly controlled in towns, where the sheriff made sure that gunslingers would check their weapons at the gate. Joe Klein in Time magazine (August 6, 2012) outlined how this free-for-all guns-galore is not an American tradition or what the Founding Fathers expected. Rather, it’s the result of a concerted advocacy effort by all kinds of right-wing groups since the 1970s to overturn gun control legislation. In 1994, during Bill Clinton’s presidency, legislation was passed to ban assault weapons, but that too was allowed to expire a decade later.

Of all industrialized countries, the United States has by far the largest number of guns per capita: 88.8 firearms per 100 people. This is far more than the 54.8 in the second most gun-heavy country in the world, Yemen. America has some 5% of the world’s population but, depending on the estimate, up to half of the world’s firearms in the hands of private citizens. Topping the list alongside Yemen, which many consider en route to becoming a failed state and which is now the principal host of Al-Qaeda on the Arabian Peninsula, should not be an accolade most Americans would want. Still, a large number of Americans seem to be perfectly content with it.

Unfortunately, my own country of origin, Finland, is number 4 on the list (the third place goes to Switzerland, where most men who have served in the army have a rifle in their closet). Why Finland, the seemingly tranquil Nordic country known for its peace-making efforts on the world scene? I suppose the reasons are similar to those that apply to America, too: a macho culture of rugged individualists. There, too, men go on shooting rampages.

Another factor that militates against gun control in America is (you guessed it) money. There is a lot of money involved in gun sales, and these are not limited to the domestic market, as if the 4 million guns sold in the US annually weren’t enough. Mexico’s drug war is largely fought with American guns: it is estimated that 80% of the guns confiscated from Mexican gangs have been bought legally in the US.

In the American political system, lobbyists for special interest groups play a central role. Politicians find it hard to go against the lobbyists if they want to stay in power. According to the Time article by Joe Klein, the NRA has contributed a total of US$18.9 million to political parties and candidates running for federal office since 1990. Of this amount, 82% has gone to the Republicans. The New York Times columnist Charles Blow has calculated that the NRA’s financial contributions to politicians in Washington are 4,100 times larger than those of the largest anti-gun organization, the Brady Campaign.

When culture, tradition, corporate interests and money come together, it will be a tough job to go against them and change things when it comes to the prevalence of guns and the ensuing gun violence. This is the reason for my pessimism, but I do hope that I am proven wrong. Perhaps, in the unspeakable tragedy of the Sandy Hook elementary school, there lies a seed of hope for some modest reform. If nothing more, it would seem reasonable to start by banning semi-automatic assault weapons and clips containing tens or hundreds of rounds of ammunition.