The power and limits of controlled experiments

The Freakonomics blog is running a fun contest asking readers to predict whether providing more factual evidence of impact increases or decreases donations.  Dean Karlan, co-author with Jacob Appel of More Than Good Intentions, is running an experiment in partnership with Freedom from Hunger.  Dean wants to understand whether sharing with donors cold, hard facts about the proven effectiveness of a business training program run by Freedom from Hunger increases or decreases donations.

As context, there is a well-documented study conducted by Deborah Small, George Loewenstein, and Paul Slovic that tests how potential donors respond to generalized factual information about hardship versus the story of an individual girl, named Rokia, in Mali.  The punchline is that the story of Rokia elicits about twice the donations that a brief with summary factual information about poverty does.

Put another way: stories sell, facts don’t.  (remember, we think with our brains, but…)

Dean Karlan’s experiment is designed to test whether “story + facts” is more or less effective than “story.”   To test this, Freedom from Hunger sent out two mailers.  The control mailer just has the story of Rita, and it starts:

Many people would have met Rita and decided she was too poor to repay a loan.  Five hungry children and a small plot of mango trees don’t count as collateral.  But Freedom from Hunger knows that women like Rita are ready to end hunger in their own families and their communities…

The treatment mailer has different copy:

In order to know that our programs work for people like Rita, we look for more than anecdotal evidence.  That is why we have coordinated with independent researchers to conduct scientifically rigorous impact studies of our programs.  In Peru they found that women who were offered our Credit with Education program had 16% higher profits in their businesses than those who were not, and they increased profits in bad months by 27%!  This is particularly important because it means our program helped women generate more stable incomes throughout the year.

These independent researchers used a randomized evaluation, the methodology routinely used in medicine, to measure the impact of our programs on things like business growth, children’s health, investment in education, and women’s empowerment.

The question is: which mailer will have a higher response?

My guess is that the first one wins (even though this mailer is being sent to repeat donors, who have probably heard this story before – and it’s fair to guess that I’m wrong, otherwise why would they have blogged the contest in this way?).

Whether or not I’m right, I’d like to see a better-designed study. It feels misleading to me to describe this as testing “story” versus “story + facts.”  I’d instead say it’s testing “good letter” versus “only OK letter,” and if my take is right, the generalizability of these results will be low indeed.

This is on my mind because last week I had the chance to hear Esther Duflo speak about some of the examples from her book, Poor Economics.  Many of them are highly compelling – particularly those in which the treatment being tested (e.g. de-worming) is clear and readily measurable.

But the risk of the randomized-control trial rage (which is, very appropriately, a hot and exciting topic in our field right now, and Esther and Abhijit have been champions of high-quality, clear thinking) is that we over-extend our definition of the “treatments” that can meaningfully be assessed in this way.  For instance, one example Esther cited in her talk was about whether poor farmers were willing to pay enough for a weather insurance product to make the product commercially viable.  In this test, farmers were offered a relatively simple and straightforward product that would pay them a certain amount if recorded rainfall at the weather station dropped below a certain level.  The conclusion, as described by Esther in the talk (and stated more strongly than in the book), was that farmers wouldn’t pay enough – and I heard her take this to mean that the insurance market for the poor might not be viable without significant subsidy.

Not having dug into the research – but having heard Esther’s description – I was left worried that in this case, as in the Dean Karlan study about the mailer, we run a real risk of overreaching in the conclusions we draw.  It may well be that market-based insurance for the poor doesn’t work; it may be that government needs to provide a subsidy; it may also be that in a market in which there is a limited track record of insurance, little history of or confidence in payouts, no competition and almost no trust, the study showed that willingness to pay was low – which wouldn’t be in the least bit surprising.

What I’m getting at is that sometimes our attitude about figuring out “what works” in poverty alleviation feels like designing studies, in the 1980s or 1990s, on the future of the tablet market based on intensive study of the Apple Newton and early tablet PCs.  Assuming everything is static, there’s no market.  But of course the whole point is NOT to let things be static – to create the development equivalent of the iPhone and the iPad through relentless innovation and a dogged unwillingness to fail.

This is an important point because at some fundamental level we must ask ourselves how much we believe in the power of innovation.  How far do we push, prod and experiment before we conclude that something does, or doesn’t, work?  In the simple example of the Freedom from Hunger mailer, I’m betting that some drastically better copy would have the desired effect (or a bigger desired effect) of using hard data to increase donations.  In the insurance example, I’d be interested in a lot more product development, market testing, and trust-building with smallholder farmers before drawing any broad conclusions.  And so it goes across the board with all the major interventions in the fight against poverty, from microfinance to girls’ education to de-worming to fortifying food to HIV/AIDS prevention (where, shockingly, male circumcision is proving to be a very effective way to slow the spread of disease).

I don’t want to come out against testing, rigor, and “proof” – not at all.  We need all of these things, and need to have the ability to ask tough questions, to be willing to let things go quickly when they’re not working, and to over-resource things that are working even if they contradict our initial assumptions.  At the same time, our field – and, specifically, the injection of real innovation into our field – is nascent enough that it feels early in most cases to aspire to draw anything but narrow conclusions about what does and doesn’t work; where the poor are and are not willing to pay; and what interventions will have the greatest impact over time.  We’ve seen this play out most recently and most vociferously in the microfinance space – too-broad claims that it changes everything, and then equally broad claims that it does nothing – when surely the right answer is that when done right it can be valuable, when done wrong it can be destructive.   I’m sure we’ll see this same story play out time and time again, across interventions, across sectors, and across geographies.