
In the past two decades, impact evaluation has become an unavoidable topic in the social sector. Yet beyond the discourse on how to measure social impact lies a structural problem: We are not producing reliable knowledge about what works, regardless of the method used. While part of the debate gravitates toward Randomized Controlled Trials (RCTs), the real gap lies in the absence of standards, capacities, and institutional structures that enable civil society and philanthropy to learn systematically. This article aims to refocus the conversation on what matters: rigor.
Drawing on my experience leading impact evaluations within government (including experimental and non-experimental studies) and later advising civil society organizations and philanthropic funders, I have seen how this gap is reinforced from different directions. On one side, reporting requirements often prioritize speed, volume, and compliance over understanding. On the other, critiques of experimental and quantitative approaches have sometimes been used to legitimize evaluations that abandon basic scientific logic altogether, as if complexity or social purpose exempted the sector from standards of credible inference. This article examines how these dynamics converged and what a more rigorous, learning-oriented approach to evaluation would require from philanthropy, organizations, and evaluators alike.
The False Dilemma of RCTs
Beginning in the early 2000s, RCTs became one of the most influential tools for evaluating impact in development and social policy. Their expansion was closely associated with the work of researchers such as Esther Duflo and her collaborators, as well as institutions like the Abdul Latif Jameel Poverty Action Lab (J-PAL), which helped scale experimental approaches across governments, nonprofits, and multilateral organizations.
As experimental approaches spread, they reshaped how impact was discussed and assessed, triggering both enthusiasm and backlash. Over time, however, the debate was increasingly framed as a choice for or against RCTs themselves—despite their well-established value as tools for causal inference—diverting attention from the more substantive question of how to apply rigorous evaluative standards, experimental or otherwise, to strengthen learning across the sector.
In a recent SSIR article, Nicole P. Marwell and Jennifer E. Mosley argue that privileging experimental methods such as RCTs in evaluation funding can reinforce inequality by favoring organizations with greater technical capacity. They also caution that, in complex social settings, experimental designs can overstate certainty and narrow what counts as meaningful impact. These critiques do not reject experimental methods outright but question their elevation as the dominant benchmark for measuring impact and allocating resources across the social sector.
Beyond the merits of these arguments, it is an ill-posed debate given the sector’s current position. Civil society and philanthropy do not have an RCT problem; they have a rigor and standards problem. More specifically, some organizations struggle to establish even minimal evaluative frameworks, and philanthropy has not invested enough in building their institutional capacities, including their ability to evaluate.
We Evaluate Little, We Evaluate Poorly, and We Learn Even Less
In 2017, public policy and evaluation researchers Amy L. Gardner and Claire D. Brindis reported that only 6 percent of 106 surveyed evaluators had used experimental methods in advocacy evaluations. A separate study from the Center for Effective Philanthropy (CEP) and the Center for Evaluation Innovation (CEI), based on 127 funders in the United States and Canada, found that only 1 in 5 evaluations they financed were RCTs. Although not representative of the full universe, these are among the largest surveys available, and they suggest something essential: The problem is not methodological colonization by experimental methods. That controversy is a scholarly debate which, while important for epistemic clarity, is far removed from the practical challenges the sector faces in adopting professionalized evaluative practice.
More to the point, how frequently the sector invokes evaluation contrasts sharply with how rarely it practices it. According to the CEP-CEI report, only 6 in 10 foundations have a dedicated evaluation unit, and for every 10 program staff members, foundations have one full-time equivalent dedicated to evaluation. In other words, there is a 10:1 ratio between the resources devoted to running initiatives and the resources devoted to understanding whether those initiatives work. Even when evaluations do occur, more than three-quarters of respondents report difficulty translating them into meaningful insights, and only 9 percent of staff prioritize sharing findings externally. Moreover, more than 2 out of 3 respondents believe their foundations invest far too little in strengthening grantees’ evaluation or data-collection capacities.
Given that philanthropy does not merely fund isolated activities but also seeks to build knowledge about effective pathways to social change, these numbers are even more concerning: Foundations do not understand their grantees’ results in nearly half of the cases, and only 1 in 5 understands the effects produced on ultimate beneficiaries. This knowledge deficit is reflected in the perception held by 6 in 10 respondents that evaluation findings will likely not influence decisions about future first-time grants.
On the civil society side, the picture is similarly troubling. In the 2025 Financial Sustainability Models Survey conducted by Civic House, a Latin American nonprofit that supports civic innovation and sustainability among civil society organizations, only 1 in 3 organizations in Latin America and the Caribbean reported conducting annual strategic planning, and a similar share reported conducting annual evaluations of results or impact.
This is not only a matter of quantity. In my work with Civic Compass, Civic House’s research and policy advocacy unit focused on technology-related public policy, I evaluated the quality of the theories of change of 60 organizations across Argentina, Mexico, and Colombia. Approximately 45 percent struggled to clearly articulate the specific variable or effect expected from a successful campaign, and around 40 percent had difficulty distinguishing routine activities from services, outputs, or outcomes. These findings suggest that organizations understand their substantive fields well but lack the structured narratives required to monitor and evaluate results, making timely course corrections difficult or impossible.
All these data point to a concerning pattern: We “evaluate” more but learn less. Evaluation, across much of the sector, has been distorted into a sequence of affirmative narratives, where little seems to go wrong and where failure scarcely exists. While scholars and practitioners spend significant time critiquing the “What works?” agenda, the dominant logic on the ground is actually: “How many people did we reach?”—often without the faintest idea of whether those people benefited. In my work as an evaluator, I have been asked far more often about participation counts in employability programs than about how many participants secured actual jobs.
The Tyranny of ‘Nominality’
All this unfolds in a sector driven by genuine goodwill. Organizations and funders are motivated by an honest desire to improve lives. The absence of rigorous evaluation is not a matter of malice; it is a matter of institutional design, incentives, and capacities—and, above all, the normalization of the belief that doing more is equivalent to achieving more.
Advocating for civil society while simultaneously celebrating the idea that its work is “too complex to evaluate” places it in a permanently disadvantaged position. It turns civil society into an actor entering strategic conversations with incomplete tools, while others participate with fully developed ones. The outcome is a symbolic civil society, heard but not necessarily influential, whose transformative capacity is limited not by lack of commitment but by the absence of structures that allow it to learn what works and what does not.
This dynamic fuels what I call the tyranny of “nominality”: the predominance of metrics that count activities and outputs (workshops held, people reached) as if they were indicators of change. Under this tyranny, beneficiaries become numbers, and achievements become infographics. Interventions are measured by volume, not by effect. Civil society risks working tirelessly while learning very little, trapped in a logic where reporting becomes more important than understanding. Here are some ways philanthropy can reverse this pattern:
Theory-driven design: Philanthropy must change not only what it asks organizations to report, but how it designs grants from the outset. Too often, funding prioritizes novelty, scale, or the promise of rapid results, without requiring a clear articulation of how change is expected to occur. A more productive starting point would be to benchmark proposed interventions against existing evidence about mechanisms that have worked in comparable contexts, recognizing that while solutions may differ, the behavioral and institutional constraints they seek to address are often similar. This shift moves the focus away from endlessly reinventing programs and toward testing whether specific and theory-driven assumptions about access, incentives, or support actually hold.
Managing expectations: Time horizons matter just as much. Many grants are awarded for multiple years, yet success is assessed through short-term, nominal milestones that encourage constant reinvention rather than sustained improvement within the same population. Structural problems rarely change quickly, but that does not mean incremental progress is insignificant. Improving even one dimension of, for example, vulnerability (employment stability, school continuity, access to services) can meaningfully alter people’s daily lives and sometimes trigger cascading effects. Treating such changes as trivial because they do not immediately “solve” the problem reflects a misunderstanding of how social change unfolds. What philanthropy should demand instead is clarity about what improvement would look like in practice: How would the lives of intended beneficiaries be different if the intervention were working, and how would we recognize that difference?
Focus on capacity building: Evaluation is often treated as a reporting requirement to be satisfied at the end of a project, rather than as a professional discipline that must be integrated into program design. When organizations, particularly smaller ones, are forced to choose between investing in delivery or in evaluation, the result is usually superficial monitoring that satisfies donor expectations but leaves no durable learning behind. Data are collected in haste, stories are assembled to illustrate success, and once funding ends, neither the organization nor the system is better equipped to make decisions. If philanthropy is serious about learning, grants must explicitly support evaluative expertise (internal or external) and allow time and resources for monitoring systems that enable course correction, not just retrospective judgment.
Program coherence and design discipline: A related problem is the tendency to design programs with multiple components layered on top of one another, often in the belief that complexity itself signals ambition or increases the likelihood of change. When interventions combine several activities without a clear theoretical spine, participants may follow highly variable and improvised trajectories, making it difficult to know which elements mattered, which did not, and why. At the end of such projects, weak results can lead to the conclusion that nothing worked, while positive outcomes can create the illusion that everything did, when, in reality, only some components may have driven change. This lack of clarity not only undermines learning but also makes responsible scaling nearly impossible, because programs that rely on ad hoc combinations of activities often require exceptional organizations capable of managing that same level of improvisation. Designing grants around clearer theories of change (limiting unnecessary components and sequencing interventions deliberately) helps ensure that both learning and execution capacity remain in place beyond the life of a single grant.
Philanthropy should not ask for speed or volume; it should ask for learning. Resources dedicated to social change are too valuable to be diluted in activity metrics that say little about people’s lives. Innovation, to be genuine, must be measured; otherwise, it is conservatism disguised as movement.
What RCTs and Alternatives Actually Offer
This is not a defense of RCTs as the only valid method for evaluation. The experimental method has significant limitations, especially in complex social interventions. Yet the inability to conduct an RCT is not an excuse to avoid rigorous evaluation. Not being able to randomize does not absolve us from the responsibility of explaining why an intervention should work. To do otherwise is not only a failure of creativity but an ethical problem: It means forgoing knowledge that could improve lives.
Having conducted several evaluations, experimental and non-experimental, I must emphasize the distinctive value of the experimental method: It is the only approach that, under relatively simple assumptions, allows causal attribution. Critics often highlight ethical concerns or the idea that RCTs create a “black box” that hides mechanisms. But no evaluation, experimental or otherwise, should operate as a black box. Without mechanisms, there can be no learning. Every serious evaluation is grounded in a robust theory of change and uses complementary methods to illuminate processes, assumptions, and mechanisms. When evaluations fail to do this, the problem lies with the evaluator, not the method.
Another common critique is that RCTs take too long. But this raises a basic question: Do we really expect complex social changes to occur overnight? What takes time is not the methodology, it is the phenomenon. Indeed, many qualitative studies and surveys, including those critical of RCTs, take years to be published. Time is not a methodological problem; it is an empirical reality.
Non-experimental evaluation has also contributed valuable tools. Rigorous qualitative approaches are grounded in the same inferential logic as quantitative methods, relying on systematic evidence to assess whether and how an intervention plausibly produced change. Contribution Analysis is particularly useful in complex contexts: It does not establish direct causality but builds plausible, evidence-based explanations of how an intervention contributes to change. For evaluations with few cases, Process Tracing and Bayesian Updating approaches allow evaluators to test and refine hypotheses by accumulating and weighing evidence over time, rather than relying on single observations. Although different in form, they share a principle: Narrative is not enough—explanation is required.
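To make that principle concrete, the sketch below illustrates the core logic of Bayesian updating applied to a single evaluative hypothesis. It is a hypothetical example rather than a reconstruction of any study mentioned here: the hypothesis, the prior, and the likelihood values are invented for illustration, and in practice they would be grounded in theory, program documentation, and expert judgment.

```python
# Illustrative only: a toy Bayesian update for an evaluative hypothesis.
# The prior and likelihoods below are hypothetical, not drawn from any cited study.

def update(prior: float, p_evidence_if_true: float, p_evidence_if_false: float) -> float:
    """Return P(hypothesis | evidence) via Bayes' rule."""
    numerator = p_evidence_if_true * prior
    denominator = numerator + p_evidence_if_false * (1 - prior)
    return numerator / denominator

# Hypothesis: "the training program caused participants' employment gains."
belief = 0.50  # neutral prior before weighing the evidence

# Each piece of evidence is described by how likely we would be to observe it
# if the hypothesis were true versus if it were false (its probative value).
evidence = [
    ("employers report hiring program graduates", 0.70, 0.30),
    ("comparable non-participants show similar gains", 0.20, 0.60),
    ("gains concentrated in the skills the program taught", 0.65, 0.25),
]

for description, p_if_true, p_if_false in evidence:
    belief = update(belief, p_if_true, p_if_false)
    print(f"After '{description}': P(hypothesis) = {belief:.2f}")
```

The point of the exercise is not the specific numbers but the discipline it imposes: each piece of evidence must be assessed for how strongly it discriminates between the hypothesis being true or false, so disconfirming observations lower confidence rather than being narrated away.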
I have learned to appreciate studies that wear their limitations upfront and still manage to offer something meaningful, rather than those that hide behind their limitations to avoid saying anything at all. Rigor is not measured by the proportion of truth an evaluation claims to reveal, but by its correctness: the transparency of its assumptions, its acknowledgement of boundaries, and its responsibility in what it asserts.
The value of theory-driven approaches and robust evaluative methods (including experimental ones) becomes particularly clear when they dismantle widely held but misleading intuitions. One example comes from cash transfer programs: Contrary to the conservative belief that unconditional transfers create disincentives to work, a 2017 study showed that they generated sustained improvements in well-being without reducing labor supply.
In my own evaluative practice, I have seen similar lessons. In a publicly funded web development training program in Buenos Aires, initial data suggested a decline in employment. Subsequent investigation revealed that participants were indeed entering the tech sector but informally. Rather than treating this result as a program failure, the findings informed design adjustments, including the addition of employability training and stronger links with private-sector employers to support transitions into formal work. In another evaluation of direct transfers to secondary school students, we observed unintended incentives related to school continuity. Further analysis showed that the issue was not the transfers themselves, but their limited capacity to offset broader constraints facing vulnerable households. In both cases, combining rigorous estimation with qualitative inquiry shifted the policy conversation from questioning whether the interventions worked to understanding how their design could be improved to better achieve their goals.
Toward a Learning Sector: Rigor as Strategic Responsibility
The foundational principles of experimental methods offer lessons even to those who will never conduct an RCT: the need for a clear theory of change, explicit assumptions connecting each link in the results chain, simplicity in intervention design, and the logic of factorial or multi-arm designs when testing several solutions. These are not principles of experimentalism; they are principles of methodological seriousness, and every organization can adopt them.
If the goal of evaluation is to produce useful knowledge, the priority for the social sector and philanthropy should not be choosing between RCTs and non-RCTs. It should be overcoming the standards deficit that limits our capacity to learn. Every intervention, complex or simple, rests on an implicit theory, and evaluation consists of testing that theory honestly. The challenge is not methodological; it is strategic. And it is political in a democratic sense: Without reliable knowledge, civil society operates blindly, and those who most need effective solutions pay the price. A sector that learns can improve lives sustainably. A sector that does not learn works hard but transforms very little. Philanthropy, organizations, and evaluators have both the possibility—and the responsibility—to change this.

