Problem
How do you make evidence-based decisions when the evidence base far exceeds what you can process? Understanding how to create sustainable change in people’s lives, mitigate the damage of climate change, and build a fairer and more equitable world is a significant challenge. There are no easy metrics to indicate success. Evidence is crucial: the lessons learnt by organisations seeking change, whether grassroots community-led initiatives or international programmes spanning years, give us a foundation to inform and guide our efforts.
For donor agencies like USAID, the challenge of leveraging evidence to inform their work is immense. With a huge portfolio of programmes and an ever-growing body of evidence, USAID, like many other development organisations, has found that there is often more evidence available than it can realistically process. On the other side of the equation is an international community of researchers generating valuable evidence that isn’t being picked up by the organisations best placed to make use of it.
What can we do to approach this problem, and where might AI be able to help?
The idea
While AI is not a silver bullet for the challenge of evidence-based decision-making, the growing capabilities of LLMs offer new ways to tackle the problem.
In collaboration with USAID, the software company DevelopMetrics, which operates at the intersection of international development and advanced AI, developed the Development Evidence Large Language Model (DEELM). DEELM is a multi-purpose LLM designed to collate and synthesise insights from USAID’s databases of reports, programme assessments, and resources into a searchable dashboard, allowing users to efficiently sift through and interrogate vast amounts of development-related evidence.
The DevelopMetrics team co-designed DEELM’s use cases with different USAID departments to ensure the tool met specific needs. Here, we focus on their collaboration with the Innovation, Technology, and Research (ITR) department to answer the question: “What digital interventions has USAID implemented across its entire portfolio?” To address this, the team:
Drew on USAID databases to compile a dashboard.
Categorised interventions by technology type and development outcomes.
Enabled users to explore specific interventions—for example, how blockchain has been used to improve governance outcomes.
Provided specific excerpts from reports and a quantitative, peer-reviewed assessment of each intervention’s success.
Developed a ranking system, allowing users to compare the effectiveness of different interventions at a glance.
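To make the steps above more concrete, here is a minimal sketch of how catalogued interventions and the ranking comparison might be represented. The field names, scores, and example records are illustrative assumptions for this sketch, not DEELM’s actual data model.

```python
from dataclasses import dataclass

@dataclass
class Intervention:
    # Illustrative fields only; DEELM's real schema is not documented here.
    programme: str           # USAID programme the evidence comes from
    technology: str          # e.g. "blockchain", "mobile money"
    outcome_area: str        # e.g. "governance", "health"
    excerpt: str             # supporting excerpt from the source report
    assessment_score: float  # quantitative, peer-reviewed assessment (0-1 here, for illustration)

def rank_interventions(catalogue, technology=None, outcome_area=None):
    """Filter by technology and/or outcome area, then rank by assessment score."""
    matches = [
        item for item in catalogue
        if (technology is None or item.technology == technology)
        and (outcome_area is None or item.outcome_area == outcome_area)
    ]
    return sorted(matches, key=lambda item: item.assessment_score, reverse=True)

# Example query: how has blockchain been used to improve governance outcomes?
catalogue = [
    Intervention("Programme A", "blockchain", "governance", "land-registry pilot reduced disputes", 0.72),
    Intervention("Programme B", "blockchain", "governance", "procurement records made auditable", 0.55),
    Intervention("Programme C", "mobile money", "financial inclusion", "cash transfers reached remote areas", 0.81),
]
for item in rank_interventions(catalogue, technology="blockchain", outcome_area="governance"):
    print(item.programme, item.assessment_score, item.excerpt)
```

In a real deployment, records like these would be populated from USAID’s databases and the ranking exposed through the dashboard rather than a script.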
To understand how the tool functions, it’s important to first explore why the team developed a fine-tuned, bespoke model rather than relying on an off-the-shelf tool like ChatGPT. As highlighted in the previous case study, there were two primary reasons:
- A bespoke model allows the team to limit searches to specific, relevant sources, in this case USAID programme reports and resources. Unlike a general-purpose LLM, which draws on a vast and uncontrolled body of training data, this approach ensures that responses are reliable, domain-specific, and aligned with USAID’s evidence base.
- General-purpose LLMs perform well on generic requests but struggle with sector-specific terminology and concepts. Consider the term "resilience" in the humanitarian sector: researchers and practitioners disagree on its definition, and what it means for a community to be resilient is contextual. A domain-specific LLM, trained with expert input, is better equipped to process sector-specific nuances and provide contextually appropriate, information-rich responses.
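The first point is essentially a retrieval question: answers should be grounded only in a curated evidence base rather than in whatever the model absorbed during training. The sketch below illustrates that general retrieval-then-generate pattern with a toy three-document corpus and a hypothetical prompt template; it is an assumption-laden illustration, not DEELM’s actual pipeline.

```python
# A generic retrieval-then-generate sketch over a curated corpus; the corpus,
# prompt template, and scoring are invented for illustration, not DEELM's pipeline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

curated_corpus = [
    "Evaluation: mobile-money cash transfers improved household resilience to drought shocks.",
    "Assessment: a blockchain land-registry pilot reduced disputes and improved governance outcomes.",
    "Report: digital ID enrolment expanded access to social-protection payments.",
]

def retrieve(query, corpus, k=2):
    """Return the k documents most similar to the query (TF-IDF cosine similarity)."""
    vectoriser = TfidfVectorizer().fit(corpus + [query])
    scores = cosine_similarity(vectoriser.transform([query]), vectoriser.transform(corpus))[0]
    ranked = sorted(zip(scores, corpus), reverse=True)
    return [doc for _, doc in ranked[:k]]

def build_prompt(query, corpus):
    """Constrain the model to the retrieved excerpts instead of its open-ended training data."""
    context = "\n".join(retrieve(query, corpus))
    return (
        "Answer using only the USAID evidence below, and cite the excerpt you rely on.\n"
        f"Evidence:\n{context}\n\n"
        f"Question: {query}"
    )

print(build_prompt("How has blockchain been used to improve governance?", curated_corpus))
```

The second point is then a matter of the model itself: fine-tuning on sector-specific material with expert input so that terms such as "resilience" are interpreted in their humanitarian sense rather than their everyday one.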
For use cases such as generating policy briefs for Congress, the ability to conduct sophisticated searches across a curated database, accurately interpret sector-specific language, and synthesise relevant insights into concise, meaningful outputs is essential.