The Department of Defense’s Chief Digital and Artificial Intelligence Office (CDAO) has completed a groundbreaking pilot program focused on improving the safety and reliability of Large Language Model (LLM) chatbots in military medicine. Run under the Crowdsourced AI Red-Teaming (CAIRT) Assurance Program, the pilot aims to address AI risks and strengthen assurance by drawing on the collective expertise and input of diverse contributors.
Crowdsourcing is central to the CAIRT program because it brings a wide range of participants to bear on the same problem. This collaborative approach lets the Department generate large volumes of testing data while engaging stakeholders from various backgrounds.
Partnering for AI Resilience
In this latest pilot, the CDAO teamed up with Humane Intelligence, a technology nonprofit dedicated to creating standards for algorithmic evaluation, as well as the Defense Health Agency (DHA) and the Program Executive Office for Defense Healthcare Management Systems (PEO DHMS). Humane Intelligence led the evaluation using red-teaming, a technique in which testers adopt an adversarial perspective to identify potential weaknesses in a system. This methodology not only uncovers vulnerabilities but also lets participants contribute directly to improving emerging technologies, fostering a sense of shared responsibility.
Earlier, in spring 2024, the CDAO piloted a previous CAIRT exercise that used financial incentives to encourage engagement. Building on lessons learned, the current pilot focused on two critical applications in military healthcare: summarizing clinical notes and deploying medical advisory chatbots.
Key Findings from the Pilot
More than 200 individuals participated in this initiative, including healthcare providers, analysts, and researchers from DHA, the Uniformed Services University of the Health Sciences, and other military branches. By evaluating three widely used LLMs, the exercise uncovered more than 800 potential vulnerabilities and biases. These findings are expected to shape the development of benchmark datasets, which will serve as tools to assess whether future AI solutions align with military healthcare requirements.
This large-scale data collection effort will have far-reaching implications. Not only will it improve the DoD’s policies and best practices for adopting Generative AI (GenAI), but it will also ensure that military medical applications are designed responsibly and align with established risk management protocols, such as those outlined in OMB M-24-10. Ultimately, these insights will help ensure that prospective AI tools improve patient care and safety in military contexts.
A Pathfinding Initiative
“The use of Generative AI within the Department of Defense is still in its experimental stages, making this pilot a critical step toward building a strong foundation for future development,” stated Dr. Matthew Johnson, who led this initiative for the CDAO. He emphasized that this program not only surfaces areas requiring attention but also validates approaches for mitigating risks. The insights gained from the pilot are expected to influence the future design, research, and assurance strategies for GenAI systems, ensuring they meet the DoD’s rigorous standards.
Shaping the Future of AI in Defense
The results of this pilot underscore the importance of continued evaluation and testing of LLMs and other AI systems. By leveraging the CAIRT Assurance Program, the CDAO seeks to accelerate advancements within its AI Rapid Capabilities Cell, enhancing the effectiveness of GenAI tools while fostering confidence across various defense applications.
About the CDAO
Operational since June 2022, the CDAO plays a crucial role in advancing the integration of AI and digital tools across the Department of Defense. Its mission is to accelerate the adoption of data-driven solutions, refine the Department’s digital infrastructure, and implement scalable AI capabilities for enterprise and joint applications. By doing so, the CDAO aims to strengthen national security and counter emerging threats in an increasingly complex world.
For further details about the CDAO’s initiatives, visit its official website at ai.mil. Stay updated by following the office on LinkedIn (@DoD Chief Digital and Artificial Intelligence Office) and X, formerly known as Twitter (@dodcdao). Additional resources and updates are also available on the CDAO’s page on DVIDS.