AI Interpretability

Schmidt Sciences

Eligibility

Eligible PIs

Description

Schmidt Sciences invites proposals for a pilot program in AI interpretability. The program seeks methods for detecting and mitigating deceptive behaviors in AI models, such as when a model knowingly gives misleading or harmful advice to users. Proposals should address whether interpretability methods can be developed that 1) detect deceptive behaviors exhibited by LLMs and 2) steer their reasoning to eliminate these behaviors.

Award Note

One-Time RFP

Keywords:
AI models, technical methods, evaluative methods, deceptive behaviors, interpretability

Application due:
05/26/26

Award amount:
$300,000 - $1,000,000