Schmidt Sciences
Eligibility
- Eligible PIs
Description
Schmidt Sciences invites proposals for a pilot program in AI interpretability. The program seeks methods for detecting and mitigating deceptive behaviors in AI models, such as when models knowingly give misleading or harmful advice to users. Proposals should address whether interpretability methods can be developed that 1) detect deceptive behaviors exhibited by LLMs and 2) steer their reasoning to eliminate these behaviors.
Award Note
One-Time RFP
Keywords:
AI models, technical methods, evaluative methods, deceptive behaviors, interpretability
Application due:
05/26/26
Award amount:
$300,000 - $1,000,000