AI Alignment & Interpretability Jobs
What You Need to Know
AI Alignment and Interpretability are crucial areas within AI Safety. Alignment focuses on ensuring AI systems act in ways consistent with human values, while Interpretability focuses on making their internal workings understandable.
"We must ensure that AI systems reason about what people intend rather than carrying out commands literally." - Eric Horvitz, Chief Scientific Officer at Microsoft
How to transition from machine learning to AI Alignment/Interpretability?
Start by learning how transformer models work. Join online communities such as LessWrong and the Alignment Forum to connect with others in the field, and build practical skills through real projects, like those suggested by Apollo Research; a small starter exercise is sketched below.
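As a concrete first exercise, the sketch below loads a small pretrained transformer with the Hugging Face transformers library and inspects its attention patterns. This is a minimal, illustrative example only; the model choice (gpt2) and the prompt are assumptions, not part of any particular curriculum.

import torch
from transformers import GPT2Tokenizer, GPT2Model

# Load a small pretrained transformer and ask it to return attention weights.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2", output_attentions=True)
model.eval()

inputs = tokenizer("The cat sat on the mat", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions holds one tensor per layer, each shaped
# (batch, heads, seq_len, seq_len).
attentions = outputs.attentions
print(f"layers: {len(attentions)}, heads per layer: {attentions[0].shape[1]}")

# Which token does the final position attend to most in layer 0, head 0?
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
last_row = attentions[0][0, 0, -1]
print(f"'{tokens[-1]}' attends most to '{tokens[last_row.argmax().item()]}'")

Visualizing attention maps like these across layers and heads is a common entry point into interpretability work.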
How to stay updated in AI Alignment and Interpretability?
Follow leading researchers on platforms like X, LinkedIn, and Google Scholar, participate in communities such as LessWrong and the Alignment Forum, and subscribe to the Alignment Newsletter. Attending conferences such as NeurIPS, or workshops run by Redwood Research or CHAI, is another good way to keep up with new work.

Is AI Safety, including Alignment and Interpretability, saturated?
While some roles are harder to break into, positions requiring specialized skills in alignment and interpretability still offer plenty of opportunities. A strong portfolio of relevant work will markedly improve your chances of employment.
What projects can build experience in AI Alignment/Interpretability?
You can start by studying how models behave, replicating published alignment research, or helping build tools that explain what models are doing internally; a minimal example of capturing a model's internal activations is sketched below. Apollo Research lists more than 45 open projects you can join.
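For example, one simple way to start looking inside a model is to capture intermediate activations with a forward hook. The sketch below is a minimal setup assuming PyTorch and the Hugging Face transformers library; the model (gpt2) and the hooked layer (block 5's MLP) are arbitrary illustrative choices, not requirements of any listed project.

import torch
from transformers import GPT2Tokenizer, GPT2Model

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")
model.eval()

captured = {}

def save_activation(name):
    # Return a hook that stores the module's output under the given name.
    def hook(module, hook_inputs, output):
        captured[name] = output.detach()
    return hook

# Hook the MLP inside transformer block 5 (an arbitrary example layer).
handle = model.h[5].mlp.register_forward_hook(save_activation("block5_mlp"))

enc = tokenizer("Interpretability starts with looking inside.", return_tensors="pt")
with torch.no_grad():
    model(**enc)
handle.remove()

acts = captured["block5_mlp"]  # shape: (batch, seq_len, hidden_dim)
# Which neurons respond most strongly to the final token?
top_neurons = acts[0, -1].topk(5).indices.tolist()
print("most active neurons on the last token:", top_neurons)

Collecting activations like this over many prompts and looking for neurons or directions that track specific behaviors is the kind of small, self-contained project that transfers well to a portfolio.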
What are the research community's views on AI Alignment and Interpretability?
Views differ across the AI research community. AI Safety practitioners consider Alignment and Interpretability central to building safe, reliable AI systems, while many others, particularly in mainstream machine learning, find them interesting but less urgent. Understanding both perspectives helps you make sense of a field that remains dynamic and contested.
"Interpretability provides a way to catch some of these unknowns before they have negative impact." - AI Alignment Forum
What non-technical skills are valued in AI Alignment/Interpretability roles?
Knowledge from other domains, such as ethics, philosophy, or cognitive science, adds value. Effective communication goes a long way in explaining intricate technical work and improves teamwork.