I’m an Assistant Professor at MIT EECS and a member of CSAIL.

I study Natural Language Processing (NLP) and AI systems, seeking to answer questions like: How do we program intelligent software systems that are partly specified in natural language, that process natural language at scale, and whose quality and cost can be optimized using language models?

To answer these questions, my research develops new algorithms and abstractions for declarative AI programming and for composing retrieval and reasoning. The resulting systems leverage massive text corpora to craft knowledgeable responses efficiently and transparently.

I received my Ph.D. in Computer Science from Stanford, where I was advised by Matei Zaharia and Christopher Potts and was part of Stanford NLP. During my Ph.D., I was generously supported by the Apple Scholars in AI/ML PhD Fellowship. After my Ph.D., I worked as a Research Scientist at Databricks.

Research

My research spans two overarching directions, consolidated in two influential open-source research systems, each downloaded millions of times a month.

I) Building Reliable AI Systems with Language Models

I built the DSPy framework, a programming model for declaratively expressing and automatically optimizing Natural Language Programs, i.e. modular software systems that use natural language to specify parts of their behavior. In this line of work, my research develops:

Natural Language Programs and their abstractions & optimizers, as in DSPy (ICLR’24 Spotlight) and its predecessor DSP. This includes state-of-the-art systems like STORM (NAACL’24), IReRa, PATH, and PAPILLON (NAACL’25) and optimizers like GEPA, MIPRO (EMNLP’24), and BetterTogether (EMNLP’24).

Retrieval-based NLP Systems like ColBERT-QA (TACL’21), Baleen (NeurIPS’21 Spotlight), Hindsight (ICLR’22), and ARES (NAACL’24).

II) Developing Effective & Efficient Retrieval Models

I built the ColBERT retrieval model, which has been central to the development of the modern landscape of information retrieval. In this line of work, my research develops:

Retrieval Models like ColBERT (SIGIR’20), ColBERTv2 (NAACL’22), and UDAPDR (EMNLP’23).

Scalable Retrieval Infrastructure like PLAID (CIKM’22), WARP (SIGIR’25 Best Paper), and DeepImpact (SIGIR’21).

Papers

Recursive Language Models
A Zhang, T Kraska, O Khattab
Preprint 2025 | paper

GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning
LA Agrawal, S Tan, D Soylu, N Ziems, …, D Klein, M Zaharia, O Khattab
Preprint 2025 | paper

Reasoning-Intensive Regression
D Tchuindjo, O Khattab
Preprint 2025 | paper

Multi-module GRPO: Composing Policy Gradients and Prompt Optimization for Language Model Programs
N Ziems, D Soylu, LA Agrawal, I Miller, L Lai, …, C Potts, O Khattab
Tech Report 2025 | paper

WARP: An Efficient Engine for Multi-Vector Retrieval
JL Scheerer, M Zaharia, C Potts, G Alonso, O Khattab
SIGIR 2025 (Best Paper Award) | paper

FreshStack: Building Realistic Benchmarks for Evaluating Retrieval on Technical Documents
N Thakur, J Lin, S Havens, M Carbin, O Khattab, A Drozdov
NeurIPS 2025 | paper

LangProBe: a Language Programs Benchmark
S Tan, LA Agrawal, A Singhvi, L Lai, …, O Khattab, K Sen, M Zaharia
EMNLP 2025 Findings | paper

Drowning in Documents: Consequences of Scaling Reranker Inference
M Jacob, E Lindgren, M Zaharia, M Carbin, O Khattab, A Drozdov
ReNeuIR 2025 | paper

PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles
L Siyan, VC Raghuram, O Khattab, J Hirschberg, Z Yu
NAACL 2025 | paper

Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval
S Hsu, O Khattab, C Finn, A Sharma
ICLR 2025 | paper

ColBERT-serve: Efficient Multi-Stage Memory-Mapped Scoring
K Huang, T Venkatesh, U Dingankar, …, O Khattab, S Sarup, K Santhanam
ECIR 2025 | paper

Problem-Oriented Segmentation and Retrieval: Case Study on Tutoring Conversations
R Wang, P Wirawarn, K Lam, O Khattab, D Demszky
EMNLP 2024 Findings | paper

Fine-Tuning and Prompt Optimization: Two Great Steps that Work Better Together
D Soylu, C Potts, O Khattab
EMNLP 2024 | paper

Prompts as Auto-Optimized Training Hyperparameters: Training Best-in-Class IR Models from Scratch with 10 Gold Labels
J Xian, S Samuel, F Khoubsirat, R Pradeep, …, A Sil, C Potts, O Khattab
Preprint 2024 | paper

Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs
K Opsahl-Ong, M Ryan, J Purtell, D Broman, C Potts, M Zaharia, O Khattab
EMNLP 2024 | paper

Backtracing: Retrieving the Cause of the Query
R Wang, P Wirawarn, O Khattab, N Goodman, D Demszky
EACL 2024 Findings | paper

Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models
Y Shao, Y Jiang, T Kanell, P Xu, O Khattab, M Lam
NAACL 2024 | paper

ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems
J Saad-Falcon, O Khattab, M Zaharia, C Potts
NAACL 2024 | paper

In-Context Learning for Extreme Multi-Label Classification
K D’Oosterlinck, O Khattab, F Remy, T Demeester, C Develder, C Potts
Preprint 2024 | paper

Building Efficient and Effective OpenQA Systems for Low-Resource Languages
E Budur, R Özçelik, D Soylu, O Khattab, T Güngör, C Potts
Preprint 2024 | paper

DSPy Assertions: Computational Constraints for Self-Refining Language Model Pipelines
A Singhvi, M Shetty, S Tan, C Potts, K Sen, M Zaharia, O Khattab
Preprint 2023 | paper

DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
O Khattab, A Singhvi, P Maheshwari, Z Zhang, K Santhanam, S Vardhamanan, S Haq, A Sharma, T Joshi, H Moazam, H Miller, M Zaharia, C Potts
ICLR 2024 (Spotlight) | paper

Image and Data Mining in Reticular Chemistry Using GPT-4V
Z Zheng, Z He, O Khattab, N Rampal, M Zaharia, C Borgs, J Chayes, O Yaghi
Digital Discovery 2024 | paper

UDAPDR: Unsupervised Domain Adaptation via LLM Prompting and Distillation of Rerankers
J Saad-Falcon, O Khattab, K Santhanam, R Florian, M Franz, S Roukos, A Sil, M Sultan, C Potts
EMNLP 2023 | paper

Resources and Evaluations for Multi-Distribution Dense Information Retrieval
S Chatterjee, O Khattab, S Arora
SIGIR REML 2023 | paper

Moving Beyond Downstream Task Accuracy for Information Retrieval Benchmarking
K Santhanam, J Saad-Falcon, M Franz, O Khattab, A Sil, R Florian, S Roukos, M Sultan, M Zaharia, C Potts
ACL 2023 Findings | paper

Holistic Evaluation of Language Models
P Liang, R Bommasani, T Lee, D Tsipras, D Soylu, …, O Khattab, …, Y Zhang, Y Koreeda
TMLR 2023 | paper
Note: This is a multi-component, 50-author project. O Khattab directed the Information Retrieval evaluation.

Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP
O. Khattab, K. Santhanam, X. Li, P. Liang, C. Potts, M. Zaharia
ArXiv 2022 | paper | code

PLAID: An Efficient Engine for Late Interaction Retrieval
K. Santhanam*, O. Khattab*, C. Potts, M. Zaharia
CIKM 2022 | paper | (* denotes co-first authors)

ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction
K. Santhanam*, O. Khattab*, J. Saad-Falcon, C. Potts, M. Zaharia
NAACL 2022 | paper | (* denotes co-first authors)

Introducing Neural Bag of Whole-Words with ColBERTer: Contextualized Late Interactions using Enhanced Reduction
S. Hofstätter, O. Khattab, S. Althammer, M. Sertkan, A. Hanbury
CIKM 2022 | paper

Hindsight: Posterior-guided Training of Retrievers for Improved Open-Ended Generation
A. Paranjape, O. Khattab, C. Potts, M. Zaharia, C. D. Manning
ICLR 2022 | preprint

Baleen: Robust Multi-Hop Reasoning at Scale via Condensed Retrieval
O. Khattab, C. Potts, M. Zaharia
NeurIPS 2021 (Spotlight) | preprint | HoVer leaderboard entry

On the Opportunities and Risks of Foundation Models
Stanford’s Center for Research on Foundation Models (CRFM), with 113 co-authors
Contributions to: Systems, Modeling, and Reasoning & Search
ArXiv 2021 | paper

Relevance-guided Supervision for OpenQA with ColBERT
O. Khattab, C. Potts, M. Zaharia
TACL 2021 | paper

Learning Passage Impacts for Inverted Indexes
A. Mallia, O. Khattab, N. Tonellotto, T. Suel
SIGIR 2021 (short) | paper

ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT
O. Khattab and M. Zaharia
SIGIR 2020 | paper | code

Finding the Best of Both Worlds: Faster and More Robust Top-k Document Retrieval
O. Khattab, M. Hammoud, and T. Elsayed
SIGIR 2020 | paper

PolyHJ: A Polymorphic Main-Memory Hash Join Paradigm for Multi-Core Machines
O. Khattab, M. Hammoud, and O. Shekfeh
CIKM 2018 | paper | code

LA3: A Scalable Link- and Locality-Aware Linear Algebra-Based Graph Analytics System
Y. Ahmad, O. Khattab, A. Malik, A. Musleh, M. Hammoud, M. Kutlu, M. Shehata, T. Elsayed
VLDB 2018 | paper | code

Blog Posts

On Impactful AI Research
O. Khattab | post

The Shift from Models to Compound AI Systems
M. Zaharia, O. Khattab, L. Chen, J. Q. Davis, H. Miller, C. Potts, J. Zou, M. Carbin, J. Frankle, N. Rao, A. Ghodsi
Berkeley Artificial Intelligence Research | post

A Guide to Large Language Model Abstractions
P. Y. Zhong, H. He, O. Khattab, C. Potts, M. Zaharia, H. Miller
Two Sigma Articles | post

Building Scalable, Explainable, and Adaptive NLP Models with Retrieval
O. Khattab, C. Potts, M. Zaharia
Stanford AI Lab (SAIL) blog | post

A Moderate Proposal for Radically Better AI-powered Web Search
O. Khattab, C. Potts, M. Zaharia
Stanford HAI blog | post

Last Update: July 2025