okhattab@stanford.edu
Curriculum Vitae
Research Statement
Google Scholar
GitHub
Twitter
I’m a fifth-year CS Ph.D. candidate at Stanford NLP and a 2022 Apple Scholar in AI/ML. I’m interested in Natural Language Processing (NLP) at scale. I build systems capable of retrieval and reasoning that leverage massive text corpora to craft knowledgeable responses efficiently and transparently.
I’m advised by Matei Zaharia and Christopher Potts. Before coming to Stanford, I received my B.S. in CS in May 2019 from CMU-Qatar, where I was supervised by Mohammad Hammoud. My Ph.D. has been generously supported by the Eltoukhy Family Graduate Fellowship and then the Apple Scholars in AI/ML PhD Fellowship.
Announcement: I’m on the faculty job market. You can read my research statement here.
My research spans two overarching directions, each consolidated in an influential and widely used open-source research system.
I’ve built the fast-growing DSPy framework, a programming model for expressing and automatically optimizing Language Model Programs, i.e., sophisticated pipelines of language models, retrieval models, and other tools. In this line of work, my research develops:
Language Model Programs and their abstractions, as in the DSPy programming model and its predecessor Demonstrate–Search–Predict. This also includes DSPy Assertions and IReRa.
Retrieval-based NLP Systems like ColBERT-QA, Baleen, and Hindsight.
I’ve built the ColBERT retrieval model, which has been central to shaping the modern landscape of information retrieval. In this ongoing line of work, my research develops:
Retrieval Models like ColBERT, ColBERTv2, and UDAPDR.
Scalable Retrieval Infrastructure like PLAID and DeepImpact.
Backtracing: Retrieving the Cause of the Query
R. Wang, P. Wirawarn, O. Khattab, N. Goodman, D. Demszky
Preprint 2024 | paper
Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models
Yijia Shao, Yucheng Jiang, Theodore A. Kanell, Peter Xu, O. Khattab, Monica S. Lam
NAACL 2024 | paper
ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems
J. Saad-Falcon, O. Khattab, M. Zaharia, C. Potts
NAACL 2024 | paper
In-Context Learning for Extreme Multi-Label Classification
K. D’Oosterlinck, O. Khattab, F. Remy, T. Demeester, C. Develder, C. Potts
Preprint 2024 | paper
Building Efficient and Effective OpenQA Systems for Low-Resource Languages
E. Budur, R. Özçelik, D. Soylu, O. Khattab, T. Güngör, C. Potts
Preprint 2024 | paper
DSPy Assertions: Computational Constraints for Self-Refining Language Model Pipelines
A. Singhvi, M. Shetty, S. Tan, C. Potts, K. Sen, M. Zaharia, O. Khattab
Preprint 2023 | paper
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
O. Khattab, A. Singhvi, P. Maheshwari, Z. Zhang, K. Santhanam, S. Vardhamanan, S. Haq, A. Sharma, T. Joshi, H. Moazam, H. Miller, M. Zaharia, C. Potts
ICLR 2024 (Spotlight) | paper
Image and Data Mining in Reticular Chemistry Using GPT-4V
Z. Zheng, Z. He, O. Khattab, N. Rampal, M. Zaharia, C. Borgs, J. Chayes, O. Yaghi
Digital Discovery 2024 | paper
UDAPDR: Unsupervised Domain Adaptation via LLM Prompting and Distillation of Rerankers
J. Saad-Falcon, O. Khattab, K. Santhanam, R. Florian, M. Franz, S. Roukos, A. Sil, M. Sultan, C. Potts
EMNLP 2023 | paper
Resources and Evaluations for Multi-Distribution Dense Information Retrieval
S. Chatterjee, O. Khattab, S. Arora
SIGIR REML 2023 | paper
Moving Beyond Downstream Task Accuracy for Information Retrieval Benchmarking
K. Santhanam, J. Saad-Falcon, M. Franz, O. Khattab, A. Sil, R. Florian, S. Roukos, M. Sultan, M. Zaharia, C. Potts
ACL 2023 Findings | paper
Holistic Evaluation of Language Models
P. Liang, R. Bommasani, T. Lee, D. Tsipras, D. Soylu, …, O. Khattab, …, Y. Zhang, Y. Koreeda
TMLR 2023 | paper
Note: This is a multi-component, 50-author project. O. Khattab directed the Information Retrieval evaluation.
Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP
O. Khattab, K. Santhanam, X. Li, P. Liang, C. Potts, M. Zaharia
arXiv 2022 | paper | code
PLAID: An Efficient Engine for Late Interaction Retrieval
K. Santhanam*, O. Khattab*, C. Potts, M. Zaharia
CIKM 2022 | paper | (* denotes co-first authors)
ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction
K. Santhanam*, O. Khattab*, J. Saad-Falcon, C. Potts, M. Zaharia
NAACL 2022 | paper | (* denotes co-first authors)
Introducing Neural Bag of Whole-Words with ColBERTer: Contextualized Late Interactions using Enhanced Reduction
S. Hofstätter, O. Khattab, S. Althammer, M. Sertkan, A. Hanbury
CIKM 2022 | paper
Hindsight: Posterior-guided Training of Retrievers for Improved Open-Ended Generation
A. Paranjape, O. Khattab, C. Potts, M. Zaharia, C. D. Manning
ICLR 2022 | preprint
Baleen: Robust Multi-Hop Reasoning at Scale via Condensed Retrieval
O. Khattab, C. Potts, M. Zaharia
NeurIPS 2021 (Spotlight) | preprint | HoVer leaderboard entry
On the Opportunities and Risks of Foundation Models
Stanford’s Center for Research on Foundation Models (CRFM), with 113 co-authors
Contributions to: Systems, Modeling, and Reasoning & Search
arXiv 2021 | paper
Relevance-guided Supervision for OpenQA with ColBERT
O. Khattab, C. Potts, M. Zaharia
TACL 2021 | paper
Learning Passage Impacts for Inverted Indexes
A. Mallia, O. Khattab, N. Tonellotto, T. Suel
SIGIR 2021 (short) | paper
ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT
O. Khattab and M. Zaharia
SIGIR 2020 | paper | code
Finding the Best of Both Worlds: Faster and More Robust Top-k Document Retrieval
O. Khattab, M. Hammoud, and T. Elsayed
SIGIR 2020 | paper
PolyHJ: A Polymorphic Main-Memory Hash Join Paradigm for Multi-Core Machines
O. Khattab, M. Hammoud, and O. Shekfeh
CIKM 2018 | paper | code
LA3: A Scalable Link- and Locality-Aware Linear Algebra-Based Graph Analytics System
Y. Ahmad, O. Khattab, A. Malik, A. Musleh, M. Hammoud, M. Kutlu, M. Shehata, T. Elsayed
VLDB 2018 | paper | code
The Shift from Models to Compound AI Systems
M. Zaharia, O. Khattab, L. Chen, J. Q. Davis, H. Miller, C. Potts, J. Zou, M. Carbin, J. Frankle, N. Rao, A. Ghodsi
Berkeley Artificial Intelligence Research | post
A Guide to Large Language Model Abstractions
P. Y. Zhong, H. He, O. Khattab, C. Potts, M. Zaharia, H. Miller
Two Sigma Articles | post
Building Scalable, Explainable, and Adaptive NLP Models with Retrieval
O. Khattab, C. Potts, M. Zaharia
Stanford AI Lab (SAIL) blog | post
A Moderate Proposal for Radically Better AI-Powered Web Search
O. Khattab, C. Potts, M. Zaharia
Stanford HAI blog | post
Last Update: Mar 2024