Stefani Karp

PhD Student at Carnegie Mellon University
Machine Learning Department

shkarp [at] cs [dot] cmu [dot] edu

Hi! I'm a PhD student in the Machine Learning Department at CMU, and I also work part-time at Google Research. I'm focused on building the theory of deep learning using a combination of mathematics and experiments. I have studied the optimization, generalization, and feature learning capabilities of various neural network architectures under different data modeling assumptions. Currently, I am working on understanding and improving the training of Transformers for language, ranging from mathematical analysis to training industry-scale LLMs. More broadly, I'm motivated by trying to (1) understand the nature of intelligence (in both machines and humans), (2) use this understanding to improve our algorithms, and (3) ultimately unlock human-level (and beyond) machine intelligence.

At CMU, I am advised by Aarti Singh and often work with Yuanzhi Li. At Google Research, my collaborators have included Satyen Kale, Pranjal Awasthi, Mehryar Mohri, and Behnam Neyshabur (among many others).

Before grad school, I worked as a software engineer at Google NYC on search quality and the Google Assistant. Before that, I was an undergrad at Princeton, where I studied theoretical computer science and worked with Robert Tarjan and Mark Braverman. (Before that, I was a high school student at Thomas Jefferson High School for Science and Technology, where I had the most incredible teachers!)

See my CV for more details.

Papers & Publications

Provable Gradient-Descent-Based Learning of Decision Lists by Transformers. Stefani Karp, Pranjal Awasthi, Satyen Kale. To appear at DeepMath 2023 as a contributed talk.
Efficient Training of Language Models using Few-Shot Learning. Sashank J. Reddi, Sobhan Miryoosefi, Stefani Karp, Shankar Krishnan, Satyen Kale, Seungyeon Kim, Sanjiv Kumar. ICML 2023.
Agnostic Learnability of Halfspaces via Logistic Loss. Ziwei Ji, Kwangjun Ahn, Pranjal Awasthi, Satyen Kale, Stefani Karp. ICML 2022.
Local Signal Adaptivity: Provable Feature Learning in Neural Networks Beyond Kernels. Stefani Karp, Ezra Winston, Yuanzhi Li, Aarti Singh. NeurIPS 2021.
PAC-Bayes Learning Bounds for Sample-Dependent Priors. Pranjal Awasthi, Satyen Kale, Stefani Karp, Mehryar Mohri. NeurIPS 2020.
On the Algorithmic Stability of SGD in Deep Learning. Stefani Karp, Behnam Neyshabur, Mehryar Mohri. 2020.
+ several works in progress (ask me about them!)

Google Scholar.

Awards

[2021] Alan J. Perlis Graduate Student Teaching Award (for “the most outstanding graduate TA in CMU’s School of Computer Science”)
[2021] Student Community Leadership Award (Machine Learning Department, CMU)

More

Other interests: longevity, automated science, consciousness, philosophy, psychology, creative writing, puns. I want to understand the way the world works and harness this understanding to bring about incredible, transformative scientific and technological change. AGI and longevity (i.e., defeating aging) are two examples of such change. Before embarking on my current research journey, I also considered studying either quantum complexity or consciousness.