Publications
A ghost mechanism: An analytical model of abrupt learning
Fatih Dinc, Ege Cirakman, Yiqi Jiang, Mert Yuksekgonul, Mark J Schnitzer, Hidenori Tanaka
Belief Dynamics Reveal the Dual Nature of In-Context Learning and Activation Steering
Eric Bigelow, Daniel Wurgaft, YingQiao Wang, Noah Goodman, Tomer Ullman, Hidenori Tanaka, Ekdeep Singh Lubana
Emergence of Hierarchical Emotion Organization in Large Language Models
Maya Okawa, Bo Zhao, Eric Bigelow, Rose Yu, Tomer Ullman, Ekdeep Singh Lubana, Hidenori Tanaka
Uncovering Conceptual Blindspots in Generative Image Models Using Sparse Autoencoders
Matyas Bohacek, Thomas Fel, Maneesh Agrawala, Ekdeep Singh Lubana
Decomposing Elements of Problem Solving: What "Math" Does RL Teach?
Tian Qin, Core Francisco Park, Mujin Kwun, Aaron Walsman, Eran Malach, Nikhil Anand, Hidenori Tanaka, David Alvarez-Melis
From Flat to Hierarchical: Extracting Sparse Representations with Matching Pursuit
Valerie Costa, Thomas Fel, Ekdeep Singh Lubana, Bahareh Tolooshams, Demba Ba
How Do LLMs Persuade? Linear Probes Can Uncover Persuasion Dynamics in Multi-Turn Conversations
Brandon Jaipersaud, David Krueger, Ekdeep Singh Lubana
Latent computing by biological neural networks: A dynamical systems framework
Fatih Dinc, Marta Blanco-Pozo, David Klindt, Francisco Acosta, Yiqi Jiang, Sadegh Ebrahimi, Adam Shai, Hidenori Tanaka, Peng Yuan, Mark J Schnitzer, Nina Miolane
New News: System-2 Fine-tuning for Robust Integration of New Knowledge
Core Francisco Park, Zechen Zhang, Hidenori Tanaka
RaanA: A Fast, Flexible, and Data-Efficient Post-Training Quantization Algorithm
Yongyi Yang, Jianyang Gao, Wei Hu
Topological Invariance and Breakdown in Learning
Yongyi Yang, Tomaso Poggio, Isaac Chuang, Liu Ziyin
Understanding and controlling the geometry of memory organization in RNNs
Udith Haputhanthri, Liam Storan, Yiqi Jiang, Tarun Raheja, Adam Shai, Orhun Akengin, Nina Miolane, Mark J Schnitzer, Fatih Dinc, Hidenori Tanaka
A Percolation Model of Emergence: Analyzing Transformers Trained on a Formal Language
Ekdeep Singh Lubana, Kyogo Kawaguchi, Robert P. Dick, Hidenori Tanaka
Analyzing (In)Abilities of SAEs via Formal Languages
Abhinav Menon, Manish Srivastava, David Krueger, Ekdeep Singh Lubana
Archetypal SAE: Adaptive and Stable Dictionary Learning for Concept Extraction in Large Vision Models
Thomas Fel, Ekdeep Singh Lubana, Jacob S. Prince, Matthew Kowal, Victor Boutin, Isabel Papadimitriou, Binxu Wang, Martin Wattenberg, Demba Ba, Talia Konkle
Are language models aware of the road not taken? Token-level uncertainty and hidden state dynamics
Amir Zur, Atticus Geiger, Ekdeep Singh Lubana, Eric Bigelow
Competition Dynamics Shape Algorithmic Phases of In-Context Learning
Core Francisco Park, Ekdeep Singh Lubana, Itamar Pres, Hidenori Tanaka
Detecting High-Stakes Interactions with Activation Probes
Alex McKenzie, Urja Pawar, Phil Blandfort, William Bankes, David Krueger, Ekdeep Singh Lubana, Dmitrii Krasheninnikov
Dynamical phases of short-term memory mechanisms in RNNs
Bariscan Kurtkaya, Fatih Dinc, Mert Yuksekgonul, Marta Blanco-Pozo, Ege Cirakman, Mark Schnitzer, Yucel Yemez, Hidenori Tanaka, Peng Yuan, Nina Miolane
Dynamics of Concept Learning and Compositional Generalization
Yongyi Yang, Core Francisco Park, Ekdeep Singh Lubana, Maya Okawa, Wei Hu, Hidenori Tanaka
Evaluating Sparse Autoencoders: From Shallow Design to Matching Pursuit
Valerie Costa, Thomas Fel, Ekdeep Singh Lubana, Bahareh Tolooshams, Demba Ba
Forking Paths in Neural Text Generation
Eric Bigelow, Ari Holtzman, Hidenori Tanaka, Tomer Ullman
ICLR: In-Context Learning of Representations
Core Francisco Park, Andrew Lee, Ekdeep Singh Lubana, Yongyi Yang, Kento Nishi, Maya Okawa, Martin Wattenberg, Hidenori Tanaka
In-Context Learning Strategies Emerge Rationally
Daniel Wurgaft, Ekdeep Singh Lubana, Core Francisco Park, Hidenori Tanaka, Gautam Reddy, Noah Goodman
Kindness or Sycophancy? Understanding and Shaping Model Personality via Synthetic Games
Maya Okawa, Ekdeep Singh Lubana, Mai Uchida, Hidenori Tanaka
New Evidence of the Two-Phase Learning Dynamics of Neural Networks
Zhanpeng Zhou, Yongyi Yang, Mahito Sugiyama, Junchi Yan
Projecting Assumptions: The Duality Between Sparse Autoencoders and Concept Geometry
Sai Sumedh R. Hindupur, Ekdeep Singh Lubana, Thomas Fel, Demba Ba
Provable Low-Frequency Bias of In-Context Learning of Representations
Yongyi Yang, Hidenori Tanaka, Wei Hu
Representation Shattering in Transformers: A Synthetic Study with Knowledge Editing
Kento Nishi, Maya Okawa, Rahul Ramesh, Mikail Khona, Hidenori Tanaka, Ekdeep Singh Lubana
Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks
Rahul Ramesh, Ekdeep Singh Lubana, Mikail Khona, R.P. Dick, Hidenori Tanaka
Emergence of Hidden Capabilities: Exploring Learning Dynamics in Concept Space
Core Francisco Park, Maya Okawa, A. Lee, Hidenori Tanaka, Ekdeep Singh Lubana
Emergence of Hierarchical Emotion Organization in Large Language Models
Bo Zhao, Maya Okawa, Eric J. Bigelow, Rose Yu, Tomer Ullman, Ekdeep Singh Lubana, Hidenori Tanaka
In-Context Learning Dynamics with Random Binary Sequences
Eric Bigelow, Ekdeep Singh Lubana, R.P. Dick, Hidenori Tanaka, Tomer Ullman
Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks
S. Jain, R. Kirk, Ekdeep Singh Lubana, R.P. Dick, Hidenori Tanaka, T. Rocktäschel, E. Grefenstette, David Krueger
Towards an Understanding of Stepwise Inference in Transformers: A Synthetic Graph Navigation Model
Mikail Khona, Maya Okawa, J. Hula, Rahul Ramesh, Kento Nishi, R.P. Dick, Ekdeep Singh Lubana, Hidenori Tanaka
Compositional Abilities Emerge Multiplicatively: Exploring Diffusion Models on a Synthetic Task
Maya Okawa, Ekdeep Singh Lubana, R.P. Dick, Hidenori Tanaka
CORNN: Convex optimization of recurrent neural networks for rapid inference of neural dynamics
Fatih Dinc, A. Shai, M. Schnitzer, Hidenori Tanaka
Interpreting the retinal neural code for natural scenes: from computations to neurons
N. Maheswaranathan, L.T. McIntosh, Hidenori Tanaka, S. Grant, D.B. Kastner, J.B. Melander, A. Nayebi, L. Brezovec, J. Wang, Surya Ganguli, S.A. Baccus
Mechanistic Mode Connectivity
Ekdeep Singh Lubana, Eric Bigelow, R.P. Dick, David Krueger, Hidenori Tanaka
Rethinking the limiting dynamics of SGD: modified loss, phase space oscillations and anomalous diffusion
Daniel Kunin, J. Sagastuy-Brena, L. Gillespie, E. Margalit, Hidenori Tanaka, Surya Ganguli, D.L.K. Yamins
What shapes the loss landscape of self-supervised learning?
Ziyin Liu, Ekdeep Singh Lubana, M. Ueda, Hidenori Tanaka
A lexical approach for identifying behavioural action sequences
Gautam Reddy, L. Desban, Hidenori Tanaka, J. Roussel, O. Mirat, C. Wyart
Beyond BatchNorm: Towards a Unified Understanding of Normalization in Deep Learning
Ekdeep Singh Lubana, R.P. Dick, Hidenori Tanaka
Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics
Daniel Kunin, J. Sagastuy-Brena, Surya Ganguli, D.L.K. Yamins, Hidenori Tanaka
Noether's Learning Dynamics: Role of Symmetry Breaking in Neural Networks
Hidenori Tanaka, Daniel Kunin
Pruning neural networks without any data by iteratively conserving synaptic flow
Hidenori Tanaka, Daniel Kunin, D. Yamins, Surya Ganguli
From deep learning to mechanistic understanding in neuroscience: the structure of retinal prediction
Hidenori Tanaka, A. Nayebi, N. Maheswaranathan, L. McIntosh, S.A. Baccus, Surya Ganguli
Non-Hermitian quasi-localization and ring attractor neural networks
Hidenori Tanaka, David Nelson
Hot particles attract in a cold bath
Hidenori Tanaka, A.A. Lee, Michael Brenner
Spatial gene drives and pushed genetic waves
Hidenori Tanaka, H.A. Stone, David Nelson
Mutation at expanding front of self-replicating colloidal clusters
Hidenori Tanaka, Z. Zeravcic, Michael Brenner
Quenched metastable vortex states in Sr2RuO4
D. Shibata, Hidenori Tanaka, S. Yonezawa, T. Nojima, Y. Maeno