Publications · Physics of Intelligence

Fatih Dinc, Marta Blanco-Pozo, David Klindt, Francisco Acosta, Yiqi Jiang, Sadegh Ebrahimi, Adam Shai, Hidenori Tanaka, Peng Yuan, Mark J Schnitzer, Nina Miolane

Preprint 2025 · arXiv:2502.14337

Preprint 2025

New News: System-2 Fine-tuning for Robust Integration of New Knowledge

Core Francisco Park, Zechen Zhang, Hidenori Tanaka

Preprint 2025 · arXiv:2505.01812 (CoRR abs/2505.01812)

Preprint 2025

RaanA: A Fast, Flexible, and Data-Efficient Post-Training Quantization Algorithm

Yongyi Yang, Jianyang Gao, Wei Hu

Preprint 2025 · arXiv:2504.03717 (CoRR abs/2504.03717)

Preprint 2025

Topological Invariance and Breakdown in Learning

Yongyi Yang, Tomaso Poggio, Isaac Chuang, Liu Ziyin

Preprint 2025 · arXiv:2510.02670 (CoRR abs/2510.02670)

Preprint 2025

Understanding and controlling the geometry of memory organization in RNNs

Udith Haputhanthri, Liam Storan, Yiqi Jiang, Tarun Raheja, Adam Shai, Orhun Akengin, Nina Miolane, Mark J Schnitzer, Fatih Dinc, Hidenori Tanaka

Preprint 2025 · arXiv:2502.07256

ICLR 2025

A Percolation Model of Emergence: Analyzing Transformers Trained on a Formal Language

Ekdeep Singh Lubana, Kyogo Kawaguchi, Robert P. Dick, Hidenori Tanaka

ICLR 2025 (conference proceedings)

NAACL 2025

Analyzing (In)Abilities of SAEs via Formal Languages

Abhinav Menon, Manish Srivastava, David Krueger, Ekdeep Singh Lubana

NAACL 2025 (ACL Anthology long paper)

ICML 2025

Archetypal SAE: Adaptive and Stable Dictionary Learning for Concept Extraction in Large Vision Models

Thomas Fel, Ekdeep Singh Lubana, Jacob S. Prince, Matthew Kowal, Victor Boutin, Isabel Papadimitriou, Binxu Wang, Martin Wattenberg, Demba Ba, Talia Konkle

ICML 2025 (PMLR v267)

ICML 2025

Are language models aware of the road not taken? Token-level uncertainty and hidden state dynamics

Amir Zur, Atticus Geiger, Ekdeep Singh Lubana, Eric Bigelow

ICML 2025 Workshop: Actionable Interpretability

ICLR 2025

Competition Dynamics Shape Algorithmic Phases of In-Context Learning

Core Francisco Park, Ekdeep Singh Lubana, Itamar Pres, Hidenori Tanaka

ICLR 2025 (conference proceedings)

NeurIPS 2025

Detecting High-Stakes Interactions with Activation Probes

Alex McKenzie, Urja Pawar, Phil Blandfort, William Bankes, David Krueger, Ekdeep S Lubana, Dmitrii Krasheninnikov

NeurIPS 2025 poster (Thu, Dec 4, 2025, 11:00 AM–2:00 PM PST; Exhibit Hall C,D,E #1112) · arXiv:2506.10805

ICML 2025

Dynamical phases of short-term memory mechanisms in RNNs

Bariscan Kurtkaya, Fatih Dinc, Mert Yuksekgonul, Marta Blanco-Pozo, Ege Cirakman, Mark Schnitzer, Yucel Yemez, Hidenori Tanaka, Peng Yuan, Nina Miolane

ICML 2025 (PMLR v267)

ICLR 2025

Dynamics of Concept Learning and Compositional Generalization

Yongyi Yang, Core Francisco Park, Ekdeep Singh Lubana, Maya Okawa, Wei Hu, Hidenori Tanaka

ICLR 2025 (conference proceedings)

ICML 2025

Evaluating Sparse Autoencoders: From Shallow Design to Matching Pursuit

Valerie Costa, Thomas Fel, Ekdeep Singh Lubana, Bahareh Tolooshams, Demba Ba

ICML 2025 Workshop: Methods and Opportunities at Small Scale (MOSS) · arXiv:2506.05239

ICLR 2025

Forking Paths in Neural Text Generation

E. Bigelow, A. Holtzman, H. Tanaka, T. Ullman

ICLR 2025 (conference proceedings)

ICLR 2025

ICLR: In-Context Learning of Representations

Core Francisco Park, Andrew Lee, Ekdeep Singh Lubana, Yongyi Yang, Kento Nishi, Maya Okawa, Martin Wattenberg, Hidenori Tanaka

ICLR 2025 (conference proceedings)

NeurIPS 2025

In-Context Learning Strategies Emerge Rationally

Daniel Wurgaft, Ekdeep Singh Lubana, Core Francisco Park, Hidenori Tanaka, Gautam Reddy, Noah Goodman

NeurIPS 2025 (poster; main conference)

Workshop 2025

Kindness or Sycophancy? Understanding and Shaping Model Personality via Synthetic Games

Maya Okawa, Ekdeep S Lubana, Mai Uchida, Hidenori Tanaka

Workshop 2025 · CogInterp: Interpreting Cognition in Deep Learning Models

ICML 2025

New Evidence of the Two-Phase Learning Dynamics of Neural Networks

Zhanpeng Zhou, Yongyi Yang, Mahito Sugiyama, Junchi Yan

ICML 2025 Workshop: High-dimensional Learning Dynamics (HiLD)

NeurIPS 2025

Projecting Assumptions: The Duality Between Sparse Autoencoders and Concept Geometry

Sai Sumedh R. Hindupur, Ekdeep Singh Lubana, Thomas Fel, Demba Ba

NeurIPS 2025 (poster; main conference)

NeurIPS 2025

Provable Low-Frequency Bias of In-Context Learning of Representations

Yongyi Yang, Hidenori Tanaka, Wei Hu

NeurIPS 2025 Workshop: Symmetry and Geometry in Neural Representations · arXiv:2507.13540

ICML 2025

Representation Shattering in Transformers: A Synthetic Study with Knowledge Editing

Kento Nishi, Maya Okawa, Rahul Ramesh, Mikail Khona, Hidenori Tanaka, Ekdeep Singh Lubana

ICML 2025 (PMLR v267)

NeurIPS 2024

Analyzing (In)Abilities of SAEs via Formal Languages

Abhinav Menon, Manish Srivastava, David Krueger, Ekdeep Singh Lubana

NeurIPS 2024 Workshop: Foundation Model Interventions

ICML 2024

Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks

Rahul Ramesh, Ekdeep Singh Lubana, Mikail Khona, R.P. Dick, Hidenori Tanaka

ICML 2024

NeurIPS 2024

Emergence of Hidden Capabilities: Exploring Learning Dynamics in Concept Space

Core Francisco Park, Maya Okawa, A. Lee, Hidenori Tanaka, Ekdeep Singh Lubana

NeurIPS 2024

NeurIPS 2024

Emergence of Hierarchical Emotion Organization in Large Language Models

Bo Zhao, Maya Okawa, Eric J. Bigelow, Rose Yu, Tomer Ullman, Ekdeep Singh Lubana, Hidenori Tanaka

NeurIPS 2024 Workshop: Scientific Methods for Understanding Neural Networks · arXiv:2507.10599

ICLR 2024

In-Context Learning Dynamics with Random Binary Sequences

Eric Bigelow, Ekdeep Singh Lubana, R.P. Dick, Hidenori Tanaka, Tomer Ullman

ICLR 2024

ICLR 2024

Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks

S. Jain, R. Kirk, Ekdeep Singh Lubana, R.P. Dick, Hidenori Tanaka, T. Rocktäschel, E. Grefenstette, David Krueger

ICLR 2024

ICML 2024

Towards an Understanding of Stepwise Inference in Transformers: A Synthetic Graph Navigation Model

Mikail Khona, Maya Okawa, J. Hula, Rahul Ramesh, Kento Nishi, R.P. Dick, Ekdeep Singh Lubana, Hidenori Tanaka

ICML 2024

NeurIPS 2023

Compositional Abilities Emerge Multiplicatively: Exploring Diffusion Models on a Synthetic Task

Maya Okawa, Ekdeep Singh Lubana, R.P. Dick, Hidenori Tanaka

NeurIPS 2023

NeurIPS 2023

CORNN: Convex optimization of recurrent neural networks for rapid inference of neural dynamics

Fatih Dinc, A. Shai, M. Schnitzer, Hidenori Tanaka

NeurIPS 2023

Neuron 2023

Interpreting the retinal neural code for natural scenes: from computations to neurons

N. Maheswaranathan, L.T. McIntosh, Hidenori Tanaka, S. Grant, D.B. Kastner, J.B. Melander, A. Nayebi, L. Brezovec, J. Wang, Surya Ganguli, S.A. Baccus

Neuron (2023)

ICML 2023

Mechanistic Mode Connectivity

Ekdeep Singh Lubana, Eric Bigelow, R.P. Dick, David Krueger, Hidenori Tanaka

ICML 2023

Neural Computation 2023

Rethinking the limiting dynamics of SGD: modified loss, phase space oscillations and anomalous diffusion

Daniel Kunin, J. Sagastuy-Brena, L. Gillespie, E. Margalit, Hidenori Tanaka, Surya Ganguli, D.L.K. Yamins

Neural Computation (2023)

ICLR 2023

What shapes the loss landscape of self-supervised learning?

Ziyin Liu, Ekdeep Singh Lubana, M. Ueda, Hidenori Tanaka

ICLR 2023

PLoS Computational Biology 2022

A lexical approach for identifying behavioural action sequences

Gautam Reddy, L. Desban, Hidenori Tanaka, J. Roussel, O. Mirat, C. Wyart

PLoS Computational Biology (2022)

NeurIPS 2021

Beyond BatchNorm: Towards a Unified Understanding of Normalization in Deep Learning

Ekdeep Singh Lubana, R.P. Dick, Hidenori Tanaka

NeurIPS 2021

ICLR 2021

Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics

Daniel Kunin, J. Sagastuy-Brena, Surya Ganguli, D.L.K. Yamins, Hidenori Tanaka

ICLR 2021

NeurIPS 2021

Noether's Learning Dynamics: Role of Symmetry Breaking in Neural Networks

Hidenori Tanaka, Daniel Kunin

NeurIPS 2021

NeurIPS 2020

Pruning neural networks without any data by iteratively conserving synaptic flow

Hidenori Tanaka, Daniel Kunin, D. Yamins, Surya Ganguli

NeurIPS 2020

NeurIPS 2019

From deep learning to mechanistic understanding in neuroscience: the structure of retinal prediction

Hidenori Tanaka, A. Nayebi, N. Maheswaranathan, L. McIntosh, S.A. Baccus, Surya Ganguli

NeurIPS 2019 · arXiv:1912.06207

Physical Review E 2018

Non-Hermitian quasi-localization and ring attractor neural networks

Hidenori Tanaka, David Nelson

Physical Review E (2018)

Physical Review Fluids 2017

Hot particles attract in a cold bath

Hidenori Tanaka, A.A. Lee, Michael Brenner

Physical Review Fluids (2017)

PNAS 2017

Spatial gene drives and pushed genetic waves

Hidenori Tanaka, H.A. Stone, David Nelson

PNAS (2017)

Physical Review Letters 2016

Mutation at expanding front of self-replicating colloidal clusters

Hidenori Tanaka, Z. Zeravcic, Michael Brenner

Physical Review Letters (2016)

Physical Review B 2015

Quenched metastable vortex states in Sr2RuO4

D. Shibata, Hidenori Tanaka, S. Yonezawa, T. Nojima, Y. Maeno

Physical Review B (2015)

Publications.

When Is Collective Intelligence a Lottery? Multi-Agent Scaling Laws for Memetic Drift in LLMs

A ghost mechanism: An analytical model of abrupt learning

Belief Dynamics Reveal the Dual Nature of In-Context Learning and Activation Steering

Emergence of Biased Consensus in Multi-Agent LLM Debates

Uncovering Conceptual Blindspots in Generative Image Models Using Sparse Autoencoders

Decomposing Elements of Problem Solving: What "Math" Does RL Teach?

From Flat to Hierarchical: Extracting Sparse Representations with Matching Pursuit

How Do LLMs Persuade? Linear Probes Can Uncover Persuasion Dynamics in Multi-Turn Conversations

Latent computing by biological neural networks: A dynamical systems framework
