October 20, 2025 (1:00 pm – 5:30 pm)
Room 306 A | Zoom Link (code: farmcans)
Our multi-modal spatial intelligence workshop aims to bring together researchers from computer vision, robotics, graphics, and NLP for a half-day of talks and discussion at the intersection of visual understanding, multimodal learning, and embodied AI. Our focus is on placing multi-modal large language models (MLLMs) at the core of spatial intelligence: exploring how they can learn, interpret, and act on spatial information from images, videos, and 3D data.
13:00 – 13:10 | Welcome & Introduction |
13:10 – 13:50 | Generate Robotic Data with Spatial Intelligence | Yue Wang (USC / NVIDIA)
13:50 – 14:30 | Towards Spatial Supersensing | Saining Xie (NYU / Google DeepMind)
14:30 – 15:10 | Why is Spatial Understanding Hard for VLMs? | Manling Li (Northwestern University)
15:10 – 15:30 | ☕ Coffee Break & Social |
15:30 – 16:10 | Visual Reasoning Will Be Bigger Than Language Reasoning | Ranjay Krishna (UW / AI2)
16:10 – 16:50 | On Latent Abilities Underlying Spatial Intelligence | Qianqian Wang (UC Berkeley)
16:50 – 17:20 | Panel Discussion: Future of Multimodal Spatial Intelligence |
17:20 – 17:30 | Concluding Remarks |
For any questions, please reach out to the primary contact: Songyou Peng