Daichi Azuma

Tokyo, Japan

daichi.azuma@weblab.t.u-tokyo.ac.jp

I am a Ph.D. student at the Matsuo-Iwasawa Laboratory, The University of Tokyo.

My research focuses on Embodied AI, at the intersection of 3D Computer Vision and Natural Language Processing. I aim to develop intelligent agents that can understand, navigate, and interact with the physical world through language and visual perception.

I am particularly interested in how multimodal learning and 3D scene understanding can enable such agents to reason and act effectively in complex environments.

news

Jun 29, 2025	Two papers have been accepted to ICCV2025: "GeoProg3D: Compositional Visual Reasoning for City-Scale 3D Language Fields" and "CityNav: Language-Goal Aerial Navigation Dataset with Geographic Information". For more details, see the publications page.
Oct 08, 2024	I will enroll in the Matsuo Lab at the University of Tokyo as a working doctoral student from April 2025.
Oct 07, 2024	We will be presenting at IROS2024, held in Abu Dhabi, UAE, on October 14-18, 2024. ThPI5T9.7, 10/17 15:30-16:30: "Map-Based Modular Approach for Zero-Shot Embodied Question Answering"; FrAT12.2, 10/18 10:15-10:30: "Answerability Fields: Answerable Location Estimation via Diffusion Models".
Aug 08, 2024	We will be presenting at the Annual Conference of the Robotics Society of Japan (RSJ2024), held in Osaka, on September 6, 2024. 3D1-04: "Realizing Zero-Shot Robot Question Answering Using Foundation Models and a Map Module".

publications

ICCV2025

GeoProg3D: Compositional Visual Reasoning for City-Scale 3D Language Fields

Shunsuke Yasuki, Taiki Miyanishi, Nakamasa Inoue, Shuhei Kurita, Koya Sakamoto, Daichi Azuma, Lee Jungdae, Masato Taki, and Yutaka Matsuo

In IEEE/CVF International Conference on Computer Vision (ICCV), 2025

ICCV2025

CityNav: Language-Goal Aerial Navigation Dataset with Geographic Information

Jungdae Lee, Taiki Miyanishi, Shuhei Kurita, Koya Sakamoto, Daichi Azuma, Yutaka Matsuo, and Nakamasa Inoue

In IEEE/CVF International Conference on Computer Vision (ICCV), 2025


IROS2024/Oral

Answerability Fields: Answerable Location Estimation via Diffusion Models

Daichi Azuma, Taiki Miyanishi, Shuhei Kurita, Koya Sakamoto, and Motoaki Kawanabe

In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024


@inproceedings{azuma_2024_IROS,
  title = {Answerability Fields: Answerable Location Estimation via Diffusion Models},
  author = {Azuma, Daichi and Miyanishi, Taiki and Kurita, Shuhei and Sakamoto, Koya and Kawanabe, Motoaki},
  booktitle = {IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  year = {2024},
}

IROS2024

Map-based Modular Approach for Zero-shot Embodied Question Answering

Koya Sakamoto, Daichi Azuma, Taiki Miyanishi, Shuhei Kurita, and Motoaki Kawanabe

In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024


@inproceedings{sakamoto_2024_IROS,
  title = {Map-based Modular Approach for Zero-shot Embodied Question Answering},
  author = {Sakamoto, Koya and Azuma, Daichi and Miyanishi, Taiki and Kurita, Shuhei and Kawanabe, Motoaki},
  booktitle = {IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  year = {2024},
}

3DV2024

Cross3DVG: Cross-Dataset 3D Visual Grounding on Different RGB-D Scans

Taiki Miyanishi, Daichi Azuma, Shuhei Kurita, and Motoaki Kawanabe

In The 10th International Conference on 3D Vision (3DV), 2024


@inproceedings{miyanishi_2024_3DV,
  title = {Cross3DVG: Cross-Dataset 3D Visual Grounding on Different RGB-D Scans},
  author = {Miyanishi, Taiki and Azuma, Daichi and Kurita, Shuhei and Kawanabe, Motoaki},
  booktitle = {The 10th International Conference on 3D Vision (3DV)},
  year = {2024},
}

CVPR2022/Oral

ScanQA: 3D Question Answering for Spatial Scene Understanding

Daichi Azuma, Taiki Miyanishi, Shuhei Kurita, and Motoaki Kawanabe

In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022


@inproceedings{azuma_2022_CVPR,
  title = {ScanQA: 3D Question Answering for Spatial Scene Understanding},
  author = {Azuma, Daichi and Miyanishi, Taiki and Kurita, Shuhei and Kawanabe, Motoaki},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2022},
  google_scholar_id = {u5HHmVD_uO8C},
}

achievements

International Conference

Daichi Azuma, Taiki Miyanishi, Shuhei Kurita, Koya Sakamoto and Motoaki Kawanabe, “Answerability Fields: Answerable Location Estimation via Diffusion Models”, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS2024), 2024.
Koya Sakamoto, Daichi Azuma, Taiki Miyanishi, Shuhei Kurita and Motoaki Kawanabe, “Map-based Modular Approach for Zero-shot Embodied Question Answering”, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS2024), 2024.
Taiki Miyanishi, Daichi Azuma, Shuhei Kurita and Motoaki Kawanabe, “Cross3DVG: Cross-Dataset 3D Visual Grounding on Different RGB-D Scans”, International Conference on 3D Vision 2024 (3DV2024), 2024.
Daichi Azuma*, Taiki Miyanishi*, Shuhei Kurita* and Motoaki Kawanabe, “ScanQA: 3D Question Answering for Spatial Scene Understanding”, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR2022), pages 19129-19139, New Orleans, 2022. *: Equally contributed.

Local Conference

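3D Urban Visual Programming with Geographic Information, The 39th Annual Conference of the Japanese Society for Artificial Intelligence, Osaka, June 2025. Shunsuke Yasuki, Taiki Miyanishi, Nakamasa Inoue, Shuhei Kurita, Koya Sakamoto, Daichi Azuma, Lee Jungdae, Masato Taki, Yutaka Matsuo
Realizing Zero-Shot Robot Question Answering Using Foundation Models and a Map Module, The 42nd Annual Conference of the Robotics Society of Japan (RSJ2024), Osaka, September 2024. Koya Sakamoto, Daichi Azuma, Taiki Miyanishi, Shuhei Kurita, Motoaki Kawanabe
Predicting Answerable Locations Using Diffusion Models for Real-World Question Answering, The 27th Meeting on Image Recognition and Understanding (MIRU2024), Kumamoto, August 2024. Daichi Azuma, Taiki Miyanishi, Shuhei Kurita, Koya Sakamoto, Motoaki Kawanabe
Cross-Dataset 3D Language Grounding Using Different RGB-D Scans, The 37th Annual Conference of the Japanese Society for Artificial Intelligence, Kumamoto, June 2023. Taiki Miyanishi, Daichi Azuma, Shuhei Kurita, Motoaki Kawanabe
3D Question Answering Toward Semantic Understanding of Indoor Environments, The 25th Meeting on Image Recognition and Understanding (MIRU2022), Hyogo, July 2022. Daichi Azuma, Taiki Miyanishi, Shuhei Kurita, Motoaki Kawanabe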

Invited Talks

ScanQA: 3D Question Answering for Spatial Scene Understanding. MIRU2022. Daichi Azuma, Taiki Miyanishi, Shuhei Kurita and Motoaki Kawanabe

Academic Services

IROS2024 Reviewer
ACL ARR Reviewer
ICCV Reviewer