Vision-language models gain spatial reasoning skills through artificial worlds and 3D scene descriptions

Press/Media: Research

Description

Recent advancements in vision-language models (VLMs) highlight the potential of artificial worlds and 3D scene descriptions to enhance spatial reasoning capabilities. By leveraging synthetic datasets that pair 3D images with natural language and transformation matrices, VLMs can be trained to interpret spatial relationships and visual perspectives. This approach supports the development of embodied AI systems, such as robots, that can better navigate environments and collaborate with humans. As virtual environments become more realistic, these methods may significantly improve the transfer of spatial reasoning skills from simulation to real-world applications.
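To make the data-pairing idea above concrete, here is a minimal, hypothetical sketch of what one synthetic training sample might look like: a rendered 3D scene image paired with a natural-language spatial description and a homogeneous transformation matrix relating two viewpoints. The field names, image size, and transform convention are illustrative assumptions, not the dataset's actual schema.

```python
import numpy as np

def rotation_z(theta):
    """4x4 homogeneous transform: rotation about the z-axis by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([
        [c,  -s,  0.0, 0.0],
        [s,   c,  0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ])

# Hypothetical synthetic sample (placeholder values throughout):
sample = {
    "image": np.zeros((224, 224, 3), dtype=np.uint8),  # stand-in for a rendered 3D scene
    "caption": "The red cube is to the left of the blue sphere from the robot's viewpoint.",
    # Transform mapping the agent's camera frame to a second observer's frame:
    "viewpoint_transform": rotation_z(np.pi / 2),
}

# A VLM could be trained to ground the spatial relations in `caption` in
# `image`, or to re-describe the scene after applying `viewpoint_transform`
# (i.e., visual perspective-taking).
print(sample["caption"])
print(sample["viewpoint_transform"].round(3))
```

In this sketch, the transformation matrix supplies the supervision signal for perspective-taking: the same scene can be captioned from multiple viewpoints, and the matrix tells the model how those viewpoints relate.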
Period: 13 Jun 2025

Media coverage (1)

  • Title: Vision-language models gain spatial reasoning skills through artificial worlds and 3D scene descriptions
    Degree of recognition: International
    Media name/outlet: Tech Xplore
    Media type: Web
    Country/Territory: United Kingdom
    Date: 13/06/25
    Description: A team of researchers from the Italian Institute of Technology (IIT) and the University of Aberdeen has recently introduced a new conceptual framework and a dataset of computationally generated data that could be used to train VLMs on spatial reasoning tasks.
    Producer/Author: Ingrid Fadelli
    URL: https://techxplore.com/news/2025-06-vision-language-gain-spatial-skills.html
    Persons: Joel William Currie, Gioele Migno, Enrico Piacenti, Elena Giannaccini, Patric Bach, Davide De Tommaso, Agnieszka Wykowska