LEGO trains robots on simple "toy" objects (spheres, cuboids, cylinders, rings) and, via object-centric visual representations, generalizes zero-shot to real objects, achieving 67% grasping success on the YCB benchmark. Work done during an internship at ItalAI, in collaboration with Panasonic R&D and the BAIR Lab.
@inproceedings{niu2026lego,
  title={Learning to Grasp Anything by Playing with Random Toys},
  author={Niu, Dantong and Sharma, Yuvan and Shi, Baifeng and Ding, Rachel and Gioia, Matteo and Xue, Haoru and Tsai, Henry and Kallidromitis, Konstantinos and Pai, Anirudh and Shastry, Shankar and Darrell, Trevor and Malik, Jitendra and Herzig, Roei},
  booktitle={Proceedings of the International Conference on Learning Representations (ICLR)},
  year={2026}
}
2025
ICCV
MonSTeR: a Unified Model for Motion, Scene, Text Retrieval
Luca Collorone*, Matteo Gioia*, Massimiliano Pappa, Paolo Leoni, Giovanni Ficarra, Or Litany, Indro Spinelli, and Fabio Galasso
In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025
MonSTeR is the first MOtioN-Scene-TExt Retrieval model, aligning human motion, textual intention, and scene context in a unified latent space to enable robust cross-modal retrieval and zero-shot tasks.
@inproceedings{collorone2025monster,
  title={MonSTeR: a Unified Model for Motion, Scene, Text Retrieval},
  author={Collorone, Luca and Gioia, Matteo and Pappa, Massimiliano and Leoni, Paolo and Ficarra, Giovanni and Litany, Or and Spinelli, Indro and Galasso, Fabio},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2025}
}