LEGO trains robots on simple "toy" objects (spheres, cuboids, cylinders, rings) and, via object-centric visual representations, generalizes zero-shot to real objects, achieving 67% grasping success on the YCB benchmark. Work done during an internship at ItalAI, in collaboration with Panasonic R&D and the BAIR Lab.
@inproceedings{niu2026lego,
  title={Learning to Grasp Anything by Playing with Random Toys},
  author={Niu, Dantong and Sharma, Yuvan and Shi, Baifeng and Ding, Rachel and Gioia, Matteo and Xue, Haoru and Tsai, Henry and Kallidromitis, Konstantinos and Pai, Anirudh and Shastry, Shankar and Darrell, Trevor and Malik, Jitendra and Herzig, Roei},
  booktitle={Proceedings of the International Conference on Learning Representations (ICLR)},
  year={2026}
}
2025
ICCV
MonSTeR: a Unified Model for Motion, Scene, Text Retrieval
Luca Collorone*, Matteo Gioia*, Massimiliano Pappa, Paolo Leoni, Giovanni Ficarra, Or Litany, Indro Spinelli, and Fabio Galasso
In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025
MonSTeR is the first MOtioN-Scene-TExt Retrieval model, aligning human motion, textual intention, and scene context in a unified latent space to enable robust cross-modal retrieval and zero-shot tasks.
@inproceedings{collorone2025monster,
  title={MonSTeR: a Unified Model for Motion, Scene, Text Retrieval},
  author={Collorone, Luca and Gioia, Matteo and Pappa, Massimiliano and Leoni, Paolo and Ficarra, Giovanni and Litany, Or and Spinelli, Indro and Galasso, Fabio},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2025}
}