Generating realistic 3D human-object interactions (HOIs) from text descriptions is an active research topic with potential applications in virtual and augmented reality, robotics, and animation. However, creating high-quality 3D HOIs remains challenging because large-scale interaction data is scarce and physical plausibility is difficult to ensure, especially in out-of-domain (OOD) scenarios. Current methods tend to focus on either the body or the hands, which limits their ability to produce cohesive and realistic interactions. In this paper, we propose OOD-HOI, a text-driven framework for generating whole-body human-object interactions that generalize well to new objects and actions. Our approach integrates a dual-branch reciprocal diffusion model to synthesize initial interaction poses, a contact-guided interaction refiner to improve physical accuracy based on predicted contact areas, and a dynamic adaptation mechanism that combines semantic adjustment and geometry deformation to improve robustness. Experimental results demonstrate that OOD-HOI generates more realistic and physically plausible 3D interaction poses in OOD scenarios than existing methods.
Our approach decomposes the generation process into three modules: (1) a dual-branch reciprocal diffusion model that exchanges information between human and object to generate an initial interaction pose; (2) a contact-guided interaction refiner that revises the initial human-object interaction pose with additional inference-time guidance; and (3) a dynamic adaptation module designed for out-of-domain (OOD) generation, ensuring more realistic and physically plausible results.
The refiner module takes a text prompt, an initial hand pose, and the object geometry as input, predicts the contact area between the hand and the object, and uses the predicted contact area to correct floating objects and interpenetration.
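To make the idea of contact-guided refinement concrete, here is a minimal toy sketch in NumPy. It is not the refiner used in OOD-HOI. It assumes the object is a sphere (so the signed distance has a closed form), that a contact mask over hand points has already been predicted, and that only the object's translation is optimized. The names `sphere_sdf` and `refine_object_center`, the loss weights, and the step size are all hypothetical. The loss pulls predicted contact points onto the object surface (which counters floating) and penalizes any hand point that lies inside the object (which counters interpenetration).

```python
import numpy as np


def sphere_sdf(points, center, radius):
    """Signed distance from each point to a sphere surface (negative inside)."""
    return np.linalg.norm(points - center, axis=1) - radius


def refine_object_center(hand_pts, contact_mask, center, radius,
                         pen_weight=1.0, lr=0.1, steps=200):
    """Toy refinement: move a spherical object by gradient descent.

    Loss (an illustrative assumption, not the paper's objective):
      mean over contact points of sdf^2          -> closes floating gaps
      + pen_weight * mean over all of min(sdf,0)^2 -> removes penetration
    """
    center = np.asarray(center, dtype=float).copy()
    for _ in range(steps):
        diff = hand_pts - center
        dist = np.linalg.norm(diff, axis=1)
        sdf = dist - radius
        # d(sdf)/d(center) = -(p - c) / ||p - c||
        grad_sdf = -diff / np.maximum(dist, 1e-9)[:, None]

        # Gradient of the contact-attraction term.
        contact_grad = (2.0 * sdf[contact_mask, None]
                        * grad_sdf[contact_mask]).mean(axis=0)

        # Gradient of the penetration term (only points with sdf < 0 count).
        pen = np.minimum(sdf, 0.0)
        pen_grad = (2.0 * pen[:, None] * grad_sdf).mean(axis=0)

        center -= lr * (contact_grad + pen_weight * pen_grad)
    return center
```

In this toy setting, calling `refine_object_center` with hand points, a boolean contact mask, and an initial object center returns a translated center for which the masked points lie closer to the sphere surface. A realistic refiner would operate on meshes and full object poses rather than a single sphere translation.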
@article{zhang2024oodhoi,
author = {Zhang, Yixuan and Yang, Hui and Luo, Chuanchen and Peng, Junran and Wang, Yuxi and Zhang, Zhaoxiang},
title = {OOD-HOI: Text-Driven 3D Whole-Body Human-Object Interactions Generation Beyond Training Domain},
journal = {arXiv preprint arXiv:xxxx},
year = {2024},
}