Structured Bayesian Meta-Learning for Data-Efficient Visual-Tactile Model Estimation

Shaoxiong Yao1, Yifan Zhu2, Kris Hauser1
1University of Illinois at Urbana-Champaign, IL, USA. 2Yale University, CT, USA.

TL;DR: We enable data-efficient visual-tactile model estimation by learning a prior over visual-tactile models from diverse real-world objects.

Offline visual-tactile dataset collection

Here we show the robot touching several artificial plants. We collect RGBD video streams and corresponding joint torque sensor readings.
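
For concreteness, a single touch interaction could be bundled into a record like the one sketched below; the field names and shapes are our own placeholders and do not reflect the released data format.

from dataclasses import dataclass
import numpy as np

@dataclass
class TouchRecord:
    """One robot touch on a training object (hypothetical schema)."""
    rgb: np.ndarray            # (T, H, W, 3) RGB frames recorded during the touch
    depth: np.ndarray          # (T, H, W) aligned depth frames, in meters
    joint_angles: np.ndarray   # (T, 7) arm configuration at each frame
    joint_torques: np.ndarray  # (T, 7) joint torque sensor readings, in N*m
    contact_point: np.ndarray  # (3,) estimated contact location in the camera frame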

Zero-shot prediction from vision


Given a novel object, we use the prior learned offline to predict stiffness from appearance alone. Here, the branch region is predicted to be stiffer than the leaf region, aligning with our intuition.
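
Below is a minimal sketch of how such a vision-only stiffness prior could be parameterized, assuming per-point visual features have already been extracted from the RGBD observation; the module and all names are our own illustration, not the paper's architecture.

import torch
import torch.nn as nn

class StiffnessPriorHead(nn.Module):
    """Maps per-point visual features to a Gaussian prior over log-stiffness (illustrative only)."""
    def __init__(self, feat_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),  # predict (mean, log-variance) per point
        )

    def forward(self, feats: torch.Tensor):
        out = self.net(feats)                # (N, 2)
        mean, log_var = out[..., 0], out[..., 1]
        return mean, log_var.exp()           # prior mean and variance of log-stiffness

# Zero-shot use: evaluate the meta-learned prior on a novel object's point features.
# feats = extract_point_features(rgbd)      # hypothetical upstream feature extractor
# prior_mean, prior_var = StiffnessPriorHead(feat_dim=256)(feats)

Under a prior of this form, branch points would receive a higher predicted mean stiffness than leaf points purely from appearance.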

Few-shot adaptation using touch

Once the robot begins touching the plant, the stiffness map is efficiently updated from the touch data. Branches occluded behind the leaves produce a larger torque response than expected, so the estimated stiffness in that region is increased to match the observations.
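
As a rough illustration of how a single touch updates the estimate, the sketch below applies a standard linear-Gaussian (conjugate) update, assuming the measured torque is approximately linear in the local stiffness; the function, variable names, and numbers are placeholders rather than the paper's model.

def stiffness_posterior(prior_mean, prior_var, phi, tau_obs, noise_var):
    """Conjugate Gaussian update of a scalar stiffness given one torque reading.

    Assumes a simplified linear observation model: tau = phi * k + noise.
    """
    precision = 1.0 / prior_var + phi ** 2 / noise_var
    post_var = 1.0 / precision
    post_mean = post_var * (prior_mean / prior_var + phi * tau_obs / noise_var)
    return post_mean, post_var

# A larger-than-expected torque pulls the stiffness estimate upward:
# the prior predicts phi * 10.0 = 5.0 N*m, but 9.0 N*m is observed.
mean, var = stiffness_posterior(prior_mean=10.0, prior_var=4.0,
                                phi=0.5, tau_obs=9.0, noise_var=0.25)
print(mean, var)  # ~16.4, 0.8: stiffness increases to explain the extra torque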

Abstract

Estimating visual-tactile models of deformable objects is challenging because vision suffers from occlusion while touch data is sparse and noisy. We propose a novel data-efficient method for dense heterogeneous model estimation by leveraging experience from diverse training objects. The method is based on Bayesian meta-learning (BML), which can mitigate overfitting of high-capacity visual-tactile models by meta-learning an informed prior and naturally achieves few-shot online estimation via posterior estimation. However, BML requires a shared parametric model across tasks, whereas visual-tactile models for diverse objects have different parameter spaces. To address this issue, this paper introduces Structured Bayesian Meta-Learning (SBML), which incorporates heterogeneous physics models, enabling learning from training objects with varying appearances and geometries. SBML performs zero-shot vision-only prediction of deformable model parameters, as well as few-shot adaptation after a handful of touches. Experiments show that on two classes of heterogeneous objects, namely plants and shoes, SBML outperforms existing approaches in force and torque prediction accuracy in zero- and few-shot settings.

Method

Method overview
An overview of Structured Bayesian Meta-Learning (SBML). The method learns a high-capacity prior over visual-tactile models from diverse training objects and adapts it to novel objects with a few touches.
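
To make the meta-learning step concrete, the self-contained toy below learns the parameters of a scalar Gaussian stiffness prior so that, after conditioning on a few "support" touches of a simulated object, the resulting posterior predicts held-out "query" torques well. The linear torque model, the noise level, and all names are our own simplifications of Bayesian meta-learning, not the paper's structured formulation.

import torch

torch.manual_seed(0)

# Meta-learned prior parameters over a scalar stiffness (a toy stand-in for the
# high-capacity, vision-conditioned prior learned in the paper).
prior_mean = torch.zeros(1, requires_grad=True)
prior_log_var = torch.zeros(1, requires_grad=True)
opt = torch.optim.Adam([prior_mean, prior_log_var], lr=1e-2)
noise_var = 0.05  # assumed torque sensor noise variance

def simulate_object(n_touches=6):
    """Sample a synthetic object with stiffness k and touches obeying tau = phi * k + noise."""
    k = torch.randn(1) * 0.5 + 2.0       # true stiffness of this object
    phi = torch.rand(n_touches)          # contact geometry factors
    tau = phi * k + noise_var ** 0.5 * torch.randn(n_touches)
    return phi, tau

for step in range(2000):
    phi, tau = simulate_object()
    phi_s, tau_s = phi[:3], tau[:3]      # support touches used for adaptation
    phi_q, tau_q = phi[3:], tau[3:]      # query touches used to score the prior
    # Conjugate Gaussian posterior over stiffness given the support touches.
    prior_var = prior_log_var.exp()
    post_var = 1.0 / (1.0 / prior_var + (phi_s ** 2).sum() / noise_var)
    post_mean = post_var * (prior_mean / prior_var + (phi_s * tau_s).sum() / noise_var)
    # Negative log posterior-predictive likelihood of the query torques.
    pred_mean = phi_q * post_mean
    pred_var = phi_q ** 2 * post_var + noise_var
    nll = 0.5 * ((tau_q - pred_mean) ** 2 / pred_var + pred_var.log()).sum()
    opt.zero_grad()
    nll.backward()
    opt.step()

print("learned prior:", prior_mean.item(), prior_log_var.exp().item())

In this toy the learned prior mean and variance should approach the statistics of the simulated object population (roughly 2.0 and 0.25), which mirrors the role the meta-learned prior plays for a novel object before any touch is made.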

Results

Here we present zero- and few-shot stiffness estimates for test-set shoes.

Unified prior over plants and shoes

Our method can learn a high-capacity prior across multiple object categories. Here we show the mean prediction of a unified prior trained on both the plant and shoe datasets.

Unified prior mean predictions are shown for five objects: Dracaena, orange tree, leather boot, sneaker, and running shoe.

BibTeX

@inproceedings{yao2024structured,
  title={Structured Bayesian Meta-Learning for Data-Efficient Visual-Tactile Model Estimation},
  author={Shaoxiong Yao and Yifan Zhu and Kris Hauser},
  booktitle={8th Annual Conference on Robot Learning},
  year={2024},
  url={https://openreview.net/forum?id=TzqKmIhcwq}
}