(last edited March 18th, 2026)
Main trends in benchmarking deformable-object manipulation
Benchmarking of robotic textile manipulation remains fragmented and is still evolving. Early work focused on physical task benchmarks that define standardized manipulation tasks (e.g., towel folding or cloth spreading) and evaluation metrics [1]. More recent efforts have expanded benchmarking along three complementary directions.
First, several works emphasize object standardization, proposing shared cloth object sets [2] and material characterization protocols [3] so that experiments can be reproduced across laboratories. This reflects the recognition that results in textile manipulation depend strongly on fabric properties, which are often poorly reported.
Second, the rapid growth of learning-based approaches has led to simulation-based benchmarks such as SoftGym [4], DaXBench [5], GarmentLab [7] and DexGarmentLab [13]. These platforms provide reproducible environments and datasets for comparing reinforcement learning or planning algorithms at scale, but they also highlight a persistent sim-to-real gap in cloth dynamics, which [6] benchmarks directly by comparing cloth simulators against real-world experiments.
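Third, the field is beginning to adopt dataset-driven and competition-style benchmarks, in which shared datasets and well-defined tasks enable head-to-head comparisons between methods; examples include the cloth unfolding dataset from the ICRA 2024 competition [8] and the Flat’n’Fold garment dataset [10].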
Compared with rigid-object manipulation, which has widely adopted benchmarks such as the YCB object set, textile manipulation still lacks a community-standard benchmark with common datasets, tasks, and leaderboards.
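Since the benchmarks in the table differ largely in which tasks and metrics they define, it may help to see what a typical cloth-spreading metric can look like. The following is a minimal sketch, assuming the cloth state is available as a top-down binary segmentation mask (True where cloth covers the table): it counts covered pixels as the projected area and normalizes the gain between the initial crumpled state and a fully flattened reference. The mask representation and the function names `coverage` and `normalized_coverage_gain` are illustrative assumptions, not the API or exact metric definition of any benchmark listed here; individual benchmarks define their own variants.

```python
from typing import Sequence

# Assumption: cloth state is a top-down binary mask (True = cloth pixel).
# Function names are hypothetical, not taken from any benchmark's codebase.
Mask = Sequence[Sequence[bool]]


def coverage(mask: Mask) -> int:
    """Projected cloth area, counted as the number of covered pixels."""
    return sum(1 for row in mask for covered in row if covered)


def normalized_coverage_gain(initial: Mask, final: Mask, flat: Mask) -> float:
    """Share of the achievable coverage gain realized by a manipulation episode.

    0.0: no gain over the initial (crumpled) state.
    1.0: final coverage equals that of the fully flattened reference.
    """
    c0, ct, c_flat = coverage(initial), coverage(final), coverage(flat)
    if c_flat <= c0:
        raise ValueError("flattened reference must cover more area than the initial state")
    return (ct - c0) / (c_flat - c0)


if __name__ == "__main__":
    # Toy 4x4 masks: a crumpled start, a partially spread result, a flat reference.
    flat = [[True] * 4 for _ in range(4)]                                # 16 px
    initial = [[r < 2 and c < 2 for c in range(4)] for r in range(4)]  # 4 px
    final = [[r < 3 for _ in range(4)] for r in range(4)]               # 12 px
    print(round(normalized_coverage_gain(initial, final, flat), 3))     # 0.667
```

Scores outside [0, 1] are possible: a negative value means the cloth ended up more crumpled than it started, and a value above 1 can occur if the reference mask underestimates the flattened area. Normalizing by the achievable gain, rather than reporting raw area, keeps scores comparable across cloths of different sizes.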
| Ref | Work | Benchmark type | Scope | Tasks defined | Standardized objects | Metrics defined | Dataset shared | Data type | Simulation environment | Main contribution | Main limitation |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| [1] | Bimanual Cloth Manipulation | Physical benchmark | Cloth | Yes | Yes | Yes | Partial | Real | No | One of the first structured benchmarks for cloth manipulation tasks. | Limited dataset and task diversity. |
| [2] | Household Cloth Object Set | Object standardization | Cloth | Yes | Yes | Partial | Yes | Real | No | Standardized cloth object set enabling reproducible experiments across labs. | No large dataset or leaderboard. |
| [3] | Standardization of Cloth Objects | Material characterization | Cloth | No | Yes | Partial | Partial | Real | No | Introduces material descriptors to enable comparable experiments. | Focuses on cloth characterization rather than manipulation tasks. |
| [4] | SoftGym | Simulation benchmark | Deformable objects | Yes | Yes | Yes | Yes | Simulation | Yes | Widely adopted RL benchmark with cloth manipulation tasks. | Limited realism of cloth physics. |
| [5] | DaXBench | Simulation benchmark | Deformable objects | Yes | Yes | Yes | Yes | Simulation | Yes | Differentiable physics benchmark for learning deformable manipulation. | Mostly simulation-based evaluation. |
| [6] | Sim-to-Real Gap | Sim-to-real benchmark | Cloth | Yes | Yes | Yes | Partial | Real + Simulation | Yes | Benchmarks fidelity of cloth simulators compared to real experiments. | Narrow focus on simulator evaluation. |
| [7] | GarmentLab | Simulation benchmark | Garments | Yes | Yes | Yes | Yes | Simulation | Yes | Large-scale garment manipulation environment with many tasks and garments. | Still emerging; limited real-world validation. |
| [8] | Cloth Unfolding Benchmark (ICRA competition dataset) | Dataset + competition | Cloth | Yes | Yes | Yes | Yes | Real | No | Public dataset and benchmark for cloth unfolding grasp prediction. | Focused on a single subtask. |
| [9] | NIST Deformable Object Benchmark | Methodology benchmark | Deformable objects | Yes | Partial | Yes | No | — | No | Standardized metrics for evaluating deformable manipulation tasks. | Not textile-specific. |
| [10] | Flat’n’Fold | Dataset + benchmark | Garments | Yes | Yes | Yes | Yes | Real | No | Large multimodal dataset with ~2,000 demonstrations across 44 garments, capturing full manipulation sequences from crumpled to folded states. | Primarily a perception/learning benchmark rather than a full robot manipulation benchmark. |
| [11] | Cloth Folding Dataset (UGent) | Demonstration dataset | Cloth | Yes | Yes | Partial | Yes | Real | No | Dataset of ~8.5 hours of folding demonstrations (~1000 samples) for learning cloth folding policies. | Limited garment diversity and task scope. |
| [12] | Cloth Tracking Dataset | Dynamic cloth dataset | Cloth | No | Yes | No | Yes | Real | No | Motion capture dataset capturing cloth deformation dynamics across different fabrics. | Focused on cloth tracking for simulator system identification rather than manipulation tasks. |
| [13] | DexGarmentLab | Simulation environment | Garments | Yes | Yes | Yes | Yes | Simulation | Yes | Environment for dexterous bimanual garment manipulation with diverse garment assets and tasks. | Mostly simulation; limited physical validation. |
References
[1] Garcia-Camacho et al., Benchmarking Bimanual Cloth Manipulation, RA-L, 2020.
[2] Garcia-Camacho et al., Household Cloth Object Set, RA-L, 2022.
[3] Garcia-Camacho et al., Standardization of Cloth Objects and its Relevance in Robotic Manipulation, ICRA, 2024.
[4] Lin et al., SoftGym: Benchmarking Deep Reinforcement Learning for Deformable Object Manipulation, CoRL, 2021.
[5] Chen et al., DaXBench, ICLR, 2023.
[6] Blanco-Mulero et al., Benchmarking the Sim-to-Real Gap in Cloth Manipulation, RA-L, 2024.
[7] Lu et al., GarmentLab, NeurIPS, 2024.
[8] De Gusseme et al., Cloth Unfolding Benchmark from ICRA 2024 competition, IJRR, 2025.
[9] Kimble et al., Performance Measures to Benchmark Deformable Object Manipulation, Frontiers in Robotics and AI, 2022.
[10] Zhuang et al., Flat’n’Fold: A Diverse Multi-Modal Dataset for Garment Perception and Manipulation, ICRA, 2025.
[11] Verleysen et al., Human Demonstrations of Cloth Folding Dataset, IJRR, 2020.
[12] Coltraro et al., Cloth Tracking Dataset, IJRR, 2025.
[13] Wang et al., DexGarmentLab, 2025.
