Learning strategies for deformable object manipulation (DOM)

(last updated March 11th, 2026)

Learning approaches for control can broadly be divided into two main paradigms: Reinforcement Learning (RL) and Imitation Learning (IL) [7]. Overall, RL and IL provide complementary advantages: RL enables autonomous skill discovery and long-horizon optimization, whereas IL offers strong priors from expert knowledge and improved sample efficiency. Modern robotic systems often combine both paradigms, for example by initializing policies with demonstrations and refining them through reinforcement learning.

Imitation Learning encompasses multiple techniques differing in how demonstrations are used:

  • Behavioral Cloning (BC): Treats policy learning as a supervised learning problem, directly mapping observed states to actions. While simple and data-efficient, BC can suffer from compounding errors due to distribution shift. Specifically, once the agent reaches a state not covered by the demonstration distribution, it lacks a mechanism to recover.
  • Learning from Demonstration (LfD): An umbrella term for methods in which an agent acquires a policy from expert demonstrations, enabling robots to learn complex skills without explicit trajectory programming [7].
  • Adversarial Imitation Learning (AIL): Learns policies by matching the distribution of expert and learner trajectories using adversarial training, improving robustness to state distribution mismatch. Generative Adversarial Imitation Learning (GAIL) is a specific AIL formulation in which a discriminator distinguishes expert from agent-generated behavior, guiding the policy toward expert-like performance without explicitly modeling rewards [8].
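To make the BC bullet above concrete, here is a minimal sketch of behavioral cloning as supervised regression from states to actions. The linear "expert" controller and synthetic demonstration data are assumptions made purely for illustration; real BC policies are typically neural networks trained on recorded robot trajectories.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical expert demonstrations: 4-D states and the expert's 2-D actions.
# The "expert" here is a known linear controller plus small noise.
W_expert = np.array([[1.0, 0.0], [0.0, 1.0], [-0.5, 0.2], [0.3, -0.4]])
states = rng.normal(size=(500, 4))
actions = states @ W_expert + 0.01 * rng.normal(size=(500, 2))

# Behavioral cloning: fit the policy by supervised learning (least squares here).
W_bc, *_ = np.linalg.lstsq(states, actions, rcond=None)

def policy(s):
    # The cloned policy maps an observed state directly to an action.
    return s @ W_bc
```

Note that nothing in this fit tells the policy what to do in states far from the demonstration distribution, which is exactly the compounding-error failure mode described above.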

RL for deformable object manipulation

Reinforcement learning (RL) learns a control policy by trial-and-error interaction: the robot observes the world, takes actions, and updates its policy to maximize cumulative reward. For deformable object manipulation (DOM), RL is appealing because explicit modeling and planning can be brittle when objects have high-dimensional state, nonlinear dynamics, and partial observability (occlusion, self-contact, ambiguous shapes). In practice, DOM RL often aims to learn strategies (where/how to pull, fold, stretch, regrasp) rather than only low-level motor control, because modeling accurate deformation + contact at the torque level is difficult and expensive to learn from scratch.
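The trial-and-error loop described above can be sketched with tabular Q-learning on a toy chain MDP. This is a deliberately simplified stand-in: real DOM states are continuous and high-dimensional, and the environment, reward, and hyperparameters below are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 5-state chain standing in for a manipulation task: action 1 moves
# right toward the goal (state 4, reward 1), action 0 moves left.
n_states, n_actions, gamma, alpha, eps = 5, 2, 0.9, 0.5, 0.2
Q = np.zeros((n_states, n_actions))

def step(s, a):
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s2 == n_states - 1 else 0.0
    return s2, r, s2 == n_states - 1

for _ in range(500):                  # trial-and-error episodes
    s = 0
    for _ in range(50):
        # Epsilon-greedy exploration: mostly exploit, sometimes try random actions.
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(Q[s]))
        s2, r, done = step(s, a)
        # Temporal-difference update toward the cumulative-reward-maximizing value.
        Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) * (not done) - Q[s, a])
        s = s2
        if done:
            break

# After training, the greedy policy moves right from every non-terminal state.
```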

A common theme in DOM benchmarks is that there may not be a single “correct” goal configuration (e.g., cloth folding can have multiple valid outcomes), which makes reward design and goal specification particularly important. This also interacts with partial observability: from a single camera view, multiple underlying cloth states can look similar, so the policy must be robust to state aliasing or rely on memory/state estimation.
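One common way to handle multiple valid outcomes is to score the current state against the *nearest* acceptable goal rather than a single target. A sketch, assuming a hypothetical keypoint summary of cloth state and two equally valid fold configurations (all arrays invented for illustration):

```python
import numpy as np

# Hypothetical: cloth state summarized by 4 corner keypoints in the plane.
# Two valid folded outcomes, e.g. fold left-over-right or right-over-left.
goal_a = np.array([[0.0, 0.0], [0.0, 1.0], [0.5, 0.0], [0.5, 1.0]])
goal_b = np.array([[0.5, 0.0], [0.5, 1.0], [1.0, 0.0], [1.0, 1.0]])
valid_goals = [goal_a, goal_b]

def reward(keypoints):
    # Negative distance to the nearest valid goal: any correct fold scores well,
    # so the policy is not penalized for choosing a different valid outcome.
    return -min(np.linalg.norm(keypoints - g) for g in valid_goals)
```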

Common RL approaches (as seen in DOM work)

  • Model-free RL (policy/value learning): learns directly from experience without an explicit dynamics model; popular in continuous control, but often sample-hungry and sensitive to reward design. In DOM, model-free RL is frequently paired with action abstractions (e.g., pick-and-place style actions rather than torques) and shaped rewards (coverage, distance-to-target, intersection/overlap penalties) to make exploration feasible.
  • Model-based RL: learns/uses a predictive model (sometimes combined with MPC/planning) to improve data efficiency—promising for DOM but depends on having a usable model. Deformables make this hard because small modeling errors can compound over long horizons (e.g., predicting cloth wrinkles or fluid spill), and SoftGym explicitly highlights the difficulty of accurate future prediction for deformable dynamics.
  • Hybrid / structure-in-the-loop RL: improves RL by injecting structure (e.g., compact state features, structured dynamics priors, residual learning, action abstractions) to reduce exploration burden and improve transfer. This is a natural fit for DOM because particle/mesh/graph structure can be exploited, and surveys emphasize structured modeling and multimodal sensing as key enablers for robust manipulation systems.
  • Hierarchical / skill-based RL: decomposes long tasks into subskills and learns sequencing; fits DOM because many tasks are multi-stage (spread → regrasp corners → fold → align → refine) and require recovery behaviors when folds slip or tangles form. Hierarchy helps with long-horizon credit assignment and lets you reuse skills across related tasks.
  • Sim-to-real RL: trains mostly in simulation and transfers to real robots, often using domain randomization (visual + dynamics). This is common in DOM because data collection is expensive and failures are messy; however, sim-to-real can be dominated by differences in material parameters (stiffness, friction), contact, and grasp dynamics, not just rendering. The sim-to-real cloth RL work (Matas et al.) illustrates both the promise of this pipeline and the need for careful randomization + engineering around grasping.
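To make the shaped-reward idea from the model-free bullet concrete, here is a hypothetical coverage reward computed from cloth particles projected onto the ground plane. The particle representation, workspace size, and grid resolution are all illustrative assumptions, not any benchmark's actual reward.

```python
import numpy as np

def coverage_reward(particles_xy, workspace=1.0, bins=32):
    # Approximate cloth coverage as the fraction of ground-plane grid cells
    # occupied by at least one projected cloth particle.
    h, _, _ = np.histogram2d(
        particles_xy[:, 0], particles_xy[:, 1],
        bins=bins, range=[[0, workspace], [0, workspace]],
    )
    return (h > 0).mean()

rng = np.random.default_rng(2)
spread = rng.uniform(0, 1, size=(2000, 2))          # well-spread cloth
crumpled = 0.5 + 0.05 * rng.normal(size=(2000, 2))  # bunched-up cloth
```

A spread configuration occupies far more cells than a crumpled one, giving the dense learning signal that makes exploration feasible in spreading tasks.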

Some recent works

Sim-to-real deep RL for cloth (Matas et al., 2018)

A key early result is sim-to-real RL for cloth manipulation: policies are trained entirely in simulation (PyBullet), then deployed on a real Kinova arm for tasks like folding to a mark, diagonal folding, and draping over a hanger. The work shows a practical recipe for sparse-reward cloth RL: an improved DDPG-style pipeline integrating multiple stabilizing components (e.g., prioritized replay, n-step returns, twin critics, asymmetric actor-critic), plus heavy domain randomization for transfer. It also documents a very real limitation: PyBullet’s deformable/grasping support was insufficient “as is,” requiring custom “fake grasps” via anchors to cloth nodes.
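One of the stabilizing components mentioned, n-step returns, can be sketched as follows. The function and values are illustrative, not the paper's implementation; the point is that with sparse cloth rewards, a multi-step target propagates the terminal reward backward faster than a 1-step target.

```python
def n_step_target(rewards, bootstrap_value, gamma=0.99):
    """Compute R_t = sum_{k=0}^{n-1} gamma^k r_{t+k} + gamma^n Q_target(s_{t+n}),
    where n = len(rewards) and bootstrap_value approximates Q_target(s_{t+n})."""
    g = sum(gamma**k * r for k, r in enumerate(rewards))
    return g + gamma**len(rewards) * bootstrap_value

# With a sparse reward arriving only at the end of a 5-step window, the
# 5-step target is nonzero even when the bootstrap value is still zero,
# whereas a 1-step target would see nothing until values propagate back.
```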

Benchmarks for DOM RL (SoftGym, Lin et al., 2020)

SoftGym established an RL benchmark suite for rope/cloth/fluids with standardized tasks and observation modes (full state, reduced state, pixels). Two important findings:

  • Image-based RL underperforms state/reduced-state methods on many deformable tasks, showing that perception/state estimation is a core bottleneck.
  • Learning deformable dynamics from pixels is hard: SoftGym analyzes failures via poor future prediction (e.g., models failing to predict cloth shape or water spill trajectories).
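The compounding of small one-step prediction errors over a rollout, the failure mode SoftGym points to, can be demonstrated with a toy 1-D system. Both dynamics functions below are made up purely for illustration: the "learned" model is almost right per step, yet drifts badly over the horizon.

```python
def true_step(x):
    # Ground-truth dynamics (toy stand-in for real deformable dynamics).
    return 0.99 * x + 0.1

def learned_step(x):
    # Learned model with a small per-step bias in the dynamics coefficient.
    return 0.97 * x + 0.1

x_true = x_model = 1.0
errors = []
for _ in range(50):
    x_true, x_model = true_step(x_true), learned_step(x_model)
    errors.append(abs(x_true - x_model))

# errors[0] is tiny, but the gap grows steadily as the model rollout drifts
# from the true trajectory -- the core difficulty for model-based DOM RL.
```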

Broader DOM perspective (Zhu et al., 2022)

This survey emphasizes that DOM breaks rigid-manipulation assumptions and requires progress across hardware, sensing, modeling, planning, and control. A community questionnaire reported in Zhu et al. ranks sensing as high-importance but relatively low-maturity (i.e., high opportunity). Some researchers also highlight multimodal perception (vision + tactile), tactile simulators, and structured modeling (including particle/graph perspectives) as key ingredients for data-driven manipulation going forward.

Tactile-prior TD3 for robust deformable grasping (sim-to-real, Zhou et al., 2025)

Zhou et al. propose T-TD3, a reinforcement-learning framework for stable grasping of deformable objects that explicitly leverages tactile priors from vision-based tactile sensors (GelSight). The key idea is to (1) build a unified tactile prior representation by preprocessing tactile images into multiple maps (flow/contact/depth/force) and fusing them, and (2) extend TD3 with a Multi-Scale Fusion Network (MSF-Net) plus several stability/sample-efficiency tweaks (notably a dual-actor structure, noise attenuation during training, and critic regularization). They also decompose the overall grasping problem into three sequential subtasks—slip detection, stable grasp evaluation under disturbances, and minimum grasp force tracking—with explicit reward shaping for each, and train in a custom simulator CR5GraspStable-Env built with PyBullet + TACTO for tactile simulation. In experiments, they report strong sim-to-real performance, including a 94.81% real-world success rate across a set of everyday deformable objects, and show that richer tactile representations (their priors) improve robustness compared to using only raw/partial tactile cues.
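T-TD3 builds on standard TD3 machinery, so a minimal sketch of the TD3 target (clipped double-Q plus target-policy smoothing) helps situate their extensions. The linear critics and policy below are stand-ins invented for illustration; this is the base algorithm's target, not T-TD3's full pipeline.

```python
import numpy as np

rng = np.random.default_rng(3)

def td3_target(r, s2, q1, q2, target_policy, gamma=0.99,
               noise_std=0.2, noise_clip=0.5, done=False):
    a2 = target_policy(s2)
    # Target-policy smoothing: clipped noise on the target action.
    a2 = a2 + np.clip(noise_std * rng.normal(size=a2.shape),
                      -noise_clip, noise_clip)
    # Clipped double-Q: take the minimum of the twin critics to curb
    # the overestimation bias of a single critic.
    q_min = min(q1(s2, a2), q2(s2, a2))
    return r + gamma * q_min * (not done)

# Toy twin critics and target policy, purely for illustration.
q1 = lambda s, a: float(np.sum(s) + np.sum(a))
q2 = lambda s, a: float(np.sum(s) + np.sum(a) + 0.5)
pi = lambda s: 0.1 * s
```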

Common problems (recurring across the papers)

  • Perception + partial observability: pixel-based policies lag badly; occlusion/self-occlusion and ambiguous shapes make state estimation difficult.
  • Deformable dynamics prediction: forward prediction is unreliable, hurting model-based RL and even representation learning (SoftGym explicitly demonstrates this).
  • Simulation limitations + sim-to-real gap: deformable physics and grasp/contact are hard to simulate well; Matas et al. needed grasp “hacks,” and transfer depends heavily on what is randomized.
  • Low-level grasping dominates real performance: benchmarks often abstract it away (pickers), but real robots fail on contact/grasp reliability.
  • Long-horizon composition: real tasks are multi-stage and require recovery behaviors; surveys highlight this as a major gap to practical deployment.

Research directions

  • Close the “pixels → state” gap: better representations for deformables (particle/graph-structured latents, keypoint/feature tracking under occlusion) to narrow the large performance gap SoftGym reports.
  • Multimodal RL (vision + touch): surveys argue vision gives global shape while tactile provides local contact/material cues; building tactile simulators and multimodal DOM datasets is a high-leverage enabling direction.
  • Hybrid/model-based RL with structured models: model-based RL can be more data-efficient, but needs better predictive models; some work explicitly points to structured modeling approaches and the need to handle sim-real gaps (e.g., residual adaptation).
  • Bridge benchmark abstractions to robots: develop methods that learn in picker-style benchmarks but transfer to real grippers (or explicitly learn grasp/contact policies alongside high-level policies).
  • Long-horizon skills + recovery: hierarchical policies and explicit recovery (regrasp, untangle, spread-then-fold) are necessary for realistic DOM as emphasized by the surveys’ discussion of multi-stage tasks.

References

[1] Matas, Jan, Stephen James, and Andrew J. Davison. “Sim-to-real reinforcement learning for deformable object manipulation.” Conference on Robot Learning. PMLR, 2018.

[2] Lin, Xingyu, et al. “Softgym: Benchmarking deep reinforcement learning for deformable object manipulation.” Conference on Robot Learning. PMLR, 2021.

[3] Zhu, Jihong, et al. “Challenges and outlook in robotic manipulation of deformable objects.” IEEE Robotics & Automation Magazine 29.3 (2022): 67-77.

[5] Vodolazskii, Danil, et al. “A review of robotic manipulation of deformable objects with imitation learning techniques: Progress and outlook.” 2025 11th International Conference on Electrical Engineering, Control and Robotics (EECR). IEEE, 2025.

[6] Zhou, Yanmin, et al. “T-td3: A reinforcement learning framework for stable grasping of deformable objects using tactile prior.” IEEE Transactions on Automation Science and Engineering 22 (2024): 6208-6222.

[7] Zare, Maryam, et al. “A Survey of Imitation Learning: Algorithms, Recent Developments, and Challenges.” arXiv preprint arXiv:2309.02473 (2023).

[8] Ho, Jonathan, and Stefano Ermon. “Generative Adversarial Imitation Learning.” Advances in Neural Information Processing Systems 29 (2016).