We present a framework that integrates EEG-based visual and motor imagery (VI/MI) with robotic control to enable real-time, intention-driven grasping and placement. Motivated by the promise of BCI-driven robotics for richer human–robot interaction, the system bridges neural signals and physical control by deploying offline-pretrained decoders zero-shot within an online streaming pipeline. This establishes a dual-channel intent interface that translates imagined intent into robotic actions: VI identifies the object to grasp and MI determines the placement pose, giving the user intuitive control over both what to grasp and where to place it. The system operates solely on EEG via a cue-free imagery protocol, achieving end-to-end integration and online validation. Implemented on a KINOVA GEN2 robotic platform and evaluated across diverse scenarios, including occluded targets and varying participant postures, the system achieves online decoding accuracies of 40.23% (VI) and 62.59% (MI), with an end-to-end task success rate of 20.88%. These results demonstrate that high-level visual cognition can be decoded in real time and translated into executable robot commands, bridging the gap between neural signals and physical interaction, and validating the flexibility of a purely imagery-based BCI paradigm for practical human–robot collaboration.
High-level cognitive control pipeline for EEG-based robotic grasp-and-place.
The system integrates offline-trained VI/MI decoders into an online streaming pipeline: VI decoding determines grasping intent (object selection), while MI decoding determines placement intent (target pose/position). The resulting dual-channel commands drive a robotic arm to execute the grasp-and-place task in real-world settings, demonstrating seamless mapping from high-level visual cognition to physical manipulation.
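As a sketch of how offline-trained decoders could be deployed zero-shot inside such a streaming loop, the snippet below loads frozen checkpoints and classifies single EEG windows. The checkpoint names, window shape, and PyTorch framing are illustrative assumptions, not the released implementation.

```python
import numpy as np
import torch

# Hypothetical checkpoints from the offline training stage; they are
# applied zero-shot online, i.e., no per-session recalibration.
vi_decoder = torch.load("vi_decoder_offline.pt")
mi_decoder = torch.load("mi_decoder_offline.pt")
vi_decoder.eval()
mi_decoder.eval()

@torch.no_grad()
def decode(window: np.ndarray, model: torch.nn.Module) -> int:
    """Classify one buffered (channels, samples) EEG window with a frozen decoder."""
    x = torch.from_numpy(window).float().unsqueeze(0)  # add a batch dimension
    return int(model(x).argmax(dim=-1))                # predicted intent class
```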
Participants. Five healthy graduate students (2 females; ages 23–30 years; M = 25.4, SD = 2.97) participated in this study. All five completed the offline experiments, and four of them also took part in the online testing.
Offline tasks. Offline data collection comprised visual perception (VP), visual imagery (VI), and motor imagery (MI) tasks.
EEG acquisition. EEG was recorded using a 64-channel Neuracle system (59 scalp EEG, 2 mastoid, 2 EOG, 1 ECG) at a sampling rate of 1000 Hz.
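The section does not spell out the preprocessing applied to these recordings, so the sketch below uses common BCI defaults (band-pass, mains notch, downsampling) purely as assumptions:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, iirnotch, filtfilt, decimate

def preprocess(raw: np.ndarray, sfreq: float = 1000.0) -> np.ndarray:
    """Clean one (channels, samples) EEG array before decoding.

    The 1-100 Hz band-pass, 50 Hz notch, and 4x downsampling are assumed
    defaults; the paper's actual preprocessing chain is not given here.
    """
    # Band-pass to a range plausibly relevant for imagery decoding
    sos = butter(4, [1.0, 100.0], btype="bandpass", fs=sfreq, output="sos")
    x = sosfiltfilt(sos, raw, axis=-1)

    # Notch out 50 Hz mains interference
    b, a = iirnotch(50.0, Q=30.0, fs=sfreq)
    x = filtfilt(b, a, x, axis=-1)

    # Downsample 1000 Hz -> 250 Hz to shrink the decoder input
    return decimate(x, q=4, axis=-1, zero_phase=True)
```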
Robotics platform. We employed a KINOVA GEN2 robot as the embodied platform for physical interaction. Two Intel RealSense D435 cameras served as the primary sensors of the robotic perception system.
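On the perception side, a minimal pyrealsense2 capture of an aligned RGB-D pair might look like the following; the stream settings are assumptions, and the grasp detector is left as a placeholder since the section does not name the one actually used.

```python
import numpy as np
import pyrealsense2 as rs

# One D435 stream; 640x480 @ 30 fps is an assumed configuration.
pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
pipeline.start(config)

try:
    # Align depth to color so pixels correspond across the two streams
    align = rs.align(rs.stream.color)
    frames = align.process(pipeline.wait_for_frames())
    depth = np.asanyarray(frames.get_depth_frame().get_data())  # uint16 depth units
    color = np.asanyarray(frames.get_color_frame().get_data())  # uint8 BGR image
    # A grasp detector would consume (color, depth) here; it is left
    # as a placeholder rather than guessing at the system's method.
finally:
    pipeline.stop()
```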
We design a cue-free imagery protocol with three offline tasks—visual perception (VP), visual imagery (VI), and motor imagery (MI). During online control, VI is decoded to infer grasping intent (what to grasp), while MI is decoded to infer placement intent (where to place), enabling a dual-channel interface for real-time grasp-and-place.
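Concretely, the dual-channel interface amounts to two small lookups: the decoded VI class selects the object and the decoded MI class selects the placement pose. The vocabularies and robot methods below are hypothetical placeholders; the section does not enumerate the actual classes or the motion API.

```python
# Hypothetical intent vocabularies -- illustrative only.
VI_OBJECTS = {0: "cup", 1: "bottle", 2: "box"}   # what to grasp
MI_POSES = {0: "left_zone", 1: "right_zone"}     # where to place

def execute_intent(vi_class: int, mi_class: int, robot) -> None:
    """Translate the two decoded classes into one grasp-and-place command.

    `robot.grasp` / `robot.place` stand in for the platform's real motion
    API, which this section does not specify.
    """
    robot.grasp(VI_OBJECTS[vi_class])   # VI channel: object selection
    robot.place(MI_POSES[mi_class])     # MI channel: placement pose
```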
Offline results for three tasks: (a) Visual Perception, (b) Visual Imagery, and (c) Motor Imagery. For each task, accuracies are reported per participant across frequency conditions.
Online system results. Top: Base demo performance per subject on the online VI/MI tasks; for each subject we report the two best online accuracies across model/frequency settings (e.g., 63.64 (RGNN/40) denotes 63.64% accuracy with the RGNN model in the 40 Hz condition) together with the overall mean across all settings. Middle: Overall online system accuracy, where the online VI/MI accuracies are the mean top-2 accuracies across all subjects. Bottom: Overall system runtime statistics.
| Subject | VI Top-1 Acc. (%) | VI Top-2 Acc. (%) | VI Overall Avg. (%) | MI Top-1 Acc. (%) | MI Top-2 Acc. (%) | MI Overall Avg. (%) |
|---|---|---|---|---|---|---|
| 01 | 63.64 (RGNN/40) | 44.44 (RGNN/60) | 47.14 | 75.00 (RGNN/60) | 60.00 (MLP/40) | 61.67 |
| 02 | 50.00 (EEGNet/100) | 39.13 (MLP/100) | 40.82 | 65.22 (MLP/100) | 60.00 (MLP/40) | 58.41 |
| 03 | 50.00 (MLP/60) | 33.67 (RGNN/60) | 38.89 | 60.00 (RGNN/100) | 53.85 (RGNN/60) | 54.62 |
| 04 | 37.50 (RGNN/100) | 33.67 (MLP/60) | 34.72 | 66.67 (MLP/60) | 60.00 (MLP/40) | 60.00 |
| Online VI avg Acc. (%) | Query success rate (%) | Online MI avg Acc. (%) | Place success rate (%) | System Accuracy (%) |
|---|---|---|---|---|
| 40.23 | 76.11 | 62.59 | 100 | 20.88 |
| Prepare (s) | VI Task (s) | VI Data Proc. (s) | VI Infer (s) | MI Task (s) | MI Data Proc. (s) | MI Infer (s) | Robot Exec. (s) | System Total (s) |
|---|---|---|---|---|---|---|---|---|
| 6.010 | 15.000 | 0.639 | 8.191 | 15.000 | 0.524 | 8.000 | 54.872 | 107.266 |
Online Demos for extended application scenarios.
(a) Hidden Object Interaction: Using only VI-EEG, the subject guides the system to identify and reveal a hidden object, demonstrating decoding without visual cues.
(b) Direct Human–Robot Interaction: By imagining the target object (VI) and the hand movement (MI), the subject interacts seamlessly with the robot, which places the object directly into their hand.
@article{liu2026robotic,
title={Robotic Grasping and Placement Controlled by EEG-Based Hybrid Visual and Motor Imagery},
author={Liu, Yichang and Wang, Tianyu and Ye, Ziyi and Li, Yawei and Jiang, Yu-Gang and Wang, Shouyan and Fu, Yanwei},
journal={arXiv preprint arXiv:2603.03181},
year={2026}
}