Data Collection SDK | Trossen Robotics

Data Collection SDK

A modular open-source C++ framework for high quality robotic data collection

SOLVING THE
DATA COLLECTION
BOTTLENECK

TROSSEN DATA COLLECTION SDK IS
PERFORMANT
MODULAR
CONFIGURABLE
HARDWARE AGNOSTIC

Training robot manipulation policies requires large volumes of high-quality demonstration data: synchronized joint states, camera streams, and mobile base odometry, all captured at precise rates and stored in formats that machine learning pipelines can actually consume. For most research teams, this means stitching together ad-hoc scripts, wrestling with ROS bag files, and writing one-off conversion tools every time the hardware changes. The pipeline between "physically moving a robot arm" and "training a neural network" is fragile, slow to iterate on, and almost never reusable across projects.

The Trossen SDK is an open-source C++ framework built to eliminate that bottleneck. It provides a configuration-driven, hardware-agnostic data collection pipeline that records synchronized multi-modal episodes from arms, cameras, and mobile bases, then converts them directly into the LeRobot V2 format used by the HuggingFace robotics ecosystem. It ships ready to use with Trossen AI Kits, but its registry-based architecture means adding support for new sensors, robots, or storage backends requires no changes to the core library.

WHAT MAKES IT DIFFERENT?

A Real Architecture, Not a Script Collection

Most robotics data collection tools are either tightly coupled to a single robot or are thin wrappers around ROS topics. The Trossen SDK is neither. It is built around five cooperating components with clean separation of concerns:

Hardware Component

Abstracts a physical device (arm, camera, base) behind a common interface configured via JSON.

Producers

Pull data from hardware via both polled and push-based reads, emitting typed records at configurable rates, up to 200 Hz for joint states.

Sink

Owns a lock-free multi-producer single-consumer queue and a background drain thread. Disk latency never stalls sensor polling.

Backend

Serializes record batches to storage. One backend instance per episode keeps state isolated.

Session Manager

Orchestrates the full episode lifecycle: it creates a fresh Scheduler, Sink, and Backend for every episode, manages duration limits and auto-stop, and provides real-time statistics and lifecycle callbacks.

MODULAR

Each layer can be extended or replaced independently.

PLUGIN REGISTRIES

Add Hardware Without Touching Core Code

The SDK uses static factory registries for hardware, producers, and backends. Adding a new sensor is a matter of implementing the `HardwareComponent` and `PolledProducer` interfaces, then dropping two macros into your `.cpp` file. 

The build system automatically picks up the new files. No central dispatch tables to edit, no recompilation of unrelated modules. The same pattern applies to backends: if you need to write data to a custom format, implement the `Backend` interface and register it. The SDK currently ships with TrossenMCAP (high-performance binary recording) and LeRobot V2, but the registry is open to extensions.

```cpp
REGISTER_HARDWARE(MyNewSensor, "my_sensor")
REGISTER_PRODUCER(MyNewSensorProducer, "my_sensor")
```

CONFIGURATION DRIVEN

CLI-Overridable

Every aspect of a recording session is defined in a single JSON file: hardware addresses, camera resolutions, producer poll rates, teleop pairings, episode duration limits, and backend parameters. No recompilation needed to change a camera serial number or switch from 30 Hz to 200 Hz arm recording. For quick iteration, any config value can be overridden at the command line using dot-notation. 

A `--dump-config` flag prints the fully merged configuration without running, making it easy to verify settings before a recording session.

```bash
./trossen_solo_ai \
   --set session.max_duration=30 \
   --set backend.dataset_id=trial_01
```

LOCK-FREE PIPELINE

Zero-Copy Where It Counts

The producer-to-sink path uses a lock-free MPSC queue so that high-frequency arm polling at 200 Hz is never blocked by camera frame writes or disk flushes. The Sink batches up to 64 records per drain iteration before forwarding to the backend, amortizing serialization overhead. The Scheduler supports a dual-lane architecture with configurable high-resolution timing for latency-sensitive tasks and spin-wait thresholds for sub-millisecond precision.

DIRECT PATH TO TRAINING

TrossenMCAP to LeRobot V2

The converter produces Parquet files for joint state data, MP4 video for camera streams, and `info.json` with per-episode statistics. The output is immediately consumable by LeRobot training scripts and uploadable to HuggingFace Hub. No intermediate formats, no manual post-processing.

```bash
trossen_mcap_to_lerobot_v2 \
   ~/.trossen_sdk/my_dataset/ \
   ~/lerobot_datasets
```

SUPPORTED HARDWARE OUT OF THE BOX

Three example configurations cover the main Trossen AI Kit setups: Solo AI, Stationary AI, and Mobile AI.
Each is a working starting point that can be customized solely through the config file.

  • Trossen AI: WidowX AI
  • RealSense: D400 series
  • Stereolabs: ZED series
  • USB/V4L2: OpenCV-compatible devices
  • SLATE: mobile base

UNDER THE HOOD

How TrossenMCAP Records Everything

At the core of the SDK's storage layer is TrossenMCAP, a high-performance binary recording format built on the open MCAP (https://mcap.dev) container standard. MCAP was designed for multi-channel, time-stamped robotics data, and the Trossen SDK takes full advantage of its structure to capture synchronized sensor streams with zero data loss.

SCHEMA-FIRST DESIGN WITH PROTOBUF

Every data type recorded by the SDK has a well-defined schema written in Protocol Buffers (proto3). This is not ad-hoc byte packing — each message type is formally specified, self-describing, and forward-compatible.

JointState

Joint positions, velocities, and efforts as float arrays, each tagged with a dual-clock timestamp (monotonic steady clock + real-time UTC) and a sequence number.

RawImage

Camera frames stored with width, height, encoding (e.g., `"bgr8"`), step size, and raw pixel data. Depth images from cameras get their own dedicated channel with depth scale metadata.

Odometry2D

Mobile base pose (x, y, theta) and twist (linear velocity, angular velocity), also dual-timestamped.

Timestamp

Every record carries both a monotonic clock (for jitter-free duration math) and a real-time clock (for correlation with wall-clock events). This dual-timestamp approach means you never lose timing precision even if the system clock adjusts mid-recording.

CHANNEL ARCHITECTURE: ONE TOPIC PER STREAM

Inside an MCAP file, data is organized into named channels, each carrying a single message type. Channels are created lazily — the first time a producer emits a record for a given stream ID, the backend registers the channel and its protobuf schema. This means the MCAP file only contains channels for the hardware you actually have connected. A Solo kit with two cameras produces four channels; a Mobile kit with a SLATE base and three cameras produces eight. No wasted space, no empty channels.

| Channel Topic | Message Type | Example |
| --- | --- | --- |
| `/leader/joints/state` | JointState | Leader arm joint data at 200 Hz |
| `/follower/joints/state` | JointState | Follower arm joint data at 200 Hz |
| `/cameras/camera_main/image` | RawImage | Main camera color frames at 30 FPS |
| `/cameras/camera_wrist/image` | RawImage | Wrist camera color frames at 30 FPS |
| `/cameras/camera_main_depth/image` | RawImage | Depth frames with scale metadata |
| `/base/odom/state` | Odometry2D | Mobile base odometry at 50 Hz |

EPISODE ISOLATION

Each recording episode gets its own MCAP file

Example: `episode_000000.mcap`, `episode_000001.mcap`, and so on. This is a deliberate design choice. If episode 47 is corrupted or you decide to discard it, you delete one file. The other 46 episodes are untouched. The SessionManager creates a fresh backend instance for every episode, so there is no shared state that can leak between recordings.

Each MCAP file also embeds a metadata record (`trossen_sdk_recording`) containing the dataset ID, robot name, SDK version, and a JSON blob describing every stream's joint names, camera resolutions, and whether a mobile base is present. This metadata travels with the file — you can hand someone an `.mcap` file months later and they have everything needed to interpret it.

WHY MCAP OVER ALTERNATIVES?

MCAP
vs
ROS Bags

MCAP is the storage format behind ROS 2 bags, but the Trossen SDK uses it directly without requiring ROS. You get the same efficient binary container without the middleware dependency, the topic naming conventions, or the build system complexity. MCAP files recorded by the SDK can still be opened in Foxglove Studio for visualization and debugging.

MCAP
vs
HDF5

HDF5 is designed for array-oriented scientific data, not time-series sensor streams. It lacks native support for multi-channel messages with different schemas, does not handle variable-rate producers well, and its file-level locking model is a poor fit for real-time recording from multiple threads.

MCAP
vs
Raw Files

CSV, JSON, PNG sequences: Flat files scatter a single episode's data across hundreds of files and directories. Timestamps must be reconstructed from filenames. There is no schema, no compression, and no way to verify data integrity. MCAP packs everything into a single seekable binary file with built-in chunking, optional compression (Zstd or LZ4), and CRC integrity checks.

CONFIGURABLE PERFORMANCE

The MCAP backend exposes tuning knobs through the same JSON config system

Chunk size controls how much data is buffered before writing a chunk to disk (default: 4 MB). Larger chunks improve compression ratios; smaller chunks reduce memory footprint.

Compression supports Zstd (best ratio), LZ4 (fastest), or none. For camera-heavy recordings, Zstd can cut file sizes by 30-50% with negligible CPU overhead on modern hardware.

```json
{
 "trossen_mcap_backend": {
    "root": "~/.trossen_sdk",
    "dataset_id": "my_experiment",
    "chunk_size_bytes": 4194304,
    "compression": "zstd"
  }
}
```

FROM MCAP TO TRAINING

The SDK includes a standalone converter (`trossen_mcap_to_lerobot_v2`) that reads MCAP episodes and produces a complete LeRobot V2 dataset. The output is immediately compatible with LeRobot training scripts and can be uploaded directly to HuggingFace Hub.

EXTRACT

Reads the embedded metadata to discover streams, joint names, and camera specs automatically.

SYNCHRONIZE

Aligns all streams to a common 30 Hz timebase, interpolating joint states and matching camera frames by timestamp.

TRANSFORM

Maps leader arm joints to `action` columns and follower arm joints to `observation.state` columns in the LeRobot schema. Mobile base velocities are appended to both.

ENCODE

Writes Parquet files for numeric data and MP4 video for camera streams, organized into chunks for efficient loading.

COMPUTE STATISTICS

Generates per-episode min/max/mean/std for every channel, written to `episode_stats.jsonl` for dataset quality inspection.

WHO IS THIS FOR?

The Trossen SDK is built for robotics researchers and engineers who need reliable, high-throughput data collection without the overhead of a full middleware stack. If you are collecting demonstration data for imitation learning, building a teleoperation pipeline for a manipulation task, or evaluating new sensor hardware for a data collection rig, this SDK gives you a structured, extensible foundation rather than a pile of scripts you will rewrite next month.

It is pure C++ with no ROS dependency, builds with CMake on Ubuntu, and is designed to run on the same compute hardware that controls your robot.

GETTING STARTED

The SDK is open source and available on GitHub. Clone the repository, choose the example that matches your Trossen AI Kit configuration, edit the JSON config to match your hardware addresses and camera serials, and start recording. From raw demonstrations to a training-ready LeRobot V2 dataset is a single conversion command away.

If your hardware is not yet supported, the registry pattern means you can add it yourself without forking the project. Implement two interfaces, add two macros, and your new sensor is a first-class citizen in the pipeline.

WHAT'S NEXT?

THIS IS JUST THE BEGINNING!

What you see today is the initial implementation. We have major updates planned — all in the spirit of keeping things robust, performant, and ready for more data, more sensors, and more robots. The modular architecture was designed from day one to grow, and we intend to push it further: faster pipelines, broader hardware support, richer backends, and tighter integration with the training ecosystem.

This is an active project, and we are building it alongside the researchers and engineers who use it every day.

WE WANT TO HEAR FROM YOU


OUR PROMISE TO YOU

We stand behind our products with an industry-leading commitment to reliability, service,
and long-term support—because we believe performance should be measured in years, not months.

BUILT FOR REAL-WORLD RESEARCH ENVIRONMENTS. COVERS DEFECTS IN MATERIALS AND WORKMANSHIP. WEAR COMPONENTS ARE FIELD-REPLACEABLE AND READILY AVAILABLE.
LIFETIME SUPPORT FOR TROSSEN PRODUCTS 


© 2025 Trossen Robotics. All Rights Reserved.
