Trossen SDK: Open-Source Robotics Data Collection

Mar 24

The Short Version

Clone the Trossen SDK from GitHub and review the Data Collection SDK page for architecture and setup details.
Define your recording behavior in a JSON config: set hardware addresses, resolutions, poll rates, and teleoperation pairings without recompiling.
Record synchronized, multi-modal episodes from robot arms, cameras, and mobile bases into TrossenMCAP.
Add new sensors, producers, or backends through the registry-based interfaces instead of editing core code.
Convert recorded episodes to LeRobot V2 format, producing Parquet for state data and MP4 for video.
Test it with your own hardware, push the edges, and check where it saves time and reduces friction.
Submit an issue on GitHub, request a feature, or fork the project to help shape the roadmap.

Who this is for

Robotics researchers collecting demonstration data
Engineers building imitation learning or teleoperation pipelines
Teams evaluating sensor stacks and capture workflows
Developers wanting a C++ data collection foundation without ROS
Contributors shaping open infrastructure for physical AI

Want to jump right into all the technical details? Check out the landing page: Data Collection SDK

Just take me to GitHub! https://github.com/TrossenRobotics/trossen_sdk

The Trossen SDK is an open-source C++ framework from Trossen Robotics for robotics and physical AI data collection. It records synchronized, multi-modal episodes from robot arms, cameras, and mobile bases, then moves that data into formats modern training pipelines can actually use. It is configuration-driven, hardware-agnostic by design, and built to be extended without changing the core library. The goal is simple: reduce the friction between physically demonstrating a task and training a model on the resulting data.

Robotics does not just have a model problem. It has a data problem.

If you are training manipulation policies, building imitation learning workflows, or trying to turn demonstrations into training-ready datasets, you already know where things tend to break down. Joint states come from one place. Camera streams come from another. Mobile base odometry lives somewhere else. Timing gets messy. Storage formats drift. Every hardware change creates more one-off code.

Before long, the path from "record a task" to "train a model" becomes fragile, slow, and difficult to reuse. That is a bad foundation for physical AI.

The field needs better shared infrastructure for robotics data collection. Something more modular. More reusable. More extensible. Something that helps the community move faster, instead of forcing every team to rebuild the same plumbing over and over again.

That is why Trossen Robotics is open-sourcing the Trossen SDK.

What is the Trossen SDK?

The Trossen SDK is an open-source C++ framework for robotics and physical AI data collection. It is designed to record synchronized, multi-modal episodes from robot arms, cameras, and mobile bases, then move that data into formats that modern training pipelines can actually use.

It is configuration-driven, hardware-agnostic by design, and built to be extended without requiring changes to the core library.

At a practical level, the goal is simple: reduce the friction between physically demonstrating a task and training a model on the resulting data.

For teams working in robotics research, imitation learning, teleoperation, and embodied AI, that gap is often where momentum gets lost. The Trossen SDK is built to help close it.

Why does open-source robotics data collection matter?

Trossen Robotics is open-sourcing the Trossen SDK because robotics needs stronger common tooling.

In machine learning, progress accelerated as the ecosystem began to converge around shared frameworks, patterns, and abstractions. Robotics still has too much fragmentation at the data layer. Too many pipelines are custom, brittle, and tightly bound to a single setup. That slows iteration, makes collaboration harder, and creates unnecessary reinvention.

Open source is one of the clearest ways to improve that. It gives researchers and engineers a real starting point instead of another blank page. It creates a path for the community to fork, test, refine, and extend the framework. And it increases the likelihood that the field will standardize on better ways to capture, structure, and hand off robotics data.

That is the larger point of this release.

Yes, we want people to use the SDK. But we also want to help push the ecosystem toward a more durable software layer for physical AI.

What's wrong with most robotics data collection workflows?

For many teams, data collection still looks like a patchwork of scripts, middleware dependencies, bag files, format conversions, and hardware-specific workarounds. That approach can work for a proof of concept, but it rarely scales well.

The moment you add a new sensor, swap a camera, change recording rates, or move from one robot setup to another, you often end up rewriting infrastructure that should have been modular in the first place.

That is exactly the bottleneck the Trossen SDK is meant to address.

Instead of treating robotics data collection as a collection of temporary hacks, the Trossen SDK treats it as a real system. The architecture separates hardware abstraction, producers, buffering, storage backends, and session orchestration into distinct layers. That makes the pipeline easier to reason about, extend, and keep stable as the system grows. It also supports high-frequency recording, lock-free queuing between producers and storage, and a direct path to downstream dataset conversion.
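
To make the layering concrete, here is a minimal sketch of how producers, a storage backend, and a session could be kept apart. Every name in it (Record, Producer, Backend, Session, FakeArmProducer, InMemoryBackend) is hypothetical and chosen for illustration; none of it is the Trossen SDK's actual API, and the buffering layer is left out to keep the example short.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration only; not the Trossen SDK's API.
struct Record {
  std::string stream;         // which stream produced this record
  std::int64_t timestamp_ns;  // capture time in nanoseconds
  std::vector<double> values; // payload, e.g. joint positions
};

// Producer layer: knows how to read one hardware source.
class Producer {
 public:
  virtual ~Producer() = default;
  virtual Record poll(std::int64_t now_ns) = 0;
};

// Backend layer: knows how to persist records, nothing about hardware.
class Backend {
 public:
  virtual ~Backend() = default;
  virtual void write(const Record& record) = 0;
};

// Stand-in for a real arm driver; returns fixed joint values.
class FakeArmProducer : public Producer {
 public:
  Record poll(std::int64_t now_ns) override {
    return Record{"arm/joint_states", now_ns, {0.0, 0.1, 0.2}};
  }
};

// Stand-in for a file-based backend; keeps records in memory.
class InMemoryBackend : public Backend {
 public:
  void write(const Record& record) override { records.push_back(record); }
  std::vector<Record> records;
};

// Session layer: wires producers to a backend without knowing
// the concrete type of either.
class Session {
 public:
  explicit Session(std::shared_ptr<Backend> backend)
      : backend_(std::move(backend)) {}

  void add_producer(std::unique_ptr<Producer> producer) {
    producers_.push_back(std::move(producer));
  }

  // One recording step: poll every producer and hand the result to storage.
  void tick(std::int64_t now_ns) {
    for (auto& producer : producers_) {
      backend_->write(producer->poll(now_ns));
    }
  }

 private:
  std::shared_ptr<Backend> backend_;
  std::vector<std::unique_ptr<Producer>> producers_;
};
```

In this sketch, swapping FakeArmProducer for a camera producer, or InMemoryBackend for a file writer, would not require touching Session. That is the kind of isolation the layered design is after.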

What makes the Trossen SDK different?

1. A real architecture, not a script pile

The SDK is built around a clean separation of concerns. Hardware components, producers, sinks, backends, and session management are treated as independent parts of the system rather than collapsed into one fragile application. That matters because it lets teams change one part of the stack without destabilizing everything else.

2. Configuration-driven robotics recording

Recording behavior is defined via JSON configuration rather than hidden in source code. Hardware addresses, resolutions, poll rates, teleoperation pairings, duration limits, and backend parameters can all be adjusted without recompiling. That makes iteration faster and reduces the cost of adapting the stack to new experiments or hardware layouts.
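
As a rough illustration of what configuration-driven means in practice, the sketch below builds a camera setup from runtime settings instead of compiled-in constants. It makes several assumptions: the SDK reads JSON, but to stay self-contained this example uses a flat string map as a stand-in for already-parsed config, and the key names (address, width, height, poll_rate_hz), the defaults, and the CameraConfig type are all hypothetical rather than the SDK's real schema.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Stand-in for parsed JSON: a flat key/value map. Hypothetical, not the SDK schema.
using Settings = std::map<std::string, std::string>;

// Hypothetical camera settings with illustrative defaults.
struct CameraConfig {
  std::string address;
  int width = 640;
  int height = 480;
  double poll_rate_hz = 30.0;
};

// Builds a CameraConfig from runtime settings. Missing optional keys keep
// their defaults; a missing address or a non-positive value is an error.
CameraConfig camera_from_settings(const Settings& settings) {
  CameraConfig cfg;

  const auto address = settings.find("address");
  if (address == settings.end()) {
    throw std::invalid_argument("camera config needs an address");
  }
  cfg.address = address->second;

  if (const auto it = settings.find("width"); it != settings.end()) {
    cfg.width = std::stoi(it->second);
  }
  if (const auto it = settings.find("height"); it != settings.end()) {
    cfg.height = std::stoi(it->second);
  }
  if (const auto it = settings.find("poll_rate_hz"); it != settings.end()) {
    cfg.poll_rate_hz = std::stod(it->second);
  }

  if (cfg.width <= 0 || cfg.height <= 0 || cfg.poll_rate_hz <= 0.0) {
    throw std::invalid_argument("camera config values must be positive");
  }
  return cfg;
}
```

Changing the resolution or poll rate then means editing the settings file and restarting, with no rebuild of the recording binary.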

3. Hardware extensibility without touching core code

The registry-based design is a major part of the SDK's philosophy. New sensors, producers, and backends can be added through interfaces and registration, not by editing central dispatch code or rewriting the framework. If you want to support new hardware, the intent is to make that feel like a first-class extension, not a forked maintenance burden.
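
The general pattern behind a registry-based design can be sketched as a factory map keyed by type name. This is a generic illustration of the technique, not the SDK's real interface: Sensor, SensorRegistry, DepthCamera, and the "depth_camera" key are all hypothetical names.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical sensor interface; not the Trossen SDK's API.
class Sensor {
 public:
  virtual ~Sensor() = default;
  virtual std::string describe() const = 0;
};

// Maps a type name (as it might appear in a config file) to a factory.
class SensorRegistry {
 public:
  using Factory = std::function<std::unique_ptr<Sensor>()>;

  // Function-local static, so the registry exists before any registration runs.
  static SensorRegistry& instance() {
    static SensorRegistry registry;
    return registry;
  }

  // Returns false if the type name was already registered.
  bool add(const std::string& type, Factory factory) {
    return factories_.emplace(type, std::move(factory)).second;
  }

  // Creates a sensor by name; throws if the name is unknown.
  std::unique_ptr<Sensor> create(const std::string& type) const {
    const auto it = factories_.find(type);
    if (it == factories_.end()) {
      throw std::out_of_range("unknown sensor type: " + type);
    }
    return it->second();
  }

 private:
  std::map<std::string, Factory> factories_;
};

// A new sensor can live in its own source file and register itself there.
// No central switch statement has to change.
class DepthCamera : public Sensor {
 public:
  std::string describe() const override { return "depth camera"; }
};

const bool depth_camera_registered = SensorRegistry::instance().add(
    "depth_camera", [] { return std::make_unique<DepthCamera>(); });
```

With this shape, code that reads a type name from config only ever calls SensorRegistry::instance().create(name). Adding hardware support means adding one file with a class and a registration line.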

4. Built for synchronized, high-throughput data collection

The producer-to-sink path uses a lock-free queue, so sensor polling is not blocked by disk writes or serialization overhead. That is especially important for high-frequency streams such as arm state recording, where performance problems can quietly degrade data quality. The design is meant to preserve throughput and timing integrity under real recording conditions.
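
For readers unfamiliar with the idea, the following is a minimal single-producer, single-consumer ring buffer that uses only atomic loads and stores, with no mutex. It is a textbook sketch of one kind of lock-free queue, written under the assumption of exactly one polling thread and one writer thread; the source does not say which queue design the SDK uses, and SpscQueue is a hypothetical name.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>
#include <utility>

// Single-producer, single-consumer ring buffer.
// One slot is kept empty to tell "full" from "empty",
// so the queue holds at most Capacity - 1 items.
template <typename T, std::size_t Capacity>
class SpscQueue {
  static_assert(Capacity >= 2, "Capacity must be at least 2");

 public:
  // Call only from the producer thread. Returns false if the queue is full,
  // so the caller can count a dropped sample instead of blocking.
  bool try_push(T value) {
    const std::size_t head = head_.load(std::memory_order_relaxed);
    const std::size_t next = (head + 1) % Capacity;
    if (next == tail_.load(std::memory_order_acquire)) {
      return false;  // full
    }
    slots_[head] = std::move(value);
    // Release: the slot write above becomes visible before the new head.
    head_.store(next, std::memory_order_release);
    return true;
  }

  // Call only from the consumer thread. Returns nullopt if the queue is empty.
  std::optional<T> try_pop() {
    const std::size_t tail = tail_.load(std::memory_order_relaxed);
    if (tail == head_.load(std::memory_order_acquire)) {
      return std::nullopt;  // empty
    }
    T value = std::move(slots_[tail]);
    // Release: the slot is free for reuse only after it has been read.
    tail_.store((tail + 1) % Capacity, std::memory_order_release);
    return value;
  }

 private:
  std::array<T, Capacity> slots_{};
  std::atomic<std::size_t> head_{0};  // next slot to write (producer-owned)
  std::atomic<std::size_t> tail_{0};  // next slot to read (consumer-owned)
};
```

The point for recording is that try_push never waits on the writer: if the disk falls behind, the polling thread sees a full queue immediately and keeps its timing, rather than stalling inside a lock.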

5. A direct path from recording to training

Recorded episodes can be converted to the LeRobot V2 format, producing outputs such as Parquet for structured state data and MP4 for video streams. That shortens the distance from capture to experimentation and avoids a lot of the manual cleanup that typically slows teams down after a recording session.

Why is MCAP part of the story?

One of the most important choices in the SDK is its storage approach.

At the core of the storage layer is TrossenMCAP, built on the open MCAP standard. That matters because robotics data is inherently multi-channel and time-sensitive. You are dealing with different stream types, different rates, and a need to preserve synchronization and metadata in a way that remains usable later.

This is not just about file format preference. It is about building a better foundation for capture, inspection, reuse, and downstream processing.

Compared with	What you get with TrossenMCAP
Fragmented flat-file approaches	A recording episode stays structured and portable
Middleware-bound approaches	A more direct path to handling robotics data without dragging in a full ROS dependency chain
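
One reason per-message timestamps matter for streams recorded at different rates is that they let you line the streams up afterwards. The sketch below pairs each camera frame with the arm sample closest to it in time. It is an illustration of timestamp-based alignment in general, not a description of what TrossenMCAP or the conversion tooling does internally; nearest_index and align_to_camera are hypothetical helper names, and the timestamps are assumed to be nanoseconds in ascending order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the index of the timestamp in `sorted_ns` closest to `target_ns`.
// `sorted_ns` must be non-empty and sorted ascending. Ties pick the earlier sample.
std::size_t nearest_index(const std::vector<std::uint64_t>& sorted_ns,
                          std::uint64_t target_ns) {
  assert(!sorted_ns.empty());
  const auto it =
      std::lower_bound(sorted_ns.begin(), sorted_ns.end(), target_ns);
  if (it == sorted_ns.begin()) {
    return 0;  // target is at or before the first sample
  }
  if (it == sorted_ns.end()) {
    return sorted_ns.size() - 1;  // target is after the last sample
  }
  const auto prev = it - 1;
  const bool prev_is_closer = (target_ns - *prev) <= (*it - target_ns);
  return static_cast<std::size_t>((prev_is_closer ? prev : it) -
                                  sorted_ns.begin());
}

// For each camera frame, finds the index of the nearest arm sample.
std::vector<std::size_t> align_to_camera(
    const std::vector<std::uint64_t>& arm_ns,
    const std::vector<std::uint64_t>& camera_ns) {
  std::vector<std::size_t> matches;
  matches.reserve(camera_ns.size());
  for (const std::uint64_t frame_ns : camera_ns) {
    matches.push_back(nearest_index(arm_ns, frame_ns));
  }
  return matches;
}
```

A lookup like this only works if every stream in the episode carries its own timestamps, which is what a multi-channel, time-indexed container is meant to preserve.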

Who is the Trossen SDK for?

The Trossen SDK is for technical users who are tired of rebuilding the same infrastructure every time a project changes.

That includes:

Robotics researchers collecting demonstration data
Engineers building imitation learning or teleoperation pipelines
Teams evaluating sensor stacks and capture workflows
Developers who want a C++ data collection foundation without requiring ROS
Contributors who want to help shape open infrastructure for robotics and physical AI

If your work depends on reliable data capture, synchronized recording, and a path to modern training workflows, this project is for you.

Why we want the community involved

Trossen Robotics is not releasing this as a static artifact. We are releasing it because we want people to use it, challenge it, and help improve it.

The current release is an initial implementation. The architecture was built to grow, and we plan to keep pushing it forward with broader hardware support, stronger backends, tighter integration with training workflows, and continued refinement based on real use.

The roadmap should not be shaped in isolation. It should be shaped by the people actually building, testing, and deploying robotics data pipelines. That is why community participation is not a side note here. It is part of the strategy.

Try the Trossen SDK and help shape the roadmap

If you are working in robotics, embodied AI, manipulation, teleoperation, or data infrastructure, we want you to try the Trossen SDK.

Put it in your workflow. Use it with your hardware. Test the edges. Find what feels strong and what still needs work.

Then tell us.

Submit an issue on GitHub or message us directly at the bottom of the SDK page.

Open an issue. Request a feature. Share feedback. Fork the project. Extend support for new hardware. Help us understand where the biggest opportunities are for improvement and standardization.

We believe robotics needs stronger open foundations. This is our contribution toward that future. Now we want to build it with the community.

Explore the Trossen SDK. Review the architecture, supported workflows, and setup details on our SDK page.

Test it in your environment. Try it with your own robotics data collection workflow and see where it can save time and reduce friction.

Give feedback. Tell us what worked, what did not, and what would make the SDK more useful to your team.

Contribute. File issues, request features, or fork the project on GitHub and help move open robotics infrastructure forward.

Frequently Asked Questions

What is the Trossen SDK?

The Trossen SDK is an open-source C++ framework from Trossen Robotics for robotics and physical AI data collection. It records synchronized, multi-modal episodes from robot arms, cameras, and mobile bases, then moves that data into formats modern training pipelines can use.

Why is Trossen Robotics open-sourcing the SDK?

Robotics still has too much fragmentation at the data layer, with custom, brittle pipelines bound to a single setup. Open-sourcing gives researchers a real starting point and pushes the field to standardize on better ways to capture, structure, and hand off robotics data.

What problem does the Trossen SDK solve?

Most data collection is a patchwork of scripts, middleware dependencies, bag files, and hardware-specific workarounds that rarely scales. The Trossen SDK treats data collection as a real system with distinct layers, reducing the friction between demonstrating a task and training a model on it.

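How is recording configured in the Trossen SDK?

Recording behavior is defined via JSON configuration rather than in source code. Hardware addresses, resolutions, poll rates, teleoperation pairings, duration limits, and backend parameters can all be adjusted without recompiling.
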
Can I use the Trossen SDK without ROS?

Yes. The SDK gives teams a direct path to handling robotics data without dragging in a full ROS dependency chain, making it a C++ data collection foundation that does not require ROS.

How does the SDK get data into training pipelines?

Recorded episodes can be converted to the LeRobot V2 format, producing Parquet for structured state data and MP4 for video streams. That shortens the distance from capture to experimentation and avoids manual cleanup after a recording session.

Why does the Trossen SDK use MCAP?

At the core of the storage layer is TrossenMCAP, built on the open MCAP standard. Because robotics data is multi-channel and time-sensitive, MCAP keeps a recording episode structured, portable, and synchronized for later reuse and downstream processing.
