Exporting and Importing Cassandra Data Under a Different Structure

Migrating data in Apache Cassandra is often more complex than running a few commands. The built-in cqlsh commands COPY TO and COPY FROM work well for simple tasks but struggle with large datasets, complex user-defined types (UDTs), and schema changes.

The Challenge

Many organizations need to:

  • Move data to a new table structure.
  • Split or merge columns.
  • Convert data formats before import.

Out-of-the-box tools cannot do this efficiently or at scale.

Common Export & Import Commands

Basic export using cqlsh:

COPY old_keyspace.old_table TO '/tmp/export.csv' WITH HEADER = TRUE;

Basic import into a new table:

COPY new_keyspace.new_table FROM '/tmp/transformed.csv' WITH HEADER = TRUE;

But if your column names differ or your primary keys change, you must first transform the CSV, usually with Python, Spark, or another ETL tool.
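As a rough illustration of that transformation step, here is a minimal Python sketch that renames and drops columns between the exported CSV and the file to be imported. The function name transform_csv and the column names in the demo are hypothetical, and every value is passed through as an opaque string, so collections and UDTs would need extra handling that is not shown here.

```python
import csv
import io


def transform_csv(src, dst, rename=None, drop=()):
    """Copy CSV rows from src to dst, renaming and dropping columns.

    src and dst are text file objects; rename maps old column names to
    new ones; drop is a collection of source column names to omit.
    """
    rename = rename or {}
    reader = csv.DictReader(src)
    # Keep the source column order, minus the dropped columns.
    kept = [c for c in (reader.fieldnames or []) if c not in drop]
    writer = csv.DictWriter(dst, fieldnames=[rename.get(c, c) for c in kept])
    writer.writeheader()
    for row in reader:
        writer.writerow({rename.get(c, c): row[c] for c in kept})


if __name__ == "__main__":
    # Hypothetical columns, for illustration only.
    exported = io.StringIO("id,full_name,legacy_flag\n1,Ada,x\n")
    transformed = io.StringIO()
    transform_csv(exported, transformed,
                  rename={"full_name": "name"}, drop={"legacy_flag"})
    print(transformed.getvalue())
```

When working with real files such as /tmp/export.csv and /tmp/transformed.csv, open both with newline="" so the csv module controls line endings.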

Yellow! GNU’s Migration Tool

To solve this, Yellow! GNU created a custom Cassandra Data Migration Tool that:

  • Exports data from any Cassandra keyspace.
  • Transforms data on the fly (renames columns, converts UDTs, changes formats).
  • Imports into redesigned schemas with zero manual intervention.
  • Parallelizes operations for large volumes to reduce downtime.

This approach eliminates the bottlenecks of manual scripts and provides a repeatable migration pipeline.

Each migration is defined by a properties file, which makes it easy to manage dozens of tables consistently.

Sample config excerpt:

copy.tables=source_table_name=>destination_table_name
copy.ignoreColumns=source_table_name.column_to_ignore
copy.batchSize=20000
copy.queryPageSize=1000
copy.batchesPerSecond=1
copy.fetchSize=100
copy.rowCounter=true

source.cassandra.contactPoints=XXX.XXX.XXX.XXX
source.cassandra.keyspace=SOURCE_KEYSPACE_NAME
destination.cassandra.keyspace=DESTINATION_KEYSPACE_NAME
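To make the format concrete, the following Python sketch shows one way a properties file like this could be read and the copy.tables entry split into source and destination names. It is not the tool's actual parser: the helper names are hypothetical, and the assumption that several table pairs would be separated by commas is ours, not something the excerpt confirms.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        # Split on the first "=" only, so "a=>b" survives in the value.
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def table_mappings(props):
    """Return {source_table: destination_table} from copy.tables.

    Assumption: multiple pairs are comma-separated.
    """
    mappings = {}
    for pair in props.get("copy.tables", "").split(","):
        if not pair.strip():
            continue
        source, arrow, destination = pair.partition("=>")
        if not arrow:
            raise ValueError(f"expected 'source=>destination', got {pair!r}")
        mappings[source.strip()] = destination.strip()
    return mappings


if __name__ == "__main__":
    sample = """
    copy.tables=source_table_name=>destination_table_name
    copy.batchSize=20000
    """
    props = parse_properties(sample)
    print(table_mappings(props), int(props["copy.batchSize"]))
```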

The tool will:

  • Export from the source table(s).
  • Apply column mappings, skip ignored columns, and enforce batch limits.
  • Import into the destination table(s) — even across clusters.
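The three steps above can be sketched in a few lines of Python. This is an illustration of the general export, transform, and import flow under our own assumptions, not the tool's implementation: rows are modelled as plain dictionaries, write_batch stands in for whatever writes to the destination table, and the rename argument is a hypothetical way to express column mappings.

```python
from itertools import islice


def transform_row(row, rename, ignore):
    """Drop ignored columns and rename the rest."""
    return {rename.get(col, col): value
            for col, value in row.items() if col not in ignore}


def migrate(source_rows, write_batch, rename=None, ignore=(), batch_size=20000):
    """Stream rows from source to destination in fixed-size batches.

    source_rows: iterable of dicts (stand-in for a paged source query).
    write_batch: callable taking a list of dicts (stand-in for the import).
    Returns the number of rows written, similar in spirit to a row counter.
    """
    rename = rename or {}
    ignore = set(ignore)
    transformed = (transform_row(r, rename, ignore) for r in source_rows)
    total = 0
    while True:
        batch = list(islice(transformed, batch_size))
        if not batch:
            return total
        write_batch(batch)
        total += len(batch)


if __name__ == "__main__":
    rows = [{"id": i, "old_name": f"n{i}", "tmp": None} for i in range(5)]
    written = []
    count = migrate(rows, written.append,
                    rename={"old_name": "name"}, ignore={"tmp"}, batch_size=2)
    print(count, [len(b) for b in written])
```

Passing the reader and writer in as arguments keeps the transformation logic independent of any particular driver, which also makes it testable without a running cluster.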

Why This Beats Out-of-the-Box Tools

  • Config-driven: no hardcoding of table names or mappings.
  • Parallel & tunable: batch size, fetch size, and rate limits.
  • Schema-flexible: source and destination tables don’t have to match perfectly.
  • Enterprise-ready: SSL, keystores, and large-volume support.
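To show what a rate limit such as copy.batchesPerSecond might mean in practice, here is a small Python sketch of a throttle that spaces batches evenly. How the tool itself enforces the limit is not described in this article, so the BatchThrottle class and its behaviour are assumptions made purely for illustration.

```python
import time


class BatchThrottle:
    """Allow at most `batches_per_second` calls to wait() per second."""

    def __init__(self, batches_per_second, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / batches_per_second
        self.clock = clock      # injectable for testing
        self.sleep = sleep      # injectable for testing
        self.next_allowed = None

    def wait(self):
        """Block until the next batch is allowed to start."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval


if __name__ == "__main__":
    throttle = BatchThrottle(batches_per_second=2)
    for batch_number in range(3):
        throttle.wait()  # the first call returns immediately
        print("sending batch", batch_number)
```

A lower rate protects the destination cluster from write spikes at the cost of a longer migration, which is why exposing it as a tunable setting is useful.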

Conclusion

With its properties-driven approach, Yellow! GNU’s Cassandra Data Migration Tool overcomes the limits of cqlsh and traditional ETL scripts — making Cassandra upgrades, restructures, and cross-cluster moves predictable and fast.

 

