Migrating data in Apache Cassandra is often more complex than just running a few commands. The built-in cqlsh tools (COPY TO and COPY FROM) work well for simple tasks but struggle with large datasets, complex UDTs, and schema changes.
The Challenge
Many organizations need to:
- Move data to a new table structure.
- Split or merge columns.
- Convert data formats before import.
Out-of-the-box tools cannot do this efficiently or at scale.
Common Export & Import Commands
Basic export using cqlsh:
COPY old_keyspace.old_table TO '/tmp/export.csv' WITH HEADER = TRUE;
Basic import into a new table:
COPY new_keyspace.new_table FROM '/tmp/transformed.csv' WITH HEADER = TRUE;
But if your column names differ or your primary keys change, you must first transform the CSV — usually with Python, Spark, or another ETL tool.
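As a rough illustration of that transformation step, the sketch below renames header columns in an exported CSV using only Python's standard csv module. The column names in RENAMES and the function name transform_csv are hypothetical and chosen for the example; a real migration would substitute its own mapping and would likely need extra handling for UDTs and collection types.

```python
import csv

# Hypothetical mapping from old column names to new ones (assumption for illustration).
RENAMES = {"user_id": "id", "full_name": "name"}


def transform_csv(src, dst, renames):
    """Copy CSV rows from src to dst, renaming header columns along the way.

    src and dst are open text file objects (or anything file-like).
    Columns not listed in renames keep their original names.
    """
    reader = csv.DictReader(src)
    old_fields = reader.fieldnames or []
    new_fields = [renames.get(name, name) for name in old_fields]

    writer = csv.DictWriter(dst, fieldnames=new_fields)
    writer.writeheader()
    for row in reader:
        # Re-key each row with the new column names; values pass through unchanged.
        writer.writerow({renames.get(key, key): value for key, value in row.items()})
```

To use it with the commands above, open /tmp/export.csv for reading and /tmp/transformed.csv for writing (both with newline="") and pass the two file objects to transform_csv together with the mapping.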
Yellow! GNU’s Migration Tool
To solve this, Yellow! GNU created a custom Cassandra Data Migration Tool that:
- Exports data from any Cassandra keyspace.
- Transforms data on the fly (renames columns, converts UDTs, changes formats).
- Imports into redesigned schemas with zero manual intervention.
- Parallelizes operations for large volumes to reduce downtime.
This approach eliminates the bottlenecks of manual scripts and provides a repeatable migration pipeline.
Each migration is defined by a properties file rather than a hand-coded ETL script, which makes it easy to manage dozens of tables consistently.
Sample config excerpt:
copy.tables=source_table_name=>destination_table_name
copy.ignoreColumns=source_table_name.column_to_ignore
copy.batchSize=20000
copy.queryPageSize=1000
copy.batchesPerSecond=1
copy.fetchSize=100
copy.rowCounter=true
source.cassandra.contactPoints=XXX.XXX.XXX.XXX
source.cassandra.keyspace=SOURCE_KEYSPACE_NAME
destination.cassandra.keyspace=DESTINATION_KEYSPACE_NAME
The tool will:
- Export from the source table(s).
- Apply column mappings, ignore columns, and batch control.
- Import into the destination table(s) — even across clusters.
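To make the properties-driven idea concrete, here is a minimal sketch of how a file in this format could be read into table mappings. It is not the tool's actual parser: the function names parse_properties and table_mappings are hypothetical, and the assumption that several table pairs would be comma-separated in copy.tables is a guess made for the example.

```python
def parse_properties(text):
    """Parse 'key=value' lines into a dict, splitting on the first '=' only.

    Splitting on the first '=' keeps values such as 'a=>b' intact.
    """
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def table_mappings(props):
    """Return {source_table: destination_table} from the copy.tables entry.

    Assumption: multiple pairs are separated by commas.
    """
    mappings = {}
    for pair in props.get("copy.tables", "").split(","):
        if "=>" in pair:
            source, destination = pair.split("=>", 1)
            mappings[source.strip()] = destination.strip()
    return mappings
```

Applied to the excerpt above, table_mappings would yield a single entry mapping source_table_name to destination_table_name.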
Why This Beats Out-of-the-Box Tools
- Config-driven: no hardcoding of table names or mappings.
- Parallel & tunable: batch size, fetch size, and rate limits.
- Schema-flexible: source and destination tables don’t have to match perfectly.
- Enterprise-ready: SSL, keystores, and large-volume support.
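The tunable batch size and rate limit can be pictured with a small, generic sketch. It shows one plausible reading of settings like copy.batchSize and copy.batchesPerSecond (group rows into fixed-size batches and cap how many batches are written per second); the tool's real semantics may differ. The names batched, copy_in_batches, and write_batch are hypothetical, and write_batch stands in for whatever actually writes to the destination cluster.

```python
import time
from itertools import islice


def batched(rows, batch_size):
    """Yield lists of up to batch_size rows from any iterable."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def copy_in_batches(rows, write_batch, batch_size, batches_per_second,
                    sleep=time.sleep, clock=time.monotonic):
    """Write rows in batches, pausing so the batch rate stays under the limit.

    write_batch is a caller-supplied function (hypothetical here) that
    persists one batch. Returns the total number of rows written.
    """
    min_interval = 1.0 / batches_per_second
    written = 0
    for batch in batched(rows, batch_size):
        started = clock()
        write_batch(batch)
        written += len(batch)
        elapsed = clock() - started
        if elapsed < min_interval:
            # Wait out the rest of this batch's time slot.
            sleep(min_interval - elapsed)
    return written
```

Passing sleep and clock as parameters keeps the throttling logic easy to test without real delays.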
Conclusion
With its properties-driven approach, Yellow! GNU’s Cassandra Data Migration Tool overcomes the limits of cqlsh and traditional ETL scripts — making Cassandra upgrades, restructures, and cross-cluster moves predictable and fast.