Offline migration

Steps for an offline migration using YugabyteDB Voyager

This page describes the steps to perform and verify a successful offline migration to YugabyteDB.

Migration workflow

[Figure: Offline migration workflow]

Phase | Step | Description
----- | ---- | -----------
PREPARE | Install voyager | yb-voyager supports RHEL, CentOS, Ubuntu, and macOS, as well as airgapped and Docker-based installations.
PREPARE | Prepare source DB | Create a new database user with READ access to all the resources to be migrated.
PREPARE | Prepare target DB | Deploy a YugabyteDB database and create a user with superuser privileges.
EXPORT | Export schema | Convert the database schema to PostgreSQL format using the yb-voyager export schema command.
EXPORT | Analyze schema | Generate a Schema Analysis Report using the yb-voyager analyze-schema command. The report suggests changes to the PostgreSQL schema to make it appropriate for YugabyteDB.
EXPORT | Modify schema | Using the report recommendations, manually change the exported schema.
EXPORT | Export data | Dump the source database to the target machine (where yb-voyager is installed), using the yb-voyager export data command.
IMPORT | Import schema | Import the modified schema to the target YugabyteDB database using the yb-voyager import schema command.
IMPORT | Import data | Import the data to the target YugabyteDB database using the yb-voyager import data command.
IMPORT | Refresh materialized views | Refresh materialized views (if any) in the target YugabyteDB database using the yb-voyager import schema command with the additional flags --post-snapshot-import and --refresh-mviews.
VERIFY | Verify | Check if the offline migration is successful.
END | End migration | Clean up the migration information stored in the export directory and databases (source and target).

Before proceeding with migration, ensure that you have completed the following steps:

  • Install yb-voyager.

Prepare the source database

Create a new database user, and assign the necessary user permissions.

PostgreSQL

Create a database user and provide the user with READ access to all the resources which need to be migrated. Run the following commands in a psql session:

  1. Create a new user named ybvoyager.

    CREATE USER ybvoyager PASSWORD 'password';
    
  2. Grant permissions for migration. Use the yb-voyager-pg-grant-migration-permissions.sql script (in /opt/yb-voyager/guardrails-scripts/ or, for brew, check in $(brew --cellar)/yb-voyager@<voyagerversion>/<voyagerversion>) to grant the required permissions as follows:

    psql -h <host> \
         -d <database> \
         -U <username> \
         -v voyager_user='ybvoyager' \
         -v schema_list='<comma_separated_schema_list>' \
         -v is_live_migration=0 \
         -v is_live_migration_fall_back=0 \
         -f <path_to_the_script>
    

    The ybvoyager user can now be used for migration.

If you want yb-voyager to connect to the source database over SSL, refer to SSL Connectivity.

MySQL

Create a database user and provide the user with READ access to all the resources which need to be migrated. Replace <client_IP> in the following commands with an appropriate hostname or IP address for your setup.

  1. Create a new user ybvoyager.

    CREATE USER 'ybvoyager'@'<client_IP>' IDENTIFIED WITH mysql_native_password BY 'Password#123';
    
  2. Grant the global PROCESS permission.

    GRANT PROCESS ON *.* TO 'ybvoyager'@'<client_IP>';
    
  3. Grant the SELECT, SHOW VIEW, and TRIGGER permissions on the source database:

    GRANT SELECT ON source_db_name.* TO 'ybvoyager'@'<client_IP>';
    GRANT SHOW VIEW ON source_db_name.* TO 'ybvoyager'@'<client_IP>';
    GRANT TRIGGER ON source_db_name.* TO 'ybvoyager'@'<client_IP>';
    

    Note that if you want to accelerate data export, include the following additional grants:

    For MySQL

    GRANT FLUSH_TABLES ON *.* TO 'ybvoyager'@'<client_IP>';
    GRANT REPLICATION CLIENT ON *.* TO 'ybvoyager'@'<client_IP>';
    

    For MySQL RDS

    GRANT FLUSH_TABLES ON *.* TO 'ybvoyager'@'<client_IP>';
    GRANT REPLICATION CLIENT ON *.* TO 'ybvoyager'@'<client_IP>';
    GRANT LOCK TABLES ON <source_db_name>.* TO 'ybvoyager'@'<client_IP>';
    
  4. If you are running MySQL version 8.0.20 or later, grant the global SHOW_ROUTINE permission. For older versions, grant the global SELECT permission. These permissions are necessary to dump stored procedure/function definitions.

    --For MySQL >= 8.0.20
    GRANT SHOW_ROUTINE ON *.* TO 'ybvoyager'@'<client_IP>';
    
    --For older versions
    GRANT SELECT ON *.* TO 'ybvoyager'@'<client_IP>';
    

    The ybvoyager user can now be used for migration.

If you want yb-voyager to connect to the source database over SSL, refer to SSL Connectivity.

Oracle

Create a role and a database user, and provide the user with READ access to all the resources which need to be migrated.

  1. Create a role that has the privileges as listed in the following table:

    Permission | Object type in the source schema
    ---------- | --------------------------------
    SELECT     | VIEW, SEQUENCE, TABLE PARTITION, TABLE, SYNONYM, MATERIALIZED VIEW
    EXECUTE    | TYPE

    Change the <SCHEMA_NAME> appropriately in the following snippets, and run the following steps as a privileged user.

    CREATE ROLE <SCHEMA_NAME>_reader_role;
    
    BEGIN
        FOR R IN (SELECT owner, object_name FROM all_objects WHERE owner=UPPER('<SCHEMA_NAME>') and object_type in ('VIEW','SEQUENCE','TABLE PARTITION','SYNONYM','MATERIALIZED VIEW'))
        LOOP
           EXECUTE IMMEDIATE 'grant select on '||R.owner||'."'||R.object_name||'" to <SCHEMA_NAME>_reader_role';
        END LOOP;
    END;
    /
    
    BEGIN
        FOR R IN (SELECT owner, object_name FROM all_objects WHERE owner=UPPER('<SCHEMA_NAME>') and object_type ='TABLE' MINUS SELECT owner, table_name from all_nested_tables where owner = UPPER('<SCHEMA_NAME>'))
        LOOP
           EXECUTE IMMEDIATE 'grant select on '||R.owner||'."'||R.object_name||'" to <SCHEMA_NAME>_reader_role';
        END LOOP;
    END;
    /
    
    BEGIN
        FOR R IN (SELECT owner, object_name FROM all_objects WHERE owner=UPPER('<SCHEMA_NAME>') and object_type = 'TYPE')
        LOOP
           EXECUTE IMMEDIATE 'grant execute on '||R.owner||'."'||R.object_name||'" to <SCHEMA_NAME>_reader_role';
        END LOOP;
    END;
    /
    
    GRANT SELECT_CATALOG_ROLE TO <SCHEMA_NAME>_reader_role;
    GRANT SELECT ANY DICTIONARY TO <SCHEMA_NAME>_reader_role;
    GRANT SELECT ON SYS.ARGUMENT$ TO <SCHEMA_NAME>_reader_role;
    
  2. Create a user ybvoyager and grant CONNECT and <SCHEMA_NAME>_reader_role to the user:

    CREATE USER ybvoyager IDENTIFIED BY password;
    GRANT CONNECT TO ybvoyager;
    GRANT <SCHEMA_NAME>_reader_role TO ybvoyager;
    

    If you're using accelerated data export, run the additional grant as follows:

    GRANT FLASHBACK ANY TABLE TO ybvoyager;
    
  3. If you're using accelerated data export, ensure that the database log_mode is set to archivelog.

    1. Check the value for log_mode using the following command for Oracle/Oracle RDS:

      SELECT LOG_MODE FROM V$DATABASE;
      
    2. If the log_mode value is NOARCHIVELOG, run the following commands:

      For Oracle

      sqlplus /nolog
      SQL> connect / as sysdba
      SQL> alter system set db_recovery_file_dest_size = 10G;
      SQL> alter system set db_recovery_file_dest = '<oracle_path>/oradata/recovery_area' scope=spfile;
      SQL> shutdown immediate
      SQL> startup mount
      SQL> alter database archivelog;
      SQL> alter database open;
      

      For Oracle RDS

      exec rdsadmin.rdsadmin_util.set_configuration('archivelog retention hours',24);
      
    3. Verify the change using the archive log list command.

If you want yb-voyager to connect to the source database over SSL, refer to SSL Connectivity. Note that you can use only one of the following arguments to connect to your Oracle instance:

  • --source-db-schema (Schema name of the source database.)
  • --oracle-db-sid (Oracle System Identifier you can use while exporting data from Oracle instances.)
  • --oracle-tns-alias (TNS (Transparent Network Substrate) alias configured to establish a secure connection with the server.)
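
For example, a connection sketch using a TNS alias; ORA_ALIAS is a hypothetical alias assumed to be defined in your tnsnames.ora:

# Hypothetical sketch: connect to Oracle using a TNS alias.
yb-voyager export schema --export-dir <EXPORT_DIR> \
        --source-db-type oracle \
        --source-db-user ybvoyager \
        --source-db-password '<password>' \
        --oracle-tns-alias ORA_ALIAS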

Prepare the target database

Prepare your target YugabyteDB database cluster by creating a database, and a user for your cluster.

Create the target database

Create the target YugabyteDB database in your YugabyteDB cluster. The database name can be the same as or different from the source database name. If you do not provide a target database name, yb-voyager assumes it to be yugabyte. If you do not want to import to the default yugabyte database, specify the name of the target YugabyteDB database using the --target-db-name argument of the yb-voyager import commands.

CREATE DATABASE target_db_name;
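
For example, you can run this statement using ysqlsh; the host and user values here are placeholders:

ysqlsh -h <target_db_host> -U yugabyte -c "CREATE DATABASE target_db_name;"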

Create a user

Create a user with SUPERUSER role.

  • For a local YugabyteDB cluster or YugabyteDB Anywhere, create a user and role with the superuser privileges using the following command:

    CREATE USER ybvoyager SUPERUSER PASSWORD 'password';
    
  • For YugabyteDB Aeon, create a user with yb_superuser role using the following command:

    CREATE USER ybvoyager PASSWORD 'password';
    GRANT yb_superuser TO ybvoyager;
    

If you want yb-voyager to connect to the target YugabyteDB database over SSL, refer to SSL Connectivity.

Create an export directory

yb-voyager keeps all of its migration state, including exported schema and data, in a local directory called the export directory.

Before starting migration, create the export directory on a file system that has enough space to hold the entire source database. Then provide the path of the export directory as the mandatory --export-dir argument to each invocation of the yb-voyager command. Storing the path in an environment variable makes this convenient:

mkdir $HOME/export-dir
export EXPORT_DIR=$HOME/export-dir

The export directory has the following sub-directories and files:

  • reports directory contains the generated Schema Analysis Report.
  • schema directory contains the source database schema translated to PostgreSQL. The schema is partitioned into smaller files by the schema object type such as tables, views, and so on.
  • data directory contains CSV (Comma Separated Values) files that are passed to the COPY command on the target YugabyteDB database.
  • metainfo and temp directories are used by yb-voyager for internal bookkeeping.
  • logs directory contains the log files for each command.

Migrate your database to YugabyteDB

Proceed with schema and data migration using the following steps:

Export and analyze schema

To begin, export the schema from the source database. Once exported, analyze the schema and apply any necessary manual changes.

Export schema

The yb-voyager export schema command extracts the schema from the source database, converts it into PostgreSQL format (if the source database is Oracle or MySQL), and dumps the SQL DDL files in the EXPORT_DIR/schema/* directories.

Renaming index names for MySQL

YugabyteDB Voyager renames indexes during schema export for MySQL migrations. MySQL allows two or more indexes to share the same name in the same database, provided they belong to different tables. Like PostgreSQL, YugabyteDB does not support duplicate index names in the same schema. To avoid index name conflicts during schema export, yb-voyager prefixes each index name with the name of the associated table.
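
For example (names are illustrative; the exact generated names may differ), two MySQL indexes that share a name are exported with table-name prefixes:

-- Source (MySQL): the same index name on two different tables is legal.
CREATE INDEX idx_created ON orders (created_at);
CREATE INDEX idx_created ON invoices (created_at);

-- Exported (PostgreSQL format): the table-name prefix removes the collision.
CREATE INDEX orders_idx_created ON orders (created_at);
CREATE INDEX invoices_idx_created ON invoices (created_at);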

Usage for source_db_schema

The source_db_schema argument specifies the schema of the source database.

  • For MySQL, the source-db-schema argument is currently not applicable.
  • For PostgreSQL, source-db-schema can take one or more schema names, separated by commas (see the examples after this list).
  • For Oracle, source-db-schema can take only one schema name; you can migrate only one schema at a time.
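
For example (schema names are placeholders):

# PostgreSQL: one or more schema names, separated by commas.
--source-db-schema public,sales

# Oracle: exactly one schema name per migration.
--source-db-schema SALES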

An example invocation of the command with required arguments is as follows:

# Replace the argument values with those applicable for your migration.
yb-voyager export schema --export-dir <EXPORT_DIR> \
        --source-db-type <SOURCE_DB_TYPE> \
        --source-db-host <SOURCE_DB_HOST> \
        --source-db-user <SOURCE_DB_USER> \
        --source-db-password <SOURCE_DB_PASSWORD> \ # Enclose the password in single quotes if it contains special characters.
        --source-db-name <SOURCE_DB_NAME> \
        --source-db-schema <SOURCE_DB_SCHEMA> # Not applicable for MySQL

Refer to export schema for details about the arguments.

Analyze schema

The schema exported in the previous step may not yet be suitable for importing into YugabyteDB. Even though YugabyteDB is PostgreSQL compatible, given its distributed nature, you may need to make minor manual changes to the schema.

The yb-voyager analyze-schema command analyzes the PostgreSQL schema dumped in the export schema step, and prepares a report that lists the DDL statements which need manual changes. An example invocation of the command with required arguments is as follows:

# Replace the argument values with those applicable for your migration.
yb-voyager analyze-schema --export-dir <EXPORT_DIR> --output-format <FORMAT>

The above command generates a report file under the EXPORT_DIR/reports/ directory.

Refer to analyze schema for details about the arguments.

Manually edit the schema

Fix all the issues listed in the generated schema analysis report by manually editing the SQL DDL files from the EXPORT_DIR/schema/*.

After making the manual changes, re-run the yb-voyager analyze-schema command. This generates a fresh report using your changes. Repeat these steps until the generated report contains no issues.

To learn more about data modeling strategies using YugabyteDB, refer to Data modeling.

Manual schema changes

Include the primary key definition in the CREATE TABLE statement. A primary key cannot be added to a partitioned table using an ALTER TABLE statement.

Refer to the Manual review guideline for a detailed list of limitations and suggested workarounds associated with the source databases when migrating to YugabyteDB using Voyager.

Export data

Dump the source data into the EXPORT_DIR/data directory using the yb-voyager export data command as follows:

# Replace the argument values with those applicable for your migration.
yb-voyager export data --export-dir <EXPORT_DIR> \
        --source-db-type <SOURCE_DB_TYPE> \
        --source-db-host <SOURCE_DB_HOST> \
        --source-db-user <SOURCE_DB_USER> \
        --source-db-password <SOURCE_DB_PASSWORD> \ # Enclose the password in single quotes if it contains special characters.
        --source-db-name <SOURCE_DB_NAME> \
        --source-db-schema <SOURCE_DB_SCHEMA> # Not applicable for MySQL

PostgreSQL and parallel jobs

For PostgreSQL, make sure that no other processes that might take locks are running on the source database; with more than one parallel job, yb-voyager is unable to acquire the locks it needs to dump the data.
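
A minimal sketch for spotting such sessions on a PostgreSQL source, assuming you can query the catalog views in psql:

-- Sessions other than your own that hold or await locks; any results
-- indicate potential conflicts while dumping data.
SELECT l.locktype, l.relation::regclass, l.mode, l.granted, a.pid, a.query
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE a.pid <> pg_backend_pid();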

Note that the source-db-schema argument is required for PostgreSQL and Oracle, and is not applicable for MySQL. Refer to export data for details about the arguments.

The options passed to the command are similar to those of the yb-voyager export schema command. To export only a subset of the tables, pass a comma-separated list of table names using the --table-list argument.
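
For example, to export only two tables (table names are placeholders):

# Export only the listed tables.
yb-voyager export data --export-dir <EXPORT_DIR> \
        --source-db-type <SOURCE_DB_TYPE> \
        --source-db-host <SOURCE_DB_HOST> \
        --source-db-user <SOURCE_DB_USER> \
        --source-db-password <SOURCE_DB_PASSWORD> \
        --source-db-name <SOURCE_DB_NAME> \
        --table-list orders,order_items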

Sequence migration considerations

Sequence migration consists of two steps: creating the sequence and setting its resume value (the resume value is the NEXTVAL of the sequence on the source database). The sequence object is generated during export schema, and the resume values for sequences are generated during export data. These resume values are then set on the target YugabyteDB database immediately after the data for all tables is imported (a sketch of what this means appears after the following list).

Note that there are some special cases involving sequences such as the following:

  • In MySQL, an auto-increment column is migrated to YugabyteDB as a normal column with a sequence attached to it.
  • For PostgreSQL, SERIAL columns and GENERATED AS IDENTITY columns use a sequence object internally, so resume values for them are also generated during data export.
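
For illustration, setting a resume value is roughly equivalent to the following statement on the target; the sequence name and value are hypothetical, and yb-voyager runs this step for you:

-- With is_called = false, the next NEXTVAL on the target returns 42,
-- matching the NEXTVAL the source database would have produced.
SELECT setval('orders_id_seq', 42, false);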

Migrating source databases with large row sizes

If a table's row size on the source database is too large and exceeds the default RPC message size, import data will fail with the error ERROR: Sending too long RPC message.... In that case, migrate those tables separately after removing the large rows.

Export data status

Run the yb-voyager export data status --export-dir <EXPORT_DIR> command to get an overall progress of the export data operation.

Refer to export data status for details about the arguments.

Accelerate data export for MySQL and Oracle

For MySQL and Oracle, you can optionally speed up data export by setting the environment variable BETA_FAST_DATA_EXPORT=1 when you run export data using yb-voyager.
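
For example, a minimal sketch (the remaining source connection arguments are the same as in the earlier export data invocation):

# Enable accelerated data export for this invocation only.
BETA_FAST_DATA_EXPORT=1 yb-voyager export data --export-dir <EXPORT_DIR> \
        --source-db-type <SOURCE_DB_TYPE> \
        --source-db-host <SOURCE_DB_HOST> \
        --source-db-user <SOURCE_DB_USER> \
        --source-db-password <SOURCE_DB_PASSWORD> \
        --source-db-name <SOURCE_DB_NAME>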

Consider the following caveats before using the feature:

  • You need to perform additional steps when you prepare the source database.
  • Some data types are unsupported. For a detailed list, refer to datatype mappings.
  • The --parallel-jobs argument (which specifies the number of tables to be exported in parallel from the source database at a time) has no effect.
  • In MySQL RDS, writes are not allowed during the data export process.
  • For Oracle where sequences are not attached to a column, resume value generation is unsupported.

Import schema

Import the schema using the yb-voyager import schema command.

Usage for target_db_schema

The target_db_schema argument specifies the schema of the target YugabyteDB database and is applicable only for MySQL and Oracle migrations. By default, yb-voyager imports the source database into the public schema of the target YugabyteDB database. By specifying the --target-db-schema argument during import, you can instruct yb-voyager to create a non-public schema and use it for the schema/data import.

An example invocation of the command with required arguments is as follows:

# Replace the argument values with those applicable for your migration.
yb-voyager import schema --export-dir <EXPORT_DIR> \
        --target-db-host <TARGET_DB_HOST> \
        --target-db-user <TARGET_DB_USER> \
        --target-db-password <TARGET_DB_PASSWORD> \ # Enclose the password in single quotes if it contains special characters.
        --target-db-name <TARGET_DB_NAME> \
        --target-db-schema <TARGET_DB_SCHEMA> # MySQL and Oracle only

Refer to import schema for details about the arguments.

yb-voyager applies the DDL SQL files located in the $EXPORT_DIR/schema directory to the target YugabyteDB database. If yb-voyager terminates before it imports the entire schema, you can rerun it by adding the --ignore-exist option.
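
For example, a rerun sketch (pass the same arguments as the original import schema invocation, plus --ignore-exist):

# Rerun the schema import, ignoring objects that already exist in the target.
yb-voyager import schema --export-dir <EXPORT_DIR> \
        --target-db-host <TARGET_DB_HOST> \
        --target-db-user <TARGET_DB_USER> \
        --target-db-password <TARGET_DB_PASSWORD> \
        --target-db-name <TARGET_DB_NAME> \
        --ignore-exist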

Import data

After you have successfully exported the source data and imported the schema in the target YugabyteDB database, you can import the data using the yb-voyager import data command with required arguments as follows:

# Replace the argument values with those applicable for your migration.
yb-voyager import data --export-dir <EXPORT_DIR> \
        --target-db-host <TARGET_DB_HOST> \
        --target-db-user <TARGET_DB_USER> \
        --target-db-password <TARGET_DB_PASSWORD> \ # Enclose the password in single quotes if it contains special characters.
        --target-db-name <TARGET_DB_NAME> \
        --target-db-schema <TARGET_DB_SCHEMA> \ # MySQL and Oracle only.
        --parallel-jobs <NUMBER_OF_JOBS>

By default, yb-voyager creates C/2 connections, where C is the total number of cores in the cluster; for example, a 3-node cluster with 8 cores per node (C = 24) gets 12 connections by default. You can change the number of connections using the --parallel-jobs argument. If yb-voyager fails to determine the number of cores in the cluster, it defaults to 2 connections per node.

Refer to import data for details about the arguments.

yb-voyager splits the data dump files (from the $EXPORT_DIR/data directory) into smaller batches. yb-voyager concurrently ingests the batches such that all nodes of the target YugabyteDB database cluster are used. This phase is designed to be restartable if yb-voyager terminates while the data import is in progress. After restarting, the data import resumes from its current state.

Importing large datasets

When importing a very large database, run the import data command in a screen session, so that the import is not terminated when the terminal session ends.
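
A minimal sketch using GNU screen (the session name voyager-import is arbitrary):

# Start a named, detachable session so the import survives a dropped terminal.
screen -S voyager-import
# ...run the yb-voyager import data command inside the session...
# Detach with Ctrl-A then D; reattach later with:
screen -r voyager-import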

If the yb-voyager import data command terminates before completing the data ingestion, you can re-run it with the same arguments and the command will resume the data import operation.

Migrating source databases with large row sizes

When exporting data using the BETA_FAST_DATA_EXPORT flag, the import data process has a default row size limit of 32 MB. If a row exceeds this limit but is smaller than batch-size * max-row-size, you can increase the limit for the import data process by setting the following environment variable:

export CSV_READER_MAX_BUFFER_SIZE_BYTES=<MAX_ROW_SIZE_IN_BYTES>

Import data status

Run the yb-voyager import data status --export-dir <EXPORT_DIR> command to get an overall progress of the data import operation.

Refer to import data status for details about the arguments.

Refresh materialized views

If there are materialized views in the target YugabyteDB database, you can refresh them using the following command:

# Replace the argument values with those applicable for your migration.
yb-voyager import schema --export-dir <EXPORT_DIR> \
        --target-db-host <TARGET_DB_HOST> \
        --target-db-user <TARGET_DB_USER> \
        --target-db-password <TARGET_DB_PASSWORD> \ # Enclose the password in single quotes if it contains special characters.
        --target-db-name <TARGET_DB_NAME> \
        --target-db-schema <TARGET_DB_SCHEMA> \ # MySQL and Oracle only
        --post-snapshot-import true \
        --refresh-mviews true

Refer to import schema for details about the arguments.

Verify migration

After the schema and data import is complete, manually run validation queries on both the source and target YugabyteDB database to ensure that the data is correctly migrated. For example, you can validate the databases by running queries to check the row count of each table.
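
For example, a simple row-count check; run the same query on the source and the target, and compare the results for each migrated table:

SELECT count(*) FROM <table_name>;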

Caveat associated with rows reported by import data status

Suppose you have a scenario where:

  • The import data or import data file command fails.
  • To resolve the issue, you delete some of the rows from the split files.
  • After retrying, the import data command completes successfully.

In this scenario, the import data status command reports an incorrect imported row count, because it doesn't take the deleted rows into account.

For more details, refer to the GitHub issue #360.

End migration

To complete the migration, you need to clean up the export directory (export-dir) and Voyager state (Voyager-related metadata) stored in the target YugabyteDB database.

Run the yb-voyager end migration command to perform the cleanup, and to back up the schema, data, migration reports, and log files by providing the backup-related flags (mandatory) as follows:

# Replace the argument values with those applicable for your migration.
yb-voyager end migration --export-dir <EXPORT_DIR> \
        --backup-log-files <true, false, yes, no, 1, 0> \
        --backup-data-files <true, false, yes, no, 1, 0> \
        --backup-schema-files <true, false, yes, no, 1, 0> \
        --save-migration-reports <true, false, yes, no, 1, 0> \
        # Optional: directory in which to store the backup content selected above.
        --backup-dir <BACKUP_DIR>

After you run end migration, you will not be able to continue further.

If you want to back up the schema, data, log files, and the migration reports (analyze-schema report, export data status output, or import data status output) for future reference, use the --backup-dir argument, and provide the path of the directory where you want to save the backup content (based on what you choose to back up).

Refer to end migration for more details on the arguments.

Delete the ybvoyager user (Optional)

After migration, all the migrated objects (tables, views, and so on) are owned by the ybvoyager user. Transfer the ownership of the objects to some other user (for example, yugabyte) and then delete the ybvoyager user. For example, do the following:

REASSIGN OWNED BY ybvoyager TO yugabyte;
DROP OWNED BY ybvoyager;
DROP USER ybvoyager;