Skip to content

mapbox/dynamodb-replicator

Repository files navigation

dynamodb-replicator

dynamodb-replicator offers several different mechanisms to manage redundancy and recoverability on DynamoDB tables.

  • A replicator function that processes events from a DynamoDB stream, replaying changes made to the primary table and onto a replica table. The function is designed to be run as an AWS Lambda function.
  • An incremental backup function that processes events from a DynamoDB stream, replaying them as writes to individual objects on S3. The function is designed to be run as an AWS Lambda function.
  • A consistency check script that scans the primary table and checks that each individual record in the replica table is up-to-date. The goal is to double-check that the replicator is performing as is should, and the two tables are completely consistent.
  • A table dump script that scans a single table, and writes the data to a file on S3, providing a snapshot of the table's state.
  • A snapshot script that scans an S3 folder where incremental backups have been made, and writes the aggregate to a file on S3, providing a snapshot of the backup's state.

Design

Managing table redundancy and backups involves many moving parts. Please read DESIGN.md for an in-depth explanation.

Utility scripts

dynamodb-replicator provides several CLI tools to help manage your DynamoDB table.

diff-record

Given two tables and an item's key, this script looks up the record in both tables and checks for consistency.

$ npm install -g dynamodb-replicator
$ diff-record --help

Usage: diff-record <primary region/table> <replica region/table> <key>

# Check for discrepancies between an item in two tables
$ diff-record us-east-1/primary eu-west-1/replica '{"id":"abc"}'

diff-tables

Given two tables and a set of options, performs a complete consistency check on the two, optionally repairing records in the replica table that differ from the primary.

$ npm install -g dynamodb-replicator
$ diff-tables --help

Usage: diff-tables primary-region/primary-table replica-region/replica-table

Options:
  --repair     perform actions to fix discrepancies in the replica table
  --segment    segment identifier (0-based)
  --segments   total number of segments
  --backfill   only scan primary table and write to replica

# Log information about discrepancies between the two tables
$ diff-tables us-east-1/primary eu-west-2/replica

# Repair the replica to match the primary
$ diff-tables us-east-1/primary eu-west-2/replica --repair

# Only backfill the replica. Useful for starting a new replica
$ diff-tables us-east-1/primary eu-west-2/new-replica --backfill --repair

# Perform one segment of a parallel scan
$ diff-tables us-east-1/primar eu-west-2/replica --repair --segment 0 --segments 10

replicate-record

Given two tables and an item's key, this script insures that the replica record is synchronized with its current state in the primary table.

$ npm install -g dynamodb-replicator
$ replicate-record --help

Usage: replicate-record <primary tableinfo> <replica tableinfo> <recordkey>
 - primary tableinfo: the primary table to replicate from, specified as `region/tablename`
 - replica tableinfo: the replica table to replicate to, specified as `region/tablename`
 - recordkey: the key for the record specified as a JSON object

# Copy the state of a record from the primary to the replica table
$ replicate-record us-east-1/primary eu-west-1/replica '{"id":"abc"}'

backup-table

Scans a table and dumps the entire set of records as a line-delimited JSON file on S3.

$ npm install -g dynamodb-replicator
$ backup-table --help

Usage: backup-table region/table s3url

Options:
  --jobid      assign a jobid to this backup
  --segment    segment identifier (0-based)
  --segments   total number of segments
  --metric     cloudwatch metric namespace. Will provide dimension TableName = the name of the backed-up table.

# Writes a backup file to s3://my-bucket/some-prefix/<random string>/0
$ backup-table us-east-1/primary s3://my-bucket/some-prefix

# Specifying a jobid guarantees the S3 location
# Writes a backup file to s3://my-bucket/some-prefix/my-job-id/0
$ backup-table us-east-1/primary s3://my-bucket/some-prefix --jobid my-job-id

# Perform one segment of a parallel backup
# Writes a backup file to s3://my-bucket/some-prefix/my-job-id/4
$ backup-table us-east-1/primary s3://my-bucket/some-prefix --jobid my-job-id --segment 4 --segments 10

incremental-backfill

Scans a table and dumps each individual record as an object to a folder on S3.

$ npm install -g dynamodb-replicator
$ incremental-backfill --help

Usage: incremental-backfill region/table s3url

# Write each item in the table to S3. `s3url` should provide any desired bucket/prefix.
# The name of the table will be appended to the s3 prefix that you provide.
$ incremental-backfill us-east-1/primary s3://dynamodb-backups/incremental

incremental-snapshot

Reads each item in an S3 folder representing an incremental table backup, and writes an aggregate line-delimited JSON file to S3.

$ npm install -g dynamodb-replicator
$ incremental-snapshot --help

Usage: incremental-snapshot <source> <dest>

Options:
  --metric     cloudwatch metric region/namespace/tablename. Will provide dimension TableName = the tablename.

# Aggregate all the items in an S3 folder into a single snapshot file
$ incremental-snapshot s3://dynamodb-backups/incremental/primary s3://dynamodb-backups/snapshots/primary

incremental-diff-record

Checks for consistency between a DynamoDB record and its backed-up version on S3.

$ npm install -g dynamodb-replicator
$ incremental-diff-record --help

Usage: incremental-diff-record <tableinfo> <s3url> <recordkey>
 - tableinfo: the table where the record lives, specified as `region/tablename`
 - s3url: s3 folder where the incremental backups live
 - recordkey: the key for the record specified as a JSON object

# Check that a record is up-to-date in the incremental backup
$ incremental-diff-record us-east-1/primary s3://dynamodb-backups/incremental '{"id":"abc"}'

incremental-backup-record

Copies a DynamoDB record's present state to an incremental backup folder on S3.

$ npm install -g dynamodb-replicator
$ incremental-backup-record --help

Usage: incremental-backup-record <tableinfo> <s3url> <recordkey>
 - tableinfo: the table to backup from, specified as `region/tablename`
 - s3url: s3 folder into which the record should be backed up to
 - recordkey: the key for the record specified as a JSON object

# Backup a single record to S3
$ incremental-backup-record us-east-1/primary s3://dynamodb-backups/incremental '{"id":"abc"}'

incremental-record-history

Prints each version of a record that is available in an incremental backup folder on S3.

$ incremental-record-history --help

Usage: incremental-record-history <tableinfo> <s3url> <recordkey>
 - tableinfo: the table where the record lives, specified as `region/tablename`
 - s3url: s3 folder where the incremental backups live. Table name will be appended
 - recordkey: the key for the record specified as a JSON object

# Read the history of a single record
$ incremental-record-history us-east-1/my-table s3://dynamodb-backups/incremental '{"id":"abc"}'