Postgres CDC Source Integration

GlassFlow provides a managed connector for Postgres CDC.

A PostgreSQL CDC (Change Data Capture) source streams real-time database changes, such as inserts, updates, and deletes, to downstream systems for integration or analysis.
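To make the idea of a change event concrete, here is a minimal sketch that parses one message in the JSON layout produced by the wal2json output plugin (format version 1). It assumes wal2json-style messages; the events GlassFlow delivers to your pipeline may be shaped differently. The `users` table, the sample values, and the `summarize_changes` helper are hypothetical.

```python
import json

# Sample message in wal2json format version 1 (hypothetical table and values).
SAMPLE = json.dumps({
    "change": [
        {"kind": "insert", "schema": "public", "table": "users",
         "columnnames": ["id", "name"], "columntypes": ["integer", "text"],
         "columnvalues": [1, "Ada"]},
        {"kind": "update", "schema": "public", "table": "users",
         "columnnames": ["id", "name"], "columntypes": ["integer", "text"],
         "columnvalues": [1, "Ada L."],
         "oldkeys": {"keynames": ["id"], "keytypes": ["integer"], "keyvalues": [1]}},
        {"kind": "delete", "schema": "public", "table": "users",
         "oldkeys": {"keynames": ["id"], "keytypes": ["integer"], "keyvalues": [1]}},
    ]
})


def summarize_changes(message):
    """Turn one wal2json v1 message into a list of simple change records."""
    records = []
    for change in json.loads(message).get("change", []):
        # Inserts and updates carry the new row; deletes carry only the old key.
        row = dict(zip(change.get("columnnames", []), change.get("columnvalues", [])))
        old = change.get("oldkeys", {})
        key = dict(zip(old.get("keynames", []), old.get("keyvalues", [])))
        records.append({
            "op": change["kind"],
            "table": f'{change["schema"]}.{change["table"]}',
            "row": row,
            "key": key,
        })
    return records


events = summarize_changes(SAMPLE)
for event in events:
    print(event["op"], event["table"], event["row"] or event["key"])
```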

Important Info

GlassFlow supports CDC on Postgres only through logical replication, using replication slots. Starting logical replication on a hot standby (a non-primary node) can cause replication slots to be invalidated. If you need a replication slot on a hot standby, read the documentation here carefully before proceeding.

GlassFlow Whitelist IPs

Many hosted Postgres providers allow access only from specific whitelisted IP addresses. In that case, add GlassFlow's IP addresses to the allowed-IP section of your Postgres provider.

Step 1: Setting Up Postgres

Azure Postgres

If you are using managed Postgres on Azure, the following configuration is needed on the database for the GlassFlow Postgres connector to work.

  1. Whitelist GlassFlow's IP addresses and allow requests from these two IP addresses:
  2. Set the following parameters on the database:
    • wal_level to logical
    • max_worker_processes to a number larger than 16
  3. The Postgres user that you use for the connection should have the replication role and be the owner of the tables you want to sync.
  4. Create a replication slot:
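The parameter, role, and slot steps above can be sketched as follows. This is an illustration under stated assumptions, not GlassFlow's official setup script: the helper names, the `glassflow_user`, `orders`, and `glassflow_slot` example names, and the choice of `wal2json` as the output plugin are all assumptions. On Azure, the server parameters themselves are usually changed through the portal or CLI rather than SQL, so the sketch only checks their values.

```python
import re

# Reads the two server parameters named in step 2.
SETTINGS_QUERY = (
    "SELECT name, setting FROM pg_settings "
    "WHERE name IN ('wal_level', 'max_worker_processes');"
)

_IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")


def _ident(name):
    """Accept only simple lowercase identifiers (no schema prefix, no quoting)."""
    if not _IDENT.match(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return name


def check_settings(settings):
    """Return problems found in a {name: setting} mapping read via SETTINGS_QUERY."""
    problems = []
    if settings.get("wal_level") != "logical":
        problems.append("wal_level must be 'logical'")
    if int(settings.get("max_worker_processes", 0)) <= 16:
        problems.append("max_worker_processes must be larger than 16")
    return problems


def setup_statements(user, tables, slot, plugin="wal2json"):
    """Build SQL for steps 3 and 4; the wal2json plugin is an assumption."""
    statements = [f"ALTER ROLE {_ident(user)} WITH REPLICATION;"]
    statements += [
        f"ALTER TABLE {_ident(table)} OWNER TO {_ident(user)};" for table in tables
    ]
    statements.append(
        "SELECT * FROM pg_create_logical_replication_slot("
        f"'{_ident(slot)}', '{_ident(plugin)}');"
    )
    return statements


for statement in setup_statements("glassflow_user", ["orders"], "glassflow_slot"):
    print(statement)
```

Run the generated statements with a client such as psql, and compare them against your provider's documentation before applying them to a production database.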

Gcloud (wal2json)

If you are using managed Postgres on Google Cloud, the following configuration is needed on the database for the GlassFlow Postgres connector to work.

  1. Add the replica's IP addresses to the primary's authorized networks:
  2. Set the logical_decoding flag to on for the instance.
  3. Create a user in the cloudsqlsuperuser role with replication permission:
  4. Set up a database, e.g. test_db, or use an existing database.
  5. Create a replication slot named test_slot after logging in as replication_user. The slot must be owned by the user responsible for replication.
  6. Note the public IP of the instance; you will need it when setting up the GlassFlow connector.
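As a rough sketch of the user and slot steps above, the helper below builds the two SQL statements, split by the session that should run them, since the slot has to be created while logged in as the replication user. The `gcloud_setup_statements` helper and the `change-me` placeholder password are hypothetical, and string quoting assumes `standard_conforming_strings` is on, which is the PostgreSQL default.

```python
def sql_literal(value):
    """Quote a string as a SQL literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def gcloud_setup_statements(user="replication_user", password="change-me",
                            slot="test_slot", plugin="wal2json"):
    """Return SQL for creating the replication user and its slot.

    The result is keyed by the session that should execute each statement.
    """
    if not user.isidentifier():
        # Basic guard: only plain identifiers are interpolated unquoted.
        raise ValueError(f"unsupported role name: {user!r}")
    create_user = (
        f"CREATE USER {user} WITH REPLICATION IN ROLE cloudsqlsuperuser "
        f"LOGIN PASSWORD {sql_literal(password)};"
    )
    create_slot = (
        "SELECT * FROM pg_create_logical_replication_slot("
        f"{sql_literal(slot)}, {sql_literal(plugin)});"
    )
    return {"as_admin": create_user, "as_replication_user": create_slot}


for session, statement in gcloud_setup_statements().items():
    print(f"-- run {session}")
    print(statement)
```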

Step 2: Integrating Postgres CDC as a Source in GlassFlow

You can integrate Postgres CDC with GlassFlow either through the WebApp or by using the Python SDK. Below are the instructions for both methods.

Using WebApp

  1. Log in to the GlassFlow WebApp and navigate to the "Pipelines" section.
  2. Create a new pipeline.
  3. Go to the Source section on the pipeline setup page and select Postgres.
  4. Enter the following details:
    • Host: Postgres hostname or IP address.
    • Port: Postgres port (default 5432).
    • User Name: User on your Postgres database (for example, replication_user created above).
    • Password: Password for that user.
    • Database Name: Postgres database whose changes you want to capture.
    • Replication Slot: Name of the replication slot (for example, test_slot created above).
  5. Click Continue to save your source settings and configure other parts of your pipeline.

Once created, this source will be available for use in your GlassFlow pipelines. CDC data will automatically be captured by GlassFlow and be available in the pipeline.
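Before entering the details above, it can help to confirm that the same values work from your own machine. The sketch below builds a libpq connection string from those fields and checks that the replication slot exists. It is a local pre-flight check under assumptions, not part of GlassFlow: the helper names and example values are hypothetical, and `slot_exists` relies on the third-party psycopg2 driver, which you would need to install separately.

```python
def libpq_dsn(host, user, password, dbname, port=5432):
    """Build a libpq keyword/value connection string from the form fields."""
    def quote(value):
        # libpq expects backslash-escaped quotes inside single-quoted values.
        text = str(value).replace("\\", "\\\\").replace("'", "\\'")
        return f"'{text}'"

    fields = {"host": host, "port": port, "user": user,
              "password": password, "dbname": dbname}
    return " ".join(f"{key}={quote(value)}" for key, value in fields.items())


# Looks up a slot by name in the standard pg_replication_slots view.
SLOT_QUERY = (
    "SELECT slot_name, plugin, slot_type, active "
    "FROM pg_replication_slots WHERE slot_name = %s;"
)


def slot_exists(dsn, slot):
    """Return True if the named replication slot exists on the server."""
    import psycopg2  # third-party driver, imported lazily (assumption)

    conn = psycopg2.connect(dsn)
    try:
        with conn.cursor() as cur:
            cur.execute(SLOT_QUERY, (slot,))
            return cur.fetchone() is not None
    finally:
        conn.close()


dsn = libpq_dsn("db.example.com", "replication_user", "change-me", "test_db")
# slot_exists(dsn, "test_slot")  # uncomment to run against a reachable database
```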

Using Python SDK

If you are using the GlassFlow Python SDK to deploy and manage the pipeline, the following code shows how to configure the Postgres CDC connector via the SDK.

A fully functional example that creates a pipeline with Postgres CDC as a managed source connector is available in our examples repo as a Jupyter Notebook.

Using GitHub Actions

If you are using GitHub Actions to deploy and manage the pipeline, the following snippet shows the YAML configuration of the Postgres CDC connector component.