Postgres CDC Source Integration
GlassFlow provides a managed connector for Postgres CDC.
A PostgreSQL CDC (Change Data Capture) source streams real-time database changes, such as inserts, updates, and deletes, to downstream systems for integration or analysis.
Important Info
GlassFlow supports CDC on Postgres only via logical replication, which relies on replication slots. Starting logical replication on a hot standby (a non-primary node) can cause replication slots to be invalidated. If you need a replication slot on a hot standby, read the documentation here carefully before proceeding.
GlassFlow Whitelist IPs
Many hosted Postgres providers allow access only from specific whitelisted IP addresses. In that case, add GlassFlow's IP addresses to the allowed IP section of your Postgres provider.
Step 1: Setting Up Postgres
Azure Postgres
If you are using managed Postgres from Azure, the following configuration is needed on the database for the GlassFlow Postgres connector to work.
- Whitelist GlassFlow's IP addresses by allowing requests from these two IP addresses:
- Set the following parameters on the database:
  - wal_level to logical
  - max_worker_processes to a number larger than 16
- The Postgres user used for the connection must have the replication role and be the owner of the tables you want to sync.
- Create a replication slot:
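The parameter and replication slot steps above can be sketched in Python as follows. This is a minimal illustration, not GlassFlow tooling: the helper names, the slot name glassflow_slot, and the choice of wal2json as the output plugin are assumptions. The only Postgres feature it relies on is the standard pg_create_logical_replication_slot function.

```python
import re

# PostgreSQL replication slot names may contain only lowercase letters,
# digits and underscores; the same pattern is applied to the plugin name.
_NAME = re.compile(r"^[a-z0-9_]+$")


def check_parameters(settings):
    """Return a list of problems found in the given server settings.

    `settings` maps parameter names to the string values reported by
    `SHOW <parameter>;` on the database.
    """
    problems = []
    if settings.get("wal_level") != "logical":
        problems.append("wal_level must be set to logical")
    workers = int(settings.get("max_worker_processes", "0"))
    if workers <= 16:
        problems.append("max_worker_processes must be larger than 16")
    return problems


def create_slot_sql(slot_name, plugin):
    """Build the statement that creates a logical replication slot."""
    for label, value in (("slot name", slot_name), ("plugin", plugin)):
        if not _NAME.match(value):
            raise ValueError(f"invalid {label}: {value!r}")
    return (
        "SELECT pg_create_logical_replication_slot("
        f"'{slot_name}', '{plugin}');"
    )


# Hypothetical values, for illustration only.
print(check_parameters({"wal_level": "logical", "max_worker_processes": "32"}))
print(create_slot_sql("glassflow_slot", "wal2json"))
```

Run the generated statement as the user that will own the slot, using whichever SQL client you prefer.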
Gcloud (wal2json)
If you are using managed Postgres from Google Cloud, the following configuration is needed on the database for the GlassFlow Postgres connector to work.
- Add the replica's IP addresses to the primary's authorized networks:
- Set the logical_decoding flag (named cloudsql.logical_decoding in Cloud SQL) to on for the instance.
- Create a user in the cloudsqlsuperuser role with replication permission:
- Set up a database, e.g. test_db, or use an existing database.
- Create a replication slot, e.g. test_slot, after logging in as replication_user. The slot must be owned by the user responsible for replication.
- Note the public IP of the instance; you will need it when setting up the GlassFlow connector.
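As a rough sketch of the user and replication slot steps above, the following Python helper builds the two SQL statements. The helper names and the placeholder password are assumptions for illustration. The names replication_user and test_slot come from the steps above, and wal2json is the output plugin named in this section's heading.

```python
import re

# Conservative rule for unquoted SQL identifiers used in this sketch.
_IDENTIFIER = re.compile(r"^[a-z_][a-z0-9_]*$")


def quote_literal(value):
    """Quote a string as a SQL literal by doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def gcloud_setup_statements(user, password, slot):
    """Return the SQL for the user and slot steps, in the order to run them."""
    if not _IDENTIFIER.match(user):
        raise ValueError(f"invalid user name: {user!r}")
    if not _IDENTIFIER.match(slot):
        raise ValueError(f"invalid slot name: {slot!r}")
    # Run as an administrative user (for example, postgres).
    create_user = (
        f"CREATE USER {user} WITH REPLICATION "
        f"IN ROLE cloudsqlsuperuser LOGIN PASSWORD {quote_literal(password)};"
    )
    # Run after logging in as the replication user, so that user owns the slot.
    create_slot = (
        "SELECT pg_create_logical_replication_slot("
        f"{quote_literal(slot)}, 'wal2json');"
    )
    return [create_user, create_slot]


# "change-me" is a placeholder password, not a recommended value.
for statement in gcloud_setup_statements("replication_user", "change-me", "test_slot"):
    print(statement)
```

The first statement creates the user and the second creates the slot, so connect to the target database (for example, test_db) as replication_user before running the second one.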
Step 2: Integrating Postgres CDC as a Source in GlassFlow
You can integrate Postgres CDC with GlassFlow either through the WebApp or by using the Python SDK. Below are the instructions for both methods.
Using WebApp
- Log in to the GlassFlow WebApp and navigate to the "Pipelines" section.
- Create a new pipeline.
- Go to the Source section on the pipeline setup page and select Postgres.
- Enter the following details:
  - Host: Postgres hostname or IP address.
  - Port: Postgres port (default 5432).
  - User Name: User on your Postgres database (for example, replication_user created above).
  - Password: Password for that user.
  - Database Name: Postgres database for which you want to capture CDC.
  - Replication Slot: Name of the replication slot (for example, test_slot created above).
- Click Continue to save your source settings and configure other parts of your pipeline.
Once created, this source will be available for use in your GlassFlow pipelines. CDC data will automatically be captured by GlassFlow and be available in the pipeline.
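Before filling in the form, it can help to collect the connection details in one place and sanity-check them. The sketch below does that with a small dataclass. PostgresCdcSettings is a hypothetical name and is not part of the GlassFlow SDK or WebApp; the field names simply mirror the form fields listed above, and the host shown is a placeholder address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PostgresCdcSettings:
    """Connection details requested by the Source form (hypothetical holder)."""

    host: str
    user_name: str
    password: str
    database_name: str
    replication_slot: str
    port: int = 5432  # default Postgres port

    def validate(self):
        """Return a list of problems; an empty list means the values look usable."""
        problems = []
        required = ("host", "user_name", "password", "database_name", "replication_slot")
        for field_name in required:
            if not getattr(self, field_name).strip():
                problems.append(f"{field_name} must not be empty")
        if not 1 <= self.port <= 65535:
            problems.append("port must be between 1 and 65535")
        return problems


# Placeholder values based on the examples used earlier on this page.
settings = PostgresCdcSettings(
    host="203.0.113.10",
    user_name="replication_user",
    password="change-me",
    database_name="test_db",
    replication_slot="test_slot",
)
print(settings.validate())  # prints []
```

The check only catches empty fields and out-of-range ports; it does not verify that the database is reachable or that the slot exists.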
Using Python SDK
If you are using the GlassFlow Python SDK to deploy and manage the pipeline, the following code shows how to configure the Postgres CDC connector via the SDK.
A fully functional example that creates a pipeline with Postgres CDC as a managed source connector is available in our examples repo as a Jupyter Notebook.
Using GitHub Actions
If you are using GitHub Actions to deploy and manage the pipeline, the following snippet shows the YAML configuration of the Postgres CDC connector component.