Engineering

Move Data from Postgres to Snowflake in Real Time using CDC

Stream Postgres data to Snowflake in real time with CDC.

Written by Armend Avdijaj, 17/03/2025, 08:53

How to Move Data from Postgres to Snowflake in Real Time

In modern data architectures, real-time data synchronization between transactional and analytical systems is crucial for making informed business decisions. Change Data Capture (CDC) is a technique that efficiently tracks and propagates database changes. This article will explore a practical use case for implementing CDC to stream customer order data from a PostgreSQL database to Snowflake using GlassFlow.

Note: This article assumes that you are familiar with CDC. If you need an overview, check out our previous article on the basics of CDC.

Use Case Overview

Imagine an e-commerce company that processes thousands of customer orders daily. The order data is stored in a PostgreSQL database, but the company needs to replicate the changes to Snowflake in real-time for advanced analytics and reporting. This ensures that analysts and business intelligence tools always have up-to-date information.

Architecture

The CDC pipeline consists of the following components:

  1. PostgreSQL (Source Database): stores customer orders.
  2. GlassFlow (Streaming Platform): acts as a message broker for change events.
  3. Snowflake (Target Database): stores the replicated data for analytics.

PostgreSQL Overview

PostgreSQL is a powerful, open-source relational database system known for its extensibility, performance, and strong support for ACID-compliant transactions. It is widely used in transactional applications due to its robust feature set, including logical replication, which makes it an excellent choice for implementing CDC. With support for JSON, indexing techniques, and stored procedures, PostgreSQL provides flexibility for modern application needs.

Snowflake Overview

Snowflake is a cloud-based data warehousing platform that offers high scalability, elasticity, and performance for analytical workloads. Unlike traditional databases, Snowflake uses a unique architecture that separates storage and compute, allowing businesses to scale their queries and storage independently. Its native support for semi-structured data, automatic optimization, and seamless integration with various ETL tools make it a preferred choice for real-time analytics and data warehousing.

Implementation Steps

Step 1: Configure PostgreSQL for CDC

There are many ways to set up a PostgreSQL server; Azure and AWS both work. In this article, I will create a PostgreSQL database in DigitalOcean, which I find to be the most straightforward approach, but you can use any cloud service. There are two things to note when you set up the database. First, you need to enable logical decoding. Second, you need to whitelist GlassFlow's IP addresses so it can read your database.

Enabling Logical Decoding

PostgreSQL supports logical decoding, which is required for CDC. Connect to your database and run the following command to enable logical decoding.
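On a self-managed server, one common way to enable it is to raise `wal_level` (this requires superuser privileges and a server restart; managed providers often expose this as a console setting instead):

```sql
-- Make PostgreSQL write enough information to the WAL for logical
-- decoding plugins to reconstruct row-level changes (restart required).
ALTER SYSTEM SET wal_level = logical;

-- After restarting, confirm the setting took effect:
SHOW wal_level;
```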

You also need to give the connecting user the REPLICATION role.
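For example (the user name `glassflow_user` is a placeholder; substitute the role your pipeline connects as):

```sql
-- Grant the replication privilege so the user can consume the WAL stream.
ALTER USER glassflow_user WITH REPLICATION;
```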

In DigitalOcean, `wal_level` and the REPLICATION role are already set up, so you can directly create the replication slot.
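Creating a logical replication slot might look like this (the slot name and the `pgoutput` plugin are assumptions; use whatever name and decoding plugin your pipeline expects):

```sql
-- Create a logical replication slot that CDC consumers can read from.
SELECT pg_create_logical_replication_slot('glassflow_slot', 'pgoutput');
```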

Whitelist GlassFlow IPs

You need to whitelist the following IPs:

35.198.78.140
34.107.63.104

Set Up the Table

We will be using the following table.

| Column | Type |
| --- | --- |
| order_id | Serial, Primary Key |
| user_id | Integer |
| timestamp | Timestamp |
| amount | Float |
| user_email | VarChar(255) |
| shipping_address | Text |

You can run the following SQL command to create the table.
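A statement matching the schema above would look like this (the table name `customer_orders` is assumed to mirror the Snowflake table created later):

```sql
-- Source table for customer orders; order_id is auto-generated.
CREATE TABLE customer_orders (
    order_id         SERIAL PRIMARY KEY,
    user_id          INTEGER,
    timestamp        TIMESTAMP,
    amount           FLOAT,
    user_email       VARCHAR(255),
    shipping_address TEXT
);
```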

For a more complete guide on how to set up PostgreSQL for CDC, please visit our GlassFlow Postgres CDC Guide.

Step 2: Set up Snowflake

Let’s create an analytics database and a table named customer_orders. You can run the following commands in the Snowflake console.
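A sketch of those commands is below; the column types are one plausible mapping of the Postgres schema onto Snowflake types, not taken from the original article:

```sql
CREATE DATABASE IF NOT EXISTS analytics;
USE DATABASE analytics;

-- Target table mirroring the Postgres customer_orders schema.
CREATE TABLE IF NOT EXISTS customer_orders (
    order_id         INTEGER,
    user_id          INTEGER,
    timestamp        TIMESTAMP_NTZ,
    amount           FLOAT,
    user_email       VARCHAR(255),
    shipping_address STRING
);
```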

This step is optional, but in some cases, you will need to whitelist the GlassFlow IP. To do this, run the following command in the Snowflake console.
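One way to do this is with a network policy (the policy name `glassflow_policy` is a placeholder):

```sql
-- Allow connections from GlassFlow's published IP addresses.
CREATE NETWORK POLICY glassflow_policy
  ALLOWED_IP_LIST = ('35.198.78.140', '34.107.63.104');
```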

Or, if you already have an existing network policy, you can update it instead.
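For example (note that `SET ALLOWED_IP_LIST` replaces the list, so include any IPs the policy already allows alongside the GlassFlow ones; `my_existing_policy` is a placeholder):

```sql
ALTER NETWORK POLICY my_existing_policy SET
  ALLOWED_IP_LIST = ('35.198.78.140', '34.107.63.104');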

Now you need to apply the network policy.
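Attaching the policy at the account level might look like this (applying it account-wide affects all users, so scope it more narrowly if that is a concern; the policy name is again a placeholder):

```sql
ALTER ACCOUNT SET NETWORK_POLICY = glassflow_policy;
```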

For a more in-depth guide on how to set up Snowflake, please visit our Snowflake Guide.

Step 3: Configure GlassFlow

Now we need to create the pipeline that connects the PostgreSQL database to Snowflake. Using the GlassFlow Python SDK, you can write the following code.

A note about SNOWFLAKE_ACCOUNT: use the account identifier, which you can find in your Snowflake account details. It follows the format <organization_name>-<account_name>. Example: GOOOWIB-DS12345.

The content of transform.py is the following. The code transforms the CDC event format into a row format that Snowflake can ingest.
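Since the original transform.py is not shown here, below is a minimal sketch of what such a transform could look like. The `handler` name, its signature, and the wal2json-style event shape are all assumptions, not taken from the GlassFlow documentation; adapt them to whatever your pipeline actually delivers.

```python
# Hypothetical transform.py: flatten a CDC change event into a plain
# row dict whose keys match the Snowflake table columns.
def handler(data, log=None):
    """Assumes a wal2json-style payload:
    {"action": "I", "columns": [{"name": "...", "value": ...}, ...]}
    where action is I (insert), U (update), or D (delete).
    """
    # Forward only inserts and updates; skip deletes and other actions.
    if data.get("action") not in ("I", "U"):
        return None
    # Zip parallel name/value entries into one flat dict per row.
    return {col["name"]: col["value"] for col in data.get("columns", [])}
```

The key design point is that CDC events arrive as an envelope describing the change, while a warehouse sink usually wants plain column/value rows, so the transform's job is mostly unwrapping.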

Step 4: Populating Order Data

We will use Python and the faker library to populate the table.
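The original script is not reproduced here, so the sketch below is a stand-in: it uses only the standard library's `random` module in place of faker, and prints INSERT statements rather than executing them. In practice you would generate values with `faker.Faker()` and run the statements through a driver such as psycopg2; both of those details are assumptions about the original setup.

```python
import random
from datetime import datetime, timezone

# Placeholder value pools; faker would produce richer emails/addresses.
DOMAINS = ["example.com", "mail.test"]
STREETS = ["Main St", "Oak Ave", "Elm Rd"]

def fake_order():
    """Build one synthetic row matching the customer_orders schema."""
    user_id = random.randint(1, 1000)
    return {
        "user_id": user_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "amount": round(random.uniform(5.0, 500.0), 2),
        "user_email": f"user{user_id}@{random.choice(DOMAINS)}",
        "shipping_address": f"{random.randint(1, 999)} {random.choice(STREETS)}",
    }

def to_insert_sql(order):
    """Render the row as an INSERT statement (order_id comes from SERIAL)."""
    return (
        "INSERT INTO customer_orders "
        "(user_id, timestamp, amount, user_email, shipping_address) VALUES "
        f"({order['user_id']}, '{order['timestamp']}', {order['amount']}, "
        f"'{order['user_email']}', '{order['shipping_address']}');"
    )

if __name__ == "__main__":
    for _ in range(10):
        print(to_insert_sql(fake_order()))
```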

When you run the script, your database is populated and every change is reflected in the Snowflake database. On the GlassFlow pipeline page, you will see the logs update as events flow through.


Step 5: Querying the Data in Snowflake

Once the data is synced, you can go to the Snowflake console and run the following query to check the result.
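For example (the database, schema, and table names assume the earlier Snowflake setup):

```sql
-- Inspect the most recently replicated orders.
SELECT *
FROM analytics.public.customer_orders
ORDER BY order_id DESC
LIMIT 10;
```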


A fully functional example can be found on our Postgres-Snowflake examples repo.

Conclusion

By leveraging CDC with PostgreSQL, GlassFlow, and Snowflake, businesses can achieve real-time data replication for analytical workloads. This architecture enables faster insights and improved decision-making. Implementing this solution ensures that customer order data remains synchronized across transactional and analytical systems without significant performance overhead.


Get started today

Reach out and we'll show you how GlassFlow interacts with your existing data stack.

Book a demo