Snowflake
GlassFlow provides a managed sink connector for Snowflake.
The Snowflake sink connector allows you to seamlessly send data from your GlassFlow pipeline to a Snowflake database, enabling real-time data warehousing, analytics, and reporting.
The connector lets you build streaming pipelines and update your Snowflake data warehouse in real time without having to manage complex infrastructure.
For example, by combining the GlassFlow Postgres CDC source connector with the Snowflake sink connector, you can build a pipeline that loads data from Postgres into Snowflake in real time without running any code.
Connector Details
To configure the Snowflake sink connector, you will need the following details:
- Snowflake Account: Your Snowflake account identifier (e.g., `etxdv.europe-west3.gcp`).
- Warehouse: The Snowflake virtual warehouse to be used for data ingestion.
- Username & Password: Credentials used to authenticate with Snowflake.
- Database: The name of the target Snowflake database.
- Schema: The schema within the Snowflake database where the data will be stored.
- Role (Optional): The Snowflake role to use for the connection.
Obtaining Connection Credentials
To obtain your Snowflake connection details, follow these steps:
- Log in to the Snowflake Console.
- Create a dedicated user for GlassFlow with the appropriate privileges (see the example SQL sketch after this list).
- Retrieve the Account Identifier from your Snowflake account details.
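The exact privileges depend on your environment. As a minimal sketch, assuming hypothetical warehouse, database, and schema names (`COMPUTE_WH`, `ANALYTICS`, `PUBLIC`) that you should replace with your own, the Snowflake SQL could look like this:

```sql
-- Hypothetical names: replace COMPUTE_WH, ANALYTICS, PUBLIC, and the password with your own values.
CREATE ROLE IF NOT EXISTS GLASSFLOW_ROLE;
GRANT USAGE ON WAREHOUSE COMPUTE_WH TO ROLE GLASSFLOW_ROLE;
GRANT USAGE ON DATABASE ANALYTICS TO ROLE GLASSFLOW_ROLE;
GRANT USAGE ON SCHEMA ANALYTICS.PUBLIC TO ROLE GLASSFLOW_ROLE;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA ANALYTICS.PUBLIC TO ROLE GLASSFLOW_ROLE;

-- Dedicated user for the GlassFlow connector
CREATE USER IF NOT EXISTS GLASSFLOW_USER
  PASSWORD = '<choose-a-strong-password>'
  DEFAULT_ROLE = GLASSFLOW_ROLE
  DEFAULT_WAREHOUSE = COMPUTE_WH;
GRANT ROLE GLASSFLOW_ROLE TO USER GLASSFLOW_USER;
```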
Setting Up the Snowflake Sink Connector
Using WebApp
- Log in to the GlassFlow WebApp and navigate to the "Pipelines" section.
- Create a new pipeline or open an existing one.
- Configure the Data Sink:
  - Choose "Snowflake" as the data sink type.
  - Enter your Account, Warehouse, Database, Schema, Table Name, and Username & Password.
- Click Next Step and confirm your pipeline data sink settings.
Using PythonSDK
If you are using the GlassFlow Python SDK to deploy and manage your pipeline, the following code shows how to configure the Snowflake sink connector via the SDK.
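The snippet below is an illustrative sketch only: it assumes the `glassflow` package's `GlassFlowClient` and a `create_pipeline` call, and the `sink_kind` value and config keys shown are assumptions, so confirm them against the SDK reference and the examples repo.

```python
# Illustrative sketch - the sink kind and config key names are assumptions;
# check the GlassFlow Python SDK reference for the exact values.
from glassflow import GlassFlowClient

client = GlassFlowClient(personal_access_token="<your-access-token>")

pipeline = client.create_pipeline(
    name="postgres-to-snowflake",
    space_id="<your-space-id>",
    transformation_file="transform.py",
    sink_kind="snowflake_cdc_json",       # assumed identifier for the Snowflake sink
    sink_config={
        "account": "etxdv.europe-west3.gcp",
        "warehouse": "COMPUTE_WH",        # hypothetical warehouse name
        "username": "GLASSFLOW_USER",
        "password": "<your-password>",
        "database": "ANALYTICS",          # hypothetical database name
        "schema": "PUBLIC",
        "role": "GLASSFLOW_ROLE",         # optional
    },
)
```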
A fully functional example that creates a pipeline with Snowflake as a managed sink connector is available in our examples repo as a Jupyter notebook. For building a pipeline that moves data from Postgres to Snowflake, see this example on GitHub.
Using GitHub Actions
If you are using GitHub Actions to deploy and manage the pipeline, the following snippet shows the YAML configuration of the Snowflake sink connector component:
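As a hedged sketch, the component might look like the YAML below; the key names (`kind`, `config`, and the credential fields) are assumptions based on the connector details above, so confirm them against the GlassFlow GitHub Actions reference.

```yaml
# Illustrative only - key names are assumptions; check the GlassFlow
# pipeline YAML reference for the exact schema.
sink:
  kind: snowflake_cdc_json      # assumed identifier for the Snowflake sink
  config:
    account: etxdv.europe-west3.gcp
    warehouse: COMPUTE_WH       # hypothetical warehouse name
    username: GLASSFLOW_USER
    password: "<your-password>" # store this as a GitHub Actions secret, not in the repo
    database: ANALYTICS         # hypothetical database name
    schema: PUBLIC
    role: GLASSFLOW_ROLE        # optional
```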
Supported Operations
The Snowflake sink connector supports the following operations:
| Operation | Description |
| --- | --- |
| INSERT | New records can be inserted into the specified table. |
| UPDATE | Existing records can be updated based on the provided filters. |
| DELETE | Records can be deleted using the specified filters. |
These operations allow you to keep your Snowflake tables synchronized with your GlassFlow pipeline transformations in real time.
Expected Data Structure
For the Snowflake sink connector to process incoming data correctly, the data must adhere to the following JSON format. Your transformation function must output data in this format so that the different operations can be supported.
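For example, an update event could look like the record below; the exact layout of the `columns` and `filters` entries is an illustration consistent with the field descriptions that follow, and the table, column, and filter values are hypothetical.

```json
{
  "operation": "update",
  "schema": "PUBLIC",
  "table": "customers",
  "columns": [
    { "name": "email", "value": "jane.doe@example.com" },
    { "name": "status", "value": "active" }
  ],
  "filters": [
    { "name": "id", "value": 42 }
  ]
}
```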
Explanation of Fields
- operation: Specifies the action to be performed (insert, update, or delete).
- schema: The target schema in Snowflake.
- table: The target table where the operation should be applied.
- columns: A list of columns and their corresponding values to be inserted or updated.
- filters: A list of conditions used to match records for update or delete operations.
Ensuring the transformation function follows this structure guarantees seamless integration with the Snowflake sink connector and allows data to be processed efficiently.
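As a sketch of what that can look like in practice, the transformation below emits an update event in the structure above; it assumes the usual `handler(data, log)` transform signature and uses hypothetical table and column names.

```python
# Illustrative sketch - assumes GlassFlow's handler(data, log) transform signature
# and hypothetical table/column names; adapt it to your own event shape.
def handler(data, log):
    """Map an incoming event to the structure expected by the Snowflake sink."""
    return {
        "operation": "update",
        "schema": "PUBLIC",
        "table": "customers",
        "columns": [
            {"name": "email", "value": data["email"]},
            {"name": "status", "value": "active"},
        ],
        "filters": [
            {"name": "id", "value": data["id"]},
        ],
    }
```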
Snowflake IP Whitelisting
If your Snowflake instance restricts connections, whitelist the following GlassFlow IP addresses:
Additional Resources
Once configured, your GlassFlow pipeline will automatically send processed data to your Snowflake database for real-time analysis and reporting.