Snowflake
GlassFlow provides a managed sink connector for Snowflake.
The Snowflake sink connector allows you to seamlessly send data from your GlassFlow pipeline to a Snowflake database, enabling real-time data warehousing, analytics, and reporting.
The connector lets you build streaming pipelines and update your Snowflake data warehouse in real time without having to manage complex infrastructure.
For example, by combining the GlassFlow Postgres CDC source connector with the Snowflake sink connector, you can build a pipeline that loads data from Postgres into Snowflake in real time without running any code.
Connector Details
To configure the Snowflake sink connector, you will need the following details:
- Snowflake Account: Your Snowflake account identifier (e.g., `etxdv.europe-west3.gcp`).
- Warehouse: The Snowflake virtual warehouse to be used for data ingestion.
- Username & Password: Credentials used to authenticate with Snowflake.
- Database: The name of the target Snowflake database.
- Schema: The schema within the Snowflake database where the data will be stored.
- Role (Optional): The Snowflake role to use for the connection.
Obtaining Connection Credentials
To obtain your Snowflake connection details, follow these steps:
- Log in to the Snowflake Console.
- Create a dedicated user for GlassFlow with the appropriate privileges (see the example SQL sketch after this list).
- Retrieve the Account Identifier from your Snowflake account details.
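The exact privileges depend on your environment. As a minimal sketch, assuming hypothetical warehouse, database, and schema names (`COMPUTE_WH`, `ANALYTICS`, `PUBLIC`) that you should replace with your own, the Snowflake SQL could look like this:

```sql
-- Hypothetical names: replace COMPUTE_WH, ANALYTICS, PUBLIC, and the password with your own values.
CREATE ROLE IF NOT EXISTS GLASSFLOW_ROLE;
GRANT USAGE ON WAREHOUSE COMPUTE_WH TO ROLE GLASSFLOW_ROLE;
GRANT USAGE ON DATABASE ANALYTICS TO ROLE GLASSFLOW_ROLE;
GRANT USAGE ON SCHEMA ANALYTICS.PUBLIC TO ROLE GLASSFLOW_ROLE;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA ANALYTICS.PUBLIC TO ROLE GLASSFLOW_ROLE;

-- Dedicated user for the GlassFlow connector
CREATE USER IF NOT EXISTS GLASSFLOW_USER
  PASSWORD = '<choose-a-strong-password>'
  DEFAULT_ROLE = GLASSFLOW_ROLE
  DEFAULT_WAREHOUSE = COMPUTE_WH;
GRANT ROLE GLASSFLOW_ROLE TO USER GLASSFLOW_USER;
```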
Setting Up the Snowflake Sink Connector
Using WebApp
- Log in to the GlassFlow WebApp and navigate to the "Pipelines" section.
- Create a new pipeline or open an existing one.
- Configure the Data Sink:
  - Choose "Snowflake" as the data sink type.
  - Enter your Account, Warehouse, Database, Schema, Table Name, and Username & Password.
- Click Next Step and confirm your pipeline data sink settings.
Using PythonSDK
If you are using the GlassFlow Python SDK to deploy and manage your pipeline, the following code shows how to configure the Snowflake sink connector via the SDK.
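The snippet below is an illustrative sketch only: it assumes the `glassflow` package's `GlassFlowClient` and a `create_pipeline` call, and the `sink_kind` value and config keys shown are assumptions, so confirm them against the SDK reference and the examples repo.

```python
# Illustrative sketch - the sink kind and config key names are assumptions;
# check the GlassFlow Python SDK reference for the exact values.
from glassflow import GlassFlowClient

client = GlassFlowClient(personal_access_token="<your-access-token>")

pipeline = client.create_pipeline(
    name="postgres-to-snowflake",
    space_id="<your-space-id>",
    transformation_file="transform.py",
    sink_kind="snowflake_cdc_json",       # assumed identifier for the Snowflake sink
    sink_config={
        "account": "etxdv.europe-west3.gcp",
        "warehouse": "COMPUTE_WH",        # hypothetical warehouse name
        "username": "GLASSFLOW_USER",
        "password": "<your-password>",
        "database": "ANALYTICS",          # hypothetical database name
        "schema": "PUBLIC",
        "role": "GLASSFLOW_ROLE",         # optional
    },
)
```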
A fully functional example that creates a pipeline with Snowflake as a managed sink connector is available in our examples repo as a Jupyter notebook. For building a pipeline that moves data from Postgres to Snowflake, see this example on GitHub.
Using GitHub Actions
If you are using GitHub Actions to deploy and manage the pipeline, the following snippet shows the YAML configuration of the Snowflake sink connector component:
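As a hedged sketch, the component might look like the YAML below; the key names (`kind`, `config`, and the credential fields) are assumptions based on the connector details above, so confirm them against the GlassFlow GitHub Actions reference.

```yaml
# Illustrative only - key names are assumptions; check the GlassFlow
# pipeline YAML reference for the exact schema.
sink:
  kind: snowflake_cdc_json      # assumed identifier for the Snowflake sink
  config:
    account: etxdv.europe-west3.gcp
    warehouse: COMPUTE_WH       # hypothetical warehouse name
    username: GLASSFLOW_USER
    password: "<your-password>" # store this as a GitHub Actions secret, not in the repo
    database: ANALYTICS         # hypothetical database name
    schema: PUBLIC
    role: GLASSFLOW_ROLE        # optional
```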
Supported Operations
The Snowflake sink connector supports the following operations:
| Operation | Description |
| --- | --- |
| INSERT | New records can be inserted into the specified table. |
| UPDATE | Existing records can be updated based on the provided filters. |
| DELETE | Records can be deleted using the specified filters. |
These operations allow you to keep your Snowflake tables synchronized with your GlassFlow pipeline transformations in real time.
Expected Data Structure
For the Snowflake sink connector to process incoming data correctly, the data must adhere to the following JSON format. Your transformation function must output data in this format so that the different operations can be supported.
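For example, an update event could look like the record below; the exact layout of the `columns` and `filters` entries is an illustration consistent with the field descriptions that follow, and the table, column, and filter values are hypothetical.

```json
{
  "operation": "update",
  "schema": "PUBLIC",
  "table": "customers",
  "columns": [
    { "name": "email", "value": "jane.doe@example.com" },
    { "name": "status", "value": "active" }
  ],
  "filters": [
    { "name": "id", "value": 42 }
  ]
}
```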
Explanation of Fields
- operation: Specifies the action to be performed (insert, update, or delete).
- schema: The target schema in Snowflake.
- table: The target table where the operation should be applied.
- columns: A list of columns and their corresponding values to be inserted or updated.
- filters: A list of conditions used to match records for update or delete operations.
Ensuring the transformation function follows this structure guarantees seamless integration with the Snowflake sink connector and allows data to be processed efficiently.
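As a sketch of what that can look like in practice, the transformation below emits an update event in the structure above; it assumes the usual `handler(data, log)` transform signature and uses hypothetical table and column names.

```python
# Illustrative sketch - assumes GlassFlow's handler(data, log) transform signature
# and hypothetical table/column names; adapt it to your own event shape.
def handler(data, log):
    """Map an incoming event to the structure expected by the Snowflake sink."""
    return {
        "operation": "update",
        "schema": "PUBLIC",
        "table": "customers",
        "columns": [
            {"name": "email", "value": data["email"]},
            {"name": "status", "value": "active"},
        ],
        "filters": [
            {"name": "id", "value": data["id"]},
        ],
    }
```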
Snowflake IP Whitelisting
If your Snowflake instance restricts connections, whitelist the following GlassFlow IP addresses:
Additional Resources
Once configured, your GlassFlow pipeline will automatically send processed data to your Snowflake database for real-time analysis and reporting.