ClickHouse
GlassFlow provides a managed sink connector for ClickHouse.
The ClickHouse Sink Connector allows you to seamlessly send transformed data from your GlassFlow pipelines directly into a ClickHouse database. ClickHouse is a fast, open-source columnar database management system optimized for OLAP (Online Analytical Processing) use cases, enabling efficient real-time data analysis. This integration makes it easy to stream and store large volumes of structured data, supporting real-time analytics at scale.
Connector Details
To configure the ClickHouse Sink Connector, you will need the following information:
- Address: The host address for your ClickHouse instance.
- Database Name: The name of the database where data will be inserted.
- Table Name: The table where data will be written. The table must be pre-created and match the schema of the incoming data from GlassFlow (see the example table definition after this list).
- Username and Password: Credentials to authenticate your connection to ClickHouse.
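For illustration, a hypothetical events table matching the JSON example used later on this page could be created as shown below. The column names, types, and MergeTree settings are assumptions for the example, not requirements of the connector; define whatever schema matches your data.

```sql
-- Hypothetical target table; adjust columns and types to match your own event schema.
CREATE TABLE analytics_db.events
(
    event_id   UInt64,
    user_id    String,
    event_type String,
    timestamp  DateTime
)
ENGINE = MergeTree
ORDER BY (timestamp, event_id);
```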
Obtaining Connection Credentials
To obtain the required connection details from ClickHouse Cloud, follow these steps:
- Create a free ClickHouse service in ClickHouse Cloud.
- Follow the ClickHouse Cloud Quick Start guide to create a database and table in ClickHouse where the incoming data will be stored.
- Copy the host address, database name, table name, username, and password.
Note:
The host address should be in the format
HOSTNAME.REGION.CSP.clickhouse.cloud:9440
as GlassFlow uses ClickHouse's native protocol (secure port 9440) to connect.
Supported Operations
The ClickHouse Sink Connector currently supports only INSERT operations. This means data can be added to the specified ClickHouse table but cannot be updated or deleted.
Data Format for Insert Operations
Data to be inserted into ClickHouse should be formatted as a JSON object, where keys correspond to column names in the target table, and values represent the data to be inserted. The JSON object must adhere to the schema defined in the ClickHouse table.
Example JSON Object:
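For illustration, assuming a hypothetical events table with columns event_id (UInt64), user_id (String), event_type (String), and timestamp (DateTime), a valid event could look like this:

```json
{
  "event_id": 1024,
  "user_id": "user_42",
  "event_type": "page_view",
  "timestamp": "2024-01-15 12:34:56"
}
```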
The data will be inserted into the ClickHouse table as specified in the connector parameters of the pipeline.
Setting Up the ClickHouse Sink Connector
Using WebApp
- Log in to the GlassFlow WebApp:
- Navigate to the GlassFlow WebApp and log in with your credentials.
- Create a New Pipeline:
- If you haven't already, create a new data pipeline in the Pipelines section.
- Configure a Data Source:
- Choose a Data Source from which the pipeline ingests real-time data.
- Add a Transformer:
- If your data needs to be transformed before being sent to ClickHouse, you can add a transformation function in Python (a minimal sketch follows these steps).
- Configure ClickHouse as the Data Sink:
- In the pipeline setup, navigate to the Data Sink section.
- Choose ClickHouse as the connector type.
- Enter the following details:
- Address: e.g., HOSTNAME.REGION.CSP.clickhouse.cloud:9440
- Database Name: e.g., analytics_db
- Table Name: e.g., events
- Username: e.g., admin
- Password: Enter your ClickHouse password.
- Click Next Step and proceed with the pipeline creation.
- Confirm and Deploy:
- Review your pipeline configuration and click Create to activate the pipeline.
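Referring to the Add a Transformer step above, here is a minimal transformer sketch. It assumes GlassFlow's handler(data, log) function signature and the hypothetical events schema from the JSON example; it simply normalizes the event_type field before the event is written to ClickHouse.

```python
# transformation.py - a minimal sketch of a GlassFlow transformer.
# The handler(data, log) signature is the entry point GlassFlow calls per event;
# the event field used below (event_type) is an assumption based on the example schema.
def handler(data, log):
    # Normalize the event type so downstream ClickHouse queries can group on it reliably.
    if "event_type" in data:
        data["event_type"] = str(data["event_type"]).strip().lower()
    log.info("Forwarding event to ClickHouse sink: %s", data)
    return data
```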
Using Python SDK
If you are using the GlassFlow Python SDK to deploy and manage the pipeline, the following code shows how to configure the ClickHouse connector via the SDK. It is a minimal sketch: the exact sink_config key names for the ClickHouse connector (assumed here to be addr, database, username, password, and table) should be checked against the SDK reference and the example notebook linked below.
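```python
import glassflow

# Authenticate with your GlassFlow personal access token (placeholder value).
client = glassflow.GlassFlowClient(
    personal_access_token="YOUR_PERSONAL_ACCESS_TOKEN"
)

# Create a pipeline that uses ClickHouse as the managed sink connector.
# The sink_config keys mirror the connector details described above; treat the
# key names as assumptions and confirm them in the SDK docs / example notebook.
pipeline = client.create_pipeline(
    name="clickhouse-sink-pipeline",
    space_id="YOUR_SPACE_ID",
    transformation_file="transformation.py",
    sink_kind="clickhouse",
    sink_config={
        "addr": "HOSTNAME.REGION.CSP.clickhouse.cloud:9440",
        "database": "analytics_db",
        "username": "admin",
        "password": "YOUR_CLICKHOUSE_PASSWORD",
        "table": "events",
    },
)

print("Created pipeline:", pipeline.id)
```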
A fully functional example that creates a pipeline with ClickHouse as a managed sink connector is available in our examples repo as a Jupyter Notebook. For building a pipeline that moves data from Postgres to ClickHouse, see this example on GitHub.
Using Github Actions
If you are using GitHub Actions to deploy and manage the pipeline, the following snippet shows the YAML configuration of the ClickHouse connector component. The snippet is only an illustrative sketch, assuming a sink component with kind clickhouse and the same configuration fields described above; the exact schema is defined by GlassFlow's pipeline YAML specification.
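```yaml
# Illustrative sketch of a ClickHouse sink component; the field names are
# assumptions and should be checked against GlassFlow's pipeline YAML specification.
components:
  - id: clickhouse-sink
    type: sink
    kind: clickhouse
    config:
      addr: "HOSTNAME.REGION.CSP.clickhouse.cloud:9440"
      database: "analytics_db"
      username: "admin"
      password: "${{ secrets.CLICKHOUSE_PASSWORD }}"
      table: "events"
```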
IP Whitelisting
If your ClickHouse instance restricts connections, whitelist the following GlassFlow IP addresses:
Example Use Case
Real-Time Web Analytics: You can use the ClickHouse Sink Connector to store and analyze web traffic data in real-time. As user events are captured (page views, clicks, etc.), GlassFlow processes the data and streams it directly into ClickHouse. This enables fast querying of user behavior data to support dashboards, reports, and insights.