ClickHouse

Overview of the ClickHouse managed sink connector.

The ClickHouse Sink Connector allows you to seamlessly send transformed data from your GlassFlow pipelines directly into a ClickHouse database. ClickHouse is a fast, open-source columnar database management system optimized for OLAP (Online Analytical Processing) use cases, enabling efficient real-time data analysis. This integration makes it easy to stream and store large volumes of structured data, supporting real-time analytics at scale.

Connector Details

To configure the ClickHouse Sink Connector, you will need the following information:

  1. Address: The host address for your ClickHouse instance.
  2. Database Name: The name of the database where data will be inserted.
  3. Table Name: The table where data will be written. The table must be pre-created and match the schema of the incoming data from GlassFlow.
  4. Username and Password: Credentials to authenticate your connection to ClickHouse.
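As a minimal sketch of how these fields fit together, the snippet below collects them in one place and checks them before they are entered into a pipeline. The class name `ClickHouseSinkSettings` and its methods are hypothetical and are not part of GlassFlow or ClickHouse. It also assumes the address is written as `host:port`, as in the example value shown later on this page.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ClickHouseSinkSettings:
    """Hypothetical container for the connector fields listed above."""

    address: str   # assumed "host:port" form
    database: str
    table: str
    username: str
    password: str

    def host_and_port(self) -> Tuple[str, int]:
        """Split the address into a host name and a numeric port."""
        host, sep, port = self.address.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected 'host:port', got {self.address!r}")
        return host, int(port)

    def missing_fields(self) -> List[str]:
        """Return the names of fields that are empty or whitespace only."""
        return [name for name, value in vars(self).items() if not value.strip()]


settings = ClickHouseSinkSettings(
    address="HOSTNAME.REGION.CSP.clickhouse.cloud:9440",
    database="analytics_db",
    table="events",
    username="admin",
    password="change-me",  # placeholder, not a real credential
)
host, port = settings.host_and_port()
```

Checking for empty fields and a malformed address up front can catch simple mistakes before the pipeline is created. It does not confirm that the table exists or that its schema matches the incoming data.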

Obtaining Connection Credentials

To obtain the required connection details from ClickHouse Cloud, follow these steps:

  1. Create a free ClickHouse service in ClickHouse Cloud.
  2. Follow the ClickHouse Cloud Quick Start guide to create a database and table in ClickHouse where the incoming data will be stored.
  3. Copy the host address, database name, table name, username, and password.

Setting Up the ClickHouse Sink Connector

Using WebApp

  1. Log in to the GlassFlow WebApp.
  2. Create a New Pipeline:
    • If you haven't already, create a new data pipeline in the Pipelines section.
  3. Configure a Data Source:
    • Choose a Data Source from where the pipeline ingests real-time data.
  4. Add a Transformer:
    • If your data needs to be transformed before being sent to ClickHouse, you can add a transformation function in Python.
  5. Configure ClickHouse as a Data Sink:
    • In the pipeline setup, navigate to the Data Sink section.
    • Choose ClickHouse as the connector type.
    • Enter the following details:
      • Address: e.g., HOSTNAME.REGION.CSP.clickhouse.cloud:9440
      • Database Name: e.g., analytics_db
      • Table Name: e.g., events
      • Username: e.g., admin
      • Password: Enter your ClickHouse password.
    • Click Next Step and proceed with the pipeline creation.
  6. Confirm and Deploy:
    • Review your pipeline configuration and click Create to activate the pipeline.
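
The transformer in step 4 could look roughly like the sketch below, which flattens a nested event into a single row whose keys match the columns of the target table. This is an illustration under stated assumptions: it assumes the transformation is a function named `handler` that receives the event as a dictionary together with a logger, and the input fields (`type`, `user`, `page`, `timestamp`) and output columns (`event_type`, `user_id`, `url`, `ts`) are hypothetical. Adjust them to the schema of your own pre-created table.

```python
from datetime import datetime, timezone


def handler(data, log):
    """Flatten one incoming event into a row for a hypothetical `events` table."""
    row = {
        # Column names here are illustrative; they must match your table schema.
        "event_type": str(data.get("type", "unknown")),
        "user_id": str(data.get("user", {}).get("id", "")),
        "url": str(data.get("page", {}).get("url", "")),
        # Fall back to the current UTC time when the event has no timestamp.
        "ts": data.get("timestamp") or datetime.now(timezone.utc).isoformat(),
    }
    log.info("prepared row for event_type=%s", row["event_type"])
    return row
```

Returning a flat dictionary keeps the mapping between event fields and table columns explicit, which helps when the sink requires the data to match the table schema.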

Using Python SDK

Try out the Jupyter notebook example in the GitHub repo to integrate with ClickHouse using the Python SDK.

Example Use Case

Real-Time Web Analytics: You can use the ClickHouse Sink Connector to store and analyze web traffic data in real time. As user events are captured (page views, clicks, etc.), GlassFlow processes the data and streams it directly into ClickHouse. This enables fast querying of user behavior data to support dashboards, reports, and insights.