Run any transformation on Kafka data at TB scale for ClickHouse

GlassFlow OSS delivers fast, query-ready data with reliable pipelines.

Low-latency transformations. Proven at scale.

Run any transformation, stateful or stateless, simple or complex, battle-tested at enterprise scale.

Fully flexible transformations

Run stateful and stateless transformations with long time windows.

Run in production at scale

GlassFlow is built to run on TBs of production data every day. A typical production scenario would look like this.

0 TB

Ingestion per day

0k RPS

9 pipelines, each running at 51k per second

0 ms

Latency for 80% of transformations

$0.0

Resource cost per TB on c4d-standard-16

Why Our Customers Choose GlassFlow

The only purpose-built open-source streaming ETL that moves and transforms your data.

Dedupe

Fast joins

Stateless transformations

Late event handling

Reduces load on ClickHouse

Low maintenance effort

Dead-letter-queue

Pipeline Observability

Open source

Enterprise Support

Deployment Service

CH Kafka Table Engine

Needs RMT

ClickPipes for Kafka

Needs RMT

Self-Built Go Service

Needs Custom Code

Needs Custom Code

Needs Custom Code

Needs Custom Code

Vector.dev

The GlassFlow Approach

Designed specifically for ClickHouse. With GlassFlow, you fix data correctness for TBs of data before it reaches ClickHouse.

Connect to all Kafka providers

Connect GlassFlow to MSK, Redpanda, and Confluent using SASL, SSL, and more. Our ClickHouse connector uses the native protocol for the best performance and experience.

Streaming Transformations

A lightweight built-in state store enables low-latency, in-memory dedupe and joins with time-windowed context retention. GlassFlow also includes standard stateless transformations via the EXPR library.
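To make time-windowed dedupe concrete, here is a minimal Python sketch of the idea. It is an illustration only, not GlassFlow's implementation: the class name `WindowedDeduplicator`, its methods, and the 60-second window are all hypothetical, and it assumes event timestamps arrive in non-decreasing order.

```python
from collections import OrderedDict


class WindowedDeduplicator:
    """Hypothetical in-memory dedupe: drop keys already seen within the window."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        # key -> timestamp of first sighting, kept in arrival order
        self._seen = OrderedDict()

    def _evict(self, now: float) -> None:
        # Forget keys whose retention window has expired (oldest first).
        while self._seen:
            _, first_seen = next(iter(self._seen.items()))
            if now - first_seen < self.window_seconds:
                break
            self._seen.popitem(last=False)

    def is_new(self, key: str, now: float) -> bool:
        # Assumes `now` never goes backwards between calls.
        self._evict(now)
        if key in self._seen:
            return False
        self._seen[key] = now
        return True


dedup = WindowedDeduplicator(window_seconds=60)
events = [("a", 0), ("b", 10), ("a", 30), ("a", 61), ("b", 65)]
kept = [(key, ts) for key, ts in events if dedup.is_new(key, ts)]
print(kept)
```

The second `"a"` at 30 seconds is dropped as a duplicate, while the one at 61 seconds passes because the first sighting has aged out of the window. Bounding the state by time is what keeps memory use predictable.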

Dead-Letter-Queue keeps your pipeline running

Isolate faulty events without interrupting data flow, and re-run them effortlessly after fixes.
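The dead-letter-queue pattern can be sketched in a few lines of Python. This is a hedged illustration of the general technique, not GlassFlow's API: `run_pipeline`, `parse_strict`, and `parse_fixed` are hypothetical names, and the DLQ is just an in-memory list.

```python
import json


def run_pipeline(raw_events, transform):
    """Apply `transform` to each event; route failures to a dead-letter list."""
    delivered, dead_letters = [], []
    for raw in raw_events:
        try:
            delivered.append(transform(raw))
        except Exception as exc:  # isolate the faulty event, keep the flow going
            dead_letters.append({"event": raw, "error": str(exc)})
    return delivered, dead_letters


def parse_strict(raw):
    # Original transformation: fails when "amount" is missing.
    record = json.loads(raw)
    return {"id": record["id"], "amount": float(record["amount"])}


def parse_fixed(raw):
    # Fixed transformation: a missing "amount" defaults to 0.0.
    record = json.loads(raw)
    return {"id": record["id"], "amount": float(record.get("amount", 0.0))}


raw_events = ['{"id": 1, "amount": "9.5"}', '{"id": 2}', '{"id": 3, "amount": "1"}']
delivered, dlq = run_pipeline(raw_events, parse_strict)

# After the fix, replay only the events that were parked in the DLQ.
replayed, still_failing = run_pipeline([d["event"] for d in dlq], parse_fixed)
print(delivered, dlq, replayed, still_failing)
```

Storing the original raw event alongside the error is the design choice that makes the replay step possible once the transformation is corrected.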

Enterprise Workloads

Ingest TBs of data into ClickHouse while transforming. GlassFlow runs on Kubernetes with a Dead Letter Queue and built-in pipeline observability.

Cost Efficient Infra Footprint

A lightweight platform with no clusters to manage, delivering high performance at low operational cost.

Use Case Spotlight:
Build your own end-to-end OSS observability stack

We have prepared a full stack that cuts your observability costs to near zero. Try our tutorial and self-host a clean, highly scalable observability stack.

KAFKA TO CLICKHOUSE: A PRACTICAL GUIDE

This ebook covers everything you need to know about building Kafka → ClickHouse pipelines.

Frequently asked questions

Feel free to contact us if you have any questions after reviewing our FAQs.

Can ClickHouse and Kafka handle streaming duplicates and joins?

ClickHouse merges data in the background on its own schedule, which makes deduplicating streaming data nearly impossible without overspending or slowing performance. ClickHouse also recommends reducing the use of JOINs, because they can slow the system down considerably. Kafka has no native deduplication or JOIN capabilities; it only stores events. You need a processing layer in between that handles both deduplication and stateful JOINs before data reaches ClickHouse. You can learn more about these challenges in our blog post.
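As a rough illustration of what a stateful JOIN in such a processing layer involves, the Python sketch below buffers one stream per key for a time window and joins events from a second stream against it. All names (`WindowedJoiner`, `add_left`, `join_right`) and the 300-second window are hypothetical; this is a simplified model of the technique, not GlassFlow's implementation.

```python
from collections import defaultdict


class WindowedJoiner:
    """Hypothetical temporal join: buffer left events per key for a time window."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self._left = defaultdict(list)  # key -> [(timestamp, payload)]

    def add_left(self, key, payload, ts):
        self._left[key].append((ts, payload))

    def join_right(self, key, payload, ts):
        # Keep only buffered left events that are still inside the window.
        fresh = [(t, p) for t, p in self._left.get(key, [])
                 if ts - t <= self.window_seconds]
        if fresh:
            self._left[key] = fresh
        else:
            self._left.pop(key, None)
        # Emit one merged row per matching left event.
        return [{**left, **payload} for _, left in fresh]


joiner = WindowedJoiner(window_seconds=300)
joiner.add_left("u1", {"user_id": "u1", "country": "DE"}, ts=0)

in_window = joiner.join_right("u1", {"user_id": "u1", "order_id": 7}, ts=120)
expired = joiner.join_right("u1", {"user_id": "u1", "order_id": 8}, ts=400)
print(in_window, expired)
```

The order at 120 seconds is enriched with the buffered user record, while the order at 400 seconds finds nothing because the user event has left the window. Doing this before the insert means ClickHouse receives already-joined rows.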

Is GlassFlow for ClickHouse open-source?

Yes! GlassFlow for ClickHouse is open-source under the Apache 2.0 license. You’re free to use, modify, and self-host it.

Can I connect other data sources besides Kafka?

Currently, Kafka is the primary input. Support for additional streaming sources like Kinesis and Pub/Sub is on our roadmap. Reach out if you have specific needs via our contact form.

Is it secure?

Because GlassFlow for ClickHouse runs entirely locally on your machine, we have no access to your data.

Can I host GlassFlow in the cloud?

Yes, GlassFlow is built on Kubernetes.

How can I get in touch?

You can reach the team via our contact page or our Slack channel.

Transformed Kafka data for ClickHouse

Get query-ready data, lower ClickHouse load, and reliable
pipelines at enterprise scale.
