GlassFlow OSS delivers fast, query-ready data with reliable pipelines.

Run any transformation, stateful or stateless, simple or complex, battle-tested at enterprise scale.
Fully flexible transformations
Run stateful and stateless transformations with long time windows.
Run in production at scale
GlassFlow is built to run on TBs of production data every day. A typical production scenario looks like this:
Ingestion per day
9 pipelines, each running at 51k per sec
Latency for 80% of transformations
Resource costs per TB (c4d-standard-16)
The only purpose-built open-source streaming ETL that moves and transforms your data.
Dedupe
Fast joins
Stateless transformations
Late event handling
Reduces load on ClickHouse
Low maintenance effort
Dead-letter-queue
Pipeline
Observability
Open source
Enterprise Support
Deployment Service
Designed specifically for ClickHouse. With GlassFlow, you fix data correctness for TBs of data before ClickHouse sees the data.

Connect to all Kafka providers
Connect GlassFlow to MSK, Redpanda, and Confluent using SASL, SSL, and more. Our ClickHouse connector uses the native protocol for the best performance and experience.
Streaming Transformations
A lightweight built-in state store enables low-latency, in-memory dedupe and joins with time-windowed context retention. GlassFlow also includes standard stateless transformations via the EXPR library.
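To illustrate what time-windowed deduplication means in general, here is a minimal Python sketch. It is not GlassFlow's implementation or API: the class name `WindowedDeduplicator`, its methods, and the 60-second window are hypothetical, and the sketch assumes events arrive in timestamp order.

```python
from collections import OrderedDict


class WindowedDeduplicator:
    """Forward the first event per key; drop repeats seen within the window.

    Hypothetical illustration only, assuming timestamps never go backwards.
    """

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        # key -> timestamp of first sighting, oldest entry first
        self._seen = OrderedDict()

    def _evict(self, now: float) -> None:
        # Drop keys whose retention window has expired.
        while self._seen:
            key, ts = next(iter(self._seen.items()))
            if now - ts < self.window_seconds:
                break
            self._seen.popitem(last=False)

    def accept(self, key: str, now: float) -> bool:
        """Return True if the event should be forwarded, False if duplicate."""
        self._evict(now)
        if key in self._seen:
            return False
        self._seen[key] = now
        return True


dedup = WindowedDeduplicator(window_seconds=60)
events = [("a", 0), ("b", 10), ("a", 30), ("a", 61), ("b", 65)]
forwarded = [(key, ts) for key, ts in events if dedup.accept(key, ts)]
print(forwarded)
```

The second "a" at t=30 is dropped because it falls inside the window, while the one at t=61 passes because the first sighting has expired. A longer window catches more duplicates at the cost of more retained state.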


Dead-Letter-Queue keeps your pipeline running
Isolate faulty events without interrupting data flow, and re-run them effortlessly after fixes.
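The general dead-letter-queue pattern can be sketched in a few lines of Python. This is an illustration under stated assumptions, not GlassFlow's actual code: `process_with_dlq`, `to_row`, `to_row_fixed`, and the event fields are all hypothetical names chosen for the example.

```python
import json


def process_with_dlq(raw_events, transform):
    """Apply `transform` to each event; route failures to a dead-letter list."""
    delivered, dead_letters = [], []
    for raw in raw_events:
        try:
            delivered.append(transform(json.loads(raw)))
        except (ValueError, KeyError, TypeError) as exc:
            # Keep the original payload and the reason so it can be replayed.
            dead_letters.append({"payload": raw, "error": repr(exc)})
    return delivered, dead_letters


def to_row(event):
    # Hypothetical transformation: requires both fields to be present.
    return {"user_id": int(event["user_id"]), "amount": float(event["amount"])}


def to_row_fixed(event):
    # Hypothetical fix: default a missing amount to 0.
    return {"user_id": int(event["user_id"]), "amount": float(event.get("amount", 0))}


raw_events = [
    '{"user_id": "1", "amount": "9.5"}',
    '{"user_id": "2"}',  # missing field
    "not json",  # malformed payload
    '{"user_id": "3", "amount": "4"}',
]

# Good events flow through; faulty ones are isolated instead of halting the run.
delivered, dead_letters = process_with_dlq(raw_events, to_row)

# After the fix, replay only the isolated events.
replayed, still_dead = process_with_dlq(
    [d["payload"] for d in dead_letters], to_row_fixed
)
```

Storing the raw payload alongside the error is what makes the replay step possible: the fixed transformation runs against exactly the input that failed.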
Enterprise Workloads
Ingest TBs of data into ClickHouse while transforming. GlassFlow runs on Kubernetes with a Dead Letter Queue and built-in pipeline observability.

Cost Efficient Infra Footprint
A lightweight platform with no clusters to manage, delivering high performance at low operational cost.

Use Case Spotlight:
Build your own end-to-end OSS observability stack
We have prepared a full stack that cuts your observability costs to near zero. Try out our tutorial and self-host a highly scalable, clean observability stack.
Feel free to contact us if you have any questions after reviewing our FAQs.
Can ClickHouse and Kafka handle streaming duplicates and joins?
ClickHouse's merge process runs in the background and is controlled by ClickHouse itself. That makes deduplicating streaming data nearly impossible without overspending or slowing performance. ClickHouse also recommends reducing the use of JOINs, since they can slow the system down considerably. Kafka lacks native deduplication and JOIN capabilities; it just stores events. You need a processing layer in between that handles both deduplication and stateful JOINs before data hits ClickHouse. You can learn more about these challenges in our blog post.
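As a rough idea of what a time-windowed join in such a processing layer does, here is a simplified Python sketch. It works on in-memory lists rather than a live stream, and `windowed_join`, the stream names, and the 60-second window are hypothetical, not part of GlassFlow.

```python
def windowed_join(left, right, window_seconds):
    """Join (key, ts, value) events from two streams when keys match and
    their timestamps are within `window_seconds` of each other."""
    # Index the right-hand stream by key for quick lookup.
    right_by_key = {}
    for key, ts, value in right:
        right_by_key.setdefault(key, []).append((ts, value))

    joined = []
    for key, ts, value in left:
        for r_ts, r_value in right_by_key.get(key, []):
            if abs(ts - r_ts) <= window_seconds:
                joined.append((key, value, r_value))
    return joined


orders = [("o1", 0, "order placed"), ("o2", 5, "order placed")]
payments = [("o1", 20, "paid"), ("o2", 500, "paid")]
matched = windowed_join(orders, payments, window_seconds=60)
print(matched)
```

Only "o1" is joined, because the payment for "o2" arrives outside the window. Joining before the data lands means ClickHouse receives one already-combined row instead of running the JOIN at query time.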
Is GlassFlow for ClickHouse open-source?
Yes! GlassFlow for ClickHouse is open-source under the Apache 2.0 license. You’re free to use, modify, and self-host it.
Can I connect other data sources besides Kafka?
Currently, Kafka is the primary input. Support for additional streaming sources like Kinesis and Pub/Sub is on our roadmap. Reach out if you have specific needs via our contact form.
Is it secure?
Because GlassFlow for ClickHouse runs completely locally on your machine, we do not have any access to your data.
Can I host GlassFlow in the cloud?
Yes, GlassFlow is built on Kubernetes.
How can I get in touch?
You can talk to the team via our contact page or our Slack channel.





