Pipeline
In GlassFlow, a pipeline orchestrates the flow of data from various sources, through transformations, to designated destinations. It enables real-time data processing by running a custom Python function that transforms data as it moves through the system.
Pipeline Components
Each pipeline comprises three key elements:
Data Sources
Data sources are the entry points where data is ingested into the pipeline. These can include:
- Databases: Such as PostgreSQL or MongoDB.
- Message Queues/Brokers: Like Amazon SQS or Google Pub/Sub.
- File Systems: For file-based data ingestion.
- Event-Driven Applications: Custom applications emitting events.
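Regardless of the source type, events typically arrive in the pipeline as structured records such as JSON objects. The snippet below is a hypothetical sketch of an event-driven application pushing a single event into a pipeline over an HTTP ingestion endpoint; the endpoint URL, access token, and payload fields are illustrative placeholders and not part of any documented GlassFlow API.

```python
import requests

# Hypothetical ingestion sketch: the endpoint URL, access token, and
# payload shape below are illustrative placeholders only.
PIPELINE_ENDPOINT = "https://example.com/pipelines/<pipeline-id>/publish"
ACCESS_TOKEN = "<your-access-token>"

# A single event emitted by a custom application.
event = {"user_id": 42, "amount": 1375.5, "currency": "EUR"}

response = requests.post(
    PIPELINE_ENDPOINT,
    json=event,
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    timeout=10,
)
response.raise_for_status()  # fail loudly if the event was not accepted
```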
Transformation
The transformation component is a custom Python function that processes each incoming event. It can clean, enrich, or analyze the data to extract meaningful insights before the result is forwarded to the sink.
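As a minimal sketch, the function below cleans and enriches one event at a time. It assumes the pipeline hands each event to a `handler(data, log)` entry point as a Python dictionary and that returning `None` discards the event; the field names and enrichment logic are hypothetical.

```python
# Illustrative sketch only: assumes GlassFlow calls a handler(data, log)
# entry point for every event; field names below are hypothetical.
def handler(data: dict, log) -> dict:
    """Clean and enrich a single event before it is sent to the sink."""
    # Drop events that are missing the fields we care about.
    if "user_id" not in data or "amount" not in data:
        log.info(f"skipping malformed event: {data}")
        return None  # assumed behavior: None discards the event

    # Normalize the amount and add a derived field.
    data["amount"] = round(float(data["amount"]), 2)
    data["is_large_order"] = data["amount"] > 1000

    return data
```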
Sinks
Sinks are the destinations where processed data is sent. These can include:
- Analytical Databases: Such as ClickHouse or ChromaDB.
- Storage Systems: Like Amazon S3 or Azure Blob Storage.
- Data Warehouses: Including Snowflake or Google BigQuery.
- Other Services: Any other platform that consumes the processed data.
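To make the sink side concrete, here is a hedged sketch of how transformed events could be written into ClickHouse by downstream code. The client usage follows the clickhouse-connect library, while the host, table name, and columns are made up for illustration.

```python
import clickhouse_connect

# Hypothetical sink sketch: host, table name, and columns are illustrative.
client = clickhouse_connect.get_client(
    host="localhost", username="default", password=""
)

# Events as they leave the transformation step (same shape as the
# hypothetical handler() output above).
transformed_events = [
    {"user_id": 42, "amount": 1375.5, "is_large_order": True},
]

# Insert rows into an analytical table for querying.
client.insert(
    "orders",
    [[e["user_id"], e["amount"], e["is_large_order"]] for e in transformed_events],
    column_names=["user_id", "amount", "is_large_order"],
)
```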