Data transformation
This page outlines data transformation concepts in GlassFlow.
What is Data Transformation?
Data transformation converts data from its original format or structure into one better suited to analysis, processing, or storage.
It often involves cleaning, enriching, and otherwise manipulating data using various libraries and functions.
Common Data Transformations that you can do with GlassFlow
- Data Cleaning by removing unwanted columns
- Data Enrichment with external data from APIs
- Data Validation to check for schema consistency
- Data Anomaly Detection
- Data Quality Check with custom rules
- Data Normalization to adhere to destination schemas
- Data Conversion from one format to another
- Real-time API integration to enrich data
- LLM integration via any custom Python library
- Integration of trained ML models from Hugging Face or other providers
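To make the first item concrete, a minimal sketch of data cleaning by removing unwanted columns, assuming events arrive as plain Python dicts (the field names are illustrative):

```python
# Columns (keys) we never want to forward downstream.
UNWANTED = {"internal_id", "debug_info"}

def clean_event(event: dict) -> dict:
    """Return a copy of the event without the unwanted columns."""
    return {k: v for k, v in event.items() if k not in UNWANTED}

event = {"user": "alice", "amount": 42, "internal_id": "x-1", "debug_info": "trace"}
print(clean_event(event))  # → {'user': 'alice', 'amount': 42}
```

The same pattern extends to validation and normalization: inspect the incoming dict, adjust it, and return the result.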
Transforming data in Python with GlassFlow
In GlassFlow, you transform data by writing a custom transformation function in a Python script.
You implement your transformation logic inside the handler function.
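A minimal transformation script following the handler pattern described above. The exact signature shown here, `handler(data, log)` with a runtime-supplied logger, is an assumption; check the GlassFlow documentation for the current interface:

```python
def handler(data, log):
    """Receive one event as a dict and return the transformed event."""
    # Illustrative logic: normalize a field and add a derived one.
    data["email"] = data.get("email", "").strip().lower()
    data["has_email"] = bool(data["email"])
    return data
```

Whatever the handler returns is what the pipeline forwards to the destination.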
Deploy transformation function
To deploy and run the transformation function you defined in GlassFlow, you create a pipeline and provide the function along with a requirements.txt file and any environment variables.
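Concretely, a deployment bundles a few artifacts. The layout below is an illustrative sketch; the script filename is an assumption, and environment variables are supplied through the pipeline configuration rather than a file:

```
transform.py        # Python script containing the handler function
requirements.txt    # third-party dependencies, one per line, e.g. requests==2.31.0
```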
GlassFlow packages the function as a container and executes it on its Serverless Execution Engine for every event entering the pipeline.
Python dependencies for transformation
Each import statement in your transformation function script brings in a Python dependency.
GlassFlow needs to install those dependencies to build and run the function successfully.
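A sketch of how imports map to dependencies: standard-library imports (like json below) need no requirements.txt entry, while each third-party import (e.g. `import requests`) needs its own pinned line such as `requests==2.31.0` (the version is illustrative). The handler signature and field names are assumptions:

```python
import json  # standard library: no requirements.txt entry needed

def handler(data, log):
    # Parse a JSON-encoded payload field using only the standard library.
    data["parsed"] = json.loads(data["payload"])
    return data
```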