Deploying Pipelines with GitHub Actions
GlassFlow enables seamless integration with GitHub, allowing users to build, deploy, and maintain their data pipelines using GitHub Actions.
By storing pipeline code in a Python file and its configuration in a YAML file within a GitHub repository, users can leverage CI/CD workflows to automatically deploy updates to GlassFlow whenever changes are pushed.
Key Features
- Version Control: Maintain your pipeline in GitHub, ensuring versioning and collaboration.
- CI/CD Integration: Automate deployments using GitHub Actions.
- Local Development: Develop and test locally before pushing changes.
- Team Collaboration: Work with colleagues using GitHub workflows.
Setting Up Your Pipeline
The following steps show how to set up a pipeline in a GitHub repository. Alternatively, you can fork our template repository on GitHub to get started.
1. Structure Your GitHub Repository
Ensure your repository contains:
2. Define Your Pipeline Code
Create a transform.py file with your transformation code. Example:
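The sketch below shows roughly what a transform.py might contain. It assumes that GlassFlow calls a function named handler once per event, passing the event payload as a dict and a logger object named log, and that the returned dict is forwarded to the sink. Confirm the exact function name and signature in the GlassFlow transformation documentation. The amount_cents and amount fields are hypothetical.

```python
# transform.py
# Sketch only: assumes GlassFlow invokes handler(data, log) once per event,
# where `data` is the event payload (a dict) and `log` behaves like a
# standard logging.Logger. Field names are hypothetical.
import logging


def handler(data: dict, log) -> dict:
    """Convert a hypothetical integer `amount_cents` field into a decimal `amount`."""
    result = dict(data)  # work on a copy so the incoming event is not mutated
    if "amount_cents" in result:
        result["amount"] = result.pop("amount_cents") / 100
    else:
        log.info("No 'amount_cents' field; passing event through unchanged")
    return result


if __name__ == "__main__":
    # Local smoke test before pushing: run `python transform.py`.
    logging.basicConfig(level=logging.INFO)
    print(handler({"amount_cents": 1250}, logging.getLogger("transform")))
```

Because the function takes plain dicts, you can exercise it locally with a standard logger before you push, which fits the local development workflow described above.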
3. Configure Your Pipeline
Create a pipeline.yaml file to specify the pipeline settings:
4. Set Up GitHub Actions Workflow
Create .github/workflows/on_push.yaml to automate deployment:
5. Set Up GitHub Secrets for GitHub Actions
The GlassFlow GitHub Action needs a Personal Access Token to make changes to your pipeline.
Get your token from the GlassFlow WebApp and add it to your repository as a GitHub secret named GlassflowPAT.
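In a workflow, a secret is usually exposed to a step as an environment variable, for example with env: GLASSFLOW_PAT: ${{ secrets.GlassflowPAT }}. If you add your own Python step next to the GlassFlow action, the following minimal sketch reads the token and fails early when it is missing. The variable name GLASSFLOW_PAT and the helper read_glassflow_token are hypothetical choices. GlassFlow does not require them.

```python
import os


def read_glassflow_token(var_name: str = "GLASSFLOW_PAT") -> str:
    """Return the GlassFlow Personal Access Token from the environment.

    Assumes the workflow maps the GlassflowPAT secret to `var_name`;
    the default variable name is hypothetical.
    """
    token = os.environ.get(var_name, "").strip()
    if not token:
        # Fail fast with a clear message instead of an opaque auth error later.
        raise RuntimeError(
            f"{var_name} is not set; add the GlassflowPAT secret to the "
            "repository and map it to this variable in the workflow"
        )
    return token
```

Secrets are masked in GitHub Actions logs, but avoid printing the token yourself.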
6. Push Changes and Deploy
Commit and push your files to GitHub:
The push triggers the GitHub Actions workflow, which deploys your updates to GlassFlow.
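Before committing, you can run a quick local check that the files from the earlier steps exist. The following standard-library sketch does this. The file list only mirrors the names used in steps 2 to 4 and is an assumption about your layout. Adjust it if your repository differs.

```python
from pathlib import Path

# File names taken from steps 2-4; adjust to match your repository layout.
EXPECTED_FILES = (
    "transform.py",
    "pipeline.yaml",
    ".github/workflows/on_push.yaml",
)


def missing_files(repo_root: str = ".") -> list[str]:
    """Return the expected pipeline files that do not exist under repo_root."""
    root = Path(repo_root)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

When run from the repository root, print(missing_files()) should print an empty list once all three files are in place.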
Pipeline YAML Specification
Note:
A pipeline consists of three components: source, transformer, and sink. All three components are required to define a pipeline. Currently, a pipeline can contain only one component of each type, and the components are connected sequentially (source -> transformer -> sink).
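The rule above can be checked locally before you push. This sketch assumes that pipeline.yaml has already been parsed into Python objects (for example with a YAML library) and that the components form a list of dicts, each with a type key. Both the list layout and the type key are assumptions, not the documented schema.

```python
from collections import Counter

REQUIRED_TYPES = ("source", "transformer", "sink")


def validate_components(components: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means the structure is valid.

    Assumes each component is a dict with a hypothetical "type" key.
    """
    problems = []
    counts = Counter(c.get("type") for c in components)
    # Exactly one component of each required type.
    for ctype in REQUIRED_TYPES:
        if counts[ctype] == 0:
            problems.append(f"missing {ctype} component")
        elif counts[ctype] > 1:
            problems.append(
                f"only one {ctype} component is supported, found {counts[ctype]}"
            )
    # Anything else is not a recognised component type.
    for ctype in counts:
        if ctype not in REQUIRED_TYPES:
            problems.append(f"unknown component type: {ctype!r}")
    return problems
```

Returning a list of problems rather than raising on the first error lets a single local run report every structural issue at once.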
Pipeline Components
Transformer
The transformer component specifies the transformation layer of the pipeline.
Source
The source component configures the data source of the pipeline.
To use a managed source connector, set the kind parameter to the type of connector you want to configure. To send data without a managed connector (e.g., via the API or with the Python SDK), omit the kind parameter.
A complete list of source connectors can be found on the integrations page.
Sink
The sink component configures the data sink of the pipeline.
To use a managed sink connector, set the kind parameter to the type of connector you want to configure. To consume data without a managed connector (e.g., via the API or with the Python SDK), omit the kind parameter.
A complete list of sink connectors can be found on the integrations page.
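For both sources and sinks, the presence of kind is the only thing that separates a managed connector from data exchanged through the API or the Python SDK. A deployment script could branch on it as in the sketch below. The sketch assumes each component has been parsed into a dict. The type key and the connector name used in the example are hypothetical placeholders. The real connector names are listed on the integrations page.

```python
def describe_connector(component: dict) -> str:
    """Describe how a source or sink component exchanges data.

    Assumes `component` is the parsed YAML mapping for one component and
    that a managed connector is selected solely by its "kind" value.
    """
    role = component.get("type", "component")  # hypothetical "type" key
    kind = component.get("kind")
    if kind:
        return f"{role}: managed connector '{kind}'"
    # No kind: data flows through the GlassFlow API or Python SDK instead.
    return f"{role}: no managed connector (API or Python SDK)"
```

For example, describe_connector({"type": "sink"}) returns "sink: no managed connector (API or Python SDK)".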
Monitoring and Debugging
- Check workflow logs in GitHub Actions under the Actions tab.
- Use the GlassFlow WebApp to monitor the pipeline, get access tokens, and view logs.
- Update transform.py and pipeline.yaml, then push the changes to trigger a redeployment.
Conclusion
By integrating GlassFlow with GitHub Actions, you can manage your data pipelines efficiently with CI/CD, enabling automated deployments, collaboration, and a streamlined development workflow.