Deploying Pipelines with GitHub Actions

GlassFlow enables seamless integration with GitHub, allowing users to build, deploy, and maintain their data pipelines using GitHub Actions.

By storing pipeline code in a Python file and its configuration in a YAML file within a GitHub repository, users can leverage CI/CD workflows to automatically deploy updates to GlassFlow whenever changes are pushed.

Key Features

  • Version Control: Maintain your pipeline in GitHub, ensuring versioning and collaboration.
  • CI/CD Integration: Automate deployments using GitHub Actions.
  • Local Development: Develop and test locally before pushing changes.
  • Team Collaboration: Work with colleagues using GitHub workflows.

Setting Up Your Pipeline

The following steps show how to set up a pipeline in a GitHub repository. You can also fork our template repository on GitHub to get started.

1. Structure Your GitHub Repository

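Ensure your repository contains the files created in the steps below:

  • transform.py: the transformation code.
  • pipeline.yaml: the pipeline configuration.
  • .github/workflows/on_push.yaml: the GitHub Actions workflow.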

2. Define Your Pipeline Code

Create a transform.py file with your transformation code. Example:
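
A minimal sketch of what transform.py might contain. It assumes GlassFlow invokes a function named handler once per event, passing the event as a dictionary together with a logger. The function name, the (data, log) signature, and the processed field are assumptions for illustration; check the GlassFlow documentation for the exact contract.

```python
import logging


def handler(data, log):
    """Transform a single event and return the result.

    Assumption: the platform calls this function once per event, passing
    the event as a dict and a logger-like object.
    """
    log.info("Received event with keys: %s", sorted(data))
    # Copy the event so the input is not mutated, then add a
    # hypothetical marker field.
    transformed = dict(data)
    transformed["processed"] = True
    return transformed


if __name__ == "__main__":
    # Local smoke test to run before pushing changes.
    logging.basicConfig(level=logging.INFO)
    print(handler({"user_id": 42}, logging.getLogger("transform")))
```

Running the file directly with python transform.py exercises the function locally, which fits the develop-and-test-locally workflow described above.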

3. Configure Your Pipeline

Create a pipeline.yaml file to specify pipeline settings:

4. Set Up GitHub Actions Workflow

Create .github/workflows/on_push.yaml to automate deployment:

5. Set Up GitHub Secrets for GitHub Actions

The GlassFlow GitHub Action needs a Personal Access Token to make changes to your pipeline. Get your token from the GlassFlow WebApp and set it as a GitHub secret named GlassflowPAT.
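
GitHub Actions does not expose secrets to a step automatically; the workflow has to pass them in, for example through an environment variable. The sketch below shows a small pre-flight check that a workflow step could run before deploying. It assumes the workflow maps the GlassflowPAT secret to an environment variable named GLASSFLOW_PAT; that variable name and the read_token helper are hypothetical, and the GlassFlow action itself may receive the token differently.

```python
import os


def read_token(env=os.environ, name="GLASSFLOW_PAT"):
    """Return the token from the environment, or None if unset or blank.

    Assumption: the workflow maps the GlassflowPAT secret to an
    environment variable called GLASSFLOW_PAT (hypothetical name).
    """
    token = env.get(name, "").strip()
    return token or None
```

A deployment script could call read_token() first and stop with a clear message when it returns None, rather than failing later with an authentication error.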

6. Push Changes and Deploy

Commit and push your files to GitHub, for example with git add, git commit, and git push.

GitHub Actions will trigger the deployment, reflecting updates in GlassFlow.


Pipeline YAML Specification

Note:

A pipeline consists of three components: a source, a transformer, and a sink. All three are required to define a pipeline. For now, you can define only one of each type, and they are connected sequentially (source -> transformer -> sink).
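
The rule in the note can be expressed as a small check to run locally before pushing. This is a sketch under stated assumptions: it works on a hypothetical in-memory list of components, each a dict with a "type" key, listed in connection order. It is not the GlassFlow schema or validator.

```python
REQUIRED_TYPES = ("source", "transformer", "sink")


def validate_components(components):
    """Return a list of problems; an empty list means the layout is valid.

    `components` is a hypothetical list of dicts, each with a "type" key,
    given in the order in which the components are connected.
    """
    problems = []
    types = [c.get("type") for c in components]

    # Exactly one component of each required type.
    for required in REQUIRED_TYPES:
        count = types.count(required)
        if count != 1:
            problems.append(f"expected exactly one {required}, found {count}")

    # Anything outside the three known types is reported separately.
    unknown = [t for t in types if t not in REQUIRED_TYPES]
    if unknown:
        problems.append(f"unknown component types: {unknown}")

    # Check the order only when the set of components is otherwise valid.
    if not problems and tuple(types) != REQUIRED_TYPES:
        problems.append("components must be ordered source -> transformer -> sink")
    return problems
```

For example, validate_components([{"type": "source"}, {"type": "transformer"}]) reports the missing sink, while a list with all three types in order returns an empty list.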

Pipeline Components

Transformer

The transformer component specifies the transformation layer of the pipeline.

Source

The source component configures the data source of the pipeline. To use a managed source connector, provide a kind, which is the type of connector you want to configure. To send data without a managed connector (e.g., via the API or the Python SDK), remove the kind parameter.

A complete list of source connectors can be found on the integrations page.

Sink

The sink component configures the data sink for the pipeline. To use a managed sink connector, provide a kind, which is the type of connector you want to configure. To consume data without a managed connector (e.g., via the API or the Python SDK), remove the kind parameter.

A complete list of sink connectors can be found on the integrations page.
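
The role of the kind parameter is the same for sources and sinks: when it is present, a managed connector is used; when it is absent, data moves through the API or the Python SDK. The sketch below illustrates that decision on a hypothetical dict form of a single component entry. The connector_mode helper and the example_kind value are placeholders, not real GlassFlow names.

```python
def connector_mode(component):
    """Describe how data moves through a source or sink component.

    `component` is a hypothetical dict form of one component entry from
    pipeline.yaml; only the "kind" key is inspected.
    """
    kind = component.get("kind")
    if kind:
        return f"managed connector ({kind})"
    return "no managed connector (API or Python SDK)"


# A component with a kind uses a managed connector.
print(connector_mode({"type": "source", "kind": "example_kind"}))
# A component without a kind is fed or read via the API or SDK.
print(connector_mode({"type": "sink"}))
```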

Monitoring and Debugging

  • Check workflow logs in GitHub Actions under the Actions tab.
  • Use the GlassFlow WebApp to monitor the pipeline, get access tokens, and view logs.
  • Update transform.py and pipeline.yaml, then push changes to trigger a redeployment.

Conclusion

By integrating GlassFlow with GitHub Actions, you can manage your data pipelines efficiently with CI/CD, enabling automated deployments, collaboration, and a streamlined development workflow.