This tutorial explores how you can leverage GlassFlow and Large Language Models (LLMs) from Open AI to create a real-time log data anomaly detection pipeline and send immediate notifications to Slack.
Skip the intro and scroll to Step 1 of the tutorial
What is anomaly detection?
Anomaly detection is a technique to identify unexpected patterns or behaviors in data. Anomalies in log data indicate security threats, audit issues, application failures, or other unusual and suspicious activities. Detecting these anomalies in real-time allows for quick action, minimizing downtime and preventing potential damage. Our pipeline processes server logs that record various user activities, such as logging in, accessing files, executing commands, and accessing directories. These logs include the timestamp, IP address, username, and the specific action performed.
Here is an example of the kind of logs with anomaly:
[01/Feb/2024 14:46:21] 220.129.215.153 veronica08 accessed file 'concern.html'
At first glance, this might seem like a regular log entry. However, upon closer inspection, you realize that "veronica08" has been accessing several sensitive files. It would help if you had a real-time way to detect and alert you about such activities.
What is GlassFlow?
GlassFlow enables Python developers to create data streaming pipelines for real-time use cases within minutes.
Why GlassFlow is useful?
GlassFlow excels in the real-time transformation of events so that applications can immediately react to new information. GlassFlow offers a zero infrastructure environment where you can develop pipelines without a complex initial setup. You can integrate with various data sources and sinks using managed connectors or implementing custom connectors using GlassFlow SDK for Python. You write code to implement data transformation logic in Python and deploy it in your pipeline with simple clicks using GlassFlow WebApp. With auto-scalable serverless infrastructure, you can easily deal with billions of log records in real-time.
Pipeline components
Our real-time log data anomaly detection pipeline consists of the following key components:
-
Data generator: We'll use the data generator to simulate server logs data in Python. This allows us to create realistic log entries for demonstration purposes without relying on actual server logs.
-
Data Source: Custom connector to ingest server log data from data generator. You can also update the Data Source code to collect logs from application servers, database servers, network devices, and more.
-
Data Transformation: AI-powered transformation using GlassFlow to detect anomalies.
-
Data Sink: Custom connector to send notifications to Slack.
Tools we use
-
GlassFlow WebApp: To create a pipeline in a low-code environment.
-
OpenAI: To use OpenAI API models such as
gpt-3.5-turbo
orGPT-4o
. We request Chat Completion API for each log entry event in the transformation function and classify logs for data anomalies. -
Slack: To send real-time alerts and notifications.
-
GlassFlow Python SDK: To build custom connectors for the data source and sink.
Setting up the Pipeline with GlassFlow WebApp in 3 minutes
Prerequisites
To start with the tutorial you need a free GlassFlow account.
Step 1. Log in to GlassFlow WebApp
Navigate to the GlassFlow WebApp and log in with your credentials.
Step 2. Create a New Pipeline
Click on "Create New Pipeline" and provide a name. You can name it "Log Anomaly Detection".
A pipeline will be created in the default main Space.
Step 3. Configure a Data Source
Select "SDK" to configure the pipeline to use Python SDK for ingesting logs. You will learn how to send data to the pipeline in the upcoming section.
Step 4. Define the transformer
Choose the "AI anomaly detection" transformer template from the Template dropdown menu.
We use OpenAI's GPT-3.5-turbo model to provide insights into the log data event and flag any unusual or suspicious activities.
💡By default, the transformer function uses a free OpenAI API key provided by GlassFlow
You can replace it with your API key too. To do so:
-
Have an OpenAI API account.
-
Create an API key.
-
Set the API key in the transformation code: Add the following code by editing the transformation function code just write after the import statements:
💡You can also import other Python dependencies (packages) in the transformation function. Read more about Python dependencies
Step 5. Configure a Data Sink
Select "SDK" to configure the pipeline to use Python SDK for sending notifications to Slack.
Step 6. Confirm the pipeline
Confirm the pipeline settings in the final step and click "Create Pipeline".
Step 7. Copy the pipeline credentials
Once the pipeline is created, copy its credentials such as Pipeline ID and Access Token.
Sending data to the Pipeline
Prerequisites
To continue with the rest tutorial, make sure that you have the following:
-
Python is installed on your machine.
-
Pip is installed to manage project packages.
-
Slack account: If don't have a Slack account, sign up for a new free one here and go to the Slack Get Started page.
-
Slack workspace: You need access to a Slack workspace where you're an admin. If you are creating just a new workspace, follow this guide.
-
You created an incoming webhook for your Slack workspace.
Create an environment configuration file
Add a .env
file in your project directory and add the following configuration variables:
Replace your_pipeline_id
,your_pipeline_access_token
and your_slack_workspace_webhook_url
with appropriate values obtained from your GlassFlow pipeline and Slack workspace.
Install GlassFlow SDK
Install the GlassFlow Python SDK and other required libraries using pip
.
Optional: Create a virtual environment before installing Python dependencies. Run the following command: python -m venv .venv && source .venv/bin/activate
Create a data source connector
We create a custom data source connector to publish every log event to the GlassFlow pipeline. Here is the source_connector.py
Python script.
This sample data_generator.py
script generates fake server logs:
Consuming data from the Pipeline
After detecting anomalies, we notify the relevant stakeholders promptly. In our pipeline, we use Slack to send these notifications.
Next, we need to set up the data sink to send notifications to Slack. Below is the Python code sink_connector.py
to create a custom data sink connector:
Test the pipeline
You'll run the source and sink connector scripts to test the pipeline.
Run the Source Connector
Run first source_connector.py
Python script in a terminal to publish server log data to the GlassFlow pipeline:
Run the Sink Connector
Run the sink_connector.py
Python script in a separate terminal window to see the output side-by-side:
This script will continuously consume new events from the GlassFlow pipeline. Upon receiving transformed events, it will send notifications to Slack. You should see an output indicating that messages are being received on Slack.
Conclusion
Following this tutorial, you’ve set up a real-time log data anomaly detection pipeline using GlassFlow, Open AI, and Slack. Enriched logs, containing identified anomalies, can also be sent to Amazon S3 or OpenSearch Service for further analysis and long-term storage. Additionally, alert notifications can be integrated with communication platforms such as Microsoft Teams or SMS services like Twilio.
This pipeline can be easily adapted for other real-time alerting use cases. That includes monitoring financial transactions for fraud, detecting security breaches, tracking performance metrics, and ensuring compliance with regulatory requirements.
Start leveraging the power of GlassFlow and AI today to build robust and scalable pipelines!