Classified ads are short advertisements you usually see in newspapers, magazines, or online platforms to buy or sell products or services. They are typically organized into categories such as real estate, vehicles, job listings, and personal items. Presenting rich, detailed information for classified ads can attract potential buyers for typical online stores. Platforms like OLX and eBay have seen a 25% and 15% increase in user engagement respectively, thanks to real-time ad enrichment. Similarly, Zillow experienced a 20% boost in conversion rates by enriching property listings.
This blog post explores how you can build a data processing pipeline to enrich classified ads in real time. We'll use GlassFlow to process ads, enrich them with additional information, categorize them using Langchain and OpenAI, and store the enriched ads in Redis for quick and advanced search.
Understanding real-time classified ad enrichment
Think about developing an online marketplace for classified ads. Users post ads for various items like electronics, furniture, and vehicles. For instance, if a user posts an ad for a used car, a real-time data processing pipeline can automatically analyze the car images, extract relevant information (like manufacturer, model, and condition), categorize the ad under "Vehicles," and create a summary. This enriched ad is then stored in Redis, ensuring it’s quickly retrievable when users search for cars.
Using this pipeline, you can classify ads correctly, making it easier for users to find what they want. Also, you provide a better browsing experience for users with enriched and well-organized ads. See other use cases the same pipeline can be applied to at the end of the article.
Tools used in the pipeline
To enrich classified ads in real time, we'll use the following tools:
-
GlassFlow provides a serverless environment for real-time data transformation without the need for complex infrastructure setup. It’s easy to integrate with various data sources and offers a Python SDK for seamless data processing.
-
Langchain is used to process and understand the content of the ads using larger language models. It helps in categorizing the ads, summarizing the descriptions, and providing additional information about the images in the ads.
-
Redis acts as a high-performance database to store enriched ads. The enriched data in Redis is made available to the frontend of the classified ads platform where users see the most relevant and informative ads. You can further develop the example to host the UI part using Streamlit or any other UI tools.
Problems solved with the pipeline
You want to enrich ads as soon as they are posted so that users see the most up-to-date information. When users browse the online marketplace, every ad they see feels like it was handpicked just for them. Real-time data transformation with GlassFlow is key here to make the enrichment process efficient and effective. You can integrate machine learning models that detect and filter out spam ads in real time, ensuring you only see legitimate listings.
By analyzing user interactions and preferences as they happen, this setup personalizes ad recommendations to fit your unique tastes and needs. This level of personalization boosts user engagement and satisfaction, making the platform not just a marketplace, but a place where they enjoy spending their time.
Facebook Marketplace facilitated over 760 million new listings in its first year. Real-time ad enrichment contributed to a 15% increase in revenue for Facebook Marketplace. Reference
Components of the pipeline
You can create a pipeline using GlassFlow WebApp with easy steps in the low-code environment. Let's see what pipeline components it involves.
Data Source
You start by sending sample classified ads to the pipeline using the GlassFlow Python SDK. With SDK, you can build a custom connector to ingest data from a custom source or use managed connectors for various data sources like PostgreSQL, Debezium, Google Pub/Sub, Amazon SQS, etc. The sample data includes ad author, title, descriptions, images, and other relevant details.
Here’s an example of a classified ad in JSON format before the transformation:
With this ad image:
Transformation
At this stage, you write a custom transformation function Python script to process ads.
The transformation function does the following:
-
Enriches Information: Uses Langchain and OpenAI to analyze and add more information to the ad, especially focusing on images.
-
Categories Ads: Classifies the ads into appropriate categories.
-
Creates Summaries: Generates concise summaries for each ad.
After the transformation, the output looks something like the below:
As you can see from the output, ad tags are generated based on the input image to provide more context.
Data Sink
The JSON documents with enriched ads are then continuously written into Redis using the GlassFlow connector.
Further Improvement
You can also use Redis as a vector database for efficient storage and fast retrieval of ad data. Ad descriptions are often unstructured. One approach to storing and searching through unstructured data is to use vector embeddings and store them in a vector database. When you use a vector database, it takes the heavy lifting out of advanced searches. It can quickly index and retrieve enriched ads based on various attributes, making your search experience smooth and efficient. In the transformation, you can update the code to generate vector embeddings and save them to Redis together with other ad details.
See how to store vector embeddings into Redis and query ads by performing a vector search in the Redis documentation.
Other use case examples
-
Personalized Job Recommendations: The same solution can be used for job portals to enrich job postings in real-time, categorize them accurately, and generate summaries. This can include tagging job requirements, extracting key skills, and personalizing job recommendations based on user profiles. The enriched data can be stored in Redis for fast retrieval, enabling users to find jobs that match their skills and preferences more efficiently.
-
Enhanced Real Estate Listings: For real estate platforms, integrating GlassFlow and Langchain can help enrich property listings by categorizing them into residential, commercial, or rental properties and summarizing key details like amenities, nearby facilities, and price trends. Images of properties can be analyzed to generate descriptive tags, improving searchability.
-
Improved Event Ticket Listings: Event management platforms can benefit from GlassFlow and Langchain by enriching event listings with detailed descriptions, categorizing events (e.g., concerts, sports, theater), and generating tags from event images. This enriched data, stored in Redis, can enhance the user experience by providing more relevant search results and personalized event recommendations.
Interested in learning more?
Dive into our GlassFlow documentation and explore our use cases to see how GlassFlow can revolutionize your data processing needs.