Automate a Data Pipeline for RAG with GitHub Actions
- Published: Jul 29, 2024
- Data is a key part of any RAG system, and in some cases we always want the latest data. For example, a chatbot that answers questions about financial reports, research, or news needs the most recent information. How to automate the pipeline that keeps this data fresh is rarely discussed, and that's what I want to share in this video. I'll cover how to set up an ETL pipeline, introduce you to Supabase Vector, and show you how to automate the pipeline using GitHub Actions.
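The extract, transform, load flow described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the code from the video: the source is an in-memory dict rather than fetched reports or news, `toy_embed` is a hashed bag-of-words stand-in for a real embedding model, and the "vector store" is a dict standing in for a Supabase Vector table. All function names are hypothetical. Keying stored chunks by a content hash is an added design choice, so that a scheduled re-run does not insert the same chunks twice.

```python
import hashlib
import math
from dataclasses import dataclass


@dataclass
class Record:
    """One text chunk plus its vector, ready to load into a vector store."""
    doc_id: str
    chunk_index: int
    content: str
    embedding: list[float]


def extract(sources: dict[str, str]) -> dict[str, str]:
    """Extract: a real job would fetch reports or news; here the input is
    already an in-memory mapping of doc_id -> raw text. Empty docs are dropped."""
    return {doc_id: text.strip() for doc_id, text in sources.items() if text.strip()}


def chunk(text: str, max_words: int = 50) -> list[str]:
    """Transform, part 1: split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def toy_embed(text: str, dim: int = 8) -> list[float]:
    """Transform, part 2: hashed bag-of-words vector, L2-normalized.
    Hypothetical stand-in for a real embedding model."""
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.sha256(word.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def transform(docs: dict[str, str], max_words: int = 50) -> list[Record]:
    """Transform: chunk every document and attach an embedding to each chunk."""
    return [
        Record(doc_id, i, piece, toy_embed(piece))
        for doc_id, text in docs.items()
        for i, piece in enumerate(chunk(text, max_words))
    ]


def load(records: list[Record], store: dict[str, Record]) -> int:
    """Load: insert into a dict standing in for a vector table, keyed by a
    hash of doc_id + content so re-runs skip chunks already stored.
    Returns the number of newly inserted records."""
    inserted = 0
    for rec in records:
        key = hashlib.sha256(f"{rec.doc_id}\x00{rec.content}".encode()).hexdigest()
        if key not in store:
            store[key] = rec
            inserted += 1
    return inserted


def run_pipeline(sources: dict[str, str], store: dict[str, Record], max_words: int = 50) -> int:
    """Run extract -> transform -> load and return the count of new chunks."""
    return load(transform(extract(sources), max_words), store)


if __name__ == "__main__":
    store: dict[str, Record] = {}
    sample = {"report-1": "Revenue grew in the second quarter. " * 30}
    print(run_pipeline(sample, store), "new chunks")  # first run
    print(run_pipeline(sample, store), "new chunks")  # scheduled re-run
```

With the sample input (180 words, 50-word chunks), the first run loads 4 chunks and the second loads 0, which is the behavior you want when the same job fires on a schedule.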
Chapters:
00:00:00 - Intro
00:01:02 - Project Overview
00:02:29 - ETL Pipeline Explanation
00:05:00 - Set Up Database
00:05:50 - Test the Pipeline (Insert Data into Supabase)
00:06:25 - Set Up GitHub Actions
00:09:30 - Save Environment Variables in GitHub Repository
00:10:08 - Check the Results of the Automated Process in GitHub
00:11:05 - Add Command in actions.yaml to Save Files After Process Runs (git push)
00:11:30 - Add a Scheduler
00:12:13 - Outro
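In GitHub Actions, repository secrets reach a step as environment variables through the workflow's `env:` mapping (for example `SUPABASE_URL: ${{ secrets.SUPABASE_URL }}`), and a scheduler is the `on.schedule` trigger with a cron expression. A pipeline script can read those variables at start-up and fail fast when one is missing, so a misconfigured scheduled run shows up as a failed job instead of silently loading nothing. The sketch below assumes the hypothetical variable names `SUPABASE_URL` and `SUPABASE_KEY`; the video may use different names.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Hypothetical names; match them to the secrets saved in the repository.
REQUIRED_VARS = ("SUPABASE_URL", "SUPABASE_KEY")


@dataclass(frozen=True)
class PipelineConfig:
    supabase_url: str
    supabase_key: str


def load_config(environ: Mapping[str, str] = os.environ) -> PipelineConfig:
    """Read required settings from the environment, raising if any are blank or absent."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name, "").strip()]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return PipelineConfig(
        supabase_url=environ["SUPABASE_URL"].strip(),
        supabase_key=environ["SUPABASE_KEY"].strip(),
    )


if __name__ == "__main__":
    try:
        config = load_config()
    except RuntimeError as err:
        # SystemExit with a message exits with status 1, marking the Actions job as failed.
        raise SystemExit(str(err))
    # Print only the URL; never echo the key into the job log.
    print("Config loaded for", config.supabase_url)
```

Passing `environ` explicitly keeps the function easy to test locally, while the default reads the real process environment that the workflow step sets up.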
#etl #githubactions #python