Running Airflow 2.0 with Docker in 5 mins
- Published: Oct 4, 2024
- Airflow 2.0 is out! How does it work? What are the new features? What can you do with your DAGs? To answer all those questions, you need to run Airflow 2.0.
What is the easiest and fastest way to do it?
By using Docker!
Let's discover how to run Apache Airflow 2.0 with the CeleryExecutor locally by using Docker!
👍 Smash the like button to become an Airflow Super Hero!
❤️ Subscribe to my channel to become a master of Airflow
🏆 Take my course: www.udemy.com/... to join the legends of Airflow
🚨 My Patreon: / marclamberti to support my work and be friends for life
The docker-compose file:
airflow.apache...
Wow, it's amazing how far the Airflow team has come with this. Thanks Marc!
Thanks Mike
The only one of many tutorials that actually helped! Thanks
Fabulous! Thanks Marc! I have installed Airflow 2.0 successfully. The webserver failed to start on my Mac, but after I increased the memory to 4GB... it works.
How did you increase your Mac's memory?
Arrgh... spent about 3h trying to figure this out; basically all the online instructions missed one small bit or another... with your instructions, voilà, it works straight up. Thanks a lot!
I struggled to get airflow running for a long time and this short video helped me SO MUCH, thank you!!
Happy to help!
Thanks Marc for sorting out the installation part of Airflow👏👏👏👏
Thaaaaanks, I have been having issues running Airflow and now it works!!! I'll now be able to automate tasks and be lazier lol
Let’s gooooo
Thank you! Very efficiently and clearly explained!
Thanks Marc! Great work
My pleasure!
Thank you VERY MUCH, Marc!! This video is very useful.
I always try to find a Docker image to perform experiments.
Thanks for providing a reference that I can refer to anytime in the future.
Here it is 😁
This instruction really helps me thank you so much !
super
amazing job
thank u!
Thanks Marc, It helped me a lot!
For anyone coming here in 2024 and beyond: on Linux, specifically Ubuntu, remember to use the Compose V2 syntax: `docker compose up airflow-init`
I finally got it up and running! Thank you, Marc!
Thanks , it worked Man !
Awesome Marc, thanks for sharing
this is so much fun and informative, thanks.
Great job, boy. Keep it up. ❤
VERY EASY TO UNDERSTAND
I love this!!! thanks man!
Awesome! Thanks for the tutorial, Marc!
Thank you so much, Marc. Great content!
TIL Using YAML aliases and
Love it too 😁
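For anyone else learning the same trick: the official docker-compose.yaml defines shared settings once under a top-level `x-` key, names them with a YAML anchor (`&name`), and pulls them into each service with the merge key `<<: *name`. A trimmed sketch of that pattern; the service names follow the official file, but the values are cut down for illustration, so this is not a complete, working stack:

```yaml
# Compose ignores top-level keys starting with "x-", so they can hold reusable blocks.
x-airflow-common:
  &airflow-common              # anchor naming the whole mapping below
  image: apache/airflow:2.0.1
  environment:
    &airflow-common-env        # second anchor for just the environment mapping
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor

services:
  airflow-scheduler:
    <<: *airflow-common        # merge key: copy every key of the anchored mapping
    command: scheduler

  airflow-webserver:
    <<: *airflow-common
    command: webserver
    ports:
      - "8080:8080"
    environment:
      # The merge is shallow: a local "environment" replaces the anchored one,
      # so merge the env anchor back in before adding extra variables.
      <<: *airflow-common-env
      AIRFLOW__WEBSERVER__EXPOSE_CONFIG: "true"
```

Keys written explicitly in a service take precedence over merged ones, which is how each service sets its own `command` while sharing everything else.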
It works on Windows and Mac; I tried both (on Windows with some tricks). Thanks!
Could you please share the tricks for Windows? I tried on Windows and it's not working for me. Please do reply, it would be very helpful.
@anjanashetty482 For Windows, use WSL to run the commands described in the video.
@kikecastor Thanks for your response, Armonia. I was able to install Airflow 2 with WSL, but when I create a DAG and try to debug it in VS I get the error: ModuleNotFoundError: No module named 'airflow'
@anjanashetty482 Are you in the correct environment?
@kikecastor Yes, I am. Do I have to explicitly run pip install apache-airflow?
neat & clean! thanks!!
Thank you ❤️
Thank you Marc. I was in a hurry to find out how to run Airflow and kept failing somehow.
However, with your nice, clear explanation, nothing is mysterious anymore~
Excellent walkthrough... Thanks :)
Great video Marc. It's sort of crazy how easy that was (even on WSL2)... Thank you
Some things I'm still considering afterwards:
1) Is this enough for a production deployment of Airflow if the database was decoupled from the rest of the containers? If the container crashed for whatever reason, all of the connections would be lost, so separating them is a good idea.
2) For local testing/debugging of an instance, I'm going to try to mount DAGs that exist in another project folder instead of the one that we created.
2b) For local testing, I might also try to store connection details in environment variables in the .env file rather than relying on the persistence of the database.
Docker Compose is not enough for production, but you can take the same (containerized) components and use Kubernetes to go to production.
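The DAG-folder and `.env` ideas raised above can be sketched as additions to the shared block of the official file. This assumes the official `x-airflow-common` layout; the host path `/home/me/other-project/dags` and the connection id `my_postgres` are hypothetical placeholders. Airflow resolves a connection id from an environment variable named `AIRFLOW_CONN_<CONN_ID>` that holds a connection URI, so `.env` could contain a line such as `AIRFLOW_CONN_MY_POSTGRES=postgres://user:password@db-host:5432/mydb`, which Compose substitutes into the file:

```yaml
x-airflow-common:
  &airflow-common
  environment:
    &airflow-common-env
    # ...keep the existing entries (AIRFLOW__CORE__EXECUTOR, etc.) here...
    # Resolved by Airflow as conn_id "my_postgres"; the value is read from .env.
    AIRFLOW_CONN_MY_POSTGRES: ${AIRFLOW_CONN_MY_POSTGRES}
  volumes:
    # Host path on the left (hypothetical), path inside the container on the right.
    - /home/me/other-project/dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
    - ./plugins:/opt/airflow/plugins
```

Because such connections live outside the metadata database, they survive `docker-compose down --volumes`; to my knowledge they are also not listed on the UI's Connections page.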
Thanks Marc ! awesome tutorial.
Thank you 😁
Thank you so much!
This is an awesome tutorial ! Thanks a lot~
Thanks Marc!!! Great Job!
Thanks Alex
Thanks for the video!
Thanks, simple and clear ;)
Awesome man!
Amazing! Thank you very much!
Glad you like it!
Super video Marc, I'm subscribing!
🙌🙌🙌🙌
Hi Marc, thanks a lot for this!! :)
pleasure :)
THANK YOU
Kindly demonstrate Teradata and Keycloak containerisation.
I have many problems using PythonVirtualenvOperator or ExternalPythonOperator inside Docker: you must enable system site packages (which creates conflicts between the venv and the base Python libraries), otherwise you get "ERROR: Can not perform a '--user' install. User site-packages are not visible in this virtualenv"
Hi Marc, great tutorial. Airflow is running without problems. I tried to use VS Code with Airflow and found your new video "Configure VS Code to Develop Airflow DAGs with Docker at ease!" However, I don't understand where the Dockerfile comes into the picture. Can you please elaborate? ---> "Reopen in Container" looks totally different from your video. Thanks
Marc, Big fan of your content. Can you make a video for deploying Airflow 2.0 (with Celery executor) on Azure Containers?
Hi Marc, great video. Just wondering if you could show us how to install a triggerer into your airflow stack using docker compose? Thanks!
Great! But where can I find the requirements.txt to add, for example, apache-airflow-providers-snowflake?
For those who have a Mac and install Docker Desktop, you will not need to install Compose separately. It comes with Docker Desktop.
Great tutorial, old but relevant. Thanks! Marc, I am using Visual Studio Code and every time I want to save my DAG file, I need to click a "Retry as Sudo" button. Can you tell me what to do here? It is quite annoying! Regards!
This is super useful... Thank you. One question: can I use it on an AWS instance? How should I configure the security group and firewalls?
For my Mac with an M1 chip: I had to increase the amount of RAM available to Docker to 8GB, and swap to 2GB.
The only issue is that I can't import anything into my DAG from other folders (outside the dags folder). I don't know why, but I get an import error.
I had to increase the amount of RAM available to Docker to 6GB for this to work on my Mac. I also had to enable permissions on the folder I worked in with chmod.
Hi, could you please let me know how you enabled permissions using chmod?
I keep getting the following error: "OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: exec: "version": executable file not found in $PATH: unknown"
I got this error "Error response from daemon: manifest for apache/airflow:2.6.0.dev0 not found: manifest unknown: manifest unknown" after I typed "docker-compose up airflow-init"
Why?
Thanks a lot Marc.
Can you also please make a video on deploying Airflow 2 using the Helm chart, and go over the options in the values.yaml file?
Thanks in advance
Coming but there are some issues right now with the Helm chart 😬
@MarcLamberti Thanks!
Thanks Marc for this video. Question: do I need to run every time I spin up the containers?
Thank you for the awesome tutorial. I do have one question though: how can I install Python packages with docker-compose when creating the containers? For example, I would like to install PyMongo.
Hi, I would recommend using the PythonVirtualenvOperator.
@ramsescoraspe Yes, but how do I install a Python library like PyMongo or OpenCV in the container? PythonVirtualenvOperator allows functions to be created, including the module imports they need, and then they are destroyed, but I do not have those modules installed in the container. Until now, each time I installed Python modules in containers I did it with the help of a Dockerfile (e.g. inside the Dockerfile I enter "RUN pip install opencv-python"), but it is not clear to me how to do the same using a docker-compose.yaml file.
Edit: figured it out: I had to add a pointer to the Dockerfile in the docker-compose.yaml
@derzemel Hi, how do you do this (add a pointer)?
@ In my case, the airflow-webserver service is built like this (the Dockerfile is in the same dir as the compose file):
airflow-webserver:
  build:
    context: .
    dockerfile: Dockerfile
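A variation on the reply above, sketched under the assumption that you use the official file's `x-airflow-common` block: with the CeleryExecutor, tasks run in `airflow-worker` and DAG files are parsed by `airflow-scheduler`, so a package installed only in the webserver image is usually not enough. Putting `build` in the shared block gives every Airflow service the same image. The Dockerfile here is hypothetical and would extend the official image, for example `FROM apache/airflow:2.0.1` followed by `RUN pip install --no-cache-dir pymongo`:

```yaml
x-airflow-common:
  &airflow-common
  # image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.0.1}  # replaced by "build" below
  build:
    context: .                 # directory holding docker-compose.yaml and the Dockerfile
    dockerfile: Dockerfile     # hypothetical: FROM apache/airflow:2.0.1 + RUN pip install ...
  environment:
    &airflow-common-env
    # ...existing entries unchanged...
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
```

After editing the Dockerfile, rebuild with `docker-compose build` (or `docker-compose up --build`) so the new package lands in every service.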
@derzemel Do you know how I can add Airflow dependencies inside the docker-compose.yaml file? Also, is there a way to provide access to my AWS resources, such as S3, either in the YAML file or in the Airflow UI?
How do you set up the Celery executor if you haven't specified a dedicated MySQL or Postgres backend database in the YAML file?
Thanks for the wonderful tutorial. I understood that the DAG file I store in the dags folder will be added to Airflow. But what happens if Airflow is running on a remote Docker host that I only have web access to? Can I upload a DAG from my local disk to the remote? Or is there another way to do it?
Thanks, you helped me a lot.
Is this usable in production? Could you create a production setup?
I got all of this running, but once I add a new .py file to dags, it doesn't show up in the Airflow interface. Has anyone had the same issue?
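One possible cause, not confirmed in the thread: the scheduler only re-lists the DAG folder for new files periodically (`[scheduler] dag_dir_list_interval`, 300 seconds by default), so a new file can take a few minutes to appear, and a file that fails to import shows up as a "Broken DAG" banner rather than as a DAG. For local experiments the interval can be shortened through the matching environment variable; the value 30 is an arbitrary example:

```yaml
x-airflow-common:
  &airflow-common
  environment:
    &airflow-common-env
    # ...existing entries unchanged...
    # Re-scan the dags folder for new files every 30 s instead of every 300 s.
    AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL: "30"
```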
First of all, thank you so much for this *awesome* video. It is really helpful. I followed this tutorial and was able to access Airflow seamlessly. But I want to use the apache-airflow-providers operators. So I tried listing them in _PIP_ADDITIONAL_REQUIREMENTS and also building with a Dockerfile, but nothing worked and I still see "error: command 'gcc' failed with exit status 1". I changed the Airflow image to 2.1.2-python3.7, as slim versions don't include extra libraries, but no luck. Could you help me resolve this issue?
It was a great video, thanks.
How would I push my custom Airflow Python file into the Docker container?
Hi Marc, thanks for sharing! I'd like to know how to install third-party modules. When I installed the yfinance module, there was a DAG import error: no module named yfinance.
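For quick local tests, newer versions of the official compose file support a `_PIP_ADDITIONAL_REQUIREMENTS` variable that pip-installs extra packages every time a container starts. To my knowledge it only works with images 2.1.1 and later, not with the 2.0.x image used in the video, and the Airflow docs recommend building a custom image for anything beyond experiments. Packages that must be compiled may additionally need build tools installed in a custom image. A sketch with packages mentioned in this thread:

```yaml
x-airflow-common:
  &airflow-common
  environment:
    &airflow-common-env
    # ...existing entries unchanged...
    # Space-separated pip requirements installed at container start-up
    # (quick experiments only: it slows down every start).
    _PIP_ADDITIONAL_REQUIREMENTS: "yfinance pymongo apache-airflow-providers-snowflake"
```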
Great tutorial! Short and sweet. I followed the exact same steps and checked the container statuses: redis and postgres were healthy, but airflow-scheduler, flower, worker, webserver and triggerer were unhealthy. Then I deleted all the containers and repeated all the steps, and now I'm getting the error "database "airflow" does not exist". The redis and postgres containers are running without any problem. I would appreciate it if you could help me understand the error. Thanks.
Hi, did you ever resolve the problem? I am having the same issues. Thanks
Hi guys, I had the same problem as you. I am on Windows 10. Instead of running the "echo -e....." command, I created a .env file in the same directory as the .yaml file, with AIRFLOW_UID=50000 in it. Problem solved!
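Why a `.env` file helps, sketched: Compose reads a file named `.env` from the project directory (where docker-compose.yaml lives) and substitutes its values into `${...}` placeholders. In the official file the variable feeds the shared `user:` setting, roughly as below; the exact line has changed between versions, so check your own copy. On Linux the usual choice is your own user id (the output of `id -u`) rather than 50000, so files written to ./dags, ./logs and ./plugins stay owned by you:

```yaml
# .env, in the same directory as docker-compose.yaml (one VAR=value per line):
#   AIRFLOW_UID=50000
x-airflow-common:
  &airflow-common
  # ${VAR:-default}: use VAR from .env or the shell if set and non-empty,
  # otherwise fall back to the value after ":-".
  user: "${AIRFLOW_UID:-50000}:0"
```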
Hi, I'm a noob. I'm using the same YAML file, but after running the command "docker-compose up airflow-init" on my Ubuntu machine I'm getting this error, please help:
ERROR: The Compose file './docker-compose.yaml' is invalid because:
Invalid top-level property "x-airflow-common". Valid top-level sections for this Compose file are: services, version, networks, volumes, and extensions starting with "x-".
You might be seeing this error because you're using the wrong Compose file version. Either specify a supported version (e.g "2.2" or "3.3") and place your service definitions under the `services` key, or omit the `version` key and place your service definitions at the root of the file to use version 1.
For more on the Compose file format versions, see docs.docker.com/compose/compose-file/
services.airflow-init.depends_on contains an invalid type, it should be an array
services.airflow-scheduler.depends_on contains an invalid type, it should be an array
services.airflow-webserver.depends_on contains an invalid type, it should be an array
services.airflow-worker.depends_on contains an invalid type, it should be an array
services.flower.depends_on contains an invalid type, it should be an array
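These messages typically mean the installed docker-compose predates the file format the official Airflow file is written for: it uses top-level `x-` extension keys and the mapping form of `depends_on` with health conditions, which old 1.x releases reject. To my understanding, Compose 1.27 and later (and Compose V2, `docker compose`) accept both, so upgrading is the usual fix rather than editing the file. The two `depends_on` forms side by side, as a fragment rather than a complete file:

```yaml
services:
  # Mapping form used by the official Airflow file: start only after the
  # dependency's healthcheck reports healthy. Old docker-compose rejects it
  # with "contains an invalid type, it should be an array".
  airflow-scheduler:
    depends_on:
      redis:
        condition: service_healthy
      postgres:
        condition: service_healthy

  # List form that old releases expect: it only orders container start-up
  # and does not wait for the dependencies to become healthy.
  flower:
    depends_on:
      - redis
      - postgres
```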
Question: what's the recommended way to increase the number of Celery workers using Docker Compose? Say, from 2 workers to 10 workers? Copy & paste the worker keys in the docker-compose YAML file?
No, use docker-compose up --scale airflow-worker=10 :)
@MarcLamberti Thank you for your quick reply! TIL docker-compose up --scale!
Hi Marc! Could I perform these steps without problems on a Raspberry Pi?
For me it does not work. Docker Compose is creating the path "./local" and I cannot access it. Airflow cannot read my DAGs. It is very frustrating. I have installed Airflow 8 times now and none of them worked...
Hi Marc,
I have used Docker Compose to install Airflow.
However, the sample DAGs don't seem to work for me and I found no logs.
Nice 😊
Can someone explain to me why we are running "docker-compose up airflow-init" and then "docker-compose up"?
Hello Marc, your videos are always great and helpful, and with your video I got Airflow running quite well. The only trouble is that I need to run Java within Docker and I have not found any good description of how to get this working. I am starting a shell script that starts a Java runtime within the terminal. Could you give me some help on how to get this running? Thanks, Armin
Hello guys.
Just a tip for everyone:
Do not try to create the airflow folder outside your user folder... You will run into permission problems (I tried to create an Airflow folder in /opt/, and I strongly don't recommend that).
Where should I place my custom Python file in the Docker container?
Thanks BTW for the good video
Getting the error: Import "airflow" could not be resolved
while importing 'from airflow import DAG'
When I try to run the official 2.0.1 docker-compose.yaml file at airflow.apache.org/docs/apache-airflow/2.0.1/docker-compose.yaml on my Ubuntu 18.04 LTS I get the following error:
ERROR: The Compose file './docker-compose.yaml' is invalid because:
Invalid top-level property "x-airflow-common". Valid top-level sections for this Compose file are: services, version, networks, volumes, and extensions starting with "x-".
You might be seeing this error because you're using the wrong Compose file version. Either specify a supported version (e.g "2.2" or "3.3") and place your service definitions under the `services` key, or omit the `version` key and place your service definitions at the root of the file to use version 1.
For more on the Compose file format versions, see docs.docker.com/compose/compose-file/
services.airflow-init.depends_on contains an invalid type, it should be an array
services.airflow-scheduler.depends_on contains an invalid type, it should be an array
services.airflow-webserver.depends_on contains an invalid type, it should be an array
services.airflow-worker.depends_on contains an invalid type, it should be an array
services.flower.depends_on contains an invalid type, it should be an array
Changing the version to 3.4 removes the first error, but Docker still complains about depends_on. How can I fix it? My docker-compose version is:
docker-compose version 1.17.1, build unknown
docker-py version: 2.5.1
CPython version: 2.7.17
OpenSSL version: OpenSSL 1.1.1 11 Sep 2018
while docker version is
Client:
 Version: 19.03.6
 API version: 1.40
 Go version: go1.12.17
 Git commit: 369ce74a3c
 Built: Fri Dec 18 12:21:44 2020
 OS/Arch: linux/amd64
 Experimental: false
Server:
 Engine:
  Version: 19.03.6
  API version: 1.40 (minimum version 1.12)
  Go version: go1.12.17
  Git commit: 369ce74a3c
  Built: Thu Dec 10 13:23:49 2020
  OS/Arch: linux/amd64
  Experimental: false
 containerd:
  Version: 1.3.3-0ubuntu1~18.04.2
  GitCommit:
 runc:
  Version: spec: 1.0.1-dev
  GitCommit:
 docker-init:
  Version: 0.18.0
  GitCommit:
Thanks in advance,
Flavio
Is there a docker-compose file available with MySQL?
Can you please share a link if you have one?
While running Airflow 2 via Docker Compose (just like in the video above),
I am unable to successfully execute DockerOperator tasks.
Can you point me to a video or doc reference on how to properly configure the Airflow Docker Compose file or the DockerOperator to run tasks?
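One commonly used approach, sketched here as an assumption rather than an official recipe: DockerOperator talks to a Docker daemon (its default `docker_url` is `unix://var/run/docker.sock`), and the Airflow containers have no daemon of their own, so the host's socket is mounted into the worker that executes the task. The container's `airflow` user must also be allowed to use that socket, and mounting it gives the container control over the host's Docker, so keep this to local setups. The fragment assumes the official file's `x-airflow-common` anchor:

```yaml
services:
  airflow-worker:
    <<: *airflow-common          # assumes the official file's x-airflow-common anchor
    command: celery worker
    # ...existing healthcheck and restart settings unchanged...
    volumes:
      # Redefining "volumes" replaces the anchored list, so repeat the defaults.
      - ./dags:/opt/airflow/dags
      - ./logs:/opt/airflow/logs
      - ./plugins:/opt/airflow/plugins
      # Host Docker socket, where DockerOperator's default docker_url points.
      - /var/run/docker.sock:/var/run/docker.sock
```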
The container for the webserver seems to be restarting continuously every minute or so. Any idea why this may happen?
You must increase the RAM allocated to Docker; that is how it will work.
Great video with details! I am following your steps, but I constantly get "WARNING - Exception when importing 'airflow.providers.microsoft.azure.hooks.wasb.WasbHook' from 'apache-airflow-providers-microsoft-azure' package: No module named 'azure.storage.blob'" when I run docker-compose up airflow-init.
I keep getting this as well. All the containers seem to run fine except the init container.
When I followed the steps, I got the error "manifest file not found". I have seen this error reported by others as well; I changed the image to 2.0.1 and it worked, but later I got an error about an old version being used.
I installed everything and could not open localhost:8080. I tried many times and Safari said "Safari cannot open the page. The server dropped the connection. This happens when the server is busy". Why does that happen?
Hi Marc. I bought your course, but I got an error trying to run the BashOperator that inserts data into the user table. I have tried the command alone in the console and it works, but when I use it inside my DAG in my BashOperator I get this error: "Bash command failed. The command returned a non-zero exit code." I have tried a lot but I still can't find a solution for this.
It does not do the magic on Windows. Besides the curl command not working, the docker-compose up airflow-init command says: top level object must be a mapping.
Hello, when are you going to release a course containing Docker with airflow.providers.apache.spark ?
localhost:8080 isn't opening for me. How do I check the logs for any issues?
I am importing with from airflow import DAG in a file inside my directory, but VS Code is unable to recognise airflow
That’s because Airflow runs in Docker. You need to connect your VSCode to Docker
Hello Marc, thanks for the videos, they're great. I have a question for you:
how can we version DAGs in production?
Right now, the only way is to change the dag id with the version. For example, my_dag_v1.0.0, my_dag_v1.0.1 and so on. DAG versioning is coming soon but not yet available
I followed the steps mentioned here, but I'm getting "No response from gunicorn master within 120 seconds" and the webserver keeps getting restarted. Can anyone help with a lead here, please?
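The 120 seconds in that message matches the default of `[webserver] web_server_master_timeout`. If the container is short on memory (as suggested elsewhere in this thread), two settings that may help, offered as a guess rather than a confirmed fix, are a longer timeout and fewer gunicorn workers (`[webserver] workers` defaults to 4). The values are arbitrary examples, and the fragment assumes the official file's anchors:

```yaml
services:
  airflow-webserver:
    <<: *airflow-common          # assumes the official file's x-airflow-common anchor
    command: webserver
    ports:
      - "8080:8080"
    # ...existing healthcheck and restart settings unchanged...
    environment:
      <<: *airflow-common-env    # keep the shared variables
      # Seconds to wait for the gunicorn master before giving up (default 120).
      AIRFLOW__WEBSERVER__WEB_SERVER_MASTER_TIMEOUT: "300"
      # Number of gunicorn worker processes (default 4); fewer uses less memory.
      AIRFLOW__WEBSERVER__WORKERS: "2"
```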
How can we add and set up an airflow.cfg file inside the project folder?
The first time I run "docker-compose up airflow-init", everything is okay. But after I run "docker-compose down" and then run "docker-compose up airflow-init" once again, I get the message "container for service "postgres" is unhealthy" and the airflow-init container fails. I have to run "docker-compose up airflow-init" one more time to start the airflow-init container.
Does anyone have the same problem as me? Could you give me some advice to avoid this, please? Thanks all!
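A possible mitigation, untested here: the postgres healthcheck in the official file allows a fairly short total time before declaring the container unhealthy, so a slow database start (for example when it replays its data volume after a restart) can fail the check before Postgres is ready. Compose's `start_period` adds a grace period during which failed probes are not counted. The test command mirrors the official file; the timing values are arbitrary examples:

```yaml
services:
  postgres:
    # ...existing image, environment and volumes unchanged...
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "airflow"]
      interval: 5s
      retries: 10
      # Failed probes during the first 30 s do not count towards "retries".
      start_period: 30s
```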
Hello sir, how can we launch every task of an ETL in a different container, as we do with the KubernetesPodOperator to launch every task of a DAG in a different pod?
Hello Marc, I recently installed Airflow 2 using the docker-compose file as suggested in this video. But when I extended the DAG with multiple connections, i.e. GDrive->S3, S3->Snowflake and Snowflake->S3 operations using PySpark and SQL scripts, the webserver keeps restarting and at times shows unhealthy. Can you please advise what could have gone wrong, or should I consider increasing Docker's memory?
Hi Marc
I installed Docker Desktop on Windows using Ubuntu WSL. I changed the dags directory path in the .yaml file to a folder on my C:\ drive in Windows.
When I start the web UI, it doesn't pick up my dags.py file. What could the issue be?
How can I install required packages in Docker, or how can I specify required packages in the .yaml file?
When I got to docker-compose up airflow-init, I got the following error:
"Python-dotenv could not parse statement starting at line 1
Traceback (most recent call last):
File "docker\api\client.py", line 214, in _retrieve_server_version
File "docker\api\daemon.py", line 181, in version
File "docker\utils\decorators.py", line 46, in inner
"
A few dozen more error lines afterwards, but I can't make it work so far
How do we install providers after installing Airflow on Docker?
Very cool! Unfortunately, I got an error saying 'port 5555 is already allocated...', but I am pretty sure there is nothing on there, so I'm not sure what's going on.
You must already have something running on that port. You can change Flower's port in the docker-compose file.
Thank you! Marc, got that fixed!
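The suggestion to change Flower's port can be sketched like this: only the host side (left of the colon) needs to change, since Flower keeps listening on 5555 inside the container. The host port 5556 is an arbitrary example, after which Flower is reached at http://localhost:5556; the fragment assumes the official file's anchor:

```yaml
services:
  flower:
    <<: *airflow-common          # assumes the official file's x-airflow-common anchor
    command: celery flower
    ports:
      # host_port:container_port; Flower still listens on 5555 inside the container.
      - "5556:5555"
```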
I don't know why, but it gives me a Python error when I execute docker compose up airflow-init.
Any suggestions?