What tools should you know as a Data Engineer?

  • Published: Jul 27, 2024
  • The modern data engineering stack is overwhelming...
    Every day there seems to be a new tool, and it can be hard to know what to learn.
    While it’d be impossible to share every single possible option...
    in this video I want to share some of the most commonly used ones across the modern stack.
    Use the tools mentioned in this video to guide your learning.
    Figure out what you want to learn next and overall become more aware of what's out there.
    *Important Note*
    We'll be going through a lot of tools in this video. I want you to know that you don't need to learn how to use every single platform mentioned to be a great data engineer.
    I personally have not used all of them in depth and I've never met anybody who is an expert at them all either.
    Ultimately the goal is to give you a perspective of the overall data landscape and help guide your journey to becoming a great engineer. Pick out one tool you don't know today and start learning it. Once you master that, move on to another.
    ►► The Starter Guide for The Modern Data Stack (Free PDF)
    Simplify the “modern” data stack + better understand common tools & components → bit.ly/starter-mds
    Timestamps:
    0:00 - Intro
    0:37 - Databases
    2:32 - ELT Components
    4:52 - Version Control & CICD
    6:31 - Infrastructure
    8:00 - BI & Analytics
    Title & Tags:
    What tools should you know as a Data Engineer?
    #kahandatasolutions #dataengineering #analytics

Comments • 70

  • @KahanDataSolutions
    @KahanDataSolutions  2 years ago +10

    ►► The Starter Guide for The Modern Data Stack (Free PDF)→ bit.ly/starter-mds
    Simplify “modern” architectures + better understand common tools & components

    • @DaddyShegz
      @DaddyShegz 1 year ago

      Hi. I tried the link but it says "forbidden". Is there another way to access the pdf? Thanks

  • @mrviper3344
    @mrviper3344 11 months ago +30

    All the names of the tools talked about in the video:
    *Cloud-based db
    Amazon Redshift
    Google BigQuery
    Snowflake
    Azure Synapse
    *Traditional row-based db
    SQL Server
    MySQL
    PostgreSQL
    *NoSQL db
    MongoDB
    Elastic
    Cassandra
    Cosmos DB
    Amazon DynamoDB
    *Extract & Load
    Batch
    Fivetran
    Stitch
    Airbyte
    Azure Data Factory
    AWS Glue
    *Streaming
    Apache Kafka
    Amazon Kinesis
    *Transform
    dbt - data build tool
    *Reverse ETL
    Census
    Hightouch
    RudderStack
    *Version Control & automation
    GitHub
    GitLab
    CI/CD
    *Task Orchestration & Scheduling
    Apache Airflow
    Jenkins
    Luigi
    *Infrastructure
    Management
    Terraform
    Ansible
    *Containers
    Docker
    *Container Orchestration
    Kubernetes
    *BI & Analytics
    Reporting
    Power BI
    Tableau
    Looker
    *Open Source
    Metabase
    *Spreadsheets

    • @r.c.r7308
      @r.c.r7308 2 months ago

      Or just turn on subtitles ^ ^ but thanks for the effort :D

  • @TNTsGOboom
    @TNTsGOboom 1 year ago +5

    You have a new subscriber! I love the way you explain data engineering. You and Seattle Data Guy are my faves when it comes to Data Engineering Content Creators.

  • @kevon217
    @kevon217 1 year ago

    thanks for an overview of the landscape!

  • @robertoferro8512
    @robertoferro8512 1 year ago +1

    What an absolutely powerful video. Please keep such good content coming!

  • @ZawmyoHtet-lg7jn
    @ZawmyoHtet-lg7jn 8 months ago +1

    This is really helpful, Bro. Thanks a lot.

  • @hamsansari2111
    @hamsansari2111 2 years ago +1

    Yesterday I said in your post that it's overwhelming with so many tools, and today I got a video :D

    • @KahanDataSolutions
      @KahanDataSolutions  2 years ago +1

      I got you! You're definitely not alone in that feeling so I figured it'd be a good topic for a video

  • @Rex_793
    @Rex_793 2 years ago

    This was a very informative video - very useful to "get the lay of the land" so to speak.

  • @ligiaimusic
    @ligiaimusic 28 days ago

    Thank you so much for this video! Really helpful!

  • @aniltembhare2985
    @aniltembhare2985 5 months ago

    Thank you for the great information.

  • @cyclonus01
    @cyclonus01 2 years ago +1

    Good stuff bro. I'd add Prefect to orchestration/task flow.

  • @AlexKashie
    @AlexKashie 10 months ago +1

    You’ve got a new subscriber. Thank you

  • @DjBaxter15
    @DjBaxter15 1 month ago

    Some other alternatives for scheduling and orchestration are:
    Dagster
    Prefect
    Oozie
    Or whatever your cloud offering might have, I know Google Cloud has Cloud Scheduler.
    If you suggest Jenkins as a job scheduling tool in this day and age, I will hunt you down...

  • @mohammedaminelachhabe2087
    @mohammedaminelachhabe2087 4 months ago

    Very good video. I think we can also add the cloud functions to this list.

  • @tomastruchly9484
    @tomastruchly9484 1 year ago +4

    This video is a kick in the balls for Oracle 😀

  • @yashikakarunan2636
    @yashikakarunan2636 1 year ago

    Thank you, great explanation.

  • @adamo1262
    @adamo1262 2 years ago +11

    I'm really interested in this field and currently learning Python. I must say this list is great, but I'm really overwhelmed by the amount of stuff one has to learn to transition into this field! I'm gonna stick with it and hopefully come out the other end 😁

    • @KahanDataSolutions
      @KahanDataSolutions  2 years ago +54

      Definitely stick with it! One thing to remember is while there are many tools, you don't need to know ALL of them to have a successful career and you also don't need to learn all at once (it takes a whole career to do that).
      Here is a recommendation to help you get started:
      1. Start with getting very comfortable w/ SQL (and/or Python if you'd like)
      2. Learn more about data modeling techniques (ex. dimensional modeling, star schema) and the way data typically moves (ex. ETL vs ELT)
      3. Pick a common database to study and practice on (ex. Snowflake or SQL Server)
      4. Learn how to use a tool like dbt to transform data within those databases which also will show you other important concepts like Version Control
      5. Pick a data visualization tool (ex. Power BI or Tableau) and use your transformed data to make a cool dashboard
      6. Pick another part of the process (ex. Extract tools, scheduling tools, etc.) and keep adding to your skillset
      Good luck!
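
To make steps 1 to 3 of the list above a bit more concrete, here is a minimal sketch of the ELT pattern ending in a tiny star schema. It is only an illustration under stated assumptions: Python's built-in `sqlite3` stands in for a warehouse such as Snowflake or SQL Server, and the table names (`raw_orders`, `dim_customer`, `fact_orders`) and sample rows are hypothetical.

```python
import sqlite3

# Hypothetical raw order rows, as an extract-and-load tool might land them.
RAW_ORDERS = [
    ("2024-01-05", "alice", "US", 120.0),
    ("2024-01-05", "bob", "DE", 80.0),
    ("2024-01-06", "alice", "US", 40.0),
]


def build_star_schema(conn):
    """Load raw rows first, then transform them inside the database (ELT)."""
    cur = conn.cursor()
    # Load: land the data untouched in a raw table.
    cur.execute(
        "CREATE TABLE raw_orders "
        "(order_date TEXT, customer TEXT, country TEXT, amount REAL)"
    )
    cur.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", RAW_ORDERS)
    # Transform: one dimension table with a surrogate key per customer.
    cur.execute(
        "CREATE TABLE dim_customer "
        "(customer_id INTEGER PRIMARY KEY, customer TEXT, country TEXT)"
    )
    cur.execute(
        "INSERT INTO dim_customer (customer, country) "
        "SELECT DISTINCT customer, country FROM raw_orders ORDER BY customer"
    )
    # Transform: one fact table that references the dimension by key.
    cur.execute(
        "CREATE TABLE fact_orders AS "
        "SELECT d.customer_id, r.order_date, r.amount "
        "FROM raw_orders r JOIN dim_customer d ON d.customer = r.customer"
    )
    conn.commit()


def revenue_by_country(conn):
    """Typical reporting query: join the fact table to its dimension."""
    rows = conn.execute(
        "SELECT d.country, SUM(f.amount) "
        "FROM fact_orders f JOIN dim_customer d USING (customer_id) "
        "GROUP BY d.country ORDER BY d.country"
    ).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
print(revenue_by_country(conn))
```

In a real project the two transform statements are roughly what a tool like dbt would manage as versioned SQL models, and the final query is the kind of thing a dashboard in Power BI or Tableau would run.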

    • @adamo1262
      @adamo1262 2 years ago +5

      @KahanDataSolutions I really want to thank you for this thoughtful response and the road map provided. I honestly didn't expect this swift response and it shows that you love what you do! I will defo stick with it and hopefully make a successful career out of it. Thanks again 💪🏿

    • @Agnostic080
      @Agnostic080 2 years ago

      @KahanDataSolutions this is a pretty good list! You could probably even do a video talking about this process

    • @splashoui3760
      @splashoui3760 2 years ago

      @KahanDataSolutions Thank you for your extra detailed explanation to Adam 1. I would like to ask: would this video be more helpful for senior people who are deciding what their companies should use, depending on their business case and requirements?

    • @splashoui3760
      @splashoui3760 2 years ago

      And about the spreadsheets part, you are def right. We are using Google spreadsheets, and we use Python to automate writing our outputs there.

  • @nickriebe245
    @nickriebe245 2 years ago +5

    Phenomenal video. What tool(s) do you recommend for documentation and/or data dictionaries?

    • @cloveravalon444
      @cloveravalon444 1 year ago

      It depends on where you store your data: on-premises or in the cloud.

  • @adityalakkad499
    @adityalakkad499 2 years ago +2

    Apache Superset is one of the promising BI tools in my opinion. Can you share your opinion on it, if possible?

  • @StephenRayner
    @StephenRayner 21 days ago

    Brilliant

  • @johnh7770
    @johnh7770 3 months ago

    Apache Superset is another open source BI/analytics option

  • @__shaikmalikbasha__
    @__shaikmalikbasha__ 8 months ago

    Could you please make a complete series on Apache Airflow ❤

  • @vb140772
    @vb140772 7 months ago

    Thanks!

  • @nicky_rads
    @nicky_rads 2 years ago +1

    Nice well-rounded video, thanks!
    One question: where do Databricks and Spark fit into the stack?

    • @KahanDataSolutions
      @KahanDataSolutions  2 years ago +2

      Thanks! Databricks would fall in the same area as "cloud databases". Spark would fit in around the "ELT Components" and is used primarily to process large amounts of data.

  • @ukaszdugozima816
    @ukaszdugozima816 1 year ago

    Hello! Thank you for your invaluable video! I find it extremely useful for beginners. I would like to ask one thing about the Data Engineer career. I learned Pandas for data wrangling and transformation. How useful is Pandas for Data Engineers? Is it a useful tool for ETL/ELT transformations? Obviously, the next step will be PySpark, but I would like to start by learning Pandas, since it seems like a good path toward PySpark. What do you think? I would appreciate it if you could share your views.

  • @andrewmaxwell9399
    @andrewmaxwell9399 1 year ago +2

    Hey man, may I ask a question?
    I have ETL experience with 2 ETL tools and multiple RDBMSs (on-premises), and I want to shift into Data Engineering roles, which usually combine ETL tools with Python and its libraries/frameworks. Would I be considered a new graduate or an industry professional, since I don't have any experience with Python?
    And does that usually mean I have to take a pay cut? Let's say I make $500 a month as an ETL Developer and I want to shift to a Data Engineer role: would I get paid something like $300 a month since I don't have DE experience?
    I really need some guidance... Thank you :)

  • @poizentv
    @poizentv 6 months ago

    I really need this so bad. Do you have a Data Engineer course? Or any recommendations?

  • @guruprasadashridharhegde6792
    @guruprasadashridharhegde6792 4 months ago

    Apache Airflow is a great orchestration tool.

  • @FroFoLife
    @FroFoLife 4 months ago

    Hi, thank you for your video. I know that this is old now but I wish you would put the names of each tool you listed under the tool. If you aren't familiar with the specific tool it can be hard to know how to spell it. I know I can Google but I was taking notes as I was following along. Thank you.

  • @himanshuagrawal2800
    @himanshuagrawal2800 7 months ago

    Hi, can you tell me where exactly Apache Spark fits in this picture?

  • @Faz13able
    @Faz13able 5 months ago

    What about Spark or PySpark? Where does it fit in?

  • @adityanjsg99
    @adityanjsg99 1 year ago

    I know Databricks, dbt, Airflow, Kafka and Power BI

  • @TheRealNCYank
    @TheRealNCYank 8 months ago

    No Oracle for the second layer?

  • @postmandev
    @postmandev 1 year ago

    What about ClickHouse?

  • @muhammadahtshamulhaq4476
    @muhammadahtshamulhaq4476 1 year ago

    I want to be a data engineer, but I'm still not good at programming languages. I've tried Python a lot, but I only know SQL. How can I become a data engineer?

  • @thomashass1
    @thomashass1 1 year ago

    Very surprised Apache Spark is not mentioned here.

  • @isaacmoreno7518
    @isaacmoreno7518 1 year ago

    I guess you have not tried Exasol (analytical database, arguably the fastest in the market).

  • @arsenijen9797
    @arsenijen9797 1 year ago

    👍🏻👌🏻💯%

  • @travis3366
    @travis3366 1 year ago

    Is learning Informatica worth it?

    • @KahanDataSolutions
      @KahanDataSolutions  1 year ago +1

      If you are applying for a job that uses it, then yes. I'm sure there are still many companies that use it.

  • @willi1978
    @willi1978 1 year ago +1

    ETL doesn't care what the destination is. The expression "Reverse ETL" makes no sense, it's still an ETL process.

    • @KahanDataSolutions
      @KahanDataSolutions  1 year ago

      I agree that the term is a bit odd, but that's what has stuck as of today. Another term you might see used to describe that process is "Operational Analytics"
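
For anyone new to the term debated in this thread, a rough sketch may help. "Reverse ETL" usually refers to reading already-modeled data out of the warehouse and pushing it into an operational tool such as a CRM, which is still extract, reshape, and load, just with the warehouse as the source. Everything named here is an assumption for illustration: `sqlite3` stands in for the warehouse, the `customer_summary` table and payload shape are hypothetical, and the "CRM client" is just a Python list.

```python
import sqlite3


def extract_segments(conn, min_spend):
    """Read already-modeled rows out of the warehouse."""
    return conn.execute(
        "SELECT email, total_spend FROM customer_summary "
        "WHERE total_spend >= ? ORDER BY email",
        (min_spend,),
    ).fetchall()


def to_crm_payloads(rows):
    """Reshape warehouse rows into the records an operational tool expects."""
    return [
        {"email": email, "properties": {"total_spend": spend, "segment": "high_value"}}
        for email, spend in rows
    ]


def sync(conn, send, min_spend=100.0):
    """Extract from the warehouse, reshape, and hand each record to `send`."""
    payloads = to_crm_payloads(extract_segments(conn, min_spend))
    for payload in payloads:
        send(payload)
    return len(payloads)


# Demo with an in-memory database and a fake CRM client (a plain list).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_summary (email TEXT, total_spend REAL)")
conn.executemany(
    "INSERT INTO customer_summary VALUES (?, ?)",
    [("a@example.com", 250.0), ("b@example.com", 30.0), ("c@example.com", 100.0)],
)
outbox = []
print(sync(conn, outbox.append), "records synced")
```

Tools like Census, Hightouch, and RudderStack handle the parts this sketch leaves out, such as authentication, rate limits, and syncing only changed rows.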

  • @skateforlife3679
    @skateforlife3679 1 year ago

    Apache Airflow is getting old; lots of problems in production

  • @BeastFitness472
    @BeastFitness472 1 year ago

    Python..

  • @hfuhruhurr
    @hfuhruhurr 2 years ago

    Surprised there was no mention of Pandas.

    • @KahanDataSolutions
      @KahanDataSolutions  2 years ago +1

      That's a good one too. I personally haven't used Pandas much but I know others do.

    • @in6tinct
      @in6tinct 2 years ago +1

      Or Spark/Databricks

  • @naheliegend5222
    @naheliegend5222 1 year ago

    Companies in 2022 still running SQL Server with SSIS and SSAS :D

  • @sunil-de
    @sunil-de 8 months ago

    You just listed out half of the data team
    (DevOps Engineer, Data Engineer, DBA, SQL Developer, Server Executive, Data Analyst, Business Analyst).
    You don't need to learn all of this to be a data engineer...