What tools should you know as a Data Engineer?

  • Published: Feb 4, 2025

Comments • 70

  • @KahanDataSolutions
    @KahanDataSolutions  2 years ago +10

    Want to build a reliable, modern data architecture without the mess?
    Here’s a free checklist to help you → bit.ly/kds-checklist

    • @DaddyShegz
      @DaddyShegz 1 year ago +1

      Hi. I tried the link but it says "forbidden". Is there another way to access the pdf? Thanks

  • @mrviper3344
    @mrviper3344 1 year ago +38

    All the names of the tools mentioned in the video:
    *Cloud-based db
    Amazon Redshift
    Google BigQuery
    Snowflake
    Azure Synapse
    *Traditional row-based db
    SQL Server
    MySQL
    PostgreSQL
    *NoSQL db
    MongoDB
    Elastic
    Cassandra
    Cosmos DB
    Amazon DynamoDB
    *Extract & Load
    Batch
    Fivetran
    Stitch
    Airbyte
    Azure Data Factory
    AWS Glue
    *Streaming
    Apache Kafka
    Amazon Kinesis
    *Transform
    dbt - data build tool
    *Reverse ETL
    Census
    Hightouch
    RudderStack
    *Version Control & automation
    GitHub
    GitLab
    CI/CD
    *Task Orchestration & Scheduling
    Apache Airflow
    Jenkins
    Luigi
    *Infrastructure Management
    Terraform
    Ansible
    *Containers
    Docker
    *Container Orchestration
    Kubernetes
    *BI & Analytics
    Reporting
    Power BI
    Tableau
    Looker
    *Open Source
    Metabase
    *Spreadsheets

    • @r.c.r7308
      @r.c.r7308 8 months ago

      Or just turn on subtitles ^ ^ but thanks for the effort :D

  • @TNTsGOboom
    @TNTsGOboom 2 years ago +5

    You have a new subscriber! I love the way you explain data engineering. You and Seattle Data Guy are my faves when it comes to Data Engineering Content Creators.

  • @vb140772
    @vb140772 1 year ago

    Thanks!

  • @hamsansari2111
    @hamsansari2111 2 years ago +2

    Yesterday I said on your post that it's overwhelming with so many tools, and today I got a video :D

    • @KahanDataSolutions
      @KahanDataSolutions  2 years ago +1

      I got you! You're definitely not alone in that feeling so I figured it'd be a good topic for a video

  • @ZawmyoHtet-lg7jn
    @ZawmyoHtet-lg7jn 1 year ago +1

    This is really helpful, Bro. Thanks a lot.

  • @mohammedaminelachhabe2087
    @mohammedaminelachhabe2087 11 months ago

    Very good video. I think we can also add cloud functions to this list.

  • @AlexKashie
    @AlexKashie 1 year ago +1

    You’ve got a new subscriber. Thank you

  • @kevon217
    @kevon217 1 year ago

    thanks for an overview of the landscape!

  • @robertoferro8512
    @robertoferro8512 2 years ago +1

    What an absolutely powerful video. Please keep such good content coming!

  • @adamo1262
    @adamo1262 2 years ago +11

    I'm really interested in this field and currently learning Python. I must say this list is great, but I'm really overwhelmed by the amount of stuff one has to learn to transition into this field! I'm gonna stick with it and hopefully come out the other end 😁

    • @KahanDataSolutions
      @KahanDataSolutions  2 years ago +54

      Definitely stick with it! One thing to remember is while there are many tools, you don't need to know ALL of them to have a successful career and you also don't need to learn all at once (it takes a whole career to do that).
      Here is a recommendation to help you get started:
      1. Start with getting very comfortable w/ SQL (and/or Python if you'd like)
      2. Learn more about data modeling techniques (ex. dimensional modeling, star schema) and the way data typically moves (ex. ETL vs ELT)
      3. Pick a common database to study and practice on (ex. Snowflake or SQL Server)
      4. Learn how to use a tool like dbt to transform data within those databases which also will show you other important concepts like Version Control
      5. Pick a data visualization tool (ex. Power BI or Tableau) and use your transformed data to make a cool dashboard
      6. Pick another part of the process (ex. Extract tools, scheduling tools, etc.) and keep adding to your skillset
      Good luck!
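
Steps 2 through 4 of the roadmap above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for a warehouse like Snowflake or SQL Server, and the table names, column names, and sample rows are all hypothetical. It loads raw rows first and then transforms them inside the database (the ELT pattern) into a small star schema.

```python
import sqlite3

# Hypothetical raw order rows, standing in for whatever an extract tool lands.
RAW_ORDERS = [
    ("2024-01-05", "alice", "widget", 2, 9.50),
    ("2024-01-05", "bob", "gadget", 1, 20.00),
    ("2024-01-06", "alice", "gadget", 3, 20.00),
]


def build_star_schema(conn):
    """Load raw rows, then transform them in-database into dims and a fact."""
    # Extract + Load: land the raw data untouched.
    conn.execute(
        "CREATE TABLE raw_orders ("
        "order_date TEXT, customer TEXT, product TEXT, "
        "qty INTEGER, unit_price REAL)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?, ?)", RAW_ORDERS)

    # Transform: one dimension table per descriptive entity.
    conn.execute(
        "CREATE TABLE dim_customer ("
        "customer_id INTEGER PRIMARY KEY, customer_name TEXT UNIQUE)"
    )
    conn.execute(
        "INSERT INTO dim_customer (customer_name) "
        "SELECT DISTINCT customer FROM raw_orders ORDER BY customer"
    )
    conn.execute(
        "CREATE TABLE dim_product ("
        "product_id INTEGER PRIMARY KEY, product_name TEXT UNIQUE)"
    )
    conn.execute(
        "INSERT INTO dim_product (product_name) "
        "SELECT DISTINCT product FROM raw_orders ORDER BY product"
    )

    # Transform: the fact table keeps measures plus keys into the dimensions.
    conn.execute(
        """
        CREATE TABLE fact_sales AS
        SELECT r.order_date,
               c.customer_id,
               p.product_id,
               r.qty,
               r.qty * r.unit_price AS revenue
        FROM raw_orders r
        JOIN dim_customer c ON c.customer_name = r.customer
        JOIN dim_product p ON p.product_name = r.product
        """
    )


def revenue_by_customer(conn):
    """Typical star-schema query: join the fact to a dimension and aggregate."""
    rows = conn.execute(
        """
        SELECT c.customer_name, SUM(f.revenue)
        FROM fact_sales f
        JOIN dim_customer c ON c.customer_id = f.customer_id
        GROUP BY c.customer_name
        ORDER BY c.customer_name
        """
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_star_schema(connection)
    print(revenue_by_customer(connection))
```

In a real project the SQL inside `build_star_schema` is roughly what would live in dbt models under version control, and the result of `revenue_by_customer` is the kind of table a dashboard in step 5 would read.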

    • @adamo1262
      @adamo1262 2 years ago +5

      @KahanDataSolutions I really want to thank you for this thoughtful response and the road map provided. I honestly didn't expect this swift response and it shows that you love what you do! I will defo stick with it and hopefully make a successful career out of it. Thanks again 💪🏿

    • @Agnostic080
      @Agnostic080 2 years ago

      @KahanDataSolutions this is a pretty good list! You could probably even do a video talking about this process

    • @splashoui3760
      @splashoui3760 2 years ago

      @KahanDataSolutions Thank you for your extra detailed explanation to Adam 1. I would like to ask: would this video be more helpful for senior people who are deciding what their companies should use depending on their business case and requirements?

    • @splashoui3760
      @splashoui3760 2 years ago

      And about the spreadsheets part, you are def right. We are using Google spreadsheets and using Python to automate writing our outputs there.

  • @aniltembhare2985
    @aniltembhare2985 1 year ago

    Thank you for the great information.

  • @ligiaimusic
    @ligiaimusic 7 months ago

    Thank you so much for this video! Really helpful!

  • @__shaikmalikbasha__
    @__shaikmalikbasha__ 1 year ago

    Could you please make a complete series on Apache Airflow ❤

  • @cyclonus01
    @cyclonus01 2 years ago +1

    Good stuff bro. I'd add Prefect to orchestration/task flow.

  • @himanshuagrawal2800
    @himanshuagrawal2800 1 year ago +1

    Hi, can you tell me where exactly Apache Spark fits in this picture?

  • @Rex_793
    @Rex_793 2 years ago

    This was a very informative video - very useful to "get the lay of the land" so to speak.

  • @DjBaxter15
    @DjBaxter15 7 months ago

    Some other alternatives for scheduling and orchestration are:
    Dagster
    Prefect
    Oozie
    Or whatever your cloud offering might have, I know Google Cloud has Cloud Scheduler.
    If you suggest Jenkins as a job scheduling tool in this day and age, I will hunt you down...

  • @nickriebe245
    @nickriebe245 2 years ago +5

    Phenomenal video. What tool(s) do you recommend for documentation and/or data dictionaries?

    • @cloveravalon444
      @cloveravalon444 1 year ago

      It depends on where you store your data: on-premises or in the cloud.

  • @poizentv
    @poizentv 1 year ago

    I really need this so bad. Do you have a data engineering course? Or any recommendations?

  • @adityalakkad499
    @adityalakkad499 2 years ago +2

    Apache Superset is one of the promising BI tools in my opinion. Can you share your opinion on it, if possible?

  • @StephenRayner
    @StephenRayner 7 months ago

    Brilliant

  • @Faz13able
    @Faz13able 11 months ago

    What about Spark or PySpark? Where does it fit in?

  • @tomastruchly9484
    @tomastruchly9484 2 years ago +4

    This video is a kick in the balls for Oracle 😀

  • @yashikakarunan2636
    @yashikakarunan2636 1 year ago

    thank you, great explanation

  • @guruprasadashridharhegde6792
    @guruprasadashridharhegde6792 10 months ago

    Apache Airflow is a great orchestration tool.

  • @TheRealNCYank
    @TheRealNCYank 1 year ago

    No Oracle for the second layer?

  • @Swelouise
    @Swelouise 10 months ago

    Hi, thank you for your video. I know that this is old now but I wish you would put the names of each tool you listed under the tool. If you aren't familiar with the specific tool it can be hard to know how to spell it. I know I can Google but I was taking notes as I was following along. Thank you.

  • @johnh7770
    @johnh7770 9 months ago

    Apache Superset is another open source BI/analytics option

  • @nicky_rads
    @nicky_rads 2 years ago +1

    Nice well-rounded video, thanks!
    One question, where do Databricks and Spark fit into the stack?

    • @KahanDataSolutions
      @KahanDataSolutions  2 years ago +2

      Thanks! Databricks would fall in the same area as "cloud databases". Spark would fit in around the "ELT Components" and is used primarily to process large amounts of data.

  • @ksaha6387
    @ksaha6387 28 days ago

    How come Spark is not mentioned?

  • @andrewmaxwell9399
    @andrewmaxwell9399 2 years ago +2

    Hey man, may I ask a question?
    I have ETL experience with 2 ETL tools and multiple RDBMSs (on premise), and I want to shift into Data Engineering roles that usually combine ETL tools + Python and its libraries/frameworks. Am I considered a new graduate or an industry professional, since I don't have any experience with Python?
    And does it usually mean I have to take a "pay cut"? Let's say I make $500 a month as an ETL Developer and I want to shift to a Data Engineer role. Does it mean I will be getting paid like $300 a month since I don't have DE experience?
    I really need some guidance... Thank you :)

  • @adityanjsg99
    @adityanjsg99 1 year ago

    I know Databricks, dbt, Airflow, Kafka and Power BI

  • @postmandev
    @postmandev 2 years ago

    What about ClickHouse?

  • @thomashass1
    @thomashass1 1 year ago

    Very surprised Apache Spark is not mentioned here.

  • @isaacmoreno7518
    @isaacmoreno7518 1 year ago

    I guess you have not tried Exasol (analytical database, arguably the fastest in the market).

  • @ukaszdugozima816
    @ukaszdugozima816 2 years ago

    Hello! Thank you for your invaluable video! I find it extremely useful for beginners! I would like to ask about one thing regarding a Data Engineer career. I learnt Pandas for data wrangling and transformation. So how about Pandas for Data Engineers? Is it a useful tool for ETL/ELT transformations? Obviously, the next step will be PySpark, but I would like to start by learning Pandas. It seems like a good path toward the next one. What do you think about it? I would appreciate it if you could share your views.

  • @muhammadahtshamulhaq4476
    @muhammadahtshamulhaq4476 2 years ago

    I want to be a data engineer but I'm still not good at programming languages. I tried Python a lot but I only know SQL. How can I become a data engineer?

  • @travis3366
    @travis3366 2 years ago

    Is learning Informatica worth it?

    • @KahanDataSolutions
      @KahanDataSolutions  2 years ago +1

      If you are applying for a job that uses it, then yes. I'm sure there are still many companies that use it.

  • @willi1978
    @willi1978 2 years ago +1

    ETL doesn't care what the destination is. The expression "Reverse ETL" makes no sense; it's still an ETL process.

    • @KahanDataSolutions
      @KahanDataSolutions  2 years ago

      I agree that the term is a bit odd, but that's what has stuck as of today. Another term you might see used to describe that process is "Operational Analytics"
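
To make the "Reverse ETL" term in this thread concrete, here is a minimal sketch, assuming a SQLite database stands in for the warehouse and a plain dict stands in for an operational tool such as a CRM. The `orders` table, the `lifetime_value` field, and the `sync_to_crm` helper are hypothetical names for illustration, not any vendor's API. The mechanics are still extract and load; only the direction (warehouse out to a business tool) differs.

```python
import sqlite3


def sync_to_crm(conn, crm):
    """Copy per-customer totals from the warehouse into a CRM-like store.

    `crm` is a dict of {customer: record_dict}; a real tool would call
    the operational system's API here instead.
    """
    rows = conn.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
    )
    for customer, total in rows:
        # Upsert: create the record if missing, keep existing fields if not.
        crm.setdefault(customer, {})["lifetime_value"] = total
    return crm


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    warehouse.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("alice", 10.0), ("alice", 20.0), ("bob", 5.0)],
    )
    print(sync_to_crm(warehouse, {"alice": {"email": "alice@example.com"}}))
```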

  • @Lapookie
    @Lapookie 1 year ago

    Apache Airflow is getting old, with lots of problems in production

  • @TechnologyUncovered-b1i
    @TechnologyUncovered-b1i 1 year ago

    Python..

  • @arsenijen9797
    @arsenijen9797 1 year ago

    👍🏻👌🏻💯%

  • @sunil-de
    @sunil-de 1 year ago

    You just listed out half of the data team
    (DevOps Engineer, Data Engineer, DBA, SQL Developer, Server Executive, Data Analyst, Business Analyst).
    You don't need to learn all of this to be a data engineer...

  • @naheliegend5222
    @naheliegend5222 2 years ago

    Companies in 2022 still running SQL Server with SSIS and SSAS :D

  • @hfuhruhurr
    @hfuhruhurr 2 years ago

    Surprised there was no mention of Pandas.

    • @KahanDataSolutions
      @KahanDataSolutions  2 years ago +1

      That's a good one too. I personally haven't used Pandas much but I know others do.

    • @in6tinct
      @in6tinct 2 years ago +1

      Or Spark/Databricks