Want to build a reliable, modern data architecture without the mess?
Here’s a free checklist to help you → bit.ly/kds-checklist
Hi. I tried the link but it says "forbidden". Is there another way to access the pdf? Thanks
All the names of the tools mentioned in the video:
*Cloud-based db
Amazon Redshift
Google BigQuery
Snowflake
Azure Synapse
*Traditional row-based db
SQL Server
MySQL
PostgreSQL
*NoSQL db
MongoDB
Elasticsearch
Cassandra
Azure Cosmos DB
Amazon DynamoDB
*Extract & Load
*Batch
Fivetran
Stitch
Airbyte
Azure Data Factory
AWS Glue
*Streaming
Apache Kafka
Amazon Kinesis
*Transform
dbt (data build tool)
*Reverse ETL
Census
Hightouch
RudderStack
*Version Control & automation
GitHub
GitLab
CI/CD
*Task Orchestration & Scheduling
Apache Airflow
Jenkins
Luigi
*Infrastructure
*Management
Terraform
Ansible
*Containers
Docker
*Container Orchestration
Kubernetes
*BI & Analytics
*Reporting
Power BI
Tableau
Looker
*Open Source
Metabase
*Spreadsheets
Or just turn on subtitles ^ ^ but thanks for the effort :D
You have a new subscriber! I love the way you explain data engineering. You and Seattle Data Guy are my faves when it comes to Data Engineering Content Creators.
Thanks Turk! Much appreciated
Thanks!
Thank you!
Yesterday I said in your post that it's overwhelming with so many tools, and today I got a video :D
I got you! You're definitely not alone in that feeling so I figured it'd be a good topic for a video
This is really helpful, Bro. Thanks a lot.
Very good video. I think we could also add cloud functions to this list.
You’ve got a new subscriber. Thank you
Thank you!
thanks for an overview of the landscape!
What an absolutely powerful video. Please keep such good content coming!
Much appreciated! Thanks for watching
I'm really interested in this field and currently learning Python. I must say this list is great, but I'm really overwhelmed by the amount of stuff one has to learn to transition into this field! I'm gonna stick with it and hopefully come out the other end 😁
Definitely stick with it! One thing to remember is that while there are many tools, you don't need to know ALL of them to have a successful career, and you don't need to learn them all at once (it takes a whole career to do that).
Here is a recommendation to help you get started:
1. Start with getting very comfortable w/ SQL (and/or Python if you'd like)
2. Learn more about data modeling techniques (ex. dimensional modeling, star schema) and the way data typically moves (ex. ETL vs ELT; see the sketch after this list)
3. Pick a common database to study and practice on (ex. Snowflake or SQL Server)
4. Learn how to use a tool like dbt to transform data within those databases, which will also introduce other important concepts like version control
5. Pick a data visualization tool (ex. Power BI or Tableau) and use your transformed data to make a cool dashboard
6. Pick another part of the process (ex. Extract tools, scheduling tools, etc.) and keep adding to your skillset
Good luck!
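To make steps 1-2 a bit more concrete, here is a minimal sketch of the ELT pattern in plain Python + SQL. It is only an illustration, not anything from the video: sqlite3 stands in for a real warehouse like Snowflake, and the table name and sample rows are invented.

```python
# Minimal ELT sketch: load raw rows first, then transform with SQL inside
# the database. sqlite3 is only a stand-in for a warehouse like Snowflake;
# the table name and rows below are made up for illustration.
import sqlite3

raw_orders = [
    ("2024-01-01", "widget", 3, 9.99),
    ("2024-01-01", "gadget", 1, 24.50),
    ("2024-01-02", "widget", 2, 9.99),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_orders (order_date TEXT, product TEXT, qty INTEGER, unit_price REAL)"
)

# "EL": land the raw data untouched
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", raw_orders)

# "T": transform with SQL inside the database (roughly what a dbt model does)
for row in conn.execute(
    """
    SELECT order_date, SUM(qty * unit_price) AS revenue
    FROM raw_orders
    GROUP BY order_date
    ORDER BY order_date
    """
):
    print(row)
```

In a real stack, that final SELECT would live in a dbt model rather than a Python loop.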
@KahanDataSolutions I really want to thank you for this thoughtful response and the roadmap provided. I honestly didn't expect such a swift response, and it shows that you love what you do! I will defo stick with it and hopefully make a successful career out of it. Thanks again 💪🏿
@KahanDataSolutions this is a pretty good list! You could probably even do a video talking about this process
@KahanDataSolutions Thank you for your extra detailed explanation to Adam 1. I would like to ask: would this video be more helpful for senior people who are deciding what their companies should use depending on their business case and requirements?
And about the spreadsheets part, you are def right. We are using Google Sheets and Python to automate writing our outputs there.
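For anyone curious what that kind of automation can look like, here is a rough sketch using the gspread library; the credentials file, spreadsheet name, and rows are placeholders, not the commenter's actual setup.

```python
# Rough sketch of writing pipeline output to Google Sheets with gspread.
# Assumes a Google service-account credentials file; every name below is a
# placeholder, not a real spreadsheet.
import gspread

gc = gspread.service_account(filename="service_account.json")  # hypothetical path
sheet = gc.open("pipeline-output").sheet1                       # hypothetical spreadsheet

rows = [
    ["2024-01-01", "daily_active_users", 1234],
    ["2024-01-02", "daily_active_users", 1310],
]

# Append the latest output rows below the existing data
sheet.append_rows(rows)
```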
Thank you for the great information.
Thank you so much for this video! Really helpful!
Glad it was helpful!
Could you please make a complete series on Apache Airflow ❤
Good stuff bro. I'd add Prefect to orchestration/task flow.
Good call - Thanks for watching
Hi, can you tell me where exactly Apache Spark fits in this picture?
This was a very informative video - very useful to "get the lay of the land" so to speak.
Glad to hear it! That was the goal
Some other alternatives for scheduling and orchestration are:
Dagster
Prefect (quick sketch below)
Oozie
Or whatever your cloud offering might have, I know Google Cloud has Cloud Scheduler.
If you suggest Jenkins as a job scheduling tool in this day and age, I will hunt you down...
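To show what one of these orchestrators looks like in code, here is a minimal flow sketch in the Prefect 2.x style; the task and flow names are made up and it only prints, but the same shape scales to real extract/load steps and schedules.

```python
# Minimal Prefect 2.x-style flow: tasks wired together inside a flow.
# Names and data are made up for illustration only.
from prefect import flow, task

@task
def extract():
    return [1, 2, 3]

@task
def load(rows):
    print(f"loaded {len(rows)} rows")

@flow(name="daily-pipeline")
def daily_pipeline():
    rows = extract()
    load(rows)

if __name__ == "__main__":
    daily_pipeline()  # could also be deployed and run on a schedule
```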
Phenomenal video. What tool(s) do you recommend for documentation and/or data dictionaries?
It depends on where you store the data: on-premise or in the cloud.
I really need this so bad. Do you have a data engineering course? Or any recommendations?
Apache Superset is one of the more promising BI tools in my opinion. Can you share your opinion on it, if possible?
Brilliant
What about Spark or PySpark? Where do they fit in?
This video is a kick in the balls for Oracle 😀
Thank you, great explanation!
Glad it was helpful!
Apache Airflow is a great orchestration tool.
No Oracle for the second layer?
Hi, thank you for your video. I know this is old now, but I wish you had put the name of each tool on screen under its logo. If you aren't familiar with a specific tool, it can be hard to know how to spell it. I know I can Google it, but I was taking notes as I was following along. Thank you.
Apache Superset is another open source BI/analytics option
Nice well rounded video, thanks !
One question: where do Databricks and Spark fit into the stack?
Thanks! Databricks would fall in the same area as "cloud databases". Spark would fit in around the "ELT components" and is used primarily to process large amounts of data.
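For anyone wondering what "process large amounts of data" looks like in practice, here is a hedged PySpark sketch of an ELT-style job; the input path, column names, and output path are all hypothetical.

```python
# Hedged PySpark sketch of an ELT-style job: read raw files, aggregate at
# scale, write a curated table. Paths and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("elt-sketch").getOrCreate()

# Extract: read raw event files (placeholder path)
events = spark.read.json("s3://my-bucket/raw/events/")

# Transform: derive a date column and aggregate
daily_counts = (
    events
    .withColumn("event_date", F.to_date("event_timestamp"))
    .groupBy("event_date", "event_type")
    .count()
)

# Load: write the result where downstream tools can query it (placeholder path)
daily_counts.write.mode("overwrite").parquet("s3://my-bucket/curated/daily_counts/")
```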
How come Spark isn't mentioned?
Hey man, may I ask a question?
I have ETL experience with 2 ETL tools and multiple RDBMSs (on-premise), and I want to shift into Data Engineering roles, which usually combine ETL tools + Python and its libraries/frameworks. Am I considered a new graduate or an industry professional, since I don't have any experience with Python?
And does it usually mean I have to take a "pay cut"? Let's say I make $500 a month as an ETL Developer and I want to shift to a Data Engineer role; does that mean I will be paid something like $300 a month since I don't have DE experience?
I really need some guidance... Thank you :)
I know Databricks, dbt, Airflow, Kafka and Power BI
What about Clickhouse?
Very surprised Apache Spark is not mentioned here.
Same..
I guess you have not tried Exasol (an analytical database, arguably the fastest on the market).
Hello! Thank you for your invaluable video! I find it extremely useful for beginners! I would like to ask about one thing regarding a Data Engineering career. I learned Pandas for data wrangling and transformation. So what about Pandas for Data Engineers? Is it a useful tool for ETL/ELT transformations? Obviously, the next step will be PySpark, but I would like to start by learning Pandas. It seems like a good path toward the next one. What do you think about it? I would appreciate it if you could share your views.
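Pandas does get used for smaller ETL/ELT jobs, and the code shape carries over to PySpark later. Here is a small sketch with invented data; in a real pipeline the extract step would be read_csv or read_sql and the load step would write to a warehouse or file store.

```python
# Small pandas transform sketch: clean types, aggregate, write out.
# All data and the output filename are invented for illustration.
import pandas as pd

# Extract: in a real pipeline this might be pd.read_csv / pd.read_sql
raw = pd.DataFrame({
    "signup_date": ["2024-01-01", "2024-01-01", "2024-01-02"],
    "plan": ["free", "pro", "free"],
    "seats": [1, 5, 2],
})

# Transform: fix types and aggregate
raw["signup_date"] = pd.to_datetime(raw["signup_date"])
signups_by_day = raw.groupby(["signup_date", "plan"], as_index=False)["seats"].sum()

# Load: write somewhere downstream tools can read (placeholder filename)
signups_by_day.to_csv("signups_by_day.csv", index=False)

print(signups_by_day)
```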
I want to be a data engineer, but I'm still not good with programming languages. I've tried Python a lot and only know SQL. How can I become a data engineer?
Is learning Informatica worth it?
If you are applying for a job that uses it, then yes. I'm sure there are still many companies that use it.
ETL doesn't care what the destination is. The expression "Reverse ETL" makes no sense; it's still an ETL process.
I agree that the term is a bit odd, but that's what has stuck as of today. Another term you might see used to describe that process is "Operational Analytics"
Apache Airflow is getting old; lots of problems in production.
Python..
👍🏻👌🏻💯%
You just listed out half of the data team
(DevOps Engineer, Data Engineer, DBA, SQL Developer, Server Executive, Data Analyst, Business Analyst).
You don't need to learn all of this to be a data engineer...
Companies in 2022 still running SQL Server with SSIS and SSAS :D
Surprised there was no mention of Pandas.
That's a good one too. I personally haven't used Pandas much but I know others do.
Or Spark/Databricks