Database vs Data Warehouse vs Data Lake | What is the Difference?

  • Published: 7 Jun 2024
  • Database vs Data Warehouse vs Data Lake | Today we take a look at these 3 different ways to store data and the differences between them.
    Check out Analyst Builder! www.analystbuilder.com/
    ____________________________________________
    SUBSCRIBE!
    Do you want to become a Data Analyst? That's what this channel is all about! My goal is to help you learn everything you need in order to start your career or even switch your career into Data Analytics. Be sure to subscribe to not miss out on any content!
    ____________________________________________
    RESOURCES:
    Coursera Courses:
    Google Data Analyst Certification: coursera.pxf.io/5bBd62
    Data Analysis with Python - coursera.pxf.io/BXY3Wy
    IBM Data Analysis Specialization - coursera.pxf.io/AoYOdR
    Tableau Data Visualization - coursera.pxf.io/MXYqaN
    Udemy Courses:
    Python for Data Analysis and Visualization- bit.ly/3hhX4LX
    Statistics for Data Science - bit.ly/37jqDbq
    SQL for Data Analysts (SSMS) - bit.ly/3fkqEij
    Tableau A-Z - bit.ly/385lYvN
    Please note I may earn a small commission for any purchase through these links - Thanks for supporting the channel!
    ____________________________________________
    SUPPORT MY CHANNEL - PATREON/MERCH
    Patreon Page - / alextheanalyst
    Alex The Analyst Shop - teespring.com/stores/alex-the...
    ____________________________________________
    Websites:
    Website: AlexTheAnalyst.com
    GitHub: github.com/AlexTheAnalyst
    Instagram: @Alex_The_Analyst
    ____________________________________________
    All opinions or statements in this video are my own and do not reflect the opinion of the company I work for or have ever worked for

Comments • 285

  • @o.nature
    @o.nature 2 years ago +409

    Alex, I don't know if you remember me. Over the past 2 years, I've been on your streams, commented on your videos, and emailed you my resume for help. I finally got a data analyst position 2 weeks ago and I love it. Thank you for everything.

  • @KahanDataSolutions
    @KahanDataSolutions 2 years ago +52

    Great breakdown of such critical concepts. I find that it's also common to get tripped up when companies decide to use such goofy internal names for their databases/lake/warehouse. To the point where you can lose sight of what it actually "is" you're being asked to work on. Being able to relate back to these foundational concepts is always a helpful exercise. Big fan of the channel btw!

  • @ItsJustTooRed
    @ItsJustTooRed 2 years ago +5

    This is great content, thanks. I've been working as an analyst/developer in anti-money laundering for over two years now, with zero tech experience going in. There are a lot of things where I've learned how to work with them without actually learning much about them, like the differences between databases and warehouses.
    This sort of short-form content is useful for quickly covering those questions I didn't even realise I had. Will definitely be referring to it in the future.

  • @JoeCMath
    @JoeCMath 2 years ago +10

    Timestamps for chapters:
    0:00 Introduction
    0:35 What is a Database?
    1:13 What is a Data Warehouse?
    2:34 Key Differences between Database and Data Warehouse
    3:15 What is a Data Lake?
    4:10 Database vs. Data Warehouse vs. Data Lake
    In my previous job we worked with a Data Lake, which ended up being amazing for building general SQL skills as cleanup was needed to join the disjoint tables and get valuable results. I never looked at the exact definitions so thank you for this video!

  • @traetrae11
    @traetrae11 2 years ago +8

    Thanks for this. I already knew what a database and a data warehouse were but had never heard of a data lake.

  • @umarahmed6853
    @umarahmed6853 2 years ago +5

    Short and to the point, just like I wanted. Thanks!

  • @jenniferbell514
    @jenniferbell514 2 years ago +4

    This was awesome! I'd been googling for answers to these very questions, and the visuals helped bring it into perspective. I see "data warehouse/ing" a lot in job descriptions, so it's great to get a handle on what it actually is.

  • @torontothomas8689
    @torontothomas8689 2 years ago +109

    Thanks for this! I would love to see you make a video on ETLs and the automation used in them, breaking it down for rookie data analysts!

    • @gangaadhikari9491
      @gangaadhikari9491 2 years ago +6

      Yes! ETL is something I've been trying to learn and I got a new job where I will be able to learn more but having a source to learn the foundations would be great!

    • @brittanyholloman5693
      @brittanyholloman5693 2 years ago +1

      I would like to see this as well! Thank you!

    • @jonahjohnbaba
      @jonahjohnbaba 2 years ago +1

      I would love to have you do this for us at a granular level. Many thanks.

    • @TheSchmed
      @TheSchmed 2 years ago +6

      I write a lot of my own ETLs with T-SQL, PowerShell, cmd, etc., mostly loading very large amounts of delimited text data into a star-schema type of data structure. I always build them to tolerate changing file structures using a “stage then load” method: bulk load and stage to columns all defined as VARCHAR, then dynamically build and merge (upsert) from the stage to the prod/fact table by matching stage and fact column names via the system catalog, or via some type of metadata definition tables that hold the column mapping/join and the key columns to use for merging (mostly time series) data. It’s important to use features that define sample sizes (i.e. Batchsize) so as not to cause excessive T-log or resource usage on the back end, especially on a shared relational data server. Many of these vendor products cannot handle files that change (a data type, a column/header name, etc.) without the load failing or requiring a new definition/format setup each time. I’ve actually come across one or two products that attempt a single transaction when loading very large data files to tables, which caused excessive T-log growth and brought the DB server to a halt. I write mine so they always attempt the load to “known” columns, then report any that were new or missing. I also define processes to “transform” based on the staging-to-fact column data type match; one example is that many files from Excel sheets have a “serial” date value that needs to be cast to datetime from a 5-digit numerical value. I need to go the next step, though, and learn how to incorporate AI with it. We get tons of loan payment data to be analyzed, probably 40-50 GB a month, with 30 or so years currently stored in SQL tables, one table 4-5 TB in size, both page compressed and partitioned, used for both very selective queries and queries that process 2-4 year data samples. Fun stuff. I would love to learn new methods that successfully replace pre-aggregating, that work fast and are very flexible.
      I’ve worked in the past with products like Hyperion, Cognos, SSIS, etc., as well as some vendor DW products like Paladyne, but they are always an 80% solution, with the remaining 20% being 80% of the work.
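
Two of the steps described in the comment above, matching staged columns against the fact table and casting Excel serial dates, can be sketched roughly as follows. This is only an illustration in Python rather than the commenter's T-SQL; the function names and the column names in the usage note are hypothetical, and the epoch of 1899-12-30 is the usual convention for Excel's 1900 date system (valid for serials from March 1900 onward).

```python
from datetime import datetime, timedelta

# Conventional epoch for Excel's 1900 date system. Excel treats 1900 as a
# leap year, so this offset is only correct for dates from 1900-03-01 on.
EXCEL_EPOCH = datetime(1899, 12, 30)


def excel_serial_to_datetime(serial: str) -> datetime:
    """Cast a serial date that was staged as text (e.g. "45000") to a datetime."""
    return EXCEL_EPOCH + timedelta(days=float(serial))


def match_columns(staged_header, fact_columns):
    """Compare a staged file header with the fact table's column names.

    Returns three lists: columns that can be loaded (known), columns the
    file has but the fact table lacks (new), and fact columns the file did
    not supply (missing). Both arguments are hypothetical inputs; a real
    pipeline would read fact_columns from the system catalog or a metadata
    table.
    """
    staged = set(staged_header)
    fact = set(fact_columns)
    known = [c for c in staged_header if c in fact]
    new = [c for c in staged_header if c not in fact]
    missing = [c for c in fact_columns if c not in staged]
    return known, new, missing
```

For example, `match_columns(["loan_id", "paid_on", "branch"], ["loan_id", "paid_on", "amount"])` would load `loan_id` and `paid_on`, report `branch` as new, and report `amount` as missing, so the load proceeds instead of failing on the changed file.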

  • @svenhohlfeld3483
    @svenhohlfeld3483 6 months ago +2

    Thank you so much. Already having a solid understanding of OLTP and OLAP systems, I was always struggling to understand what this data lake thing is! Now I know: It’s simply a file system. It’s a synonym for storage. Thank you for finally explaining this. I will always point to your video as a reference. 🎉

  • @MartijnVos
    @MartijnVos 2 years ago +3

    Nice breakdown. From your explanation, I get the impression that the Neo4j graph database we used on my previous project was actually a data warehouse for us, because we filled it with data we drew from many other databases, and structured it in a specific way (relations between the many different data elements from the different systems) for the reporting tool we built on top of it.
    And much of our data didn't come directly from those other databases; we drew it daily from an intermediary portal that contained all sorts of different kinds of data from different systems, which I guess was a data lake of some sort.

  • @russellchase4544
    @russellchase4544 2 months ago +1

    This was perfect. Short, sweet, and to the point. Thanks!

  • @mansouralshamri1387
    @mansouralshamri1387 11 days ago +2

    I have been wondering what the difference is between these three. You explained it very well. Glad I saw this video one day before my exam.

  • @abdulrehmancheema5121
    @abdulrehmancheema5121 5 months ago

    The way you break it down and make it simple is great. Thank you for such a quick and insightful video.

  • @JoshuaJMorley
    @JoshuaJMorley 2 years ago +52

    And now we see the latest iteration of a data topology, the lakehouse :)
    Great video! I really like that you mention the summarisation as a point of differentiation. Analytics on a data warehouse is typically done by a data analyst with traditional data analyst skillsets (SQL, R etc). Analytics on a data lake is typically done by a data scientist (Python, ML etc).
    An important point to note on "if you have all this data and you have no idea what to do with it" for a data lake: a vital thing to focus on when creating a data lake is the structure of your data in the lake (as in directory and file structure).
    If you just dump it all in, it will become a data swamp and a waste of money.
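
As a rough illustration of what a deliberate directory and file structure could look like, the sketch below builds a date-partitioned path for each file landed in the lake. The zone/source/dataset layout, the `year=/month=/day=` naming, and the function name are assumptions made for the example, not something prescribed in the video or the comment.

```python
from datetime import date
from pathlib import PurePosixPath


def lake_path(zone: str, source: str, dataset: str, day: date, filename: str) -> PurePosixPath:
    """Build a predictable, date-partitioned location for a file in the lake.

    Hypothetical layout: <zone>/<source>/<dataset>/year=YYYY/month=MM/day=DD/<filename>
    """
    return PurePosixPath(
        zone,
        source,
        dataset,
        f"year={day.year:04d}",
        f"month={day.month:02d}",
        f"day={day.day:02d}",
        filename,
    )


if __name__ == "__main__":
    # Example: a CRM export landed in the raw zone on 7 June 2024.
    print(lake_path("raw", "crm", "orders", date(2024, 6, 7), "orders.csv"))
```

Keeping every file under a path like this means anyone browsing the lake can tell where data came from and when it arrived, which is the opposite of dumping everything into one folder.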

    • @AlexTheAnalyst
      @AlexTheAnalyst 2 years ago +4

      For sure - I'm implementing an Azure Data Lake at my company right now and that's exactly what we are trying to avoid lol

    • @JoshuaJMorley
      @JoshuaJMorley 2 years ago

      @AlexTheAnalyst ADLS Gen2 using hierarchical namespaces? Good technology :)

    • @MartijnVos
      @MartijnVos 2 years ago +2

      A data swamp is a great addition to the list, although it's of course the thing you want to avoid.

    • @xMastJedi
      @xMastJedi 2 years ago +1

      Better lakehouse than leakhouse :D

    • @andrij.demianczuk
      @andrij.demianczuk 2 years ago +3

      Lakehouse is a term that marries the data lake and the data warehouse. It does this by adding an abstraction layer to help organize and normalize data in the data lake with a combination of a Hive metastore and a Delta format.

  • @niftyoptionslivetradingand7231
    @niftyoptionslivetradingand7231 1 year ago +2

    There are so many big universities around the world, but this guy made it so clear for me. You deserve to open your own university, brother. Thanks for enlightening me 🙏👍😊 All the best 💐💐

  • @n19ence
    @n19ence 1 year ago +2

    Whoa, Alex, with this clarity and instruction you're going to get my university Ph.D. instructors "fired".

  • @blankdevs
    @blankdevs 1 year ago +1

    Understood the difference and I now know what I need for my own purposes.
    Amazing content kudos 🥂

  • @NdubisiOnuora
    @NdubisiOnuora 1 year ago

    Simple and easy to use. Great voice and extremely friendly and humble.

  • @jesselima_dev
    @jesselima_dev 2 years ago +1

    For the first time, these concepts became clear to me. Thanks!

  • @_incarnate
    @_incarnate 1 year ago

    Nothing but awesome!! This is very nice, Alex. You won't ever know how much you have come through for me. 👏

  • @Vucci_Mane
    @Vucci_Mane 2 years ago +6

    Appreciate the video. I needed to know the difference between the data platforms since at some point I’ll be transitioning to data engineering once I've studied enough.
    Also wanna take the time to say thank you so much for your vids! Been watching you since last year, and your vids helped me prepare a lot for DA. As of now, I’ll soon be starting my 1st DA role with a great (I think lol) starting salary of $65k in SaaS. Would’ve been completely lost if it hadn’t been for your career insights and projects. Cheers from your HOU neighbor!!

    • @AlexTheAnalyst
      @AlexTheAnalyst 2 years ago +2

      I'm so honored to hear that! Keep up the good work!

  • @jenbac.s1858
    @jenbac.s1858 1 year ago

    Great video! I just understood the differences between these key terms, thanks to your video. Something I did not grasp from the very long texts written for the same purpose. Well done Alex 👏

  • @user-tc8sm4bq4x
    @user-tc8sm4bq4x 2 years ago +1

    So simple and so fast!) Thank you so much)

  • @user-fl1ip3gr1f
    @user-fl1ip3gr1f 1 year ago

    Thank you. I am making a similar transition into becoming a Data Analyst so this background is extremely helpful.

  • @tund3_
    @tund3_ 8 months ago

    This is an amazing summary, clear and easy to understand, especially for someone with a networking and security background. Thanks a lot.

  • @user-vb1gl7tn8y
    @user-vb1gl7tn8y 2 years ago +5

    Alex, this is such a good concise description for folks! I’d love to learn more about data lakes, in my current role we have so many streams of data but apart from its initial use the data just gets lost in Excel files on shared drives.
    I’m wondering if we could leverage it better by having a centralized way of storing it. But I just don’t know that much about data lakes. Would love more content exploring this!

    • @AlexTheAnalyst
      @AlexTheAnalyst 2 years ago

      Yeah that's usually what a data lake is for - some type of centralized system

  • @jamiethesailor
    @jamiethesailor 2 years ago

    Great explanation! Been struggling with understanding the differences, this really cleared it up!

  • @Vikram_8621
    @Vikram_8621 2 years ago +1

    Thanks for simplifying, great video! 👍

  • @GuruDanny
    @GuruDanny 4 months ago

    Thank you - for me, I learned new concepts - had no idea about data lake. Once again thanks for sharing your knowledge.

  • @justindorsey8922
    @justindorsey8922 1 month ago

    This was a great video. Concise and very clear. Excellent job on keeping it brief and giving all the information too. Could not have asked for a better video.

  • @rajkumar-xg3iy
    @rajkumar-xg3iy 2 years ago +1

    Oh. Much needed. Learning these concepts now on the job.

  • @user-zy7dh7mn8v
    @user-zy7dh7mn8v 10 months ago +1

    Clear and short, thank you!

  • @j.nakajima9070
    @j.nakajima9070 2 years ago +1

    Thank you Alex, this was a question I had in a recent interview for a MN company in Singapore, and I did not know how to answer.

  • @jal6008
    @jal6008 2 years ago +7

    Thank you very much for this video. Actually I am currently doing the Coursera Google Data Analytics Course and in there I found these words databases and data warehouse and I was really curious about the difference between them and just found your video on it.

  • @dominiqueingrid7086
    @dominiqueingrid7086 1 year ago

    Short, helpful, well explained. Thanks!❤

  • @mosesvarghese4566
    @mosesvarghese4566 2 years ago +1

    Alex, it is awesome to see these videos man! Very informative.

    • @AlexTheAnalyst
      @AlexTheAnalyst 2 years ago

      Is this THE Moses Varghese?? Thank you man! :D

  • @jessechichi5609
    @jessechichi5609 3 months ago

    Wow, this is nice, so much clarity and simplicity, thank you.

  • @allsin03
    @allsin03 1 month ago

    this helped me get a better grasp on the 3. great video

  • @oyindamolatomoye6520
    @oyindamolatomoye6520 2 years ago +1

    Super helpful. Thanks Alex

  • @bigstupidgrin
    @bigstupidgrin 4 months ago +5

    I don't know if I'll ever be ready for a Data Ocean...

  • @JHatLpool
    @JHatLpool 1 year ago

    A nice, clear presentation and nice explanations of the key terms. Thanks !

  • @TrickZaddy
    @TrickZaddy 2 years ago +9

    Hi Alex, would love to hear your thoughts on the subject matter of adjusting to a new analyst position. I’m a new senior data analyst and going through a steep learning curve. Would love to hear your advice. How long does it take? Tips on being successful, your personal experience, etc. I think it would be a great video.

    • @veronicab2096
      @veronicab2096 2 years ago +1

      I’m new to a higher level data position than I’m used to. I would love this kind of content as well.

  • @myyntisuurvisiiri
    @myyntisuurvisiiri 1 year ago +1

    This was the best explanation on YouTube. Thanks :)

  • @raj_kundalia
    @raj_kundalia 6 months ago

    It was helpful and made sense in terms of my current project too. Thank you!

  • @toshioikene8200
    @toshioikene8200 2 years ago +1

    Thanks for that clear concise explanation man.

  • @katkyle8169
    @katkyle8169 2 years ago

    This is a great video and you are a great communicator. I was having a lot of trouble communicating this to my leadership team and this was really helpful!

  • @damianztone
    @damianztone 2 months ago

    amazing video- concise and very well explained

  • @arielspalter7425
    @arielspalter7425 8 months ago

    Super useful video. Much appreciated!

  • @JH-py9wf
    @JH-py9wf 2 years ago +7

    Thanks Alex. Would love a video on how you query data from data warehouses/data lakes and how that method differs from SQL

    • @janpedersen5780
      @janpedersen5780 2 years ago +1

      A data warehouse is queried using SQL, if the data warehouse is built using a relational database technology, such as MySQL, Oracle RDBMS, SQL Server. You usually don't query data lakes. They are mainly used for AI/Data Science purposes, and since they also contain non-structured data, you would not use SQL (Structured Query Language) which is designed for relational databases.
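
To make the first point in the reply above concrete, here is a minimal sketch of the kind of summarizing SQL query typically run against a relational data warehouse. It uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in for a real warehouse; the `sales` table, its columns, and the function name are hypothetical.

```python
import sqlite3


def monthly_revenue(rows):
    """Load (sale_date, region, amount) rows and summarize revenue per month and region.

    sale_date is expected as ISO text such as "2024-01-05".
    """
    conn = sqlite3.connect(":memory:")  # stand-in for a warehouse connection
    try:
        conn.execute("CREATE TABLE sales (sale_date TEXT, region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
        # Aggregate detail rows into a summary, as a reporting query would.
        return conn.execute(
            "SELECT substr(sale_date, 1, 7) AS month, region, SUM(amount) AS revenue "
            "FROM sales "
            "GROUP BY month, region "
            "ORDER BY month, region"
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [
        ("2024-01-05", "EU", 10.0),
        ("2024-01-20", "EU", 5.0),
        ("2024-02-01", "US", 7.5),
    ]
    for row in monthly_revenue(sample):
        print(row)
```

The same `GROUP BY` style of query would run largely unchanged on the relational engines named in the reply; only the connection setup and some date functions differ between products.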

  • @mangaart3366
    @mangaart3366 1 year ago +1

    Dope video, thanks!

  • @rvipinkumar
    @rvipinkumar 2 years ago +1

    Super easy explanation. Thanks Alex.

  • @JaeHoYun
    @JaeHoYun 2 years ago +1

    Thanks Alex. This is simple and useful for me

  • @jhewitthunt
    @jhewitthunt 9 months ago +9

    Nice simple video. Good job. Only negative comment is, there's no need to constantly show your face which blocks part of what you're trying to show - diagram, title, description etc...

  • @tobiasrekker5376
    @tobiasrekker5376 17 days ago

    Thank you very much, this is a very clear explanation.

  • @NomNomCactus
    @NomNomCactus 2 years ago +1

    Such a good and to-the-point explanation. Thanks.

  • @shadowitself
    @shadowitself 2 years ago

    clear, straightforward, fantastic ;)
    thx

  • @stephenjones6260
    @stephenjones6260 1 year ago

    Very well done Alex!

  • @durgaprasadvadlamoodi1271
    @durgaprasadvadlamoodi1271 10 months ago

    Thanks for explaining, very clear now

  • @ruchaj.5550
    @ruchaj.5550 1 year ago

    Very informative, simple, easy to understand. Thanks a lot.

  • @guilboy
    @guilboy 6 months ago

    You're really good at walking people through concepts 👏👏👏👏👏👏

  • @burinyuybongfen7857
    @burinyuybongfen7857 3 months ago

    Very well explained. Thank you

  • @keifer7813
    @keifer7813 2 years ago

    Awesome video. I got a question about a data lake in an interview recently and was stumped, so this was helpful.

  • @mohamedeliwa1380
    @mohamedeliwa1380 7 months ago

    Thank you. That helped me to understand and take care of the opportunities I'm handling.

  • @I_am_smooth_as_butter
    @I_am_smooth_as_butter 1 year ago +1

    Awesome explanation thanks

  • @mzkhan1576
    @mzkhan1576 9 months ago

    thank you Sir Alex. Great and concise video.

  • @E_Gaks
    @E_Gaks 2 years ago +1

    Really instructive video ! Thank you Alex for these details. I learned a lot.

  • @amitavapaul118
    @amitavapaul118 2 years ago

    Very comprehensive video. Thanks!

  • @harshalgavali
    @harshalgavali 1 year ago

    very helpful! thanks :)

  • @is.3846
    @is.3846 2 years ago +1

    Alex always presents points in clear fashion.

  • @vijayhpune
    @vijayhpune 4 months ago

    The way you explained was very simple and easy to understand
    Thanks

  • @alainb4734
    @alainb4734 1 year ago

    Very well explained. Thank you for sharing.

  • @Shoto_UK
    @Shoto_UK 1 year ago

    Super helpful thanks.

  • @kevinmcinturf8976
    @kevinmcinturf8976 1 year ago

    This is great!

  • @LLhawksley
    @LLhawksley 5 months ago

    best explanation of these 3 that I've seen

  • @namanbhayani1016
    @namanbhayani1016 10 months ago

    Very well explained!

  • @out_of_ends
    @out_of_ends 2 years ago

    Very informative. Thanks for sharing

  • @meryemLux
    @meryemLux 1 year ago +1

    Thanks, that was really well explained :)

  • @richbashaw9240
    @richbashaw9240 11 months ago

    thank you. great comparison and explanation

  • @majidrasouli2841
    @majidrasouli2841 1 year ago

    Much appreciated
    keep up the great actions

  • @whoopinyou
    @whoopinyou 1 year ago

    Great video. Subscribed.

  • @mehmetkaya4330
    @mehmetkaya4330 1 year ago +1

    Great explanation! Thanks

  • @dmitrya9435
    @dmitrya9435 1 year ago

    Cool stuff, short and informative.

  • @JustNavika.k
    @JustNavika.k 2 years ago

    Awesome explanation. Thank you.

  • @danicoleb5394
    @danicoleb5394 1 year ago +1

    I wish Alex would make his own course. Lol! Everything is always so easy to understand.

  • @ifechukwumaduabuchi6455
    @ifechukwumaduabuchi6455 2 years ago +1

    Thanks for this video

  • @sammail96
    @sammail96 4 months ago

    Very great explanation

  • @marieriokeme3010
    @marieriokeme3010 2 months ago

    Amazing explanation

  • @nikhilgoyal007
    @nikhilgoyal007 1 year ago +1

    wow! thank you!!!

  • @rajibroy1170
    @rajibroy1170 2 years ago

    Like your Explanation Sir.

  • @homaiphuonganh131
    @homaiphuonganh131 9 months ago

    thanks Alex, love your channel and very clear explanation of critical concepts :) - can you also cover data lakehouse?

  • @osPA78
    @osPA78 1 year ago +1

    Clear cut explanation!!! Thank you!!!

  • @Studio-oy6cu
    @Studio-oy6cu 2 years ago +1

    Such a good recommendation by youtube !
    - Thnx Alex.

  • @yochai4561
    @yochai4561 2 years ago +1

    Thanks for the explanation!
    I would expect to see some examples for each one, it makes the terms much more 'close' to us..
    Keep going with this channel, it's super helpful! Thanks again

  • @andrij.demianczuk
    @andrij.demianczuk 2 years ago +1

    Also have a look at Delta Lake. It’s about providing warehousing capabilities on a data lake, relying on a Hive metastore and Parquet normalization for column features. Delta Lake is OSS and becoming more widely supported :)

  • @fabianaltendorfer11
    @fabianaltendorfer11 2 years ago +1

    Great. I really like your videos

  • @lazyGirl014
    @lazyGirl014 1 year ago

    Thanks for your video, it is really clear and helpful information.

  • @drew315ful
    @drew315ful 2 years ago

    Hi Alex... Great video... I have a question. Is it right to say that Hadoop or HDFS is a data lake?

  • @CharlieBasta
    @CharlieBasta 2 years ago

    Great video.

  • @shibugeorge1618
    @shibugeorge1618 11 months ago

    Hi Alex, thanks for the video. It is very clear. One question: what about the schema for a data lake? Where is it stored?

  • @paulofernandodemello9958
    @paulofernandodemello9958 1 year ago

    very nice content!