Dremio
  • 566 videos
  • 9,361,474 views
Whoop's Carlos Peralta on Building a Data-Driven Culture at Whoop and Moderna
Data Disruptors - A Podcast for Data Leaders. Listen to Carlos Peralta of Whoop, one of the data leaders disrupting technologies and giving the company access to cutting-edge insights, on Building a Data-Driven Culture at Whoop and Moderna: Collaboration and Alignment.
In this episode, Tomer Shiran interviews Carlos Peralta, MLOps & Data Eng Global Director at WHOOP, who discusses his experience building data platforms and driving a data-driven culture. He emphasizes the importance of collaboration between technical teams and business stakeholders, as well as the need for data quality and integrity. Carlos also highlights the challenges of model interpretability and fairness, as well as automated feature engineerin...
Views: 52

Videos

EP57 - From Hadoop & Hive to Minio & Dremio: Moving Towards a Next Gen Data Architecture
119 views • 12 hours ago
Legacy data platforms often fall short of the performance, processing and scaling requirements for robust AI/ML initiatives. This is especially true in complex multi-cloud (public, private, edge, airgapped) environments. The combined power of MinIO and Dremio creates a data lakehouse platform that overcomes these challenges, delivering scalability, performance and efficiency to ensure successfu...
Hands-on with Dremio #5 - Dremio with Python and BI Tools
112 views • 16 hours ago
In this video series, Dremio's Alex Merced tours the basics of working with the Dremio Lakehouse Platform. Repo with Environment: github.com/developer-advocacy-dremio/dremio-demo-env-092024 Get Started with Dremio: www.dremio.com/get-started?
Hands-on with Dremio #4 - Apache Iceberg and Git-for-Data
77 views • 16 hours ago
In this video series, Dremio's Alex Merced tours the basics of working with the Dremio Lakehouse Platform. Repo with Environment: github.com/developer-advocacy-dremio/dremio-demo-env-092024 Get Started with Dremio: www.dremio.com/get-started?
Hands-on with Dremio #3 - Data Reflections
59 views • 16 hours ago
In this video series, Dremio's Alex Merced tours the basics of working with the Dremio Lakehouse Platform. Repo with Environment: github.com/developer-advocacy-dremio/dremio-demo-env-092024 Get Started with Dremio: www.dremio.com/get-started?
Hands-on with Dremio #2 - Preparing Data Across Sources (Joins, Type Conversions, Drop Columns, etc)
46 views • 16 hours ago
In this video series, Dremio's Alex Merced tours the basics of working with the Dremio Lakehouse Platform. Repo with Environment: github.com/developer-advocacy-dremio/dremio-demo-env-092024 Get Started with Dremio: www.dremio.com/get-started?
Hands-on with Dremio #1 - Setup and Source Connections
70 views • 16 hours ago
In this video series, Dremio's Alex Merced tours the basics of working with the Dremio Lakehouse Platform. Repo with Environment: github.com/developer-advocacy-dremio/dremio-demo-env-092024 Get Started with Dremio: www.dremio.com/get-started?
Dremio Hands-On Demo #8 - BI Dashboards with Dremio
52 views • 19 hours ago
In this video series, Dremio's Alex Merced tours the basics of working with the Dremio Lakehouse Platform. Repo with Environment: github.com/developer-advocacy-dremio/dremio-demo-env-092024 Get Started with Dremio: www.dremio.com/get-started?
Dremio Hands-on Demo #7 - Using Dremio in Python
54 views • 19 hours ago
In this video series, Dremio's Alex Merced tours the basics of working with the Dremio Lakehouse Platform. Repo with Environment: github.com/developer-advocacy-dremio/dremio-demo-env-092024 Get Started with Dremio: www.dremio.com/get-started?
Dremio Hands-On Demo #6 - Apache Iceberg & Git-for-Data
36 views • 19 hours ago
In this video series, Dremio's Alex Merced tours the basics of working with the Dremio Lakehouse Platform. Repo with Environment: github.com/developer-advocacy-dremio/dremio-demo-env-092024 Get Started with Dremio: www.dremio.com/get-started?
Dremio Hands-On Demo #5 - Reflections
68 views • 19 hours ago
In this video series, Dremio's Alex Merced tours the basics of working with the Dremio Lakehouse Platform. Repo with Environment: github.com/developer-advocacy-dremio/dremio-demo-env-092024 Get Started with Dremio: www.dremio.com/get-started?
Dremio Hands-On Demo #4 - Preparing Data (Change Data Types, Drop Columns, Joins)
47 views • 19 hours ago
In this video series, Dremio's Alex Merced tours the basics of working with the Dremio Lakehouse Platform. Repo with Environment: github.com/developer-advocacy-dremio/dremio-demo-env-092024 Get Started with Dremio: www.dremio.com/get-started?
Dremio Hands-On Demo #3 - Connecting Sources
36 views • 19 hours ago
In this video series, Dremio's Alex Merced tours the basics of working with the Dremio Lakehouse Platform. Repo with Environment: github.com/developer-advocacy-dremio/dremio-demo-env-092024 Get Started with Dremio: www.dremio.com/get-started?
Dremio Hands-On Demo #2 - Running SQL & Creating Views
53 views • 19 hours ago
In this video series, Dremio's Alex Merced tours the basics of working with the Dremio Lakehouse Platform. Repo with Environment: github.com/developer-advocacy-dremio/dremio-demo-env-092024 Get Started with Dremio: www.dremio.com/get-started?
Dremio Hands-on Demo #1 - Environment Setup (Docker Compose)
72 views • 19 hours ago
Dremio Hands-on Demo #1 - Environment Setup (Docker Compose)
Special Edition 1 - Apache Iceberg Q&A
237 views • 14 days ago
Special Edition 1 - Apache Iceberg Q&A
Hands-on with Apache Iceberg on Your Laptop: Deep Dive with Apache Spark, Nessie, Minio, Dremio...
412 views • 14 days ago
Hands-on with Apache Iceberg on Your Laptop: Deep Dive with Apache Spark, Nessie, Minio, Dremio...
EP56 - What’s New in Dremio: Improved Automation, Performance + Catalog for Iceberg Lakehouses
133 views • 14 days ago
EP56 - What’s New in Dremio: Improved Automation, Performance + Catalog for Iceberg Lakehouses
Dremio Demo - Federated Queries Joining Mongo and Postgres Data (Breaking Data Silos)
119 views • 21 days ago
Dremio Demo - Federated Queries Joining Mongo and Postgres Data (Breaking Data Silos)
Real-Time Analytics Across Data Sources Using Dremio
295 views • 21 days ago
Real-Time Analytics Across Data Sources Using Dremio
The Iceberg REST Catalog - Meetup - Tampa Bay Data Engineers Group
202 views • A month ago
The Iceberg REST Catalog - Meetup - Tampa Bay Data Engineers Group
End-to-End Data Engineering from CSV/JSON/Parquet to Apache Iceberg to Apache Superset Dashboard
441 views • A month ago
End-to-End Data Engineering from CSV/JSON/Parquet to Apache Iceberg to Apache Superset Dashboard
A Git Like Experience for Data Lakes
174 views • A month ago
A Git Like Experience for Data Lakes
Cyber Lakehouse for the AI Era, ZTA and Beyond
100 views • A month ago
Cyber Lakehouse for the AI Era, ZTA and Beyond
EP55 - Unite Data Across Dremio, Snowflake, Iceberg, and Beyond
178 views • A month ago
EP55 - Unite Data Across Dremio, Snowflake, Iceberg, and Beyond
EP54 - Mastering Semantic Layers: The Key to Data-Driven Innovation
415 views • A month ago
EP54 - Mastering Semantic Layers: The Key to Data-Driven Innovation
Tampa Bay Data Engineers Group - July 2024 - Dremio's Reflections
204 views • 2 months ago
Tampa Bay Data Engineers Group - July 2024 - Dremio's Reflections
Unifying On-Prem and Cloud Data with Dremio: Cloud Data on Snowflake + On-Prem Data with Minio
201 views • 2 months ago
Unifying On-Prem and Cloud Data with Dremio: Cloud Data on Snowflake + On-Prem Data with Minio
Apache Iceberg Lakehouse crash course
739 views • 2 months ago
Apache Iceberg Lakehouse crash course
EP53 - Build the next-generation Iceberg lakehouse with Dremio and NetApp
374 views • 2 months ago
EP53 - Build the next-generation Iceberg lakehouse with Dremio and NetApp

Comments

  • @santhoshsandySanthosh
    @santhoshsandySanthosh 1 day ago

    Is that Hive table sub-partitioned on a date string, and then further bucketed? That would drastically reduce those 3 million files. Without more info on that structure, I can't agree with the performance issues in this example.

  • @rosko1971
    @rosko1971 5 days ago

    The audio is very, very soft, even at 100%.

  • @lynnwilliam
    @lynnwilliam 8 days ago

    Can you do more of an intro, telling people what your tool does? The tool looks amazing, but it was only at the end that I found out what it did.

  • @lynnwilliam
    @lynnwilliam 8 days ago

    Audio is too low; I can't hear much even at max volume, and it's crackly.

    • @Dremio
      @Dremio 8 days ago

      May have to re-record this; I'm not sure which of my microphones it was using such that it's overdriving the audio. I'll double-check and re-record another iteration of this series, probably as one longer video.

  • @guilhermeranieri8445
    @guilhermeranieri8445 14 days ago

    Great content as always! Can you do a demo covering the engines inside the Dremio config?

  • @elvisasihene2403
    @elvisasihene2403 15 days ago

    Thanks Alex! I have been watching most of your video and blog tutorials. You have really improved my knowledge of the data lakehouse, Apache Iceberg, and Dremio. However, I can't find any detailed enterprise BI cloud solution using Dremio Sonar and Arctic. Can you please take us through an enterprise cloud BI solution using Azure Data Lake Gen2 as the storage layer? Otherwise, could you point me to an existing blog on the Dremio website? Regarding this tutorial, do I really need Spark, since I can accomplish all my tasks in Dremio? Cheers!

    • @Dremio
      @Dremio 15 days ago

      Nope, I could do everything in Dremio or other tools. I was just demonstrating how different tools can work with a single copy of the data in Iceberg. Regarding BI tools, what tool are you looking to use? When using Dremio, connecting to a BI tool is exactly the same regardless of storage layer. Essentially, you connect your sources to Dremio, then Dremio to your BI tool, and Dremio can serve all your data to that BI tool over a single connection.

    • @elvisasihene2403
      @elvisasihene2403 8 days ago

      @@Dremio Well noted and thank you! My BI tool is PowerBI.

  •  16 days ago

    So to understand/use Iceberg you also need 20 other components on top of running Hadoop and all of the things that it requires? Seems like an... "improvement"? Hadoop should never have been turned into a data warehouse to begin with.

    • @Dremio
      @Dremio 16 days ago

      Hadoop wasn't used in this video, and you don't need all the tools shown. A variety of tools was used in the demo to demonstrate the portability of the data.

  • @nawazishkhan9230
    @nawazishkhan9230 17 days ago

    Very poor sound quality!

  • @JulioTorresM
    @JulioTorresM 17 days ago

    First comment

  • @rembautimes8808
    @rembautimes8808 17 days ago

    Very nice and concise 😂

  • @SonuKumar-fn1gn
    @SonuKumar-fn1gn 19 days ago

    Great video ❤

  • @rembautimes8808
    @rembautimes8808 20 days ago

    Very nice tutorial; I'm beginning to see how branching is useful.

  • @user-if2kq8nh8m
    @user-if2kq8nh8m 23 days ago

    This was super helpful!

  • @clintonchikwata4049
    @clintonchikwata4049 26 days ago

    Amazing ---

  • @ahmedshamma
    @ahmedshamma A month ago

    The instructor talks too fast, which is difficult to cope with, especially for English-as-a-second-language speakers. I advise the instructor to slow down so he can convey his message clearly. All the same, thank you for the architectural lecture about the data lakehouse, as it is a confusing term for many of us.

  • @gregorywpower
    @gregorywpower A month ago

    Pretty awesome talk! I would love to see geospatial formats be supported in Apache Iceberg, but I'm glad that this fork exists!

  •  A month ago

    There are so many tools used here my head hurts. For "getting data into the DWH," how can this possibly be seen as better than classic Parquet files in Hive-partitioned storage with an external table on top? In GCP this is 5 lines of SQL code; in Redshift you might have to add Glue to the mix. And then any SQL-running tool can do the rest inside the database/DWH. In 99% of companies there are at best 3-4 data engineers; managing all this plus data modelling would mean nothing ever gets done.

  • @ettaroo
    @ettaroo A month ago

    Alex is the man - explains these things and the context so well.

  • @Matt-n9l1l
    @Matt-n9l1l A month ago

    Excellent overview - thanks! My question is about how best to get that first "raw copy" into Iceberg. Using the SQL connector I can hook Dremio up to MS SQL data sources, and they are then available for reflections - but I don't see any clear way to copy the data into an Iceberg table that would persist (in case the source data became unavailable). Am I missing something?

    • @Dremio
      @Dremio A month ago

      If you connect an Iceberg catalog source (Hive, Nessie) you can write data to it, so you can use syntax like CTAS to write the data, and it will be persisted in the storage configured when you connected the catalog. You'd have to orchestrate workloads so it updates the new Iceberg table from the source system, or you can use an external tool to land the data as Iceberg in your lake, like Spark, Flink, Upsolver, Airbyte, etc.

    • @guilhermeranieri8445
      @guilhermeranieri8445 A month ago

      @@Dremio Using Airbyte to do the ingestion, will the first Iceberg table created using Nessie automatically update itself?

    • @Dremio
      @Dremio A month ago

      @@guilhermeranieri8445 Depends on how you are doing it: how are you ingesting with Airbyte, directly into Iceberg?

    • @guilhermeranieri8445
      @guilhermeranieri8445 15 days ago

      @@Dremio Using MinIO. I send data to MinIO and manipulate the data in Spark, but I have already created views in Dremio for testing.

  • @smokindave74
    @smokindave74 A month ago

    How does Dremio support the Iceberg REST catalog? I cannot seem to find a way, other than Nessie, to create an S3-resident Iceberg table in Dremio.

    • @Dremio
      @Dremio A month ago

      You could also use AWS Glue; the REST catalog connector in Dremio is on the horizon.

  • @recs8564
    @recs8564 2 months ago

    Why is the spark configuration with all of the lakehouse services hardcoded in a notebook? Shouldn’t these configurations be incorporated into the docker image you’re using for Spark?

    • @Dremio
      @Dremio 2 months ago

      I do that primarily for educational purposes, to help people learn the Spark configs so they can apply the learning to their environment. Many tutorials abstract the configs away, and then when people try to apply what they learned they don't know what the configs are or where they come from. - Alex

  • @recs8564
    @recs8564 2 months ago

    This is so nice. Now I don't have to pay for Databricks in order to learn Spark!

  • @MrKamalNeel
    @MrKamalNeel 2 months ago

    Why did they create the manifest list & manifest files as two separate layers? They could have just created a metadata file & manifest file for each snapshot. What challenges could we have if we didn't have the manifest list in the Iceberg design?

    • @Dremio
      @Dremio 2 months ago

      The reason is that, as tables get larger, the manifest list would itself get larger and take longer to traverse if it listed files directly. By breaking it up into manifests, you can more efficiently scan only the portions of the file listing you need for the query. Using partition pruning, I may have 100 manifests in a snapshot but only need to scan the 10 relevant to the query, resulting in much faster scan planning. Also, these manifests can be reused, so you use less storage space by not having to rewrite lists of files that have already been listed in a pre-existing manifest.
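The pruning benefit described in this reply can be sketched as a toy model (not real Iceberg internals; the manifest counts and file names are invented for illustration):

```python
# Toy model of manifest-level pruning. Each manifest lists 1,000 data
# files and covers one "day" partition; the manifest list records each
# manifest's partition value, so scan planning can skip whole manifests
# without ever reading their file entries.
manifest_list = [
    {"partition_day": day, "files": [f"file-{day}-{i}.parquet" for i in range(1000)]}
    for day in range(100)
]

def plan_scan(manifest_list, wanted_days):
    """Prune at the manifest level, then collect entries from survivors."""
    manifests_read = 0
    files = []
    for manifest in manifest_list:
        if manifest["partition_day"] not in wanted_days:
            continue  # pruned: its 1,000 entries are never read
        manifests_read += 1
        files.extend(manifest["files"])
    return manifests_read, files

# A query touching 10 of 100 days reads only 10 manifests (10,000 entries)
# instead of one flat 100,000-entry listing.
manifests_read, files = plan_scan(manifest_list, set(range(10)))
print(manifests_read, len(files))
```

The same shape explains manifest reuse: an unchanged manifest's entry in a new manifest list is just a pointer, not a rewritten file listing.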

  • @AshikaUmanga
    @AshikaUmanga 2 months ago

    Thanks for the tutorial. If I use CDC-based ingestion as the data source, where does the Spark job for writing the Iceberg table (steamWrite) run? Is it inside Airbyte?

  • @ledinhanhtan
    @ledinhanhtan 2 months ago

    Thanks!!

  • @user-if2kq8nh8m
    @user-if2kq8nh8m 2 months ago

    This was really helpful, thanks!

  • @swaroopsuki1322
    @swaroopsuki1322 2 months ago

    When we expire a snapshot, if our table was created copy-on-write or merge-on-read, what happens in that case?

    • @Dremio
      @Dremio A month ago

      Same thing: if a file is associated with a valid snapshot, it will not be deleted.
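The rule in this reply (expiring a snapshot never deletes a file that another valid snapshot still references) can be sketched as follows; the snapshot IDs and file names are invented for illustration:

```python
# Each snapshot maps to the set of files it references. Expiring one
# snapshot only frees files that no remaining snapshot still points at,
# whether they are data files (COW) or delete files (MOR).
snapshots = {
    "s1": {"a.parquet", "b.parquet"},
    "s2": {"b.parquet", "c.parquet"},          # shares b.parquet with s1
    "s3": {"c.parquet", "d-deletes.parquet"},  # a MOR delete file follows the same rule
}

def expire(snapshots, snapshot_id):
    """Remove a snapshot and return the files now safe to delete."""
    removed = snapshots.pop(snapshot_id)
    still_referenced = set().union(*snapshots.values()) if snapshots else set()
    return removed - still_referenced

deletable = expire(snapshots, "s1")
print(sorted(deletable))  # b.parquet survives because s2 still references it
```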

  • @AhmedShamma-q3n
    @AhmedShamma-q3n 2 months ago

    The instructor is knowledgeable and able to address the new technology at a detailed pace. The challenge I can see is that the instructor's pronunciation is not clear at some points. I advise the instructor to articulate the ideas with clear pronunciation, as sometimes he is too fast and sometimes too slow.

  • @roman87ljp
    @roman87ljp 2 months ago

    Great presentation! Just one question: in Dremio, what would be the scenario where a view WON'T need reflections activated (when the view has many columns and high volume)?

    • @Dremio
      @Dremio 2 months ago

      I would actually default to not activating reflections; the Dremio engine is very performant on all sources directly. Dremio has a reflection recommender feature that will analyze query history, identify reflections you should activate, and even give you the SQL to create them.

  • @pilarriush.9373
    @pilarriush.9373 2 months ago

    Question: do you need to set up a connection from Dremio to Docker? I get the error "Invalid staging location provided" when uploading the files. Please help!

    • @Dremio
      @Dremio 2 months ago

      Do you have more details about what you are trying to connect to from Dremio? If you are just trying to evaluate it with Docker, follow the instructions in this blog: www.dremio.com/blog/intro-to-dremio-nessie-and-apache-iceberg-on-your-laptop/

  • @neilgodfree22
    @neilgodfree22 2 months ago

    How do you persist the data using volumes? When the container is deleted, I don't want to recreate everything. Please send me a link to a solution if there is one.

    • @Dremio
      @Dremio 2 months ago

      Look through the different docker-compose file versions at github.com/developer-advocacy-dremio/dremio-compose; you may find some examples that help.
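For the volume question above, a named volume mounted at Dremio's data directory is the usual approach. A minimal docker-compose sketch follows; the image tag and mount path are assumptions based on the Dremio OSS image, so verify them against the repo's compose files:

```yaml
# Hypothetical fragment: the named volume survives `docker compose down`,
# so Dremio's metadata and uploads persist across container recreation.
services:
  dremio:
    image: dremio/dremio-oss:latest
    ports:
      - "9047:9047"     # web UI
      - "32010:32010"   # Arrow Flight
    volumes:
      - dremio_data:/opt/dremio/data

volumes:
  dremio_data:
```

The data is only lost if the named volume itself is removed (for example with `docker compose down -v`).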

  • @AxelNtwari
    @AxelNtwari 2 months ago

    nice overview

  • @stephanierandall1170
    @stephanierandall1170 3 months ago

    🐬 fan already, next please show their corporate headquarters

  • @stephanierandall1170
    @stephanierandall1170 3 months ago

    🔥

  • @zmihayl
    @zmihayl 3 months ago

    Your voice is like an angel to fall asleep😇

  • @santhoshreddykesavareddy1078
    @santhoshreddykesavareddy1078 3 months ago

    Hi, thanks, this is really great information for starting with Apache Iceberg. But I have a question: when modern databases are already doing this with such advanced technology to prune and scan the data, why would we need to store the data in file formats instead of directly loading it into a table?

    • @Dremio
      @Dremio 3 months ago

      When you start talking about 10TB+ datasets, you run into issues over whether a database can hold the dataset, and performantly. Also, different purposes need different tools, so you need your data in a form that can be used by different teams with different tools.

    • @Dremio
      @Dremio 3 months ago

      Also, with data lakehouse tables there doesn't have to be any running database server when no one is querying the dataset, since they are just files in storage, while traditional database tables need a persistently running environment.

    • @santhoshreddykesavareddy1078
      @santhoshreddykesavareddy1078 3 months ago

      @@Dremio wow! Now I have got full clarity. Thank you so much for your response.

    • @santhoshreddykesavareddy1078
      @santhoshreddykesavareddy1078 3 months ago

      @@Dremio cost saving. Thanks for the tip 😀.

  • @intjprogrammer3877
    @intjprogrammer3877 3 months ago

    Thanks for the great video. Question: when we first run the DELETE command in the lesson2 branch, does the data also appear in MinIO? Like, does MinIO object storage show the lesson2 branch and the main branch separately? I am curious about this because in MinIO there are only data and metadata partitions, and there is no directory for the main vs. lesson2 branches.

    • @intjprogrammer3877
      @intjprogrammer3877 3 months ago

      I think I got it now. The storage layer does not have the concept of branches, so the warehouse/data/ directory stores Parquet files for both the lesson2 and main branches. I can tell this because there are files with different timestamps associated with my SQL operations in each branch.
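The commenter's conclusion can be sketched as a toy Nessie-like catalog: branches are just named pointers to table-metadata versions, while every branch's data files sit together under one warehouse/data/ prefix. All names below are invented for illustration:

```python
# Object storage: one flat set of data files shared by all branches.
warehouse_data = set()

# Catalog: branch name -> {table name -> metadata version}.
catalog = {"main": {"names": "metadata-v1.json"}}

def create_branch(catalog, new_branch, source_branch):
    # Branching copies pointers only; no data files are duplicated.
    catalog[new_branch] = dict(catalog[source_branch])

def commit(catalog, branch, table, new_files, new_metadata):
    warehouse_data.update(new_files)       # files land in shared storage
    catalog[branch][table] = new_metadata  # only this branch's pointer moves

create_branch(catalog, "lesson2", "main")
commit(catalog, "lesson2", "names", {"names-0001.parquet"}, "metadata-v2.json")

# main still sees v1, lesson2 sees v2, and one directory holds all files.
print(catalog["main"]["names"], catalog["lesson2"]["names"], sorted(warehouse_data))
```

This is why the bucket shows no branch directories: branch state lives entirely in the catalog's pointers, not in the storage layout.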

  • @nooh_jl
    @nooh_jl 3 months ago

    Thank you so much! I have a question. I'm wondering if there might be any way to run these procedures automatically in Iceberg. Do I have to do these things manually every time?

    • @Dremio
      @Dremio 3 months ago

      Dremio Cloud has the ability to automate these types of operations.

  • @nooh_jl
    @nooh_jl 3 months ago

    It's really helpful for me!! Thank you so much.

  • @ZaidAlig
    @ZaidAlig 3 months ago

    Hi Alex, really thankful to you for such a nice explanation and hands-on. I got stuck at 'CREATE BRANCH IF NOT EXISTS lesson2 IN nessie'. This keeps failing with the error message "syntax error at or near 'BRANCH'". Am I missing something? Kindly assist.

    • @Dremio
      @Dremio 3 months ago

      If you want, PM me (Alex Merced) your Spark configs. Usually it's a typo or an update that needs to be made to the Spark configs. Spark can be very touchy on the config side, which is one reason using Dremio for a lot of Iceberg operations is so nice (much easier).

  • @kenhung8333
    @kenhung8333 3 months ago

    Awesome video!! At 3:18, when explaining the different delete formats, I have a question regarding the implementation: as the delete mode only accepts MOR or COW, how exactly do I specify whether the delete operation uses an equality delete or a positional delete?

    • @Dremio
      @Dremio 3 months ago

      It's mainly based on the engine; most engines will use position deletes, but streaming platforms like Flink will use equality deletes to keep write latency to a minimum.
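The two delete-file flavors mentioned in this reply can be sketched with a toy model (not Iceberg's actual file formats): a position delete names row positions within a specific data file, while an equality delete names column values to drop, which is why a streaming writer can emit one without first reading the existing data:

```python
data_file = [  # rows of one data file, in position order
    {"id": 1, "name": "ann"},
    {"id": 2, "name": "bob"},
    {"id": 3, "name": "cat"},
]

def apply_position_deletes(rows, deleted_positions):
    """Position delete: drop rows by their ordinal in the data file."""
    return [row for pos, row in enumerate(rows) if pos not in deleted_positions]

def apply_equality_deletes(rows, predicates):
    """Equality delete: drop rows whose columns match any predicate."""
    return [row for row in rows
            if not any(all(row[col] == val for col, val in p.items())
                       for p in predicates)]

pos_result = apply_position_deletes(data_file, {1})          # drop the row at position 1
eq_result = apply_equality_deletes(data_file, [{"id": 3}])   # drop rows where id == 3
print(pos_result)
print(eq_result)
```

Writing the position variant requires knowing where the row sits in which file (a read at write time); the equality variant only records the predicate and defers the matching to readers.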

  • @mdafazal12
    @mdafazal12 3 months ago

    Very well explained... great job, Dipankar.

  • @agrohe21
    @agrohe21 3 months ago

    Great explanation and details

  • @joeingle1745
    @joeingle1745 3 months ago

    Great article, Alex. Slight issue creating a view in Dremio: I get the following exception: "Validation of view sql failed. Version context for table nessie.names must be specified using AT SQL syntax". Nothing obvious in the console output; any ideas?

    • @AlexMercedCoder
      @AlexMercedCoder 3 months ago

      That means the table is in Nessie and it needs to know which branch you're using, so it would be AT BRANCH main.

    • @joeingle1745
      @joeingle1745 3 months ago

      @@AlexMercedCoder Thanks Alex. This seems to be a limitation of the 'Save as View' dialogue, as it doesn't allow me to do this and it doesn't default to the branch whose context you're currently in.