- Videos: 185
- Views: 110,403
Upsolver
Joined Nov 7, 2017
Upsolver is the cloud-native data movement solution that’s designed to make it easy for data developers to deliver high-volume, complex production data to downstream users, without delay or quality blind spots. By decoupling compute instances from data and metadata storage, we guarantee highly reliable ingestion workflows that are fully auto-healing and never miss a beat.
Upsolver also provides a shared storage management layer built on Apache Iceberg, fully-fitted with declarative-SQL powered workflows, continuous optimization, and data and metadata observability.
The Future of Apache Iceberg | Experts Panel in London
Moderated by Santona Tuli, Ph.D., this conversation dives deep into the trends, challenges, and innovations shaping the next phase of Iceberg’s evolution. From metadata management to real-world use cases, this panel covers it all, offering a unique blend of technical expertise and forward-looking insights.
Recorded live at the Chill Data Summit in London, our Panelists included:
🔵 Ryan Dolley, Vice President of Product Strategy at GoodData
🔵 Chris Tabb, Co-Founder & CCO at Leit Data
🔵 Hugo Lu, Founder at Orchestra
🔵 Yoni Eini, CTO & Co-Founder at Upsolver
🔵 JB Onofré, Board Member of the Apache Software Foundation & Principal Software Engineer at Dremio
Whether you're an engineer, data architect...
Views: 129
Videos
The Importance of Data Modeling in Decision Making & System Design | Presentation by Seda Kocak
184 views · 2 months ago
In this session from the Chill Data Summit in London, Seda Kocak, Senior Data Analyst at The Dot Collective, explores the importance of data modeling when it comes to making decisions and designing systems. Seda explains how effective data modeling lays the foundation for sound decision-making and robust system architecture, helping organizations unlock the full potential of their data. Through rea...
Change Data Capture to Apache Iceberg | Presentation by Santona Tuli Ph.D.
538 views · 2 months ago
In this engaging presentation from the Chill Data Summit in London, Santona Tuli tackles the topic of Change Data Capture (CDC) in Apache Iceberg. CDC plays a crucial role in real-time data processing, enabling data updates and accurate analytics in dynamic environments. Santona walks us through how Apache Iceberg supports CDC to handle data changes efficiently and reliably, covering key use ca...
Analytical Data Transformations with Apache Iceberg Materialized Views | Presentation by Jan Kaul
183 views · 2 months ago
Watch this presentation from Jan Kaul, Founder and CEO at Dashbook, as he takes a deep dive into analytical data transformations with Iceberg materialized views in this informative session recorded live at the Chill Data Summit in London. In this presentation, Jan explores how Apache Iceberg leverages materialized views to simplify and optimize data transformations, providing a seamless experie...
The Future of Apache Iceberg & Navigating it with Polaris | Presentation by JB Onofré
328 views · 2 months ago
Watch this keynote from JB Onofré, Principal Software Engineer at Dremio and Member of the Board of Directors of The Apache Software Foundation. JB opened our Chill Data Summit event in London with his presentation on the future of Apache Iceberg and Apache Polaris (Incubating). In this talk, JB explains the components of the data lakehouse, exploring how the query engine, catalog, table format...
What Does the Future Hold for Apache Iceberg? | Experts Panel in San Francisco
104 views · 2 months ago
Enjoy the panel discussion from the Chill Data Summit in San Francisco, where leading minds in the data and open-source space debate the future of Apache Iceberg, a cutting-edge solution for managing large-scale data lakes. Moderated by Santona Tuli, Ph.D., this conversation dives into the trends, challenges, and innovations shaping the next phase of Iceberg’s evolution. From metadata managemen...
Apache Iceberg and Building in the Open | Presentation by Holden Karau
117 views · 2 months ago
In this engaging and honest talk, @HoldenKarau, an open source engineer, speaker, author, and Apache Spark committer, shares her wealth of experience working on open source software (OSS) projects. With a background in contributing to and maintaining some of the most widely used open-source frameworks, Holden offers a unique perspective on the realities of working in the OSS community. In this se...
REST Catalogs in Apache Iceberg | Presentation by Lisa Cao
409 views · 2 months ago
In this insightful talk, Lisa N. Cao, a leading contributor to the data community, explores the emerging role of REST catalogs in simplifying data management in an Apache Iceberg lakehouse. With her extensive experience in building scalable data solutions, Lisa delves into how REST-based catalogs offer a flexible and accessible way to interact with metadata, making it easier to manage large-sca...
Apache Iceberg, Arrow, Substrait, and the Inescapable Power of Open | Presentation by Jacques Nadeau
614 views · 2 months ago
In this presentation, Jacques Nadeau, co-creator of Apache Arrow and a visionary in the open-source data ecosystem, dives deep into the potential of open technologies. Jacques explores how projects like Apache Iceberg, Arrow, and Substrait are reshaping the future of data processing and analytics, starting with a look at Databricks' acquisition of Tabular and where that leads now. With his extens...
Apache Iceberg and the Deconstructed Database | Keynote by Julien Le Dem
1.7K views · 2 months ago
Watch this keynote from Julien Le Dem, a leading voice in the open-source community, for an insightful view of Apache Iceberg and the concept of the deconstructed database. Julien, known for his contributions to projects like Apache Parquet and his pioneering work in data architecture, explores how Iceberg is transforming data lakes by offering schema evolution, partitioning, and efficient data...
Change Data Capture (CDC) to Snowflake
126 views · 3 months ago
Ready to learn Apache Iceberg?
142 views · 4 months ago
Chill Data Summit on tour is here! Join us in San Francisco, London, New York or Tel Aviv.
Part 13 - Data quality validation - Hive to Iceberg Tables Migration eLearning Module
64 views · 5 months ago
🛠️ Considering an Iceberg lakehouse? Book a free consultation with an expert here: www.upsolver.com/discover 🎓 Watch additional Iceberg eLearning modules here: www.upsolver.com/resources/iceberg-academy Description: In this video, we explore the benefits of using Iceberg, including straightforward SQL query comparisons for source and target data, monitoring schema evolution over time, and identif...
Part 12 - Post migration considerations - Hive to Iceberg Tables Migration eLearning Module
34 views · 5 months ago
🛠️ Considering an Iceberg lakehouse? Book a free consultation with an expert here: www.upsolver.com/discover 🎓 Watch additional Iceberg eLearning modules here: www.upsolver.com/resources/iceberg-academy Description: After migrating from Hive to Iceberg, it's crucial to monitor key metrics to ensure smooth operations. Check the ingestion rate for any drops or conflicts, as Iceberg's snapshot-orien...
Part 11 - Choosing the ideal strategy for you - Hive to Iceberg Tables Migration eLearning Module
40 views · 5 months ago
🛠️ Considering an Iceberg lakehouse? Book a free consultation with an expert here: www.upsolver.com/discover 🎓 Watch additional Iceberg eLearning modules here: www.upsolver.com/resources/iceberg-academy Description: In this video, we summarize the various strategies for migrating from Apache Hive to Apache Iceberg, breaking down the options into manageable steps. From in-place snapshot migration,...
Part 10 - Selective migration - Hive to Iceberg Tables Migration eLearning Module
41 views · 5 months ago
Part 9 - Mirror migration - Hive to Iceberg Tables Migration eLearning Module
64 views · 5 months ago
Part 8 - Duplicate migration - Hive to Iceberg Tables Migration eLearning Module
59 views · 5 months ago
Part 7 - In Place (metadata only) migration - Hive to Iceberg Tables Migration eLearning Module
113 views · 5 months ago
Part 6 - Migration strategies - Hive to Iceberg Tables Migration eLearning Module
61 views · 5 months ago
Part 5 - Migration considerations - Hive to Iceberg Tables Migration eLearning Module
59 views · 5 months ago
Part 4 - The Iceberg difference - Hive to Iceberg Tables Migration eLearning Module
92 views · 5 months ago
Part 3 - Challenges with Hive based data lakes - Hive to Iceberg Tables Migration eLearning Module
94 views · 5 months ago
Part 2 - Why migrate to Iceberg - Hive to Iceberg Tables Migration eLearning Module
293 views · 5 months ago
Part 1 - Intro - Hive to Iceberg Tables Migration eLearning Module
137 views · 5 months ago
Part 14 - Testing - Hive to Iceberg Tables Migration eLearning Module
28 views · 5 months ago
Part 9 - Iceberg Table Services - Building Iceberg Lakehouse With Spark - eLearning Module
63 views · 7 months ago
Part 8 - Optimistic Concurrency - Building Iceberg Lakehouse With Spark - eLearning Module
71 views · 7 months ago
Part 7 - Deleting Rows - Building Iceberg Lakehouse With Spark and Upsolver - eLearning Module
82 views · 7 months ago
Part 6 - Understanding How Manifests Work (Create/Insert) - Building Iceberg Lakehouse With Spark
66 views · 7 months ago
So which REST catalog is the best choice for an enterprise?
There are a whole lot of assumptions made here that, in my experience, just are not true. E.g., processing is almost always much more expensive than storage, so there are no cost savings from saving space at the expense of processing. Also, the warehouse is not a business model at all. This seems to come from the very narrow perspective of someone who focuses mainly on data science rather than on data engineering. There is huge value in having a centralised location for data where the definitions are the same, the data is enriched from many sources into a cohesive and unified view, and there is much less duplication of data (so the marginal cost savings on space in this presentation are nullified anyway). The main reason that this approach is popular at the moment is that people don't have to wait so long to access the data, and they can run off and do their own empire building in a silo. This is great for the individual, but not great at all for the business as a whole. Data scientists are used to mass duplication of data and massive processing costs in building their models, so this is normal for them. There is big trouble ahead for those who are designing everything around the very specific needs of machine learning. That is only a small part of the whole.
I have a PySpark Glue job in AWS that is in charge of compacting my Iceberg table; it uses an Iceberg procedure to compact the table. The job has been running for more than 4 hours and I keep getting timeouts. Is there an efficient way to compact a table without Spark?
This conversation is totally biased because you guys are assuming Iceberg is the de facto standard. This is a narrative the Iceberg community is pushing in total disregard for what is happening:
- AWS support for Delta and Hudi was released in Redshift Spectrum before support for Iceberg. When Iceberg support was added last year, it was read-only.
- Snowflake supports Iceberg in read/write and Delta in read-only.
- GCP BigQuery supports both Iceberg and Delta.
- Oracle supports Delta and Delta Sharing.
- Salesforce supports not only Iceberg but also Delta, and has Zero ETL integration with the Databricks platform.
- Microsoft Fabric is built on top of Delta.
- OneHouse, which contributes a lot to Hudi, released Apache XTable to handle interoperability between the 3 table formats, and they’re working with Databricks on the Delta UniForm capability.
- And the list goes on…
Databricks has a more open track record than Snowflake, which only recently started to open up to the OSS community as a desperate move to stop customers moving to Databricks. Delta with the Delta Kernel makes it easier to integrate with the Delta spec and to support Delta capabilities in a more uniform way. In Iceberg, one query engine might support version 1 and some capabilities of version 2, while another only partially supports version 2, which brings its own set of challenges for customers! Iceberg won on the community side against the 2 other formats, and has used that to gain more momentum. But feature-wise, there is more innovation happening in the 2 other formats than in Iceberg. Iceberg has some great features, but is that, plus having a bigger community, enough to claim it’s the de facto standard? Not sure about that. It might be time for the Iceberg community to stop playing politics and work with the 2 other communities to end this table format war, which would be the best way forward to total interoperability.
Do you guys remember the time when different cellular operators used different technologies, and people had different phones to be able to contact friends, family and colleagues? Then GSM was adopted, and how easy it became to move from one operator to another, or from one state to another, or to travel abroad. Now nobody thinks about it and we’re all using GSM worldwide. Why can't we have the same in data engineering?
Newbie here! Would a catalog be equivalent to the database schema?
Great question! A catalog is a wrapper around the Iceberg REST API which allows us to make commits to an Iceberg lakehouse. Iceberg comes with a REST catalog, and there are proprietary extensions to this catalog that can enrich the lakehouse experience with additional features such as data tag management, data governance through RBAC etc. And yes, part of the catalog's job is to expose tables in the lakehouse, along with their schema and statistics, to users.
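To make the reply above a little more concrete, here is a minimal sketch of the idea, assuming a toy in-memory catalog. Every name in it (`ToyCatalog`, `TableEntry`, `commit`, the `sales.orders` table) is hypothetical and chosen for illustration only; this is not the Iceberg REST API or any real client library. It only shows why a catalog is broader than a single database schema: it registers many tables, exposes each table's schema and statistics, and decides whether a commit is accepted.

```python
from dataclasses import dataclass, field


@dataclass
class TableEntry:
    """Hypothetical record a catalog keeps for one table."""
    schema: dict            # column name -> type name
    snapshot_id: int = 0    # version of the current table state
    row_count: int = 0      # stand-in for table statistics


@dataclass
class ToyCatalog:
    """Toy in-memory catalog mapping table names to their entries."""
    tables: dict = field(default_factory=dict)

    def create_table(self, name: str, schema: dict) -> None:
        self.tables[name] = TableEntry(schema=dict(schema))

    def list_tables(self) -> list:
        return sorted(self.tables)

    def load_table(self, name: str) -> TableEntry:
        return self.tables[name]

    def commit(self, name: str, expected_snapshot: int, added_rows: int) -> bool:
        """Accept the commit only if the writer saw the latest snapshot."""
        entry = self.tables[name]
        if entry.snapshot_id != expected_snapshot:
            return False  # another writer committed first; caller must retry
        entry.snapshot_id += 1
        entry.row_count += added_rows
        return True


catalog = ToyCatalog()
catalog.create_table("sales.orders", {"order_id": "long", "amount": "double"})

# The first writer saw snapshot 0 and succeeds; the second writer's view is stale.
first = catalog.commit("sales.orders", expected_snapshot=0, added_rows=100)
stale = catalog.commit("sales.orders", expected_snapshot=0, added_rows=50)

print(catalog.list_tables(), first, stale)
print(catalog.load_table("sales.orders").schema)
```

In this sketch the schema is just one attribute of each table entry, while the catalog itself is the registry and the gatekeeper for commits.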
Other than the Iceberg open data format I was unable to notice anything else I could reference as “open” from the suggested solution architecture.
Well, Iceberg is a spec, not a piece of software. It existed before Tabular and will exist after the acquisition. I'm really scratching my head about its long-term goal. I also wonder what Tabular customers have been told and if that product will continue to exist.
pls bro fix your mic, btw good video
Could you please show the reverse process, i.e., from Athena to a Kafka cluster?
We do not currently support Kafka as a target for ingestion/transformation jobs; however, we are happy to consider it for a future release. Would you be open to discussing your use case with us?
Great to hear these amazing announcements from Ori and Yoni about Upsolver's new features to support Apache Iceberg, announced at the Chill Data Summit NYC. What a brilliant first event 🎉
Should we create real-time streaming data solutions?
How is this different from the Snowflake data warehouse and SQL?
Hello, can we have the datasets?
I was exploring this tool and it looks good. But there are not enough materials online. I think more quality videos are needed. A good demo with various examples is mandatory.