- 78 videos
- 156,463 views
endjin
United Kingdom
Joined 16 Jul 2015
We help small teams achieve big things.
We are a UK-based, fully remote consultancy specialising in Data, Analytics & AI, and Cloud Native App Dev on Microsoft Fabric & Azure, and we are a .NET Foundation Corporate Sponsor.
We produce two free weekly newsletters:
☁️ Azure Weekly - azureweekly.info for all things about the Microsoft Azure Platform,
📈 Power BI Weekly - powerbiweekly.info for all things data visualisation and Power Platform.
Keep up with everything that's going on at endjin via our blog:
👉 endjin.com/blog
👉 endjin.com/talks
👉 endjin
👉 www.linkedin.com/company/endjin
Information about our Open Source projects can be found at endjin.com/open-source
Find out more at endjin.com
#Microsoft #MicrosoftFabric #PowerBI #AI #Data #Analytics #DevOps #Azure #Cloud
Microsoft Fabric + Data Mesh - a perfect fit? ❤️ or 💔?
In this video, Barry Smart, Director of Data & AI, examines how Microsoft Fabric can support a Data Mesh vision for data and analytics.
Overview:
As we transition to a digital age powered by data, we discuss how data professionals can help their organizations thrive. Discover the capabilities of Microsoft Fabric, its role in creating data products, and how it aligns with the core principles of Data Mesh. Learn about the advancements in technology, the integration of open-source tools, and the importance of a socio-technical approach to data. Don't miss insights on federated computational governance and DataOps. Watch to understand how to drive value and innovation in your organization with...
501 views
Videos
Data Engineering Observability in Microsoft Fabric!
433 views • 1 month ago
In part 5 of this course Barry Smart, Director of Data and AI, walks through a demo showing how to improve observability of your data engineering processes using readily available technology and platforms in Microsoft Fabric and Azure. Barry begins the video by explaining that one of the pitfalls of using Fabric Notebooks in an operational setting is that it can impede observability of wha...
Visualise your Medallion Architecture with Task Flows in Microsoft Fabric
505 views • 1 month ago
Adopting Task Flows in Fabric: Titanic Diagnostic Analytics Series, Part 4. In this episode of our Titanic Diagnostic Analytics series, we dive into the new task flow feature in Fabric to optimize workspace organization and implement reference architectures. We'll recap our ongoing data product development aimed at creating an interactive Power BI report to analyze Titanic passenger survival pat...
Testing Notebooks with Microsoft Fabric - Titanic Survivor Predictive Analytics - Part 3
417 views • 2 months ago
In part three of the Titanic Diagnostic Analytics series, Barry Smart delves into testing data engineering functionality in Microsoft Fabric Notebooks. This video focuses on developing, testing, and automating code to project data to the gold layer of the lake, employing test-driven development principles. We also outline the benefits of using Fabric Notebooks and how to overcome potential pitf...
Power BI Data Stories: Global Brand Insights: 20 Years of Financial Trends
2.1K views • 2 months ago
This walkthrough of the Global Brand Insights Report, built using financial data ingested from the Yahoo Finance API, examines how design and visualisation choices make data stories with Power BI compelling. Explore the creative journey behind the Global Brand Insights Report on Power BI, showcased on the Power BI Data Stories Gallery: community.fabric.microsoft.com/t5/Data-Stories-Gallery/Global-Brand-Ins...
Compelling Data Storytelling with Power BI: Titanic Survivors
337 views • 4 months ago
Creative Walkthrough: Titanic Passenger Diagnostic Report in Power BI. Explore the Titanic Passenger Diagnostic Report created using Power BI and published to the Data Stories Gallery. In this video, Paul Waller walks you through the design decisions and data visualization techniques used, inspired by an interactive museum exhibit. Learn about the demographics, survival rates, and the aftermath...
10x Spark performance improvement in Microsoft Fabric
603 views • 4 months ago
Boosting Apache Spark Performance with Small JSON Files in Microsoft Fabric. Learn how to achieve a 10x performance improvement when ingesting small JSON files in Apache Spark hosted on Microsoft Fabric. Ian Griffiths, Technical Fellow at endjin, shares insights and techniques to overcome Spark's challenges with numerous small files, including parallelizing file discovery and optimizing data lo...
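The exact technique is shown in the video; as a rough sketch of the general idea - discover the many small files in parallel, then read them in a single pass with an explicit schema so Spark doesn't infer it file by file - something like the following PySpark could work. The folder layout, schema and table name here are hypothetical.

from concurrent.futures import ThreadPoolExecutor
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.getOrCreate()

# Hypothetical layout: one folder per day, each containing thousands of small JSON files.
day_folders = [f"Files/raw/telemetry/2024-06-{d:02d}" for d in range(1, 31)]

def list_json_files(folder):
    # mssparkutils is available in Fabric/Synapse notebooks; swap in your own lister elsewhere.
    from notebookutils import mssparkutils
    return [f.path for f in mssparkutils.fs.ls(folder) if f.path.endswith(".json")]

# Parallelise file discovery rather than enumerating folders one at a time.
with ThreadPoolExecutor(max_workers=16) as pool:
    all_files = [path for batch in pool.map(list_json_files, day_folders) for path in batch]

# An explicit schema avoids a second pass over every small file for schema inference.
schema = StructType([
    StructField("deviceId", StringType()),
    StructField("reading", DoubleType()),
    StructField("timestamp", TimestampType()),
])

df = spark.read.schema(schema).json(all_files)
df.write.mode("append").saveAsTable("bronze_telemetry")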
Microsoft Fabric: Good Notebook Development Practices 📓 (End to End Demo - Part 8)
3K views • 5 months ago
Microsoft Fabric End to End Demo - Part 8 - Good Notebook Development Practices. Notebooks can very easily become a large, unstructured dump of code with a chain of dependencies so convoluted that it becomes very difficult to track lineage throughout your transformations. With a few simple steps, you can turn notebooks into a well-structured, easy-to-follow repository for your code. In this vide...
Microsoft Fabric: Machine Learning Tutorial - Part 2 - Data Validation with Great Expectations
1.5K views • 6 months ago
In part 2 of this course, Barry Smart, Director of Data and AI, walks through a demo showing how you can use Microsoft Fabric to set up a "data contract" that establishes minimum data quality standards for data that is being processed by a data pipeline. He deliberately passes bad data into the pipeline to show how the process can be set up to "fail elegantly" by dropping the bad rows and conti...
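The video itself uses Great Expectations; since that library's API differs between versions, here is only a minimal plain-PySpark sketch of the underlying "fail elegantly" pattern - validate rows against a contract, quarantine the bad ones and continue with the rest. Table names and rules are hypothetical.

from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.read.table("bronze_passengers")

# Hypothetical contract: key columns present, age within a plausible range.
contract = (
    F.col("PassengerId").isNotNull()
    & F.col("Age").isNotNull()
    & F.col("Age").between(0, 120)
)

good_rows = df.filter(contract)
bad_rows = df.filter(~contract)

# "Fail elegantly": quarantine the offending rows and carry on with the rest.
bad_rows.write.mode("append").saveAsTable("quarantine_passengers")
good_rows.write.mode("overwrite").saveAsTable("silver_passengers")

print(f"Dropped {bad_rows.count()} rows that violated the data contract")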
Microsoft Fabric: Machine Learning Tutorial - Part 1 - Overview of the Course
1.4K views • 6 months ago
In this video Barry Smart, Director of Data and AI, provides an overview of the end to end demo of Microsoft Fabric that we will be providing as a series of videos over the coming weeks. The demo will use the popular Titanic data set to show off features across both the data engineering and data science experiences in Fabric. This will include Notebooks, Pipelines, Semantic Link, MLflow (Experi...
Data is a socio-technical endeavour
37 views • 6 months ago
Our experience shows that the most successful data projects rely heavily on building a multi-disciplinary team.
No Code Low Code is Software DIY: How Do You Avoid a DIY Disaster?
77 views • 6 months ago
No-code/Low-code democratizes software development with little to no coding skills needed. But how do you evaluate if software DIY is the right choice for you? From the blog post: endjin.com/blog/2024/03/no-code-low-code-software-diy
How to Build Navigation into Power BI
71 views • 6 months ago
Explore a step-by-step guide on designing a side nav in Power BI, covering form, icons, states, actions, with a view to enhancing report design & UI. From the blog post: endjin.com/blog/2024/03/how-to-build-navigation-in-power-bi
Data & AI Engineering Maturity
23 views • 6 months ago
As data and AI become the engine of business change, we need to learn the lessons of the past to avoid expensive failures. From the blog post: endjin.com/blog/2024/03/data-ai-engineering-maturity
The Heart of Reactive Extensions for .NET (Rx.NET)
1.1K views • 7 months ago
Microsoft Fabric: Processing Bronze to Silver using Fabric Notebooks
6K views • 10 months ago
Microsoft Fabric: Role of the Silver Lakehouse in the Medallion Architecture
2.6K views • 10 months ago
Microsoft Fabric: Local OneLake Tools
2.8K views • 1 year ago
Show & Tell: A Brief Intro to Tensors & GPT with TorchSharp
778 views • 1 year ago
Microsoft Fabric: Creating a OneLake Shortcut to ADLS Gen2
6K views • 1 year ago
Microsoft Fabric and The Pace of Innovation - The Decision Maker's Guide - Part 3
760 views • 1 year ago
Microsoft Fabric & Generative AI - The Decision Maker's Guide - Part 2
1.1K views • 1 year ago
Hedging your Microsoft Fabric Bet - The Decision Maker's Guide - Part 1
2.3K views • 1 year ago
Microsoft Fabric: Ingesting 5GB into a Bronze Lakehouse using Data Factory - Part 3
7K views • 1 year ago
Microsoft Fabric: Inspecting 28 MILLION row dataset in Bronze Lakehouse - Part 2
7K views • 1 year ago
Microsoft Fabric: Lakehouse & Medallion Architecture - Part 1
16K views • 1 year ago
A 10 minute Tour Around Microsoft Fabric
6K views • 1 year ago
Microsoft Fabric Briefing - after 6 months of use on the private preview.
23K views • 1 year ago
Reactive Extensions API in depth: Marble Diagrams, Select() and Where()
115 views • 1 year ago
I've really enjoyed your videos in this series and would love to see more!
Really nice series, when is the next episode due?
TDD in Data Analysis 😍
You may enjoy another talk of ours: endjin.com/what-we-think/talks/how-to-ensure-quality-and-avoid-inaccuracies-in-your-data-insights
Great presentation and video!! The slides are fantastic!! Any hints on where to find a PDF of these?
Hello Ed, and thank you for the video. I have a question: is it possible to see Notebooks and the other "non data related" items in Azure Storage Explorer?
Awesome series. I work for a large organisation and was wondering how to implement the medallion architecture. Would it be best to have workspaces per domain/groups e.g. Transport/Finance/HR each with bronze/silver/gold?
There is no question that Jeffrey van Gogh painted the marbles, since his great-great-great-great-grandfather was the famous painter Van Gogh!!
Great video and explanation of the data mesh and DataOps principles (if anyone wants more info on this, watch the other Titanic videos here); most of the phases are covered.
Thank you for watching! If you enjoyed this episode, please hit like 👍 subscribe ✅ and turn on notifications 🔔 - it helps let the YouTube algorithm know that our content is worth watching! 🙏
How do I create such a great report? It's the best I have ever seen. Please share the technical session on the creation of this report with me.
What a great bit of investigative problem-solving - nice work!
For people like me who aren't gifted at user experience, there were lots of great ideas and value here; as someone else commented, less is definitely more!
Yes, we obviously very much agree!
This video is great; it helps me take principles and practices we understand and believe in from the C# world and apply them to our Fabric projects - nice work!
Excellent! Yes, that's very much the point. Many of the new cloud native analytics platforms allow data folk to adopt and embrace the engineering practices that software folk have enjoyed for the past 25 years! 🎉
A really valuable video to help with an often overlooked part of using notebooks!
We're really glad you found it useful!
Thank you for watching! If you enjoyed this episode, please hit like 👍 subscribe, and turn notifications 🔔 on - it helps let the YouTube algorithm know that our content is worth watching!
Thank you for watching! If you enjoyed this episode, please hit like 👍 subscribe, and turn notifications on 🔔 - it helps us more than you know. 🙏
Cool! Thanks for sharing this!
Very good practice, but the code files are not uploaded. Please help me understand the code - could you upload it to GitHub?
I have just completed this whole playlist, as I have to start a new project on Fabric. Thanks a lot for providing this framework-level information. Highly appreciated. Please do continue updating this playlist with more insights. Liked and subscribed 😊
That is brilliant -- thanks for making this series of such quality on Microsoft Fabric.
Is it possible to do a step by step video on this ? 🥹
What aspects are you particularly interested in?
We are interested to know how the onelake process works. Appreciate it very much.
Hello. Thanks for your interest. We'll try to find time to do another demo to show how we could move the solution onto Microsoft Fabric and use OneLake. There are two other video series in our channel which step through data engineering features in Microsoft Fabric including how we make use of OneLake.
Hi, do you have the source code somewhere on GitHub that I can experiment with?
A perfect example of how simplicity is powerful. Unlike other dashboards crowded with too much information, yours takes care of one element per page. Great learning.
Thanks for the feedback! I worked with my colleague Paul on this project who is a visual design expert. My inclination is to throw more and more data onto the page. But Paul always challenges that. He makes sure we adhere to a user experience (UX) that is accessible, intuitive, delivers a user journey and visually pops. When we build reports like this for clients we also have a domain expert involved. So we do lots of small iterations to get the right balance between those three concerns: business goals, analytics and UX. This multi-disciplinary approach really helps to make sure you get a good product at the end.
Where did you get the other data from?
Hi there. The Postcode data comes from: geoportal.statistics.gov.uk/datasets/a8a2d8d31db84ceea45b261bb7756771/about Ed
Hi, great video series. I have really enjoyed watching and learning from it. One thing I need to ask about: when you read the CSV file, it has no headers. Then you do the apply_transformations step, which I believe needs header info to work properly, or am I wrong? I can't see any steps or code that add header info before you do df = PricePaidWrangler.apply_transformations(df). Can you comment on this?
Great content! Fingers crossed you release next parts in the series soon
Part 3 is currently being worked on!
Hello, nice video. Could you please explain where exactly you got the dfs URL that you pasted into the ADLS shortcut connector? My URL has a blob part in it, and that is preventing me from making the connection. Thanks
Hi there, You should be able to just change "blob" to "dfs" if your storage account is ADLS Gen2 enabled. If it's not ADLS Gen2 enabled, then sadly you can't create a shortcut to a Blob Storage account at the moment. Ed
@@endjin Thanks for your reply. I found out that my storage account is not ADLS Gen2 enabled.
How do I move data from a table in one lakehouse to a table in another lakehouse using PySpark?
Connect to both lakehouses in the left pane of the notebook. Use Spark SQL to select and manipulate the data from your bronze lakehouse, then essentially do the merge similar to what is done in the video, targeting the silver lakehouse.
Further to Ben's response, I would recommend using two-part naming (<lakehouse_name>.<table_name>) when transferring this data. However, if you're just wanting to expose the data from one lakehouse in another lakehouse, remember you can avoid copying data altogether by using Shortcuts. And if you need to do this programmatically, there's an API for this: learn.microsoft.com/en-us/rest/api/fabric/core/onelake-shortcuts/create-shortcut?tabs=HTTP#create-shortcut-one-lake-target-example Ed
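A minimal sketch of moving data between lakehouses using the two-part naming suggested above, assuming both lakehouses are attached to the Fabric notebook; the lakehouse and table names (bronze_lh, silver_lh, passengers) are hypothetical.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Option 1: Spark SQL with two-part naming (<lakehouse_name>.<table_name>).
spark.sql("""
    CREATE OR REPLACE TABLE silver_lh.passengers
    AS SELECT * FROM bronze_lh.passengers WHERE Age IS NOT NULL
""")

# Option 2: the equivalent with the DataFrame API.
df = spark.read.table("bronze_lh.passengers")
(df.filter("Age IS NOT NULL")
   .write.mode("overwrite")
   .saveAsTable("silver_lh.passengers"))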
The video was ok, but it did not explain how the ADLS storage account networking needs to be configured. More specifically, how to configure it in a secure manner, without allowing access from all networks.
Hi there, Thanks for the feedback! This video was recorded at a time when it wasn't possible to connect to ADLS Gen2 if it wasn't publicly accessible. However, now that's changed: if you'd like to understand how to access a partially restricted ADLS account, please see the documents here: learn.microsoft.com/en-us/fabric/security/security-trusted-workspace-access. If you've fully disabled public access, there's no easy way to create a Shortcut that I'm aware of, sadly. Hopefully that functionality will come. Ed
Hi @endjin great videos, have you uploaded the architecture diagram file anywhere that I can download and reuse for my own projects?
Not yet - but it will come! Thanks for your comment, Ed
How can i contact you pls let me know Thanks
This is framework-level work. I'm not sure how many will understand and appreciate the effort you put into creating this video, but I highly appreciate your thoughts and work. At one point I was wondering how I would do it if I got the chance to create a framework, and you have given a very nice guideline here. Once again, thank you for the video; I would like to see your other videos too.
Thanks for the comment! Ed
Is there an option to connect from your local machine directly to the Synapse Spark cluster? It doesn't seem that debug-friendly, having to compile & upload it every time. It almost feels more sensible to host your own autoscaling Spark cluster in Azure Kubernetes Service. If I did that, I could interact directly with the cluster and build sessions locally. What do you think?
In this scenario, it would make more sense to run Spark locally. There are a few ways you can do that, but as you'd expect it's not entirely straightforward, and not something easily addressed in a comment.
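As a rough starting point (not a full answer to the local debugging question), a plain local PySpark session is often enough for iterating on transformation logic; it won't reproduce Synapse-specific behaviour such as linked services, mssparkutils or pool sizing.

# Minimal sketch of a local Spark session for debugging (pip install pyspark).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("local[*]")            # single-machine cluster using all local cores
    .appName("local-debugging")
    .getOrCreate()
)

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
df.show()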
Very clean explanation, I appreciate your efforts. Is there any chance we could get the code for each layer (Bronze to Silver, etc.)? Thanks in advance.
you are the best
Thanks for the kind words! Ed
Thanks for the video! Is there any alternative to run the test notebooks of synapse from the cicd pipeline in azure devops?
There's a couple of ways to achieve this - neither are immediately obvious but definitely possible! There's no API for just running a notebook in Synapse, but you can submit a Spark batch job via the API. However, this requires a Python file as input, so it might mean pulling your tests out of a Notebook and writing and storing them separately in an associated ADO repo: learn.microsoft.com/en-us/rest/api/synapse/data-plane/spark-batch/create-spark-batch-job?view=rest-synapse-data-plane-2020-12-01&tabs=HTTP Possibly an easier route would be to create a separate Synapse Pipeline definition that runs your test notebook(s) and use the API to trigger that pipeline run from your ADO pipeline. This is a straightforward REST API but operates asynchronously, so you'd need to poll for completion as the pipeline/tests are running: learn.microsoft.com/en-us/rest/api/synapse/data-plane/pipeline/create-pipeline-run?view=rest-synapse-data-plane-2020-12-01&tabs=HTTP Hope that helps!
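A rough sketch of that second route - trigger a Synapse pipeline that wraps the test notebook(s), then poll for completion from the ADO job. The workspace endpoint and pipeline name are hypothetical; the token is acquired with azure-identity (e.g. via the ADO service connection).

import time
import requests
from azure.identity import DefaultAzureCredential

endpoint = "https://my-workspace.dev.azuresynapse.net"   # hypothetical workspace endpoint
pipeline_name = "run-notebook-tests"                     # hypothetical pipeline running the test notebooks
api_version = "2020-12-01"

token = DefaultAzureCredential().get_token("https://dev.azuresynapse.net/.default").token
headers = {"Authorization": f"Bearer {token}"}

# Kick off the pipeline run.
run = requests.post(
    f"{endpoint}/pipelines/{pipeline_name}/createRun",
    params={"api-version": api_version},
    headers=headers,
).json()
run_id = run["runId"]

# Poll until the run reaches a terminal state.
while True:
    status = requests.get(
        f"{endpoint}/pipelineruns/{run_id}",
        params={"api-version": api_version},
        headers=headers,
    ).json()["status"]
    if status in ("Succeeded", "Failed", "Cancelled"):
        break
    time.sleep(30)

# Fail the ADO step if the test pipeline did not succeed.
assert status == "Succeeded", f"Test pipeline finished with status: {status}"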
Do you know how to view the definition of the view or stored procedures?
Hi - I don't believe there's a way in Synapse Studio to automatically script out the definitions like you can do in, say, SQL Server Management Studio. But you can see the column definitions for your View if you find your database under the Data tab and expand the nodes in the explorer. Hope that helps!
Looking forward to many more of these!
Barry has ~9 parts planned!
Great videos 👍👍 Microsoft advocates using separate workspaces for bronze, silver and gold, but that seems to be harder to achieve due to some current limitations. If we go with a single workspace and a folder-based setup like the example, will it be hard to switch to separate workspaces in the future? Is there any prep we can do to make this switch easier going forward (or would there be no need to switch to a multi-workspace approach)?
Hi there, Thanks for your comment! It's a great question. Personally, unless there are strong requirements (e.g. high data sensitivity/unique security requirements) for splitting zones apart into separate workspaces, I would default to one workspace. In my experience, the same team is often involved in managing all three layers of the Lakehouse, and "end-users" mostly only get access to semantic models (or the "Gold" layer at a push), so giving every zone its own workspace isn't really justified. Deployment also becomes trickier when multiple workspaces are involved. All that being said, I can appreciate there are scenarios where multiple is more suitable. W.r.t. how to design for the future: my first comment would be "only do it if you need to". No need to change tack just for the sake of it. If a single workspace is working for you then just stick with it. But if you do need to switch for whatever reason, then sadly there'll always be a significant migration overhead. The best thing you can do is have well structured workspaces and notebooks like I've shown here. Highlight the artifacts that are relevant to each layer of the architecture so you can get a picture of the interdependencies. Deployment pipelines / REST APIs will be your friend if you have to migrate too - worth getting familiar with these if you haven't already. The beauty of the medallion architecture is that the core structure of your data pipelines stays the same, whatever workspace infrastructure architecture you opt for! Hope this helps, Ed
Thanks a lot Barry. Great video. I couldn't find the repository for the series among your GitHub repos. Will there be one?
Thanks for the feedback. Glad to hear you are enjoying it. Yes - we are planning to release the code for this project on Git at some point soon.
Thanks for the great video, very useful. One question: you are using PySpark in your notebooks, but how would you recommend modularizing the code in Spark SQL? Maybe by defining UDFs in separate notebooks that are then called in the 'parent' notebook?
Sadly you don't have that many options here without having to fall back to Python/Scala. You can modularize at a very basic level using notebooks as the "modules", containing a bunch of cells which contain Spark SQL commands. Then call these notebooks from the parent notebook. Otherwise, as you say, one step further would be defining UDFs using some Python and then using spark.udf.register to be able to invoke them from SQL. Ed
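A small sketch of the UDF route mentioned above: define the helper in Python once (for example in a utility notebook that the parent notebook runs first), register it, and the rest of the transformation can stay in Spark SQL. The function, column and table names below are hypothetical.

from pyspark.sql import SparkSession
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()

def normalise_port(value):
    # Hypothetical cleaning rule used by the SQL below.
    return value.strip().title() if value else None

spark.udf.register("normalise_port", normalise_port, StringType())

# The 'parent' notebook can now remain purely Spark SQL:
spark.sql("""
    SELECT PassengerId, normalise_port(Embarked) AS Embarked
    FROM bronze_lh.passengers
""").show()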
Really looking forward to this series, thanks for taking the time to put it together. I really enjoy the pace and level of detail of the content @endjin puts together.
Great series. What naming convention are you using in the full version of the solution? I noticed the LH is prefixed with HPA.
The HPA prefix stands for "House Price Analytics", although the architecture diagram on the second video has slightly old names, as you've probably noticed. The full version uses <medallion_layer>_Demo_LR, where LR stands for "Land Registry". Ed
@endjin - Thanks for the clarification Ben.
Finally someone made this video!! Thank you for doing this.
Hi Ed, we have loaded a few tables using Synapse Link into ADLS Gen2 and created a shortcut to access the ADLS Gen2 files in Fabric, but while loading the files into tables we are not getting the column names; they show up as c0, c1, ... etc., which is causing an issue. Can you please give some insights on how to overcome this and load the tables with the metadata as well?
Hi - thanks for the comment! Which Synapse Link are you using? Dataverse? If so, this uses the CDM model.json format which doesn't include header rows in the underlying CSV files. You would have to read the shortcut data, apply the schema manually, and then write the data out to another table (inside a Fabric notebook or something) if you wanted to use that existing data. However, if you're using Synapse Link for Dataverse, you should instead consider using the new "Link to Microsoft Fabric" feature available in Dataverse: learn.microsoft.com/en-us/power-apps/maker/data-platform/azure-synapse-link-view-in-fabric. This will include the correct schema.
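A sketch of the "apply the schema manually, then write the data out to another table" workaround described above. The shortcut path, column names and types here are hypothetical and should come from your Synapse Link model.json.

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, TimestampType

spark = SparkSession.builder.getOrCreate()

schema = StructType([
    StructField("accountid", StringType()),
    StructField("name", StringType()),
    StructField("statecode", IntegerType()),
    StructField("modifiedon", TimestampType()),
])

df = (
    spark.read
    .format("csv")
    .option("header", "false")    # the underlying CSVs have no header row
    .schema(schema)
    .load("Files/synapse_link_shortcut/account/*.csv")
)

# Write out as a proper table so downstream users get named columns instead of c0, c1, ...
df.write.mode("overwrite").saveAsTable("account")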