Microsoft Fabric: How to load data into a Lakehouse using Spark and Python in the notebook

  • Published: 3 Feb 2025

Comments • 41

  • @christophehervouet3280
    @christophehervouet3280 1 year ago +1

    Fantastic study, Amit!

    • @AmitChandak
      @AmitChandak  1 year ago

      Thanks 🙏
      Hope you will like the series
      Mastering Microsoft Fabric 35+ Videos: ruclips.net/video/p-v0I5S-ybs/видео.html

  • @sanumpzha
    @sanumpzha 1 year ago +1

    Tried it and it went well! Thank you, Amit.

    • @AmitChandak
      @AmitChandak  1 year ago +1

      Glad to hear that. Let me know if you have any questions.
      Thanks 🙏

  • @LearnAtHomewithGulshan
    @LearnAtHomewithGulshan 1 year ago +1

    Kudos to you, Amit!!

  • @Kulfi2241
    @Kulfi2241 1 year ago +1

    Great session.
    Q: To use Python in these notebooks, what libraries are required, Amit?

    • @AmitChandak
      @AmitChandak  1 year ago +1

      It depends on what you need. The basic libraries are already available; you just need to add an import. You can check this out for more information:
      learn.microsoft.com/en-us/fabric/data-science/python-guide/python-library-management
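      For example, a minimal sketch (assuming a standard Fabric Spark notebook, where pandas and PySpark come preinstalled; openpyxl is just an illustrative extra package):
      # Extra packages can be installed for the current session with the pip magic
      %pip install openpyxl
      # Preinstalled libraries only need an import
      import pandas as pd
      from pyspark.sql import functions as F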

  • @nabjad94
    @nabjad94 1 year ago +1

    Very useful, thank you!

    • @AmitChandak
      @AmitChandak  1 year ago +1

      You are welcome. Thanks 🙏
      Hope you will like the full series
      Mastering Microsoft Fabric 35+ Videos: ruclips.net/video/p-v0I5S-ybs/видео.html

  • @joelluis4938
    @joelluis4938 1 year ago +1

    Hi,
    Great video! I have three questions, because I started learning about Fabric this week and would love to hear your comments to keep learning:
    - Can I explore my table using pandas and all its methods? I thought I had to use the Spark language to do that.
    - Why should I transform my table into a Delta table if it currently works fine from a CSV (or it could be a SQL view from SQL Server)? I still don't understand the advantage of doing that.
    - I have seen many posts showing the transformation of flat files into Delta files. But what if I have a SQL view from SQL Server? Would it also make sense to convert that view to a Delta table?

    • @AmitChandak
      @AmitChandak  1 year ago +1

      You should be able to use most of the pandas functions to analyze the data. However, keep in mind that the Spark engine is running when you execute the code.
      In Fabric, a file in Delta Parquet format is treated as a table. That's why we prefer to save data back as Delta Parquet, enabling us to analyze it through the SQL endpoint or Power BI.
      We need to bring the data from SQL Server into this environment. If you use Dataflow Gen2 or a data pipeline, it will be saved in Parquet format.
      When you save using Spark, remember to choose the appropriate format, as in the sketch below.
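      A minimal sketch of that last step (assuming a notebook attached to a Lakehouse and a Spark DataFrame named df; the table name sales is illustrative):
      # Save the DataFrame in Delta format so the SQL endpoint and Power BI can query it as a table
      df.write.format("delta").mode("overwrite").saveAsTable("sales")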

  • @samirsahin5653
    @samirsahin5653 1 year ago +1

    I love your Fabric videos.
    In this video we pull data from a CSV, transform it in the notebook via pandas, and create a table for SQL so we can run SQL queries on our data, then visualize it in Power BI. This is almost exactly what I was looking for.
    Next step: how can we make sure the data refreshes in the background?
    If I create a scheduled refresh in Power BI, I think it will only get the data already in the table; it won't check for CSV updates. How can we set up a refresh that pulls the latest data from the CSV and has the Python code recompute that gross column?
    Second question, related to the lakehouse: I'm on a Fabric trial. Can I access my current Premium workspace from this trial Fabric workspace and manipulate the data in my current Power BI reports?

    • @AmitChandak
      @AmitChandak  1 year ago +1

      You should be able to create a pipeline and call the notebook. Check how I have done incremental loads (I am not using a pipeline there):
      Microsoft Fabric Part 15: Incremental Data Load (ETL) for Warehouse using Dataflow Gen 2, SQL Procedure, Data Pipeline: ruclips.net/video/mpFRnZBXsvQ/видео.html
      For incremental without a Dataflow, using only a pipeline and a SQL procedure for the Microsoft Fabric Warehouse, see
      Microsoft Fabric Part 16: ruclips.net/video/qsOIfTzjCSQ/видео.html
      For the second question: check whether you have the Lakehouse connector, or create a report in Power BI Desktop using the new Lakehouse connector and try to publish it to the Premium workspace.

  • @SanthoshReddy-u7d
    @SanthoshReddy-u7d 4 months ago +1

    Hi Amit, can you please help me remove underscores?
    When uploading CSV files or any data source to Microsoft Fabric's Warehouse or Lakehouse, field names that contain spaces are automatically converted to use underscores.

    • @AmitChandak
      @AmitChandak  4 months ago +1

      As the Lakehouse does not support spaces in names, it will convert spaces to underscores.
      You can try this before saving:
      # Rename specific columns explicitly
      df_renamed = df \
          .withColumnRenamed("First Name", "FirstName") \
          .withColumnRenamed("Last Name", "LastName")
      or something like this:
      # Generate new column names by removing spaces
      new_columns = [c.replace(" ", "") for c in df.columns]
      # Rename all columns dynamically
      df_renamed = df.toDF(*new_columns)

  • @Srikanthmanchala
    @Srikanthmanchala 4 months ago +1

    Hi,
    Let's say I have data in an on-prem DB which I need to copy to OneLake. How do we load it? (A gateway is required.) Please provide the solution.

    • @AmitChandak
      @AmitChandak  4 months ago +1

      You can use Dataflow Gen2 or a Data Pipeline with an on-premises gateway:
      Load local SQL server data in Lakehouse using on-premise Gateway | Dataflow Gen2: ruclips.net/video/oEF-jHVmvdo/видео.html
      Microsoft Fabric Using Data Pipelines: Gateway Update: ruclips.net/video/sVePvZOjtoo/видео.html
      You can choose any destination, lakehouse or warehouse.
      Mastering Microsoft Fabric 50+ Videos:
      ruclips.net/video/p-v0I5S-ybs/видео.html

    • @Srikanthmanchala
      @Srikanthmanchala 4 months ago

      @AmitChandak Thank you for your response. Yes, it can be done using a Dataflow or a data pipeline; I just wanted to check whether there is any possible way using a notebook.

  • @bulletkip
    @bulletkip 9 months ago

    Is it possible to use Python locally to send a DataFrame or CSV to the Lakehouse?

    • @AmitChandak
      @AmitChandak  9 months ago

      Check whether these can help:
      #microsoftfabric: Use Token Authentication to load local SQL server to #lakehouse using Python code: ruclips.net/video/a38jhtZG6x8/видео.html
      Microsoft Fabric: Use Token Authentication to load local SQL server to Warehouse using Python code: ruclips.net/video/OGsLJTxnbjE/видео.html
      Microsoft Fabric: Load incremental data from local SQL server to Warehouse using on-premise Python: ruclips.net/video/gBGiWGJS5Cs/видео.html
      Microsoft Fabric: Load local SQL server data in Warehouse using Python, Pandas, and sqlalchemy: ruclips.net/video/P0o-a-8rFH0/видео.html
      Also refer to the full series:
      Mastering Microsoft Fabric 45+ Videos in English:
      ruclips.net/video/p-v0I5S-ybs/видео.html
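      A local Python script can also write straight to OneLake through its ADLS Gen2-compatible endpoint. A minimal sketch (assuming the azure-storage-file-datalake and azure-identity packages; the workspace and lakehouse names are placeholders):
      from azure.identity import DefaultAzureCredential
      from azure.storage.filedatalake import DataLakeServiceClient
      # OneLake exposes an ADLS Gen2-compatible endpoint
      service = DataLakeServiceClient(
          "https://onelake.dfs.fabric.microsoft.com",
          credential=DefaultAzureCredential(),
      )
      fs = service.get_file_system_client("MyWorkspace")  # workspace name (placeholder)
      file = fs.get_file_client("MyLakehouse.Lakehouse/Files/sales.csv")  # target path (placeholder)
      # Upload the local CSV into the Lakehouse Files area
      with open("sales.csv", "rb") as f:
          file.upload_data(f, overwrite=True)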

  • @wb6ez4jd7s
    @wb6ez4jd7s 10 months ago +1

    What if you want to execute that notebook from an external editor like VS Code on your desktop? This method wouldn't work.

    • @AmitChandak
      @AmitChandak  10 months ago +2

      Please check whether these can help:
      #microsoftfabric: Use Token Authentication to load local SQL server to #lakehouse using Python code- ruclips.net/video/a38jhtZG6x8/видео.html
      Microsoft Fabric: Use Token Authentication to load local SQL server to Warehouse using Python code- ruclips.net/video/OGsLJTxnbjE/видео.html
      Microsoft Fabric: Load incremental data from local SQL server to Warehouse using on-premise Python- ruclips.net/video/gBGiWGJS5Cs/видео.html
      Microsoft Fabric: Load local SQL server data in Warehouse using Python, Pandas, and sqlalchemy- ruclips.net/video/P0o-a-8rFH0/видео.html

  • @GenZSchool
    @GenZSchool 1 year ago +1

    Amit, can we use Python in Microsoft Fabric to read data from an SFTP server?

    • @AmitChandak
      @AmitChandak  1 year ago +1

      If we want to read from SFTP, we can do that using Python and then load the data into Fabric. If you need encryption during transfer, there are options for that as well; see the sketch below.
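      A minimal sketch using the paramiko library (it would need a %pip install paramiko first; the host, credentials, and paths are placeholders):
      import paramiko
      # Open an SFTP session (SFTP runs over SSH, so the transfer itself is encrypted)
      transport = paramiko.Transport(("sftp.example.com", 22))
      transport.connect(username="user", password="secret")
      sftp = paramiko.SFTPClient.from_transport(transport)
      # Download the remote file into the default Lakehouse's Files area
      sftp.get("/remote/data.csv", "/lakehouse/default/Files/data.csv")
      sftp.close()
      transport.close()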

  • @hirenrami7348
    @hirenrami7348 1 year ago

    I think your videos are amazing! I have followed your instructions and managed to import data using an API and create a dataframe. However, I have encountered an issue where the column headers are not named as expected and have been automatically assigned. I am wondering if you have any suggestions on how to address this problem?

    • @AmitChandak
      @AmitChandak  1 year ago

      Hi @hirenrami7348, can you check whether the column names are in the first row? That can happen sometimes; in that case we have to use the first row as the header.
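      A minimal sketch of that fix (assuming the file was read with spark.read.csv; the path is a placeholder):
      # header=True tells Spark to use the first row as column names instead of auto-generated ones
      df = spark.read.csv("Files/data.csv", header=True, inferSchema=True)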

  • @faheemiftikhar706
    @faheemiftikhar706 1 year ago

    Hi, really good video. I just had one question: what other ways are there to load data with a similar approach? In your case you read from your GitHub repository; what other sources can I use to load the data? For example, can I use JDBC or ODBC?

    • @AmitChandak
      @AmitChandak  1 year ago

      For any source that has a PySpark connector, we should be able to load it; JDBC is one example (see the sketch below).
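      A minimal JDBC sketch (the URL, table name, and credentials are placeholders; the matching JDBC driver must be available on the cluster):
      # Read a table over JDBC into a Spark DataFrame
      df = (
          spark.read.format("jdbc")
          .option("url", "jdbc:sqlserver://myserver.example.com:1433;databaseName=mydb")
          .option("dbtable", "dbo.Sales")
          .option("user", "myuser")
          .option("password", "mypassword")
          .load()
      )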

  • @parajf
    @parajf 10 months ago +1

    Now, if I want to use the same table in Spark SQL, is that possible? How?

    • @AmitChandak
      @AmitChandak  10 months ago +1

      Yes, we can manage the Lakehouse table using both Spark Python and Spark SQL. Check:
      Managing Microsoft Fabric Lakehouse using Spark SQL: Data Manipulation Insert, Update, Delete, Alter: ruclips.net/video/PunKbz4iCEg/видео.html
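      A minimal sketch (assuming a Lakehouse table named sales, which is illustrative; Lakehouse tables are visible to Spark SQL by name):
      # Query the Lakehouse table directly with Spark SQL
      result = spark.sql("SELECT COUNT(*) AS row_count FROM sales")
      result.show()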

  • @prachijain5955
    @prachijain5955 1 year ago +1

    Hello Sir, I want to access SFTP server data using an Azure notebook, but for that I need to set an inbound rule. As the notebook will have a dynamic IP, how can we allow the notebook's IP to access port 22? Thank you in advance. Please advise.

    • @AmitChandak
      @AmitChandak  1 year ago +1

      Please check and allow the IPs for app.powerbi.com:
      learn.microsoft.com/en-us/power-bi/admin/power-bi-allow-list-urls

    • @prachijain5955
      @prachijain5955 1 year ago

      @AmitChandak Thank you so much, Sir!

  • @faheemiftikhar706
    @faheemiftikhar706 1 year ago +1

    Hi, is there any way to integrate the notebook with GitHub?

    • @AmitChandak
      @AmitChandak  1 year ago +2

      The current list does not include notebooks, but this may be covered soon. I will check and confirm:
      learn.microsoft.com/en-us/fabric/cicd/git-integration/intro-to-git-integration

    • @faheemiftikhar706
      @faheemiftikhar706 1 year ago

      @AmitChandak Okay, thanks.

  • @hirenrami7348
    @hirenrami7348 1 year ago

    One more question: I want to make my Power BI dashboard interactive by allowing users to input an API key, which will then be used in PySpark to load the relevant data. Could you please advise on how I can achieve this?
    Thank you very much!

    • @AmitChandak
      @AmitChandak  1 year ago

      I have Python code in a few videos about how to use a key; check whether these can help:
      Loading Microsoft Fabric Lakehouse made easy - Python code using OAuth Token:
      ruclips.net/video/fddp2MhFCBY/видео.html
      REST API: Use Token-based Authentication to load local/on-premise SQL server data to Warehouse using Python code: ruclips.net/video/OGsLJTxnbjE/видео.html
      If these do not help, refer to:
      support.gpsgate.com/hc/en-us/articles/360021707379-How-to-get-an-API-token-and-authorization-in-REST-with-Python
      medium.com/python-rest-api-toolkit/python-rest-api-authentication-with-json-web-tokens-1e06e449f33
      medium.com/geekculture/how-to-execute-a-rest-api-call-on-apache-spark-the-right-way-in-python-4367f2740e78
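      A minimal sketch of using a user-supplied key in the notebook (the endpoint and header are placeholders; assumes the API returns a JSON array of records):
      import requests
      api_key = "user-supplied-key"  # e.g. passed in through a notebook parameter cell
      # Call the REST API with the key and load the JSON response into a Spark DataFrame
      resp = requests.get(
          "https://api.example.com/data",
          headers={"Authorization": f"Bearer {api_key}"},
      )
      resp.raise_for_status()
      df = spark.createDataFrame(resp.json())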

  • @trone_tip
    @trone_tip 1 year ago

    Can you please create a video on how to pull data from an Excel file into Fabric?