Load data from Azure Blob Storage into Python

  • Published: 19 Aug 2024
  • Code below:
    from datetime import datetime, timedelta
    from azure.storage.blob import BlobServiceClient, generate_blob_sas, BlobSasPermissions
    import pandas as pd

    # enter credentials
    account_name = 'ACCOUNT NAME'
    account_key = 'ACCOUNT KEY'
    container_name = 'CONTAINER NAME'

    # create a client to interact with blob storage
    connect_str = 'DefaultEndpointsProtocol=https;AccountName=' + account_name + ';AccountKey=' + account_key + ';EndpointSuffix=core.windows.net'
    blob_service_client = BlobServiceClient.from_connection_string(connect_str)

    # use the client to connect to the container
    container_client = blob_service_client.get_container_client(container_name)

    # get a list of all blob files in the container
    blob_list = []
    for blob_i in container_client.list_blobs():
        blob_list.append(blob_i.name)

    df_list = []
    # generate a shared access signature for each file and load it into Python
    for blob_i in blob_list:
        # generate a read-only shared access signature that expires in 1 hour
        sas_i = generate_blob_sas(account_name=account_name,
                                  container_name=container_name,
                                  blob_name=blob_i,
                                  account_key=account_key,
                                  permission=BlobSasPermissions(read=True),
                                  expiry=datetime.utcnow() + timedelta(hours=1))
        # build a signed URL that pandas can read directly
        sas_url = 'https://' + account_name + '.blob.core.windows.net/' + container_name + '/' + blob_i + '?' + sas_i
        df = pd.read_csv(sas_url)
        df_list.append(df)

    # combine all the files into a single DataFrame
    df_combined = pd.concat(df_list, ignore_index=True)

Comments • 41

  • @EwaneGigga
    @EwaneGigga 13 days ago +2

    Thanks a lot for this very clear video. I spent hours trying to do this until I luckily stumbled across your video. I agree that this video should definitely have more views!!

    • @dotpi5907
      @dotpi5907  13 days ago

      I'm glad it helped. Thanks for the support!

  • @CapitanFeeder
    @CapitanFeeder 6 months ago +1

    Videos like yours should have way more views. Thank you for what you do.

    • @dotpi5907
      @dotpi5907  6 months ago

      Thanks so much! I really appreciate the support

  • @k2line706
    @k2line706 3 months ago

    Fantastic video. Very clear explanation and clean code for us to follow. Thank you!

  • @RoamingAdhocrat
    @RoamingAdhocrat 2 days ago

    Ooh. I support a SaaS app and _hate_ Azure Storage Explorer with a burning passion. If I can access logs etc. from Python instead of ASE, that would be a very happy rabbit hole to go down. I suspect I don't have access to those keys though.

  • @dhruvajmeri8677
    @dhruvajmeri8677 3 months ago

    Thank you for this video! It saved me time.

  • @kevinduffy2428
    @kevinduffy2428 2 months ago

    What if you did not want to bring the files down to the local machine? How would you process the files up on Azure, and run the Python code on Azure? For instance, the files were placed in blob storage and now you want to process them, clean them up and then save the results back out to blob storage. The Python code is not complicated, just what are the pieces/configuration up on Azure?

  • @investing3370
    @investing3370 7 months ago +2

    What happens when you have a SAS token on hand? Can it be used in place of the account key?

    • @dotpi5907
      @dotpi5907  7 months ago

      Hi @investing3370, try swapping the connect_str line for these:
      sas_token = 'your sas token'
      connect_str = 'DefaultEndpointsProtocol=https;AccountName=' + account_name + ';SharedAccessSignature=' + sas_token + ';EndpointSuffix=core.windows.net'
      You won't need the account_key line any more, since the SAS token takes its place.
      Let me know if that works.
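
      For reference, a minimal end-to-end sketch of the SAS-token variant, assuming an account-level SAS token with read and list permissions (the token and names below are placeholders). With a SAS-authenticated client you can skip generate_blob_sas and download each blob directly:
      import io
      import pandas as pd
      from azure.storage.blob import BlobServiceClient

      # placeholder credentials: an account-level SAS token with read + list permissions
      account_name = 'ACCOUNT NAME'
      sas_token = 'your sas token'
      container_name = 'CONTAINER NAME'

      connect_str = 'DefaultEndpointsProtocol=https;AccountName=' + account_name + ';SharedAccessSignature=' + sas_token + ';EndpointSuffix=core.windows.net'
      blob_service_client = BlobServiceClient.from_connection_string(connect_str)
      container_client = blob_service_client.get_container_client(container_name)

      # download each CSV through the client instead of signing per-blob URLs
      df_list = []
      for blob_i in container_client.list_blobs():
          data = container_client.download_blob(blob_i.name).readall()
          df_list.append(pd.read_csv(io.BytesIO(data)))
      df_combined = pd.concat(df_list, ignore_index=True)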

  • @thisissuvv
    @thisissuvv 1 year ago +1

    What if I have multiple directories inside the container and blob files are present inside those directories?
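
    Blob storage is flat, so "directories" are just prefixes in blob names; list_blobs already walks into them, and the code in the description will pick those blobs up as-is. To restrict the listing to one virtual directory, list_blobs accepts a name_starts_with parameter. A small sketch, assuming a hypothetical folder called 'subfolder/':
    # 'subfolder/' is a hypothetical virtual directory inside the container
    for blob_i in container_client.list_blobs(name_starts_with='subfolder/'):
        blob_list.append(blob_i.name)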

  • @remorabay
    @remorabay 5 months ago

    I have a Python script that reads an EDI file and, from there, creates unique data tags and elements (basically a CSV file with one tag and data field per line). I need a process to load this into Azure and, for the outbound, to extract into the same tags+data. This looks close. Anyone interested in giving me a quote for this (can you show it working)? Thanks.

  • @ohaya1
    @ohaya1 1 year ago +1

    Mega like, thank you so much!

  • @user-bu3zh4rm5j
    @user-bu3zh4rm5j 10 months ago

    I have an image dataset stored in Azure datastore file storage. I have a model in Azure ML Studio. So how do I access the dataset?

  • @BigBob8681
    @BigBob8681 7 months ago

    Do you have any suggestions for how to write a file back to the storage blob in a similar fashion?
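
    A minimal sketch of the reverse direction, assuming the container_client from the description's code and a hypothetical target blob name 'output.csv':
    # df_combined is the DataFrame from the description's code;
    # 'output.csv' is a hypothetical blob name
    csv_bytes = df_combined.to_csv(index=False).encode('utf-8')
    container_client.upload_blob(name='output.csv', data=csv_bytes, overwrite=True)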

  • @yuthikashekhar1718
    @yuthikashekhar1718 1 year ago +3

    Can we do the same for JSON files stored in blob storage?

    • @sohamjana3802
      @sohamjana3802 7 months ago

      I have the same question. Have you found a solution?
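
      The same signed-URL approach appears to carry over: pandas can read a JSON blob straight from the URL, assuming the blob holds an array of records or line-delimited JSON:
      # sas_url is built exactly as in the description's code
      df = pd.read_json(sas_url)              # for a JSON array of records
      df = pd.read_json(sas_url, lines=True)  # for line-delimited JSON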

  • @_the.equalizer_
    @_the.equalizer_ 11 months ago +1

    Well explained! Actually, I want to read a ".docx" file from blob storage. How can I do that?

    • @jsonbourne8122
      @jsonbourne8122 10 months ago +1

      You will probably have to read it in bytes and store it locally, or create a BytesIO object first and then pass it to python-docx.
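
      A sketch of that BytesIO route, assuming python-docx and requests are installed and sas_url points at a .docx blob signed as in the description's code:
      import io
      import requests
      from docx import Document  # pip install python-docx

      # fetch the blob over HTTPS via its signed URL
      response = requests.get(sas_url)
      response.raise_for_status()
      # wrap the bytes in BytesIO and hand them to python-docx
      doc = Document(io.BytesIO(response.content))
      text = '\n'.join(p.text for p in doc.paragraphs)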

  • @learner-df2ns
    @learner-df2ns 11 months ago +1

    Hi Sir,
    Which version of pandas have you used? Can we load into a PySpark DataFrame instead of a pandas DataFrame? If yes, please share the syntax ASAP.

    • @benhiggs8834
      @benhiggs8834 11 months ago

      Hi @learner-df2ns, try replacing these lines:
      df = pd.read_csv(sas_url)
      df_list.append(df)
      df_combined = pd.concat(df_list, ignore_index=True)
      with these lines:
      from functools import reduce
      from pyspark.sql import SparkSession
      spark = SparkSession.builder.appName("CSVtoDataFrame").getOrCreate()
      df = spark.read.csv(sas_url, header=True, inferSchema=True)
      df_list.append(df)
      df_combined = reduce(lambda df1, df2: df1.union(df2), df_list)
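
      One caveat: plain Spark cannot read https:// URLs out of the box, so spark.read.csv(sas_url) may fail outside managed environments that preload a connector. A common alternative is the hadoop-azure (wasbs) connector; a sketch under that assumption, with accountname/containername as placeholder values:
      from pyspark.sql import SparkSession

      # placeholder account/container names; account_key as in the description's code
      spark = (SparkSession.builder.appName('CSVtoDataFrame')
               .config('spark.hadoop.fs.azure.account.key.accountname.blob.core.windows.net', account_key)
               .getOrCreate())
      df = spark.read.csv('wasbs://containername@accountname.blob.core.windows.net/', header=True, inferSchema=True)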

  • @AndresPapaquiNotario
    @AndresPapaquiNotario 1 year ago +1

    thanks! super helpful 👍

    • @dotpi5907
      @dotpi5907  1 year ago

      That's great to hear @AndresPapaquiNotario

  • @satyakipradhan2359
    @satyakipradhan2359 1 year ago +1

    getting HTTP Error 403: This request is not authorized to perform this operation using this resource type

    • @dotpi5907
      @dotpi5907  1 year ago

      Hi @satyakipradhan2359, thanks for the comment. 403 means that your connection is working but you don't have permission, so your Azure account might have extra security on it.
      Try changing a few of the other options in the 'Generate SAS' tab, like adding your IP address to 'Allowed IP addresses' and checking that you have read permissions in the 'Permissions' dropdown. Hope that helps!

  • @charlieevert7666
    @charlieevert7666 1 year ago +1

    You da real MVP

    • @dotpi5907
      @dotpi5907  1 year ago

      Thanks @charlieevert7666!

  • @kartikgupta8413
    @kartikgupta8413 7 months ago +1

    thank you for this video

    • @dotpi5907
      @dotpi5907  6 months ago

      Thanks for watching!

  • @sumitsp01
    @sumitsp01 1 year ago +1

    Can we do a similar thing to load video files from Azure blob storage using libraries like OpenCV?
    I want to load and analyze videos from blob storage inside Azure Machine Learning Studio.

    • @dotpi5907
      @dotpi5907  1 year ago +1

      Hi @sumitsp01, thanks for the comment. That sounds like an interesting project, and it sounds like it can be done. I'll have a play around and let you know if I figure it out, maybe this weekend.

    • @sumitsp01
      @sumitsp01 1 year ago +1

      @dotpi5907 Thank you for the reply. I tried the above and I am able to do it now. We can read videos from Azure blob storage by providing the correct URI path, then convert them into frames and store them in another location to use in our ML models.
      Now I'm looking for a way to read live video coming directly from a camera 😄
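
      For anyone trying the same thing, a sketch of the frame-extraction step, assuming OpenCV was built with FFmpeg support (so VideoCapture can open an HTTPS URL) and sas_url points at a video blob signed as in the description's code:
      import cv2

      # open the video blob directly from its signed URL
      cap = cv2.VideoCapture(sas_url)
      frames = []
      while True:
          ok, frame = cap.read()
          if not ok:
              break
          frames.append(frame)
      cap.release()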

  • @nikk6489
    @nikk6489 1 year ago +1

    @dotpi5907 can we load a saved Hugging Face model like this? If so, can you guide how? Or is there any alternate solution? Many thanks in advance. Cheers

    • @dotpi5907
      @dotpi5907  1 year ago

      Hi Nik K, thanks for the comment! That sounds like a great idea for a video. I'm away for a few days, but when I get back I'll look into it and (all things going well) make a video on it.

  • @surajbhu
    @surajbhu 11 months ago

    Is there a way to select files/blobs from an Azure container in a Flask application for further use, just like the request.files.getlist('files') function helps select files from the local directory? Can someone help me with this?

    • @oujghoureda1303
      @oujghoureda1303 4 months ago

      This worked for me:
      for blob in container_client.list_blobs():
          if blob.name.startswith('resumes/') and blob.name.endswith('.pdf'):
              blob_list.append(blob.name)

  • @nirajmodh5086
    @nirajmodh5086 1 year ago +1

    can we set the expiry time to be infinite?

    • @dotpi5907
      @dotpi5907  1 year ago +1

      Hi Niraj, thank you for the comment and sorry for the late reply. I don't think you can set the expiry date to be infinite, unfortunately. The main reason is that if someone outside of your organization were to get hold of your SAS key, they would be able to access the file for as long as the SAS key is valid or until the file is deleted.
      I don't know too much more on this subject, but there is some more info here: learn.microsoft.com/en-us/azure/storage/common/sas-expiration-policy?tabs=azure-portal.
      Hope that helps!

  • @alexandrakimberlychavezaqu8290
    @alexandrakimberlychavezaqu8290 1 year ago +2

    Thank you so much!!!!