PySpark Advanced interview questions part 1

  • Published: 31 Dec 2024

Comments • 31

  • @abhilash0410
    @abhilash0410 3 years ago +8

    Bro, bring more real-time interview questions like these. Thank you so much!

  • @saachinileshpatil
    @saachinileshpatil 11 months ago +1

    Thanks for sharing 👍, very informative

  • @rocku4evr
    @rocku4evr 2 years ago +1

    Great... fortunate to be your subscriber.

  • @sjitghosh
    @sjitghosh 3 years ago +3

    You are doing excellent work. Helping a lot!!

  • @vedanthasm2659
    @vedanthasm2659 3 years ago +3

    One of the best explanations. Bro, please make more videos on PySpark.

  • @seshuseshu4106
    @seshuseshu4106 3 years ago +1

    Very good, detailed explanation. Thanks for your efforts, keep it up.

  • @nsrchndshkh
    @nsrchndshkh 3 years ago +1

    Thanks, man. This was a really detailed explanation. Kudos.

  • @akashpb4044
    @akashpb4044 3 years ago +1

    Awesome video... Cleared my doubts 👍👍👍

  • @fratkalkan7850
    @fratkalkan7850 2 years ago

    Very clean explanation, thank you sir.

  • @achintamondal1494
    @achintamondal1494 2 years ago +1

    Awesome video.
    Could you please share the notebook? It would really help.

  • @janardhanreddy3267
    @janardhanreddy3267 10 months ago

    Nice explanation. Please attach the CSV or JSON file in the description so we can practice.

  • @sanooosai
    @sanooosai 9 months ago

    Great, thank you.

  • @rajanib9057
    @rajanib9057 1 year ago

    Can you please explain how Spark filtered those 2 columns as bad data? I don't see any where condition mentioned for the corrupt column.

  • @shreekrishnavani7868
    @shreekrishnavani7868 3 years ago

    Nice explanation 👌 thanks

  • @varuns4472
    @varuns4472 2 years ago

    Nice one

  • @janardhanreddy3267
    @janardhanreddy3267 10 months ago

    Please upload all the PySpark interview question videos.

  • @rahulyeole6411
    @rahulyeole6411 2 years ago

    Please share a basic big data video.

  • @naveendayyala1484
    @naveendayyala1484 1 year ago

    Please share the notebook in .dbc format.

  • @balajia8376
    @balajia8376 2 years ago

    It seems querying _corrupt_record is not working. I tried it today and it does not let me query by that column name: cust_df.filter("_corrupt_record is not null") fails with
    AnalysisException: Since Spark 2.3, the queries from raw JSON/CSV files are disallowed when the
    referenced columns only include the internal corrupt record column
    (named _corrupt_record by default). For example:
    spark.read.schema(schema).csv(file).filter($"_corrupt_record".isNotNull).count()
    and spark.read.schema(schema).csv(file).select("_corrupt_record").show().
    Instead, you can cache or save the parsed results and then send the same query.
    For example, val df = spark.read.schema(schema).csv(file).cache() and then
    df.filter($"_corrupt_record".isNotNull).count().

    • @TRRaveendra
      @TRRaveendra 2 years ago

      cust_df.cache()
      Cache the DataFrame and it won't raise the exception.

    • @balajia8376
      @balajia8376 2 years ago

      @TRRaveendra Yes, I did. Even after that, it still does not allow a query on whether _corrupt_record is null or not null.

    • @balajia8376
      @balajia8376 2 years ago

      It seems badRecordsPath is the only solution.
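
For readers following this thread, here is a minimal sketch of the two approaches mentioned: caching the parsed DataFrame before filtering on _corrupt_record, and the badRecordsPath reader option. The column list comes from the printSchema() output quoted elsewhere on this page; the function names and constants are hypothetical, not taken from the video's notebook. badRecordsPath is a Databricks feature and is assumed not to exist on plain open-source Spark, and whether caching alone avoids the error may depend on the runtime version, as the replies above suggest.

```python
# Column list copied from the printSchema() output quoted on this page.
BASE_COLUMNS_DDL = (
    "cust_id INT, cust_name STRING, manager STRING, city STRING, phno BIGINT"
)
# PERMISSIVE mode keeps the raw bad line only if the schema declares this column.
SCHEMA_WITH_CORRUPT_DDL = BASE_COLUMNS_DDL + ", _corrupt_record STRING"


def permissive_options(corrupt_column="_corrupt_record"):
    """Reader options that keep malformed rows instead of dropping them."""
    return {
        "header": "true",
        "mode": "PERMISSIVE",
        "columnNameOfCorruptRecord": corrupt_column,
    }


def read_cached(spark, path):
    """Read the CSV with an explicit schema and cache the parsed result."""
    df = (
        spark.read.schema(SCHEMA_WITH_CORRUPT_DDL)
        .options(**permissive_options())
        .csv(path)
    )
    # The exception text itself suggests caching before querying _corrupt_record.
    return df.cache()


def split_good_and_bad(df, corrupt_column="_corrupt_record"):
    """Return (good_rows, bad_rows) from an already cached DataFrame."""
    bad = df.filter(f"{corrupt_column} IS NOT NULL")
    good = df.filter(f"{corrupt_column} IS NULL").drop(corrupt_column)
    return good, bad


def read_with_bad_records_path(spark, path, bad_records_path):
    """Databricks-only alternative: bad rows are written under bad_records_path."""
    return (
        spark.read.schema(BASE_COLUMNS_DDL)
        .option("header", "true")
        .option("badRecordsPath", bad_records_path)
        .csv(path)
    )


# Usage, assuming an existing SparkSession named `spark`:
# cust_df = read_cached(spark, "dbfs:/FileStore/tables/csv_with_bad_records.csv")
# good_df, bad_df = split_good_and_bad(cust_df)
```

The helpers take the session as a parameter so the sketch does not assume how it was created; on Databricks the notebook-provided `spark` can be passed in directly.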

  • @sachintiwari6846
    @sachintiwari6846 1 year ago

    Whoa, what an explanation!

  • @johnsonrajendran6194
    @johnsonrajendran6194 3 years ago

    Are any such mode options available when reading Parquet files?

  • @balajia8376
    @balajia8376 2 years ago

    cust_df.select("_corrupt_record").show() works, but it does not allow an is null or is not null check: cust_df.select("_corrupt_record is null").show() fails. Let me know if this works for you. Thank you.
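
A likely reason the second call fails is that select() treats a plain string as a column name, so "_corrupt_record is null" is looked up as a column rather than evaluated as an expression. The sketch below shows two standard alternatives, selectExpr() and filter(), both of which accept SQL expression strings. The function names and the is_clean alias are hypothetical, and cust_df is assumed to be a cached DataFrame whose schema declares _corrupt_record.

```python
def null_check_expr(corrupt_column="_corrupt_record"):
    """SQL expression string that flags rows which parsed cleanly."""
    return f"{corrupt_column} IS NULL AS is_clean"


def with_clean_flag(df, corrupt_column="_corrupt_record"):
    """selectExpr() parses SQL expressions; select() with a string expects a column name."""
    return df.selectExpr(corrupt_column, null_check_expr(corrupt_column))


def only_corrupt(df, corrupt_column="_corrupt_record"):
    """filter() also accepts a SQL condition string."""
    return df.filter(f"{corrupt_column} IS NOT NULL")


# Usage, assuming cust_df was read with an explicit schema and then cached:
# with_clean_flag(cust_df).show()
# only_corrupt(cust_df).show(truncate=False)
```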

  • @swagatikatripathy4917
    @swagatikatripathy4917 3 years ago +1

    Why do we write inferSchema=True?

    • @TRRaveendra
      @TRRaveendra 3 years ago +2

      inferSchema=True derives the column data types from the data.
      header=True takes the column names from the first line of the file.
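
As a small illustration of the two options described in this reply, the sketch below builds the reader options and applies them to a CSV read. The function names are hypothetical. Without inferSchema, Spark reads every CSV column as a string; with it, Spark makes an extra pass over the data to pick the types.

```python
def csv_read_options(header=True, infer_schema=True):
    """Build the option map for spark.read; values are passed as strings."""
    return {
        "header": str(header).lower(),             # first line holds column names
        "inferSchema": str(infer_schema).lower(),  # derive column types from the data
    }


def read_csv_inferred(spark, path):
    """Read a CSV, taking names from the header row and types from the data."""
    return spark.read.options(**csv_read_options()).csv(path)


# Usage, assuming an existing SparkSession named `spark`:
# df = read_csv_inferred(spark, "dbfs:/FileStore/tables/csv_with_bad_records.csv")
# df.printSchema()
```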

  • @srikanthbachina7764
    @srikanthbachina7764 2 years ago

    Hi, please share your contact details. I am looking for Python, PySpark, and Databricks training.

  • @balajia8376
    @balajia8376 2 years ago

    root
    |-- cust_id: integer (nullable = true)
    |-- cust_name: string (nullable = true)
    |-- manager: string (nullable = true)
    |-- city: string (nullable = true)
    |-- phno: long (nullable = true)
    |-- _corrupt_record: string (nullable = true)

    display(cust_df.filter("_corrupt_record is not null"))
    FileReadException: Error while reading file dbfs:/FileStore/tables/csv_with_bad_records.csv.
    Caused by: IllegalArgumentException: _corrupt_record does not exist. Available: cust_id, cust_name, manager, city, phno