5. Count rows in each column where NULLs are present | Top 10 PySpark Scenario-Based Interview Questions

  • Published: 4 Nov 2024

Comments • 7

  • @2412_Sujoy_Das 10 months ago +1

    Sagar sir.... my solution in Spark SQL:
    1) df_1 = spark.read.csv("dbfs:/FileStore/tables/Spark_Practise_1.csv", header=True)
    2) df_1.createOrReplaceTempView("Sujoy_1")
    3) %sql
    SELECT SUM(CASE WHEN ID LIKE 'null' THEN 1 ELSE 0 END) AS ID,
           SUM(CASE WHEN Name LIKE 'null' THEN 1 ELSE 0 END) AS Name,
           SUM(CASE WHEN Age LIKE 'null' THEN 1 ELSE 0 END) AS Age
    FROM Sujoy_1
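    (Editor's note: the `LIKE 'null'` comparison above works only because the CSV is read without a `nullValue` option, so missing values arrive as the literal string 'null'. The hard-coded three-column query can also be generated for any column list; a minimal pure-Python sketch, reusing the `Sujoy_1` view name from the comment above:)

    ```python
    def null_count_sql(table, columns):
        """Build the per-column SUM(CASE WHEN ... LIKE 'null' ...) query
        from the comment above, without hard-coding column names.
        Assumes missing values are the literal string 'null'."""
        exprs = ",\n       ".join(
            f"SUM(CASE WHEN {c} LIKE 'null' THEN 1 ELSE 0 END) AS {c}"
            for c in columns
        )
        return f"SELECT {exprs}\nFROM {table}"

    print(null_count_sql("Sujoy_1", ["ID", "Name", "Age"]))
    ```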

  • @biramdevpawar9902 10 months ago +1

    from pyspark.sql.functions import col

    df2 = df1.columns
    column_counts = {}
    for nums in df2:
        df3 = df1.filter(col(nums).isNull()).count()
        column_counts[nums] = df3
    print(column_counts)

  • @tanushreenagar3116 9 months ago

    nice

  • @syedahamed3728 4 months ago

    import pyspark.sql.functions as F

    # alias(c), not alias('c'), so each null count keeps its column's name
    df1 = df.select([F.sum(F.col(c).isNull().cast('int')).alias(c) for c in df.columns])
    df1.show()

  • @surbhinabira3514 6 months ago

    df = spark.read.option("nullValue", "null").csv("dbfs:/FileStore/testing.csv", header=True)
    df.createOrReplaceTempView("temp")
    display(spark.sql("select count(*) - count(id) as nullcount_for_id, count(*) - count(name) as nullcount_for_name, count(*) - count(age) as nullcount_for_age from temp"))
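    (Editor's note: this `count(*) - count(col)` trick works because SQL's COUNT(col) skips NULLs while COUNT(*) counts every row. Like the CASE WHEN version, the query can be built for an arbitrary schema; a sketch, reusing the `temp` view name and column names from the comment above:)

    ```python
    def null_count_via_count(table, columns):
        """Build the count(*) - count(col) null-count query from the
        comment above for any list of columns."""
        exprs = ", ".join(
            f"count(*) - count({c}) as nullcount_for_{c}" for c in columns
        )
        return f"select {exprs} from {table}"

    print(null_count_via_count("temp", ["id", "name", "age"]))
    ```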

  • @manjulagulabal2923 10 months ago +2

    data = [
    (1, "A", 23),
    (2, "B", None),
    (3, "C", 56),
    (4, None, None),
    (5, None, None)
    ]
    data_schema = ['ID', 'Name', 'Age']
    df = spark.createDataFrame(data, data_schema)

    from pyspark.sql.functions import count
    df1 = df.select([(df.count() - count(i)).alias(i) for i in df.columns])
    df1.show()
    +---+----+---+
    | ID|Name|Age|
    +---+----+---+
    |  0|   2|  3|
    +---+----+---+
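    (Editor's note: as a sanity check on the `show()` output above, the same per-column null counts can be computed on that sample data in plain Python, without Spark:)

    ```python
    # Same sample rows as in the comment above; None marks a missing value.
    data = [
        (1, "A", 23),
        (2, "B", None),
        (3, "C", 56),
        (4, None, None),
        (5, None, None),
    ]
    columns = ["ID", "Name", "Age"]

    # Count, for each column position, how many rows hold None there.
    null_counts = {
        c: sum(row[i] is None for row in data)
        for i, c in enumerate(columns)
    }
    print(null_counts)  # {'ID': 0, 'Name': 2, 'Age': 3}
    ```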