Capgemini Data Engineer Interview Question - Round 1 | Save Multiple Columns in the DataFrame |

  • Published: 22 Nov 2024

Comments • 27

  • @kunuturuaravindreddy5879
    @kunuturuaravindreddy5879 3 months ago +1

    Very good that you are posting real interview questions; many others simply explain concept definitions.

    • @GeekCoders
      @GeekCoders 3 months ago

      @kunuturuaravindreddy5879 thanks

  • @sourav_sarkar_2000
    @sourav_sarkar_2000 9 months ago +4

    # creating a dict of columns keyed by data type, to avoid checking multiple datatypes
    d = {}
    for col in df.dtypes:
        if col[1] not in d:
            d[col[1]] = [col[0]]
        else:
            d[col[1]].append(col[0])
    for key, val in d.items():
        df.select(val).show()
        # write df to the location

  • @Offical_PicturePerfect
    @Offical_PicturePerfect 3 months ago

    int_cols = [col for col, dtype in df.dtypes if dtype == 'int']
    string_cols = [col for col, dtype in df.dtypes if dtype == 'string']
    float_cols = [col for col, dtype in df.dtypes if dtype == 'float']
    # Creating DataFrames for each data type
    int_df = df.select(int_cols)
    string_df = df.select(string_cols)
    float_df = df.select(float_cols)
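
A note on the exact-match checks above: `df.dtypes` reports `LongType` as 'bigint' and `DoubleType` as 'double', and those are the types Spark infers by default for Python ints and floats, so comparing only against 'int' and 'float' can miss columns. A minimal sketch of grouping by type family instead, working on the `(name, dtype)` pairs that `df.dtypes` returns. The names `INTEGER_TYPES`, `FLOAT_TYPES`, `type_family` and `columns_in_family` are hypothetical, and which types belong to which family is an assumption to adjust for the actual requirement.

```python
# Assumed type families; extend or narrow these sets as the task requires.
INTEGER_TYPES = {"tinyint", "smallint", "int", "bigint"}
FLOAT_TYPES = {"float", "double"}


def type_family(dtype):
    """Map a Spark simple type string (as found in df.dtypes) to a coarse family."""
    if dtype in INTEGER_TYPES:
        return "integer"
    if dtype in FLOAT_TYPES:
        return "float"
    if dtype == "string":
        return "string"
    return "other"


def columns_in_family(dtypes, family):
    """Return the column names whose type belongs to the given family."""
    return [name for name, dtype in dtypes if type_family(dtype) == family]


# Stand-in for df.dtypes so the sketch runs without a Spark session.
sample_dtypes = [("id", "bigint"), ("name", "string"), ("score", "double"), ("rank", "int")]

int_cols = columns_in_family(sample_dtypes, "integer")
string_cols = columns_in_family(sample_dtypes, "string")
float_cols = columns_in_family(sample_dtypes, "float")
print(int_cols, string_cols, float_cols)
```

With a real DataFrame, `columns_in_family(df.dtypes, "integer")` would feed straight into `df.select(...)` as in the comment above.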

  • @sourav_sarkar_2000
    @sourav_sarkar_2000 9 months ago +1

    # creating a dict of columns keyed by data type, to avoid checking multiple datatypes
    d = {}
    for col in df.dtypes:
        if col[1] not in d:
            d[col[1]] = [col[0]]
        else:
            d[col[1]].append(col[0])
    print(d)
    for key, val in d.items():
        df.select(val).show()
        # write df to the location
        # df.write.mode('overwrite').save(f'temp_loc/{key}')
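
The same grouping can be sketched a little more compactly with `collections.defaultdict`. This is only an illustration, not the method from the video: `group_columns_by_dtype` and `write_by_dtype` are hypothetical helper names, and Parquet output under a caller-supplied `base_path` is an assumption. The grouping step is plain Python, so it can be checked without a Spark session.

```python
from collections import defaultdict


def group_columns_by_dtype(dtypes):
    """Group column names by type string.

    `dtypes` is a list of (name, dtype) pairs, the shape df.dtypes returns.
    """
    grouped = defaultdict(list)
    for name, dtype in dtypes:
        grouped[dtype].append(name)
    return dict(grouped)


def write_by_dtype(df, base_path):
    """Hypothetical helper: write one Parquet dataset per data type under base_path."""
    for dtype, cols in group_columns_by_dtype(df.dtypes).items():
        # mode("overwrite") replaces whatever already exists at the target path
        df.select(cols).write.mode("overwrite").parquet(f"{base_path}/{dtype}")


# Stand-in for df.dtypes so the grouping can be run on its own.
sample_dtypes = [("id", "int"), ("name", "string"), ("age", "int"), ("salary", "float")]
grouped = group_columns_by_dtype(sample_dtypes)
print(grouped)
# {'int': ['id', 'age'], 'string': ['name'], 'float': ['salary']}
```

Using the type string directly as a folder name works for simple types such as 'int' or 'string'; parameterised types such as decimals or arrays would probably need a sanitised name first.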

  • @myl1566
    @myl1566 10 months ago +1

    Good problem to solve. Thanks for posting, Sagar!

  • @aamirmansuri69
    @aamirmansuri69 10 months ago +2

    Thank you for posting this video. But can you please post PySpark interview questions for freshers? Thank you!

  • @rawat7203
    @rawat7203 10 months ago +1

    My Way Sir
    intType = []
    stringType = []
    floatType = []
    for i in df.dtypes:
        if i[1] == 'int':
            intType.append(i[0])
        elif i[1] == 'string':
            stringType.append(i[0])
        elif i[1] == 'float':
            floatType.append(i[0])
    dfInt = df.select(*intType)
    dfString = df.select(*stringType)
    dfFloat = df.select(*floatType)

  • @Dataengineeringlearninghub
    @Dataengineeringlearninghub 10 months ago +1

    Great problem, Sagar

  • @Nextgentrick
    @Nextgentrick 9 months ago

    Shouldn’t you use append instead of overwrite?

  • @vutv5742
    @vutv5742 10 months ago +1

    Completed 👏

  • @ug1880
    @ug1880 9 months ago +1

    Were you asked to take any iMocha test?

    • @GeekCoders
      @GeekCoders 9 months ago +1

      No

    • @ug1880
      @ug1880 9 months ago

      @GeekCoders okk...

  • @rawat7203
    @rawat7203 10 months ago +1

    Thanks a lot Sir

  • @pradishpranam6175
    @pradishpranam6175 9 months ago

    cool question

  • @bhumikalalchandani321
    @bhumikalalchandani321 10 months ago

    Okay, is this internal functionality of the conversion to Parquet format?

  • @2412_Sujoy_Das
    @2412_Sujoy_Das 10 months ago

    My solution is as follows:
    # start from the full DataFrame three times (names chosen to avoid shadowing the built-in float)
    string_df = df
    integer_df = df
    float_df = df
    for name, dtype in df.dtypes:
        # drop each column from every DataFrame whose data type it does not match
        if dtype != 'string':
            string_df = string_df.drop(name)
        if dtype != 'int':
            integer_df = integer_df.drop(name)
        if dtype != 'float':
            float_df = float_df.drop(name)
    print(string_df)
    print(integer_df)
    print(float_df)

  • @pratyushkumar8567
    @pratyushkumar8567 10 months ago +1

    Hi Sagar,
    for this Capgemini Data Engineer Interview Question - Round 1 | Save Multiple Columns in the DataFrame,
    how much experience did the candidate have?

  • @SouvikMitul
    @SouvikMitul 6 months ago

    my solution:
    # map each data type to its list of column names (avoids shadowing the built-in dict)
    cols_by_type = {}
    for name, dtype in df.dtypes:
        if dtype in cols_by_type:
            cols_by_type[dtype].append(name)
        else:
            cols_by_type[dtype] = [name]

    for dtype in cols_by_type:
        df_s = df.select(cols_by_type[dtype])
        df_s.show()
        # used show() instead of writing