Great mock interview! Thanks for uploading it. The candidate did a great job on the market place case study. If I have to nitpick, when the candidate followed the AARRR framework and picked # of downloads and # of login as metrics for acquisition/activation, I don't think those general-purpose metrics apply to market place. To my knowledge, MP is not a stand-alone website/app nor requires separate login; you access it through FB website/app using your FB account. Correct me if I am wrong.
Thanks for the mock video. It is good to know the kinds of notes the interviewers put down. I have learnt here that the interview is really about asking intelligent questions (avoid assumptions), being logical in your approach (rather than guess) and 'showing' a good understanding of the problem (having understanding is not enough, you must talk about it in every step). However, I am not sure the real interviewers are usually this patient from experience because there is an expectation to answer certain number of questions before the interview is over.
I feel the second ask needs more data points to build a query for desired output. You can filter user who published in last 7 days from user_activity but you can not tie these published post to post_reaction without a post_id in user_activity. At best you can only work with post_reaction table by filtering the date for last 7 days, and grouping by the post_id without knowing what user published this what day. Let me know if someone felt the same way..
why do you need post_id in user_activity?? It does not matter what post it is, interviewer wants you to find number of posts that received more emojis than likes from any users who published in last 7 days. For example, if you published anything (does not matter how many times) in last 7 days, I want you to find number of posts you posted that received more emojis than likes.
I have the same concern! The second table has UID of users who reacted to the posts, not of the ones who have posted, so it does not make sense to do a join on that field. There should be a post_id field in the first table!
@@pss1030 The corner case here is that if the publisher likes his own post today that he published two months ago then you are now including this post but shouldnt. Post_id could be added as a foreign key to user_activity to address this but not sure if this messes up some of the other questions.
for the second query, if one user has published 2 posts on the same day, it will left join both posts with each other and thus have 4 rows which only should have 2.
It does not matter in this case because date is not considered. Although left join was not necessary as I'd assume there wouldn't be any posts without user activity - so inner join should be fine.
im on question 2 right now - feeling like this would be easier with a pivot of the data :) --q1 SELECT Date ,Publish_Cnt ,Publish_Cnt / (Publish_Cnt+Comment_Cnt+Upload_Cnt) AS Publish_Pct ,Comment_Cnt ,Comment_Cnt / (Publish_Cnt+Comment_Cnt+Upload_Cnt) AS Comment_Pct ,Upload_Cnt ,Upload_Cnt / (Publish_Cnt+Comment_Cnt+Upload_Cnt) AS Upload_Pct FROM ( SELECT Date ,MAX(CASE WHEN Activity = 'Publish' THEN TTL END) AS Publish_Cnt ,MAX(CASE WHEN Activity = 'Comment' THEN TTL END) AS Comment_Cnt ,MAX(CASE WHEN Activity = 'Upload' THEN TTL END) AS Upload_Cnt FROM ( SELECT Date ,Activity ,COUNT(*) AS TTL FROM user_activity GROUP BY 1,2 )SQ1 )SQ2 --q2 COUNT(*) FROM ( SELECT Post_Id FROM ( SELECT Post_Id ,MAX(CASE WHEN Reactions = 'Emoji' THEN TTL END) AS Emoji_Cnt ,MAX(CASE WHEN Reactions = 'Like' THEN TTL END) AS Like_Cnt FROM ( SELECT Post_Id ,Reactions ,COUNT(*) AS TTL FROM post_reactions PR INNER JOIN user_activity UA ON (UA.UID = PR.UID AND UA.Date > today()-7) GROUP BY 1,2 )SQ1 )SQ2 WHERE Emoji_Cnt > Like_Cnt )SQ3
Second query (Replace IF with CASE for PSQL) SELECT Post_ID, SUM(IF(Reactions='Like', 1, 0)) AS like_count, SUM(IF(Reactions='Emoji', 1, 0)) AS emoji_count FROM post_reactions WHERE UID IN ( SELECT ua.UID FROM user_activity WHERE Date BETWEEN CURRENT_DATE - INTERVAL '7 days' AND CURRENT_DATE AND Activity = 'Publish' ) WHERE Reactions = 'Like' OR Reactions = 'Emoji' GROUP BY Post_ID HAVING SUM(IF(Reactions='Like', 1, 0)) < SUM(IF(Reactions='Emoji', 1, 0))
this is supppper helpful!!!! is someone can help me understandig why expecting candidate elaborate more on the user experience before dive into the question?
WITH du AS (SELECT u.Activity , u.Date ,count(UID) as sum_act FROM user_activity as u GROUP BY u.Activity, u.Date) , Daily AS ( SELECT u.Date ,COUNT(UID) as d_sum_act GROUP BY 1) SELECT du.*,du.sum_act/daily.d_sum_act FROM du INNER JOIN Daily ON du.Date=Daily.Date Here is my answer to the first part of the question, any comments/feedbacks would be appreciated thank you!
Great mock interview! Thanks for uploading it. The candidate did a great job on the market place case study. If I have to nitpick, when the candidate followed the AARRR framework and picked # of downloads and # of login as metrics for acquisition/activation, I don't think those general-purpose metrics apply to market place. To my knowledge, MP is not a stand-alone website/app nor requires separate login; you access it through FB website/app using your FB account. Correct me if I am wrong.
Thanks for the mock video.
It is good to know the kinds of notes the interviewers put down.
I have learnt here that the interview is really about asking intelligent questions (avoid assumptions), being logical in your approach (rather than guess) and 'showing' a good understanding of the problem (having understanding is not enough, you must talk about it in every step).
However, I am not sure the real interviewers are usually this patient from experience because there is an expectation to answer certain number of questions before the interview is over.
I feel the second ask needs more data points to build a query for desired output. You can filter user who published in last 7 days from user_activity but you can not tie these published post to post_reaction without a post_id in user_activity.
At best you can only work with post_reaction table by filtering the date for last 7 days, and grouping by the post_id without knowing what user published this what day.
Let me know if someone felt the same way..
why do you need post_id in user_activity?? It does not matter what post it is, interviewer wants you to find number of posts that received more emojis than likes from any users who published in last 7 days. For example, if you published anything (does not matter how many times) in last 7 days, I want you to find number of posts you posted that received more emojis than likes.
I have the same concern! The second table has UID of users who reacted to the posts, not of the ones who have posted, so it does not make sense to do a join on that field. There should be a post_id field in the first table!
@@pss1030 how do you plan to find out what user posted what posts without user_id in the other table?
@@pss1030 The corner case here is that if the publisher likes his own post today that he published two months ago then you are now including this post but shouldnt. Post_id could be added as a foreign key to user_activity to address this but not sure if this messes up some of the other questions.
for the second query, if one user has published 2 posts on the same day, it will left join both posts with each other and thus have 4 rows which only should have 2.
It does not matter in this case because date is not considered. Although left join was not necessary as I'd assume there wouldn't be any posts without user activity - so inner join should be fine.
im on question 2 right now - feeling like this would be easier with a pivot of the data :)
--q1
SELECT Date
,Publish_Cnt
,Publish_Cnt / (Publish_Cnt+Comment_Cnt+Upload_Cnt) AS Publish_Pct
,Comment_Cnt
,Comment_Cnt / (Publish_Cnt+Comment_Cnt+Upload_Cnt) AS Comment_Pct
,Upload_Cnt
,Upload_Cnt / (Publish_Cnt+Comment_Cnt+Upload_Cnt) AS Upload_Pct
FROM
(
SELECT Date
,MAX(CASE WHEN Activity = 'Publish' THEN TTL END) AS Publish_Cnt
,MAX(CASE WHEN Activity = 'Comment' THEN TTL END) AS Comment_Cnt
,MAX(CASE WHEN Activity = 'Upload' THEN TTL END) AS Upload_Cnt
FROM
(
SELECT Date
,Activity
,COUNT(*) AS TTL
FROM user_activity
GROUP BY 1,2
)SQ1
)SQ2
--q2
COUNT(*)
FROM
(
SELECT Post_Id
FROM
(
SELECT Post_Id
,MAX(CASE WHEN Reactions = 'Emoji' THEN TTL END) AS Emoji_Cnt
,MAX(CASE WHEN Reactions = 'Like' THEN TTL END) AS Like_Cnt
FROM
(
SELECT Post_Id
,Reactions
,COUNT(*) AS TTL
FROM post_reactions PR
INNER JOIN user_activity UA ON (UA.UID = PR.UID AND UA.Date > today()-7)
GROUP BY 1,2
)SQ1
)SQ2
WHERE Emoji_Cnt > Like_Cnt
)SQ3
Use CTEs for everyone's sakes.
Second query (Replace IF with CASE for PSQL)
SELECT
Post_ID,
SUM(IF(Reactions='Like', 1, 0)) AS like_count,
SUM(IF(Reactions='Emoji', 1, 0)) AS emoji_count
FROM post_reactions
WHERE UID IN (
SELECT ua.UID
FROM user_activity
WHERE Date BETWEEN CURRENT_DATE - INTERVAL '7 days' AND CURRENT_DATE
AND Activity = 'Publish'
)
WHERE Reactions = 'Like' OR Reactions = 'Emoji'
GROUP BY Post_ID
HAVING
SUM(IF(Reactions='Like', 1, 0)) < SUM(IF(Reactions='Emoji', 1, 0))
I think it should be just 2-3 metrics, I didn't like product section that much
The 2nd table doesn't make sense to me at all
I kinda agree, I am not sure the USER_ID refers to a user who provided the post or the reaction.
The interviewer is so patient given the situation....
this is supppper helpful!!!! is someone can help me understandig why expecting candidate elaborate more on the user experience before dive into the question?
Because they don't just want to test you for how fast you can solve a leetcode problem but also see how you can tie your numbers to business side
WITH du AS (SELECT
u.Activity
, u.Date
,count(UID) as sum_act
FROM user_activity as u
GROUP BY
u.Activity, u.Date) ,
Daily AS (
SELECT u.Date
,COUNT(UID) as d_sum_act
GROUP BY 1)
SELECT du.*,du.sum_act/daily.d_sum_act
FROM du
INNER JOIN Daily
ON du.Date=Daily.Date
Here is my answer to the first part of the question, any comments/feedbacks would be appreciated thank you!
Thats how I would have done this if it makes you feel any better.