Nice video. I think in the beginning when asked if she had any first thoughts on the issue of spam bots, one thing that could've been added was 'what are the positives of bots' Too much was said about the negative aspects of bots, and the first impression I had was, if all bots are so negative, just ban bots. But bots do have a role, and many bots are used to automate functionality. So the real important point is, identify bots that are being malicious in some way. Then dive into how to develop metrics to identify the concept of 'malicious'.
Not only interviewees should see this Mock Interviews, but also the interviewers, this way, they can learn how to do them, because out there are many companies trying to do this type of process but they do it bad and confusing. Thanks for this great video!
Great video, I watched it from beginning to end. I liked how she answered the questions. I was surprised at how conversational the interview was. If possible, could you post a Data Analytics interview as well please. Thank you.
⭐ Contents ⭐ ⌨ (0:00:00) Video overview & format ⌨ (0:02:13) Introductory Behavioral questions ⌨ (0:07:46) Social media platform bot issue task overview ⌨ (0:15:26) What are some features we should investigate regarding the bot issue? ⌨ (0:25:02) Classification model implementation details (using feature vectors) ⌨ (0:41:38) What would a dataset to train models to detect bots look like? How would you approach collecting this data? ⌨ (0:51:38) Technical implementation details (python libraries, cloud services, etc) ⌨ (0:56:01) Any questions for me? ⌨ (1:03:42) Post-interview breakdown & analysis
This was really a useful video. If interviews are like this. It's love 🎉 One thing we can add in this feature are the links. Certain spam posts consists of similar links to the same post/ profile. Not only bots do it, but even people constantly spam their account in the comments. So Link Frequency, Link Context and so on. That feature used with Bayes Classifier can be useful to make the model more robust. Bag of Words from NLP can also be used in order to make this Link tasks easier. Just an opinion of mine. 😊
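A rough sketch of the link-frequency idea above, assuming each account's comments are available as plain strings. The function name `link_features`, the feature names, and the URL regex are illustrative choices, not anything from the video; the resulting numbers could be fed to a Bayes classifier alongside bag-of-words counts.

```python
import re
from collections import Counter

# Simplified URL pattern (assumption): anything starting with http:// or https://
URL_RE = re.compile(r"https?://\S+")

def link_features(comments):
    """Summarise how often, and how repetitively, an account posts links."""
    links = [url for text in comments for url in URL_RE.findall(text)]
    counts = Counter(links)
    total = len(links)
    return {
        # average number of links per comment
        "links_per_comment": total / len(comments) if comments else 0.0,
        # how many different links were posted
        "distinct_links": len(counts),
        # share of all links taken by the single most repeated link
        "top_link_share": max(counts.values()) / total if total else 0.0,
    }

features = link_features([
    "check http://spam.example/a",
    "wow http://spam.example/a",
    "nice video",
])
```

An account that keeps pasting the same link ends up with a `top_link_share` close to 1, which is the repetitive pattern the comment describes.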
Can you pls provide Data Engineer interview Process as well? It will be greatly appreciated. Thank you for changing lives for better around the world, including mine! 😁✌️
That was pretty intense at the beginning when he started asking what an email filter model would look like, or what the dataset fed to the model would look like... She went into implementation immediately, which is a natural move, because how else can you define the dataset if you're not going to think about the function that might be doing the filtering? Overall it was handled nicely from both sides. Thanks for sharing.
I watched this video like a year ago knowing very little. Now, I feel like I can completely answer each question in detail and follow all of the concepts being mentioned.
Interview seems very easy, as a product owner I speak to our Data scientists and many of them are building business projection models etc. Is the technical implementation the key skill here? Because working in tech now, the first hour of this interview I think most of our stakeholders would even be able to answer during requirements gathering due to subject matter expertise. Like what to train the model on is usually subject matter expertise. I think its so well understood among tech stakeholders that something happens in the system, can you show me a 1 if it, or a combination of things happened, or a 0 if not and push that to some visualization report.
This is a great interview, thanks for doing this! If only all data science interviews would be like this in reality (w/ some exceptions), the world would be a better place.
@@nb7070 personal experience from my machine learning interview - 90% interview is a conversation between you and the hiring person about ur projects and your ideas and approach.
Actually, I think so. I joined a fresher DS Interview a year ago, and they asked me to whiteboard coding, IQ problem solving, mathematics, machine learning, etc. Even though I did not do too worse except for the whiteboard part, I did not land that job. Lmao
I don’t think this approach is quite right. You’re diving into models before any data analysis. Can we clearly define a bot? We need to look at sample bot accounts. How many different types of bots can we identify? What are the similarities between bot groups? Can all the different bot group feature values ranges fit inside a single data envelope or should we concentrate on identifying a single bot group at a time? The amount of time the account has been active is prob a strong feature. Where are bots coming from historically? Are account names and profile pic very similar to an existing account? There could also be an anti-dataset, where accounts that were classified as spam complained and got reinstated? This could help mitigate misclassification. Ethically, could some demographics’ accounts be more susceptible to being wrongly classified as spam?
True, but to effectively tackle the problem, you need solid domain knowledge as well as conducting a deep analysis that arises from interacting with the environment, which is inherently absent in an interview. Not to mention, this is an NLP problem based on some features she mentioned, such as the 'content of account's posts'.
Yes, and also, if we are given a semi-supervised environment (with some accounts blocked due to spam, and others not labeled at all), maybe clustering could be a good strategy to group similar types of accounts based on their features. I would bet the model would help in identifying accounts created recently, with low followers, lots of posts/comments, lots of tags. I think creating a class based only on the number of spam reports restricts the information available. Maybe weight that feature when clustering to give it more importance, but I would not use it as the label.
OMG!! I literally said the same thing.... I got interested in AI almost when the ironman first movie released and I kinda moved slowly from Electrical Engineering to DS. I wanted to have my own JARVIS and get into robotics later in the future and since then have been in AI. Have said the same thing in all my interviews! cant believe she had the same motivation :) assuming the motivation is real regardless of it being a mock interview :)
Being reported as spam shouldn't be the only way to mark your mail as spam; many bots/trolls file reports just because they can. World of Warcraft has this useless report spam/offensive feature that automatically kicks and mutes you from the game for a month if enough reports are submitted, and it doesn't need that many reports in the first place. We need another check to be sure it isn't just fake reports: an extra check can be made on the last 10 posts, looking at whether any contain offensive/repetitive language combined with how often they were written.
This interview is interesting but is so unlike any of the 4 in-person interviews I've had in the last 2 months as to be comical. Observations: 1. I don't believe the person being interviewed is answering these case problems cold... some guidance has been given to direct their thinking in preparation. Ex: How many people on the fly can give 5 reasons for x decision-making in classification? In this case the person has serious experience solving the problems and is clearly reading her other screen to develop her answers!!! 2. In my interviews I am not convinced the person actually read my resume. 3. In my interviews I am not convinced I was being seriously considered... in one case I was told the interview was 30 minutes only, and the interviewer kept cutting me off.
One thing that will help in labeling a given account is how many of the account's posts are reported as spam. If an account shares a post that is, for some reason, reported as spam 1000s of times but its other posts have 0 spam reports, then how confident should we be in labeling it as a bot? Feature idea: along with the number of followers, each follower should get a weight, i.e., if an account is followed by genuine people (celebrities) then the weight of that incoming link should be high. Note that the problem with the idea of checking whether an account is followed by other bots or not was that it is circular in nature.
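To make the two ideas above concrete, here is a minimal sketch. The function names, the `trust` lookup, and the default weight of 0.1 are all hypothetical; how follower trust would actually be obtained is left open.

```python
def reported_post_share(reports_per_post, min_reports=1):
    """Fraction of an account's posts that received at least min_reports spam reports."""
    if not reports_per_post:
        return 0.0
    flagged = sum(1 for reports in reports_per_post if reports >= min_reports)
    return flagged / len(reports_per_post)

def weighted_follower_score(followers, trust, default=0.1):
    """Sum of per-follower trust weights; followers with no known weight get a small default."""
    return sum(trust.get(follower, default) for follower in followers)

# One post with 1000 reports and three clean posts: only a quarter of posts are flagged.
share = reported_post_share([1000, 0, 0, 0])

# Hypothetical trust table: a verified account counts for far more than an unknown one.
score = weighted_follower_score(["verified_celebrity", "unknown_user"],
                                {"verified_celebrity": 1.0})
```

Using the share of reported posts rather than the raw report count keeps a single heavily reported post from dominating the label.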
There are flaws with some of those solutions. Like flagging an account as spam if it is followed by spam accounts. That could lead to valid accounts being flagged as spam or even attacked by bots intentionally following users to get them flagged as spam. The thing about using email from random domains is also problematic in many ways, and also using emails that use random characters (some of us use random emails for different accounts precisely to keep spam away and improve privacy). You could also not even guess bot characteristics and feed data for models to try to find common characteristics and trends.
That's what I thought of too. For example, checking if a name is common is not possible. But of course, we're watching from the sideline so it's easy to form criticism.
If time is considered as an attribute, a model can be built that predicts how long a human would take to type a tweet based on its length. After getting that predicted time, we can have a Bayesian network that uses the particular account and the time actually taken to predict whether it's a spam/bot attack.
I think a good approach for labeling things as spam is writing a program called checkspam that references the posting frequency or if the post are the same within a certain time frame. It could label it as spam if it falls within such parameters. You could make it so the program checkspam would only run if it was reported by another account in order to combat the potential of false flagging from people with negative intentions.
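Something like the checkspam idea above could be sketched as follows. The 10-minute window and the limit of 5 posts are made-up parameters, and posts are assumed to arrive as (timestamp, text) pairs.

```python
from datetime import datetime, timedelta

def checkspam(posts, was_reported, window=timedelta(minutes=10), max_posts=5):
    """Return True if a reported account posts too often, or repeats itself, within the window.

    posts: list of (timestamp, text) tuples in any order.
    """
    # Only run the check when another account has reported this one.
    if not was_reported:
        return False
    posts = sorted(posts)
    start = 0
    for end in range(len(posts)):
        # Slide the window start forward until it spans at most `window`.
        while posts[end][0] - posts[start][0] > window:
            start += 1
        in_window = posts[start:end + 1]
        if len(in_window) > max_posts:
            return True  # posting too frequently
        texts = [text for _, text in in_window]
        if len(texts) != len(set(texts)):
            return True  # identical post repeated inside the window
    return False

t0 = datetime(2024, 1, 1, 12, 0)
repeated = [(t0, "buy now"), (t0 + timedelta(minutes=2), "buy now")]
spread_out = [(t0, "first post"), (t0 + timedelta(hours=1), "second post")]
```

Here `checkspam(repeated, True)` flags the account, while the same posts with `was_reported=False` are never checked at all.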
a lot of us would ace our interviews only if the interviewer didn't have some sort of superiority complex. during my interviews, i was never made comfortable and that keeps me on edge throughout the interview and i end up underperforming.
So, I think I want to go into Data Science beside Web Development, and this was pretty handful. Even tho, I miss some points, I answered some questions in a pretty good way. Thank you!!
I didn't care that much for it. The problem was trivial. Reminded me of something from a university lab lecture with breakout groups. I find that she really knows what she is talking about and he used a lot of bluffing, bluster, and talking too much without saying anything to compensate for his lack of knowledge. It's a common tactic I see among white males. I do like that she schooled him at the end.
Thanks for the video. Both of you did a great job in making it feel realistic. My only question is. Is this really the sort of difficulty of the entry data science jobs? It feels suspiciously easy or shallow to me. Can someone back this up? Thank you again!
I have been interviewing for entry level data science jobs and this is fairly accurate! Although I have often been asked about what advanced concepts I know and how I've used it.
Great Video! But are these interview really that easy? So is it like if you have confidence and you're able to have a smooth conversation about what you are thinking about the topic, you get the job? Or is it like for an intern level role, hence easier questions?
This is just one type of interview that you'll encounter in a job search process. The open-ended nature of it should make it less stressful than technical coding interviews. That being said, there is a lot of opportunity here to really demonstrate your abilities. A senior data scientist candidate would be expected to go into a lot more complexity & implementation details than an intern-level candidate. A senior candidate should also be able to clearly communicate trade-offs of any decisions that they make. This type of interview is really designed to see how well someone understands the data science process and to measure how well they can communicate what they know. In my opinion, part of the reason this interview seemed pretty easy is that Kylie is very confident in her approach and could get to key details without needing much/any prodding from me on the interviewer side. To get the job, you'll probably need to succeed in this interview as well as a technical coding interview or two and a behavioral interview.
@@KeithGalli so specially for a Data Science role do the interviews go any deep into implementation details. Like right now I'm looking for similar roles and was easily able to answer these questions myself. The only thing I'm not that confident about is if they ask how decision trees mathematically work or implement a neural network from scratch (I mean I could, but in the heat of the moment I cant). Do they ask such questions in Data Science roles or just in ML roles?
Every interview is different so it's definitely possible that a company could ask you to go into implementation details, but from my perspective knowing when to use decisions trees or neural networks is more important than being able to implement them from scratch. In the real world, we have libraries that make decision trees & neural networks very easy to use. You almost never need to implement something from scratch. As a result, it's more important to understand how they work at a high-level and when to use them and what the relevant Python libraries are. Hope this makes sense!
Usually there are at least 2 rounds: A round like this which is personality/ high level problem solving/culture fit, and there will usually be a technical screening as well. Technical screenings are usually first and weed out people who don't understand the tech stack/ Data Science principles at all or very well. If you know the tech stack or most of it, it is usually no stress.
I have noticed that there wasn't much conversation in regards to what ML model to use or what hyperparameters or architectures. Is this normal for an ML/Data Science interview?
Impressive! Well detailed! @KeithGalli, that siren was pretty loud. Perhaps you have trained a model that picks and feeds only @KylieYYing's voice to you and removes any other unwanted voices or sounds 😁😜
That would have been a fairly easy job interview. No on the fly algorithmic problems to be solved, no mathematical questions, no deep understanding of ML (distributions, statistics, metrics, solvers, backprop, stochastic gradient, ....)...
Calm down, what she said wasn't earth shattering, she's just using the lingo. Once you learn the lingo you too can speak the way she does plus learn data.
i wonder how elon's bandaid solution of monetizing the ability to even use twitter has been working, thats just making more steps for the attackers to achieve their goal but anyways, wonderful example of behavior questions, i wish i was asked these tailored questions instead of the more general ones (how to deal with bad coworkers, etc), the technical questions are really useful too because i was able to be excited to give my own answers while Kylie was answering
@@ramg4699 Well for one, she gives vague, non-technical answers to questions she most likely had before hand. I would expect this level of thoughtfulness from a high school student interviewing to get into a low level college program but not from a Computer Science/Chemical Engineering Grad from MIT.. what’s crazier is that they released this as an example of what a good performance looks like..
@@anon.cashpoorloser5285 But he didn't ask technical questions, that was more business case scenario and how she would deal with it. She is not "applying" for Chemical Engineer either, how did you expect her to answer?
Haha I think it's because Zoom does a good job of filtering out background noise. I didn't hear it during the call, but I used the audio directly from Kylie's camera when I put this video together, so I could hear the siren in the editing phase!!
This question refers to the time around 45 minutes. If you're programming the thing to classify as spam or not spam based on X number of times reported, wouldn't that just be explicit programming rather than machine learning?
@@kleefbellevue1014 Hey, late response. That would be a simple rule. You can call that a model. But in reality, that simple rule may result in some problems. Accounts could be highly reported due to spam / bots. Imagine a football team account going down due to a high number of reports from bots. We usually need more information to have a high precision to avoid false positives. For this specific case i dont think using the number of reports would be a good measure to label data. Maybe a clustering model would be better. And clustering here means a model (algorithm) that tries to join similar records together and create clusters, in the sense that the distance cross-cluster is maximized while distance between records within same cluster is minimized. This way we could use other features (such as account time creation, number of posts, posts/second ratio, etc), and group these together. Surely accounts with high number of posts/s, with lots of tags per post, lots of reports, and usually originating from the same domain would be identified within the same cluster.
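For anyone curious what the clustering described above looks like in code, here is a bare-bones k-means sketch in plain Python. The two features (posts per hour, tags per post) and the numbers are invented for illustration, and using the first k points as starting centroids is a simplification; in practice you would scale the features and reach for a library implementation.

```python
import math

def kmeans(points, k, iterations=20):
    """Very small k-means: returns (cluster index per point, centroids)."""
    # Simplification (assumption): the first k points serve as initial centroids.
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid (Euclidean distance).
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

# Invented accounts: (posts per hour, tags per post)
accounts = [(1, 0.1), (2, 0.2), (400, 9), (420, 8)]
assignment, centroids = kmeans(accounts, k=2)
```

The two low-activity accounts end up in one cluster and the two high-volume accounts in the other; deciding which cluster counts as "bot" still needs a human or some labeled examples.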
Question for professional Data scientist: @49:35, is a feasible solution to run the feature vectors through a clustering model, then label the clusters as spam/not spam?
She really fumbled with the classification question. The correct answer is that we have two possible approaches. Supervised learning and unsupervised learning. She really did a poor job explaining each of those concepts. She also fumbled explaining feature vectors in a technical manner. She did a good job listing all the data that should be collected but then didn't really know how to turn that data into a feature vector and how to assign weights to each feature. She also didn't explain deployment in any detail. "Cloud" is a very vague answer and the interviewer also doesn't sound like he knows how to deploy a solution to the cloud. It's like neither of them used cloud machine learning tools like Azure ML Studio. Her knowledge would qualify her for a trainee position. I'm amazed that she has her own youtube account where she supposedly explains data science to people. She barely knows any technical details. Does she try to sell any courses to naive people?
@@Kaity11 She showed a very shallow understanding of machine learning concepts even for a junior position. If you perform like this on your job interview, you're not going to get hired by any respectable company.
I agree. I would probably classify this problem as Semi-supervised even. And I would probably have used clustering models to identify similarities among bot accounts, and try to identify them as bots
Key takeaways:
1. Talk a lot and explain why you decided to go with the specific solution
2. Don't be afraid to take a pause and rethink the problem
3. Don't get fixated on one aspect of the problem too much, always try to approach it from the bird's view
4. Focus first on the easier parts of the problem and then approach the harder ones
5. Remember it's a conversation with the interviewer, not the solo showoff
6. Ask them questions about the work they've done so far, what they learnt in the company, look for the signs of being valued as a person in a team, and ask what the first 6 months on the job would look like
Talk with a purpose. I hate people that talk to fill space lol
No, you don't want it to be a conversation with the interviewer. Point 5 is wrong: if you are the 'solo showoff' and you talk through a detailed, correct solution to a problem for 30 minutes and you hit many of the facets, then the interviewer will be impressed. You want as few tips or hints for the answer as possible; questions to clarify the problem are great if they aren't dumb questions.
I learned that you should probably do some research on the company you want to work at so that you spend more time practising relevant topics
😊
I appreciate that this interview is more like a conversation, with a focus on problem-solving. In the past, I've had interviews where the tone was passive-aggressive instead of constructive, so it's refreshing to have a more productive experience. It's great when interviews can be a valuable use of time, rather than a frustrating waste.
Thanks for featuring!!
You did an amazing job, keep up the good work
Very Good Job . superb way to learn from you
I admire you.
You are really confident while communicating with interviewer. It's almost like 2 colleagues discussing about a problem. Great work
You deserve that
I love the part they pull open a Google doc to clarify their points and have it noted down.
Excellent problem-solving approach.
Focusing on the problem, its scope, its limitation, whats expected behaviour, what isnt etc ...was always something that I was rushing ahead with.
Jumping to the solution right away is something that I had to unlearn.
Also great on Kylie for being super composed. Another super power and not easy to do in live situations.
Great video!
As someone who just finished their masters and is looking for a job in data science, this interview really boosted my confidence cuz I was able to respond to every question
Exactly my thoughts…. Glad to hear someone else with similar thought
enjoy your underpaid intern in the capitalistic world. Everyone is replaceble, you will make zero difference.
@@yugiohfanatic1964 that's a very sad way to view the world. Hope you feel better soon
@@duckcluck123 hello simp
@@yugiohfanatic1964 are you 12
Even not knowing much about data science, this interview was very helpful in learning how it might be applied to real, known problems, and the mutual feedback at the end was very helpful to learn about the dynamics of interviews, too! Thank you guys for doing this.
Thank you for this interview. For the spam issue, you can tag on the traditional features that Kylie Ying mentioned, and then use a multi-lingual embedding model to create vectors out of the post content. Use these features + embeddings to train your model.
As someone who is currently in college and is actively preparing for interviews, this video helped me because i answered every question almost easily.
Grrr i couldn’t answer them all , but I’m gonna catch up to you soon GIDEON
Thank you for sharing this video. As someone who is transitioning into the Data Science field (Machine Learning/AI), I was very surprised that I was able to keep up during the interview. I wasn't lost at all. I don’t have a technical background, but I’ve been studying Azure, Python, GitHub, R, SQL etc. pretty hard for the past few months, and after doing some labs I’m feeling pretty confident I can make the move.
I really liked that Keith decided to use a Google doc, because in what I would call first-round 'team fit' and knowledge-gauging interviews, the assumption is you'll just talk over Zoom and not do whiteboarding or use another tool. This was a good reminder to expect the unexpected - you could be asked to do anything - maybe even code :)
Hello, I don't usually comment on YouTube but this time I just wanna say thank you to the people involved in this video. This really helped me get through my technical interview and I ended up getting the job I wanted. The introduction and close of the interview were almost a copy of what happened in my interview. Obviously, the technical part was different (in my case they just asked about the technical challenge I had to solve prior to the interview) but the way I approached the questions was very similar. Just THANKS!!
my approach is slightly different :
Basic data collection :
1. You have basic details capture when you create a YT account, name, email, DOB, Image, etc
2. Assume that every time you log into YouTube your activity is recorded as follows: comment made (if any), time, post, ip_address, email, like, dislikes, report, type of report, followers, activity spent (scrolling, browsing etc).
Selection on queries :
1. Filter on accounts where comments are made and activity time in one session is extremely long (say more than 12hrs) or very frequent activities in a time interval ( e.g. log in/out 5 times in 1 minute)
On feature engg side :
Extract features -> in 1day how many comments are made, number of links posted in comment, number of reports per time interval, number of words, number of words which are spam, difference between account followers vs those followed , activity time in minutes/seconds (scrolling 100 videos/minutes indicates bot action).
Target variables:
1. Set thresholds based on existing visible patterns, e.g. more than 50 comments a day, more than 100 reports in 1hr, clicking 100 videos in 1min, spam words > 10 (should satisfy any of these conditions) -> set to 1, else set to 0.
Classification model side : Sklearn (fast and quick to test your ideas and features), Model- XGB, Logistic/SVM (baseline)
Deploy, see and rework
This is not perfect but this what I was coming up while seeing this video.
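The threshold-based target variable described in this comment could be sketched roughly as below. This is only a minimal illustration: the `AccountActivity` class and its field names are hypothetical, and the cut-offs are simply the example numbers from the comment, not tuned values.

```python
from dataclasses import dataclass


@dataclass
class AccountActivity:
    # Hypothetical per-account aggregates; names are assumptions for this sketch.
    comments_per_day: int
    reports_per_hour: int
    video_clicks_per_minute: int
    spam_word_count: int


def rule_label(activity: AccountActivity) -> int:
    """Return 1 (suspected bot) if ANY example threshold is hit, else 0."""
    rules = (
        activity.comments_per_day > 50,
        activity.reports_per_hour > 100,
        activity.video_clicks_per_minute >= 100,
        activity.spam_word_count > 10,
    )
    return int(any(rules))


# A quiet account versus one that posts 60 comments a day.
quiet = AccountActivity(10, 0, 3, 1)
noisy = AccountActivity(60, 0, 3, 1)
labels = [rule_label(quiet), rule_label(noisy)]
```

Labels produced this way are noisy by construction, so they would probably serve better as a starting point for training a classifier than as ground truth.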
My knowledge increased a lot by watching this. Please upload more mock interviews like this. I'd also like some technical details on model implementation.
Keith is one fantastic teacher. He took my analytics skills from 0 to 5 very quickly. Great content.
Ghanta
@@yuti65 why bro?
Thanks to Nick and Kylie, it was very informative. Love the flow of the interview. I wanted to add something about the real issue around Spam Bots, as this is internal to the platform or system, you can always add a small CAPTCHA routine around the message sender side. Example, the click of "post" or enter button may pre-calculate the CAPTCHA before message is posted in a response, it may look heavy but can easily be done in a lean way.
In my opinion this data scientist interview is pretty weak and useless. The interviewer didn't probe deeply enough on important details to get a real answer of whether the candidate is capable of building a successful spambot model, applying it to the production system and maintaining/improving it continuously. A large part of the interview is spent on listing a bunch of potential features that could be used by the model to classify spam, which does not require deep thinking, understanding or knowledge of the problem. Very little implementation details were actually asked or answered. This candidate only talks about the approach in a very high-level and vague way, like "I'd use Tensorflow and try a few things to get a sense of a good-enough model." This kind of general answer is not helpful at all. We've seen so many candidates who can do the high-level talking points, but fail to have a good understanding of the entire ML production workflow and lack basic implementation skills.
I agree. I feel like she’s more of a computer scientist than a data scientist whose speciality is building models and be able to discuss which models to pursue and how to assess these models. A computer scientist can talk about data science at a very high level like you said but it takes a data scientist to actually go deeper into technical details that make the models and fix any flaws. If she does not have the MIT title, I don’t think I would’ve been very impressed with her although I’m sure she’s very intelligent. She’s just not a data scientist per se but a very smart computer scientist who knows data science concepts like one hot encoding and used tools like tensorflow
I kinda surprisingly enjoyed this. I didn't even know when the interview started. Feels like two people having a convo about data science
Nice video. I think in the beginning when asked if she had any first thoughts on the issue of spam bots, one thing that could've been added was 'what are the positives of bots' Too much was said about the negative aspects of bots, and the first impression I had was, if all bots are so negative, just ban bots. But bots do have a role, and many bots are used to automate functionality. So the real important point is, identify bots that are being malicious in some way. Then dive into how to develop metrics to identify the concept of 'malicious'.
Not only interviewees should watch these mock interviews, but also interviewers, so they can learn how to run them, because there are many companies out there trying to do this type of process but doing it badly and confusingly. Thanks for this great video!
Great video, I watched it from beginning to end. I liked how she answered the questions. I was surprised at how conversational the interview was. If possible, could you post a Data Analytics interview as well please. Thank you.
Can we have Data Analyst mock interviews too?
Yes please
Yeah, that'd be great!
No, unfortunately not 😔
Please 🙏
Yes please
⭐ Contents ⭐
⌨ (0:00:00) Video overview & format
⌨ (0:02:13) Introductory Behavioral questions
⌨ (0:07:46) Social media platform bot issue task overview
⌨ (0:15:26) What are some features we should investigate regarding the bot issue?
⌨ (0:25:02) Classification model implementation details (using feature vectors)
⌨ (0:41:38) What would a dataset to train models to detect bots look like? How would you approach collecting this data?
⌨ (0:51:38) Technical implementation details (python libraries, cloud services, etc)
⌨ (0:56:01) Any questions for me?
⌨ (1:03:42) Post-interview breakdown & analysis
This was really a useful video. If interviews are like this. It's love 🎉
One thing we can add to these features is links. Certain spam posts consist of similar links to the same post/profile.
Not only bots do it; even people constantly spam their account in the comments. So Link Frequency, Link Context and so on.
That feature used with a Bayes classifier can be useful to make the model more robust. Bag of Words from NLP can also be used to make these link tasks easier.
Just an opinion of mine. 😊
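A rough sketch of the bag-of-words plus Bayes classifier idea from this comment, in plain Python. Everything here is an assumption for illustration: the toy comments, the `link:<domain>` token scheme, and the hand-rolled multinomial naive Bayes with Laplace smoothing (a library implementation would normally be used instead).

```python
import math
import re
from collections import Counter

URL_RE = re.compile(r"https?://([^/\s]+)\S*")


def tokenize(text):
    """Bag-of-words tokens, with every link collapsed to a 'link:<domain>' token."""
    text = text.lower()
    links = [f"link:{domain}" for domain in URL_RE.findall(text)]
    words = re.findall(r"[a-z']+", URL_RE.sub(" ", text))
    return words + links


class NaiveBayes:
    """Tiny multinomial naive Bayes for two classes (1 = spam, 0 = not spam)."""

    def fit(self, texts, labels):
        self.word_counts = {0: Counter(), 1: Counter()}
        self.class_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def log_score(self, text, label):
        counts = self.word_counts[label]
        total = sum(counts.values()) + len(self.vocab)  # Laplace smoothing
        score = math.log(self.class_counts[label] / sum(self.class_counts.values()))
        for token in tokenize(text):
            if token in self.vocab:  # ignore tokens never seen in training
                score += math.log((counts[token] + 1) / total)
        return score

    def predict(self, text):
        return int(self.log_score(text, 1) > self.log_score(text, 0))


# Made-up training comments, purely for illustration.
texts = [
    "check out my channel http://spam.example/win free prizes",
    "free followers here http://spam.example/now",
    "great interview, thanks for sharing",
    "loved the part about feature vectors",
]
labels = [1, 1, 0, 0]
model = NaiveBayes().fit(texts, labels)
```

Collapsing links to their domain means a repeated spam domain counts as one strong "word", which is one way to capture the link frequency idea.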
Can you pls provide Data Engineer interview Process as well? It will be greatly appreciated.
Thank you for changing lives for better around the world, including mine! 😁✌️
more mock interviews please
This was so helpful, thank you! It was a delight to watch Kylie's problem solving approach :)
The guy was very friendly and knowledgeable!
Wow, I didn't expect such a video. Very interesting. Thank you for sharing!
That was pretty intense at the beginning when he started asking what an email filter model would look like or what the data set fed to the model would look like... She went into implementation immediately, which is a normal resort, because how else can you define the data set if you're not gonna think of the function that might be doing the filtering. Overall it was handled nicely from both sides. Thanks for sharing.
I watched this video like a year ago knowing very little. Now, I feel like I can completely answer each question in detail and follow all of the concepts being mentioned.
Please upload all mock interviews for web development, app development etc.
this is very helpful. I wish people do more of this type of content
Very useful seeing Kylie thought process in coming up with the answers
Interview seems very easy, as a product owner I speak to our Data scientists and many of them are building business projection models etc. Is the technical implementation the key skill here? Because working in tech now, the first hour of this interview I think most of our stakeholders would even be able to answer during requirements gathering due to subject matter expertise. Like what to train the model on is usually subject matter expertise. I think its so well understood among tech stakeholders that something happens in the system, can you show me a 1 if it, or a combination of things happened, or a 0 if not and push that to some visualization report.
This is a great interview, thanks for doing this! If only all data science interviews would be like this in reality (w/ some exceptions), the world would be a better place.
tell us more about the interviews you had plz
@@alirezouali3119 can you tag me , if he replies
Real-life interviews are similar. Usually there would be a separate interview to assess your coding skills.
This actually got me hyped for a job interview!
are data science interviews even remotely similar to this mock interview? If so i would be hyped too lol
@@nb7070 personal experience from my machine learning interview - 90% interview is a conversation between you and the hiring person about ur projects and your ideas and approach.
Actually, I think so. I joined a fresher DS interview a year ago, and they asked me to do whiteboard coding, IQ problem solving, mathematics, machine learning, etc. Even though I did not do too badly except for the whiteboard part, I did not land that job. Lmao
I don’t think this approach is quite right. You’re diving into models before any data analysis. Can we clearly define a bot? We need to look at sample bot accounts. How many different types of bots can we identify? What are the similarities between bot groups? Can all the different bot group feature values ranges fit inside a single data envelope or should we concentrate on identifying a single bot group at a time?
The amount of time the account has been active is prob a strong feature.
Where are bots coming from historically?
Are account names and profile pic very similar to an existing account?
There could also be an anti-dataset, where accounts that were classified as spam complained and got reinstated? This could help mitigate misclassification.
Ethically, could some demographics’ accounts be more susceptible to being wrongly classified as spam?
True, but to effectively tackle the problem, you need solid domain knowledge as well as conducting a deep analysis that arises from interacting with the environment, which is inherently absent in an interview.
Not to mention, this is an NLP problem based on some features she mentioned, such as the 'content of account's posts'.
Yes, and also, if we are given a semi-supervised environment (with some accounts blocked due to spam, and others not labeled at all), maybe clustering could be a good strategy to group similar types of accounts based on their features. I would bet the model would help in identifying accounts created recently, with low followers, lots of posts/comments, lots of tags.
I think creating a class based only on the number of spam reports restricts the information available. Maybe weight that feature when clustering to give it more importance, but I would not use it as the label.
Thanks for boosting my confidence, that I might get a job if i stay focused on learning. Good luck
Cool interview,
Wanted to add, the account's age would make a very good feature I guess
OMG!! I literally said the same thing.... I got interested in AI almost when the ironman first movie released and I kinda moved slowly from Electrical Engineering to DS. I wanted to have my own JARVIS and get into robotics later in the future and since then have been in AI. Have said the same thing in all my interviews! cant believe she had the same motivation :) assuming the motivation is real regardless of it being a mock interview :)
The initial tip I got from this was to not shave before my interview.
Being reported as spam shouldn't be the only way to mark your mail as spam; many bots/trolls do it just because they can. World of Warcraft has this useless report spam/offensive feature that automatically kicks and mutes you from the game for a month if enough reports are submitted, and it doesn't need that many reports in the first place. We need another check to be sure it isn't just fake reports: an extra check could be made on the last 10 posts, looking at whether any have offensive/repetitive language combined with how often they were written.
Interesting to see a data science interview, it's the same format as other tech interviews
This interview is interesting but is so unlike any of the 4 in-person interviews I've had in the last 2 months as to be comical. Observations:
1. I don't believe the person being interviewed is answering these case problems cold... some guidance has been given to direct their thinking in preparation. Ex: How many people on the fly can give 5 reasons for x decision-making in classification? In this case the person has serious experience solving the problems and is clearly reading her other screen to develop her answers!!!
2. In my interviews I am not convinced the person actually read my resume.
3. In my interviews I am not convinced I was being seriously considered... in one case I was told the interview was 30 minutes only, and the interviewer kept cutting me off.
One thing that will help in labeling a given account is how many of the account's posts are reported as spam.
If an account shares a post that is, for some reason, reported as spam 1000s of times but its other posts have 0 spam reports, then how confident should we be in labeling it as a bot?
Feature idea: along with the number of followers, each follower should get a weight, i.e. if an account is followed by genuine people (celebrities) then the weight of that incoming link should be high. Note that the problem with the idea of checking whether an account is followed by other bots or not was that it is circular in nature.
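The weighted-follower feature could look something like the sketch below. The trust table, the account names, and the small default weight for unknown followers are all assumptions made up for illustration. Using a fixed table of already-vetted accounts is one way to sidestep the circularity mentioned above.

```python
def weighted_follower_score(followers, trust, default_weight=0.1):
    """Sum a trust weight per follower; followers with no known trust get a small default."""
    return sum(trust.get(follower, default_weight) for follower in followers)


# Hypothetical trust table: vetted accounts weigh a lot, known bots weigh nothing.
trust = {"verified_celebrity": 1.0, "known_bot": 0.0}

genuine_score = weighted_follower_score(["verified_celebrity", "alice"], trust)
botted_score = weighted_follower_score(["known_bot", "known_bot_2"], trust)
```

Here `known_bot_2` is not in the table, so it only contributes the default weight, and the account followed by a vetted user scores higher.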
Proven. Reliable. Keith Galli!! I love you man!! 😂😂
excellent interview it inspired me to go ahead and update myself into Data Scientist again.
wow is that how it works? I'm gonna update myself to billionaire.
It would be great and crazy if after this video she does the implementation of the model like step by step
There are flaws with some of those solutions. Like flagging an account as spam if it is followed by spam accounts. That could lead to valid accounts being flagged as spam or even attacked by bots intentionally adding users to flag them as spam. The thing about using email from random domains is also problematic in many ways, and also using emails that use random characters (some of us use random emails for different accounts precisely to keep spam away and improve privacy). You could also not even guess bot characteristics and feed data for models to try to find common characteristics and trends.
That's what I thought of too. For example, checking if a name is common is not possible. But of course, we're watching from the sideline so it's easy to form criticism.
If time is considered as an attribute, a model could be built that predicts how long a human would take to type a tweet based on its length. With that predicted time, we could have a Bayesian network that uses the account and the time actually taken to predict whether it's a spam/bot attack.
I think a good approach for labeling things as spam is writing a program called checkspam that references the posting frequency or whether the posts are the same within a certain time frame. It could label it as spam if it falls within such parameters. You could make it so the program checkspam would only run if it was reported by another account in order to combat the potential of false flagging from people with negative intentions.
I simply think any implementation that checks the number of reports is easily abusable
@@MrTheanimekiller Right? There could have been more out-of-the-box thinking
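For what it's worth, the checkspam idea from this thread might be sketched like this. The function signature, the 60-second window, and both limits are hypothetical choices, not anything stated in the video. Reports only trigger the check here; they never decide the label on their own, which partly addresses the abuse concern.

```python
from collections import Counter


def checkspam(posts, reported, window_seconds=60, max_posts=5, max_duplicates=3):
    """posts: list of (timestamp_in_seconds, text). Only runs if the account was reported."""
    if not reported:
        return False
    posts = sorted(posts)
    start = 0
    for end in range(len(posts)):
        # Slide the window so it spans at most window_seconds.
        while posts[end][0] - posts[start][0] > window_seconds:
            start += 1
        window = posts[start:end + 1]
        if len(window) > max_posts:
            return True  # posting too frequently
        top_count = Counter(text for _, text in window).most_common(1)[0][1]
        if top_count >= max_duplicates:
            return True  # same post repeated within the time frame
    return False


repeated = [(0, "buy now"), (10, "buy now"), (20, "buy now")]
normal = [(0, "hi"), (500, "hello")]
results = [checkspam(repeated, True), checkspam(repeated, False), checkspam(normal, True)]
```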
a lot of us would ace our interviews only if the interviewer didn't have some sort of superiority complex. during my interviews, i was never made comfortable and that keeps me on edge throughout the interview and i end up underperforming.
this is the most reasonable interview I've ever seen, which ironically makes it somewhat unrealistic and useless.
great video! more data science mock interviews, pls
Great video. Create a Data Analyst mock, please
So, I think I want to go into Data Science besides Web Development, and this was pretty handy. Even though I missed some points, I answered some questions in a pretty good way. Thank you!!
Great video, but next time can you please post what a successful job interview would look like?
I didn't care that much for it.
The problem was trivial. Reminded me of something from a university lab lecture with breakout groups. I find that she really knows what she is talking about and he used a lot of bluffing, bluster, and talking too much without saying anything to compensate for his lack of knowledge. It's a common tactic I see among white males. I do like that she schooled him at the end.
I fell asleep and woke up to this playing 💀
This mock interview really impressed me.
Thanks for the video. Both of you did a great job in making it feel realistic. My only question is. Is this really the sort of difficulty of the entry data science jobs? It feels suspiciously easy or shallow to me. Can someone back this up? Thank you again!
That's what I'm saying
I have been interviewing for entry level data science jobs and this is fairly accurate! Although I have often been asked about what advanced concepts I know and how I've used it.
Great Video! But are these interview really that easy? So is it like if you have confidence and you're able to have a smooth conversation about what you are thinking about the topic, you get the job? Or is it like for an intern level role, hence easier questions?
This is just one type of interview that you'll encounter in a job search process. The open-ended nature of it should make it less stressful than technical coding interviews. That being said, there is a lot of opportunity here to really demonstrate your abilities. A senior data scientist candidate would be expected to go into a lot more complexity & implementation details than an intern-level candidate. A senior candidate should also be able to clearly communicate trade-offs of any decisions that they make. This type of interview is really designed to see how well someone understands the data science process and to measure how well they can communicate what they know.
In my opinion, part of the reason this interview seemed pretty easy is that Kylie is very confident in her approach and could get to key details without needing much/any prodding from me on the interviewer side.
To get the job, you'll probably need to succeed in this interview as well as a technical coding interview or two and a behavioral interview.
@@KeithGalli so specifically for a Data Science role, do the interviews go deep into implementation details? Like right now I'm looking for similar roles and was easily able to answer these questions myself. The only thing I'm not that confident about is if they ask how decision trees mathematically work or to implement a neural network from scratch (I mean I could, but in the heat of the moment I can't). Do they ask such questions in Data Science roles or just in ML roles?
Every interview is different so it's definitely possible that a company could ask you to go into implementation details, but from my perspective knowing when to use decision trees or neural networks is more important than being able to implement them from scratch. In the real world, we have libraries that make decision trees & neural networks very easy to use. You almost never need to implement something from scratch. As a result, it's more important to understand how they work at a high level and when to use them and what the relevant Python libraries are. Hope this makes sense!
@@KeithGalli yup, that makes sense. Thanks for replying!
Usually there are at least 2 rounds: A round like this which is personality/ high level problem solving/culture fit, and there will usually be a technical screening as well.
Technical screenings are usually first and weed out people who don't understand the tech stack/ Data Science principles at all or very well. If you know the tech stack or most of it, it is usually no stress.
Would love to see the technical programming interview as well! Thank you for sharing this.
she can't code LOL!
Very Nice , Post more content on Data Science!
I have noticed that there wasn't much conversation in regards to what ML model to use or what hyperparameters or architectures. Is this normal for an ML/Data Science interview?
Meaningful insights, thanks for sharing it!
Great video on interview processes! ❤
Impressive! Well detailed! @KeithGalli, that siren was pretty loud. Perhaps you have trained a model that picks and feeds only @KylieYYing's voice to you and removes any other unwanted voices or sounds 😁😜
That would have been a fairly easy job interview. No on the fly algorithmic problems to be solved, no mathematical questions, no deep understanding of ML (distributions, statistics, metrics, solvers, backprop, stochastic gradient, ....)...
So mr tusk is hiring again, so soon? Seriously though, A software developer interview would be great.
cried meself to sleep when she said she has a masters
edit: the way she speaks about things omg, wow, I wanna be like her one day
Calm down, what she said wasn't earth shattering, she's just using the lingo. Once you learn the lingo you too can speak the way she does plus learn data.
Great interview, thank you guys! CEO of quitter Elon Tusk, got me😂
One for Full stack development please.
Another feature to be included could be the IP address from which the request is coming
Would love one for Data engineering too
30:00 - omg, and im literally hearing the sirens with my studio pairs headphones.
"You had me at 'Bachelors and Masters from MIT' 😁"
She literally does have that tho 😅
Clarification at 48:37 can’t we just consider the accounts that have already been reported as spam and create a dataset just based on these accounts?
Incredible video! Nice job.
Share please what kind of headphones you have 😃 GJ guys!
Thank you for this. Wow!
Why not just train a neural network on text patterns including spam and flag the messages to be filtered ?
Thank you so much for sharing.
i wonder how elon's bandaid solution of monetizing the ability to even use twitter has been working, thats just making more steps for the attackers to achieve their goal
but anyways, wonderful example of behavior questions, i wish i was asked these tailored questions instead of the more general ones (how to deal with bad coworkers, etc), the technical questions are really useful too because i was able to be excited to give my own answers while Kylie was answering
"our CEO has been complaining...". I would fire this guy
Excellent video! What about a video on an entry-level, no-experience cybersecurity interview? 😊
Thanks for this great insight!
My big takeaway from this interview is .... bots and spam are here to stay
This channel is amazing
Very exciting!
At 28:00, I think it could have been better to mention whether the training dataset format would be tabular or non-tabular, including which features to treat as discrete variables.
Thanks for sharing this.
For all those who think this level of interview performance will land you a job in this market.. it won't. Good luck out there!
Explain why
@@ramg4699 Well for one, she gives vague, non-technical answers to questions she most likely had before hand. I would expect this level of thoughtfulness from a high school student interviewing to get into a low level college program but not from a Computer Science/Chemical Engineering Grad from MIT.. what’s crazier is that they released this as an example of what a good performance looks like..
@@anon.cashpoorloser5285 But he didn't ask technical questions, that was more business case scenario and how she would deal with it. She is not "applying" for Chemical Engineer either, how did you expect her to answer?
I sincerely doubt that’s a standard interview.
It looks more like a guided project on Coursera.
I heard the Siren..How come the interviewer wearing such a good headset could not
Haha I think it's because Zoom does a good job of filtering out background noise.
I didn't hear it during the call, but I used the audio directly from Kylie's camera when I put this video together, so I could hear the siren in the editing phase!!
Sir, I have completed a BSc in physics but now I'm very interested in programming. Am I eligible for an MSc in IT or data science?
this question refers to the time around 45 minutes,
If you're programming the thing to classify as spam or not spam based on X number of times reported, wouldn't that just be explicit programming rather than machine learning?
am just a high school freshman trying to get into machine learning, so I dont yet know how it works.
@@kleefbellevue1014 Hey, late response. That would be a simple rule. You can call that a model. But in reality, that simple rule may result in some problems. Accounts could be highly reported due to spam / bots. Imagine a football team account going down due to a high number of reports from bots. We usually need more information to have a high precision to avoid false positives. For this specific case i dont think using the number of reports would be a good measure to label data. Maybe a clustering model would be better. And clustering here means a model (algorithm) that tries to join similar records together and create clusters, in the sense that the distance cross-cluster is maximized while distance between records within same cluster is minimized. This way we could use other features (such as account time creation, number of posts, posts/second ratio, etc), and group these together. Surely accounts with high number of posts/s, with lots of tags per post, lots of reports, and usually originating from the same domain would be identified within the same cluster.
@@kleefbellevue1014 but we would need to find that X value. I guess we'll have to use logistic regression.
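To make the clustering suggestion in this thread a bit more concrete, here is a minimal k-means sketch in plain Python. The feature columns (posts per hour, tags per post, report count), the toy numbers, and the naive "first k points" initialization are all assumptions for illustration. Real features would need scaling, and a library implementation would normally be used.

```python
import math


def kmeans(points, k, iterations=20):
    """Very small k-means: returns (cluster index per point, centroids)."""
    centroids = [list(p) for p in points[:k]]  # naive deterministic init (assumption)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    assignment = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
    return assignment, centroids


# Hypothetical accounts: (posts per hour, tags per post, number of reports).
accounts = [
    (0.5, 1.0, 0.0),
    (90.0, 12.0, 40.0),
    (1.0, 0.0, 1.0),
    (120.0, 15.0, 55.0),
    (0.2, 2.0, 0.0),
    (100.0, 10.0, 60.0),
]
assignment, centroids = kmeans(accounts, k=2)
```

On this toy data the high-volume accounts end up in one cluster and the quiet ones in the other. Someone would still have to inspect each cluster and decide which one, if any, looks like bots.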
Question for professional Data scientist:
@49:35, is a feasible solution to run the feature vectors through a clustering model, then label the clusters as spam/not spam?
Amazing interview
She really fumbled with the classification question. The correct answer is that we have two possible approaches. Supervised learning and unsupervised learning. She really did a poor job explaining each of those concepts. She also fumbled explaining feature vectors in a technical manner.
She did a good job listing all the data that should be collected but then didn't really know how to turn that data into a feature vector and how to assign weights to each feature.
She also didn't explain deployment in any detail. "Cloud" is a very vague answer and the interviewer also doesn't sound like he knows how to deploy a solution to the cloud. It's like neither of them used cloud machine learning tools like Azure ML Studio.
Her knowledge would qualify her for a trainee position. I'm amazed that she has her own youtube account where she supposedly explains data science to people. She barely knows any technical details. Does she try to sell any courses to naive people?
🙄
Nobody is gonna get every question right.. nobody’s god
@@Kaity11 She showed a very shallow understanding of machine learning concepts even for a junior position. If you perform like this on your job interview, you're not going to get hired by any respectable company.
I agree. I would probably classify this problem as Semi-supervised even. And I would probably have used clustering models to identify similarities among bot accounts, and try to identify them as bots