Accused Harvard Professor Claims Innocence! (Fake Data Scandal)

  • Published: 24 Nov 2024

Comments • 459

  • @PeteJudo1
    @PeteJudo1 1 year ago +18

    Go to ground.news/pete to see all sides of every story and stay informed on breaking news. Sign up or subscribe through my link before Oct 31, 2023 for 30% off unlimited access.

    • @emiel89
      @emiel89 1 year ago +10

      I think you should respond to some of the comments below. It seems you are one of the few who is convinced by her defense. It would be very interesting to get your response to some of the better comments below your video. (I can't remember whether you said that you read or never read your comments, so here is my attempt at getting a response.)

    • @PeteJudo1
      @PeteJudo1 1 year ago +7

      @@emiel89 The conversation is great. Will make a follow-up for sure.
      But to clarify, I'm not convinced she is totally innocent. I'm just convinced there is enough here for me to doubt whether all the allegations against her for this particular study will stand.
      People make a lot of good points here though. Open to being wrong, like any good scientist.

    • @Jimothy-723
      @Jimothy-723 1 year ago +2

      @@PeteJudo1 7:06 No college in the United States has ever applied that principle when investigating its own students, so why do we have that burden when they do not?

    • @Jimothy-723
      @Jimothy-723 1 year ago

      "Open to being wrong, like any good scientist."
      You realize that many times, if a scientist gets it wrong, people will die as a result... right? Example: medicine. Another example: hand washing.

    • @superben1755
      @superben1755 1 year ago

      Pretty terrible ad for Ground News here. You said Harvard was ranked "the worst college for free speech", but failed to mention the source of that ranking - the right-wing-funded FIRE organization. If Ground News is making you fall for obvious right-wing talking points, it doesn't seem like a great product. Just so people know, these are some of the ~extreme~ examples of free speech restriction FIRE listed for Harvard's ranking:
      1. Rescinding Kyle Kashuv's acceptance. It was rescinded after multiple instances of him using antisemitic and racist language in high school emerged online.
      2. Termination of David D. Kane for a college blog he had founded and moderated that posted racially charged language under a pseudonym. There is no clear evidence that Kane was even terminated, or that this was related to his leaving Harvard.
      3. Failing to give tenure to García Peña. They give no reason as to why this is a free speech issue.

  • @stumpy2000
    @stumpy2000 1 year ago +828

    I'm afraid I don't. Removing the suspicious data points and saying "look it's still significant" isn't that compelling. It assumes the obviously suspicious data points were the only manipulation. The fact that any part of the data set looks (very) dodgy means the whole set should be regarded with suspicion.

    • @l4m41987
      @l4m41987 1 year ago +34

      True, and I agree, but it requires a lot more now to prove the case.

    • @forallthestupidshit3550
      @forallthestupidshit3550 1 year ago +2

      "We delete outliers" was the policy at Theranos, for data points. Compelling or not, it didn't work out for them.

    • @EvelynBodoni
      @EvelynBodoni 1 year ago

      I disagree. I think it's difficult to argue that there was additional manipulation. Given Appendix B, there doesn't seem to be any pressure to commit fraud. If the data was already statistically significant, why manipulate it for the same result?
      The discrepancies can also be explained on Gino's website. Tbh it's very biased, but there is a strong argument concerning the article discussed in this video. The dataset analyzed by Harvard Business School and Data Colada may not be the original dataset. Studies often involve multiple dataset files, especially when shared across multiple RAs and faculty. Gino claims the original file was omitted from the investigation.
      The company hired by HBS to analyze the dataset was Maidstone Consulting Group, and they themselves expressed their uncertainty that this was the correct dataset to be analyzing. This "original" data is consistent with the data posted on OSF. Gino also explains data discarded or corrected between July 13th and July 16th as "excluding participants... deemed to be non-compliant."
      Note: I did find the last-modified times to be a bit suspicious in Gino's rebuttal. She claims the OG file was last modified 4:30PM and the HBS file was last modified 8:57PM. There's no evidence to say that these were not the same original spreadsheet with data fraud committed later in the day.

    • @2021philyou
      @2021philyou 1 year ago +50

      Why do neither Harvard nor Prof. Gino request that the contested experiments be repeated by an independent group of researchers on newly collected data? If the research results are true, repeating the experiments will confirm them beyond any doubt. So simple. Does either of the two stakeholders really want the research to be verified, thus running the risk of a reputation blow for both in case the research turns out to be fabricated?

    • @Spoopball
      @Spoopball 1 year ago +23

      @@2021philyou
      I assume no one wants to shell out the cash to redo the findings; after all, most of the research was done with funding/grants.

  • @Archimedes115
    @Archimedes115 1 year ago +264

    Shouldn't there just be *zero* manipulated data points?

    • @thombaz
      @thombaz 1 year ago +1

      Well, data that are extreme are considered errors, as far as I know. At least that's what I learned in biology and in mechanics. If you measure the growth of a tree and you get data like 1, 1.2, 1.1, 0.9, 4.2, 1, 1.3, 1.1, then you assume that the 4.2 is an error and there was some problem with the measurement etc., so you don't use that data point.
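
      For illustration, a minimal sketch of this kind of outlier rule in Python. It uses a robust median/MAD cutoff (the 3.5 threshold is a common convention, not necessarily what thombaz's courses used; a plain mean/SD rule would actually miss the 4.2 here, because the outlier inflates the SD):

          # Robust outlier screen using the median absolute deviation (MAD).
          growth = [1.0, 1.2, 1.1, 0.9, 4.2, 1.0, 1.3, 1.1]

          def median(xs):
              s = sorted(xs)
              n = len(s)
              return (s[n // 2 - 1] + s[n // 2]) / 2 if n % 2 == 0 else s[n // 2]

          med = median(growth)
          mad = median([abs(x - med) for x in growth])

          # Modified z-score; |score| > 3.5 is a conventional cutoff.
          flagged = [x for x in growth if 0.6745 * abs(x - med) / mad > 3.5]
          print(flagged)  # [4.2]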

    • @Happydrumstick93
      @Happydrumstick93 1 year ago +13

      @@thombaz It is true that extreme outliers are *sometimes* omitted; sometimes they are kept in because they can give an indication of the measurement error.
      But in Francesca's case, the data that was duplicated was IDs - not measurements from a sensor. So someone assigned unique IDs to every person using a program; if there were too many for a human to look at and verify, it would have been done using a function that ensures there are no duplicates. Then other sorting would be done.
      The only possible way this could come out is if she assigned the IDs, did the sorting, found out that there were some issues, fabricated or manipulated data, and forgot to re-run the check for duplicate IDs.

    • @thombaz
      @thombaz 1 year ago +2

      @@Happydrumstick93 I know this is a different case; I was just reacting to the comment about zero manipulation. Omitting data can be seen as manipulation, but it's pretty common as far as I know (though I don't work in the field). I think you are right on the given case, though.

  • @nobodyqwertyu
    @nobodyqwertyu 1 year ago +295

    But this "explanation" doesn't explain why there was suspicious data in the first place. Why would some students have duplicate ID numbers, and why would she switch the rows around? Her defense that there were even more duplicate IDs that they didn't find is rather weak. It just proves that there's more manipulation than they caught. At best, she is extremely careless with her data analysis, and it's not up to the standard for a professor at a university.

    • @giomjava
      @giomjava 1 year ago +10

      Yes. Maybe MS-student level, definitely not a professor.

    • @THEweelaful
      @THEweelaful 1 year ago +4

      I'd guess human error. Professors do that too.

    • @danf8172
      @danf8172 1 year ago +9

      You've clearly never worked with real data; this shit happens all the time.

    • @giomjava
      @giomjava 1 year ago +22

      @@danf8172 This "shit" happens only if you're inexperienced or careless. And it definitely should never have been used in a published paper.

    • @mxg30
      @mxg30 1 year ago +5

      Moreover, they found all of them. If you read DC's original post, they explained that the others differ in major, age and other fields; it could be just a mistake in the ID.

  • @lukas_613
    @lukas_613 1 year ago +105

    I disagree with your overall assessment. Her main argument is that Data Colada didn't identify all of the problematic rows in their original blog post, which could theoretically introduce bias.
    However, the Harvard investigation had access to the original files before they were tampered with. Data Colada did a recent blog post on September 16 that you didn't mention in your video, where they discuss some of the screenshots from the original files that became available in the course of the trial. Apparently, it could be confirmed that all 8 rows mentioned in the original blog post were meddled with, while they missed an additional 3. Furthermore, there were many more alterations that affected individual Excel cells.
    Francesca attacking the blog post is irrelevant and a distraction. She needs to explain all the discrepancies between early and late versions of the data files for her studies.

    • @albertonishiyama1980
      @albertonishiyama1980 1 year ago

      Also, removing the manipulated data isn't an answer... putting the real data back is.
      Data Colada showed that the numbers are below acceptable with the original data, so what she's doing is basically cutting out the negative correction she should have taken and stopping the drop in results just under the limit.
      It's like saying, "I pickpocketed 50 dollars from a friend to whom I was 25 dollars in debt. He found out and I gave the 50 back, so now he has a 25-dollar debt to me."

  • @theondono
    @theondono 1 year ago +227

    I'm not at all convinced by these "arguments". The guys at Data Colada never said they had found an exhaustive list of false entries, so I don't think it's really that important that removing the points doesn't get rid of the full effect; the tendency of the change is good enough.
    It also stands to reason that, given how badly concealed the supposed manipulation was, the reason for having those entries amongst the other ones is that those values were *edited*, not created anew.
    If the previous values were evidence against the hypothesis, changing them for values that support the hypothesis will have this exact effect (removing the points still keeps the effect significant).

    • @paolobailo
      @paolobailo 1 year ago +9

      This! As I was watching the video I was screaming these exact words at the screen :D Very nicely said, sir!

    • @janedrowned
      @janedrowned 1 year ago +4

      That's a very good point. A follow-up question though: if the undesired entries were overwritten and previously belonged to real students, why change the student IDs when you could just change the results to fit the hypothesis?

    • @uncutuser
      @uncutuser 1 year ago

      My thoughts exactly as I was watching this!

    • @theondono
      @theondono 1 year ago +3

      @@janedrowned That would require way more knowledge of the details of how the research was conducted, but this may be done to throw off the people involved in the data collection.
      Before publication it's not uncommon to seek feedback from everyone involved. These people are the closest to the study, and thus the ones who are in the best position to detect the fraud. If I'm the TA/postdoc/PhD student who ran a particular data collection session, it would be normal to keep my own record of it.
      Missing columns are not a big issue, since that data might have been pruned through statistical methods, it might have been considered an outlier, etc... But records with matching IDs and different values would be suspicious and harder to justify, and any enterprising member will be sure to raise the issue.

    • @janedrowned
      @janedrowned 1 year ago +2

      @@theondono That makes a lot of sense, thank you. (Still, though, the tamperer could've come up with IDs that weren't previously introduced, rather than duplicating them.)

  • @01Aigul
    @01Aigul 1 year ago +267

    This is far too generous. Saying "these results would hold if we removed the suspicious data points" only matters if the allegation is that these data points were manufactured out of nothing. I believe Data Colada suggested that the points were moved from one condition to the other, in which case removing them would be dropping all of the points that tended not to support her hypothesis, so this would still be cheating. Additionally, the reasonable doubt standard is used in criminal trials when someone could end up in prison. In a civil trial, which is much more similar to this case, the standard is "a preponderance of evidence" (at least in the US).

    • @psblbl
      @psblbl 1 year ago

      Thanks for writing this so I didn't have to 🙂. Regrettably, our host here has become an unwitting victim of Francesca FUD --> Useful Idiot.

    • @nondescriptnyc
      @nondescriptnyc 1 year ago +17

      I wholeheartedly agree. Even if the results WERE to still come back significant in support of Gino's conclusions, that does not clear her of the data fraud charges. It is like saying, "I peed into the office coffee maker, but nobody got sick. So, why is everyone upset?" You see, Gino doesn't seem to realize that fudging data is inherently the problem, and this certainly isn't about whether the original data supported her hypothesis at all.

    • @nonename7869
      @nonename7869 1 year ago +6

      My friend is a teacher at a technical college and he has expressed his concerns to me about the future because some of his students have displayed an inability to round numbers consistently. He told me that the school doesn't necessarily do much by way of reprimanding cheaters either. We are now the older generation and are concerned that the buildings of the future may be built and designed by morons and dullards because the system cranks them out in exchange for tuition fees masked by certificates and job placement numbers.

    • @Heyu7her3
      @Heyu7her3 1 year ago

      @@nonename7869 Technical colleges are typically suggested to students who didn't do well in traditional academics. So it's also a recruitment/marketing issue.

    • @nonename7869
      @nonename7869 1 year ago

      @@Heyu7her3 Academia has been infiltrated by frauds and by the for-profit business model. These days I believe it's more practical for young people to get a useful trade than to debate peer-reviewed journals and end up master debaters like Ben Shapiro, lacking a lot that he will never see or admit to seeing lol. My friend quit teaching because it's not worth it to be one of the few maintaining integrity and standards. God help us all.

  • @quaest
    @quaest 1 year ago +129

    No, I'm not convinced. They pointed out the most obvious ones. The question is: why are there 'irregularities' like that in the study at all? Could it be that she just did a better job of hiding the other ones?

    • @freshrockpapa-e7799
      @freshrockpapa-e7799 1 year ago +1

      Why would she do a good job at hiding some and not others? That's reaching.

    • @lukas_613
      @lukas_613 1 year ago +17

      @@freshrockpapa-e7799 We don't know why she chose to duplicate some of the participants or manipulate entire rows. But she doesn't really "need to do a good job" of hiding her manipulations. She can just go to an individual Excel cell and change the number that's there. And according to the evidence from the trial, she did that as well. This is hard to detect without accessing the original files.

    • @vvslavavv
      @vvslavavv 1 year ago +6

      @@freshrockpapa-e7799 Who knows? Laziness, forgetfulness, etc.

    • @freshrockpapa-e7799
      @freshrockpapa-e7799 1 year ago

      @@lukas_613 You're assuming she did manipulate the data to begin with, so I'm gonna disregard your opinion since it's not based on logic and evidence.

    • @freshrockpapa-e7799
      @freshrockpapa-e7799 1 year ago

      @@vvslavavv "etc" including she didn't do it in the first place, I assume.

  • @lukas_613
    @lukas_613 1 year ago +273

    She hired a PR firm and is trying to abuse the legal system to intimidate whistleblowers. This should not be rewarded by "reserving our judgement and looking at both sides of the debate"; it should be strongly condemned.

    • @passerby4507
      @passerby4507 1 year ago +20

      You're making a deeply wrong argument. If you assume she is innocent, then she would very obviously sue Harvard for defamation, because her reputation was indeed extremely damaged. That's not intimidation.
      If you assume she's guilty, then yes, it's intimidation, but do you see how wrong this is?

    • @lukas_613
      @lukas_613 1 year ago +1

      @@passerby4507 I don't just assume she was guilty. I was pretty sure after reading the four posts on the data colada blog. It was still theoretically possible that one of her research assistants was responsible. However, Harvard then formed a committee which did an extensive investigation and wrote a report that was allegedly over a thousand pages long. Harvard came to the conclusion that she committed fraud, and must have investigated other possibilities, e.g. the rogue assistant.
      This is how the scientific community polices fraud. A committee is formed by the university and/or the research journal, which conducts an investigation, which reaches a verdict. If you disagree with the verdict and believe the university wasn't impartial, you could ask an external committee of scientists to do a second investigation. But I really don't see how involving a judge and the courts would help settle the question.

    • @ironyelegy
      @ironyelegy 1 year ago +10

      @@passerby4507 question: why invest any money in preventing people from speaking out about your practices?

    • @yunki_
      @yunki_ 1 year ago +16

      @@ironyelegy Cease and desist actions are necessary to counter harassment campaigns, but they can just as easily be abused. Your personal bias affects how you view these actions.

    • @j2simpso
      @j2simpso 1 year ago +8

      @@ironyelegy Because it takes decades to build a reputation and a couple of minutes to destroy it.

  • @press2701
    @press2701 1 year ago +75

    I'm not convinced. "If I remove the suspicious data I get the same result" is nonsense. p < 0.05 and p < 0.001 are very different results.

    • @ethanganes9266
      @ethanganes9266 1 year ago +10

      THANK YOU! I was literally just thinking that less than .05 is a HUGE difference from less than .001, my god.

    • @andrewmiller3055
      @andrewmiller3055 1 year ago +4

      True. Just a mess. The argument of "I'm still right, just remove the nonsense columns" is a non-starter. The study had to address that, and it didn't. Joke.

    • @darylrichardson8567
      @darylrichardson8567 1 year ago +1

      EXACTLY!!! They're dependent on people not knowing proportions and decimals. Either one is a huge measure. That's why the margin of error exists.

  • @jacksonallan954
    @jacksonallan954 1 year ago +44

    The argument that removing the suspect points doesn't change the overall trend of data isn't as convincing as you think because it assumes, firstly, that the obviously suspect data points are the only ones that were manipulated and, secondly, that these points were added. If the professor chose to alter the data points that most conflicted with her desired outcome, then simply removing those points will not show the original outcome of the study.

    • @LilJbm1
      @LilJbm1 1 year ago +5

      Good point. So what you're saying is that data points which initially would've disproven her hypothesis (increased p-value) were edited to support her hypothesis (decreased p-value). Just removing them isn't restoring the data, because you lose the data point dragging the average UP. 🤔

  • @TripImmigration
    @TripImmigration 1 year ago +56

    Wait a minute:
    DataColada flagged suspicious data, and now she has pointed out more suspicious data.
    Is this like TikTok, where people record their own crimes on camera? 🤣

  • @JonKPowers
    @JonKPowers 1 year ago +42

    I want to hear an explanation of why these data-ordering and duplicate anomalies are there in the first place. A more convincing case against their approach would have been to explain how these anomalies got there, which would eliminate the "suspicious" label altogether. And the motive is obvious: an extremely high significance will make the paper more compelling than an ordinary one.

  • @marcfelix1006
    @marcfelix1006 1 year ago +95

    Assuming innocence until proven guilty does not mean that one has to interpret any argument in her favor.

    • @1Hydraulic1
      @1Hydraulic1 1 year ago +7

      I was just thinking this; the whole video is biased on this premise.

    • @dudeiii2069
      @dudeiii2069 1 year ago

      This just means that the burden of proof is on Data Colada, meaning they have to either debunk her arguments or bring up new ones. Now, whether the proof given by Data Colada is sufficient is beyond me. But what Francesca Gino is trying to say is: "the proof was not up to the standard of 'beyond reasonable doubt', so I should retain the presumption of innocence." Whether you find her argument convincing depends on your opinion, but it is very important to note that "beyond reasonable doubt" is a standard on the same level as criminal cases, not one based just on a balance of probabilities. I mean, ask yourself: are you almost absolutely sure that this is data fraud? If not, then the proof is not good enough.

    • @marcfelix1006
      @marcfelix1006 1 year ago +4

      @@dudeiii2069 I never said that everyone should have a certain final conclusion regarding this case. The thing I criticise is that pretty weak arguments are displayed here as "pretty convincing", which is simply not the case. Data Colada has given some good arguments, and all she does is rely on technicalities to "disprove" them, like saying that other (unimportant) entries are also out of order, or that the elements that are out of order could be selected differently, or some such nonsense. If there was no fraud, then why not just explain why any elements were out of order at all?
      If there are compelling arguments for fraud, and if the person in question cannot explain how they occurred in a non-fraudulent context, and instead just relies on statements like "why would I cheat if I didn't need to", then I would argue that the term "beyond reasonable doubt" is pretty fitting here.

    • @3glitch9
      @3glitch9 1 year ago

      ... or _not_ .

  • @spshkyros
    @spshkyros 1 year ago +68

    She needs to show that the data was not manipulated by providing an innocent explanation for those unusual rows. The argument you are forwarding from her website is that since the flagged rows only increase the effect, rather than being responsible for it entirely, she must be innocent; but that presumes that the other data is not manipulated - which we have reason to doubt now. The presumption of innocence is lost when there are signs of manipulation, and she needs to address those signs. I agree it may feel less clear-cut, but science is built on trust, and without an explanation for data altered in this pattern we cannot continue to extend that trust to her.

    • @giomjava
      @giomjava 1 year ago +8

      And this guy claims he thought about this long and hard.

  • @alexmikhylov
    @alexmikhylov 1 year ago +371

    "Beyond a reasonable doubt" is only a standard for criminal charges. For academic fraud, "really really fishy" is well enough.

    • @C0N72
      @C0N72 1 year ago +21

      Pretty much every academic institution uses "on the balance of probabilities", which basically means more than 50% likely. I think under that judgement she is still guilty.

    • @sherrieludwig508
      @sherrieludwig508 1 year ago +19

      The actual standard for academic truth is: is the study reproducible? If multiple attempts to reproduce do not yield similar results, the study is an outlier, to be generous.

    • @boss566y
      @boss566y 1 year ago +27

      @@sherrieludwig508 They aren't evaluating her scientific claims. They are investigating her for possible fraud. As a result, reproducibility would have little bearing on her guilt or innocence.

    • @googleyoutubechannel8554
      @googleyoutubechannel8554 1 year ago

      @@boss566y Nope. Reproducibility, or the complete absence of it, is the prior; it has an enormous impact on any evaluation of the probability of fraud. You don't understand the world very well, do you?
      Let me help you not get scammed in every facet of your life: if 100% of the people who jumped off the Empire State Building have died and you know this data, but a smooth-talking grifter who just took out a life insurance policy on you says they jump off it all the time and are fine and that you should try it, should you?

    • @zah936
      @zah936 1 year ago

      @@boss566y Great point

  • @Christian-qu8zi
    @Christian-qu8zi 1 year ago +102

    Elaborating that the data pool contains even more dodgy data and then accusing Data Colada of "cherry-picking" examples is no argument in her favour. Not at all.

    • @television9233
      @television9233 1 year ago +7

      It is, though: they disregarded rows that were both anomalous and went AGAINST the hypothesis that she concluded with. In either case, removing the data still yielded the same conclusion as before removing it.
      This doesn't prove innocence, but it adds massive doubt to her being "guilty beyond a reasonable doubt".

    • @stanleyklein524
      @stanleyklein524 1 year ago

      @@television9233 Necessary and sufficient conditions are tough, aren't they?

    • @1Hydraulic1
      @1Hydraulic1 1 year ago +17

      @@television9233 She doesn't need to be guilty beyond a reasonable doubt; her having manipulated data is already fishy and is enough for a retraction.

    • @television9233
      @television9233 1 year ago +2

      @@1Hydraulic1 That's the thing though: the evidence as it stands doesn't prove that there was data manipulation beyond a reasonable doubt.
      The duplicate IDs and the sort being out of order don't necessarily entail data manipulation; they do, however, require strong explanations from her for such odd quirks in the Excel sheet.

  • @switted823
    @switted823 1 year ago +29

    7:30 "Ha! You missed another example that shows I was dishonest, checkmate!"

    • @antonf.9278
      @antonf.9278 5 months ago +1

      There was a blitz chess game where the losing player accidentally made an illegal move and no one noticed. He said afterwards that his opponent should have noticed and called him out; his argument was that not challenging an illegal move is itself illegal and that he should be awarded the victory. Thankfully the adjudicator didn't go along with that.

  • @charlie-qh2ll
    @charlie-qh2ll 1 year ago +60

    It seems that her explanation is that, regardless of the data manipulation, her results are the same, so no harm no foul. But the issue is manipulating data in the first place. Arguing that removing the allegedly manipulated data still supports her findings doesn't address the issue of manipulating the data.

    • @IsomerSoma
      @IsomerSoma 1 year ago

      Duplicates in the dataset can indicate manipulation, but need not be manipulation. Her argument isn't "regardless of data manipulation her results are the same, so no harm no foul", but that without a strong incentive, taking the risk of manipulating the data isn't rational, as it has little to no pay-off. It proves nothing, though. After all, the absence of obvious irregularities need not mean an absence of manipulation either. Maybe these are just the data points where she didn't do a clean job.

    • @hosszu2010
      @hosszu2010 1 year ago +4

      And so, if the results are the same without manipulation, why did they manipulate?

    • @charlie-qh2ll
      @charlie-qh2ll 1 year ago +7

      @@hosszu2010 Exactly? Why did she manipulate the data?

    • @7embersVeryOwn
      @7embersVeryOwn 1 year ago +3

      The point is that she could make this argument with any subset of the data points or any subset of the manipulations. The only way to prove herself is to show that no manipulation happened.

  • @spqueue42
    @spqueue42 1 year ago +145

    I still have questions about how exactly the participant ID's came to be duplicates... I'm a Ph.D. student who runs a department's student pool at an R1, and it's EXTREMELY odd to me that the ID's were duplicated. In our pool, we give each student a unique 5 digit anonymous ID to use for studies. Then, the researcher adds sequential ID's to the data and removes the initial 5 digit number. SO: The researcher would have to use a generate variable command to create the ordered ID's. In Stata, this command would be "gen subject= _n". I would want to know exactly how they managed to have duplicate responses as well... That seems SO suspicious, or at least it seems they don't know how to use online survey software. Basically, I have many questions about the process, not just the overall results...
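
    For illustration, a minimal Python sketch of the workflow spqueue42 describes, with made-up column names (the Stata one-liner "gen subject = _n" has a direct pandas analogue):

        import pandas as pd

        # Hypothetical pool export: anonymous pool IDs plus responses.
        df = pd.DataFrame({"pool_id": [48210, 57731, 61004],
                           "response": ["a", "b", "c"]})

        # Sequential subject IDs, like Stata's "gen subject = _n",
        # then drop the identifying pool number.
        df["subject"] = range(1, len(df) + 1)
        df = df.drop(columns=["pool_id"])

        # By construction these IDs cannot collide, so duplicates in a
        # published file imply rows were added or edited afterwards.
        assert df["subject"].is_unique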

    • @kinesissado9636
      @kinesissado9636 1 year ago +7

      That's still more a question of competency in software use than a question of academic integrity. Either way, it's pretty irrelevant when it comes to ethics in science in this case.

    • @EvelynBodoni
      @EvelynBodoni 1 year ago

      I think it's important to recognize that the study being criticized was performed over 10 years ago. Looking at Gino's website, it looks like the study was conducted on paper, and she has documents to back that up. It seems like the data was then digitized using Qualtrics Cognos, which may explain why there are duplicate IDs. Having digitized records into Cognos before, I know there is a chance human error can factor in.
      Why this was not corrected, I'm not sure. I don't believe it was fraud, given there was no pressure/motivation to commit the crime, but I do question the extent of Gino's negligence and the effectiveness of the internal controls implemented for this process.

    • @2021philyou
      @2021philyou 1 year ago +1

      Hi spqueue42, is there any reason why you are not asking some independent party to repeat the contested experiments, so that any doubt can be removed one way or the other, instead of discussing micro-details that are irrelevant to the validity of the research results?

    • @jacob9673
      @jacob9673 1 year ago +8

      I mean, it's not like she's an actual scientist. Why take these fields as seriously as the actual sciences?

    • @stanleyklein524
      @stanleyklein524 1 year ago

      @@jacob9673 Exactly. It is a child's idea of science.

  • @14zrobot
    @14zrobot 1 year ago +50

    I do not think this is as compelling a defence as presented. The data itself was never something that could prove anything. But if someone finds enough consistently weird data points that also benefit the person in charge, an investigation was definitely required. And I do not expect an org like Harvard to let someone go without good legal backing, so I assume they have something more legally significant than a "data looks weird" argument. If not, someone will pay up for sure.

  • @terriplays1726
    @terriplays1726 1 year ago +11

    Pete, I disagree with you on this one. Go back to the plot at 9:02. Now ask yourself: are the red dots drawn from a significantly different distribution than the blue points? If you accept a 5% significance level, then the answer is clearly yes: if you draw from a distribution defined by the blue dots, there is a less than 5% chance of getting the red dots.
    Also, the argument that there is still a significant effect after excluding the red dots is just misleading, and you fell for it. Classic defense lawyer's argument. The claim was never that the manipulation only consisted of adding a few data points out of order. The claim is that the entire data set is manipulated, and the proof is that some entries are out of order. It's like being caught next to the empty cookie jar munching on the last cookie, then claiming the rest of the cookies were already gone when you arrived.
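
    The question in the first paragraph can be checked numerically. A minimal permutation-test sketch in Python, with invented values standing in for the plotted dots (only the procedure matters here):

        import random

        blue = [3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 8, 9]   # unflagged values
        red = [9, 10, 10]                              # flagged values

        def mean(xs):
            return sum(xs) / len(xs)

        observed = mean(red) - mean(blue)

        # Shuffle the pooled data, redraw a "red" group of the same size,
        # and count how often a gap this large arises by chance.
        pooled = blue + red
        random.seed(0)
        hits = 0
        trials = 10_000
        for _ in range(trials):
            random.shuffle(pooled)
            if mean(pooled[:len(red)]) - mean(pooled[len(red):]) >= observed:
                hits += 1
        print(hits / trials)  # well under 0.05 for values this lopsided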

  • @9090Glenn
    @9090Glenn 1 year ago +5

    Data Colada would need to be given an opportunity for rebuttal - data manipulation of ANY kind has to be a HUGE RED FLAG as to the integrity of the result and consequently the author - for such a short data table, the claim that (10) obvious irregularities are known to exist beggars belief - HOW can that occur? - HOW can anyone hold any confidence in ANY of the data in that table? - there appears to be a p-value < 0.05 for statistically significant intellectual FRAUD

  • @Spoopball
    @Spoopball 1 year ago +15

    "Why would I lie if the data would have proven I was right?"
    "Why would a pro athlete dope? Why would a famous gamer cheat?"
    Because the pressure of society causes those of mild success to strive for greater renown. Wasn't there an entire other story on this same channel where another professor who faked studies literally admitted to this reasoning?

  • @western_lord
    @western_lord 9 months ago +1

    Her argument is that previously, with all that suspicious data, it was p < 0.001, and without it the result is still p < 0.05.

  • @markopinteric
    @markopinteric 1 year ago +11

    Pete, one issue you have overlooked is the value of the suspect data BEFORE it was manipulated. One can reasonably assume that the suspect data worked against the hypothesis before it was manipulated, and that the hypothesis would be disproved if it were counted in its original state. So it is not enough to simply EXCLUDE the suspect data and recreate the statistics; you should rather REVERSE its value.
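
    A toy numerical sketch of this point in Python (all numbers invented): excluding the suspect rows still leaves an apparent effect, while restoring plausible original values makes it vanish.

        # Six honest values that worked against the hypothesis...
        original = [-3, -3, -2, -2, -2, -2]
        # ...flipped to positive before "publication".
        flipped = [3, 3, 2, 2, 2, 2]
        rest = [1, 2, 2, 1, 1, 2, 1, 1, 2, 1]   # unflagged rows

        def mean(xs):
            return sum(xs) / len(xs)

        print(mean(rest + flipped))    # published data: strong effect (1.75)
        print(mean(rest))              # suspect rows EXCLUDED: still 1.4
        print(mean(rest + original))   # suspect rows REVERSED: effect gone (0.0)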

    • @western_lord
      @western_lord 9 months ago

      Heck yeah, nobody decided to think of it like this!!! You are the only one who pointed it out!

  • @gaerekxenos
    @gaerekxenos 11 months ago +1

    I was wondering how the charts would look with all of the manipulated data removed. Thanks for the update.
    Unfortunately, that doesn't clear up the situation very cleanly, as people outside will undoubtedly be skeptical about the entire set rather than just the few points that look suspicious. I was going to say "maybe the findings would still be valid if the manipulated data was removed" -- but then I realized that the manipulated data could have belonged anywhere, so just removing it isn't solving anything if the weight was supposed to go elsewhere, such as the opposite side of where it was moved to. So I'm unfortunately going to have to agree with the others who have said that the entire data set should be treated with suspicion if any part of it has been shown to have been manipulated.

  • @lupissarra
    @lupissarra 1 year ago +2

    Pete, you're looking at this the wrong way: it is not the DataColada 🍹 assertions that need analysis, but the original data and conclusions. Saying "I didn't need to massage the data to reach this conclusion" in no way demonstrates such a 'session' didn't occur.
    In spite of small 'p's, the plot does not show significant differences between the 3 conditions.

  • @boltvanderhuge8711
    @boltvanderhuge8711 1 year ago +6

    Having ANY duplicate participant IDs, especially if those rows are not identical, completely invalidates all the data. There is absolutely no logical reason for it unless it's a typo (something like two 8s and no 80), but even then it demonstrates a lack of any effort to validate their own data.

  • @whiterottenrabbit
    @whiterottenrabbit 1 year ago +11

    I watch a lot of videos about video game fraudsters (as surely many of your viewers do), so here's some food for thought:
    1) Her arguments and her behaviour are very much like those of cheaters (e.g. "why would I be faking?", playing the victim card or accusing the critics of foul play).
    2) Remember what Karl Jobst said: "Players don't cheat to get faster times, they cheat to get times faster".
    3) But also, remember what happened to Dream: yes, he cheated, but most probably unwittingly.

  • @ericpenrose3649
    @ericpenrose3649 1 year ago +3

    Excluding flipped data is not enough. If these points were manipulated, they were changed from a different value. We do not know what the original value was, so we cannot conclude whether or not a manipulation altered the conclusion. The argument you state is the most convincing to you is actually the thinnest, and it was appropriately buried in the appendices - or, better yet, should have been excluded.
    The built-in Excel audit trail was sound evidence and was not convincingly refuted.

  •  1 year ago +11

    The ex-professor's explanation doesn't really explain anything. How did the same participant IDs happen? How did the CalcChain get out of order? If only one part of the data is fishy, even if cherry-picked with malicious intent (which is far from being proven beyond reasonable doubt; don't the researchers at Data Colada deserve the same benefit of the doubt?), does that mean that the rest of the data is trustworthy?

  • @ArdiSatriawan
    @ArdiSatriawan 1 year ago +4

    "Only include data that is not suspicious" is not convincing. How do we know that the rest of the spreadsheets and the data are not compromised? The fact that she acknowledged that there is compromised data means that we can't fully trust the integrity of the data.

  • @spagornasm
    @spagornasm 1 year ago +45

    9:48 Dude, 0.01 is 50 times larger than 0.002 - that is a truly enormous difference between results. To argue that they're basically the same tells me she's still lying and dissembling about the data fakery. That you accepted it honestly makes me question your own training.

    • @brianwatson9687
      @brianwatson9687 9 months ago +7

      great math skills here! It's 5 times larger. Really!

    • @theresachung703
      @theresachung703 9 months ago

      Me too. It makes me think that he simply goes with the latest narrative.

    • @EmanueleZeppieri1
      @EmanueleZeppieri1 1 month ago

      Dude, 0.01/0.002=5, not 50.
      Such a spectacular display of ignorance shows better than anything else the level of hatred, bigotry and intellectual dishonesty that moves all these absurd accusations against Francesca Gino.

  • @DethWench
    @DethWench 1 year ago +2

    Nevertheless, she is 100% guilty of bad data governance and poor leadership. I’m sorry, I expect more professional behavior from academic leaders than messy undocumented spreadsheets.

  • @terriplays1726
    @terriplays1726 1 year ago +6

    "Innocent until proven guilty beyond reasonable doubt" is a proof standard used in court in criminal cases. The court system has different levels of proof standards for different situations; e.g., in a civil lawsuit they can be lower (relevant for Harvard vs. Gino), and for getting things like search warrants they are even lower.
    More importantly, we in science have much higher standards to prove our findings. I cannot submit to Science and tell the editors "you have to prove beyond reasonable doubt that my results are fabricated!". No, it works the other way around: I have to bring the evidence to support my paper. If the state ever brings criminal charges against her (misuse of public funds or whatever), then the proof standard you mentioned will be applied and probably not met. But as long as this is not a criminal case, you accepted the proof standard she claimed, which was the most favourable for her, without even thinking about whom the burden of proof lies on in this case.

  • @SiMyt848
    @SiMyt848 1 year ago +3

    Removing the tampered data points from the analysis proves nothing. They were already doctored so as not to skew the test results; de facto, they are already not affecting the result. This "proof" is laughable. Still not convinced? Say I am computing the mean value of rolling a die and I have a sample of 10k realizations. I then replace all 5 and 6 values in the sample with a uniform distribution of discrete values from 1 to 4. I will then conclude the mean value of throwing a die is 2.5. You tell me I doctored the sample and identify which values I manipulated; if we exclude them from the sample, we still conclude that the mean of throwing a die is 2.5 instead of 3.5...
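
    SiMyt848's dice example can be run directly; a quick sketch in Python:

        import random

        random.seed(0)
        rolls = [random.randint(1, 6) for _ in range(10_000)]

        # Doctor the sample: replace every 5 and 6 with a uniform draw from 1-4.
        doctored = [random.randint(1, 4) if r > 4 else r for r in rolls]
        tampered = {i for i, r in enumerate(rolls) if r > 4}

        def mean(xs):
            return sum(xs) / len(xs)

        print(mean(rolls))     # ~3.5, the true mean of a fair die
        print(mean(doctored))  # ~2.5 after doctoring
        # Excluding the identified doctored values changes nothing:
        print(mean([d for i, d in enumerate(doctored) if i not in tampered]))  # ~2.5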

  • @ross4
    @ross4 1 year ago +3

    Not buying it. No one ever said the manipulation was solely to generate significant findings. It could just as easily be that she just wanted a stronger P value. She created a strawman to knock down.

  • @Mr1123581325
    @Mr1123581325 1 year ago +2

    FYI - beyond a reasonable doubt is the standard of proof for criminal cases, which is quite a high standard. For civil cases the standard is "on the balance of probabilities", which is much lower. So, unless Francesca Gino is facing criminal prosecution, we can apply the lower balance-of-probabilities standard.

  • @9adam4
    @9adam4 1 year ago +2

    I disagree with applying the "beyond a reasonable doubt" standard. This isn't a criminal court. A preponderance standard is reasonable here.

  • @jonahhekmatyar
    @jonahhekmatyar 1 year ago +56

    One thing I find bizarre is that undergrads don't try to replicate these massively popular psychology papers as a project. Even if the tests are somewhat poorly done and not standardized across the schools (being done by 20-ish-year-old psychology students), if ten universities all can't replicate the results, that might indicate there is some kind of problem with the original paper, and more scrutiny can then be placed on more professionally done studies by higher-level students.

    • @g00nther
      @g00nther 1 year ago +33

      I read that a post-grad tried and failed to replicate one of the studies, and when she did try to publish her findings she was essentially told to drop it, as going against such a star in her field would be bad for her career. She was one of those who reached out to Data Colada. Her name escapes me now.
      Academia is a joke.

    • @ThanksALott
      @ThanksALott 1 year ago +3

      Because during your undergrad those stellar experiments are assumed to be executed well. Once you reach the point where you doubt studies, you are working on things that further your own career. Unfortunately, recreating others' experiments is not beneficial for that.

    • @kardkovacslevente4496
      @kardkovacslevente4496 1 year ago +11

      This is actually an interesting point imo. I study physics and we do lots of experiments that have known results (duh); it's how we learn how to conduct experiments properly. I'd be interested to know if there is a reason for not doing this in softer sciences like psychology and sociology (other than logistics: it's probably more difficult to find 50-100 people for your study than lasers, oscilloscopes and such) - especially given that in these fields studies fail to replicate more often than in "hard sciences".

    • @jonahhekmatyar
      @jonahhekmatyar 1 year ago +2

      @kardkovacslevente4496 Finding applicants is 100% not the problem. If someone was offering $5 or a free lunch ticket to fill out a questionnaire, you could easily find 100 students in a university who'd be willing to participate. I'm not at all involved in the psychology academic field, but the outward impression I get is that they care more about casting a wide net, even if it mostly pulls up unrepeatable results, than about focusing narrowly as a field on a few types of experiments.

    • @2021philyou
      @2021philyou 1 year ago +8

      Finally, somebody thinking along basic principles. If the research is true, the results could easily be replicated by other researchers.

  • @Matt-nv2qg
    @Matt-nv2qg 1 year ago +43

    I fundamentally disagree. Yes, there's a lawsuit and it will be tried against legal standards. However, the argument "well, they found my data was wrong, but not in all the same ways I found it was wrong, and that changes the calc" is bullshit. A professional of this caliber and position is required to do better quality control. You may be able to stipulate that mens rea favors innocence, but it's still incompetence and should never have made it to publication. That the results still stand is sheer luck; the greater problem is the incompetence and lack of quality control. When dealing with scientific analysis, if you have neither competence nor quality control then your information is worthless and shouldn't be published. She needs to go. Had she seen this and stated "oh, there are some errors and this needs to be retracted for revision", sure. But standing on the premise of even worse QC is an undeserved pass.

    • @uncutuser
      @uncutuser 1 year ago +3

      Exactly. At worst it's fraud. At best it's outright incompetence or gross oversight and negligence. Either way... bye-bye.

  • @nikkid4890
    @nikkid4890 1 year ago +1

    It still doesn't explain how such glaring 'errors' were made in the first place. She is lucky that these errors were not enough to be of statistical significance, but how does a scientist of this level explain away such sloppy work?

  • @jensandersen9468
    @jensandersen9468 1 year ago +1

    Hi Pete. Thank you for your interest in quality assurance. Everybody removes data from their data sets to prove their cases, since this type of behaviour is endorsed by ISO 5725. Please, let's first get rid of ISO 5725, which wreaks havoc on science, and then we can talk about starting cases. Further, the p-value refers to your own data set from your own lab, which means that it addresses precision and not accuracy. Hence, there's a huge risk that the results cannot be reproduced, and that is the real problem that should be addressed, because that's yet another systemic mistake.

  • @Kitties_are_pretty
    @Kitties_are_pretty 1 year ago +2

    Uh oh, looks like she threatened to sue Pete.

  • @mikeycham3643
    @mikeycham3643 1 year ago +5

    So, if the study mildly bears out her hypothesis, she had no motive to fudge the data to make it look as if it bore it out more dramatically?

  • @AnythingForSouls
    @AnythingForSouls 1 year ago +13

    The "why would I fake it if it shows I'm right" argument doesn't work on me. She was pretty highly regarded, and I feel like she needed the study to be an overwhelmingly proven point, rather than just a "well, it looks like this worked" kind of thing, to keep her reputation up.

  • @Msa83-u9j
    @Msa83-u9j 1 year ago +3

    Floyd Landis had all kinds of arguments and a presentation on why he was innocent of doping. Except, he was guilty. Pete, you need to be a researcher, not a lawyer. Stop using legal terminology, use science. If someone wrongly accused you of faking research, what would you do to prove them wrong? I don’t think you’d complain about academic review processes, and I don’t think you’d rework the analysis using a different subset of the data. I think you would:
    1. Provide the original dataset
    2. Provide affidavits from the study participants
    3. Offer to rerun the study (if you were rich, certainly something you could do)
    4. Provide the explanation about why the data looks as it does (if you didn’t fake it, there would have to be some other explanation)
    I’m sure you can think of other things that this professor isn’t doing. She’s clearly using the “throw anything against the wall and see what sticks” approach that guilty people use.

  • @lorenzogumier7646
    @lorenzogumier7646 1 year ago +1

    Wait wait wait, what is the fraud about? That data were manipulated, or the outcome of the study? Are we questioning her intentions or the robustness of the findings? The argument that you define as most compelling is the least relevant in my view. Btw, including the suspicious data points makes the study more significant. We are questioning her integrity, not the scientific truth of the study.

  • @Aurigandrea
    @Aurigandrea 1 year ago +1

    Why doesn't she use these claims in the lawsuit instead of her 5 excuses?

  • @sukkim8445
    @sukkim8445 1 year ago +2

    You missed the mark here. The issue is not about including or excluding the fishy entries. The issue is what the original values of those fishy entries were. One can assume here that if the original values had been included in the analysis, the result would have disproved her hypothesis.

  • @AF-oq5bu
    @AF-oq5bu 1 year ago +12

    Her explanation for why those rows are not suspicious does not hold water - it boils down to "look, this data you do not accuse me of changing supports the hypothesis!" when the issue is the data that potentially did NOT.
    Her argument is that if you take the "suspicious" values out then there is still a stat sig difference between the groups -- and that is fine and dandy. But that is only a legit argument if the contention is that these rows were ADDED to an existing dataset (i.e., the other data are real and only those rows were added). Then it would absolutely make no sense to add those data. However, if there were in fact ~100 (or however many) responses and the "suspicious" ones were altered, then it is very important to know what the original values of those responses were -- because if those answers went against the hypothesis, then the study might have been not stat significant (and it would not matter at all whether the remaining respondents' answers show a significant difference because that would be a cherry-picked sample). If the results of the overall sample were NOT significant and you were willing to manipulate them, then it would be very simple to switch group assignments of respondents so that those who did not follow the hypothesis suddenly got placed in the other group, or switch a roughly equal number of answers from both study arms in the direction the hypothesis suggests, etc. Showing the results with the "suspicious" rows removed does not address the second possibility (that the actual results from all respondents were not stat sig and that is why the data manipulation was done). And, really, if one were to commit fraud, it would be easier to just alter existing data than to add cases (esp. since records about how many people completed were likely to exist either in terms of compensation notes or similar).
    Hope that is clear!
    Also, and my big issue, is the p-value itself. On a sample of 100, with three groups (so ~30 respondents each) it would be so difficult to get a p as low as reported. In fact, Dr. Gino's defense of herself makes me more suspicious there -- because, yes, smaller samples make it more difficult to get significant p-values, but it would be ESPECIALLY challenging to get such tiny p-values with diminished samples -- and the "suspicious" values increased the significance of the results above what just an increase in sample would be expected to do.
    Also, the duplicate IDs are a HUGE deal since they matched actual to stated behavior based on those (per the method description in the original article), so given that there are two duplicate IDs, we know that some of those values are incorrect regardless of whether the data are faked. That is unforgivably sloppy.
    So -- not saying this was or was not fraud, but the defense is lousy. One way to "prove" innocence (yes, I know it cannot be proven) would be to see the original paper responses -- I know it was a while back, but any paper copies saved?
    (Ps: given the sample sizes, changing the answers of just 3 respondents per group would change the proportion by 10% each, so that impressive 37% vs 67% would become a non-significant 47% vs. 57%).
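
    AF-oq5bu's postscript checks out numerically. A minimal two-proportion z-test in Python, using counts that approximate the quoted percentages (the group size of 30 is assumed from the "~30 respondents each" above):

        from math import erfc, sqrt

        def two_prop_p(x1, n1, x2, n2):
            p1, p2 = x1 / n1, x2 / n2
            p = (x1 + x2) / (n1 + n2)   # pooled proportion
            z = (p2 - p1) / sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
            return erfc(abs(z) / sqrt(2))   # two-sided p-value

        print(two_prop_p(11, 30, 20, 30))  # ~37% vs ~67%: p ≈ 0.02, significant
        print(two_prop_p(14, 30, 17, 30))  # ~47% vs ~57%: p ≈ 0.44, not significant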

  • @Marko-kv9op
    @Marko-kv9op 1 year ago +6

    Sounds like "I faked it and increased the effect size, but it's OK 'cause p is still below 0.05."

  • @kane6106
    @kane6106 1 year ago +4

    Has this study been replicated successfully? If not, then nothing about this argument would convince me. Far too often you find single studies that are cited but never really replicated. And who is to say she didn't curate the data and simply forget to remove the more obvious faked data points?

  • @larryhuffine2814
    @larryhuffine2814 1 year ago +1

    It really sets off alarm bells for me that you skip right over the additional suspicious data points yet so quickly and willingly change your mind based on her argument. When she's the one pointing out more duplicate IDs, were those factored in? Yet you swallow her explanation hook, line and sinker. That's as alarming as her pointing out more suspicious data while entirely overlooking it.

  • @hkrohn
    @hkrohn 1 year ago +2

    So she's admitting there were a lot of errors, but "it's still significant". So withdraw the original paper and publish a new one without the multiple errors!

  • @Adeoxymus
    @Adeoxymus 1 year ago +3

    The reason student 7 is probably the out-of-order student, and not student 5, is that the calcchain.xml shows student 7 in condition 2 (row 70) being calculated in between rows 3 and 4 (student IDs 3 and 10 of condition 0). This is also addressed in the Data Colada blog post.
    Also, do not forget that the independent forensic firm found 11 rows that they could not match to the RA file, 8 of which were identical to the ones found by the Data Colada team. No calculation of the significance of removing the 11 missing rows? I think they are all at the extremes. OK, I quickly calculated: the effect remains significant but less so (0.027); more importantly, it decreases the effect size (Cohen's d) from large to medium (0.81 to 0.59).
    Final edit:
    Removing the flagged rows is only part of the exercise. If these rows in fact originate from other conditions (as we saw with student 7, who might have been moved from condition 0 to condition 2), then for a proper calculation of the original result we should move them back into their proper place rather than delete the rows. Deleting them still messes up the data.
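
    For reference, the effect-size statistic Adeoxymus recomputes; a minimal Cohen's d with pooled standard deviation in Python (the two samples are invented placeholders):

        from statistics import mean, variance

        def cohens_d(a, b):
            na, nb = len(a), len(b)
            pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
            return (mean(a) - mean(b)) / pooled_var ** 0.5

        # Conventional benchmarks: ~0.2 small, ~0.5 medium, ~0.8 large.
        group_a = [7, 8, 6, 9, 7, 8, 10, 6]
        group_b = [5, 6, 5, 7, 6, 4, 6, 5]
        print(cohens_d(group_a, group_b))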

  • @NiKoNethe
    @NiKoNethe 1 year ago +2

    What the enlightened centrism is this? How embarrassing.

  • @casey1167
    @casey1167 1 year ago +5

    What I frankly don't get is why people treat one study, no matter how reputable, as final. Even if everyone agrees it is a good study, I would think replication would be done just to make sure the sampling (the people) used was objective.

  • @Evanstonian60201
    @Evanstonian60201 1 year ago +3

    It doesn't strike me as very convincing. While some intro course to statistics may claim that p < 0.05 is "significant", everybody knows that it's not a whole lot of significance. A rock-star professor of pop psychology has a lot of incentive to make her results appear not just a bit better than anecdote, but solid. Merely from examining the papers we likely won't know what exactly happened. Maybe outright fraud, maybe just sloppiness that, through some incentives, always tends to shift results in one direction. Suing not only her former employers, but also the independent researchers looking into her work, reeks of despair -- it's almost certainly not winnable, given the extremely high bar of first-amendment rights, and thus presumably brought for reasons that in a sense are dishonest by themselves, perhaps in order to get the authors to retract so as not to spend their savings on lawyers.

  • @nickmullen402
    @nickmullen402 1 year ago +4

    If the accusation was specifically that the "flagged" data points were outright fabricated, then her defense would make sense. However, it seems more likely that the "flagged" data points were edited (rather than fabricated) because the original value cut against the hypothesis. So the fact that the results are still significant if these data points are excluded does not prove there would have been no motivation to edit the data. Naturally, excluding inconvenient data points can give you a similar result to changing them favorably.

  • @thecutestcat897
    @thecutestcat897 1 year ago +1

    Instead of reflecting on her mistakes, she will think that she should improve her data falsification skills.

  • @jefft8597
    @jefft8597 1 year ago +38

    Am I the only one that wonders why this study was done in the first place? Are people running out of seriously important subjects to study?

    • @g00nther
      @g00nther 1 year ago +21

      Yes, I do find most of this stuff trivial. I'm amazed that people make a lucrative career out of it. Oh well

    • @metalslegend
      @metalslegend 1 year ago +1

      What do you consider to be a "seriously important subject"?

    • @sirranhaal3099
      @sirranhaal3099 1 year ago +13

      Mechanisms of disease. Particle physics. Organic synthesis. Cell biology. Basically not this kind of “gee whiz” psychology

    • @metalslegend
      @metalslegend 1 year ago +8

      @@sirranhaal3099 In the end, basic research gets done in the subjects you listed too. Most of it probably also has no direct relevance or, let's say, applicability, but it helps us understand the field as a whole, for example by testing the validity of existing theories and hypotheses from a new perspective.
      Psychology is no different. What you might label "gee-whiz" psychology can be seen as fundamental research into trends in human behavior and experience, at least within the context of Western societies.

    • @stanleyklein524
      @stanleyklein524 1 year ago

      @@metalslegend One with a serious theoretical basis, one that attempts to distinguish between alternative theoretical interpretations of a question that is legitimate for the discipline of psychology. That is, a serious scientific study tests theoretically motivated claims, rather than just coughing up ungrounded demonstrations.

  • @vvslavavv
    @vvslavavv 1 year ago +5

    You are assuming that the only manipulated data is the data that has been uncovered. It could be that other manipulations were much better hidden. Has this study actually been validated in other experiments? Have other researchers reached the same conclusion?

  • @Olivia-wg8gv
    @Olivia-wg8gv 1 year ago +2

    Can't Excel order the columns in ascending order automatically? Even if it wasn't on purpose, how does this even happen lol

  • @willerwin3201
    @willerwin3201 1 year ago +1

    What bugs me about this is that the underlying scientific finding doesn't seem to have been refuted or corroborated; the personal drama is overshadowing the question for everyone involved. Honestly, how hard would it be to do a replication study? A third party with no skin in the game should be able to re-run the experiment. If the underlying principle holds, the results should be similar, and the study ought to stand. If it doesn't, then we have either a statistical fluke (unlikely, given the p-value) or fraud.

  • @nahakuma
    @nahakuma 1 year ago +2

    If I understand what is being claimed here, this is very, very stupid. Why would you exclude the points that disagree with her version?

  • @christophh3233
    @christophh3233 1 year ago +3

    What confuses me is what is considered significant in psychology. 101 samples is just not enough. I know it's harder to get more data in some fields. What bugs me is the lack of understanding of how likely false results are with such a weak significance threshold…

  • @RazeenMujarrab
    @RazeenMujarrab 1 year ago +1

    Thanks to the excellent comments over here, I learned that evidence proving her "guilty beyond a reasonable doubt" isn't required in this case, since it's not under criminal law. A "preponderance of evidence" is enough, as academic fraud is treated more like civil law in the US judicial system.
    However, can someone explain the importance of "due process" in this case? She claims that both Harvard and DC violated established due process: Harvard because they created, and held her to, a new academic-fraud guideline made without her knowledge, and DC because their website states that they reach out to professors when they find anomalies in the data, yet in this case they sent their reports to HBS, her employer, instead. Does this give her an appropriate legal basis for suing those parties?
    Thanks in advance!

  • @psychotropicalresearch5653
    @psychotropicalresearch5653 11 months ago +1

    Would the fragility index metric be of any help here?
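
    (For anyone curious: the fragility index is usually defined for binary outcomes; it counts how many individual outcomes you would have to flip before a significant result stops being significant. A rough sketch with invented 2x2 counts, not numbers from this study:)

    ```python
    # Fragility index for a 2x2 table: flip outcomes in the low-event arm one
    # at a time and count how many flips it takes for Fisher's exact test to
    # lose significance.
    from scipy import stats

    def fragility_index(events_a, n_a, events_b, n_b, alpha=0.05):
        flips = 0
        while events_a < n_a:
            table = [[events_a, n_a - events_a],
                     [events_b, n_b - events_b]]
            _, p = stats.fisher_exact(table)
            if p >= alpha:          # significance lost after `flips` flips
                return flips
            events_a += 1           # flip one non-event to an event
            flips += 1
        return flips

    # Invented counts: 10/50 vs. 22/50 "dishonest" participants.
    print(fragility_index(10, 50, 22, 50))
    ```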

  • @mikebaker2436
    @mikebaker2436 1 year ago +2

    Beyond reasonable doubt? Why the criminal code guilt threshold?
    Why not "By the preponderance of the evidence" or "by the clear and convincing evidence" ...because those are the legal thresholds for civil and fraud cases.

  • @flaviochuahy3440
    @flaviochuahy3440 1 year ago +3

    Using p-values to justify that the data isn't manipulated is just dumb... this arbitrary p-value threshold should just go... that's the flaw in her argument...

  • @ineedazerosuit6128
    @ineedazerosuit6128 1 year ago +7

    Francesca Gino's argument here is a decent defense because it deflects from some of the core issues. She ignores that the data appears to be manipulated. Why were the cells moved? Her claim of not needing to have manipulated the data for the result is predicated on the assumption that only the out-of-place data could be fake. How do we know that the cells in the correct places were not manipulated?
    A problem with the suspicious cells is the trend: removing just the manipulated cells decreases the significance. Gino's argument tries to change which points count as suspicious. In the case of 7 vs. 5, she has a decent point in terms of interpretation. Her argument that 13 is a duplicate is not convincing, because the two 13s have different conditions, majors, and responses, whereas the 49s are identical. She has found a way to weaken Data Colada's argument, but this paper alone is still suspicious and warrants investigation in the context of the other suspicious findings. Gino is intelligent, unlike the guy at FSU, so I expect her to keep finding ways to cast doubt on Data Colada and the Harvard investigation.

    • @theresachung703
      @theresachung703 9 months ago

      She is trying to create exactly what this guy is doing: doubts about her guilt.
      She's very, very good.

  • @CloverLam11220914
    @CloverLam11220914 1 year ago +3

    I don't find her arguments convincing at all. First argument, about the duplicated 13: Data Colada obviously grouped by condition first and then sorted by ID, forgiving duplicated IDs across different groups! Second argument, about manipulating out-of-order records: it's not about THOSE values, it's about possible manipulation and the credibility of the whole dataset! Also, some researchers are obsessed with extremely small p-values to look super-significant, so the motive for manipulating is simply "chasing perfection".

  • @krazoe6258
    @krazoe6258 1 year ago +19

    For me those papers are forever tainted. Independent replication needs to happen before I would trust the data

  • @KimFamily-office
    @KimFamily-office 1 year ago +2

    It's about trust and intention, and both were betrayed through the fake data. Just because it "worked out" doesn't negate the intent that destroyed that trust.

  • @JessicaPradoHanson
    @JessicaPradoHanson 1 year ago +1

    I don’t agree with presuming innocence or guilt. I think we should be seeking the truth of what is healthier for us all.

  • @smartsaik
    @smartsaik 1 year ago +2

    I'm not a researcher, just generally curious, and I'm wondering: what if those data points originally cut against her hypothesis? Eliminating them would remove any chance of the data significantly contradicting it.

  • @breesco
    @breesco 1 year ago +12

    She *better* protest her innocence: she won't be able to fake results and publish them anymore, and will have to get a real job. What a tragedy!

    • @bimalpudasaini3576
      @bimalpudasaini3576 1 year ago

      Someone will hire her; she has the brains to be a Harvard professor.

  • @Otsuguacor
    @Otsuguacor 1 year ago +1

    I think Gino's explanations are not very convincing. She pointed to apparent inconsistencies in the TABLES that Colada showed as examples of fake data, and she conveniently ignored the rest of the data. For example, Gino assumed that the cases Colada colored are not repeated just because the repetition did not appear in the displayed sample, which is not actually ordered. I think Gino is cherry-picking apparent inconsistencies in the presented table that are not inconsistencies once you take the entire dataset into account... Obviously, I'm assuming that Colada's analysis was correct.

  • @justaname999
    @justaname999 1 year ago +5

    Not that convincing. I'd like to see a model of how likely it is that the data points that look dodgy would also sit at the extreme ends of the scale. What it looks like to me now is her saying that the points Data Colada flagged as suspicious, which "completely coincidentally" happened to sit at the extreme ends of the scale, are not the only dodgy ones, so she goes hunting for data that "should" be flagged as well. This is a bit bizarre.
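
    (A rough back-of-envelope version of that model; the counts are assumptions, roughly 100 rows with 8 flagged. If the flags had nothing to do with the values, the chance that all 8 land among the 20 most extreme responses is tiny:)

    ```python
    # If 8 of ~100 rows were flagged for reasons unrelated to their values, how
    # likely is it that all 8 sit among the 20 most extreme responses by chance?
    from math import comb

    n_rows, n_flagged, n_extreme = 100, 8, 20   # assumed counts
    p = comb(n_extreme, n_flagged) / comb(n_rows, n_flagged)
    print(f"P(all flagged rows extreme by chance) = {p:.1e}")   # ~7e-7
    ```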

  • @emiel89
    @emiel89 1 year ago +6

    Pete, I really do not agree with you here.
    1. The accusation of cherry-picked data is troubling because she herself picks out even more troubling data points. Just because the Data Colada team picked out a few of the most troubling points does not mean the selection was cherry-picked; they are examples of problems in the dataset as a whole. Gino conveniently picks out even more troubling data points, as if to say, "Why would a guilty person point out even more troubling data?" This is a defense you often see from people who are guilty; you see it with scammers a lot (no, I am not saying Gino is a scammer; it's just where I have a lot of experience as a member of a skeptics' society), and you see it a lot in legal defenses at court. So it would not surprise me if this was suggested to her by her legal counsel. And those out-of-order data points were still mostly out of order even if you take those two others out. And if they were placed there deliberately, which is the accusation in the first place, then the other data points that seem in order don't do much as a counterargument.
    This argument is actually very weak and very much to be expected, since her accusation of cherry-picking is itself made by cherry-picking data.
    2. That the effect is still statistically significant after deleting the troubling data points is not a good argument at all. Firstly, statistical significance does not in any way mean practical significance; it is not an effect size. It means that data this extreme would be unlikely if the null hypothesis were true, not that the alternative hypothesis is true; no frequentist statistic can tell you that with any reasonable certainty. Formally, "p-values are the probability of observing a sample statistic that is at least as extreme as your sample statistic when you assume that the null hypothesis is true" (see the simulation sketch at the end of this comment).
    Secondly, the drop in significance after deleting the 8 or 10 data points (out of more than 100) is quite big for so few deletions, which suggests there are probably even more troubling aspects in the dataset as a whole. As @theondone has stated so nicely:
    "It also stands to reason that, given how badly concealed the supposed manipulation was, the reason for having those entries amongst the other ones is that those values were *edited*, not created anew.
    If the previous values were evidence against the hypothesis, changing them for values that support the hypothesis will have this exact effect (removing the points still keeps the effect significant)."
    So Gino's defense is not strong at all and not very convincing. For me it matters little whether she is guilty or not; I just find the data-detective elements of these stories fascinating. That professors of high esteem have faked data is, after so many examples, not at all surprising, but neither would it be all that surprising if the allegations against her were untrue, had the case been less strong.
    After this video and reading some of her website, I still am not convinced of her innocence.
    statisticsbyjim.com/hypothesis-testing/p-values-misinterpreted/
    statisticsbyjim.com/hypothesis-testing/interpreting-p-values/
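
    (To make the quoted p-value definition concrete, a minimal simulation; the group size and observed t are invented:)

    ```python
    # Simulate the p-value definition: under a true null (both groups drawn
    # from the same distribution), how often is the t-statistic at least as
    # extreme as the one observed?
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n, t_obs = 50, 2.1                  # hypothetical group size / observed t

    trials, extreme = 20_000, 0
    for _ in range(trials):
        a = rng.normal(0, 1, n)         # null is true: identical populations
        b = rng.normal(0, 1, n)
        t, _ = stats.ttest_ind(a, b)
        extreme += abs(t) >= t_obs

    print(f"simulated p = {extreme / trials:.3f}")
    print(f"analytic  p = {2 * stats.t.sf(t_obs, 2 * n - 2):.3f}")
    ```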

  • @parl8150
    @parl8150 1 year ago +1

    That's what I don't love about media: they shape the crowd's opinion towards one "truth". And when the opposing side presents its arguments, the crowd evaluates them from the point of view of the prior, media-shaped "truth". Instead of listening to the arguments with zero prior assumptions, we further entrench a biased opinion.
    Look at all of these comments:
    "I don't agree, here is why … *something sounding logical*"
    And this *something sounding logical* has not been processed through the lenses of both sides, which would make it unbiased. Nor has it been processed with scientific rigour.
    I don't take sides here, but I urge people to remember that the world is often much more complex than what our "logic" tells us, and that reaching the truth is astoundingly difficult. Be open-minded, and always have doubts, not only about others' opinions but about your own as well.

  • @wolfRAMM
    @wolfRAMM 1 year ago +3

    From her excuses: "The RAs conducting the study simply stacked the paper copies by condition, and then manually entered the data in the order in which the papers were stacked. The sequence of paper entries tended to be in ascending sequence; this makes sense, since it was probably the same sequence in which the participants performed the tasks". No, it doesn't make sense, because then the sequence would tend to be in DESCENDING order, because of LIFO (last in, first out). And why would you give some cards to the participants if you could just print those numbers on the form itself, disguised as that fictitious OMB number at the top right? If your research assistants can't even count properly from 1 to 100, how could you trust any data produced by the study/experiment at all?

  • @DavidRichardson-y3b
    @DavidRichardson-y3b 1 year ago +1

    Actually, the significance with the suspicious data removed points strongly to a non-exculpatory explanation. Assuming, as the Data Colada authors did, that the manipulated data were changed from least conforming to most conforming to the hypothesis, removal would correct roughly half the distortion. She ends up with a p-value of .028, but if that reflects only half the correction, the original data would have had a p-value of roughly .056, just over the threshold for statistical significance. In this case the data strongly support the conclusion, just not strongly enough, and it is terribly tempting to tweak a few data points to clear the academic hurdle without meaningfully changing the managerial significance. I have several papers that have languished for years because they rely on real-world data and fall a few points shy of statistical significance, and additional data that would provide more power simply does not exist. In this case, a few more experiments would probably have made the difference, but the authors were too lazy and unethical to bother.
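
    (The general point, that a modest shrink in effect size can push a result past the 0.05 line, is easy to check numerically. A sketch with an assumed group size of about 50 per condition; these are not the paper's numbers:)

    ```python
    # Two-sided p-value implied by a given Cohen's d in a two-sample t-test
    # with equal group sizes: t = d * sqrt(n / 2).
    import numpy as np
    from scipy import stats

    n = 50                               # assumed participants per condition
    for d in (0.45, 0.35, 0.30, 0.25):
        t = d * np.sqrt(n / 2)
        p = 2 * stats.t.sf(t, df=2 * n - 2)
        print(f"d = {d:.2f} -> p = {p:.3f}")
    ```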

  • @sblower9410
    @sblower9410 1 year ago +1

    I am not convinced, sorry. The video assumes the only reasons the data could be bad are mistakes or data rigging after the fact. However, if we look at the story another way and assume the data was tampered with intentionally, would the data have been manipulated before or after the paper's conclusion was written? I think before is the better assumption: there is less chance anyone who helped with the paper would have noticed the data being changed if it was changed in real time. Changing the data before she knew how much cheating she needed to do would also give her a chance to edit out some of her own cheated data under the guise of basic editing if need be.

  • @airman122469
    @airman122469 1 year ago +2

    Unconvinced.

  • @pietrosala2501
    @pietrosala2501 1 year ago +1

    Great content, as always! Since the matter has already become a trial by media, let's at least consider all points of view objectively.

  • @MutedStoryteller
    @MutedStoryteller 1 year ago +3

    You could say that no matter how you turn it, the tampering dramatically boosts the significance. Don't forget that not all tampered rows have necessarily been found; you can tamper with data without moving it, too.

  • @VictorVMendonca
    @VictorVMendonca 1 year ago +2

    Glad to see that 'Academia Strikes Back' is probably coming in at least four parts, I can spot some enjoyable entertainment right there

  • @pwnyboy01
    @pwnyboy01 1 year ago +2

    It makes me sad to see people talk about judging others by the American legal standard of criminal innocence. There are many reasons we want the courts to presume innocence; none of them apply here. If anything, she should be held to strict scrutiny, because she is in a trusted position in society. We need to be able to trust people doing and recording science. So I disagree with your supposition, Pete, that the appropriate standard is beyond a reasonable doubt. Even in law this case wouldn't be viewed that way, as it is not a criminal inquiry, and all the reasons that's important actually cut against assuming innocence on Gino's part.
    I also found her reasoning specious, but I forgive that because I expect her to advocate strongly for herself. It just doesn't seem like a strong data argument to say "there were even more mistakes." It is an emotional argument designed to make you think the mistakes were negligence and not intentional. But it isn't compelling, because if we are going to assume negligence, then any consistent trend toward her hypothesis cuts against that. In addition, the correct thing to do with the supposedly errant data points Data Colada flagged would be to remove them completely, not cherry-pick the other way for balance. And if you completely remove the data in question, Data Colada's case is just as strong.

  • @dr.patton7415
    @dr.patton7415 1 year ago +8

    Pete did you happen to reproduce Gino’s last analysis you covered? It’d be surprising to me if DataColada didn’t check whether the main result was still significant without the anomalous rows.

  • @pr4360
    @pr4360 1 year ago +1

    Has anybody ever replicated their studies?

  • @barracuda008l4
    @barracuda008l4 1 year ago +1

    In science, the standard for calling out a poor job is not "beyond reasonable doubt"; it is enough simply to identify the glaring errors.

  • @Iosifavich
    @Iosifavich 1 year ago +2

    The second duplication (ID 13) probably should also have been flagged as anomalous; however, a minor mistake on their part is not an indictment and does not invalidate their investigation. For the rest of the data, they are looking for breaks in order, mainly contiguous order. The contiguous order is broken from row 67 to row 72 and then picks up again, so we are assuming there are no more anomalies after row 72. At 9:17, the plot she generated is completely wrong and is itself a manipulation. She presents the chart as if it were the Data Colada findings reproduced under the same rules, which is inaccurate and deceptive: she has rendered the data *she* claims was in or out of order, not the data Data Colada flagged, and at the same time she renders the control-group data. It is hard to trust her revised calculation because of that tampered, misleading graph. She basically says, "Look, they did this wrong, based on MY reading of what they said they did." Then she says, "Oh look, they did no analysis on the control, so I did it for them," and uses that as a vehicle to produce a cleaned-up dataset that supports her narrative, without addressing that the same "mistakes" that helped her also appeared in 3 other studies. If anyone could manipulate people's opinions, it would be a behavioral scientist. This is pure spin: it is clear that Data Colada simply didn't see any significant anomalies in the control worth discussing and potentially confusing the reader with; they are, after all, producing content for mass consumption.
    Harvard performed its own investigation into the Data Colada accusations, using their findings as a starting point. Because Harvard has a pretty good law school that you might have heard of, I find it unlikely they would be allowed to terminate a tenured senior professor without significant and thorough due diligence. Something we do not have is Harvard's internal investigation report, which is why I am watching this lawsuit: that will be some interesting reading if this ever goes to discovery. I think it is unlikely we will ever get there, especially since she is speaking out in public; if she had a good case, her lawyer would have her stay silent. This is a pressure campaign against Harvard to get them to settle. Since this is about data manipulation, there are only two paths to victory for her. The first is to be exonerated, which is only possible if there is clear evidence that both Data Colada and Harvard performed bad investigation and analysis. The second is to get a settlement, which will likely be sealed and which she will be able to spin as an admission of wrongful termination by Harvard. The former rests entirely on the data, two separate investigations, and the premise that both got it incredibly wrong. The latter rests on pitting public opinion against Harvard to force a settlement that makes this fade away.

  • @timothyleffel3186
    @timothyleffel3186 1 year ago +3

    An inflated effect is certainly desirable from the perspective of an academic looking to publish in high-impact venues; of course, the fact that the results point in the same direction without the outliers does not mean the data were not manipulated. The idea that someone would manipulate data to increase the magnitude and significance level of an existing effect is not at all implausible; people commonly do this by playing around with different statistical techniques without even hiding it (e.g., multiple comparisons). Working scientists tell themselves all kinds of stories to justify their methodological decisions, because the pressure to produce eye-popping results is extremely high. Given the culture and incentives involved in academia, I would expect many more people have manipulated data to make a weak effect seem stronger than have manipulated data to create an effect that didn't exist in the actual data in the first place.
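
    (The multiple-comparisons point is easy to demonstrate with a toy simulation, nothing from the paper: test 20 outcomes when nothing is really there and some will look "significant" anyway.)

    ```python
    # 20 independent tests under a true null: expect roughly one false
    # positive at the 0.05 level.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    hits = sum(
        stats.ttest_ind(rng.normal(0, 1, 50), rng.normal(0, 1, 50)).pvalue < 0.05
        for _ in range(20)
    )
    print(f"{hits} of 20 null comparisons came out 'significant'")
    ```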

  • @willerwin3201
    @willerwin3201 1 year ago +2

    Simple solution: Someone does a replication study and publishes the results. If they're similar, then she was sloppy, but not fraudulent. If not, then her study is either an unlikely fluke or a fraud.
    Incidentally, I find it strange that she made a distinguished career out of studies like this, whether or not they were fraudulent. This isn't my field, and I don't get why this is so significant.

    • @Principles_of_Psychology
      @Principles_of_Psychology 1 year ago

      The study is about ways for promoting honesty. Your answer clearly indicates that you care about honesty. Does that not support the value of this research?

    • @willerwin3201
      @willerwin3201 1 year ago

      @@Principles_of_Psychology Not really, no. The good intentions driving a given piece of research are of significantly less consequence than the quality of the methods, the data, the analysis, and the conclusions. There are many research articles I've read in my own field about topics I care about that aren't sound, useful, or important. I often worry about good intentions driving bad science in my own field of nuclear engineering.
      The study as described sounded crude and reductive to me. Rather than the simple Boolean "put the signature at the bottom or at the top," why not include other variables, considerations, and strategies that might influence people's honesty? Why not see what happens if no honesty pledge is made at all? Why not have the subjects listen to a short story or watch a video made to encourage honesty first? Why not have a combination of different rewards to test the subjects' honesty?
      Like I said, social psychology is not my field. I'm not condemning her or her field; it just seems strange that I see plenty of master's candidates evaluating non-psychological aspects of human performance in more robust ways than this famous professor's study.

  • @timg6125
    @timg6125 1 year ago +2

    Did she pay you to make this video?
    I’m not moved in the slightest. How do you explain the double entry???

  • @forstuffjust7735
    @forstuffjust7735 1 year ago +2

    Kinda bonkers how top academics can get away with this, while I get grilled in my master's course because I used the default setting that smooths the curves of a plot.