Protecting Privacy with MATH (Collab with the Census)

  • Published: 29 Sep 2024

Comments • 815

  • @davidgustavsson4000
    @davidgustavsson4000 5 years ago +335

    I wonder how many minutephysics episodes have been scrapped because they couldn't think of a closing pun.

    • @sagemccarthy4115
      @sagemccarthy4115 3 years ago +5

      I bet 1 decillion

    • @ggsap
      @ggsap 2 years ago +1

      @@sagemccarthy4115 on what?

    • @nerfgunguy4575
      @nerfgunguy4575 1 year ago +1

      @@ggsap I'm pretty sure he meant that he bets 1 decillion videos have been scrapped

  • @Mpire101
    @Mpire101 5 years ago +85

    As someone who actually publishes differential privacy research, I just would like to mention that a privacy budget of 30 is absurdly high; there are cases where a privacy budget of 30 would allow you to reconstruct someone's data with over 95% accuracy. On our research team, we would never consider a privacy budget above 5, and the gold standard was .01.

    • @ninetails6218
      @ninetails6218 4 years ago +1

      So how is this 30 split? Wouldn’t a higher number mean they have more to jitter the results?

    • @vermillion8521
      @vermillion8521 4 years ago

      Nine Tails yea, .01 means it's basically the same as the original data. I think Monica Moniot is just trying to be smart

    • @polokan
      @polokan 4 years ago +14

      @@vermillion8521 Read that again. Privacy budget. That means how much privacy you can sacrifice. 0.1 is less than 30, therefore the first case has more privacy and less accuracy.

    • @chaslesvie2417
      @chaslesvie2417 3 years ago

      ty for the insight !
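
The thread above turns on what a privacy budget (often written ε) means: a smaller budget forces more noise, so more privacy and less accuracy. Here is a minimal Python sketch of that trade-off, assuming the textbook Laplace mechanism applied to a single count query. The mechanism discussed in the video or used by the Census Bureau may well differ, and `laplace_scale`, `noisy_count`, and `mean_abs_error` are hypothetical names used only for illustration.

```python
import random


def laplace_scale(sensitivity: float, epsilon: float) -> float:
    """Noise scale b for the Laplace mechanism: b = sensitivity / epsilon."""
    return sensitivity / epsilon


def noisy_count(true_count: int, epsilon: float, rng: random.Random,
                sensitivity: float = 1.0) -> float:
    """Return the count plus Laplace(0, b) noise (assumed mechanism, see lead-in)."""
    b = laplace_scale(sensitivity, epsilon)
    # The difference of two independent exponentials with mean b is Laplace(0, b).
    noise = rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)
    return true_count + noise


def mean_abs_error(epsilon: float, trials: int = 20000, seed: int = 0) -> float:
    """Average absolute error of the noisy count over many simulated releases."""
    rng = random.Random(seed)
    true_count = 1000
    total = sum(abs(noisy_count(true_count, epsilon, rng) - true_count)
                for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    for eps in (0.01, 5.0, 30.0):
        print(f"epsilon={eps}: scale={laplace_scale(1.0, eps):.4f}, "
              f"mean abs error={mean_abs_error(eps):.4f}")
```

Under these assumptions a budget of 0.01 gives a noise scale of 100 for a single count, while a budget of 30 gives a scale of about 0.033, which is why the two figures in the thread describe such different levels of protection.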

  • @radicalxedward8047
    @radicalxedward8047 5 years ago +9

    Most people would take a shiny box in exchange for their privacy.
    That’s what’s really scary.

    • @musikali1
      @musikali1 5 years ago +4

      You are so right... The census is nothing compared to the most intimate data corporations have on EVERYBODY

  • @chadtarheel
    @chadtarheel 5 years ago +57

    Only ONE allowed: peak or peek. Never the two shall meet.

    • @alexwang982
      @alexwang982 5 years ago +1

      Endor what about a peeking peak?

  • @dux2508
    @dux2508 5 years ago

    In Sweden, if you ask the government, you can get a list of the citizens of Sweden with where they live, what they earn, the Swedish form of social security number etcetera. Much information is public.

  • @viiltelijamurhaaja7225
    @viiltelijamurhaaja7225 2 years ago

    The thing that sucks about this is that some people don't for some reason care about privacy when it's from something faceless like a survey or Google

  • @danielwick7756
    @danielwick7756 5 years ago

    This video is awesome. Complicated statistical concept made much easier to understand. Thank you!

  • @Zw285
    @Zw285 5 years ago +3

    I would love to see a video on the prisoner's dilemma.

    • @cube2fox
      @cube2fox 5 years ago +1

      This is a topic from game theory, but he is a physicist. Then again, he could work with experts as in the current video.

  • @Ennar
    @Ennar 5 years ago

    I do not rely on my government to jitter my data. I'm capable enough of doing it myself. But thank you for the idea of randomizing how much I deviate from the truth.

  • @alexanderarnold4810
    @alexanderarnold4810 5 years ago

    Pictoral/Factoral; some facts are also the same name as answers to Math functions, probable sabotage of English and other languages.

  • @xeladas
    @xeladas 5 years ago

    This is all very interesting; however, a couple of questions come to mind: what is the point? What damage does this loss of privacy cause?

  • @ilayws4448
    @ilayws4448 5 years ago +1972

    hmm... are you really 31 or did you roll a die and add it to your real age to confuse us...?

    • @adama7752
      @adama7752 5 years ago +119

      2 d20s baby!

    • @Apersonl0l
      @Apersonl0l 5 years ago +56

      Adam A wait he can be -9 yo?

    • @carlwheeser140
      @carlwheeser140 5 years ago +36

      @@Apersonl0l No, no, of course not. You multiply them!

    • @nuumi7813
      @nuumi7813 5 years ago +10

      Hmm... so we are left with a total of 6 possible answers. Great job Sherlock!

    • @nuumi7813
      @nuumi7813 5 years ago +9

      Actually the math was wrong... it's around 21 possible answers... yikes

  • @burtosis
    @burtosis 5 years ago +924

    How I protect my privacy with math.
    Step 1 I start doing math.
    Step 2 Person approaches me and sees I'm doing math.
    Step 3 They slowly back away
    Step 4 Privacy!

    • @Abdega
      @Abdega 5 years ago +35

      Then one person is like “Oh cool! Privacy jitter calculations!”

    • @opsoc777
      @opsoc777 5 years ago +7

      So THIS is why I never get laid...

    • @illiiilli24601
      @illiiilli24601 5 years ago +7

      Instead of running away, you're approaching me?

    • @DefektoPrime
      @DefektoPrime 5 years ago +13

      I wanted to read your comment, but i saw the word "math", and then slowly backed away

    • @MirceaKitsune
      @MirceaKitsune 5 years ago +1

      Easier way: When the census takes place, don't answer your door or go away from home on a trip. Though the US loves a bit of tyranny of its own, they can't fine thousands of people for not being at home at a certain moment in time. Problem solved!

  • @qqq1701
    @qqq1701 5 years ago +865

    New census data: There may or may not be people in the USA.

    • @llllllllllllllllllllllllIIIIl1
      @llllllllllllllllllllllllIIIIl1 5 years ago +21

      The probability is 50-50 right ? Either yes there are people, and no there isn't 🤔🤣🤣

    • @R2Cv1
      @R2Cv1 5 years ago +18

      @@llllllllllllllllllllllllIIIIl1 Not necessarily, depends on how you acquire people. If you randomly blink people in and out of existence, then the probability of "no" could get very low if possible cases include 10 people, or 300 people, or 129 people, or 2000000000 people... how likely would it be that you'd get to zero at some point?

    • @thinboxdictator6720
      @thinboxdictator6720 5 years ago +7

      @@R2Cv1 1/n

    • @R2Cv1
      @R2Cv1 5 years ago +3

      @@thinboxdictator6720 (That was rhetorical)

    • @thebreakfast8055
      @thebreakfast8055 5 years ago +14

      Schrödinger's population

  • @DelphinidaeZeta
    @DelphinidaeZeta 5 years ago +394

    Well now, it's neither a minute nor physics.

    • @chan625
      @chan625 5 years ago +26

      Been like that for a long time but I couldn't care less about title mismatch.. I am still getting my head around some of his vids when it was still minute (or two!) and physics!

  • @hahanamegobrrr6667
    @hahanamegobrrr6667 5 years ago +883

    beginning of video : citizenships
    end of video : algebra and probability

    • @conficturaincarnatus1034
      @conficturaincarnatus1034 5 years ago +15

      and a nifty pun that almost ruined the sense of wonder
      almost.

    • @tanmayuniyal6603
      @tanmayuniyal6603 5 years ago +5

      Read the name of the channel, might give you an idea

    • @holypython4418
      @holypython4418 5 years ago +4

      It's not that complicated

    • @veggiet2009
      @veggiet2009 5 years ago +14

      Highschool Students Everywhere: Algebra is stupid, when will I use this in life?
      minutephysics: Well...

    • @onelazynoob15
      @onelazynoob15 5 years ago +7

      Actually, since we're looking for the slope at a peak it's more calculus. Derivatives are ez pz tho.

  • @NoahPillow
    @NoahPillow 5 years ago +1704

    So they can jitter my census data to ensure privacy, but when I jitter my tax returns, all I'm ensuring is jail time? Doesn't seem fair

    • @diedewip7912
      @diedewip7912 5 years ago +162

      Unless you add positive amounts to it

    • @papa_pt
      @papa_pt 5 years ago +53

      also remember when Equifax didn't do their one job

    • @niter43
      @niter43 5 years ago +38

      That's a meaningless comparison.
      Government is supposed to know accurate information about its citizens that's needed to operate properly. And so it does. Data jittering is done for purposes of keeping individual privacy when publishing data to the *public* (keeping privacy as in making sure that a third party can't restore individual data from published data, not as in keeping secrets from government).
      Pretty sure they store correct census data of each individual for internal purposes as well.

    • @MRDAVIDJHEMMINGER
      @MRDAVIDJHEMMINGER 5 years ago +17

      @@niter43 Last time I checked I am part of the government too!! and I really hope this government "of the people, by the people, for the people, shall not perish from the earth"!!! But I also don't want us to be vulnerable to Nazi behavior. What if census form asked if you wanted your data jittered? (like a micro point system [+/- 1 PersonPoint]) If there was a way to do this then the more people who opt out of the jitter the more transparent our data can become, and, it can become this way democratically. We could some day brag that we are so proud and brave that we do not fear others being able to pinpoint us based on our demographics because, for example, those around us our neighbors and fellow citizens would stand up for our human dignity in spite of the differences between us.

    • @goldengryphon
      @goldengryphon 5 years ago +13

      @@MRDAVIDJHEMMINGER Um ... That sounds amazingly dangerous. Granted, I was one who was all for free access to information years ago - had an open Wi-Fi, used Linux, did all the things - then was accused of storing illegal stuff on my computer. Not a fun experience and one that pushed me to re-examine how well I knew my neighbors and how good an idea having Open Source stuff and sharing information really was.
      End result, if I understood the video correctly, your vote to *not* 'jitter' your information can have the effect of allowing my information to be more easily discovered.
      (As this whole video reminded me of those word problem set ups you do when trying to find out who of three people with red hair rode a bicycle and ate a pie in a six person neighborhood. I'm not good with maths, so I'm trusting y'all to keep me honest.) In the course of using your un-jittered information, my information is more easy to extrapolate. Basically, your freedom to share your information is also sharing my information, that I may not want out there.
      I don't mind standing for human dignity. Nor do I really mind being part of certain kinds of scientific study. I do, however, mind being "voluntold" for a study I don't want to participate in just because someone else thinks it's a great idea. The freedom to *not* be included is also important and should be protected.
      I don't feel a need to have to brag to my great-grandchildren about how "brave and proud" I was to allow social scientists access to my personal data. I can feel brave and proud through having actual accomplishments like doing a job well, or writing a book, or creating something artistic. Being part of a movement to end privacy and personal information from being shared with whomever can cough up the money to purchase access to census data is not on my List of Things To Do.
      But thanks, anyway.

  • @MichaelSteeves
    @MichaelSteeves 5 years ago +844

    So basically Sudoku with census data and supercomputers.

    • @Soken50
      @Soken50 5 years ago +12

      more like Einstein's riddle or zebra puzzle, but on a country-wide scale :)

    • @JorgetePanete
      @JorgetePanete 5 years ago +2

      @@Ranakastrasz It's*

    • @JorgetePanete
      @JorgetePanete 5 years ago +2

      @@Ranakastrasz puzzle*

    • @opsoc777
      @opsoc777 5 years ago +2

      @@Ranakastrasz or*

    • @kmacdough
      @kmacdough 5 years ago

      @@opsoc777 for*

  • @EDoyl
    @EDoyl 5 years ago +179

    "I looked forward in time, I saw 14,000,605 futures."
    "How many were plausible?"
    "One."

    • @mattwinward3168
      @mattwinward3168 4 years ago +20

      Eoin Doyle Thanos should have jittered his time lines better.

  • @kevinfontanari
    @kevinfontanari 5 years ago +146

    "And how do you protect your privacy?"

    • @anandsuralkar2947
      @anandsuralkar2947 4 years ago

      Nord vpn

    • @MelvinGundlach
      @MelvinGundlach 4 years ago

      Kevin Fontanari Only that those don’t really protect privacy but just move the problem.

  • @phsopher
    @phsopher 5 years ago +146

    This is a trailer for Henry's spinoff channel '12minutestatistics'.

  • @tomsmith6878
    @tomsmith6878 5 years ago +416

    congrats on getting married

    • @Apersonl0l
      @Apersonl0l 5 years ago +3

      Thomas Smith gitters

    • @user-vn7ce5ig1z
      @user-vn7ce5ig1z 5 years ago +3

      *jitter

    • @patu8010
      @patu8010 5 years ago +94

      Because of the jittering, we can only know that he has 1±1 spouses

    • @votalis4089
      @votalis4089 5 years ago +5

      @@cameronbigley7483 1:02

    • @gabor6259
      @gabor6259 5 years ago +5

      Congrats on liking ice cream.

  • @zoatheperson3012
    @zoatheperson3012 5 years ago +40

    When you get sponsored by a literal department of the US government...damn.

    • @anandsuralkar2947
      @anandsuralkar2947 4 years ago +2

      Lol and not by nord vpn

    • @shawniscoolerthanyou
      @shawniscoolerthanyou 4 years ago

      I wish the government sponsored more quality shit that I like. Not McDs hamberder dinners for athletes and F-35s.

  • @jackiecs8190
    @jackiecs8190 5 years ago +4

    I noticed that you modeled gender as "M," "F," and "T." This is a poor model. Transgender people are not a separate gender; many of us are male or female. I think you were thinking about nonbinary people, who are a subset of transgender people. In the situation where you have to use one letter, the proper choice for nonbinary people is "X." This is common on official IDs in the US and other countries. It is inappropriate to lump all transgender people in with nonbinary folks because it takes away our clearly identified binary gender.

    • @sUmEgIaMbRuS
      @sUmEgIaMbRuS 4 years ago

      The notion of "gender" is a non-solution to a non-problem.

  • @Rabcup
    @Rabcup 5 years ago +81

    So perhaps the DMV shouldn’t be selling people’s data to PI’s...

    • @klobiforpresident2254
      @klobiforpresident2254 5 years ago +1

      What is a PI?

    • @Abdega
      @Abdega 5 years ago +2

      Klobi for President
      Private Investigator I’m guessing

    • @JorgetePanete
      @JorgetePanete 5 years ago +2

      PIs*

    • @feronanthus9756
      @feronanthus9756 5 years ago +2

      You should take that up with your state government.

    • @R2Cv1
      @R2Cv1 5 years ago +3

      @@JorgetePanete (No, it's PI's)
      (Case is for abbreviations, numbers, individual letters, etc)

  • @TheScienceBiome
    @TheScienceBiome 5 years ago +57

    Certainly an odd sponsor, but amazing video nonetheless!

    • @wesleyrm76
      @wesleyrm76 5 years ago +5

      I would be happy if every government agency did things when they're doing public outreach. Plenty of science agencies, especially NASA, have been doing this for years.

  • @axiostechno
    @axiostechno 5 years ago +276

    This would have been a perfect video to be sponsored by nordvpn or dashlane

    • @QlueDuPlessis
      @QlueDuPlessis 5 years ago +8

      Kaspersky advertised on it.
      Not sure how they engineered that, but I'm guessing Google was complicit.
      And given that Google already knows way more about each of us than all those census records...

    • @albingrahn5576
      @albingrahn5576 5 years ago +8

      i can hear the segue in my head lol

    • @ryuuji159
      @ryuuji159 5 years ago

      stop

    • @Flamingbob25
      @Flamingbob25 5 years ago +3

      @@QlueDuPlessis Well it was probably by the tags/title; I believe that's how ads are chosen. That's why you will sometimes get ads for something on videos hating on that product.

    • @Markle2k
      @Markle2k 5 years ago +3

      @@QlueDuPlessis Kaspersky advertised on _your_ view. That just tells us what Google thinks about you.

  • @Ethan_N_A
    @Ethan_N_A 5 years ago +47

    MATH:
    Make
    America
    Think
    Harder
    #YangGang

    • @sonetagu1337
      @sonetagu1337 4 years ago +2

      Ironic your profile pic is american flag.

    • @ninetails6218
      @ninetails6218 4 years ago

      This is hilarious considering he just chickened out of the running

  • @besmart
    @besmart 5 years ago +11

    The census is a very important tool for keeping our government fair and functioning, and even though it’s been politicized lately, I’m really glad people like Henry are talking about it. Good policies start with good data.

    • @vkmishra364
      @vkmishra364 4 years ago

      But aren't we (humans, or any other intelligent life) violating the Universe's privacy to understand it better and also create new stuff that is both good and bad?

    • @Meekseek
      @Meekseek 4 years ago

      They already have all the information they need.

  • @ujjwaLoL
    @ujjwaLoL 5 years ago +107

    So I can tell that I am 5 years less or more than 13.
    So I am 18 technically and 8 also so I can play 18+ games

    • @ujjwaLoL
      @ujjwaLoL 5 years ago +2

      @foolish fellow OK bro

    • @chatboss000
      @chatboss000 5 years ago +25

      Actually, no. Jittering is for privacy reasons, not evidence-based ones. If your age is 13 +/- 5 years, there's no solid evidence that you're old enough to buy a game and you probably won't be able to buy it.
      You're sacrificing your ability to buy anything at or below your age to protect your privacy - but that's your call to make :)

    • @klobiforpresident2254
      @klobiforpresident2254 5 years ago +5

      @@chatboss000
      Even worse, he could claim to be 22±5 and he couldn't buy those games. Sorry privacy conscious 26 year old OP.

    • @anandsuralkar2947
      @anandsuralkar2947 4 years ago

      Hmmm that's not right duh

  • @leovin00
    @leovin00 5 years ago +15

    “We’ve implemented complex mathematical algorithms to protect our data”
    Russian hacker: im about to end this mans whole career

  • @TheADHDNerd
    @TheADHDNerd 5 years ago +84

    First known census: 1086.
    2019: "Oooo privacy!"

    • @CasshernSinz1613
      @CasshernSinz1613 5 years ago +1

      @@davidbechart7674 true

    • @klobiforpresident2254
      @klobiforpresident2254 5 years ago +7

      @@davidbechart7674
      Funnily enough we know a census happened there at the time (more than one, actually) but we also know it cannot be the one the Bible describes, unless the Bible describes it incorrectly.

    • @lonestarr1490
      @lonestarr1490 5 years ago +9

      Well, there haven't been too many supercomputers around at 1086, have they?

    • @jorisd6584
      @jorisd6584 5 years ago +7

      @@klobiforpresident2254 Well maybe the Bible scrambles the data to ensure privacy /s

  • @wissamelkadamani9750
    @wissamelkadamani9750 5 years ago +54

    Me: Why the hell does anyone need privacy
    Minutephysics: *ICECREAM*

  • @artified3498
    @artified3498 5 years ago +62

    Moral, give him anything he gives u back MATH...EQ....and...ya...CATS......

  • @b-init1221
    @b-init1221 5 years ago +75

    See math is everywhere, you can't run from it...
    Even after being the President

  • @kevinmorrill8347
    @kevinmorrill8347 4 years ago +6

    I worked for the US Census Bureau in 2005 or so; it was an interesting job, and we all took privacy and confidentiality very seriously. I hope all the men and women working on the 2020 census get to see this video.

  • @samposyreeni
    @samposyreeni 1 year ago +5

    It's truly rare to see a public service announcement involving math. Nicely done!

  • @rea8585
    @rea8585 5 years ago +39

    Privacy, what is that? 😀

    • @holypython4418
      @holypython4418 5 years ago +10

      The opposite of china

    • @unflexian
      @unflexian 5 years ago +2

      Dead.

    • @doemaeries
      @doemaeries 5 years ago +6

      @@holypython4418 and google

    • @sodiboo
      @sodiboo 5 years ago +2

      Oh you wouldn’t know, your pfp looks like its taken from your facebook

    • @JorgetePanete
      @JorgetePanete 5 years ago

      @@sodiboo it's*

  • @azmyadzkiansyah279
    @azmyadzkiansyah279 5 years ago +7

    9:21 Henry, that's not how seesaws work. If one side goes up the other goes down.

    • @burtosis
      @burtosis 5 years ago +1

      That's only Euclidean seesaws.

  • @Nakimi190
    @Nakimi190 5 years ago +12

    I think the cats at minutephysics are evolving at an alarming rate... THE AGE OF CATS IS UPON US!

  • @pingz2454
    @pingz2454 5 years ago +7

    MATH? YOU MEAN 'MAKE AMERICA THINK HARDER'?

    • @jamesli629
      @jamesli629 5 years ago

      MASA - Make America Smart Again

  • @freesk8
    @freesk8 5 years ago +3

    The census is authorized in the Constitution to count adults. But it is not authorized to collect information about their sex, race, income, sexual orientation, etc. So I always leave these blank on the census form. Leaving these out globally would increase the privacy of the census, without reducing its accuracy.

    • @kevburger
      @kevburger 5 years ago +1

      Exactly. The government is allowed to count people, nothing more. Every other demographic statistic is none of their business.

    • @saadisave
      @saadisave 9 months ago

      @@kevburger you cannot govern without knowing things about those whom you govern

  • @deep.space.12
    @deep.space.12 5 years ago +8

    But... can't you extrapolate private information from previous non-rigorously-scrambled census data?

    • @harpiesd96
      @harpiesd96 5 years ago

      yes, but we gotta start sometime right?

    • @deep.space.12
      @deep.space.12 5 years ago +3

      @@harpiesd96 Right. I just wonder if the "averaging the noise" at 7:38 applies to scrambling within the same census, or across time. Similarly, if the "privacy loss budget" applies across time. Unlike your password, which you can change, if census data has been "leaked" anytime in the past, the privacy will be forever lost unless that person dies. Otherwise "15 yrs old male black arab" will be "20 yrs old male black arab" 5 years later.

    • @TheTrueRandomness
      @TheTrueRandomness 5 years ago +1

      Differential privacy actually addresses this issue quite neatly: All the 'plausibility' stuff is always formulated in terms of changes between prior beliefs (what an attacker already knew before we released our answer) and posterior belief (what he thinks is plausible now, knowing our answer). So while the 2020 census obviously cannot undo any privacy loss from previous years, it will guarantee that even someone who did attacks on previous years will learn (very close to) no additional information about anybody from the 2020 census.
      Sure, it's not perfect, but you can't undo information leakage, no matter how much jittering you add to the new census ;) At least this formal thing guarantees that it won't help make attacking the old stuff easier and that you will learn basically nothing additional about the data.
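
The prior-versus-posterior framing in the reply above can be written out as a worked inequality. What follows is the standard textbook definition of pure ε-differential privacy, offered as a sketch; the exact variant adopted for the 2020 census may differ. The symbols M (the randomized release mechanism), D and D' (two datasets differing in one person's record), S (a set of outputs), and o (an observed output) are notation introduced here for illustration.

```latex
% Definition: M is \varepsilon-differentially private if, for all neighbouring
% datasets D, D' and every set of outputs S,
\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[M(D') \in S].

% Consequence (Bayes' rule): after observing output o, an attacker's odds
% between the two hypotheses D and D' move by at most a factor e^{\varepsilon}
% relative to whatever the attacker believed beforehand.
\frac{\Pr[D \mid o]}{\Pr[D' \mid o]}
  \;\le\; e^{\varepsilon} \cdot \frac{\Pr[D]}{\Pr[D']}.
```

The prior odds on the right-hand side can already include anything learned from earlier releases, which is why the guarantee limits additional leakage without undoing past leakage.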

  • @ReimuandCirno
    @ReimuandCirno 4 years ago +1

    Is jittering the only tool census takers have at their disposal for protecting privacy? What other (mathematical) methods have been explored?

  • @gunnargu
    @gunnargu 5 years ago +2

    How to do census in Iceland...
    select count(*) from þjóðskrá;

  • @harrygao7632
    @harrygao7632 3 years ago +1

    2:44
    Use "" to skip by one frame - you can find "hiMOmIluvYou", "PASSWORDPASSWORD", "password1234", and many others!

  • @DaHaiZhu
    @DaHaiZhu 5 years ago +1

    Perhaps it is time to re-evaluate the types of data collected during a census to ensure it is actually valuable to the strict and specific goal of the census itself - and nothing more. The collection of more data than is absolutely necessary for a national census should be stopped.

  • @maxhaibara8828
    @maxhaibara8828 5 years ago +17

    hey guys, my age is 50 +/- 50 years old, and I'm either Male or Female.

    • @somedragontoslay2579
      @somedragontoslay2579 5 years ago +11

      Through my incredible computational power of passing through your YouTube profile, I've narrowed the scope to being a male in your 20's.
      Mwahaha!! Your privacy has been reduced.

    • @jettiz3703
      @jettiz3703 5 years ago +3

      Max Haibara you're a 25-year-old male.

  • @yaitz3313
    @yaitz3313 6 months ago +1

    Assuming there was no jitter, how much computing power would it take to get any remotely useful privacy-violating information out of the Census?

  • @drac124
    @drac124 5 years ago +1

    I already lie about my age or date of birth in things that don't matter, such as hair salon registration, forum registration, even Facebook. So if you lie to the US Census (in a way that you think won't matter), like saying you are 31 instead of 28, and then the US Census changes that a little more, you get a bunch of wrong data all over.

  • @Mark-qp3pp
    @Mark-qp3pp 5 years ago +1

    But you've just made the fact that there's a 31 year old white married man living in your area not private by publishing that information on youtube.

  • @shrey1265
    @shrey1265 5 years ago +17

    *I LOVE HOW HE CAN ADD SCHRODINGER'S CAT TO EVERYTHING*

  • @Ratsos12
    @Ratsos12 4 years ago +1

    I fully disagree that the census in 2020 is at all private. It’s mailed out to be Hand Written. It’s mailed back from an address, likely with Fingerprints on it. It’s received by a Human... no, they don’t need to have access to all that metadata. Unless they want to pay me money and sign a contract in person and get it notarized, that data won’t be given.

  • @horselover19
    @horselover19 5 years ago +2

    Great video! Thanks
    Question: Do you know if these models take into account the noise inherently present in any data collection, especially on this scale?
    I.e., assuming even zero perturbation of the data, the knowledge you get from it is still a proxy of the real information (due to human errors, intentional misinformation, etc.), so taking that into account might give you some leeway in your "privacy budget". Maybe this can be modeled as an increase in budget without harming privacy?
    Just a thought :)

  • @cosmicreciever
    @cosmicreciever 5 years ago +9

    Thought this was going to be yet another video on encryption and was pleasantly surprised. Nice work!

  • @dasbootyliciousness271
    @dasbootyliciousness271 5 years ago +3

    This has literally been the best educational video I have seen this year on YouTube. Why? Because it gives a solution for how companies that collect huge amounts of data should act! Thank you.

    • @kaitlyn__L
      @kaitlyn__L 5 years ago +1

      yeah. when selling this stuff they should only sell jittered data, and only averages (even if the source data could be partially reverse engineered). right now data sold to and between ad networks is even worse than un-jittered correlatable averages, it's often just a selection of stats that omits your name, completely bare for the companies to correlate themselves without even having to try and unpack anything first. they're all technically not selling identifiable data if they say "she's a girl and likes cats" but with enough overlap between the information sharing, you only need one ad network picking up your location from a web search and they can eventually tie that together so they know all your preferences, your age, where you live.. just not technically your name. of course that last step, getting the name, is one of the easiest

  • @rasho2532
    @rasho2532 4 years ago +1

    I don't see why I would care that someone could reconstruct my data if he doesn't know how to attribute this reconstructed data.
    Sure, you might be able to know that there exists a 32-year-old heterosexual divorced Mexican woman with 3 children who loves ice cream, but how would you even find that person to harm her?

  • @LeBonkJordan
    @LeBonkJordan 5 years ago +1

    In my opinion, the concern with privacy isn't that everyone can know an individual's information; it's that single large groups can know disproportionately large amounts of information about large quantities of people. I believe if one group can access not-necessarily-sensitive data like my age and sex, then everyone should be able to as well.

  • @askemervigbahnson333
    @askemervigbahnson333 5 years ago +1

    Why is this an issue? I mean, why is it a problem if somebody figures out my age and, let's say, whether I have a girlfriend? Why should such irrelevant information be kept secret?

  • @Deus_Auto
    @Deus_Auto 4 years ago +1

    Today, I learned that there were 11 people in the U.S. in 1990, 13 in 2000, and 14 in 2010. /s

  • @herp_derpingson
    @herp_derpingson 5 years ago +6

    I was thinking more along the lines of entropy, but ok.

  • @LozioLudo
    @LozioLudo 5 years ago +2

    1:34 transgender

  • @udayy9897
    @udayy9897 5 years ago +7

    Hands down, one of the best creators on the internet!

  • @videogyar2
    @videogyar2 5 years ago +1

    Why does privacy matter so much in this regard? It's not like there's any sensitive information in it.

  • @uncertainscientist
    @uncertainscientist 5 years ago +2

    And then you have credit card companies purposefully not doing any of this so they can sell data that's ostensibly anonymized.

  • @sabriath
    @sabriath 5 years ago +3

    Sounds to me that Bayes' theorem against a random chance of false data can come in handy. Similar to asking people if they do something that is specifically embarrassing without knowing whether they actually do that specific thing... by having them roll a die or flip a coin to answer the question (a flip of heads means "answer truthfully" and a flip of tails means "always answer yes", which leads far more people to answer honestly without the test taker knowing if it's true).
    So if you give everyone say a 10% chance of false data at a 50% range, then you end up with only a 5% draw down on the overall survey without revealing much information individually.
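
The coin-flip survey described in the comment above is usually called randomized response, and it can be sketched in a few lines of Python. This is an illustrative simulation only, assuming a fair coin and the exact rule from the comment (heads: answer truthfully, tails: always answer yes); the function names are hypothetical and none of this is taken from the video.

```python
import random
from typing import Sequence


def randomized_response(truth: bool, rng: random.Random) -> bool:
    """Heads: report the true answer. Tails: always report yes."""
    heads = rng.random() < 0.5
    return truth if heads else True


def estimate_true_rate(answers: Sequence[bool]) -> float:
    """Recover the population rate p from noisy answers.

    With a fair coin, P(yes) = 0.5 * p + 0.5, so p = 2 * P(yes) - 1.
    """
    yes_rate = sum(answers) / len(answers)
    return 2.0 * yes_rate - 1.0


def simulate(true_rate: float, n: int, seed: int = 0) -> float:
    """Survey n simulated people and return the estimated true rate."""
    rng = random.Random(seed)
    answers = [randomized_response(rng.random() < true_rate, rng)
               for _ in range(n)]
    return estimate_true_rate(answers)


if __name__ == "__main__":
    print(simulate(true_rate=0.3, n=200_000))
```

Under this rule a single "yes" reveals little about the person who gave it, since half of all respondents say yes regardless, while the aggregate rate can still be estimated. A "no" does reveal a truthful no, so practical variants often randomize both outcomes.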

  • @MrWvid
    @MrWvid 5 years ago +23

    Minutephysics posts a video which is about math, and which lasts 12 minutes.
    Me: Cool, that is what I subscribed for (no sarcasm)

  • @cavemaneca
    @cavemaneca 5 years ago +1

    Oh hey, it's one of my favorite channels, TwelveMinuteMath

  • @binaryglitch64
    @binaryglitch64 5 years ago +2

    Nice explanation of the importance of considering the repercussions of de-anonymization algorithms.

  • @hitwalkhook3831
    @hitwalkhook3831 5 years ago +1

    Dear minutephysics. I have a question.
    I am currently a science student in chemistry and I saw a video about quantum teleportation recently and i want to ask you something.
    Is it possible to use quantum computers with every electron being a kind of data(like 0 and 1) to make accurate measurement of scanning something very far? If so, then could we use that data to determine the chemical compound of the said object? If so, can we also use quantum computers and high sources of energy to accurately control light with controlled heat as well very far to break chemical bonds light years away? I am asking this because if that is right, then could we possibly make chemical reactions occur very far in space?(And thinking about it more, is it possible to teleport someone like that if it is accurately controlled?)
    By accuracy i mean with a lot of like near 100% but not 100% accuracy.
    Bonus question:
    Will it be possible in the future that instead of electrons determining the value of a data as of 0 and 1, we would use their spin quantum number to determine a 0 and 1 then for a secondary array... I will just state this below:
    [0,0,0,0]
    [it's spinning right side, it is moving on the 0 magnetic quantum number, the shape of the orbit is like a sphere, it is on the 1st shell.]
    1s1
    [1,1,1,1]
    [It is spinning on the left side, it is moving on the 1st magnetic quantum number, the shape of orbit is like a dumbbell, it is on the 2nd shell]
    1s2, 2s2 2p3
    [1,0,0,0]
    [It is spinning right side, it is moving on the 0 magnetic quantum number, the shape of orbit is like a sphere, it is on the 1st shell]
    1s2
    And so on, or is it already like that? Or would it have some measurement problems?
    Thank you for your attention.
    Sidenote:
    Currently it is 0:00 am here and I am half dead, so sorry if I messed up the quantum numbers in any way.

  • @DeadtomGCthe2nd
    @DeadtomGCthe2nd 5 years ago +2

    Perceiving Physics Persons Percolated Preparations Postulating Possible Probabilistic Privacy Problems Pacified Personal Passion Pertaining Principled Privacy Process.

  • @tec4303
    @tec4303 4 years ago +1

    Well the census bureau protected your privacy but many other government agencies still violate it.

  • @samtibbitts
    @samtibbitts 5 years ago +1

    1:41 but the datasets *are* kept secret for 72 years. Without the datasets the averages and totals don’t reveal private information.

  • @lodevijk
    @lodevijk 5 years ago +1

    Can't an attacker extrapolate the new census data from the "protected" statistics combined with past census data?

  • @Marc-dg2en
    @Marc-dg2en 5 years ago +1

    But isn't my privacy still somewhat protected? The way I understood it, the algorithm only knows how many of each kind of person exist, like 2 female ice cream lovers, etc., and not whose name is actually behind that statistic. Wouldn't information like that be useless?

    • @ConManAU
      @ConManAU 5 years ago

      As long as there's enough information to work out where you are in the data, it's still a privacy issue. For example, if everyone knows that you're the only female ice cream hater in town, then once they've reconstructed the data from the video they know your age exactly.
      While the example here sounds fairly innocuous, it can be hard to know the value of data like that, especially if the person has access to other data they can link - what if your insurer wants to know how you feel about ice cream, so they can raise your premiums if you're at risk of diabetes? What if there's another dataset out there that links ice cream preference to income?

  • @bidaubadeadieu
    @bidaubadeadieu 5 years ago +1

    1:40 holy cow i wish I lived in a world where roughly a fourth of the population is trans or nonbinary, that would be a lovely gender distribution. Much love from your trans audience

  • @AKT1610
    @AKT1610 5 years ago +4

    "Don't get into those maths. Maths has not helped Einstein discover gravity"

    • @shichengrao5314
      @shichengrao5314 4 years ago

      To whoever said that(maybe not you): So? Math helped with about ten quadrillion things. I’m also pretty sure math helped discover gravity, so joke’s on you

  • @Brocseespec
    @Brocseespec 3 years ago +1

    2:45 "hiMOmIluvYou"
    XD i just died laughing at that

  • @Sivah_Akash
    @Sivah_Akash 5 years ago +1

    What is the problem if anyone comes to know our age, gender and other personal data?

  • @thisdood4103
    @thisdood4103 5 years ago +1

    Damn how were you able to collab with the US Census Bureau lmao

  • @buckyball2003
    @buckyball2003 5 years ago

    Yay! My favourite channel, 12minutemaths has uploaded a new video!
    (I do actually love the video, I’m not hating, I just think it’s funny.)

  • @uplink-on-yt
    @uplink-on-yt 5 years ago +1

    Question 1: How much wealth do you have?
    Question 2: Have you ever been convicted for robbery or burglary?
    Jitter that in a bad neighborhood...

    • @ww6372
      @ww6372 4 years ago

      Mine asked what type of white I am...
      I feel like the questions are tailored to the neighborhood

  • @raglanheuser1162
    @raglanheuser1162 5 years ago +1

    I get the examples but honestly don't see how you can reconstruct anything from census data about millions of people

    • @ConManAU
      @ConManAU 5 years ago

      It all has to do with the amount of detail being released - if you've got counts of people by county, by sex, by age, by occupation, by marital status, by ethnicity, and so forth, then eventually you'll reach a case where you can figure out something about someone in the data because some people are unusual enough to stand out. Especially if you're one of the people who gets picked to answer extra questions.
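
      A toy illustration of that point, with entirely made-up records and field names: as more attributes are cross-tabulated, more people end up alone in their cell, and anyone alone in a cell is identifiable to someone who already knows those attributes about them.

```python
from collections import Counter

# Hypothetical toy records: (county, sex, age_band, occupation)
people = [
    ("A", "F", "30-39", "teacher"),
    ("A", "F", "30-39", "nurse"),
    ("A", "M", "30-39", "teacher"),
    ("A", "M", "40-49", "teacher"),
    ("B", "F", "30-39", "teacher"),
    ("B", "F", "30-39", "teacher"),
    ("B", "M", "40-49", "farmer"),
    ("B", "M", "40-49", "farmer"),
]

def unique_count(records, n_fields):
    """Count people who are the only one with their combination
    of the first n_fields attributes."""
    cells = Counter(r[:n_fields] for r in records)
    return sum(1 for size in cells.values() if size == 1)

# Number of uniquely identifiable people as attributes are added.
uniques = [unique_count(people, k) for k in range(1, 5)]
print(uniques)
```

      In this toy table nobody stands out by county or by county and sex, but adding age band and then occupation isolates more and more individuals.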

  • @alliesakat
    @alliesakat 5 years ago +1

    "Prominence of peaks on the possibility plot"

  • @Gierwaz
    @Gierwaz 5 years ago +1

    To match data to a lot of people, you also need a lot of variables. I don't know how it works in the US, but with several million citizens you need several million different parameters to be able to solve the matching equations. Moreover, people aren't always honest in their answers. Also, knowing that a 24-year-old unmarried guy who likes ice cream lives in this city doesn't get you close to who exactly he is. Is it really PRACTICALLY possible that someone would use those statistics to "decode" demographics?

    • @cube2fox
      @cube2fox 5 years ago +1

      Maybe the people at the census bureau are really proud of their mathematical achievement, so they overstate its usefulness a bit.

    • @Gierwaz
      @Gierwaz 5 years ago

      @@cube2fox that may be the case ;)

  • @blazedinfernape886
    @blazedinfernape886 5 years ago +5

    Politicians: show inaccurate data to gain votes.
    People: how can you lie to us?
    Politicians: but... but I protected your privacy.

  • @LaunchPadAstronomy
    @LaunchPadAstronomy 5 years ago

    Brilliant demonstration and discussion of data privacy. Thanks, and great job!

  • @St0RM33
    @St0RM33 5 years ago +3

    What if each participant writes a random value which is then used to introduce noise into each census response (their own or a different participant's)? Wouldn't this introduce enough random noise that the final results are both accurate and impossible to reverse with 100% accuracy? Kinda like one-way encryption.

    • @ConManAU
      @ConManAU 5 years ago

      It's a valid option, although in practice you'd probably get the Census Bureau to do that to the data before they calculate what they publish. It's got some benefits over adding noise to the published figures, but there are also drawbacks, mostly around that problem of keeping the results accurate enough to be useful.

    • @Jo_Wick
      @Jo_Wick 5 years ago

      Short answer: no. The attack on privacy relies mostly on the published figures, so when attackers search for the plausible inputs, they recover each person's fake, random data along with the real data from everyone else. Privacy would still be violated for most people; the noise has to be added to the published figures, which changes everyone's most probable data so that what the attacker reconstructs is incorrect.
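
      A minimal sketch of adding noise to a published figure, assuming a Laplace mechanism (a standard differential-privacy technique; the thread does not say which mechanism the Census Bureau actually uses). The count of 120, the epsilon values, and the function names are hypothetical. The noise scale is sensitivity / epsilon, so a smaller privacy budget means a noisier published count.

```python
import random

def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    """Return true_count plus Laplace noise with scale sensitivity / epsilon."""
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with that scale.
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return true_count + noise

def mean_abs_error(samples, true_count):
    return sum(abs(s - true_count) for s in samples) / len(samples)

rng = random.Random(42)
true_count = 120  # assumed published figure for the demo

small_budget = [noisy_count(true_count, 0.1, rng) for _ in range(5000)]
large_budget = [noisy_count(true_count, 5.0, rng) for _ in range(5000)]

err_small = mean_abs_error(small_budget, true_count)
err_large = mean_abs_error(large_budget, true_count)
print(f"epsilon=0.1: typical error {err_small:.2f}")
print(f"epsilon=5.0: typical error {err_large:.2f}")
```

      The trade-off shows up directly: the small budget hides individuals better but makes the published number much less accurate.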

  • @FlotownMastering
    @FlotownMastering 5 years ago +1

    What you're describing as "jitter" sounds a lot like what we call "dither" in audio engineering. Similar concept? I guess technically dither is used to remove errors, whereas this is intentionally introducing them in a useful way...

  • @Valendr0s
    @Valendr0s 5 years ago +1

    But what's the point?
    You can go to nearly any county assessor's website and find the owner of most any house or piece of land in the United States. You can see if it's a 'homestead' (aka the owner lives there). So immediately the name of nearly every homeowner in the US and their home address, what they paid for the property, when they bought it, and who loaned them the money is all publicly available to anybody with a web crawler. You can easily fish out more information about each of those people from publicly available records.
    You can go to the FEC's website and find every donation from everybody... You cross-reference those two data sets, and you now have a general layout of the political affiliation of each house.
    You can spend some cash and get very detailed information from Google and Facebook about everybody. From likes & dislikes to ice cream preferences.
    And that data is far more private than "a white 31-year-old male lives in this general vicinity". I just don't see that the data provided on the Census is really all that private. And I'd suggest publishing the raw data would be worth more than any privacy loss because of it. Particularly since 1) I'm paying for it already, and 2) this is politically useful information that can be used to disenfranchise people.

  • @joshmckinney3254
    @joshmckinney3254 5 years ago +1

    This is something I am really interested in. I loved learning about combinatorics in college. I am certain that if we are going to continue to advance in technology at the rate that we are, we must revolutionize how privacy is handled. I am totally guilty of giving Google enormous amounts of information about my spending habits and hobbies, both willingly and unwillingly. I believe that the best way to create more accurate and advanced algorithms (especially with neural networks) is to "feed the beast" as much data as possible; plus, I feel like I have nothing to hide. That being said, I am growing increasingly wary of who is getting their hands on that data and how it is being used.

  • @Big007Boss
    @Big007Boss 5 years ago +3

    Why do this?
    If it's only age and number of people and what they do...
    Or does it tell other information than that?

    • @goldengryphon
      @goldengryphon 5 years ago

      Lots of information other than that. I remember the last census. Pages and pages of data 'for special, randomly selected members of an area'.

  • @AniaKovas
    @AniaKovas 4 years ago +1

    One of the most remarkable videos you've made IMHO. Thanks for all your hard work in explaining things.

  • @pgplaysvidya
    @pgplaysvidya 5 years ago

    >12min video
    I wonder how accurate the census is compared to what the IRS might think/know based on different data collection methods

  • @deerlakediver5554
    @deerlakediver5554 5 years ago

    You have completely gone down the rabbit hole.
    A much easier method to ensure my personal 100% privacy.
    Collect the data.
    Shuffle the names associated with the data.
    When 100 percent of my data is associated with someone else's name, you can know nothing about me as an individual; my individual privacy is complete.
    I would have to be seriously paranoid to be afraid of what a computer could/would "assume" about me. Even if those assumptions be 100% accurate.
    For example,....
    My computer algorithm just predicted that you are a human.
    That prediction is 100% accurate.
    Has your privacy been violated?
    NO.

  • @noahhughes2501
    @noahhughes2501 5 years ago +2

    When you collab with the government

  • @baronvonbeandip
    @baronvonbeandip 5 years ago +1

    Noise dithering huh?
    Guess I learned something making music.

  • @sethjchandler
    @sethjchandler 9 months ago

    A very good video, but I struggled with the notion of a peak. In order for there to be a mathematically meaningful peak, the possibilities have to be ordered, but how do you order the possibilities? I understand why the video doesn't explain that, as it's a rather technical point, but if any of the clever RUclips commenters understand my question, I would be grateful for an answer.

  • @BWS2K
    @BWS2K 4 years ago

    I still don't get this - unless names are a component of the data released, we have the best privacy possible - "I know I fit a lot of your statistics, but I swear I don't fit your conclusion about me!" I don't understand why jittering is necessary when you can just *say* you jittered but didn't. This may explain why I was an art major though, lol

  • @sabouma
    @sabouma 5 years ago +10

    And this lads is now called Minutephysics' uncertainty principle