The Clever Way to Count Tanks - Numberphile

  • Published: 3 Feb 2025

Comments

  • @numberphile
    @numberphile 6 months ago +198

    See brilliant.org/numberphile for Brilliant and 20% off their premium service & 30-day trial (episode sponsor)

    • @Blutania
      @Blutania 6 months ago +11

      The video: 38 minutes ago
      The comment: 1 day ago
      *_time travel confirmed?_*

    • @rmsgrey
      @rmsgrey 6 months ago

      @Blutania It's standard for videos to be uploaded to YouTube some time before they go live to everyone, so the uploader and, not infrequently, also patrons, channel members, or other privileged people who get given a link to the still private video can comment on it before it's published.

    • @Electronieks
      @Electronieks 6 months ago

      @Blutania was private yesterday

    • @Electronieks
      @Electronieks 6 months ago +3

      Send this video to Ukraine 🇺🇦

    • @jerredhamann5646
      @jerredhamann5646 6 months ago +2

      It's likely they used that method, but a less mathematical way of doing it is possible. For one, you're going to be sending spies and spy planes to bases, storage yards and depots, and since a lot of these things are big and in the open, and since you can only form as many tank units as you have tanks for, you likely have a decent count of the number of units they have at a given time. If you know the enemy's serial numbering system, then the rise in serial numbers over time from captured equipment will tell you their production rates: if last month the highest serial numbers were in the low 1500s but now they are in the upper 1700s, it doesn't take a math PhD to figure out you're looking at about 270 tanks. Also, since the serial numbers encode number, date and location, they tell you something more important: the lag in their logistical system. If you know how long it takes the enemy to make and move stuff, you can predict movements and actions to some degree.

  • @dowesschule
    @dowesschule 6 months ago +6046

    You didn't just pull out the first and last, but also the middle tanks 15&16!

    • @AndreasHontzia
      @AndreasHontzia 6 months ago +175

      And 23. Illuminati!!!

    • @shripalmehta
      @shripalmehta 6 months ago +78

      there's a mathematician!

    • @docsigma
      @docsigma 6 months ago +224

      That’s Numberwang!

    • @tonelemoan
      @tonelemoan 6 months ago +32

      SPOILER ALERT!11

    • @ilonachan
      @ilonachan 6 months ago +118

      the luckiest draw at the unluckiest time!

  • @GrimOrdnance
    @GrimOrdnance 6 months ago +535

    I adore the fact that you left the initial pull in the video, because that is the truth in probabilities. I appreciate your videos!

    • @atimholt
      @atimholt 2 months ago +6

      True randomness is clumpy. That's why music streaming services often don't use true randomness: you'll get too much serendipity that feels unshuffled.

  • @adsilcott
    @adsilcott 6 months ago +1554

    6:33 I love the way the turrets are pointing at their actual positions in the number line :)

    • @Denis_Bobrov
      @Denis_Bobrov 6 months ago +24

      Oh, I didn't notice it )

    • @LoveDoveDarling
      @LoveDoveDarling 6 months ago +39

      And how the treads are in motion on the tanks. Editor going above and beyond. Bravo!

    • @taeliantalittia612
      @taeliantalittia612 6 months ago +7

      4:47

    • @miketothe2ndpwr
      @miketothe2ndpwr 6 months ago +8

      It's such a little detail for nerds. Love it as well

    • @LoveDoveDarling
      @LoveDoveDarling 6 months ago +5

      @miketothe2ndpwr I don’t think it’s exclusively for nerds. It’s for anyone who pays attention to details to appreciate.

  • @bittencourt16
    @bittencourt16 4 months ago +147

    I've just simulated 10,000 runs of this operation for numbers of tanks less than 100 and numbers of guesses between 10 and 50. Using the maximum value as the total number of tanks gives an average error of 4, while using the maximum value + average gap gives an average error of 2.6. That makes the second method roughly 1.5 times as precise!! Amazing!!!
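
A simulation along these lines can be sketched as follows. This is a minimal sketch, not the commenter's actual code: the ranges for the true tank count (50 to 100) and the sample size (5 to 20), the seed, and all function names are assumptions chosen for illustration, so the error figures it prints need not match the ones quoted above.

```python
import random


def estimate_max(sample):
    """Use the largest serial number seen as the estimate."""
    return max(sample)


def estimate_max_plus_gap(sample):
    """Largest serial plus the average gap between observations: m + m/k - 1."""
    m, k = max(sample), len(sample)
    return m + m / k - 1


def estimate_twice_mean(sample):
    """Twice the sample mean, minus one."""
    return 2 * sum(sample) / len(sample) - 1


def mean_abs_error(estimator, trials=10_000, seed=0):
    """Average |estimate - true N| over many simulated captures."""
    rng = random.Random(seed)  # same seed => every estimator sees the same draws
    total = 0.0
    for _ in range(trials):
        n = rng.randint(50, 100)  # true number of tanks (assumed range)
        k = rng.randint(5, 20)    # number of captured tanks (assumed range)
        sample = rng.sample(range(1, n + 1), k)  # draw without replacement
        total += abs(estimator(sample) - n)
    return total / trials


if __name__ == "__main__":
    for est in (estimate_max, estimate_max_plus_gap, estimate_twice_mean):
        print(f"{est.__name__}: {mean_abs_error(est):.2f}")
```

Because every estimator is fed the same seeded draws, the printed errors are directly comparable, which also makes it easy to try the "mean value * 2" idea raised in the replies.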

    • @vitriolicAmaranth
      @vitriolicAmaranth 3 months ago +6

      What about other methods, like mean value * 2?

    • @evilduck5691
      @evilduck5691 2 months ago +1

      @vitriolicAmaranth this is exactly where my mind went. I feel like it should almost be equivalent to mean gaps, but I probably just haven't thought hard enough

    • @bancabancabanca
      @bancabancabanca 1 month ago

      @vitriolicAmaranth seems to work as well!

    • @SuperPuns
      @SuperPuns 29 days ago

      I wonder if it is really the best estimate though (as in the average difference to the correct N is minimized). He didn't specifically say anything about that in the video and I would love to see a follow-up on this. What I was thinking is the following:
      He mentioned that N=Max has the highest likelihood of producing our draw. Using the formula from 5:36 you can calculate the likelihood P(N=Max+1), P(N=Max+2), ... and so on. The sum of all the likelihoods is an infinite series that should converge to a limit.* Now we can divide each likelihood by the limit of our sum, which gives us a probability distribution over all N's. The best guess should be the 50th percentile of this distribution.
      * (At least for k=2 or more draws. For a single draw it might not actually converge, in which case I wouldn't know how to go on...). **
      ** (Edit: In a Bayesian context, using a geometric prior distribution should let the series converge even if k=1.)
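
The normalise-the-likelihoods idea above can be sketched as follows. This sketch assumes the likelihood of the observed draw is proportional to 1/C(N, k) for every N at or above the sample maximum m (it is not taken from the video's formula), and it uses the closed-form sum k/(k-1) / C(m-1, k-1) of that series, which holds for k ≥ 2. The function name `posterior_median` is hypothetical.

```python
from fractions import Fraction
from math import comb


def posterior_median(m, k):
    """Smallest N whose cumulative normalised likelihood reaches 1/2.

    m: largest serial number observed, k: number of tanks observed.
    Assumes likelihood(N) is proportional to 1 / C(N, k) for N >= m,
    i.e. a flat prior over N.
    """
    if k < 2 or m < k:
        raise ValueError("need k >= 2 observations and m >= k")
    # Closed form of sum_{N=m}^{inf} 1/C(N, k); converges only for k >= 2.
    total = Fraction(k, k - 1) / comb(m - 1, k - 1)
    cumulative = Fraction(0)
    n = m
    while True:
        cumulative += Fraction(1, comb(n, k))
        if 2 * cumulative >= total:  # reached the 50th percentile
            return n
        n += 1


if __name__ == "__main__":
    # The first draw in the video: five tanks, largest serial 30.
    print(posterior_median(30, 5))
```

Exact fractions are used so that the 50% threshold is not affected by floating-point rounding. With a different prior, such as the geometric one suggested in the edit, each term would be weighted by the prior before summing.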

  • @redryder3721
    @redryder3721 6 months ago +3919

    I know it's irrelevant, but there's the old joke about letting three sheep loose in a field, but first labelling them "1" "2" and "4" so the person rounding them up spends ages looking for the 3rd.

    • @agranero6
      @agranero6 6 months ago +72

      I read about this prank in the book Show Me How or More Show Me How.

    • @SunroseStudios
      @SunroseStudios 6 months ago +93

      it's vaguely relevant!

    • @SwedishNeo
      @SwedishNeo 6 months ago +200

      It would also make sense in this case since the Germans wanted to give the appearance that they were building more tanks than they actually were. As such they could have skipped a couple of numbers in their serials. But I guess it would create too much chaos for the German mind to handle. xD

    • @hakanl2585
      @hakanl2585 6 months ago +137

      MI5 officer Peter Wright wrote in his book Spycatcher that MI5 bugged the Soviet embassy in Ottawa. MI5 marked all the listening cables with numbers from 1 upward, but in case the Soviets found these cables, MI5 omitted some numbers, hoping the Soviets would almost have to tear down the embassy in order to find the missing ones. (But the trick did not work, since the Soviets had a spy within MI5 informing them how many cables there were and which numbers they had, so they never searched for the omitted numbers.)

    • @h.a.9880
      @h.a.9880 6 months ago +148

      @SwedishNeo "New orders from Berlin: We are to skip a few serial numbers when imprinting parts, so our tank production looks bigger than it is..."
      - "But zat will bring dizorder to mein numbers!"

  • @LeonMatthews
    @LeonMatthews 6 months ago +164

    For several of my clients we incremented the serial number by some prime, rather than by one, in order to obfuscate the output somewhat. It also gave us some degree of parity checking on serial numbers later. Silly, really, but fun.

    • @accountxabcdef
      @accountxabcdef 4 months ago +17

      I would use a hash function. A secret number placed after the normal serial number, and then hash it and then use it as official serial number. Then every unit has its own official serial number, you have the secret and you can look it up, what the real serial number was and nobody is able to guess any different valid number. Even if he knows every number (except the secret) and your algorithm to create them.
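
The keyed-hash scheme described above can be sketched like this. It is a minimal sketch that swaps in HMAC-SHA256 from Python's standard library for the plain "serial plus secret, then hash" construction in the comment; the function name `public_serial`, the 12-digit output length and the example key are all assumptions for illustration.

```python
import hashlib
import hmac


def public_serial(internal_serial: int, secret: bytes, digits: int = 12) -> str:
    """Derive an unguessable public serial from a sequential internal one.

    The manufacturer keeps `secret` and a table mapping public serials back
    to internal ones; the mapping cannot be inverted without that table.
    """
    message = str(internal_serial).encode("ascii")
    mac = hmac.new(secret, message, hashlib.sha256).digest()
    # Keep 64 bits of the MAC and reduce to a fixed number of decimal digits.
    value = int.from_bytes(mac[:8], "big") % 10**digits
    return f"{value:0{digits}d}"


if __name__ == "__main__":
    key = b"example-secret-key"  # hypothetical key, for illustration only
    for n in (1, 2, 3):
        print(n, public_serial(n, key))
```

Truncating to 12 decimal digits makes collisions between two units unlikely but not impossible, so a uniqueness check against the lookup table is still a sensible precaution.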

    • @dafrandle
      @dafrandle 3 months ago

      @accountxabcdef
      I would use a uuid and convert it to numbers via a bespoke translation - just have a check to avoid the rare collision

    • @AndreyCizov
      @AndreyCizov 3 months ago +2

      isn't it quite easy to figure out that all numbers are incremented by a prime number?

    • @accountxabcdef
      @accountxabcdef 3 months ago +1

      @AndreyCizov
      You would need to see a few machines bought at the same time. You cannot trust that all numbers will be used. Often there is a gap when an updated version is introduced. And if the prime is changed at that point, have fun reverse engineering the prime...
      There will be enough people who are able to spot it or even reverse engineer it, but that number shouldn't be that great (it depends on the batch size, the amount sold to individual customers and the price: the more expensive the product, the greater the reward for getting something free under warranty).

  • @fatsquirrel75
    @fatsquirrel75 6 months ago +1136

    Pointing out that lower numbers are more likely is such a good observation. Brady keeps highlighting his genius video after video.

    • @hylen26
      @hylen26 6 months ago +79

      I don't know about genius but he does ask some excellent questions.

    • @rjwiechman
      @rjwiechman 6 months ago +74

      As my late Father would have said, "Not cessinarily!". It is also true that more of the lower numbered tanks would have been destroyed or broken down and replaced and no longer in service.

    • @Yggdrasil42
      @Yggdrasil42 6 months ago +21

      Exactly. Another type of survivorship bias.

    • @freitchetsleimwor2406
      @freitchetsleimwor2406 6 months ago +5

      So the number line does not reflect a set of equally likely observations. Some of the serial numbers that are not yet observed are less likely to be observed than others.
      I think I am understanding this right, the not yet observed numbers between the maximum and minimum have a higher average probability of being observed than that of the numbers outside the bounds. And if the biases don't cancel each other out, the prediction is skewed. I'm sure this is a well known probability thing I'm just working this out

    • @boggisthecat
      @boggisthecat 6 months ago +5

      It’s a fairly obvious observation, I think. The mathematics being shown assumes that all objects appear at once, so no temporal complications. Presumably the mathematicians engaged in this work factor in the production dates where they were known.
      Another confounding problem is repair or rebuild. For example, Russia is taking old tanks and rebuilding them into modern configurations. So these tanks are not entirely produced from new - but serial numbering is going to be a mix of old and new, dependent upon components. (It’s very complicated in this case, because there are multiple variants and changes between foreign and domestic components. We know how many thermal sights Russia bought from Thales in France, but don’t know how many domestic equivalents are being produced, as an example. So if you get a Thales serial number it’s somewhat useful, but domestic ones require some time to aggregate the data. If you can’t capture enough data then it’s not going to work, but then there are other more obvious reasons for why the information isn’t necessarily helpful in this case.)
      ‘Spys’ typically rely upon stuff like observing rail shipments. This can be gamed (which Russia has a long history of doing, because they aren’t fools) to feed false information to your opponents, however. Serial numbers are much more solid, provided you can make sense of the systems being used. These are kept very secret, unsurprisingly.

  • @caiocc12
    @caiocc12 6 months ago +17

    There's a thing called "fixed-format cryptography" which can be used to make sequential numbers look random. The nice thing about it is that the encrypted number is in the same domain as the plain number (i.e. the original numbers range from 0 to say, 1 million, the encrypted numbers will also be in that range), so the attacker doesn't know they are encrypted and thinks it's just a plain sequential number. I've used that to protect against brute-forcing IDs on a system, while keeping the IDs short enough to be encoded as a barcode
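
The idea of encrypting an ID into the same numeric range (usually called format-preserving encryption) can be sketched as follows. This is a toy illustration built from a small Feistel network with cycle walking, not the commenter's system and not a standardized mode such as NIST FF1; the names `encrypt_id` and `decrypt_id`, the round count and the HMAC-based round function are assumptions, and a production system should use a vetted implementation.

```python
import hashlib
import hmac


def _round_value(key: bytes, round_no: int, half: int, half_bits: int) -> int:
    """Keyed pseudo-random round function, truncated to half_bits bits."""
    msg = bytes([round_no]) + half.to_bytes(8, "big")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") & ((1 << half_bits) - 1)


def _feistel(value: int, key: bytes, half_bits: int, rounds: int, decrypt: bool) -> int:
    """Permute the integers 0 .. 2**(2*half_bits) - 1 with a balanced Feistel network."""
    mask = (1 << half_bits) - 1
    left, right = value >> half_bits, value & mask
    order = range(rounds - 1, -1, -1) if decrypt else range(rounds)
    for r in order:
        if decrypt:
            left, right = right ^ _round_value(key, r, left, half_bits), left
        else:
            left, right = right, left ^ _round_value(key, r, right, half_bits)
    return (left << half_bits) | right


def _walk(value: int, key: bytes, domain: int, rounds: int, decrypt: bool) -> int:
    if not 0 <= value < domain:
        raise ValueError("value outside domain")
    half_bits = max(1, ((domain - 1).bit_length() + 1) // 2)
    out = _feistel(value, key, half_bits, rounds, decrypt)
    # Cycle walking: re-apply the permutation until the result is in range.
    while out >= domain:
        out = _feistel(out, key, half_bits, rounds, decrypt)
    return out


def encrypt_id(value: int, key: bytes, domain: int = 1_000_000, rounds: int = 8) -> int:
    """Map a sequential ID to a scrambled ID in the same range [0, domain)."""
    return _walk(value, key, domain, rounds, decrypt=False)


def decrypt_id(value: int, key: bytes, domain: int = 1_000_000, rounds: int = 8) -> int:
    """Inverse of encrypt_id for the same key, domain and round count."""
    return _walk(value, key, domain, rounds, decrypt=True)


if __name__ == "__main__":
    key = b"example-key"  # hypothetical key, for illustration only
    for n in range(5):
        c = encrypt_id(n, key)
        print(n, c, decrypt_id(c, key))
```

Because the mapping is a permutation of the range, every sequential ID gets a distinct scrambled ID of the same length, which is what keeps the result short enough for a barcode.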

  • @polyaddict
    @polyaddict 6 months ago +2570

    I love how british "they have a bit of a spy" is

    • @reidflemingworldstoughestm1394
      @reidflemingworldstoughestm1394 6 months ago +66

      It's not just a British thing. Sometimes I have myself a bit of a spy as well.

    • @diamondsmasher
      @diamondsmasher 6 months ago +76

      Personally, I have a bit of a lookey-loo

    • @dualcrocadile
      @dualcrocadile 6 months ago +21

      Sounds like a Karl Pilkington story

    • @Rubrickety
      @Rubrickety 6 months ago +9

      I'm glad I had a bit of a spy before making this exact same comment.

    • @bobknip
      @bobknip 6 months ago +8

      A bit of a stickybeak

  • @JS-mp7fy
    @JS-mp7fy 6 months ago +30

    I did this exact maths problem at high school in 1991, what a real blast from the past! Thank you!!!

  • @courtney-ray
    @courtney-ray 6 months ago +576

    At 6:36 you were right on! The gap below your minimum observation WAS equal to the gap above the maximum observation and the true number of tanks!

    • @vez3834
      @vez3834 6 months ago +20

      Amazing accuracy!

    • @IDNeon357
      @IDNeon357 3 months ago

      The tank serial numbers were all encrypted by both allies and axis powers making this story entirely false.

    • @NTelling
      @NTelling 3 months ago +9

      @IDNeon357 He addresses that in the video. He said the encryption was cracked.

    • @SpydersByte
      @SpydersByte 3 months ago +3

      @IDNeon357 lol what? First of all, he said they deciphered the coding they were using, but also, how would you know this? And why do you say it like it's established fact when it clearly isn't?

    • @SpydersByte
      @SpydersByte 3 months ago

      yea Im surprised he didnt really point that out :D

  • @otaviodiniz5934
    @otaviodiniz5934 6 months ago +142

    Man, it's 11pm local time, I've been awake since 4am, my week was a rollercoaster, I'm mad about my job, I'm dealing with a woman that is getting on my nerves, my bank account is zeroed, I'm tired and pissed...
    But for some reason, his enthusiasm telling this story made me happy instantaneously.
    Thank you for this, God bless you and your beloved ones. Got a subscription.

    • @hydra8sk
      @hydra8sk 6 months ago +4

      Keep it up! Better times are ahead pal

    • @connorkapooh2002
      @connorkapooh2002 6 months ago +8

      Bro, in the future you will stumble upon your comment and you'll remember where you are at now in your life. You've made it this far, you'll keep going

    • @sirllamaiii9708
      @sirllamaiii9708 6 months ago +2

      You need money brother? Any way i can help?

    • @arjanab6227
      @arjanab6227 5 months ago

      @sirllamaiii9708 such a kind man you are, bless you sir

    • @panthermodern6572
      @panthermodern6572 5 months ago

      Hope you're doing better now. And even if you're not, it's all gonna be alright ;)

  • @MuffinsAPlenty
    @MuffinsAPlenty 6 months ago +775

    Watching James Grime explain mathematics is such a joy.

    • @perplexedon9834
      @perplexedon9834 6 months ago +12

      All my homies love James Grime

    • @fariesz6786
      @fariesz6786 6 months ago +9

      he's just that fun mixture of adorable, approachable, nerdy, and just proficient in his job

    • @stapler942
      @stapler942 6 months ago +5

      Due to Siivagunner I have this mental image of him approaching menacingly to tell me about *e*.
      But I agree, he is a joy to watch.

    • @derhesligebonsaibaum
      @derhesligebonsaibaum 6 months ago +3

      yeah, he always seems to have so much fun doing it

    • @warp9988
      @warp9988 6 months ago

      Making Math awesome.

  • @user-fi4zi5il9z
    @user-fi4zi5il9z 5 months ago +15

    His enthusiasm is so contagious and it's so cool! The formula is surprisingly simple!

  • @jameswkirk
    @jameswkirk 6 months ago +696

    A company I worked for made computers & peripherals and used 64 bit random serial numbers. They had multiple manufacturing sites, and calculated that the odds of selecting two identical numbers was smaller than human bookkeeping and errors trying to coordinate multiple product lines.

    • @ragnkja
      @ragnkja 6 months ago +166

      So, like YouTube assigning video IDs, they decided that it was faster and more accurate to just check for duplicates, because the probability of the same number being assigned twice in the time it takes to check if it has already been used is extremely small.

    • @SaHaRaSquad
      @SaHaRaSquad 6 months ago +112

      @ragnkja Even checking for duplicates would be unnecessary if cryptographic hashsums are used. The odds of getting randomly occurring collisions with them are so low that on average it would take much longer than the lifetime of the universe.

    • @Rivinwin
      @Rivinwin 6 months ago +14

      Lol, that's awesome. I love and hate it.

    • @Rivinwin
      @Rivinwin 6 months ago +52

      @SaHaRaSquad Yeah, treat a huge range of numbers as a domain, split it into segments and assign a segment to each factory, i.e. a 64-bit number where the top 3 or 4 bits are specific to each factory, increment the value at each factory independently of each other per product, and assign a hash of that value as the product serial number 👍

    • @jurjenbos228
      @jurjenbos228 6 months ago +25

      Yep, if you use 64 bit numbers the probability of a single collision in the numbers starts to rise only after about 4 billion devices are manufactured. And even then: so what? Almost all numbers are unique.
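
The "about 4 billion" figure follows from the birthday bound, which can be sketched as follows. The function uses the standard approximation 1 - exp(-n(n-1)/(2·2^bits)) rather than an exact product, and the name `collision_probability` is an assumption for illustration.

```python
import math


def collision_probability(n_items: int, bits: int = 64) -> float:
    """Approximate chance that at least two of n random IDs coincide.

    Uses the birthday approximation 1 - exp(-n(n-1) / (2 * 2**bits)).
    """
    space = 2.0 ** bits
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(-n_items * (n_items - 1) / (2 * space))


if __name__ == "__main__":
    for n in (10**6, 10**8, 2**32, 10**10):
        print(f"{n:>14,d} devices: {collision_probability(n):.3g}")
```

At 2^32 devices (roughly 4.3 billion, the square root of the 2^64 ID space) the probability of at least one collision is already a bit under 40%, while at a million devices it is still negligible.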

  • @bjrnstrottman5637
    @bjrnstrottman5637 6 months ago +45

    My first instinct was to use the Central Limit Theorem to assume that the sample mean would approximately equal the population mean. Since we know the distribution is uniform and the population mean of a population of size n is (n+1)/2, twice our sample mean minus one should approximate the population size.
    Here our sample mean was 17, so this method estimates the population size as 2(17) - 1 = 33.
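
The arithmetic behind this estimate can be written out as a short derivation. It assumes each observed serial number X_i is equally likely to be any of 1, ..., N; the symbols \bar{x} (sample mean) and \hat{N} (estimate) are notation introduced here.

```latex
\begin{align}
E[X_i] &= \frac{1}{N}\sum_{j=1}^{N} j = \frac{N+1}{2} \\
E[\bar{x}] &= \frac{N+1}{2}
  \quad\Longrightarrow\quad \hat{N} = 2\bar{x} - 1 \\
\bar{x} &= \frac{1 + 15 + 16 + 23 + 30}{5} = 17
  \quad\Longrightarrow\quad \hat{N} = 2(17) - 1 = 33
\end{align}
```

One caveat with this estimator is that it can fall below the largest serial number actually seen: for the sample {1, 2, 30} the mean is 11, giving an estimate of 21 even though tank 30 was observed.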

  • @mark97199
    @mark97199 6 months ago +1085

    This only works if the serial numbers are sequential. Knowing this, the US named the third SEAL team "SEAL Team 6" to confuse Soviet intelligence.

    • @penfold-55
      @penfold-55 6 months ago +123

      And if you know where they start. For example, if the serial number was a date, this just wouldn't work (even though the numbers are sequential, they are not consecutive)

    • @AbstruseJoker
      @AbstruseJoker 6 months ago +70

      Dates would still reveal some info about how many tanks there are

    • @chickenwheel45
      @chickenwheel45 6 months ago +48

      He mentions that there's an encoding on top of this

    • @Sp4mMe
      @Sp4mMe 6 months ago +32

      Yeah, real world probably has a lot of further problems. Like what if one month all new tanks go to front X, one month they all go to front Y, and your information and rate of capture/observation is different, for example ... ?
      But then, you might also have some rough indications from observation planes or train schedules or something that might help correlate some gaps in your data. Of course, there might also be decoys and whatnot ... well, I'm sure a lot can be done there.

    • @BenjaminGatti
      @BenjaminGatti 6 months ago +19

      Serial numbers are by definition subset of a series. You need to know the series.

  • @b1oodzy
    @b1oodzy 5 months ago +109

    I thought I was smart with my calculation of (1+15+16+23+30)/5x2 = 34 but this guy pulls out a giant sheet of paper and introduces probabilities.

    • @Ryanmathewsc
      @Ryanmathewsc 5 months ago +15

      My mind went to the same place. As the sample size increases, the average should approach the median number. I wonder if the methods in the video offer a meaningful improvement over simply doubling the observed average.

    • @_..-.._..-.._
      @_..-.._..-.._ 3 months ago +3

      The x2 part didn’t make sense to me hmm 🤔

    • @b1oodzy
      @b1oodzy 3 months ago +17

      @_..-.._..-.._ The first part of the equation calculates the average which is 17. To calculate the maximum you'd need to do x2 to get 34.

    • @Exaspatial
      @Exaspatial 3 months ago

      Same here

    • @felipea.barretto7503
      @felipea.barretto7503 3 months ago +2

      I did the same thing except I subtracted 1 to estimate 33. My reasoning is that if we have N tanks, all with equal probabilities, the expected average of the distribution is 1/N * (sum of 1 to N) = (N+1)/2 . Estimating this with the sample average μ, you get N = 2μ-1, which is why I subtracted the one.

  • @art1099
    @art1099 6 months ago +4807

    No war thunder sponsor? Missed opportunity

    • @Nick-the-fox
      @Nick-the-fox 6 months ago +105

      This is targeting a different audience
      It's like an Opera GX sponsor on a non gamer channel

    • @williamnathanael412
      @williamnathanael412 6 months ago +35

      What is war thunder

    • @Sjobling
      @Sjobling 6 months ago +335

      @williamnathanael412 If you'd typed that into Google instead of the YouTube comments, you'd have an answer immediately. But now, you have a sarcastic response 7 minutes later instead.

    • @serinat_1408
      @serinat_1408 6 months ago +76

      Right here I have a bag of german tanks! Do you know where you can also find German tanks? WAR THUNDER!!!!

    • @alexscriabin
      @alexscriabin 6 months ago +22

      @Nick-the-fox Dude what is an "anti-gamer channel"? Is it just one that reports on game devs being overworked at fromsoft or that was anti-gamergate ten years ago?

  • @Wagon_Lord
    @Wagon_Lord 6 months ago +6

    I heard this story ages ago, but never understood how it worked. That "flipping the number line around" line makes so much sense; so simple once the trick's revealed. Lovely!

  • @Limrasson
    @Limrasson 6 months ago +813

    His reaction to tank 30 immediately raised suspicion and I would have said "yeah, that's 30 tanks in the bag."

    • @dewhi100
      @dewhi100 6 months ago +62

      Yep "Tank 30, oh, hmm, interesting..."

    • @PixelPhobiac
      @PixelPhobiac 6 months ago +2

      🤣

    • @Alex-ff8si
      @Alex-ff8si 6 months ago +2

      300th like

    • @roffie
      @roffie 6 months ago +2

      30 got the dinks

    • @cubexyz199
      @cubexyz199 6 months ago +4

      I'm on the spectrum and I still cannot see it

  • @Robi2009
    @Robi2009 3 months ago +6

    11:57 - the other problem would be the oldest tanks (i.e. built pre-1939) were either destroyed, removed from service or rebuilt into something else (like AA or AT platform) by the end of war

  • @EchosTackyTiki
    @EchosTackyTiki 6 months ago +233

    In arms production it's fairly common for factories to assign serial number ranges to particular products in advance, so the serial number ranges having gaps within them is relatively normal. It's also normal for them to start production at something like 10,000 if they expect to make in the tens of thousands of that particular item. That way all the items are serialized, but they also maintain the same number of digits in their serial number for uniformity without using a bunch of leading zeros. Overrunning that serial range usually results in a letter prefix or suffix being added.

    • @halfsourlizard9319
      @halfsourlizard9319 6 months ago +3

      By what metric is that better than using leading zeros? Or, why the aversion to leading zeros? (Also, why not just use GUIDs? Fixed size, convey identity but no other information, never going to run out.)

    • @mnxs
      @mnxs 6 months ago +15

      @halfsourlizard9319 As for the GUIDs, because the use of serial numbers for arms predates the invention of GUIDs by 100+ years. So, in other words, tradition - why change when you already have a perfectly workable scheme?

    • @cidiousblack2136
      @cidiousblack2136 6 months ago

      @halfsourlizard9319 When creating records, people will often omit leading zeros when recording numbers, possibly out of laziness, possibly by convention. Forcing the leading digit to be a non-zero digit prevents this deletion from happening.
      Why care about leading zeros? The zeros still have meaning. For instance, the number of digits present can be helpful in indicating that a number in a record is a serial number specifically. Further, whenever number codes get concatenated it's important not to omit digits or this will change the shape of the number code, e.g. if the serial number were a concatenation of year-month-number. Granted, concatenated codes should be dash separated or similar. But if we can't trust the clerk to put the leading zeros on the number, why would I trust the clerk to bother writing dashes between numbers?

    • @AmiiboDoctor
      @AmiiboDoctor 6 months ago +1

      It's normal now... but it wasn't normal then

    • @gaiamission7200
      @gaiamission7200 6 months ago

      @AmiiboDoctor It was more normal then, actually. Sequential serialization is fairly rare

  • @eshed
    @eshed 6 months ago +21

    I used this method with serial numbers of accordions made in the late 30s by Hohner, a German company. Now I have a spreadsheet named "The German Accordion Problem" with more than 150 rows.

    • @dragoncurveenthusiast
      @dragoncurveenthusiast 5 months ago

      Cool!
      So, how many did they produce per month?

    • @eshed
      @eshed 5 months ago +2

      @dragoncurveenthusiast
      Unfortunately they didn't mark the month in the serial number, but fortunately they didn't restart every month either.
      That means I could estimate the total number of accordions with serial numbers between 1934 and 1940 to around 860000.

    • @crownhouse2466
      @crownhouse2466 5 months ago +1

      @eshed That's a lot of accordions

    • @JtotheAKOB
      @JtotheAKOB 5 months ago +1

      @eshed Are you sure they did not encode them, so that rival accordion producers could not estimate the number of accordions? :P

    • @eshed
      @eshed 5 months ago

      @JtotheAKOB I'm relatively certain.
      Out of the 150, I have ~20 serial numbers for which I also know the actual production date. If you plot the numbers vs the dates, you get a lovely almost linear (R^2=0.995) graph. The only way I can think of to get this relationship while preventing accurate estimates, would be to randomly skip numbers with a constant probability.

  • @K_Forss
    @K_Forss 6 months ago +255

    My immediate thought was that the average of a random subset should be the same as the average of the whole, so the number of tanks should be twice the mean of the picked ones 2*(1+15+16+23+30)/5=34 for the first pick and 2*(3+10+15+18+24)/5=28 for the second. My guess is that they used multiple estimate methods and weighted the results depending on inherent uncertainties/errors of the methods

    • @sanandanojha2988
      @sanandanojha2988 6 months ago +21

      Yeah, that's exactly what I was thinking! Although, I suppose that it might be more susceptible to outliers than the average distance method...

    • @journeymantraveller3338
      @journeymantraveller3338 6 months ago +18

      Same argument applies to the median. You can also get 95% confidence intervals for the mean and the median.

    • @Mayur7Garg
      @Mayur7Garg 6 months ago

      Why twice?

    • @yurie2388
      @yurie2388 6 months ago +12

      @Mayur7Garg The average is roughly half of the total since you have both low and high numbers. Average tries to arrive at the middle point of the number set when all the numbers are unique and in series.
      (1+15+16+23+30)/5=17, which we know is too little since we have the number 30 in the series.

    • @Mayur7Garg
      @Mayur7Garg 6 months ago +8

      @yurie2388 Basically it stems from the fact that the median and the mean would be identical for such a series. So if you know the mean, then you can use it like a median to assume that the final number is at twice the distance. But in that case, using the median in the first step directly is more appropriate. Also, one issue that I have with all these solutions including the one in the video is that they do not seem to work if the serial numbers do not start from 1 but from let us say 100.

  • @lindhe
    @lindhe 5 months ago +1

    James is so good! Always a great video when he's in. Also: he always looks happy, even when picking bad samples.

  • @EXPLICITBG
    @EXPLICITBG 6 months ago +296

    Tanks for sharing

    • @volodyadykun6490
      @volodyadykun6490 6 months ago +2

      You know destroyers for bases, get ready for

    • @cubes_art7956
      @cubes_art7956 6 months ago +1

      Came here to say this.

    • @myc0p
      @myc0p 6 months ago +10

      I would like to extend my tanks to Ukraine 🇺🇦

    • @talananiyiyaya8912
      @talananiyiyaya8912 6 months ago +1

      Thanks*

    • @EXPLICITBG
      @EXPLICITBG 6 months ago

      @talananiyiyaya8912 r/woosh

  • @stco2426
    @stco2426 6 months ago +7

    Cool. When I was studying population biology we were given a task to work out the number of taxis in a city and we used the capture, mark, recapture method, using the taxi number, rather than marking anything. So, just noting the numbers in a given time (capture and 'mark') and then noting the numbers in a given period, which was later (recapture v not seen before). There are all sorts of sample to population complexities and improvements to the estimate with longer observations (but issues with recounts if the obs period is too long). Also, an improvement if a third count period is used.
    I wonder if there are any seminal capture, mark recapture examples that Numberphile might comment on and re-create on brown paper?
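
The capture, mark, recapture idea described above can be sketched with the standard two-sample estimators. The comment does not name a formula, so the Lincoln-Petersen estimate and Chapman's small-sample correction are assumptions here, as are the function names and the taxi numbers in the usage example.

```python
def lincoln_petersen(first_session, second_session):
    """Estimate population size as n1 * n2 / (number seen in both sessions)."""
    first, second = set(first_session), set(second_session)
    recaptured = len(first & second)
    if recaptured == 0:
        raise ValueError("no recaptures: the estimate is undefined")
    return len(first) * len(second) / recaptured


def chapman(first_session, second_session):
    """Chapman's variant, which stays finite even with zero recaptures."""
    first, second = set(first_session), set(second_session)
    recaptured = len(first & second)
    return (len(first) + 1) * (len(second) + 1) / (recaptured + 1) - 1


if __name__ == "__main__":
    # Hypothetical taxi numbers noted in two separate observation periods.
    morning = [101, 102, 103, 104, 105, 106]
    evening = [104, 105, 106, 107, 108, 109, 110, 111]
    print(lincoln_petersen(morning, evening))
    print(chapman(morning, evening))
```

Noting the taxi number plays the role of the mark, so a taxi counts as recaptured when its number appears in both sessions.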

  • @macdofglasgow772
    @macdofglasgow772 6 months ago +61

    Excellent. I did laugh at the #1 and #30 thing. Always like Dr Grimes in these videos, I could listen to him just tell me interesting stuff all day.

    • @TheEvilCheesecake
      @TheEvilCheesecake 6 months ago

      It's just the one Grime actually

    • @chriswebster24
      @chriswebster24 6 months ago

      He was probably talking about him and his brother, together, the Dr. Grimes. His brother is a gynecologist.

  • @andrek4619
    @andrek4619 6 months ago +2

    In some armies, the first digit determines the number of the tank company, the second digit determines the platoon, and the last one determines the serial number in the platoon. There are other numbering schemes.

    • @nevillehoward8736
      @nevillehoward8736 6 days ago

      Isn't that a bit difficult to manage at the manufacturing stage? Or are you talking about a serial no other than a mfg serial no?

    • @andrek4619
      @andrek4619 5 days ago

      @nevillehoward8736 Sorry, I was referring to the number on the tank body.

  • @MichaelDoornbos
    @MichaelDoornbos 6 months ago +60

    I love the "German Tank Problem." There's a great video on YouTube showing this method of counting the Commodore 1571 Disk Drives. Using this technique for "other real-world problems" is a fun exercise.

    • @Grunchy005
      @Grunchy005 6 months ago +9

      Upvote for Commodore 1571

    • @thekinginyellow1744
      @thekinginyellow1744 6 месяцев назад +1

      Wow, not even from 8-bit guy!

    • @LuisRamos-jg1gf
      @LuisRamos-jg1gf 6 месяцев назад

      What's the video called? 😊

    • @ampulka
      @ampulka 5 месяцев назад +1

      found it: "How Many Commodore 1581 Disk Drives? The German Tank Problem"

  • @jbeckh2
    @jbeckh2 6 месяцев назад

    This episode was great. If you come across more war history examples, please post them. My son loves war history and was fascinated by this. This helps to understand why math is important.

  • @jamesterwilliger3176
    @jamesterwilliger3176 6 месяцев назад +565

    Spies be like "tank you very much" but the mathematicians be like "tanks but no tanks"

    • @DergyQT
      @DergyQT 6 месяцев назад +1

      Tanks for the quantities

    • @mikegleasonjr
      @mikegleasonjr 5 месяцев назад +2

      That joke blows

    • @user-ro1cc8tz6d
      @user-ro1cc8tz6d 5 месяцев назад +1

      HAHHA

    • @----.__
      @----.__ 5 месяцев назад +5

      That joke tracks.

    • @davebowman6497
      @davebowman6497 4 дня назад

      Mathematicians give their estimate. Spies thinks there armour.

  • @bastawa
    @bastawa 6 месяцев назад

    That was brilliant! Your initial picks are exactly why it was so hard for me to grasp probability at school, until I realized it is about multiple events and doesn't work that well for a single event.

  • @reedjasonf
    @reedjasonf 6 месяцев назад +265

    The disgust in Dr. Grime's voice at 2:24 when he says "I'm NOT going to let you feel the weight of the bag! [Are you daft?]"

    • @reidflemingworldstoughestm1394
      @reidflemingworldstoughestm1394 6 месяцев назад +40

      And rightfully so. Who gets to heft a German tank factory during a war?

    • @robinsparrow1618
      @robinsparrow1618 6 месяцев назад +10

      the time code you put is after the moment you're talking about

    • @WofWca
      @WofWca 6 месяцев назад +9

      2:16

    • @PeterNjeim
      @PeterNjeim 6 месяцев назад +18

      @@robinsparrow1618 This is a common phenomenon I've seen over the years. Someone will watch the video and, after watching a funny part, click pause and copy the timestamp, forgetting that this timestamp is after the clip.

    • @hdbrot
      @hdbrot 6 месяцев назад +3

      @@PeterNjeim Maybe OP edits it in. Let's hope for the best :)

  • @JackGremlin
    @JackGremlin 5 месяцев назад +2

    I've done nothing but fail math all my life yet I find this video interesting enough to take notes and watch twice.

  • @rPuck
    @rPuck 6 месяцев назад +40

    Tanks for sharing!!!

  • @svenlima
    @svenlima 6 месяцев назад +4

    It's the same question we posed as kids: "How do you count a herd of sheep?" - "You count the legs and divide the number by 4." At the time we found that funny.

  • @molieros
    @molieros 6 месяцев назад +179

    James: There are 30 German tanks in the bag.
    Chuikov: We were aware of that.

    • @Alex-ff8si
      @Alex-ff8si 6 месяцев назад +1

      50th like + first reply

    • @rogerxiao4458
      @rogerxiao4458 6 месяцев назад +3

      Krebs: That seems unlikely.
      (Downfall movie reference if you don't get it.)

    • @TheBrad574
      @TheBrad574 6 месяцев назад

      Someone read Cornelius Ryan's The Last Battle and his interview with Chuikov.
      I just noticed someone mentioned Downfall too. The book is the source material.

  • @dattaprasadgodbole
    @dattaprasadgodbole 6 месяцев назад

    Every part of this video - from finding out the numbers to objections raised - was brilliant. I love this video.

  • @gustavakerman2566
    @gustavakerman2566 6 месяцев назад +239

    Alternative title: Local British mathematician gets blindsided by sheer stupid luck

  • @1_in_8billion
    @1_in_8billion 2 месяца назад

    Hey everyone, I just started learning how to use octave and just for kicks I made a program to do this very estimation. (Thanks for sharing Numberphile, this is really neat stuff!) Here's the *script* if anyone wants to fiddle around with it: (I've added a percent error so you can see just how remarkably accurate this estimation is!)
    % Simulate the German tank problem: draw tanks without replacement, then estimate the total.
    actualNumberOfTanks = randi(1000);
    disp(["actual number of tanks: ", num2str(actualNumberOfTanks)]);
    % Never ask for more picks than there are tanks.
    numberOfPicks = min(randi(100), actualNumberOfTanks);
    disp(["number of picks: ", num2str(numberOfPicks)]);
    % randperm(n, m) returns m distinct serial numbers from 1..n, so no tank is picked twice.
    tankNumberPicks = randperm(actualNumberOfTanks, numberOfPicks);
    disp("tanks randomly selected: ");
    disp(tankNumberPicks);
    % Estimate = largest serial number seen + average gap between the observed serial numbers.
    maxSeen = max(tankNumberPicks);
    estimatedNumberOfTanks = maxSeen + (maxSeen - numberOfPicks) / numberOfPicks;
    disp(["Estimated number of tanks: ", num2str(estimatedNumberOfTanks)]);
    percentError = round(((estimatedNumberOfTanks - actualNumberOfTanks)/actualNumberOfTanks)*100);
    disp(["percent error: ", num2str(percentError)]);
    %;D

  • @Canzandridas
    @Canzandridas 6 месяцев назад +6

    Somewhere deep within my brain I'm pleased with this video because Dr Grime always reminds me of the young folk who went to ww2 saying they were adults when they weren't and this video is about tanks

  • @Ring_Zero
    @Ring_Zero 6 месяцев назад +5

    We're using similar techniques with serial numbers to investigate production numbers for relatively rare camera models from the early 1970s.

  • @Demasx
    @Demasx 6 месяцев назад +26

    This feels like one of those widely usable maths that I won't be able to find an application for anytime soon... then when the time comes, I'll remember there's a solution but not what it is 😅 Bookmarking it now for that future occasion, haha

    • @EdMcF1
      @EdMcF1 3 месяца назад

      Ukrainians might find it useful

  • @indranilroy4822
    @indranilroy4822 4 месяца назад

    I always find it fascinating how these equations can be derived after rigorous application of a simple general concept, like at the beginning of the video you can feel that the frequency of smaller numbers (hence more smaller gaps) would affect the estimate but the quantifying part takes time to visualize in its precise form

  • @TheDuckofDoom.
    @TheDuckofDoom. 6 месяцев назад +30

    I just tell the german book keeper that I think his records are sloppy, and he shows me all of his work to prove me wrong.

    • @MrZauberelefant
      @MrZauberelefant 6 месяцев назад

      That was in a movie, wasn't it?

    • @lukasskymuh5910
      @lukasskymuh5910 6 месяцев назад +3

      This would never work! .... not unless he plays war thunder...

  • @heinaung6967
    @heinaung6967 6 месяцев назад

    Thank you Brady for making these videos, every time I watch it motivates me to do my job better as an engineer/computer scientist

  • @impossiblemission4ce
    @impossiblemission4ce 6 месяцев назад +45

    First Enigma, now these tanks. Sometimes it feels as though James is gearing up for a time travel mission.

    • @talananiyiyaya8912
      @talananiyiyaya8912 6 месяцев назад

      Obviously not...

    • @_invencible_
      @_invencible_ 6 месяцев назад +3

      @@talananiyiyaya8912 nice try, MI6

    • @sandekv
      @sandekv 6 месяцев назад +5

      He is winding down from one. He went there, helped Britain win, and came back.

    • @jimmyzhao2673
      @jimmyzhao2673 6 месяцев назад +1

      @@sandekv He's slowly revealing that to us.

  • @paladin656
    @paladin656 5 месяцев назад

    I saw the thumbnail and thought this was going to be about counting tanks on the move in a column or formation, but the history tie-in made this really interesting. Thanks for this!

  • @mladengavrilovic8014
    @mladengavrilovic8014 6 месяцев назад +51

    it would also make sense to calculate the average of the samples and multiply it by 2 as the average of consecutive numbers starting at 1 would be about n/2 and the average of the samples would also approach the same value.

    • @timseguine2
      @timseguine2 6 месяцев назад +17

      Close. The average of the observations is an estimator of the mean of the serial numbers in the bag. You got that much right. But the average serial number is (n+1)/2. So you have to double it and then subtract one.

    • @raiseer
      @raiseer 6 месяцев назад +5

      Was my first idea, too. They did basically the same with extra steps :)

    • @rianfelis3156
      @rianfelis3156 6 месяцев назад +6

      The reason for those extra steps is that you usually have padding around the serial numbers, like just start counting at 1500 because the 15 means something else, and the last two digits are sequential. Which they did touch on, but not a lot.

    • @halbronk7133
      @halbronk7133 6 месяцев назад +10

      This is the method I thought of too, but it turns out that the numbers you find other than the max aren't relevant. However many tanks there are, finding 1, 2, 3, 4, and 30 is the same as finding 26, 27, 28, 29, and 30 (as long as the serial numbers start at 1).

    • @timseguine2
      @timseguine2 6 месяцев назад +6

      @@halbronk7133 "the numbers you find other than the max aren't relevant": This isn't precisely true. They are relevant in the sense that they produce a valid estimate for the maximum. The problem is that it ignores relevant information that we know about the problem (that the numbers are sequential without gaps). And usually when you don't use some piece of information to derive your answer then it is possible to do better.

  • @jokoluna6978
    @jokoluna6978 5 месяцев назад

    This video is brilliant! I knew about the story and always thought there was some really complicated math behind the scientists' work. Nicely explained, thanks! :)

  • @LudicrousTachyon
    @LudicrousTachyon 6 месяцев назад +10

    For electronics with network cards, companies are assigned ranges of MAC addresses as they are supposed to be universally unique. The range could allow one to estimate the number of devices they sell.

    • @trueriver1950
      @trueriver1950 6 месяцев назад +2

      Life, including the Y-T algorithm, is strange indeed

    • @stargazer7644
      @stargazer7644 4 месяца назад

      The operative words here are "supposed to be". And nobody says they have to be assigned sequentially. Each organizationally unique identifier (OUI) can create 16 million unique MAC addresses. And you can have more than one OUI.

  • @mateodemicheli2420
    @mateodemicheli2420 6 месяцев назад

    Awesome concept for a video. I love how you slowly explain each part of the puzzle and the graphs; it helped a lot. I'm subscribing right now.

  • @bryan-nz
    @bryan-nz 6 месяцев назад +36

    Have you ever done a video on "Hyper Log Log"? We use it in massive data systems for efficiently estimating the number of unique values. It is very interesting, and freakily accurate.

    • @EricKay_Scifi
      @EricKay_Scifi 5 месяцев назад

      I've used that in BigQuery. APPROX_COUNT_DISTINCT is great for figuring out new data.

  • @meownezz
    @meownezz 5 месяцев назад

    Information and mathematics once again showing their overwhelming and seemingly timeless relevance. 🙂

  • @brmolnar
    @brmolnar 6 месяцев назад +45

    Seal Team 6 is named that to imply that there are at least 5 other Seal Teams. At least this is the common rumor.

    • @Laotzu.Goldbug
      @Laotzu.Goldbug 6 месяцев назад +11

      This is actually true (at least according to Richard Marcinko's autobiography). Presently there are well over six SEAL Teams (8?), but when Marcinko created a specialist SEAL unit in 1980 there were only two other ones, and "Seal Team 6" was a deliberate attempt at deceiving the Soviets.

  • @willywodka
    @willywodka 6 месяцев назад

    Love the honest enthusiasm on this formula!

  • @aleksihermonen9017
    @aleksihermonen9017 6 месяцев назад +60

    I was thinking about taking the average and doubling it. The idea being that the average would be approximately in the middle of the true number, so double the average would be close to the true number.

    • @PsychoMuffinSDM
      @PsychoMuffinSDM 6 месяцев назад +5

      That's what I did, lol.

    • @xerkules2851
      @xerkules2851 6 месяцев назад +4

      Same here. That method gives very similar estimates in these examples.

    • @TomVennix
      @TomVennix 6 месяцев назад +8

      I think you can improve this estimate by subtracting 1 at the end, since the average of the numbers 1 up to and including N is (N+1)/2 rather than N/2. Denoting the sample average by X, your idea is that X should be approximately equal to (N+1)/2, which would imply that N is approximately equal to 2X-1.
      I'm actually curious to see how this performs (in general) compared to the method presented in the video.

    • @akshaj7011
      @akshaj7011 6 месяцев назад

      That wouldn't work if the serial numbers didn't start from 1

    • @aleksihermonen9017
      @aleksihermonen9017 6 месяцев назад +4

      @@akshaj7011 That's true, but the average gap wouldn't work either if it counts the gap from 0 to the first element.
      If the starting point were unknown, I would probably use the standard deviation in the same manner.

  • @asyrun
    @asyrun 6 месяцев назад

    this was cool, I liked this. kind of reminded me of being in high school with a teacher I actually liked. subbed.

  • @romansanders
    @romansanders 6 месяцев назад +12

    Apple serial numbers were sequential until about 5 years ago. They even contained information about which factory produced the item and when.

    • @MrZauberelefant
      @MrZauberelefant 6 месяцев назад +5

      They still should, trackability is vital information.

  • @SteveThePster
    @SteveThePster 4 месяца назад

    Great video! Maths is just so awesome sometimes, particularly probability theory

  • @aksela6912
    @aksela6912 6 месяцев назад +19

    OK, what about this: As the sample size increases, the average of the sample will approach the average of the population, so let's estimate the average like that. For a uniform distribution starting at zero the maximum is simply two times the average, but in this example the minimum is one, so we'll just subtract one from our average. Using this method I get 32 and 28 tanks, respectively.

    • @cryme5
      @cryme5 6 месяцев назад

      Or double the median. It would have been 32 and 30. Not sure which is usually closer, I feel like you need a Bayesian analysis with a prior.

    • @aksela6912
      @aksela6912 6 месяцев назад

      Although these specific estimates have less error than the ones presented by James, on average his method will be better, at least for larger samples. I did some simulations, and for small samples, say three, it's pretty close, but James' method has a lot more bias.

    • @cryme5
      @cryme5 6 месяцев назад

      @aksela6912 Funny thing is, no matter the prior you use, the posterior probability of N (the total number of tanks) is just the prior truncated starting from M (the maximum of the observed serial numbers). In other words, a Bayesian answer, no matter the prior, should only depend on M (not even on the number of samples).

    • @aksela6912
      @aksela6912 6 месяцев назад +2

      @@cryme5 For a uniform distribution the variance of the sample median will be greater than the variance of the sample mean, and as mean and median should be the same it will be better to use the one with less variance. I have to reiterate though, sample mean times two is a poor estimator, even if it feels more intuitive, and it feels like you're utilising the collected data better.

    • @EebstertheGreat
      @EebstertheGreat 6 месяцев назад +1

      @@aksela6912 James's method is unbiased. If you observe n tanks and the maximum value you observe is m, then the minimum variance unbiased estimator is m + m/n - 1. Your estimator of twice the sample mean minus one is also unbiased, but its variance is higher. And it doesn't use the important information of the sample maximum, which means the estimate might actually give a value we _know_ is too small.
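The bias and variance claims in this thread can be checked with a quick simulation. Below is a rough Python sketch, assuming serial numbers run from 1 to N with no gaps and that tanks are drawn without replacement. The function names, the choice of N = 30 and k = 5, the trial count and the seed are all illustrative assumptions.

```python
import random


def max_gap_estimate(sample):
    """Largest serial seen plus the average gap: m + m/k - 1."""
    m, k = max(sample), len(sample)
    return m + m / k - 1


def doubled_mean_estimate(sample):
    """Twice the sample mean, minus one."""
    return 2 * sum(sample) / len(sample) - 1


def compare(n_tanks=30, k=5, trials=20000, seed=0):
    """Return {estimator name: (bias, mean squared error)} over many random draws."""
    rng = random.Random(seed)
    estimators = {"max_gap": max_gap_estimate, "doubled_mean": doubled_mean_estimate}
    errors = {name: [] for name in estimators}
    for _ in range(trials):
        # rng.sample draws k distinct serial numbers, i.e. without replacement.
        sample = rng.sample(range(1, n_tanks + 1), k)
        for name, estimator in estimators.items():
            errors[name].append(estimator(sample) - n_tanks)
    return {
        name: (sum(errs) / trials, sum(e * e for e in errs) / trials)
        for name, errs in errors.items()
    }


if __name__ == "__main__":
    for name, (bias, mse) in compare().items():
        print(f"{name}: bias={bias:.3f}, mse={mse:.2f}")
```

Under these assumptions both estimators should show a bias close to zero, while the maximum-based one should show a clearly smaller mean squared error.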

  • @alphakumar-g4q
    @alphakumar-g4q 6 месяцев назад +2

    I have another method of solving this: average the numbers of the 5 random tanks we picked. That average will be close to the average of all the tank numbers, so avg of 5 = 17 = n(n+1)/(2n) (the average of all numbers on the tanks, where n = total number of tanks), which gives n = 33 as the total number of tanks.

  • @JaniLaaksonen91
    @JaniLaaksonen91 6 месяцев назад +23

    Would make a nice graph plotting your best guess of total tanks, pulling one tank at a time. Any time you get a new biggest number the plot would jump up, and when you get smaller numbers it will slowly decend as your average gap gets smaller. It would jerk up and down, approaching the actual total number.

    • @virt1one
      @virt1one 6 месяцев назад +1

      agreed that would be nice to look at, though you'd want a larger set than 30. should start out as a line jumping up and down but rapidly smoothing out. After it calmed down a bit you could probably do a bit of "eyeball extrapolation" to get a more accurate estimate than the last prediction.
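The running estimate described in this thread is easy to compute before plotting it. Here is a small Python sketch that assumes the video's formula (largest serial number seen plus the average gap, m + m/k - 1). The function name, the bag of 200 tanks and the seed are hypothetical choices.

```python
import random


def running_estimates(draws):
    """Return the estimate m + m/k - 1 after each successive draw."""
    estimates = []
    largest = 0
    for k, serial in enumerate(draws, start=1):
        largest = max(largest, serial)
        estimates.append(largest + largest / k - 1)
    return estimates


if __name__ == "__main__":
    rng = random.Random(1)
    bag = list(range(1, 201))  # hypothetical bag of 200 tanks
    rng.shuffle(bag)           # the order in which tanks are pulled out
    for k, estimate in enumerate(running_estimates(bag[:40]), start=1):
        print(k, round(estimate, 1))
```

The printed pairs can be fed to any plotting tool. The estimate jumps up whenever a new largest serial number appears and drifts down between jumps, as the comment predicts.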

  • @aivehn
    @aivehn 5 месяцев назад

    Great overlap of math and history. Tanks a lot!

  • @pallavinavin4988
    @pallavinavin4988 6 месяцев назад +11

    Love ur passion, professor

  • @drewmqn
    @drewmqn Месяц назад +1

    1:33 Before hearing the punchline, I predict the math nerds got it super close based only on 1) this story being told on a math channel and 2) how pleased James looks in telling it.

  • @Diekyl
    @Diekyl 6 месяцев назад +18

    At first, I was perplexed about the method of estimating monthly production with just serial numbers, but I am glad they explained they had a way to decode the month and factory of the tank as well. I assumed some of these numbers must have been intentionally hidden or misleading.

    • @suit1337
      @suit1337 5 месяцев назад +2

      No, they were just contracted out to different manufacturers and sub-models (Ausführung) and were assigned specific number ranges.
      The gearboxes, or more specifically the engines with the geartrain attached, were often shared between different models: the Panzer V Panther and Panzer VI Tiger shared the same engine platform, which differed only in minor details and power.
      In the later stages of the war it was not uncommon to use whatever was in stock, or to repair tanks with parts from different models.

  • @adamgrimsley2900
    @adamgrimsley2900 2 месяца назад

    I love this stuff, it's so much fun guessing how to get a formula

  • @MangoJones139
    @MangoJones139 6 месяцев назад +4

    I really like Brady's talent for asking "good questions"

  • @Darrylx444
    @Darrylx444 6 месяцев назад

    It would seem to be a basic opsec precaution for a military equipment manufacturer to randomize serial numbers, having their own private ledger to decode it into sequential order again if needed. Particularly during wartime production. Or even just add a random letter prefix to each batch run perhaps.
    Anyway, thanks for the great video.

  • @GeekRedux
    @GeekRedux 6 месяцев назад +8

    12:17 "But we broke that code, okay? That's another story." Well, now we've got to hear it! Enigma, or something else?

    • @TheBendermen
      @TheBendermen 6 месяцев назад +1

      The Enigma machines were for coded communications, I think. I think he meant that the serial numbers were coded, which isn't uncommon, since different companies and factories have different ways of doing things.

  • @RichardJBarbalace
    @RichardJBarbalace 6 месяцев назад +4

    I think there may be a simpler and more accurate way to do the estimation. My first thought gave estimates of 34 and 28 for the two trials, beating Brady's estimates of 35 and 27.8 both times compared to the actual number 30. Assuming "everything is equal and random" (i.e., a uniform distribution), just take the average of the tank numbers and double it. This also balances all the potential gaps.

    • @ChristopheSmet123321
      @ChristopheSmet123321 6 месяцев назад +1

      That is certainly a valid method as well, also unbiased (meaning on average you will be spot on). However, the "maximum plus average gap" method is more efficient, i.e., it has a lower mean squared error: the squared difference to the actual N will on average be smaller than using your method. And that is what you want from an estimator!

  • @betabenja
    @betabenja 6 месяцев назад +6

    6:24 scary camera pan

  • @matmar10
    @matmar10 5 месяцев назад +1

    This is such a fun video on many levels.

  • @AloisMahdal
    @AloisMahdal 6 месяцев назад +9

    I keep coming back to Brady's question at 11:41 -- if in my distribution, lower numbers are more likely, would there be an easy correction for that?

    • @forasago
      @forasago 5 месяцев назад

      You would have to come up with a formula for how much more likely the lower numbers are and calculate some kind of upward bias out of that. I don't think the answer could be considered "easy", no.

    • @ArcaneOath
      @ArcaneOath 5 месяцев назад +3

      For the purposes of war estimations, I suspect you'd find the opposite true, particularly as time goes on - the data would become skewed towards newer serials for everything, as older models were destroyed or made inoperable.
      Probably best to hash military serial numbers at manufacture time though, regardless.

  • @andrewberryman4957
    @andrewberryman4957 5 месяцев назад +1

    I love James so much. "No, I'm not going to feel the weight of the bag!" So hilariously offended.

  • @YEASTY_COMMIE
    @YEASTY_COMMIE 6 месяцев назад +5

    If you take the simpler formula of twice the average value of the tanks, it actually gives better prediction in this case (34 and 28, if I can still perform additions)

  • @sabinrawr
    @sabinrawr 5 месяцев назад

    Brady's final questions show amazing insight. My favorite anecdote involves SEAL Team Six. There was not a 5, they just used the number to make people think there were more teams. I don't know if this story is true, but I like it and it shows that you have to know the parameters of the numbers instead of assuming a sequence starting with 1.

  • @fespa
    @fespa 6 месяцев назад +4

    Another great and entertaining video. Thank you. I would love to read the paper about the why the spies were so wrong.

    • @Jeff-jr4xw
      @Jeff-jr4xw 6 месяцев назад +1

      Me too. I thought maybe they were being fed false information?

  • @5000rgb
    @5000rgb 2 месяца назад

    I appreciate the 500% more description. A lot of people muddle that up and would say 600% more. They skip past the fact that 6 times as many is 600% OF or 500% MORE.
    I think my estimation used a slightly different technique. If we take the arithmetic mean of all the tank numbers, we get a number that is about half the total. So by taking the mean of the numbers on the tanks pulled out of the bag, we can double it to estimate the total.

  • @PhilBoswell
    @PhilBoswell 6 месяцев назад +13

    RUclips recommended me a short video by Hannah Fry about this very thing just this morning: I don't recall how old the video was but life is strange!

  • @shaun7163
    @shaun7163 6 месяцев назад

    This guy is the absolute best and has been for years!

  • @kleddit6400
    @kleddit6400 3 месяца назад +13

    1:51 “Is that a German tank or?” *every tank enthusiast goes oof*

  • @Ojisan642
    @Ojisan642 4 месяца назад

    James Grime is really a fantastic educator.

  • @spencerarmon4491
    @spencerarmon4491 6 месяцев назад +7

    Would be cool to see the mathematical derivation of calculating the expected value of the tanks using an infinite sum of the probability at the beginning

    • @Last_Resort991
      @Last_Resort991 6 месяцев назад

      It's not an infinite sum when N is finite. It has N elements.

    • @spencerarmon4491
      @spencerarmon4491 6 месяцев назад +1

      @@Last_Resort991 To properly calculate the expected value, it would be an infinite sum from the max number seen to infinity.

  • @julkkis666
    @julkkis666 6 месяцев назад

    I went into this kind of theory in my thesis on anonymizing production data (for test use). Fun to see a real world example.

  • @beal_a
    @beal_a 6 месяцев назад +3

    IIUC, this is also a problem where frequentist and bayesian techniques arrive at different answers. I'd love to see an explanation of that.

  • @ritual_aftermath
    @ritual_aftermath 2 месяца назад +1

    If I would have had a teacher like this when I was young, I'd likely have become a mathematician! Great video, thank you!

  • @Xelopheris
    @Xelopheris 6 месяцев назад +19

    I literally saw the Hannah Fry video about this yesterday and kind of assumed that this would be a Hannah Fry numberphile video.

  • @adrianv.v.4445
    @adrianv.v.4445 4 месяца назад +2

    When calculating the average gap, you should just count the difference between tanks (e.g., between 15 and 1, count that as 14). That way, what you get is the actual (asymptotically) unbiased estimator of the number of tanks. If we do it your way, we get: MAX + (MAX - k)/k = MAX * (1+1/k) - 1, which when we let k->infinity, MAX->Actual_value and therefore we get the Actual_value - 1. It can be also proven that the estimator is biased for any k, outputting smaller values than the real one.
    If we don't add that -k to the formula (that is, we count the gaps as just the difference), the estimator we get is MAX + MAX/k = MAX * (1+1/k), which is the actual un-biased estimator we should use in this case. One may call this the adjusted Maximum Likelihood Estimator (MLE). As you said in the video, the MLE is just the MAX (the number most likely to be right), but it is biased. What we did with this trick was, as you explained, correct it.
    A more standardized way to compute this correction would have been to calculate the Expected Value of the MLE we got, to then apply the necessary multiplicative correction. That is, if it is necessary at all (MLE might as well be unbiased itself). This is one of the most used methods for estimating stuff out in the real world (when we are able to get an MLE).

    • @rennleitung_7
      @rennleitung_7 Месяц назад

      I agree, the definition of the distance looked a bit fishy to me. But as k infinity when k -> infinity. So I think, you might need a more sophisticated reason to make your point.
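One way to examine the bias question raised in this thread is to compute the expected value of the sample maximum directly. The sketch below assumes the setting of the video's bag experiment: k tanks drawn without replacement from serial numbers 1 to N. The symbols M (observed maximum), k and N are used as in the comments above. Under a different sampling model, such as draws with replacement or a continuous range, the result changes.

```latex
% Distribution of the maximum M when k serial numbers are drawn
% without replacement from {1, ..., N}:
P(M = m) = \frac{\binom{m-1}{k-1}}{\binom{N}{k}}, \qquad m = k, \dots, N.

% Expected maximum, using m\binom{m-1}{k-1} = k\binom{m}{k}
% and the hockey-stick identity \sum_{m=k}^{N} \binom{m}{k} = \binom{N+1}{k+1}:
E[M] = \sum_{m=k}^{N} m \, \frac{\binom{m-1}{k-1}}{\binom{N}{k}}
     = \frac{k \binom{N+1}{k+1}}{\binom{N}{k}}
     = \frac{k (N+1)}{k+1}.

% Hence, for the estimator with the "-1" term:
E\!\left[ M \left( 1 + \tfrac{1}{k} \right) - 1 \right]
     = \frac{k+1}{k} \cdot \frac{k (N+1)}{k+1} - 1 = N.
```

Under this assumption the version with the "-1" has expectation exactly N, and dropping the "-1" shifts the expectation to N + 1.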

  • @EXPLICITBG
    @EXPLICITBG 6 месяцев назад +9

    “I will do one”
    Lo and behold, one he proceeded to do

  • @RAFAELSILVA-by6dy
    @RAFAELSILVA-by6dy 2 месяца назад

    I got a solution to this problem in a maths challenge about eight years ago. My approach used conditional probabilities and the expected number of tanks in the bag would be:
    N = MAX(k -1)/(k-2)
    For five observations (k = 5), this gives N = MAX(4/3). This is higher than the average gap approach, which gives N = MAX(6/5) - 1
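The conditional-probability approach in this comment can be checked numerically. The Python sketch below assumes a flat (improper) prior over the total N, for which the posterior weight of each N at or above the observed maximum m is proportional to 1/C(N, k). Under that assumption the standard closed form for the posterior mean is (m - 1)(k - 1)/(k - 2) for k > 2, which is close to the expression quoted above. The function name and the cutoff used to truncate the infinite sum are illustrative choices.

```python
from math import comb


def posterior_mean(m, k, cutoff=20000):
    """Approximate the posterior mean of N given maximum m from k draws.

    Assumes a flat prior on N, so each N >= m gets weight 1 / C(N, k).
    The infinite sum is truncated at `cutoff`; the tail is negligible for k > 2.
    """
    if k <= 2:
        raise ValueError("the posterior mean is only finite for k > 2")
    weighted_total = 0.0
    weight_total = 0.0
    for n in range(m, cutoff):
        weight = 1.0 / comb(n, k)
        weighted_total += n * weight
        weight_total += weight
    return weighted_total / weight_total


if __name__ == "__main__":
    m, k = 30, 5
    print(posterior_mean(m, k))         # numerical sum
    print((m - 1) * (k - 1) / (k - 2))  # closed form under the same prior
```

Both printed values should agree closely, and both sit above the video's m + m/k - 1 for the same m and k, which matches the comment's observation that this approach gives a higher number.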

  • @WAMTAT
    @WAMTAT 6 месяцев назад +4

    Nothing better than James talking WW2

  • @OsamaRana
    @OsamaRana 6 месяцев назад

    This was a delight to watch, like all videos starring James.

  • @Eddy002
    @Eddy002 6 месяцев назад +3

    I think the “failed” demo was perfect since you had to explain not only how it works, but also where the formula fails.
    Reminded me of school. The teacher would teach the easiest way to understand something, but then on a test it would be the hardest example/use of that formula. School failed, numberphile succeeded.

  • @David8n
    @David8n 6 месяцев назад

    When I was doing stats at university the lecturer had us fill in a questionnaire on day one to give us some nice data to do analysis on (birthdays and such). It was all nice data except that there wasn't a single left hander in the class. Not one. There ought to have been about ten but there was zero. Credit to the lecturer, he rolled with it. His attitude was, "these things happen - we don't fudge our data". It was actually a great class.

  • @ventinor7451
    @ventinor7451 6 месяцев назад +6

    Nothing like a James Grime Numberphile video.