How Binary Search Makes Computers Much, Much Faster
HTML-код
- Опубликовано: 29 сен 2024
- Featuring binary versus linear search, and non-clustered indexes. Uh, indices. However you want to say it. • MORE BASICS: • The Basics
Written with Sean Elliott / seanmelliott • Camera by Tomek • Graphics by William Marler wmad.co.uk
🟥 MORE FROM TOM: www.tomscott.com/
(you can find contact details and social links there too)
📰 WEEKLY NEWSLETTER with good stuff from the rest of the internet: www.tomscott.c...
❓ LATERAL, free weekly podcast: lateralcast.com/ / lateralcast
➕ TOM SCOTT PLUS: / tomscottplus
👥 THE TECHNICAL DIFFICULTIES: / techdif
I am, as ever, extremely thankful for animator William Marler for handling all the graphics here!
epic
Pinned one month ago hahaha
Hello
is says your comment is from a month ago yet this video is 13 seconds old....?
One month ago ?! Wow...
When you really think about it, the existence of search engines is something absolutely incredible that we just take for granted
Fun to build as well.
That's why the internet has been called the world's largest library with the world's worst indexing system.
A week after my CS class covered the concept, Tom Scott comes around an explains it in an easier to understand format
My teacher used his video on sorting system
Haha same here, Gr 12?
so what's missing?
@@BloodSprite-tan nothing is missing per se, more that the Cs class also covered how to write the binary search and indexing and sorting to make binary search efficient.
im literally going over this rn in my cs class lmao. RUclips knows
Great to see Tom filming in the real computer history centre, after so many months.
The animation and talk are both enlightening. Thank you for this great video Tom!
0:19 The attention to detail here is incredible. Besides the obvious joke about Tom’s red shirts, John Scalzi is an actual author that wrote a book called _Redshirts_
My teacher: Talks about anything
Me: Bored
Tom: Talks about anything
Me: interested
Bravo! *How to compress a day of college into 7 minutes.* Wish I had this as a student. Knew the theory, but not about Dewy's biography. That was a nice touch which never came up in Computer Studies nor Information Systems, and would have been a good reason why my University used another method.
Here in the US, college and university libraries tend to use Library of Congress cataloging, which has more detailed categories and tends to work better for larger collections. Some large public library systems use it too, like the two largest systems in my area (Hennepin County Library and St Paul Public Library).
My personal favorite index for book orders is how a NY library organizes by height just so they can fit all their books on shelf's.
That can go hand in hand with Dewey order, though: arrange bookcases in Dewey order, and arrange the shelves in each case according to how tall the books are in that case's subrange of Dewey numbers.
This sounds like a dumb system until you're actually faced with trying to put a book on a shelf that is too short. Then, suddenly, sorting by size is the only system that makes sense.
Most libraries I've seen have a section for oversized books. The rest of the books are then ordered by some sort of numbered classification system.
@@chaosordeal294 My preservation & conservation teacher in library school once mentioned that ordering books by height is a good way to keep them from getting uneven exposure, which helps with their long term conservation.
*shelves
Love the sidenote on Dewey, good and important of you to say.
3:02 Thought you might want to know that the captions are messed up! It says "forget about the side w nd so on," instead of "forget about the side and so on"!
Was the error you made intentional?
@@barneystinson3358 you?
@@barneystinson3358 Are you okay, Barney?
It's forget about the side we don't want and so on
You should be able to suggest an English translation by clicking on the three dots directly under the right side of the video.
5:36 I'm using the triangle to describe the struggle between CPU, ram, and disk space. The size of the triangle is showing the size of the impact on the system that the program will have.
4:02 genuinely expecting Tom to say “a cluster ****” 😂
Great video, Tom! I work with huge databases on a daily basis and was never formally trained so this helped me understand indices better. (Also, I like that you say "indices," since I'm generally alone among my coworkers, who say "indexes.")
OK, so if you code (a great tool for RDBA), build your own dataset. Then build your own index. Then a deeper one. Then run tests for lookups on all three datasets.
Dewey is a fantastic example of an uncomfortable truth. Sometimes terrible people can do great things.
Dewey is not the only example. Heisenberg is literally Nazi
I would've liked this video anyways, but it gets EXTRA love for the shout-out to John Scalzi at the very beginning.
I'm literally learning about this in my data structures class right now! Thanks for the review.
I legitimately believe your short videos should be shown in a variety of subject lessons in secondary schools nationally.
They are informative, accurate, researched and extremely well explained by taking the form of a narrative.
People can really learn about the way the world around them works from your efforts.
I was so engrossed in the library sorting segment at the beginning that I forgot the title and length of the video and upon seeing the intro, thinking it was the outro, went "Oh that was nice." And went to scroll down when Tom continued to speak and spooked me.
I started 5 weeks ago with my computer science studies and got this topic (the basics of it) 3 weeks ago, but Tom can explain it way better than my teacher did
"Binary search" is correct but confusing, so that;'s why many in the industry use the term "binary chop". As always, interesting and presented in a great way.
5:40: modern computers having literally hundreds of gigabytes of space and insanely efficient compression algorithms
Came to youtube procrastinating studying about binary search to find Tom Scott release a video about it. I think it's a divine sign to go back to study.
i like how tom gets really excited when talking about this..
Funny thing is, earlier today I was working with a classmate on an assignment with Binary Trees. Turned out one of the solutions we couldn't figure was indexes. Gotta love Tom
I javascript letters have numerical weight for them, "a" > "b" is true. I was completely confused when I learned about the "sort" method for arrays that operated exclusively on whether it received 1, 0, or -1 I had no idea that here you can do math with letters as if they were numbers.
Years ago I was working on a computer program on a very slow computer, and I had just read an article on binary sorting. Using that knowledge, I rearranged the decision process in the software which gave me a 40% speed increase. I've been using that trick ever since :).
Nice reference, Scalzi's 'Redshirts' is a very good book.
Congrats Tom. You are required listening for CS 2 at my school.
I just learned DB indexing without expecting to.
I love educational videos.
A113 easter egg on a Tom Scott Video? More likely than you think.
If you don't know how indexing works, then it becomes a certain magical index.
Great explanation. Particularly answered my question of "why doesn't mySQL just create an index for every column by default?"
I was all ready to type "The plural of index is indices, not indexes . . . and that's something you might not have known!"
But then you corrected yourself, because you did know.
nice a113 reference
I used to work at a place that had the same or similar model of disk drives that are behind Tom in this video. When a record had to be read from disk for each test in a search, if a file had millions of records, because of the time needed for disk I/O, even a binary search would be too slow. We had an indexing system where one read of the index file would bring in dozens to a couple of thousand indices. A search of the indices read would take place in memory, and result in either the location of the record, the discovery the target record didn't exist, or a location where another set of indices were to be read. The number of disk reads needed was reduced from (log base 2 N) for a binary search to (log base i N), where i is the number of indices read with one I/O, and N the number of records in the file.
Have you heard about the insane Everything search for Windows?
It's a program that, within seconds, indexed all your disks (for me it's 1TB+2TB) with about 1TB of data and is then able to *instantly* search for anything, any little file on your computer or sort by size, last modified, anything.
Thanks for expanding my understanding.
And the next video in the series? Maybe Hashing?
Regular watchers of sorting algorithm visualisers are overjoyed at the existence of this video.
I'm confused why I was intrigued enough to watch this after a decade of coding, but here I am.
Whenever anyone mentions Melvil Dewey I am compelled to inform them (and anyone within earshot) about Henry Evelyn Bliss and spicy library history drama.
And yet you don't provide the spicy library history in this comment!
Well, you normally use a hash table for the primary key index, not rely on binary search for that. Binary search is common for other indexes though.
I can predict the future, this video will be shown in computer science classrooms for years to come
0:33 nice toy story reference - A113 on the box xD
Saw the title because I literally did this is in computer science today, while also comparing to linear search
3:55 I feel like this has something to do with why Facebook marketplace has abysmal sorting features.
2:19 I always confused with "ty" and "teen"
We need a full blown video about bartender's elbow
I just recently discovered his "how to read in binary" video. The sequel was quick.
0:40 Nice box number you got there, such an interesting number...
*Indexes
Indizes = multiple registers
Indexes = multiple indexes in a database
Tought by my university professor in Germany
Thanks to Melvil Dewey we don't need to ask the librarian where a specific adult book is.
If I had to make a search function it would be by relevance what is trending would be at the forefront in the RAM. But what we have now will suffice. A good project would be able to access anything in data centers in an instant or the closest thing to that.
This was a brilliant explanation. So clear to understand binary search and how it is superior to linear search. Thank you!
Tom, I do not know how you find the time and effort to produce these videos but they are brilliant! Top one. Nice one. Get sorted.
I learn more on the internet than I did at school.
Hi Tom. having a pedantic moment. The computers speed does not change. What has been achieved is a faster algorithm that provides results quicker. The poor old computer is non the wiser, it still runs at its designed speed regardless of what is being executed.
Nice reference to Pixar and a113!
I prefer 'indexes' for a plural
Also, congrats on being in a Pixar movie!
Corona brought me here and thank god for that
god, i only JUST noticed the Doctor Who reference at 0:32
Nice Pixar reference
Nice summary
Those booktitles! 😁 Also, Bos A113, Lux Foundation 😁
For a short while I thought, waw great green screen technic! The even added reverb. Then I saw the reflection on the disc stack. ;-)
It's quite simple: Database indexes contain the indices of the data, but in a different order, so you can quickly search the index to find the index of the original data you wanted.
Love your channel bro
"don't you know the dewey decimal system" conan the librarian
How about bit shifting vs divide?
Quake 3's fast inverse square function is a fun one.
1:43 and onwards untill the music at 1:56, it felt like the video was over.
Excellent video! If you know the story, it can also be a good way to search for a good bit in a well-thumbed book.
I also need to check out that book by this "Marler" fellow...
hmmm fresh content, thank you
YAY !!! MORE BASICS!!!!!!!!!!!!!! YAHHHHH I AM SO SO SO SOS OS SO OSSOOSOSOSOOSOSO HAPPY (I may need the basics of spelling , grammar and when caps should be used .)
I love that the red book chosen is RedShirts by John Scalzi.
I will use the Dewey Decimal System to find books on sexual harassment.
there is actually a third kind of search:
Windows search. It takes your query, scraps it, returns 6 random database entries and 2 ads. Then it recursively shuffles the deck to look busy until you close it in frustration.
Are you talking about the search bar in Windows 10?
This is the almost the only way I launch apps/programs and how I convert currencies and timezones and translates words and lookup words definitions and lookup general info about movies/shows/actors. And it works perfectly for these cases.
I've never actually used it for searching for files, because I always know where exactly my files are 😅
Pop os
@@HassanSelim0 It works fine for sure, but there are many times where it plain sucks - So my idea why its bad it because of its index/priority system - Like it should value .exe files or important app's the most, but often I haft to spell out the entire thing before it even suggest it, or other issues. Its nothing like googles auto fill for sure, but its also based on high-machine learning and spying on us...Something MS search bar also does, but not as good as google textbox I figure.
@@HassanSelim0 it refuses to find programs for me too btw. ive tried for a very long time to launch OBS from the search bar, until i gave up and just added another desktop icon
@@HassanSelim0 How do you do "convert currencies and timezones and translates words and lookup words definitions and lookup general info about movies/shows/actors" with windows search?
Tom did what I think most people should do about terrible people in history: acknowledge their work, while making sure the world remembers they were terrible.
Well said. All this tearing down of statutes because the subject's actions don't fit today's morality has to stop.
Cry me a river hippie. Go back far enough and everyone was “terrible” based on modern weak minded standards.
@@jangamaster8677 People like you wouldn't survive a day without "modern weak minded standard"
@@jangamaster8677 so sexual harassment is a modern weak minded standard?
Omar B. Kar not exactly his point... calm down guys
this is my new favourite pixar movie
Wait..the a113.... Nice one.
I'm temped to reply, because I like your videos and wanna be noticed for a second, and you did the same thing
okay, but thanks for stealing my comment Seamus 🙄
Note the box is from the Lux Foundation Library
Rupert You realise the graphics guy did that so people, not just you, would find it? It’s not like it’s extremely hidden, a lot of people noticed.
For those that are still struggling with understanding binary search: Imagine telephone books not having the names in sorted order, and you have to look to ALL the names until you find the right one. That would be so unpractical. Instead, the entries are ordered by name, and you just keep skipping over a couple of pages until you overshoot, and then go back and look closer. That is the kind of magnitude we are talking about.
I understood it, but this is a good analogy.
What the heck is a telephone book? You mean ebook.
ThePotaToh yellow pages
To realize I'm this old to remember scouring names in Yellow Pages and this action helped to understand binary research even after watching Tom's video is.....
unbelievable (say it like MrWhoseTheBoss).
Jk, good analogy. Thank you!
@@ThePotaToh your too young. Think of "contacts" list
Binary search for Tom Scotts' age. Results: Somewhere between 15 and 50
lmaoo
35.
@Hunter The Based God Burn
The speedy secret of the google algorithm is ignoring as many search terms as possible "unless" "you" "do" "this" "forever"
Seriously! Is it just me, or has Google search massively gone downhill over the last decade? It's just broken auto-answers, AMP pages, ads, and the same dozen or so top results that come up no matter how you tweak the search query.
Leo It's probably search engine optimization catching up to current algorithms. Although a search engine's job is to get you what you want, those who want you to browse nonsense you don't want (e.g. clickbait headlines) aren't into that.
especially annoying that google ignores keywords like "not". Its really hard to search for thing like "shirt that is not red" as google just gives you pictures of tom scott.
@@real_dddf While Google does not treat "not" as a keyword it does allow you to exclude words using the - operator. A search for Shirt -red for instance will only show results where red is not mentioned.
@@real_dddf there are options for this. if you put a '-' in front of a word, it won't show results with this word. try searching for "Tom scott the park bench" and "-Tom scott the park bench" and see how the results differ (without the " ) (or in your case "shirt -red". The non-advertisement results are all not red)
Dictionaries in some languages actually have a second indexing system. In Chinese for example, since the written character does not necessarily relate to its pronunciation, dictionaries often have a by-stroke index at the end. (The main index, i.e. the order characters appear in, is ordered by the latinization of the characters' pronunciations. There are multiple latinization schemes, but dictionaries will usually pick one depending on the region.) This way, if you know how a character is written but don't know how it's pronounced or latinized, there's still a way for you to find it.
Cool
Today, Hanyu Pinyin is the standard in all Sinophone countries. Older books might still use one system or another, such as Wade-Giles, Yale, etc. but today most won't stray from Pinyin
That is, unless it's Taiwan. The Zhuyin alphabet, developed specifically for Chinese under the Republic of China, is still widely used for how intuitive it is
The volume called "Red Shirts", one of Tom's favourites.
Also a real book, one of my favorites I've read!
Fun fact: Isaac Asimov had at least one book in every main section of the Dewey Decimal System.
I can’t wait to read the one on history of Australia /s
@@AlexTheGamer97 spoiler alert: a lot of get killed
Sneaking around slipping copies on the shelves when the librarians' backs were turned.
@@AlexTheGamer97 You will never find it. It was stolen ;-)
@@666Tomato666 as compared to any other colonial country?
Makes a book about how to order a library
Librarians:
where does this go then?
025 Library operations
Rick Astley is president check on my channel
Yo Dewey, we heard you wanted to sort books about how to sort books, so we put a section for books about sorting books in your method for sorting books so you can search for books about searching for books.
@@HeavyMetalMouse wasn't it yo dawg? Nevertheless, you just posted meme props to you gent sir.
Bertrand Russell has entered the chat
"I don't remember the title, but I know it was red"
_pulls out book called redshirts_
Hah, it's cause Tom wears red shirts.
_By John Scalzi_
Wait, I know that author, I've read his books.
OH. I READ _THAT_ BOOK.
omg I didn't even notice that! Fantastic book that. And I don't even watch Star Trek!
I dont get it
@@Vocaloid_Cevirmeni i don’t know what the exact reference is, but there’s probably a real book called redshirts by John Scalzi
@@kwibloupthesomething There is
I already know this from the education system, but I still feel like I’m learning when I watch this
I bet you dont study in a public school in the us. Just a feeling
It's a type of storytelling and science communication.
@@Tomer_Zaitsev I learned this in public school
Tomer Zaitsev No, a public school in the UK actually
@@seastilton7912 Were you perhaps from the era when they had BBC Micros in the schools?
And how binary makes Tom's videos get put in everyone's recommended much much faster
What???????
Yay!
Nothing against it...
Box A113! Can’t skip that past me!
From the Lux Foundation Library, no less
I rewound the video because I didn't notice, then when I saw I giggled
saw that too haha
0:40
Can anyone say why it's such a significant number?
The day Tom changes t-shirt will be the day the end is near
Lmao
This guy is a computer scientist, philosopher, artist, musician, and incidentally a RUclipsr.
+ linguist
+ game show semi-finalist
@@KoyasuNoBara The most important one
+ grumpy emoji expert
+ ex-politician
Ar-r-r!
2:09 there's also exponential search (start at the lowest index, double it until you've passed the thing you're looking for, then perform a search (usually binary search) between where you ended and the index you looked at right before). It's used when you don't know how many items there are.
To be fair, Google is amazing how it does things. However, a lot of the searches will be cached and returned for similar searches, and they have thousands of servers, so people access different ones. Still amazing tho given the sheer scale of it all!
I can't imagine they just make it snappier by some engineer saying "ah stick an index on that column it'll be better" :D
wrong
Tom: “It’s a long, long way...”
My Brain: “TO BA SING SE”
Fantastic video as always Tom!
There's no war in Ba Sing Se.
they look so prettayyyy
to mukumbura!
We could always take the Secret Tunnel.
@@starlightsall Through the mountain? *(But that would get us to Omashu)
Now that Arthur Library Card song makes sense to me finally I know who Dewey is!
DW: WHO IS DEWEY
Tom: Terrible person. Absolutely awful.
The secret sauce to Google indexing is the same as Dewey's system : inherent built-in biases liberally sprinkled with revolutionary thinking.
Using GaGa for a search is like doing "research" using the yellow pages. The best plumber must be "AAAAAAAA111 PLUMBing'
JR
"How bad do you have to be to be kicked out in 1905?" Is literally so funny I had to pause the video.
"if it's unordered, there are no shortcuts - linear search is the best you can do"
quantum search: allow us to introduce ourselves
Well thats cheating
I guess, you could turn a search algorithm into a Quantum Annealing problem, which would make it work on a D-Wave QC. So that would mean you can handle an search of *drum roll* 2048 items! A general purpose QC would do a lot better on this problem, as it can actually work with qubits representing an index, rather than a bit mask, but the largest practical search algorithm I've seen was with 10 qubits, making it work as a search in an index of just 1024 items, which is even worse. We might have pushed it a bit further, I'm a bit out of date, but the point is, theoretical work on quantum computing greatly outpaced what we can do in practice. For a QC to replace indexed binary search and use unindexed data directly, you'd need 30-40 qubit general purpose QCs. And there are some really hard problems, both engineering and theoretical, that prevent us from pushing a qubit count that high on general purpose QC.
@@konstantinkh bro just solved quantum computing in a RUclips comment
@@michaelba7791 Not sure I'm following. I'm explicitly pointing out why practical quantum computing we have isn't adequate. And my background is quantum theory, albeit, in particle physics rather than computing. Though, I have been out of academia for nearly a decade now.
@@konstantinkh they were being sarcastic because they hardly understand what you're saying, most likely
I like how this animator pays homage to A113
Sjoerd Quaak And Doctor Who series 4 on the same box
Dewey was a bad, google is a bad.
Conclusion: indexing makes bad people.
“Was a bad”? Why are you talking like a goddamn toddler?
@@Jayfive276 sounds like you are also a bad.
@@Dunkster74 Wee aw all dhe baad