I Made a FAST Search Engine

  • Published: 20 May 2024
  • Get $15 free credits with BrightData: brdta.com/conaticus1
    BrightData YouTube Channel: @BrightData
    TF-IDF Blog Post: janav.wordpress.com/2013/10/2...
    Lemmatization Word Lists: github.com/michmech/lemmatiza...
    Crawler Repository: github.com/conaticus/search-e...
    API Repository: github.com/conaticus/search-e...
    Client Repository: github.com/conaticus/search-e...
    Discord: / discord
    Github: github.com/conaticus
    Twitter: / conaticus
    Join this channel to get access to perks:
    / @conaticus
    0:00 Intro
    0:20 BrightData
    2:10 Inverse Term Frequency & Indexing
    6:41 Page Ranking & Lemmatization
  • Science

Comments • 169

  • @conaticus
    @conaticus 1 month ago +39

    Start building awesome projects with $15 free credits using BrightData today: brdta.com/conaticus1

  • @lifeofme702
    @lifeofme702 1 month ago +283

    I don't know what this guy said, and I was still mind-blown by all the effort this guy puts in

    • @conaticus
      @conaticus 1 month ago +17

      Thanks so much 🙏 It would not be possible without your support

  • @jaymarksum6542
    @jaymarksum6542 1 month ago +254

    I’m impressed, can’t wait to see you build a multithreaded web server in assembly

    • @da40au40
      @da40au40 1 month ago +8

      Why do I find it super funny 😅😅😅.

    • @ArthursHD
      @ArthursHD 1 month ago +2

      @@da40au40 Me too :D

    • @DanskeCrimeRiderTV
      @DanskeCrimeRiderTV 1 month ago +2

      It's not impressive. Of course querying a few hundred or even a hundred thousand web pages isn't as complicated or slow a task as querying trillions of web pages.

    • @KibitoAkuya
      @KibitoAkuya 1 month ago

      @@DanskeCrimeRiderTV Google also wastes time deciding whether or not you are allowed to see certain sites

    • @DanskeCrimeRiderTV
      @DanskeCrimeRiderTV 1 month ago +1

      @@KibitoAkuya what does that have to do with anything? Google is still faster at querying trillions of results than this.

  • @coderx8634
    @coderx8634 1 month ago +27

    Love your content. You and your quality have really improved. Keep it up ❤

    • @conaticus
      @conaticus 1 month ago +2

      Thanks so much, your support means a lot ♥

  • @asm_x86
    @asm_x86 1 month ago +68

    That's really impressive, I can't even figure out how to run it.

    • @ZuperPotato
      @ZuperPotato 1 month ago +9

      Nice username

    • @conaticus
      @conaticus 1 month ago +17

      Just added some instructions to the READMEs if you're interested :)

    • @asm_x86
      @asm_x86 1 month ago +4

      @@conaticus thanks, I'll do that

  • @greensporevalley
    @greensporevalley 1 month ago +386

    SERBIA MENTIONED 🎉🎉🎉

    • @europa_the_last_battle
      @europa_the_last_battle 1 month ago +12

      Now waiting for Russia 🥰

    • @RealMephres
      @RealMephres 1 month ago +15

      ​@@europa_the_last_battle>goes to comments
      >sees meme comment
      >looks at replies
      >only a LARPer replied
      lol

    • @MAXHASS-ph5ib
      @MAXHASS-ph5ib 1 month ago +19

      @@RealMephres this aint 4chan nga

    • @jawadmansoor6064
      @jawadmansoor6064 1 month ago +1

      that name rings a bell, maybe from some kind of Serbian movie?

    • @RealMephres
      @RealMephres 1 month ago +4

      @@MAXHASS-ph5ib tell that to the LARPer dawg

  • @ccost
    @ccost 1 month ago +58

    7:40 flashing those questionable websites in a sponsored video is quite the move

  • @coderan5029
    @coderan5029 25 days ago

    This is basically what we learned in my big data class, but we used map-reduce to do the TF-IDF calculations, so it's impressive you figured this out on your own

  • @rafaelpereiracoias1047
    @rafaelpereiracoias1047 1 month ago +1

    Nice video and nice code, keep up the good work!

  • @ExpandedCuber
    @ExpandedCuber 1 month ago +5

    Let's go another conaticus video

  • @foqsi_
    @foqsi_ 1 month ago +2

    Love this dude and his video projects

  • @GermanTimecrafter
    @GermanTimecrafter 1 month ago +1

    Such a cool video! I love the way you explain what you are doing :)
    Random question, but what is your editor font?

    • @conaticus
      @conaticus 1 month ago

      Appreciate it :) I'm using JetBrains Mono; it's free to download

  • @MySachincool
    @MySachincool 15 days ago

    Subscribed & notifications on :)
    you deserve more recognition bruh

  • @polyshrub
    @polyshrub 1 month ago +2

    This is very impressive, what was the size of the database when indexing is finished? Seems like it would be quite big

  • @turb0004
    @turb0004 1 month ago +1

    Please finish your file explorer in rust fully, because the idea of it is awesome. Love your videos, content is very engaging 🎉

  • @iritesh
    @iritesh 1 month ago

    Awesome effort ✨

  • @6IGNITION9
    @6IGNITION9 1 month ago +6

    filter out JS for another 10x bandwidth savings
    alternatively use an adblocker. (can puppeteer do that? It's just chromium right?)

  • @R_Y_Z_E_N
    @R_Y_Z_E_N 28 days ago +1

    Google also does the same, but with distributed computing to reduce the overall time.
    Just scale the database horizontally and mimic Google's approach

  • @madalenaferreira3018
    @madalenaferreira3018 26 days ago

    great video, gave me ptsd from my information retrieval class though

  • @allenfpascua
    @allenfpascua 1 month ago

    Super good editing 🫡🫡🫡🫡

    • @conaticus
      @conaticus 1 month ago

      Would not be possible without your breathtaking animations 😄

  • @stayhappy-forever
    @stayhappy-forever 1 month ago +2

    thats insane, hows this only at 12k views

  • @Nerdimo
    @Nerdimo 1 month ago

    Impressive, seriously!

  • @alexmoses3215
    @alexmoses3215 12 days ago

    Programming 🤝 martincitopants…match made in heaven

  • @a6gitti
    @a6gitti 1 month ago

    Supa dope. I would like to use this search engine of yours

  • @devinlauderdale9635
    @devinlauderdale9635 1 month ago +32

    The problem is this approach is susceptible to SEO spamming/invisible SEO keywords

    • @conaticus
      @conaticus 1 month ago +10

      Yeah for sure, realistically it should be moderated based on user interaction as well

  • @SG-kn2jl
    @SG-kn2jl 1 month ago +5

    Why did you choose TF-IDF instead of word2vec or any context aware model?

    • @skorp5677
      @skorp5677 1 month ago +1

      +1 Would like to know

  • @80sVectorz
    @80sVectorz 1 month ago +1

    3:07 Best pronunciation of Euclidean I have ever heard :P

  • @errplane_
    @errplane_ 1 month ago +5

    oh my fuck i saw this on your github last night

  • @jugurtha292
    @jugurtha292 1 month ago +5

    Very nice, I built something similar for my info retrieval class. We had to use the Okapi BM25 formula for the ranking, but overall it was very similar: scrape, tokenize, parse, inverted index, rank
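
For readers unfamiliar with the pipeline named in this comment, here is a minimal sketch of the last two steps (inverted index, then Okapi BM25 ranking). It is not the code from the video or from the commenter's class: the function names `build_index` and `bm25_search`, the sample documents, and the constants `K1 = 1.5` and `B = 0.75` are all assumptions chosen for illustration, and tokenizing is reduced to lowercasing and splitting on whitespace.

```python
import math
from collections import Counter, defaultdict

K1 = 1.5   # term-frequency saturation (a commonly used default, assumed here)
B = 0.75   # document-length normalisation (a commonly used default, assumed here)


def build_index(docs):
    """Build an inverted index: term -> {doc_id: count}, plus document lengths."""
    index = defaultdict(dict)
    lengths = []
    for doc_id, text in enumerate(docs):
        tokens = text.lower().split()
        lengths.append(len(tokens))
        for term, count in Counter(tokens).items():
            index[term][doc_id] = count
    return index, lengths


def bm25_search(query, index, lengths):
    """Return (doc_id, score) pairs sorted by descending BM25 score."""
    n_docs = len(lengths)
    avg_len = sum(lengths) / n_docs
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue  # term appears in no document
        # Rare terms get a larger weight than common ones.
        idf = math.log((n_docs - len(postings) + 0.5) / (len(postings) + 0.5) + 1)
        for doc_id, tf in postings.items():
            norm = tf + K1 * (1 - B + B * lengths[doc_id] / avg_len)
            scores[doc_id] += idf * tf * (K1 + 1) / norm
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample documents, not pages from the video.
docs = [
    "rust web crawler",
    "fast search engine written in rust",
    "cooking pasta at home",
]
index, lengths = build_index(docs)
results = bm25_search("rust search", index, lengths)
for doc_id, score in results:
    print(doc_id, round(score, 3), docs[doc_id])
```

Only documents that share at least one term with the query are scored, which is the main reason an inverted index keeps lookups fast: the third document is never touched for this query.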

  • @dreamsofcode
    @dreamsofcode 1 month ago +11

    🔥🔥🔥

  • @user-xl2om2up2x
    @user-xl2om2up2x 1 month ago +2

    W ad plug, it's 100% relevant and actually necessary to fulfill the premise of this vid.

  • @maksymilianglowacki1409
    @maksymilianglowacki1409 1 month ago

    Is this engine online (or would it be able to be online for other users) so others could also enjoy it?
    Or was it just a peek, or something you made because you were bored or something?

  • @gaimnbro9337
    @gaimnbro9337 1 month ago

    Nice job :D

  • @jsalsman
    @jsalsman 1 month ago

    I believe it's "inverted indexing", as inverse indexing is something else.

  • @yorailevi6747
    @yorailevi6747 1 month ago

    how much did you pay for the web scraping service in total?

  • @lonelybookworm
    @lonelybookworm 1 month ago +3

    Well of course it is very fast, it only has like 200 websites

  • @thekwoka4707
    @thekwoka4707 1 month ago

    How much did the scraping cost if it wasn't free?

  • @MortonMcCastle
    @MortonMcCastle 1 month ago

    Good! The world needs a new Google Search, one that's more like how it was in the 2000s.

  • @mahrezjanati3426
    @mahrezjanati3426 1 month ago

    first time watching a vid of yours ...
    i have one question : why are you vibrating ??

    • @-rate6326
      @-rate6326 1 month ago

      Cause he is vibrator

  • @datainsight1724
    @datainsight1724 1 month ago

    Next time use the Common Crawl dataset ;)

  • @HyperCodec
    @HyperCodec 1 month ago +1

    Bro managed to memleak in js

  • @ethanstewart1011
    @ethanstewart1011 26 days ago

    How did you manage to get a node.js memory leak??

  • @binpersonal
    @binpersonal 1 month ago +1

    "some fucking genius" lmao

  • @TheRealMangoDev
    @TheRealMangoDev 1 month ago

    good vid

  • @gopallohar5534
    @gopallohar5534 20 days ago

    ain't see rust there!

  • @joenutt1232
    @joenutt1232 1 month ago +3

    Create your own database engine for shits and giggles

  • @carlitosdummy
    @carlitosdummy 1 month ago

    i love this channel

  • @animeworld4775
    @animeworld4775 1 month ago

    What are the things I should know or learn to create projects like these?

    • @GONDWANA-de4od
      @GONDWANA-de4od 1 month ago +1

      HTML for website creation
      CSS page designing
      Javascript for making website dynamic and for backend
      SQL for indexing
      Rust for fast backend services

  • @larry_berry
    @larry_berry 1 month ago

    Lol. Got notif after clicking the video.

  • @SlimyFrog123
    @SlimyFrog123 1 month ago

    Now make your own email system to go along with it. 😉

  • @lazarusNoob
    @lazarusNoob 1 month ago

    You should host it

  • @gammongaming9081
    @gammongaming9081 1 month ago

    yk what would be funny? making the slowest search engine possible without like halting the program for a set time, just with maths

  • @Raven-fu1zz
    @Raven-fu1zz 1 month ago

    Remember, never return an over 18 site without an over 18 word in the search request

  • @gamedirection_us
    @gamedirection_us 1 month ago

    🍎 👀
    .. Apple being like "when will it be ready?".

  • @AquaQuokka
    @AquaQuokka 1 month ago +19

    Rewrite your genetic code in Rust.

    • @pyyrr
      @pyyrr 1 month ago

      i would rather be bug free so i will pass

  • @fangg194
    @fangg194 1 month ago

    you seem ok

  • @igrb
    @igrb 27 days ago

    nice

  • @Tech_Code127-76
    @Tech_Code127-76 1 month ago

    Good

  • @monotonedevelopment
    @monotonedevelopment 1 month ago +1

    If only windows file explorer could do the same

    • @SandWire
      @SandWire 1 month ago +1

      For this we have a tool named Everything :)

  • @a224kkk
    @a224kkk 1 month ago +1

    Nice, you re-invented the lucene library

  • @J0Y22
    @J0Y22 1 month ago

    shockedd

  • @thescratchguy428
    @thescratchguy428 1 month ago

    at a desert

  • @_DarkLiquid
    @_DarkLiquid 1 month ago +1

    discord clone when

  • @playtatus1758
    @playtatus1758 1 month ago

    how do you edit your vids

    • @conaticus
      @conaticus 1 month ago

      Allen uses Adobe After Effects for the amazing animations - I just use DaVinci to cut things up 😁

    • @playtatus1758
      @playtatus1758 1 month ago

      @@conaticus ok thx

  • @humanontheinternet6510
    @humanontheinternet6510 13 days ago

    Auto solve captcha you say🧐

  • @ALTERRAa8
    @ALTERRAa8 1 month ago

    6:08 nahhhhhhhhhhh whats bro even searching 💀💀💀💀

  • @Serhii_Volchetskyi
    @Serhii_Volchetskyi 1 month ago

    🔥🔥🔥
    I was looking for that algorithm and didn't know its name.

  • @Xanmattauri
    @Xanmattauri 1 month ago

    @google acquire this man

  • @trolIface_
    @trolIface_ 1 month ago +1

    hub 🎉🎉

  • @v037_
    @v037_ 1 month ago

    I found a worthy opponent

  • @monkshee
    @monkshee 1 month ago

    damn

  • @Ayymoss
    @Ayymoss 1 month ago

    MAKE LONGER VIDEOS

  • @dylhack
    @dylhack 1 month ago

    da goat

  • @daemonkisure2952
    @daemonkisure2952 1 month ago

    how can i install this search engine?

    • @conaticus
      @conaticus 1 month ago

      Instructions are on the Github repos :)

  • @neologicalgamer3437
    @neologicalgamer3437 1 month ago +1

    Bro sounds like WilburSoot

  • @Macellaio94
    @Macellaio94 1 month ago

    Liked and subbed

  • @sleepybraincells
    @sleepybraincells 1 month ago +3

    Why is there Rust in the thumbnail? This was written in Javascript

    • @conaticus
      @conaticus 1 month ago +2

      Used Rust for the API and TF-IDF matching - decided not to keep in much of the footage for that as it was already explained in the animations

  • @Faeest
    @Faeest 1 month ago

    Why do disallow and user-agent matter? Can't you just scrape everything?

    • @skorp5677
      @skorp5677 1 month ago

      You can but it might be illegal
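
To make the `Disallow` and `User-agent` question above concrete, here is a small sketch of how a crawler might check a robots.txt file before fetching a page, using Python's standard `urllib.robotparser`. The robots.txt content, the crawler names `MyCrawler` and `ExampleBot`, and the `example.com` URLs are invented for illustration; this is not the crawler from the video.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt: every crawler is barred from /private/,
# and a crawler identifying itself as ExampleBot is barred from the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: ExampleBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(user_agent, url):
    """Return True if this robots.txt permits user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


print(allowed("MyCrawler", "https://example.com/index.html"))
print(allowed("MyCrawler", "https://example.com/private/page"))
print(allowed("ExampleBot", "https://example.com/index.html"))
```

The `User-agent` line selects which crawler a group of rules applies to, and each `Disallow` line names a path prefix that crawler is asked not to fetch. Nothing enforces these rules technically; a crawler has to choose to check them.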

  • @Miluum
    @Miluum 1 month ago

    1:06 automatically solve captchas? i knew these things exist just to waste our time and energy

  • @danielisop3182
    @danielisop3182 1 month ago

    What did u mean by the websites u shouldn’t have searched

  • @user-fj5ts6sz1f
    @user-fj5ts6sz1f 1 month ago

    rust is a real badass❤❤

  • @AhmedMahmoud-ec4kz
    @AhmedMahmoud-ec4kz 13 days ago

    Great video 😊
    FYI: bright data is an Israeli company 😮

  • @juniordevmedia
    @juniordevmedia 1 month ago +2

    what TF is IDF ?!!

    • @neofox2526
      @neofox2526 1 month ago

      idk man but watching it makes me feel smart

    • @jamesbarret4240
      @jamesbarret4240 1 month ago +1

      Term frequency (the number of times a given word shows up in a specific document) - inverse document frequency (a measure of how rare the word is across all documents, so words that show up everywhere count for less). The Wikipedia article is pretty good: en.wikipedia.org/wiki/Tf-idf
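
As a rough illustration of the TF-IDF idea asked about in this thread, the sketch below weights each word by how often it occurs in one document (term frequency) multiplied by the logarithm of how rare it is across all documents (inverse document frequency). This is one common textbook variant, not necessarily the formula used in the video; the function name `tf_idf` and the sample documents are assumptions made for the example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            # tf = share of this document's words; idf = log(N / df)
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical sample documents, not pages from the video.
docs = [
    "rust is fast",
    "search engines rank pages",
    "rust search engine",
]
scores = tf_idf(docs)
for term, weight in sorted(scores[0].items(), key=lambda kv: -kv[1]):
    print(term, round(weight, 3))
```

In the first document, "fast" outweighs "rust" because "rust" also appears in another document, so it says less about what makes this document distinctive.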

  • @susannerudolph8469
    @susannerudolph8469 1 month ago +2

    then brightdata makes captchas useless

  • @chiroyce
    @chiroyce 1 month ago

    What are the consequences of scraping sites you aren't allowed to?

    • @conaticus
      @conaticus 1 month ago +1

      Probably not much on its own as long as you're not violating copyright - however it is courteous not to scrape sites forbidden by the robots.txt

    • @trollinqu
      @trollinqu 1 month ago +1

      wastes their resources and yours

  • @latrapa918
    @latrapa918 1 month ago

    105

  • @ph03n1x_dev
    @ph03n1x_dev 1 month ago +1

    You made a search engine for porn?! Thats disgusting... is it on GitHub?! 👀

    • @conaticus
      @conaticus 1 month ago

      All open source and ready to play around with 😂

  • @_sohom
    @_sohom 1 month ago

    Make a better version of VSCode.

  • @konstantinsotov6251
    @konstantinsotov6251 1 month ago

    We had a hackathon where we basically had to implement TF-IDF - also a search engine of a sort, but for files. We did the interface in Python and all the mathematics processing in C++. It would have been a fun experience if not for the time limit. We struggled really hard; on test data our solution was faster than most other participants' by an order of magnitude or two, but... we somehow failed on the exam data. We failed fucking IO. And won nothing. I fucking hate hackathons since then. Fuck IDF.
    Also, maybe this happened because I had written 75% of the code, while 4 other members did almost nothing. It was (their) responsibility to handle IO, and mine to handle mathematics and processing. I hate working in teams. I know no one cares but I might as well just burst out all of the rage I have towards that experience. Once again, fuck team work, fuck hackathons, fuck my teammates, fuck everything and everyone

  • @kavinbharathi
    @kavinbharathi 1 month ago +1

    Not to be the 🤓☝️ guy, but "Jana Vembunarayanan" is pronounced 'Ja' as in 'Jarvis' and 'na' as usual. Just fyi

    • @conaticus
      @conaticus 1 month ago +1

      Thank you, I'll do this if I ever pronounce it again 😂

  • @vrljk
    @vrljk 1 month ago

    SRBIJAAAAAA

  • @lukamajcenic1172
    @lukamajcenic1172 1 month ago

    This is just an ad for BrightData. Compared to previous videos very low effort.

  • @deadshadow759
    @deadshadow759 1 month ago

    this result dont make any sense xha... very fast

  • @planktonfun1
    @planktonfun1 1 month ago +20

    Still not fast and scalable enough. The result is not even relevant, you made bing not google

    • @LaugeHeiberg
      @LaugeHeiberg 1 month ago +4

      wow really? Im also surprised one single guy didnt manage to make a product rivaling Google

  • @avi7278
    @avi7278 8 days ago

    You need to learn how to sync up your audio and video.

  • @DanskeCrimeRiderTV
    @DanskeCrimeRiderTV 1 month ago +2

    how is this impressive? Of course it's gonna be faster. You aren't querying billions or even trillions of web pages unlike Google? So this search engine isn't even faster than Google...

    • @conaticus
      @conaticus 1 month ago +2

      It wasn't meant to be impressive it was meant to be informative and entertaining 👍

    • @DanskeCrimeRiderTV
      @DanskeCrimeRiderTV 1 month ago +2

      @@conaticus your thumbnail implies it is faster than Google. And I believe the original title did too.

  • @FaZekiller-qe3uf
    @FaZekiller-qe3uf 1 month ago

    Disappointing

  • @FeTetra
    @FeTetra 1 month ago

    ⬛🟧 http scrape?????