PCY (Park-Chen-Yu) Algorithm with Solved Example | Big Data Analytics

  • Published: 17 Jan 2025

Comments • 58

  • @ashishbalti6757
    @ashishbalti6757 1 year ago +7

    Your channel is underrated. The concept was nicely explained, with a good PPT presentation.

    • @ataglanceofficial
      @ataglanceofficial 1 year ago +2

      Hey, thanks a lot for the appreciation 🙏🏻❤️ Please support my channel by hitting Like and Subscribe, and please share it with your friends! 🙌🏻✨

  • @nicenice2442
    @nicenice2442 1 year ago +12

    Finally, I found a video that explains the PCY topic very nicely. The entire Big Data playlist seems to be made in such an interesting way ❤

    • @ataglanceofficial
      @ataglanceofficial 1 year ago +2

      Glad to know that you liked my playlist and videos! Please stay connected to my channel and share it! 💖

  • @SohaAhmedMohamed
    @SohaAhmedMohamed 7 months ago +1

    Great as usual. The numerical examples are incredible.

    • @ataglanceofficial
      @ataglanceofficial 7 months ago +1

      Thank you so much for the appreciation ❤️🙏 Please share it with your friends!

  • @yeonjin8
    @yeonjin8 1 year ago +5

    No, like, why is this so underrated 😢 I loved this; you cleared up the concept. I hope the numerical comes up in the exam though, lol.

    • @ataglanceofficial
      @ataglanceofficial 1 year ago +2

      Thank you very, very much for your appreciation ❤️ It really means a lot 😁 Please do me a favour by sharing it with your friends 🙏🏻 Best wishes for your exams. I hope this numerical comes up and you crack it 🔥

    • @yeonjin8
      @yeonjin8 1 year ago +1

      @ataglanceofficial The paper was all theoretical, but mine still went well :)

    • @ataglanceofficial
      @ataglanceofficial 1 year ago +2

      Woahh that's great! 🥳 Amazing!

  • @yonderbobcat211
    @yonderbobcat211 1 year ago +6

    Very good presentation

  • @bhavishahadiyal7836
    @bhavishahadiyal7836 1 year ago +3

    Your content is the best ✨

    • @ataglanceofficial
      @ataglanceofficial 1 year ago +2

      Thank you so much for your appreciation ❤️🙌🏻 Please like, subscribe, and share with your friends 😊

  • @ImmortalPlayz
    @ImmortalPlayz 2 months ago +1

    amazing just amazing wow...

  • @sarthaktamhankar
    @sarthaktamhankar 1 year ago +5

    10:00 Why are you using the condition of item count less than 1 when the given threshold is 2?

    • @ataglanceofficial
      @ataglanceofficial 1 year ago +1

      Hey! This is an algorithm used for Big Data, and in Big Data there may be redundant or useless items that are not actually part of any of the given transactions. So, to check validity, the algorithm first checks whether the count is at least 1 for every unique element in the transactions. I hope that helps ❤️ Please share it with your friends 😊 Best wishes for your exams! 💕

    • @sarthaktamhankar
      @sarthaktamhankar 1 year ago +1

      @ataglanceofficial Okay, thanks.

  • @cleanbold4967
    @cleanbold4967 1 year ago +2

    Very very nice tutorials ❤️❤️💫💫🔥🔥😍😍

    • @ataglanceofficial
      @ataglanceofficial 1 year ago +1

      Thank you so much for your kind words! Please subscribe and share it with your friends 🙏🏻 Best wishes for your exams 💕

  • @ATothFTW18
    @ATothFTW18 1 year ago +2

    Great explanation! You are a fantastic teacher! One question, as I don't have much background in computers: is a pass when the data is read into main memory (it is not shown in the main memory diagram, but I'm guessing this is a given)? Then, in between the passes, is the data deleted from main memory? Is it removed from main memory after the item counts are made, in order to make room for the hash table of pairs? Then is the item count reduced to frequent items, and the hash table to a bitmap, in order to make space to reload the dataset into main memory?

    • @ataglanceofficial
      @ataglanceofficial 1 year ago +1

      Thank you for your great compliment ❤️
      You asked a really good question. So look:
      1. Yes, you're right that a "pass" typically refers to reading data into main memory, which is an essential step in the PCY algorithm. While it might not be explicitly shown in the main memory diagram, it is indeed a given.
      2. Data is not immediately deleted from main memory between passes. Instead, it is maintained in memory for multiple passes. The primary goal is to count the item pairs that occur together frequently during these passes.
      3. After the item counts are made and you've identified the frequent items, you don't necessarily remove all the data from main memory. Instead, you keep the data needed for building the hash table, which is used to identify frequent item pairs.
      4. You're right about reducing the item counts to frequent items and converting the hash table to a bitmap. This saves memory by focusing only on the frequent item pairs. The goal is to make room for the next dataset to be loaded into memory.
      Hope this helps! :)
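
For readers following this thread, here is a minimal Python sketch of the two passes. It assumes the common textbook formulation of PCY, in which the transactions are re-read on every pass and only the counts, the bucket table, and later the bitmap are kept in memory; the video may organise things differently. The transactions below are made up for illustration and are not the video's dataset. The names `transactions`, `SUPPORT`, `NUM_BUCKETS`, and `bucket` are hypothetical, and only the hash `(i*j) % 10` and the threshold of 2 are taken from the comments.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data; in practice this list stands in for a file
# on disk that is streamed once per pass.
transactions = [
    [1, 2, 3],
    [2, 3, 4],
    [1, 4, 5],
    [1, 2, 4],
    [1, 2, 3, 5],
]
SUPPORT = 2        # minimum support threshold mentioned in the comments
NUM_BUCKETS = 10   # implied by the hash (i*j) % 10


def bucket(i, j):
    """Hash a pair of item ids to a bucket index."""
    return (i * j) % NUM_BUCKETS


# Pass 1: memory holds the item counts and the bucket counts.
item_counts = Counter()
bucket_counts = [0] * NUM_BUCKETS
for t in transactions:
    items = sorted(set(t))
    item_counts.update(items)
    for i, j in combinations(items, 2):
        bucket_counts[bucket(i, j)] += 1

# Between passes: shrink what is kept in memory.
frequent_items = {i for i, c in item_counts.items() if c >= SUPPORT}
bitmap = [c >= SUPPORT for c in bucket_counts]  # one flag per bucket

# Pass 2: re-read the transactions and count only candidate pairs,
# i.e. both items frequent and the pair's bucket flagged in the bitmap.
pair_counts = Counter()
for t in transactions:
    items = sorted(set(t) & frequent_items)
    for i, j in combinations(items, 2):
        if bitmap[bucket(i, j)]:
            pair_counts[(i, j)] += 1

frequent_pairs = {p: c for p, c in pair_counts.items() if c >= SUPPORT}
print(frequent_pairs)
```

Note that in this sketch the frequent items are those whose count meets the support threshold, which is the usual textbook choice and an assumption here.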

  • @anshumansharma7225
    @anshumansharma7225 7 months ago +1

    ❤️🚀

  • @paramatrix8813
    @paramatrix8813 1 year ago +1

    Great video man thanks!❤️

    • @ataglanceofficial
      @ataglanceofficial 1 year ago +1

      Thank you so much for the appreciation 🙏🏻😇 Please share it with your friends ❤️ Best wishes for your exams 🔥

  • @tabishmomin5148
    @tabishmomin5148 1 year ago +3

    Nice ❤

  • @hamidzaki1971
    @hamidzaki1971 1 year ago

    Finally got better results compared with other content.

    • @ataglanceofficial
      @ataglanceofficial 1 year ago

      Thank you so much for the appreciation 🤗 Please share it with your friends ♥️ and with your juniors too ✨

  • @tshorts4923
    @tshorts4923 1 year ago +1

    Sir, I don't understand the use of the hash function when we are not even using it to check the candidates. You were only checking the threshold, even in the last step.
    Sir, please clarify my doubt.

    • @ataglanceofficial
      @ataglanceofficial 1 year ago +1

      Hey, I would request you to watch the full video carefully. You will understand why I did so! Thanks for watching ☺️

  • @yashgarg950
    @yashgarg950 1 year ago +2

    Sir, I have two doubts:
    1) In Step 1, do we have to remove elements with frequency less than 1, or elements with frequency less than the threshold value?
    2) Is the hash function always fixed, i.e. (i*j)%10?
    Sir, please reply as soon as you see this comment. My exams are coming up!

    • @ataglanceofficial
      @ataglanceofficial 1 year ago +1

      Answers to your doubts:
      1) Remove elements with a frequency less than 1. Because this algorithm is used to process "real-time data", you need to check this condition. Also, this step is just to calculate the support of every product.
      2) The hash function is not fixed; it can be any function, but it is generally chosen based on domain expertise and the dataset. My suggestion: choose a hash function that gives different buckets for different products.
      Please share it with your friends ♥️

    • @yashgarg950
      @yashgarg950 1 year ago +1

      @ataglanceofficial
      Sir, what if we are using the hash function (i*j)%10 and we get the same bucket value for two pairs?

    • @ataglanceofficial
      @ataglanceofficial 1 year ago +2

      It's okay to get the same bucket value for more than one product. That was just a suggestion.
      You can go either way.

    • @yashgarg950
      @yashgarg950 1 year ago +1

      @ataglanceofficial
      OK, thanks, sir, for clearing my doubt and for the instant replies. Thanks a lot, sir 🙏

    • @yashgarg950
      @yashgarg950 1 year ago

      @ataglanceofficial
      One more doubt, sir: you said we need to remove elements with frequency less than 1 in step 1, which means elements with frequency 0. Does that mean such an element is never present in any transaction?
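
On the collision question in this thread, a small Python sketch may help. It shows several different pairs landing in the same bucket under the hash `(i*j) % 10` from the comments. The pair list and the names `bucket` and `by_bucket` are hypothetical, chosen only for illustration. In the usual description of PCY, a collision can only inflate a bucket's count, so it may let an infrequent pair through as a candidate but never hides a frequent one, because exact pair counts are still taken afterwards.

```python
def bucket(i, j, num_buckets=10):
    """Hash a pair of item ids to a bucket index, as in (i*j) % 10."""
    return (i * j) % num_buckets


# Hypothetical pairs chosen to show collisions.
pairs = [(1, 2), (3, 4), (2, 6), (4, 5), (2, 5)]

# Group the pairs by the bucket they hash to.
by_bucket = {}
for p in pairs:
    by_bucket.setdefault(bucket(*p), []).append(p)

for b, members in sorted(by_bucket.items()):
    print(f"bucket {b}: {members}")
```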

  • @vatsalshah1548
    @vatsalshah1548 2 months ago

    This seems wrong. Technically, we are not even using the hash function to find C2; we are finding L2 directly by counting the pairs.
    The point of PCY is to reduce C2, so that only the pairs in C2 need their counts calculated.
    If we have millions of transactions, we aren't going to count all pairs. That's exactly what we're trying to avoid.

    • @vatsalshah1548
      @vatsalshah1548 2 months ago

      Expecting an explanation from your end, At a Glance. By the way, salute to you for the videos; they helped me a lot.
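
To make the point about reducing C2 concrete, here is a hedged Python sketch comparing the candidate pairs Apriori would count (all pairs of frequent items) with the smaller set PCY would count (pairs of frequent items whose bucket is also frequent). This follows the usual textbook definition of the PCY candidate condition, not necessarily the video's working. The transactions and all names (`transactions`, `apriori_c2`, `pcy_c2`, `bucket`) are hypothetical; only the hash `(i*j) % 10` and the threshold of 2 come from the comments.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data, not the video's dataset.
transactions = [[1, 2], [1, 2, 3], [3, 7], [2, 7], [1, 2, 7]]
SUPPORT = 2
NUM_BUCKETS = 10


def bucket(i, j):
    """Hash a pair of item ids to a bucket index."""
    return (i * j) % NUM_BUCKETS


# Pass 1: count single items and hash every pair into a bucket.
item_counts = Counter()
bucket_counts = Counter()
for t in transactions:
    items = sorted(set(t))
    item_counts.update(items)
    bucket_counts.update(bucket(i, j) for i, j in combinations(items, 2))

frequent_items = sorted(i for i, c in item_counts.items() if c >= SUPPORT)
frequent_buckets = {b for b, c in bucket_counts.items() if c >= SUPPORT}

# Apriori-style C2: every pair of frequent items.
apriori_c2 = list(combinations(frequent_items, 2))
# PCY-style C2: additionally require that the pair hashes to a frequent bucket.
pcy_c2 = [p for p in apriori_c2 if bucket(*p) in frequent_buckets]

print(len(apriori_c2), "candidates without the bucket filter")
print(len(pcy_c2), "candidates with it:", pcy_c2)
```

Only the pairs in `pcy_c2` would need exact counts in the second pass, which is where the memory saving comes from.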

  • @sarmad_ali
    @sarmad_ali 9 months ago

    How do we construct hash functions accordingly?

  • @SamruddhiChavan_A
    @SamruddhiChavan_A 1 year ago +2

    1,4 has no pair

    • @ataglanceofficial
      @ataglanceofficial 1 year ago +1

      Hey! You can find (1,4) in transactions T3 and T4.

    • @prathameshsablepatil9504
      @prathameshsablepatil9504 1 year ago +1

      So should we also consider pairs whose items are not next to each other? For example, in T4, 1 and 4 are not next to each other.

    • @ataglanceofficial
      @ataglanceofficial 1 year ago +2

      Yes, you are correct. You can consider it like that ✨
      Hope it helps you! Thanks a lot for watching. Please share it with your friends ❤️ Best wishes for your exams 💕

    • @SamruddhiChavan_A
      @SamruddhiChavan_A 1 year ago +1

      @ataglanceofficial Please check again.

    • @ataglanceofficial
      @ataglanceofficial 1 year ago +2

      Hey, look: the pair does not always have to be side by side in the transaction. You can create your combinations from each transaction. In T3 we have 1,4,5, so we can form the pairs [(1,4), (1,5), (4,5)]. Similarly, in T4 we have the items 1,2,4, so we can form the pairs [(1,2), (1,4), (2,4)].
      From these two transactions, T3 and T4, we get a count of 2 for the pair (1,4), since it appears in both 🙌🏻 Hope this clears your doubts! 💕
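
The pair counting described in this reply can be checked with a short Python sketch. It uses only the two transactions quoted above (T3 and T4); the variable names `t3`, `t4`, and `pair_counts` are illustrative. `itertools.combinations` generates every 2-item combination from a transaction, whether or not the items are adjacent.

```python
from collections import Counter
from itertools import combinations

t3 = [1, 4, 5]  # transaction T3 as quoted in the reply
t4 = [1, 2, 4]  # transaction T4 as quoted in the reply

# Count every unordered pair that can be formed within each transaction.
pair_counts = Counter()
for t in (t3, t4):
    pair_counts.update(combinations(sorted(t), 2))

print(list(combinations(t3, 2)))  # pairs from T3
print(list(combinations(t4, 2)))  # pairs from T4
print(pair_counts[(1, 4)])        # how often (1,4) occurs across T3 and T4
```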