Moving 1/3 of a Heap

  • Published: Dec 4, 2024
  • Science

Comments • 14

  • @mooremike44
    @mooremike44 4 years ago +3

    Glad to have the videos back.

  • @zanonilabuschagne7628
    @zanonilabuschagne7628 4 years ago +6

    Hey Bert. Thanks for this; it brought back a bunch of unhappy memories :)
    My suggestion... print the entire heap table to paper. By the 3rd box of paper the customer has to pay for, they will realize it's cheaper to give you an OLAP instance and let you query from there.

    • @zanonilabuschagne7628
      @zanonilabuschagne7628 4 years ago +1

      P.S. It was in this scenario that I learnt how much slower "OR" operators can be than "AND" operators... life lessons :)

    • @jerichaux9219
      @jerichaux9219 7 months ago

      @@zanonilabuschagne7628 Indeed they are! Just yesterday I showed a newer colleague the speed difference between OR and an alternative (though not AND in this case).
      I had a fairly complex query written out, and we ran it a few times. Every time it executed in ~700ms.
      -- CteNonUniqueIDPair:
      -- ([ID1], [ID2])
      -- 5 rows
      -- Tbl:
      -- ([ID1], [ID2], …[21OtherCols])
      -- 130,000 rows
      I commented out this logic:
      -- ==========
      /*
      WHERE (
      NOT EXISTS (
      SELECT TOP (1) 1
      FROM CteNonUniqueIDPair AS "Cte"
      WHERE (
      Cte.[ID1] = Tbl.[ID1]
      )
      )
      AND NOT EXISTS (
      SELECT TOP (1) 1
      FROM CteNonUniqueIDPair AS "Cte"
      WHERE (
      Cte.[ID2] = Tbl.[ID2]
      )
      )
      )
      */
      -- ==========
      And replaced it with something that LOOKS like it would be more performant:
      -- ==========
      WHERE (
      NOT EXISTS (
      SELECT TOP (1) 1
      FROM CteNonUniqueIDPair AS "Cte"
      WHERE (
      (Cte.[ID1] = Tbl.[ID1])
      OR (Cte.[ID2] = Tbl.[ID2])
      )
      )
      )
      -- ==========
      And we executed it a couple of times. Now, the execution time had spiked from ~700ms to ~1900ms! It was nearly three times as slow!

  • @richardsmith2583
    @richardsmith2583 2 years ago

    Hi Bert. I had something similar with a 3rd-party audit table. Whilst looking for a key I noticed a lot of duplicates (>60%!). Job steps: 1) increment a control table (year) 2) copy that year of data into a child table 3) remove that year from the parent 4) dedup the child 5) copy the smaller child back into the parent. I left the child tables to be manually deleted at the end (just in case). Took 12 nights.
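    The year-by-year job described above could be sketched in T-SQL roughly like this. All table and column names here are hypothetical, since the original 3rd-party schema isn't shown:

    ```sql
    -- Hypothetical schema: dbo.Audit (parent heap), dbo.Audit_Work (child),
    -- dbo.YearControl (control table tracking which year to process next).
    DECLARE @Year int = (SELECT MIN(ProcessYear) FROM dbo.YearControl WHERE Done = 0);

    -- 2) copy one year of data into the child table
    INSERT INTO dbo.Audit_Work
    SELECT * FROM dbo.Audit WHERE YEAR(EventDate) = @Year;

    -- 3) remove that year from the parent
    DELETE FROM dbo.Audit WHERE YEAR(EventDate) = @Year;

    -- 4) dedup the child: keep one row per duplicate group
    WITH Deduped AS (
        SELECT *, ROW_NUMBER() OVER (
            PARTITION BY EventDate, EventType, EventData  -- the columns defining a duplicate
            ORDER BY (SELECT NULL)) AS rn
        FROM dbo.Audit_Work
    )
    DELETE FROM Deduped WHERE rn > 1;

    -- 5) copy the smaller, deduped child back into the parent
    INSERT INTO dbo.Audit
    SELECT * FROM dbo.Audit_Work;

    -- 1) mark the year done in the control table
    UPDATE dbo.YearControl SET Done = 1 WHERE ProcessYear = @Year;
    ```

    Note the child table is deliberately not dropped here, matching the "delete manually at the end, just in case" safety net described above.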

  • @djmoosecom
    @djmoosecom 4 years ago

    Glad to have the videos back. I would have talked it over at the next volleyball game.

  • @vklude
    @vklude 1 year ago

    I also often need to go with this approach 😅 a manual job, but it works

  • @MiningForPies
    @MiningForPies 4 years ago +1

    Sometimes you just have to be pragmatic and do it in a loop. I probably would have done it on a day-by-day basis, but with 5 or 6 connections to the database.

  • @alexmcqueen5563
    @alexmcqueen5563 4 years ago +1

    I would have cried

  • @oresteszoupanos
    @oresteszoupanos 4 years ago

    I'm wondering if something like a window function would have helped... figure out ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) for each row, then do a modulo 3 to pluck out every 3rd row?
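    For reference, that ROW_NUMBER idea might look like the sketch below (dbo.BigHeap is a made-up table name). One caveat: with ORDER BY (SELECT NULL) the numbering is arbitrary, so which rows fall into each third isn't guaranteed to be stable between runs:

    ```sql
    WITH Numbered AS (
        SELECT *,
               ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS rn  -- arbitrary, non-deterministic order
        FROM dbo.BigHeap
    )
    SELECT *
    FROM Numbered
    WHERE rn % 3 = 0;  -- pluck out every 3rd row
    ```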

  • @1bhud1
    @1bhud1 4 years ago

    Hey Bert, just curious: did you build your query using a WHILE loop? I'm fairly new to learning SQL and would like to practice this scenario on some sample data.

    • @1bhud1
      @1bhud1 4 years ago

      @@DataWithBert Oh, even better, thank you!
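      For anyone wanting to practice the batching pattern the question asks about, here is a generic T-SQL WHILE-loop sketch. Table names are hypothetical, and this is not necessarily how Bert's actual script worked:

      ```sql
      DECLARE @BatchSize int = 10000;
      DECLARE @RowsMoved int = 1;

      WHILE @RowsMoved > 0
      BEGIN
          -- delete one batch from the source heap and capture the deleted
          -- rows into the destination table in the same statement
          DELETE TOP (@BatchSize) src
          OUTPUT DELETED.* INTO dbo.Destination
          FROM dbo.SourceHeap AS src;

          SET @RowsMoved = @@ROWCOUNT;
      END
      ```

      Batching like this keeps each transaction (and the transaction log growth) small, at the cost of a longer total runtime.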

  • @adamali-jr6ye
    @adamali-jr6ye 4 years ago

    Did you try putting this logic into Python and running it async? I think it would take less time to copy the entire table because the batches would be copied in parallel.