If I understood correctly then , on a very basic level, choosing between index and table scan depends on how many blocks both the access methods are likely to traverse for a particular query
Hi Chris, very useful. Thank you. Just one consideration, if it is worth do add it: another factor that could affect the optimizer choice on whether to use FTs or the Index (I'm excluding PK access here) could also be the DB_BLOCK_SIZE in use? I mean, suppose you have a DB Block Size greater than 8K: say 16K or even 32K (even though this last one is more used in DWH), the Optimizer could opt, as a matter of cost, in using Full Table Scan as in few IOs you read few blocks each containing potentially more rows (since the DB Block Size is enough large ans supposing rows to search are as much as contiguous in the blocks potentially red). Could it be? I'm also excluding the handling of DB_FILE_MULTIBLOCK_READ_COUNT (in earlier releases was used to modify it). Does it make sense?
Not really. The maximum I/O size depends on the platform - by default Oracle Database determines this. This maximum is the same whatever block size you use. If you have bigger blocks, the database will fetch fewer each time. Think of it like this: say you've got buckets holding coins. Each bucket only stores coins of one value. So there's 1p coins in one bucket and £1 coins in another. Each time you get money from a bucket, the maximum value is £10. So when taking from the penny bucket, you could get up to 1,000 coins each time. From the £1 bucket you could get a maximum of 10 coins each time.
Very helpful. One question, when you say avoid using hints in production query, do you specifically say about index hints or other hints like direct path insert etc.
While you should aim to avoids hints in general, as you say there are hints that affect SQL behaviour such as append (direct path). These types of hints can affect transaction processing (i.e. the code you write), so you should explicitly code them. You should still use these rarely though. You should avoid hints that force a particular access method (e.g. index vs full table scan), join method/order or query transformations. If you're trying to get a specific plan, it's better to use SQL plan management (baselines).
Thanks for such a wonderful explanation with example. Is it possible to calculate the number of block accesses with and without index. Does last_cr_buffer_gets represent the number of blocks read? I have a table with B-Tree index with height 2, total blocks - 305, leaf blocks - 154. The clustering_factor is equal to the number of blocks. I need to compare the number of blocks reading for fetching 90% of the records. Eg. select * from table_name where pk_column > 100, in table containing 80000 records.
Thanks. In theory you can calculate the accesses needed with an index. In practice it's difficult because you need to know exactly how many index entries you'll read (and thus index blocks) and how many table block reads this requires. The easiest way is to run the query & get the plan with the buffer information. Does last_cr_buffer_gets represent the number of blocks read? Yes I need to compare the number of blocks reading for fetching 90% of the records. As you have perfect clustering this will be roughly 2 (the height) + 80% of the index leaf blocks + 80% of the table blocks. That said, a full table scan will almost always be more efficient than an index when fetching that large a fraction of the table. I wouldn't even bother trying to make the calculation.
So to make sure I understand, it sounds like you're saying that if three matching rows reside in the same block, an index-based query wil get the same block three times, thus performing triple the amount of work as a table scan (within the context of that one block). Is that right? If so, I'm kind of surprised that no hybrid solution is used, something like "when you get a block based on an index hit, also scan it for other matches avoid repeated gets". I'm sure I'm not the first genius to think of this, so there are probably reasons why this is a bad idea... any insight?
Only if the database has to fetch other rows in between these three. Say you have 9 rows with values 1-9, spread across 3 data blocks with three rows in each. Say the rows are stored in the data blocks "in order" - block 1 = 1,2,3, block 2 = 4,5,6, and block 3 = 7,8,9. Consecutive index entries match are also consecutive in the table. So to get the rows 1-3, you only need to access block one. And it only does this once, not three times. Now imagine the rows are in the blocks in this sequence: block 1 = 1,4,7 block 2 = 2,5,8, and block 3 = 3,6,9. To get the values 1-3, you need to visit all three data blocks. To get row 4, the database goes back to block 1 and fetches it again. Does this clarify it for you?
@@TheMagicofSQL Ahh I see. So the point is that the db performs gets in the order that it retrieves pointers from the index, so if several successive matches are stored in the same block, it should be able to gather them all at once, correct? How about the case where block 1=1,2,3 but your query is for, e.g., odd numbers, such that you match 1 and 3, but not the 2. Does it matter that 1 and 3 are not directly adjacent within the block? Or as long as the index finds 1 and 3 in succession, will it still be able to use block 1 all at once without needing a repeated get?
Yes, it's the order of retrieving index pointers that affects table block fetches. As long as consecutive pointers from the index reference the same data block, the database only fetches the table block once. So if you have the condition col in ( 1, 3 ), skipping over 2 as your example then the database only fetches the table data block once.
Hi Chris! Excellent video on this topic! How about the importance of single block I/O vs multiblock I/O? Is it ever a good idea to play with DB_FILE_MULTIBLOCK_READ_COUNT parameter?
I am happy that I found this channel
Very very useful. Big thanks to you Chris. I love the way you explain things.
Thanks, glad you found this useful :)
If I understood correctly then , on a very basic level, choosing between index and table scan depends on how many blocks both the access methods are likely to traverse for a particular query
Yep, you got it!
Hi Chris, very useful. Thank you. Just one consideration, if it is worth do add it: another factor that could affect the optimizer choice on whether to use FTs or the Index (I'm excluding PK access here) could also be the DB_BLOCK_SIZE in use? I mean, suppose you have a DB Block Size greater than 8K: say 16K or even 32K (even though this last one is more used in DWH), the Optimizer could opt, as a matter of cost, in using Full Table Scan as in few IOs you read few blocks each containing potentially more rows (since the DB Block Size is enough large ans supposing rows to search are as much as contiguous in the blocks potentially red). Could it be? I'm also excluding the handling of DB_FILE_MULTIBLOCK_READ_COUNT (in earlier releases was used to modify it). Does it make sense?
Not really. The maximum I/O size depends on the platform - by default Oracle Database determines this. This maximum is the same whatever block size you use. If you have bigger blocks, the database will fetch fewer each time.
Think of it like this: say you've got buckets holding coins. Each bucket only stores coins of one value. So there's 1p coins in one bucket and £1 coins in another. Each time you get money from a bucket, the maximum value is £10. So when taking from the penny bucket, you could get up to 1,000 coins each time. From the £1 bucket you could get a maximum of 10 coins each time.
Clustering Factor was a riddle to me, not any more. Thank you.
Great :)
Very helpful. One question, when you say avoid using hints in production query, do you specifically say about index hints or other hints like direct path insert etc.
While you should aim to avoids hints in general, as you say there are hints that affect SQL behaviour such as append (direct path). These types of hints can affect transaction processing (i.e. the code you write), so you should explicitly code them. You should still use these rarely though.
You should avoid hints that force a particular access method (e.g. index vs full table scan), join method/order or query transformations. If you're trying to get a specific plan, it's better to use SQL plan management (baselines).
Thanks for such a wonderful explanation with example. Is it possible to calculate the number of block accesses with and without
index. Does last_cr_buffer_gets represent the number of blocks read?
I have a table with B-Tree index with height 2, total blocks - 305, leaf blocks - 154. The clustering_factor is equal to the number of blocks. I need to compare the number of blocks reading for fetching 90% of the records. Eg. select * from table_name where pk_column > 100, in table containing 80000 records.
Thanks.
In theory you can calculate the accesses needed with an index. In practice it's difficult because you need to know exactly how many index entries you'll read (and thus index blocks) and how many table block reads this requires. The easiest way is to run the query & get the plan with the buffer information.
Does last_cr_buffer_gets represent the number of blocks read? Yes
I need to compare the number of blocks reading for fetching 90% of the records.
As you have perfect clustering this will be roughly 2 (the height) + 80% of the index leaf blocks + 80% of the table blocks. That said, a full table scan will almost always be more efficient than an index when fetching that large a fraction of the table. I wouldn't even bother trying to make the calculation.
Good presentation, to reduce high clustering factor#, do we need to reorg the table and rebuild index?
Rebuilding indexes has NO effect on the clustering factor; to change it you need to move rows in the table
@@TheMagicofSQL thanks for your reply,👍
Hello thanks for these videos,
How Bitmap index works ? If column has ‘yes or no’ values (50%) each.
I discuss the differences between BTree and bitmap indexes at blogs.oracle.com/sql/post/how-to-create-and-use-indexes-in-oracle-database#choose
So to make sure I understand, it sounds like you're saying that if three matching rows reside in the same block, an index-based query wil get the same block three times, thus performing triple the amount of work as a table scan (within the context of that one block). Is that right? If so, I'm kind of surprised that no hybrid solution is used, something like "when you get a block based on an index hit, also scan it for other matches avoid repeated gets". I'm sure I'm not the first genius to think of this, so there are probably reasons why this is a bad idea... any insight?
Only if the database has to fetch other rows in between these three.
Say you have 9 rows with values 1-9, spread across 3 data blocks with three rows in each.
Say the rows are stored in the data blocks "in order" - block 1 = 1,2,3, block 2 = 4,5,6, and block 3 = 7,8,9. Consecutive index entries match are also consecutive in the table. So to get the rows 1-3, you only need to access block one. And it only does this once, not three times.
Now imagine the rows are in the blocks in this sequence: block 1 = 1,4,7 block 2 = 2,5,8, and block 3 = 3,6,9. To get the values 1-3, you need to visit all three data blocks. To get row 4, the database goes back to block 1 and fetches it again.
Does this clarify it for you?
@@TheMagicofSQL Ahh I see. So the point is that the db performs gets in the order that it retrieves pointers from the index, so if several successive matches are stored in the same block, it should be able to gather them all at once, correct? How about the case where block 1=1,2,3 but your query is for, e.g., odd numbers, such that you match 1 and 3, but not the 2. Does it matter that 1 and 3 are not directly adjacent within the block? Or as long as the index finds 1 and 3 in succession, will it still be able to use block 1 all at once without needing a repeated get?
Yes, it's the order of retrieving index pointers that affects table block fetches.
As long as consecutive pointers from the index reference the same data block, the database only fetches the table block once.
So if you have the condition col in ( 1, 3 ), skipping over 2 as your example then the database only fetches the table data block once.
@@TheMagicofSQL That makes a ton of sense. Thanks!
Hi Chris! Excellent video on this topic! How about the importance of single block I/O vs multiblock I/O? Is it ever a good idea to play with DB_FILE_MULTIBLOCK_READ_COUNT parameter?
Relatively low - I wouldn't start fiddling with that parameter
@@TheMagicofSQL Thank you for the reply!
How to upload CSV in Oracle live sql
There's no CSV upload option available on Live SQL. If you have Oracle Database questions unrelated to a video, please ask them on asktom.oracle.com
Have you ever taken the Hotsos Course?
No