While these indexes are being created, does the database lock the tables? Will this make your application unresponsive if the tables cannot be accessed?
You can still do DML (for example, inserting something into the table) for the btree indexes. However, this is not permitted when creating the fulltext! More info in the docs: dev.mysql.com/doc/refman/8.0/en/innodb-online-ddl-operations.html#online-ddl-index-syntax-notes
Oh, and apropos full-text index. Wondering why after updating MySQL 5.7 to 8.0 queries that took < 1s to execute are now taking 6 seconds. Similar setup to yours where the FTI is over one column and the query only checks that column. 700k rows. Not sure if I should simply rebuild the index.
Hard to say without more info about the query and DB. Have you confirmed that after migrating/upgrading the index still exists? What does "SHOW INDEXES FROM table_name;" say?
How many rooms are there in the test data? Asking because I always thought an index makes sense if the cardinality is high (= there is a lot of variance in the data). I would be surprised to see such a big impact on speed if there were only a handful of rooms.
You could add it! However, if you try to build a FULLTEXT index on a table without it, it will automatically rebuild the table to incorporate that index so you don't have to handle it manually. Just makes the index creation take longer.
This would definitely be a consideration if this was in production! For the purpose of the video, this was for testing the app pre-release. You specify the fulltext index at the time of table creation, and have it build incrementally as new messages arrive. However, this will add some overhead for table insertions.
Great question! In the video we're using a utf8mb4 character set, but MySQL supports a large number of other character sets as well: dev.mysql.com/doc/mysql-g11n-excerpt/8.3/en/charset-charsets.html Give it a try with a different one! FULLTEXT indexes work on char, varchar, and text column types.
@@PlanetScale Thanks, So with select a good charset we can use full text index on the text column? Does the tokenization process work for a RTL (ex: Arabic or Persian) Language?
Btree indexes never worked in any database if you query field like ‘%search_keyword% instead ‘search_keyword%’ only works. Creating a fulltext index on populated table is painful, instead create an empty table with full text index and populate data. ( insert into select … ) . Last with that many records you have to shard the db or use some partitioning to manage data
While these indexes are being created, does the database lock the tables? Will this make your application unresponsive if the tables cannot be accessed?
You can still do DML (for example, inserting something into the table) for the btree indexes. However, this is not permitted when creating the fulltext! More info in the docs:
dev.mysql.com/doc/refman/8.0/en/innodb-online-ddl-operations.html#online-ddl-index-syntax-notes
@@PlanetScale what is the solution on this case?
Yeah this is awesome
wow, amazing
Sounds like a usecase outside of bioinformatics for interleaved bloom filters. The end result looks still pretty slow to me.
Oh, and apropos full-text index. Wondering why after updating MySQL 5.7 to 8.0 queries that took < 1s to execute are now taking 6 seconds. Similar setup to yours where the FTI is over one column and the query only checks that column. 700k rows. Not sure if I should simply rebuild the index.
Hard to say without more info about the query and DB. Have you confirmed that after migrating/upgrading the index still exists? What does "SHOW INDEXES FROM table_name;" say?
So, next step would be to store that in elastic search?
How many rooms are there in the test data? Asking because I always thought an index makes sense if the cardinality is high (= there is a lot of variance in the data). I would be surprised to see such a big impact on speed if there were only a handful of rooms.
This this setup, the population script for the DB put 1000 messages in per room, meaning there were over a million different room names.
So what would be the next step to make it even faster?
We have a part 2 coming out soon where we cover just that
I already have PK, which is string, can I make another column as FTS_DOC_ID field ?
You could add it! However, if you try to build a FULLTEXT index on a table without it, it will automatically rebuild the table to incorporate that index so you don't have to handle it manually. Just makes the index creation take longer.
how did you manage the uptime for the server in case of the index whichtook 7 hours?
This would definitely be a consideration if this was in production! For the purpose of the video, this was for testing the app pre-release. You specify the fulltext index at the time of table creation, and have it build incrementally as new messages arrive. However, this will add some overhead for table insertions.
This would also be a great use case for PlanetScale branching + deploy requests.
Greta video! Thx sharing
Does the full text index work for all languages? for example a language like Arabic?
Great question! In the video we're using a utf8mb4 character set, but MySQL supports a large number of other character sets as well:
dev.mysql.com/doc/mysql-g11n-excerpt/8.3/en/charset-charsets.html
Give it a try with a different one! FULLTEXT indexes work on char, varchar, and text column types.
@@PlanetScale Thanks, So with select a good charset we can use full text index on the text column? Does the tokenization process work for a RTL (ex: Arabic or Persian) Language?
The real test is doing queries while the table is getting written into.
What storage engine is required for full text? Does it work on innodb?
yes.
Yep, this table was using innodb in the video!
Great Vid
Btree indexes never worked in any database if you query field like ‘%search_keyword% instead ‘search_keyword%’ only works. Creating a fulltext index on populated table is painful, instead create an empty table with full text index and populate data. ( insert into select … ) . Last with that many records you have to shard the db or use some partitioning to manage data
Populating the rows and indexes at the same time sounds as painful as doing them separately.