Mysql Index (B-tree) from v5.5–v8.0

dieutb
6 min readApr 6, 2020

--

On this short talk, we only talk about index B-tree of storage engine InnoDB

Some Definitions:

  • Primary key:
    - Each table only has one primary key.
    - It could consist of single or multiple columns.
    - UNIQUE, NOT NULL, and auto-create INDEX.
    - Determines the physical layout of rows in the data file.
  • Foreign key:
    - Each table could have many foreign keys.
    - It could consist of single or multiple columns.
    - Auto-create INDEX when CREATE TABLE. When ALTER TABLE, we must add index for that foreign key by ourselves (it’s true so far version 8.0)
    - Constraints of Foreign key:
    a. key column’s VALUE must map with the column that is referred OR be null.
    b. If there is any record having a reference value, it must be deleted before deleting the table. An alternative way to drop the table is removes the constrain foreign key first.
create TABLE categories (
id int AUTO_INCREMENT,
name varchar(255),
CONSTRAINT PR_categories PRIMARY KEY (id)
);

create TABLE books (
id int AUTO_INCREMENT,
name varchar(255) NOT NULL,
category_id int,
CONSTRAINT PR_books PRIMARY KEY (id)
CONSTRAINT FR_idx_books_category_id FOREIGN KEY (category_id) REFERENCES categories(id)
)

What’s Index?

  • Indexes are used to find rows with specific column values quickly.
  • The maximum number of indexes per table and the maximum index length is defined per storage engine. At least 16 indexes per table and a total index length of at least 256 bytes.
  • There are some kinds of index columns corresponding with the storage engine:
    - B-Tree: supports index, index-prefixes (767 bytes long for InnoDB, 1000bytes long for MyISAM), fulltext-index (char, varchar, text columns).
    - Hash: is used for search exactly match with "key". The response time is very fast. (just for the knowledge, we don’t talk deeply about it in this article)
    -
    R-Tree: supports spatial-index. It is used for geo point data. (just for the knowledge, we don’t talk deeply about it in this article)

Type of Index:

1/ Single column: the index has only a column and could not support for “OR” condition.
2/ Multiple columns: the index has more than one column and could not support for “OR” condition. It could also support the query using a part of key-columns with the order of columns-index. In a short way, if you have an index(1,2,3), you could query with (1), (1,2), (1,3), (1,2,3) but not the others (2), (2,3), (3).
3/ index merge: a way optimize query of Mysql. It supports the single-col index, multiple-col index in “OR(UNION).

Some RULES:

  • MySQL could not use the index if the column index isn’t the same binary type of column refer to. ie: utf8 column refers tolatin1 , int column refers to varchar column.
  • MySQL can use indexes on columns more efficiently if they are declared as the same type and size. “var” and “varchar” are considered the same if they are declared as the same size. ie: var(10) ~ varchar(10).
  • Kind of queries:
    - Match: “=”
    - Range: “Like” not support prefix wildcard, “<”, “<=”, “>”, “>=”, “Between”, “IN”
  • If you create an index with a few separate values and one of the value is outstanding of the number, MySQL decides to skip using the index and take the full scan table.
Scenario: table "campaigns" have single-column index "status" with 3 values constraint pending/approved/deleted. There is 1M rows in the table and "approved" own 800K rows. If the query find the status='approved', index(status) won't be used.
  • The index skips NULL value. The same as the query.
  • The more UNIQUE value in the column INDEX, the faster query be. The fastest is the Primary key.

1/ If there are many choices of the index for the query, MySQL normally uses the index that has the smallest number of rows (the selective index).

2/ Mysql arranges the order of conditions to appropriately match with the selective-index.

3/ In some queries, we should spread the condition AND-(OR) to a simple statement to check the selective index easier.

books.name='some pets' AND (book.category_id=1 OR book.author='a')# spread statement would be:
(books.name='some pets' AND book.category_id=1) OR (books.name='some pets' AND book.author='a')
# This case the index-merge could be used if they're defined.
# We could define the index for this query:
# - option1: we define only index(name). The index would be use in both two conditions.
# - option2: we define 2 index(category_id, name), index(author, name)

4/ LIMIT combines ORDER: The sorting and matching are concurrency process. As soon as a row is sorted, it would be put into the queue to detect matching. MySQL stops sorting as soon as it has found the first row_count rows of the sorted result, rather than sorting the entire result. In case the index column is used for sorting, it’s really fast.

5/ If you combine LIMIT row_count with ORDER and DISTINCT, MySQL stops as soon as it finds row_count unique rows.

JOIN OPTIMIZE

  • we have 2 tables.
    1. “categories”. Columns: id, name. Primary key (id). Index: id.
    2. “books”. Columns: id, name, category_id. Primary key(id), Foreign key(category_id). Index: id, name, category_id.
# query 1: condition append to "ON" clause
SELECT books.id, books.name, categories.name AS category_name
FROM books
LEFT JOIN categories
ON books.category_id = categories.id AND books.name='some pets';
# (1) has very bad performance.
# query 2: condition in "WHERE" clause
SELECT books.id, books.name, categories.name AS category_name
FROM books
LEFT JOIN categories
ON books.category_id = categories.id
WHERE books.name='some pets';
# query 3: sub-query
SELECT b.id, b.name, cate.name as CategoryName
FROM (select * from books where books.name = 'some pets') as b
LEFT JOIN categories as cate
ON b.category_id = cate.id;
# (2) and (3) queries above are mostly the same. The graph below visualize the flow of them.
# we can also use 'force index(index-name)' to let MySQL use it.
JOIN flow (2), (3)

(2), (3): As you can see, each table in the JOIN clause chooses one index for itself. The table “books” choose idx_books_name index because there is condition relate to that index (books.name=’some pets’) and MySQL optimization consider it the selective-index. In this query, “books” is the base-table, it will be filtered by the condition (books.name=’some pets’) then each row of the result will execute the query matching (books.category_id = categories.id) then join them together (nested loop). Obviously, the “categories” would choose the Primary key for the selective index.

GROUP OPTIMIZE

There are two ways to execute a GROUP BY query through index access

1/ LOOSE SCAN: must meet all rules below

  • Query over a single table and no selective index for WHERE clause.
  • There is a selective index in GROUP BY.
  • All columns in GROUP BY must contain in the index. ie: index(1,2,3) the columns in GROUP BY could be: (1), (1,2), (1,2,3) and no other columns
  • If there is aggregation function, the (col) must not in GROUP BY clause. Supported: MIN(col), MAX(col), SUM(DISTINCT col), COUNT(DISTINCT col), AVG(DISTINCT col).
  • No support the prefix-index.

Assume that there is an index idx(c1,c2,c3) on table t1(c1,c2,c3,c4). The Loose Index Scan access method can be used for the following queries:

SELECT c1, c2 FROM t1 GROUP BY c1, c2; 
SELECT DISTINCT c1, c2 FROM t1;
SELECT c1, MIN(c2) FROM t1 GROUP BY c1;
SELECT c1, c2 FROM t1 WHERE c1 < const GROUP BY c1, c2;
SELECT MAX(c3), MIN(c3), c1, c2 FROM t1 WHERE c2 > const GROUP BY c1, c2;
SELECT c2 FROM t1 WHERE c1 < const GROUP BY c1, c2;
SELECT c1, c2 FROM t1 WHERE c3 = const GROUP BY c1, c2;

2/ TIGHT SCAN: a full index scan or a range index scan, depending on the query conditions. In a short way GROUP BY scan all the indexes of WHERE clause then decide which to use. must meet all rules below:

  • When the LOOSE INDEX’s condition not match.
  • There is a selective index in WHERE clause.

--

--

dieutb
dieutb

Written by dieutb

passion in finding solutions and research some things relate to technical and helpful in life. Also strongly interest in foods, travel, and gentle music :)

No responses yet