Why does innodb pick B+ tree over B tree
- Non-leaves no longer store the data, so each node can store more childs, i.e., the tree height will be lower
- Linked list between leaves means range search/scan is easier
- Leaves goes to disk, non-leaves good for memeory
How to estimate index size
- B+ Tree each node fits into a page, with k keys, k+1 pointers the children, and 2 pointers to siblings
- An OS page is about 4KB, innodb is 16KB
- Estimate 64 bit key + 64 bit address = 16 bytes, so each node will point to about 1k nodes
- Therefore, a B+ tree of level 3 (inlcuding the root and leaves) can host almost 1B rows
- For 1B rows, total index for 1 key will be around 16G. In effect, it will be more considering padding and non-long PKs