Impact on Insert Performance — Primary Keys in MySQL27th February 2019
Impact on Insert Performance — Primary Keys in MySQL
The importance of Primary Keys in MySQL
Note: I have used MySQL and it’s storage engine InnoDB interchangeably.
A couple of days ago, someone at work asked me whether inserts are faster in a table when there’s a primary key present in the table. My immediate answer was that they were faster. I had some idea about this from experience and previous reading but didn’t remember the reason why it was so.
For a business user unaware about InnoDB internals or InnoDB’s existence, it might seem perfectly intuitive to assume that inserts will be faster without a primary key because a primary key is an additional constraint on the table, i.e., MySQL has to check for uniqueness, nullability. When it comes to InnoDB, that is not the case. Also, a table without a primary key will occupy more space than a table with a primary key. See how.
MySQL has various mutexes for preventing multiple threads accessing the same resource at the same time. There is a mutex called dict_sys which locks the data dictionary, i.e., the information_schema. Ovais Tariq has written a very insightful blogpost on this.
When a primary key is not defined on a table, MySQL generates row-ids for every row that is inserted in that table. This, because InnoDB is an index-organized storage engine, it mandates a primary key to store the contents of the table. If you don’t define a primary, MySQL will do this itself by generating a 6 byte auto increment primary key which is hidden. It is referred by the name DB_ROW_ID in the documentation. This only has to exist if a table doesn’t have a primary key.
Now, when this row-id is generated on every new insert, MySQL acquires dict_sys mutex — and it becomes a system-wide contention point because what it is acquiring a mutex on is the data dictionary — which might be in use by a couple of other DDL and DML operations. This would mean that insert performance on a table might be affected by another operation taking place on another table. This contention would otherwise not exist if a primary key is defined on the table.
To check how a similar insert workload behaves on a table with a defined primary key and another table without a defined primary key, I performed a simple mysqlslap insert performance test on my local. I have attached the scripts to perform these tests. Note: Doing this with rand() is probably not ideal.
mysqlslap — is a sysbench like tool for MySQL that can be used for performance and load testing of individual tables — Read more about it here.
The insert performance in this case was surprisingly not very different in both the cases — checked for 10 million records. This was with the concurrency of 4 threads
- With primary key, 1284.848 seconds
- Without primary key, 1282.350 seconds
In an isolated environment and with only 4 thread, there were not many dict_sys mutexes on information_schema, hence, there was no significant difference between the insert performance. Although, with the same table configuration, these two tables occupy significantly different amount of space on disk — which comes from the fact that MySQL had to create row ids for every record that was being inserted in the table without a defined primary key.
- With primary key, 900 MB
- Without primary key, 1.1 GB
I conducted five more tests with increasing concurrency and a consistent load of 100K records. See the difference in insert performance here.
With increasing threads, time taken to insert into tables without primary key increases quite a lot because of the contention point added because of dict_sys mutex.
Always use a primary key!
P.S. — From Jeremy Cole’s blog about this
Given how much other code within InnoDB is protected by dict_sys->mutex I think it’s fair to say any tables with an implicit clustered key (ROW_ID) could expect to experience random insert stalls during operations like dropping (unrelated) tables. Parallel insertion into multiple tables with implicit keys could be performance-constrained, as it will be serialized on both the shared mutex and cache contention for the shared counter variable.
- Ovais Tariq’s Original Blogpost about Scalability Issues
- Chapter 4 from High Performance MySQL — strongly recommended book for MySQL power users
- Tuning Primary Keys in InnoDB
- Wasting InnoDB Memory — not immediately related to this, but interesting nevertheless
- MySQL Server Team’s Blog about InnoDB’s Intrinsic Tables
- Community Discussion about using mysqlslap
- Importance of Primary Keys in MySQL Performance
- UUID vs incremental ID insert performance in MySQL — not immediately relevant
- Jeremy Cole’s blog on How InnoDB behaves without a Primary Key — must read
- Lack of Primary Key May Effectively Stop Slave — this needs a deeper dive
- Another blogpost about Replication Issues when Primary Keys are not present
- InnoDB Row Format Specification
Impact on Insert Performance — Primary Keys in MySQL was originally published in Hacker Noon on Medium, where people are continuing the conversation by highlighting and responding to this story.