MySQL Time Query


Summary: in this tutorial, we will introduce you to the MySQL TIME data type and show you useful temporal functions to manipulate time data effectively.

Introduction to MySQL TIME data type

MySQL uses the 'HH:MM:SS' format for querying and displaying a time value that represents a time of day, which is within 24 hours. To represent a time interval between two events, which can be longer than 24 hours, MySQL uses the 'HHH:MM:SS' format.

To define a TIME column, you use the syntax column_name TIME.


For example, the column definition start_at TIME creates a column named start_at with the TIME data type.


A TIME value ranges from '-838:59:59' to '838:59:59'. In addition, a TIME value can have a fractional seconds part with up to microsecond precision (6 digits). To define a TIME column with a fractional seconds part, you use the syntax column_name TIME(N).


N is an integer from 0 to 6 that specifies the number of fractional-second digits.

For example, the column definition begin_at TIME(3) creates a TIME column that stores 3 digits of fractional seconds.
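Both kinds of declaration can appear in the same table. The sketch below uses a hypothetical table named shifts. The table and column names are for illustration only.

```sql
-- Hypothetical table: one TIME column without fractional seconds, one with
CREATE TABLE shifts (
    id INT AUTO_INCREMENT PRIMARY KEY,
    start_at TIME,      -- whole seconds only ('HH:MM:SS')
    begin_at TIME(3)    -- keeps 3 fractional digits ('HH:MM:SS.fff')
);

-- Insert the same literal into both columns
INSERT INTO shifts (start_at, begin_at)
VALUES ('08:00:00.123', '08:00:00.123');

-- start_at shows 08:00:00; begin_at shows 08:00:00.123
SELECT start_at, begin_at FROM shifts;
```

By default, MySQL rounds extra fractional digits rather than truncating them. A value such as '08:00:00.5' would therefore be stored in start_at as '08:00:01'.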


A TIME value takes 3 bytes of storage. If a TIME value includes fractional seconds, it takes additional bytes based on the number of fractional-second digits. The following table shows the extra storage required for each fractional second precision.

Fractional Second Precision | Extra Storage (bytes)
1, 2 | 1
3, 4 | 2
5, 6 | 3

For example, TIME and TIME(0) take 3 bytes. TIME(1) and TIME(2) take 4 bytes (3 + 1). TIME(3) and TIME(6) take 5 and 6 bytes, respectively.

MySQL TIME data type example

Let’s take a look at an example of using the TIME data type for columns in a table.

First, create a new table with four columns: an ID, a test name, and two columns that hold the start and end times of the test. The two time columns use the TIME data type.


Second, insert a row into the table.


Third, query data from the table.
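The three steps might look like the following. This sketch assumes the table name tests and the column names id, name, start_at, and end_at.

```sql
-- Step 1: a hypothetical table with two TIME columns
CREATE TABLE tests (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    start_at TIME,
    end_at TIME
);

-- Step 2: insert a row using 'HH:MM:SS' string literals
INSERT INTO tests (name, start_at, end_at)
VALUES ('Test 1', '08:00:00', '10:00:00');

-- Step 3: read the row back; TIME values are displayed as 'HH:MM:SS'
SELECT name, start_at, end_at FROM tests;
```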


Notice that we use 'HH:MM:SS' as the literal time value in the INSERT statement. Let’s examine all the valid time literals that MySQL can recognize.

MySQL TIME literals

MySQL recognizes various time formats besides the 'HH:MM:SS' format that we mentioned earlier.

MySQL also lets you write a time value in the 'HHMMSS' format, without the colon (:) delimiter. For example, '08:30:00' and '10:15:00' can be rewritten as '083000' and '101500'.


However, '108000' is not a valid time value because 80 is not a valid minute. If you try to insert an invalid time value into a table while strict SQL mode is enabled (the default), MySQL rejects the statement with an "Incorrect time value" error (error code 1292).
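You can compare a valid and an invalid delimiter-free literal by trying both against a scratch table. In this sketch, the temporary table time_demo is hypothetical. The second INSERT assumes strict SQL mode is enabled.

```sql
-- Hypothetical scratch table with a single TIME column
CREATE TEMPORARY TABLE time_demo (t TIME);

-- Accepted: '083000' is read as 08:30:00
INSERT INTO time_demo (t) VALUES ('083000');

-- Rejected in strict mode: 80 is not a valid minute
INSERT INTO time_demo (t) VALUES ('108000');

-- Only the valid row was stored
SELECT t FROM time_demo;
```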

In addition to the string format, MySQL accepts HHMMSS as a number that represents a time value. You can also use the shorter numeric forms SS and MMSS. For example, instead of the string '08:30:00', you can use the number 83000.

For a time interval, you can use the 'D HH:MM:SS' format, where D represents days and ranges from 0 to 34. MySQL also accepts the more relaxed forms 'HH:MM', 'D HH:MM', 'D HH', and 'SS'.

If you use the colon (:) delimiter, you can use a single digit for hours, minutes, or seconds. For example, '9:5:0' can be used instead of '09:05:00'.
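You can check how MySQL interprets each of these literal forms with CAST(). The sample values are arbitrary. The comments show the TIME value each literal is expected to produce.

```sql
-- Each expression converts a relaxed time literal to a TIME value
SELECT
    CAST(83000 AS TIME),          -- number HHMMSS   -> 08:30:00
    CAST(1530 AS TIME),           -- number MMSS     -> 00:15:30
    CAST('1 02:00:00' AS TIME),   -- 'D HH:MM:SS'    -> 26:00:00
    CAST('12:30' AS TIME),        -- 'HH:MM'         -> 12:30:00
    CAST('9:5:0' AS TIME);        -- one-digit parts -> 09:05:00
```

Note that a two-part string such as '12:30' is read as hours and minutes, not minutes and seconds.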


Useful MySQL TIME functions

MySQL provides several useful temporal functions for manipulating TIME data.

Getting to know the current time

To get the current time of the database server, you use the CURRENT_TIME function. The CURRENT_TIME function returns the current time as a string ('HH:MM:SS') or a numeric value (HHMMSS), depending on the context in which the function is used.

For example, adding 0 to the result of CURRENT_TIME() puts it in a numeric context.
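The query below shows both contexts. The exact values depend on when you run it.

```sql
-- String context: a value such as '14:05:09'
SELECT CURRENT_TIME();

-- Numeric context (adding 0): a number such as 140509
SELECT CURRENT_TIME() + 0;
```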


Adding and Subtracting time from a TIME value

To add a TIME value to another TIME value, you use the ADDTIME function. To subtract a TIME value from another TIME value, you use the SUBTIME function.

For example, you can add 2 hours 30 minutes to the current time, or subtract it, by passing CURRENT_TIME() and '02:30:00' to these functions.
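Here is a sketch of that query. The column aliases are illustrative.

```sql
-- Current time, plus and minus 2 hours 30 minutes
SELECT
    CURRENT_TIME()                       AS now_time,
    ADDTIME(CURRENT_TIME(), '02:30:00')  AS plus_2h30m,
    SUBTIME(CURRENT_TIME(), '02:30:00')  AS minus_2h30m;
```

TIME also represents intervals, so results do not wrap around at midnight. ADDTIME('23:00:00', '02:30:00') returns '25:30:00', and SUBTIME('01:00:00', '02:30:00') returns '-01:30:00'.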


In addition, you can use the TIMEDIFF function to get the difference between two TIME values.
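Here is an example with two literal values. The times are arbitrary samples.

```sql
-- Later time first: a positive interval
SELECT TIMEDIFF('10:00:00', '08:30:00');   -- 01:30:00

-- Earlier time first: the same interval, negated
SELECT TIMEDIFF('08:30:00', '10:00:00');   -- -01:30:00
```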


Formatting MySQL TIME values

Although MySQL uses 'HH:MM:SS' when retrieving and displaying a TIME value, you can display the TIME value in your preferred way using the TIME_FORMAT function.

The TIME_FORMAT function is like the DATE_FORMAT function, except that TIME_FORMAT formats only time parts (hours, minutes, seconds, and microseconds).

In a time format string such as '%h:%i %p':

  • %h means two-digit hours from 01 to 12.
  • %i means two-digit minutes from 00 to 59.
  • %p means AM or PM.
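Applied to a sample value, that format string produces a 12-hour clock reading. The input time is arbitrary.

```sql
-- 14:05:00 rendered with %h (01..12), %i (00..59), and %p (AM/PM)
SELECT TIME_FORMAT('14:05:00', '%h:%i %p');   -- 02:05 PM
```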

Extracting hour, minute, and second from a TIME value

To extract the hour, minute, and second from a TIME value, you use the HOUR, MINUTE, and SECOND functions.
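This sketch uses a literal time value. In practice you would pass a TIME column instead.

```sql
SELECT
    HOUR('08:30:15')   AS h,   -- 8
    MINUTE('08:30:15') AS m,   -- 30
    SECOND('08:30:15') AS s;   -- 15

-- HOUR() is not limited to 0..23 for interval values
SELECT HOUR('100:00:00');      -- 100
```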


Getting UTC time value

To get the UTC time, you use the UTC_TIME function.
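For comparison, the query below returns the current time in the session time zone next to the current UTC time. The difference between the two columns reflects the session's time zone offset. The aliases are illustrative.

```sql
SELECT CURRENT_TIME() AS session_time, UTC_TIME() AS utc_time;
```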


In this tutorial, we have covered the MySQL TIME data type and some commonly used temporal functions for manipulating TIME values.


I think you’ll agree with me when I say that a database is one of the most important components of almost every application today. When working with databases, you have to take care of a lot of things. Some examples include designing and creating the database, maintaining it, managing backup and recovery, and measuring performance. And in this post, I’m focusing on the performance part.

Performance is always linked with time: the faster your database responds, the better it performs. Applications interact with a database through queries, so to measure the performance of your database, you have to measure query time. And that’s what I’ll be talking about in this post.


Why Measure Query Time?

When you have an application that uses a database, the application executes database queries to interact with the database. These queries can be from the admin or from your customer’s side. Each of these queries takes some time to be executed by the database management system (DBMS). To optimize the performance, you have to make sure that the queries are being executed as fast as possible.

Measuring the query time helps you understand the time taken by the query to execute. Once you know this, you can decide which query is eating up the time and then work on optimizing it. Basically, measuring query time is the first step in the database optimization process.

In this post, I’m focusing on measuring MySQL query time because MySQL is one of the most used database management systems. So let’s look at a couple of ways to measure it.

How Do You Measure MySQL Query Time?

There are different ways to do this, and the best one depends on your use case. In some cases, measuring the query time for one or a few queries is enough. In other cases, you’ll have to do it for a huge number of queries. To cover both types of requirements, I’ll show you how to measure query time in each case.

Measuring MySQL Query Time for Single Queries

The mysql command-line client shows the time taken by each query by default. Whenever you run a valid query, it prints the execution time in seconds after the result, for example "3 rows in set (0.00 sec)".

But this won’t be very helpful when you are using a server-side program to execute the query. In that case, you can use the profiler to measure the query time.

The first thing to do is to enable profiling for the current session with the following statement:

SET profiling = 1;

Then you can execute whatever query you want to measure the execution time for. Once executed, you can check the query execution time using the below query:

show profiles;

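You will be able to see the duration of each query’s execution in seconds. Note that SHOW PROFILES is deprecated in recent MySQL versions. The Performance Schema is the recommended replacement.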
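Put together, a profiling session might look like the following sketch. SELECT SLEEP(1) only stands in for the query you want to measure. The Query_ID passed to SHOW PROFILE is an assumption: use the ID that SHOW PROFILES reports.

```sql
-- Enable profiling for the current session only
SET profiling = 1;

-- Run the statement you want to time (SLEEP(1) is a stand-in)
SELECT SLEEP(1);

-- List recent statements with their Query_ID and Duration in seconds
SHOW PROFILES;

-- Break one statement down by execution stage;
-- replace 1 with the Query_ID reported by SHOW PROFILES
SHOW PROFILE FOR QUERY 1;

-- Turn profiling off again
SET profiling = 0;
```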

These ways are fine when you want to measure the query time for one or a few queries. But this is not practical when you want to measure the query time for a large number of queries. For instance, if you have a deployed application and users are continuously making database transactions using it, it’s not practical or possible for you to record these queries and run them one at a time to measure the query time. You would need something that works for a large set of queries and that is also scalable.

So let me tell you how to measure the query time in such situations.

Measuring MySQL Query Time Using Slow Query Logs

MySQL can log slow queries so that you can examine them later for optimization. Instead of logging every query’s details, you log only the slow ones. This saves a lot of disk space because you’re only logging what matters. But how does the system know that a query is slow? You tell it by assigning a value, in seconds, to the long_query_time variable. For example, if you assign the value 2 to this variable, every query that takes more than two seconds to execute is logged.

Let me show you an example of how you can configure MySQL to log slow queries.

First, you’ll have to enable the slow query log. To do that, add the following line to the [mysqld] section of the MySQL configuration file:

slow_query_log=1

You can also mention the name of the file where the log should be stored by adding the following line:

slow_query_log_file=<path of the file>

Then add the long_query_time value. If you want to log all the queries, you can set this value to 0:

long_query_time=0

Now, whenever a query is executed, its details, including its query time, are stored in the log file. I’ve run a couple of simple SELECT queries, and each of them was recorded in my log file.

Logging the details of every query unnecessarily consumes disk space. To save that space, choose a value for the long_query_time variable based on benchmark query execution times for your use case.
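If you can’t restart the server to pick up configuration-file changes, you can also set the same variables at runtime. This sketch assumes a 2-second threshold and an account that is allowed to change global variables (SYSTEM_VARIABLES_ADMIN or SUPER in MySQL 8.0).

```sql
-- Turn on the slow query log without a restart
SET GLOBAL slow_query_log = 'ON';

-- Log statements that run longer than 2 seconds
-- (the new global value applies to sessions opened after this change)
SET GLOBAL long_query_time = 2;

-- Check the settings and the log file location
SHOW GLOBAL VARIABLES LIKE 'slow_query_log%';
SHOW GLOBAL VARIABLES LIKE 'long_query_time';
```

Settings made with SET GLOBAL are lost when the server restarts. That is why the configuration file is the usual place for a permanent setup.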

Using logs to measure query time gives you a lot of information and is really helpful when you want to measure query time for a huge number of queries. But these logs are in raw format. Going through these logs to find the data that will be valuable for you can get frustrating. You can obviously run this log file through a parser to fetch just the important details. But that’s again an additional task.

If you are looking for a quick solution to measure MySQL query time without putting in a lot of effort, you can use readily available log management tools.

Measuring MySQL Query Time Using Scalyr

Scalyr is one such log management tool that helps you handle logs easily. What’s good about Scalyr is that, along with recording the logs, it also visualizes them for you. Now, let me show you how Scalyr makes things easy for you.

First, you will have to log in to or sign up for Scalyr. Once done, install the Scalyr agent on the system that’s running the MySQL server by following the installation instructions in Scalyr’s documentation. This agent uploads your logs to your dashboard so that you can easily view them.

To start collecting MySQL logs, open the /etc/scalyr-agent-2/agent.json file in the terminal and add the DBMS username and password under the monitor section.

After configuring the file, start the agent by running the following command:

sudo scalyr-agent-2 start

Now, go to the Scalyr dashboard menu and select MySQL. You will be able to see the log details of your MySQL, which includes the query time.

This is a very simple and easy way to measure query time for a large number of MySQL queries.

The Best Way

I believe that the best way to measure query time is by using tools like Scalyr. It not only gives you the information that you need but also presents it in an easy-to-understand form. You don’t have to get frustrated looking at raw log files anymore. To experience what I’ve talked about, you’ll have to see it for yourself, so go ahead and give it a try. Scalyr offers a free trial.

  • ADDDATE(date,INTERVAL expr unit), ADDDATE(expr,days)

    When invoked with the INTERVAL form of the second argument, ADDDATE() is a synonym for DATE_ADD(). The related function SUBDATE() is a synonym for DATE_SUB(). For information on the INTERVAL unit argument, see Temporal Intervals.

    When invoked with the days form of the second argument, MySQL treats it as an integer number of days to be added to expr.

  • ADDTIME(expr1,expr2): Adds expr2 to expr1 and returns the result. expr1 is a time or datetime expression, and expr2 is a time expression.

  • CONVERT_TZ(dt,from_tz,to_tz): Converts a datetime value dt from the time zone given by from_tz to the time zone given by to_tz and returns the resulting value. Time zones are specified as described in Section 5.1.13, “MySQL Server Time Zone Support”. This function returns NULL if the arguments are invalid.

    If the value falls out of the supported range of the TIMESTAMP type when converted from from_tz to UTC, no conversion occurs. The TIMESTAMP range is described in Section 11.2.1, “Date and Time Data Type Syntax”.

  • CURDATE(): Returns the current date as a value in 'YYYY-MM-DD' or YYYYMMDD format, depending on whether the function is used in string or numeric context.

  • CURRENT_DATE, CURRENT_DATE()

    CURRENT_DATE and CURRENT_DATE() are synonyms for CURDATE().

  • CURRENT_TIME, CURRENT_TIME([fsp])

    CURRENT_TIME and CURRENT_TIME() are synonyms for CURTIME().

  • CURRENT_TIMESTAMP, CURRENT_TIMESTAMP([fsp])

    CURRENT_TIMESTAMP and CURRENT_TIMESTAMP() are synonyms for NOW().

  • CURTIME([fsp]): Returns the current time as a value in 'hh:mm:ss' or hhmmss format, depending on whether the function is used in string or numeric context. The value is expressed in the session time zone.

    If the fsp argument is given to specify a fractional seconds precision from 0 to 6, the return value includes a fractional seconds part of that many digits.

  • DATE(expr): Extracts the date part of the date or datetime expression expr.

  • DATEDIFF(expr1,expr2): Returns expr1 − expr2 expressed as a value in days from one date to the other. expr1 and expr2 are date or date-and-time expressions. Only the date parts of the values are used in the calculation.

  • DATE_ADD(date,INTERVAL expr unit), DATE_SUB(date,INTERVAL expr unit)

    These functions perform date arithmetic. The date argument specifies the starting date or datetime value. expr is an expression specifying the interval value to be added to or subtracted from the starting date. expr is evaluated as a string; it may start with a - for negative intervals. unit is a keyword indicating the units in which the expression should be interpreted.

    For more information about temporal interval syntax, including a full list of unit specifiers, the expected form of the expr argument for each unit value, and rules for operand interpretation in temporal arithmetic, see Temporal Intervals.

    The return value depends on the arguments:

    • DATE if the date argument is a DATE value and your calculations involve only YEAR, MONTH, and DAY parts (that is, no time parts).

    • DATETIME if the first argument is a DATETIME (or TIMESTAMP) value, or if the first argument is a DATE and the unit value uses HOURS, MINUTES, or SECONDS.

    • String otherwise.

    To ensure that the result is DATETIME, you can use CAST() to convert the first argument to DATETIME.

  • DATE_FORMAT(date,format): Formats the date value according to the format string.

    The specifiers shown in the following table may be used in the format string. The % character is required before format specifier characters. The specifiers apply to other functions as well: STR_TO_DATE(), TIME_FORMAT(), UNIX_TIMESTAMP().

    Specifier | Description
    %a | Abbreviated weekday name (Sun..Sat)
    %b | Abbreviated month name (Jan..Dec)
    %c | Month, numeric (0..12)
    %D | Day of the month with English suffix (0th, 1st, 2nd, 3rd, …)
    %d | Day of the month, numeric (00..31)
    %e | Day of the month, numeric (0..31)
    %f | Microseconds (000000..999999)
    %H | Hour (00..23)
    %h | Hour (01..12)
    %I | Hour (01..12)
    %i | Minutes, numeric (00..59)
    %j | Day of year (001..366)
    %k | Hour (0..23)
    %l | Hour (1..12)
    %M | Month name (January..December)
    %m | Month, numeric (00..12)
    %p | AM or PM
    %r | Time, 12-hour (hh:mm:ss followed by AM or PM)
    %S | Seconds (00..59)
    %s | Seconds (00..59)
    %T | Time, 24-hour (hh:mm:ss)
    %U | Week (00..53), where Sunday is the first day of the week; WEEK() mode 0
    %u | Week (00..53), where Monday is the first day of the week; WEEK() mode 1
    %V | Week (01..53), where Sunday is the first day of the week; WEEK() mode 2; used with %X
    %v | Week (01..53), where Monday is the first day of the week; WEEK() mode 3; used with %x
    %W | Weekday name (Sunday..Saturday)
    %w | Day of the week (0=Sunday..6=Saturday)
    %X | Year for the week where Sunday is the first day of the week, numeric, four digits; used with %V
    %x | Year for the week, where Monday is the first day of the week, numeric, four digits; used with %v
    %Y | Year, numeric, four digits
    %y | Year, numeric (two digits)
    %% | A literal % character
    %x | x, for any “x” not listed above

    Ranges for the month and day specifiers begin with zero because MySQL permits the storing of incomplete dates such as '2014-00-00'.

    The language used for day and month names and abbreviations is controlled by the value of the lc_time_names system variable (Section 10.16, “MySQL Server Locale Support”).

    For the %U, %u, %V, and %v specifiers, see the description of the WEEK() function for information about the mode values. The mode affects how week numbering occurs.

    DATE_FORMAT() returns a string with a character set and collation given by character_set_connection and collation_connection so that it can return month and weekday names containing non-ASCII characters.

  • DATE_SUB(date,INTERVAL expr unit): See the description for DATE_ADD().

  • DAY(date): DAY() is a synonym for DAYOFMONTH().

  • DAYNAME(date): Returns the name of the weekday for date. The language used for the name is controlled by the value of the lc_time_names system variable (Section 10.16, “MySQL Server Locale Support”).

  • DAYOFMONTH(date): Returns the day of the month for date, in the range 1 to 31, or 0 for dates such as '0000-00-00' or '2008-00-00' that have a zero day part.

  • DAYOFWEEK(date): Returns the weekday index for date (1 = Sunday, 2 = Monday, …, 7 = Saturday). These index values correspond to the ODBC standard.

  • DAYOFYEAR(date): Returns the day of the year for date, in the range 1 to 366.

  • EXTRACT(unit FROM date): Uses the same kinds of unit specifiers as DATE_ADD() or DATE_SUB(), but extracts parts from the date rather than performing date arithmetic. For information on the unit argument, see Temporal Intervals.

  • FROM_DAYS(N): Given a day number N, returns a DATE value.

    Use FROM_DAYS() with caution on old dates. It is not intended for use with values that precede the advent of the Gregorian calendar (1582). See Section 12.9, “What Calendar Is Used By MySQL?”.

  • FROM_UNIXTIME(unix_timestamp[,format]): Returns a representation of unix_timestamp as a datetime or character string value. The value returned is expressed using the session time zone. (Clients can set the session time zone as described in Section 5.1.13, “MySQL Server Time Zone Support”.) unix_timestamp is an internal timestamp value representing seconds since '1970-01-01 00:00:00' UTC, such as produced by the UNIX_TIMESTAMP() function.

    If format is omitted, this function returns a DATETIME value.

    If unix_timestamp is an integer, the fractional seconds precision of the DATETIME is zero. When unix_timestamp is a decimal value, the fractional seconds precision of the DATETIME is the same as the precision of the decimal value, up to a maximum of 6. When unix_timestamp is a floating point number, the fractional seconds precision of the datetime is 6.

    format is used to format the result in the same way as the format string used for the DATE_FORMAT() function. If format is supplied, the value returned is a VARCHAR.

    If you use UNIX_TIMESTAMP() and FROM_UNIXTIME() to convert between values in a non-UTC time zone and Unix timestamp values, the conversion is lossy because the mapping is not one-to-one in both directions. For details, see the description of the UNIX_TIMESTAMP() function.

  • GET_FORMAT({DATE|TIME|DATETIME}, {'EUR'|'USA'|'JIS'|'ISO'|'INTERNAL'}): Returns a format string. This function is useful in combination with the DATE_FORMAT() and STR_TO_DATE() functions.

    The possible values for the first and second arguments result in several possible format strings (for the specifiers used, see the table in the DATE_FORMAT() function description). ISO format refers to ISO 9075, not ISO 8601.

    TIMESTAMP can also be used as the first argument to GET_FORMAT(), in which case the function returns the same values as for DATETIME.

  • HOUR(time): Returns the hour for time. The range of the return value is 0 to 23 for time-of-day values. However, the range of TIME values is actually much larger, so HOUR can return values greater than 23.

  • LAST_DAY(date): Takes a date or datetime value and returns the corresponding value for the last day of the month. Returns NULL if the argument is invalid.

  • LOCALTIME, LOCALTIME([fsp])

    LOCALTIME and LOCALTIME() are synonyms for NOW().

  • LOCALTIMESTAMP, LOCALTIMESTAMP([fsp])

    LOCALTIMESTAMP and LOCALTIMESTAMP() are synonyms for NOW().

  • MAKEDATE(year,dayofyear): Returns a date, given year and day-of-year values. dayofyear must be greater than 0 or the result is NULL.

  • MAKETIME(hour,minute,second): Returns a time value calculated from the hour, minute, and second arguments.

    The second argument can have a fractional part.

  • MICROSECOND(expr): Returns the microseconds from the time or datetime expression expr as a number in the range from 0 to 999999.

  • MINUTE(time): Returns the minute for time, in the range 0 to 59.

  • MONTH(date): Returns the month for date, in the range 1 to 12 for January to December, or 0 for dates such as '0000-00-00' or '2008-00-00' that have a zero month part.

  • MONTHNAME(date): Returns the full name of the month for date. The language used for the name is controlled by the value of the lc_time_names system variable (Section 10.16, “MySQL Server Locale Support”).

  • NOW([fsp]): Returns the current date and time as a value in 'YYYY-MM-DD hh:mm:ss' or YYYYMMDDhhmmss format, depending on whether the function is used in string or numeric context. The value is expressed in the session time zone.

    If the fsp argument is given to specify a fractional seconds precision from 0 to 6, the return value includes a fractional seconds part of that many digits.

    NOW() returns a constant time that indicates the time at which the statement began to execute. (Within a stored function or trigger, NOW() returns the time at which the function or triggering statement began to execute.) This differs from the behavior for SYSDATE(), which returns the exact time at which it executes.

    In addition, the SET TIMESTAMP statement affects the value returned by NOW() but not by SYSDATE(). This means that timestamp settings in the binary log have no effect on invocations of SYSDATE(). Setting the timestamp to a nonzero value causes each subsequent invocation of NOW() to return that value. Setting the timestamp to zero cancels this effect so that NOW() once again returns the current date and time.

    See the description for SYSDATE() for additional information about the differences between the two functions.

  • PERIOD_ADD(P,N): Adds N months to period P (in the format YYMM or YYYYMM). Returns a value in the format YYYYMM.

    The period argument P is not a date value.

  • PERIOD_DIFF(P1,P2): Returns the number of months between periods P1 and P2. P1 and P2 should be in the format YYMM or YYYYMM. Note that the period arguments P1 and P2 are not date values.

  • QUARTER(date): Returns the quarter of the year for date, in the range 1 to 4.

  • SECOND(time): Returns the second for time, in the range 0 to 59.

  • SEC_TO_TIME(seconds): Returns the seconds argument, converted to hours, minutes, and seconds, as a TIME value. The range of the result is constrained to that of the TIME data type. A warning occurs if the argument corresponds to a value outside that range.

  • STR_TO_DATE(str,format): This is the inverse of the DATE_FORMAT() function. It takes a string str and a format string format. STR_TO_DATE() returns a DATETIME value if the format string contains both date and time parts, or a DATE or TIME value if the string contains only date or time parts. If the date, time, or datetime value extracted from str is illegal, STR_TO_DATE() returns NULL and produces a warning.

    The server scans str, attempting to match format to it. The format string can contain literal characters and format specifiers beginning with %. Literal characters in format must match literally in str. Format specifiers in format must match a date or time part in str. For the specifiers that can be used in format, see the DATE_FORMAT() function description.

    Scanning starts at the beginning of str and fails if format is found not to match. Extra characters at the end of str are ignored.

    Unspecified date or time parts have a value of 0, so incompletely specified values in str produce a result with some or all parts set to 0.

    Range checking on the parts of date values is as described in Section 11.2.2, “The DATE, DATETIME, and TIMESTAMP Types”. This means, for example, that “zero” dates or dates with part values of 0 are permitted unless the SQL mode is set to disallow such values.

    If the NO_ZERO_DATE SQL mode is enabled, zero dates are disallowed. In that case, STR_TO_DATE() returns NULL and generates a warning.

    You cannot use format '%X%V' to convert a year-week string to a date because the combination of a year and week does not uniquely identify a year and month if the week crosses a month boundary. To convert a year-week to a date, you should also specify the weekday. For example, STR_TO_DATE('200442 Monday', '%X%V %W') returns '2004-10-18'.

  • SUBDATE(date,INTERVAL expr unit), SUBDATE(expr,days)

    When invoked with the INTERVAL form of the second argument, SUBDATE() is a synonym for DATE_SUB(). For information on the INTERVAL unit argument, see the discussion for DATE_ADD().

    The second form enables the use of an integer value for days. In such cases, it is interpreted as the number of days to be subtracted from the date or datetime expression expr.

  • SUBTIME(expr1,expr2): Returns expr1 − expr2 expressed as a value in the same format as expr1. expr1 is a time or datetime expression, and expr2 is a time expression.

  • SYSDATE([fsp]): Returns the current date and time as a value in 'YYYY-MM-DD hh:mm:ss' or YYYYMMDDhhmmss format, depending on whether the function is used in string or numeric context.

    If the fsp argument is given to specify a fractional seconds precision from 0 to 6, the return value includes a fractional seconds part of that many digits.

    SYSDATE() returns the time at which it executes. This differs from the behavior for NOW(), which returns a constant time that indicates the time at which the statement began to execute. (Within a stored function or trigger, NOW() returns the time at which the function or triggering statement began to execute.)

    In addition, the SET TIMESTAMP statement affects the value returned by NOW() but not by SYSDATE(). This means that timestamp settings in the binary log have no effect on invocations of SYSDATE().

    Because SYSDATE() can return different values even within the same statement, and is not affected by SET TIMESTAMP, it is nondeterministic and therefore unsafe for replication if statement-based binary logging is used. If that is a problem, you can use row-based logging.

    Alternatively, you can use the --sysdate-is-now option to cause SYSDATE() to be an alias for NOW(). This works if the option is used on both the source and the replica.

    The nondeterministic nature of SYSDATE() also means that indexes cannot be used for evaluating expressions that refer to it.

  • TIME(expr): Extracts the time part of the time or datetime expression expr and returns it as a string.

    This function is unsafe for statement-based replication. A warning is logged if you use this function when binlog_format is set to STATEMENT.

  • TIMEDIFF(expr1,expr2): Returns expr1 − expr2 expressed as a time value. expr1 and expr2 are time or date-and-time expressions, but both must be of the same type.

    The result returned by TIMEDIFF() is limited to the range allowed for TIME values. Alternatively, you can use either of the functions TIMESTAMPDIFF() and UNIX_TIMESTAMP(), both of which return integers.

  • TIMESTAMP(expr), TIMESTAMP(expr1,expr2)

    With a single argument, this function returns the date or datetime expression expr as a datetime value. With two arguments, it adds the time expression expr2 to the date or datetime expression expr1 and returns the result as a datetime value.

  • TIMESTAMPADD(unit,interval,datetime_expr): Adds the integer expression interval to the date or datetime expression datetime_expr. The unit for interval is given by the unit argument, which should be one of the following values: MICROSECOND (microseconds), SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, or YEAR.

    The unit value may be specified using one of the keywords as shown, or with a prefix of SQL_TSI_. For example, DAY and SQL_TSI_DAY are both legal.

  • TIMESTAMPDIFF(unit,datetime_expr1,datetime_expr2): Returns datetime_expr2 − datetime_expr1, where datetime_expr1 and datetime_expr2 are date or datetime expressions. One expression may be a date and the other a datetime; a date value is treated as a datetime having the time part '00:00:00' where necessary. The unit for the result (an integer) is given by the unit argument. The legal values for unit are the same as those listed in the description of the TIMESTAMPADD() function.

    The order of the date or datetime arguments for this function is the opposite of that used with the TIMESTAMP() function when invoked with 2 arguments.

  • TIME_FORMAT(time,format): This is used like the DATE_FORMAT() function, but the format string may contain format specifiers only for hours, minutes, seconds, and microseconds. Other specifiers produce a NULL value or 0.

    If the time value contains an hour part that is greater than 23, the %H and %k hour format specifiers produce a value larger than the usual range of 0..23. The other hour format specifiers produce the hour value modulo 12.

  • TIME_TO_SEC(time): Returns the time argument, converted to seconds.

  • TO_DAYS(date): Given a date date, returns a day number (the number of days since year 0).

    TO_DAYS() is not intended for use with values that precede the advent of the Gregorian calendar (1582), because it does not take into account the days that were lost when the calendar was changed. For dates before 1582 (and possibly a later year in other locales), results from this function are not reliable. See Section 12.9, “What Calendar Is Used By MySQL?”, for details.

    Remember that MySQL converts two-digit year values in dates to four-digit form using the rules in Section 11.2, “Date and Time Data Types”. For example, '2008-10-07' and '08-10-07' are seen as identical dates, so TO_DAYS() returns the same value for both.

    In MySQL, the zero date is defined as '0000-00-00', even though this date is itself considered invalid. This means that, for '0000-00-00' and '0000-01-01', TO_DAYS() returns NULL and 1, respectively.

    This is true whether or not the ALLOW_INVALID_DATES SQL server mode is enabled.

  • TO_SECONDS(expr): Given a date or datetime expr, returns the number of seconds since the year 0. If expr is not a valid date or datetime value, returns NULL.

    Like TO_DAYS(), TO_SECONDS() is not intended for use with values that precede the advent of the Gregorian calendar (1582), because it does not take into account the days that were lost when the calendar was changed. For dates before 1582 (and possibly a later year in other locales), results from this function are not reliable. See Section 12.9, “What Calendar Is Used By MySQL?”, for details.

    Like TO_DAYS(), TO_SECONDS() converts two-digit year values in dates to four-digit form using the rules in Section 11.2, “Date and Time Data Types”.

    In MySQL, the zero date is defined as '0000-00-00', even though this date is itself considered invalid. This means that, for '0000-00-00' and '0000-01-01', TO_SECONDS() returns NULL and 86400, respectively.

    This is true whether or not the ALLOW_INVALID_DATES SQL server mode is enabled.

  • UNIX_TIMESTAMP([date]): If UNIX_TIMESTAMP() is called with no date argument, it returns a Unix timestamp representing seconds since '1970-01-01 00:00:00' UTC.

    If UNIX_TIMESTAMP() is called with a date argument, it returns the value of the argument as seconds since '1970-01-01 00:00:00' UTC. The server interprets date as a value in the session time zone and converts it to an internal Unix timestamp value in UTC. (Clients can set the session time zone as described in Section 5.1.13, “MySQL Server Time Zone Support”.) The date argument may be a DATE, DATETIME, or TIMESTAMP string, or a number in YYMMDD, YYMMDDhhmmss, YYYYMMDD, or YYYYMMDDhhmmss format. If the argument includes a time part, it may optionally include a fractional seconds part.

    The return value is an integer if no argument is given or the argument does not include a fractional seconds part, or DECIMAL if an argument is given that includes a fractional seconds part.

    When the date argument is a TIMESTAMP column, UNIX_TIMESTAMP() returns the internal timestamp value directly, with no implicit “string-to-Unix-timestamp” conversion.

    The valid range of argument values is the same as for the TIMESTAMP data type: '1970-01-01 00:00:01.000000' UTC to '2038-01-19 03:14:07.999999' UTC. If you pass an out-of-range date to UNIX_TIMESTAMP(), it returns 0.

    If you use UNIX_TIMESTAMP() and FROM_UNIXTIME() to convert between values in a non-UTC time zone and Unix timestamp values, the conversion is lossy because the mapping is not one-to-one in both directions. For example, due to conventions for local time zone changes such as Daylight Saving Time (DST), it is possible for UNIX_TIMESTAMP() to map two values that are distinct in a non-UTC time zone to the same Unix timestamp value. FROM_UNIXTIME() maps that value back to only one of the original values. This happens, for example, with the values '2005-03-27 02:00:00' and '2005-03-27 03:00:00' in the MET time zone.

    If you want to subtract UNIX_TIMESTAMP() columns, you might want to cast them to signed integers. See Section 12.11, “Cast Functions and Operators”.

  • UTC_DATE, UTC_DATE()

    Returns the current UTC date as a value in 'YYYY-MM-DD' or YYYYMMDD format, depending on whether the function is used in string or numeric context.

  • UTC_TIME, UTC_TIME([fsp])

    Returns the current UTC time as a value in 'hh:mm:ss' or hhmmss format, depending on whether the function is used in string or numeric context.

    If the fsp argument is given to specify a fractional seconds precision from 0 to 6, the return value includes a fractional seconds part of that many digits.

  • UTC_TIMESTAMP, UTC_TIMESTAMP([fsp])

    Returns the current UTC date and time as a value in 'YYYY-MM-DD hh:mm:ss' or YYYYMMDDhhmmss format, depending on whether the function is used in string or numeric context.

    If the fsp argument is given to specify a fractional seconds precision from 0 to 6, the return value includes a fractional seconds part of that many digits.

  • This function returns the week number for . The two-argument form of enables you to specify whether the week starts on Sunday or Monday and whether the return value should be in the range from to or from to . If the argument is omitted, the value of the system variable is used. See Section 5.1.7, “Server System Variables”.

    The following table describes how the argument works.

    Mode  First day of week  Range  Week 1 is the first week …
    0     Sunday             0-53   with a Sunday in this year
    1     Monday             0-53   with 4 or more days this year
    2     Sunday             1-53   with a Sunday in this year
    3     Monday             1-53   with 4 or more days this year
    4     Sunday             0-53   with 4 or more days this year
    5     Monday             0-53   with a Monday in this year
    6     Sunday             1-53   with 4 or more days this year
    7     Monday             1-53   with a Monday in this year

    For mode values with a meaning of “with 4 or more days this year,” weeks are numbered according to ISO 8601:1988:

    • If the week containing January 1 has 4 or more days in the new year, it is week 1.

    • Otherwise, it is the last week of the previous year, and the next week is week 1.
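    The two rules can be written out directly. In the Python sketch below (the helper names are illustrative, not MySQL functions), week 1 starts on the Monday of the week that contains January 4, which is exactly the Monday-based week with 4 or more days in the new year. It covers only the Monday-first rule with range 1-53 and makes no attempt to reproduce the other modes in the table; Python's date.isocalendar(), which follows ISO 8601, serves as a cross-check.

```python
from datetime import date, timedelta
from typing import Tuple


def week1_monday(year: int) -> date:
    """Monday that starts week 1 of `year`.

    Week 1 is the Monday-based week with 4 or more days in `year`,
    which is the same as the week that contains January 4.
    """
    jan4 = date(year, 1, 4)
    return jan4 - timedelta(days=jan4.weekday())  # weekday(): Monday == 0


def year_and_week(d: date) -> Tuple[int, int]:
    """Return (year, week) for `d` under the ISO 8601 rules above."""
    year = d.year
    if d < week1_monday(year):
        year -= 1        # last week of the previous year
    elif d >= week1_monday(year + 1):
        year += 1        # already week 1 of the next year
    week = (d - week1_monday(year)).days // 7 + 1
    return year, week


print(year_and_week(date(2000, 1, 1)))   # (1999, 52)

# Cross-check against the standard library for every day of 1999-2010.
d = date(1999, 1, 1)
while d <= date(2010, 12, 31):
    assert year_and_week(d) == tuple(d.isocalendar()[:2])
    d += timedelta(days=1)
```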

    If a date falls in the last week of the previous year, MySQL returns 0 if you do not use 2, 3, 6, or 7 as the optional mode argument:

    One might argue that the function should return 52 because the given date actually occurs in the 52nd week of 1999. It returns 0 instead so that the return value is “the week number in the given year.” This makes the function reliable when combined with other functions that extract a date part from a date.

    If you prefer a result evaluated with respect to the year that contains the first day of the week for the given date, use , , , or as the optional argument.

    Alternatively, use the function:

  • Returns the weekday index for ( = Monday, = Tuesday, … = Sunday).

  • Returns the calendar week of the date as a number in the range from to . is a compatibility function that is equivalent to .

  • Returns the year for , in the range to , or for the “zero” date.

  • ,

    Returns year and week for a date. The year in the result may be different from the year in the date argument for the first and the last week of the year.

    The argument works exactly like the argument to . For the single-argument syntax, a value of 0 is used. Unlike , the value of does not influence .

    The week number is different from what the function would return () for optional arguments or , as then returns the week in the context of the given year.


    Chapter 4. Query Performance Optimization

    In the previous chapter, we explained how to optimize a schema, which is one of the necessary conditions for high performance. But working with the schema isn’t enough—you also need to design your queries well. If your queries are bad, even the best-designed schema will not perform well.

    Query optimization, index optimization, and schema optimization go hand in hand. As you gain experience writing queries in MySQL, you will come to understand how to design schemas to support efficient queries. Similarly, what you learn about optimal schema design will influence the kinds of queries you write. This process takes time, so we encourage you to refer back to this chapter and the previous one as you learn more.

    This chapter begins with general query design considerations—the things you should consider first when a query isn’t performing well. We then dig much deeper into query optimization and server internals. We show you how to find out how MySQL executes a particular query, and you’ll learn how to change the query execution plan. Finally, we look at some places MySQL doesn’t optimize queries well and explore query optimization patterns that help MySQL execute queries more efficiently.

    Our goal is to help you understand deeply how MySQL really executes queries, so you can reason about what is efficient or inefficient, exploit MySQL’s strengths, and avoid its weaknesses.

    The most basic reason a query doesn’t perform well is that it’s working with too much data. Some queries just have to sift through a lot of data and can’t be helped. That’s unusual, though; most bad queries can be changed to access less data. We’ve found it useful to analyze a poorly performing query in two steps:

    1. Find out whether your application is retrieving more data than you need. That usually means it’s accessing too many rows, but it might also be accessing too many columns.

    2. Find out whether the MySQL server is analyzing more rows than it needs.

    Are You Asking the Database for Data You Don’t Need?

    Some queries ask for more data than they need and then throw some of it away. This demands extra work of the MySQL server, adds network overhead, [36] and consumes memory and CPU resources on the application server.

    Here are a few typical mistakes:

    Fetching more rows than needed

    One common mistake is assuming that MySQL provides results on demand, rather than calculating and returning the full result set. We often see this in applications designed by people familiar with other database systems. These developers are used to techniques such as issuing a query that returns many rows, fetching only the first few, and closing the result set (e.g., fetching the 100 most recent articles for a news site when they only need to show 10 of them on the front page). They think MySQL will provide them with these 10 rows and stop executing the query, but what MySQL really does is generate the complete result set. The client library then fetches all the data and discards most of it. The best solution is to add a clause to the query that limits it to the rows you actually need.
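    A minimal sketch of the difference, using Python's built-in sqlite3 module as a stand-in for a MySQL connection (the articles table is hypothetical, and SQLite's LIMIT clause plays the role of the row-limiting clause). Both approaches put the same 10 rows on the front page, but the first one makes the database build and hand over 100 rows that the application then discards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, published INTEGER)")
conn.executemany(
    "INSERT INTO articles (title, published) VALUES (?, ?)",
    [(f"Article {i}", i) for i in range(1, 1001)],
)

# Wasteful: fetch the 100 most recent articles, then keep only 10 in the application.
rows = conn.execute(
    "SELECT id, title FROM articles ORDER BY published DESC LIMIT 100").fetchall()
front_page_wasteful = rows[:10]

# Better: ask the database for exactly the 10 rows the page shows.
front_page = conn.execute(
    "SELECT id, title FROM articles ORDER BY published DESC LIMIT 10").fetchall()

print(len(rows), len(front_page))         # 100 10
print(front_page == front_page_wasteful)  # True
```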

    Fetching all columns from a multitable join

    If you want to retrieve all actors who appear in Academy Dinosaur, don’t write the query this way:

    mysql> -> -> ->

    That returns all columns from all three tables. Instead, write the query as follows:
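    To see how much extra data the all-columns form carries, the sketch below builds tiny versions of the three tables in SQLite (a stand-in for MySQL; the schema and rows are a simplified, hypothetical subset of Sakila) and compares the column lists that each form returns, read from the DB-API cursor.description attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT, description TEXT);
    CREATE TABLE film_actor (actor_id INTEGER, film_id INTEGER);
    INSERT INTO actor VALUES (1, 'PENELOPE', 'GUINESS');
    INSERT INTO film VALUES (1, 'ACADEMY DINOSAUR', 'An epic drama');
    INSERT INTO film_actor VALUES (1, 1);
""")

join = """
    FROM actor
    JOIN film_actor ON film_actor.actor_id = actor.actor_id
    JOIN film ON film.film_id = film_actor.film_id
    WHERE film.title = 'ACADEMY DINOSAUR'
"""


def column_names(select_list: str) -> list:
    """Run the join with the given select list and return the result's column names."""
    cur = conn.execute(f"SELECT {select_list} {join}")
    return [col[0] for col in cur.description]


all_columns = column_names("*")        # columns from all three tables
actor_only = column_names("actor.*")   # only the actor columns

print(all_columns)
print(actor_only)                      # ['actor_id', 'first_name', 'last_name']
```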

    Fetching all columns

    You should always be suspicious when you see *. Do you really need all columns? Probably not. Retrieving all columns can prevent optimizations such as covering indexes, and it adds I/O, memory, and CPU overhead on the server.

    Some DBAs ban * universally because of this fact, and to reduce the risk of problems when someone alters the table’s column list.

    Of course, asking for more data than you really need is not always bad. In many cases we’ve investigated, people tell us the wasteful approach simplifies development, as it lets the developer use the same bit of code in more than one place. That’s a reasonable consideration, as long as you know what it costs in terms of performance. It may also be useful to retrieve more data than you actually need if you use some type of caching in your application, or if you have another benefit in mind. Fetching and caching full objects may be preferable to running many separate queries that retrieve only parts of the object.

    Is MySQL Examining Too Much Data?

    Once you’re sure your queries retrieve only the data you need, you can look for queries that examine too much data while generating results. In MySQL, the simplest query cost metrics are:

    • Execution time

    • Number of rows examined

    • Number of rows returned

    None of these metrics is a perfect way to measure query cost, but they reflect roughly how much data MySQL must access internally to execute a query and translate approximately into how fast the query runs. All three metrics are logged in the slow query log, so looking at the slow query log is one of the best ways to find queries that examine too much data.

    As discussed in Chapter 2, the standard slow query logging feature in MySQL 5.0 and earlier has serious limitations, including lack of support for fine-grained logging. Fortunately, there are patches that let you log and measure slow queries with microsecond resolution. These are included in the MySQL 5.1 server, but you can also patch earlier versions if needed. Beware of placing too much emphasis on query execution time. It’s nice to look at because it’s an objective metric, but it’s not consistent under varying load conditions. Other factors—such as storage engine locks (table locks and row locks), high concurrency, and hardware—can also have a considerable impact on query execution times. This metric is useful for finding queries that impact the application’s response time the most or load the server the most, but it does not tell you whether the actual execution time is reasonable for a query of a given complexity. (Execution time can also be both a symptom and a cause of problems, and it’s not always obvious which is the case.)

    Rows examined and rows returned

    It’s useful to think about the number of rows examined when analyzing queries, because you can see how efficiently the queries are finding the data you need.

    However, like execution time, it’s not a perfect metric for finding bad queries. Not all row accesses are equal. Shorter rows are faster to access, and fetching rows from memory is much faster than reading them from disk.

    Ideally, the number of rows examined would be the same as the number returned, but in practice this is rarely possible. For example, when constructing rows with joins, multiple rows must be accessed to generate each row in the result set. The ratio of rows examined to rows returned is usually small—say, between 1:1 and 10:1—but sometimes it can be orders of magnitude larger.

    Rows examined and access types

    When you’re thinking about the cost of a query, consider the cost of finding a single row in a table. MySQL can use several access methods to find and return a row. Some require examining many rows, but others may be able to generate the result without examining any.

    The access method(s) appear in the type column of the query plan output. The access types range from a full table scan to index scans, range scans, unique index lookups, and constants. Each of these is faster than the one before it, because it requires reading less data. You don’t need to memorize the access types, but you should understand the general concepts of scanning a table, scanning an index, range accesses, and single-value accesses.

    If you aren’t getting a good access type, the best way to solve the problem is usually by adding an appropriate index. We discussed indexing at length in the previous chapter; now you can see why indexes are so important to query optimization. Indexes let MySQL find rows with a more efficient access type that examines less data.

    For example, let’s look at a simple query on the Sakila sample database:


    This query will return 10 rows, and the plan output shows that MySQL uses the ref access type on the idx_fk_film_id index to execute the query:

    mysql>
    *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            table: film_actor
             type: ref
    possible_keys: idx_fk_film_id
              key: idx_fk_film_id
          key_len: 2
              ref: const
             rows: 10
            Extra:

    The plan output shows that MySQL estimated it needed to access only 10 rows. In other words, the query optimizer knew the chosen access type could satisfy the query efficiently. What would happen if there were no suitable index for the query? MySQL would have to use a less optimal access type, as we can see if we drop the index and run the query again:

    mysql>
    mysql>
    mysql>
    *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            table: film_actor
             type: ALL
    possible_keys: NULL
              key: NULL
          key_len: NULL
              ref: NULL
             rows: 5073
            Extra: Using where

    Predictably, the access type has changed to a full table scan (ALL), and MySQL now estimates it’ll have to examine 5,073 rows to satisfy the query. The “Using where” in the Extra column shows that the MySQL server is using the WHERE clause to discard rows after the storage engine reads them.

    In general, MySQL can apply a WHERE clause in three ways, from best to worst:

    • Apply the conditions to the index lookup operation to eliminate nonmatching rows. This happens at the storage engine layer.

    • Use a covering index (“Using index” in the Extra column) to avoid row accesses, and filter out nonmatching rows after retrieving each result from the index. This happens at the server layer, but it doesn’t require reading rows from the table.

    • Retrieve rows from the table, then filter nonmatching rows (“Using where” in the Extra column). This happens at the server layer and requires the server to read rows from the table before it can filter them.

    This example illustrates how important it is to have good indexes. Good indexes help your queries get a good access type and examine only the rows they need. However, adding an index doesn’t always mean that MySQL will access and return the same number of rows. For example, here’s a query that uses the aggregate function: [37]


    This query returns only 200 rows, but it needs to read thousands of rows to build the result set. An index can’t reduce the number of rows examined for a query like this one.

    Unfortunately, MySQL does not tell you how many of the rows it accessed were used to build the result set; it tells you only the total number of rows it accessed. Many of these rows could be eliminated by a WHERE clause and end up not contributing to the result set. In the previous example, after removing the idx_fk_film_id index, the query accessed every row in the table and the WHERE clause discarded all but 10 of them. Only the remaining 10 rows were used to build the result set. Understanding how many rows the server accesses and how many it really uses requires reasoning about the query.

    If you find that a huge number of rows were examined to produce relatively few rows in the result, you can try some more sophisticated fixes:

    • Use covering indexes, which store data so that the storage engine doesn’t have to retrieve the complete rows. (We discussed these in the previous chapter.)

    • Change the schema. An example is using summary tables (discussed in the previous chapter).

    • Rewrite a complicated query so the MySQL optimizer is able to execute it optimally. (We discuss this later in this chapter.)

    As you optimize problematic queries, your goal should be to find alternative ways to get the result you want—but that doesn’t necessarily mean getting the same result set back from MySQL. You can sometimes transform queries into equivalent forms and get better performance. However, you should also think about rewriting the query to retrieve different results, if that provides an efficiency benefit. You may be able to ultimately do the same work by changing the application code as well as the query. In this section, we explain techniques that can help you restructure a wide range of queries and show you when to use each technique.

    Complex Queries Versus Many Queries

    One important query design question is whether it’s preferable to break up a complex query into several simpler queries. The traditional approach to database design emphasizes doing as much work as possible with as few queries as possible. This approach was historically better because of the cost of network communication and the overhead of the query parsing and optimization stages.

    However, this advice doesn’t apply as much to MySQL, because it was designed to handle connecting and disconnecting very efficiently and to respond to small and simple queries very quickly. Modern networks are also significantly faster than they used to be, reducing network latency. MySQL can run more than 50,000 simple queries per second on commodity server hardware and over 2,000 queries per second from a single correspondent on a Gigabit network, so running multiple queries isn’t necessarily such a bad thing.

    Connection response is still slow compared to the number of rows MySQL can traverse per second internally, though, which is counted in millions per second for in-memory data. All else being equal, it’s still a good idea to use as few queries as possible, but sometimes you can make a query more efficient by decomposing it and executing a few simple queries instead of one complex one. Don’t be afraid to do this; weigh the costs, and go with the strategy that causes less work. We show some examples of this technique a little later in the chapter.

    That said, using too many queries is a common mistake in application design. For example, some applications perform 10 single-row queries to retrieve data from a table when they could use a single 10-row query. We’ve even seen applications that retrieve each column individually, querying each row many times!
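    The sketch below contrasts the two patterns, with SQLite standing in for MySQL (the posts table and the run() helper are hypothetical). SQLite runs in-process, so there is no real network round trip; the counter in run() only models one. Looping over the IDs issues one query per row, while an IN list with one placeholder per ID fetches the same rows in a single query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO posts VALUES (?, ?)", [(i, f"post {i}") for i in range(1, 101)])

queries_issued = 0


def run(sql, params=()):
    """Execute one query and count it as one round trip to the server."""
    global queries_issued
    queries_issued += 1
    return conn.execute(sql, params).fetchall()


wanted = list(range(1, 11))

# Ten single-row queries.
one_by_one = [run("SELECT id, title FROM posts WHERE id = ?", (i,))[0] for i in wanted]
single_row_queries = queries_issued

# One ten-row query with an IN list (one placeholder per ID).
queries_issued = 0
placeholders = ", ".join("?" for _ in wanted)
batched = run(f"SELECT id, title FROM posts WHERE id IN ({placeholders}) ORDER BY id", wanted)

print(single_row_queries, queries_issued)   # 10 1
print(one_by_one == batched)                # True
```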

    Another way to slice up a query is to divide and conquer, keeping it essentially the same but running it in smaller “chunks” that affect fewer rows each time.

    Purging old data is a great example. Periodic purge jobs may need to remove quite a bit of data, and doing this in one massive query could lock a lot of rows for a long time, fill up transaction logs, hog resources, and block small queries that shouldn’t be interrupted. Chopping up the statement and using medium-size queries can improve performance considerably, and reduce replication lag when a query is replicated. For example, instead of running this monolithic query:


    you could do something like the following pseudocode:

    rows_affected = 0
    do {
       rows_affected = do_query(
          "DELETE FROM messages WHERE created < DATE_SUB(NOW(),INTERVAL 3 MONTH)
          LIMIT 10000")
    } while rows_affected > 0

    Deleting 10,000 rows at a time is typically a large enough task to make each query efficient, and a short enough task to minimize the impact on the server [38] (transactional storage engines may benefit from smaller transactions). It may also be a good idea to add some sleep time between the statements to spread the load over time and reduce the amount of time locks are held.
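    A runnable version of the pseudocode, sketched with Python's sqlite3 module as a stand-in for a MySQL client library. It rests on two assumptions: SQLite builds usually reject LIMIT on DELETE, so each chunk deletes the IDs returned by a limited subquery, and the cutoff is passed in as a parameter instead of being computed with MySQL's date functions. The purge_in_chunks() function, the messages table, the chunk size, and the pause are all illustrative.

```python
import sqlite3
import time


def purge_in_chunks(conn, cutoff, chunk_size=10_000, pause=0.0):
    """Delete messages older than `cutoff`, at most `chunk_size` rows per statement.

    Each chunk is committed on its own, so locks are held only briefly; the
    optional pause spreads the load over time. Returns the total rows deleted.
    """
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM messages WHERE id IN "
            "(SELECT id FROM messages WHERE created < ? LIMIT ?)",
            (cutoff, chunk_size),
        )
        conn.commit()
        if cur.rowcount == 0:   # nothing left to purge
            break
        total += cur.rowcount
        time.sleep(pause)
    return total


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, created TEXT)")
old = [("2020-01-01",)] * 25_000
new = [("2030-01-01",)] * 5
conn.executemany("INSERT INTO messages (created) VALUES (?)", old + new)
conn.commit()

print(purge_in_chunks(conn, "2025-01-01"))                       # 25000
print(conn.execute("SELECT COUNT(*) FROM messages").fetchone())  # (5,)
```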

    Many high-performance web sites use join decomposition. You can decompose a join by running multiple single-table queries instead of a multitable join, and then performing the join in the application. For example, instead of this single query:

    mysql> -> -> ->

    You might run these queries:

    mysql> mysql> mysql>

    This looks wasteful at first glance, because you’ve increased the number of queries without getting anything in return. However, such restructuring can actually give significant performance advantages:

    • Caching can be more efficient. Many applications cache “objects” that map directly to tables. In this example, if the object with the tag is already cached, the application can skip the first query. If you find posts with an id of 123, 567, or 9098 in the cache, you can remove them from the list. The query cache might also benefit from this strategy. If only one of the tables changes frequently, decomposing a join can reduce the number of cache invalidations.

    • For MyISAM tables, performing one query per table uses table locks more efficiently: the queries will lock the tables individually and relatively briefly, instead of locking them all for a longer time.

    • Doing joins in the application makes it easier to scale the database by placing tables on different servers.

    • The queries themselves can be more efficient. In this example, using a list of IDs instead of a join lets MySQL sort row IDs and retrieve rows more optimally than might be possible with a join. We explain this in more detail later.

    • You can reduce redundant row accesses. Doing a join in the application means you retrieve each row only once, whereas a join in the query is essentially a denormalization that might repeatedly access the same data. For the same reason, such restructuring might also reduce the total network traffic and memory usage.

    • To some extent, you can view this technique as manually implementing a hash join instead of the nested loops algorithm MySQL uses to execute a join. A hash join may be more efficient. (We discuss MySQL’s join strategy later in this chapter.)
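    A sketch of join decomposition done in application code, with SQLite standing in for MySQL and a hypothetical tag / tag_post / post schema. Each step is a single-table query; the last step loads the matching posts into a dictionary keyed on post ID and probes it, which is a simple in-memory hash join rather than a join performed by the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tag (id INTEGER PRIMARY KEY, tag TEXT);
    CREATE TABLE tag_post (tag_id INTEGER, post_id INTEGER);
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT);
    INSERT INTO tag VALUES (1, 'mysql'), (2, 'postgres');
    INSERT INTO post VALUES (123, 'Indexing'), (567, 'Replication'), (9098, 'Backups'), (42, 'Vacuum');
    INSERT INTO tag_post VALUES (1, 123), (1, 567), (1, 9098), (2, 42);
""")


def posts_for_tag(tag_name):
    # Query 1: look up the tag row.
    tag_row = conn.execute("SELECT id FROM tag WHERE tag = ?", (tag_name,)).fetchone()
    if tag_row is None:
        return []
    # Query 2: find the IDs of posts carrying that tag.
    post_ids = [r[0] for r in conn.execute(
        "SELECT post_id FROM tag_post WHERE tag_id = ? ORDER BY post_id", (tag_row[0],))]
    if not post_ids:
        return []
    # Query 3: fetch those posts with one IN list and index them by ID (the build side).
    placeholders = ", ".join("?" for _ in post_ids)
    posts_by_id = {
        pid: title
        for pid, title in conn.execute(
            f"SELECT id, title FROM post WHERE id IN ({placeholders})", post_ids)
    }
    # Probe the dictionary in tag_post order to assemble the joined result.
    return [(pid, posts_by_id[pid]) for pid in post_ids if pid in posts_by_id]


print(posts_for_tag("mysql"))
```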

    If you need to get high performance from your MySQL server, one of the best ways to invest your time is in learning how MySQL optimizes and executes queries. Once you understand this, much of query optimization is simply a matter of reasoning from principles, and query optimization becomes a very logical process.


    This discussion assumes you’ve read Chapter 1, which provides a foundation for understanding the MySQL query execution engine.

    Figure 4-1 shows how MySQL generally executes queries.

    Follow along with the illustration to see what happens when you send MySQL a query:

    1. The client sends the SQL statement to the server.

    2. The server checks the query cache. If there’s a hit, it returns the stored result from the cache; otherwise, it passes the SQL statement to the next step.

    3. The server parses, preprocesses, and optimizes the SQL into a query execution plan.

    4. The query execution engine executes the plan by making calls to the storage engine API.

    5. The server sends the result to the client.

    Each of these steps has some extra complexity, which we discuss in the following sections. We also explain which states the query will be in during each step. The query optimization process is particularly complex and important to understand.

    Figure 4-1. Execution path of a query

    The MySQL Client/Server Protocol

    Though you don’t need to understand the inner details of MySQL’s client/server protocol, you do need to understand how it works at a high level. The protocol is half-duplex, which means that at any given time the MySQL server can be either sending or receiving messages, but not both. It also means there is no way to cut a message short.

    This protocol makes MySQL communication simple and fast, but it limits it in some ways too. For one thing, it means there’s no flow control; once one side sends a message, the other side must fetch the entire message before responding. It’s like a game of tossing a ball back and forth: only one side has the ball at any instant, and you can’t toss the ball (send a message) unless you have it.

    The client sends a query to the server as a single packet of data. This is why the configuration variable is important if you have large queries. [39] Once the client sends the query, it doesn’t have the ball anymore; it can only wait for results.

    In contrast, the response from the server usually consists of many packets of data. When the server responds, the client has to receive the entire result set. It cannot simply fetch a few rows and then ask the server not to bother sending the rest. If the client needs only the first few rows that are returned, it either has to wait for all of the server’s packets to arrive and then discard the ones it doesn’t need, or disconnect ungracefully. Neither is a good idea, which is why appropriate clauses are so important.

    Here’s another way to think about this: when a client fetches rows from the server, it thinks it’s pulling them. But the truth is, the MySQL server is pushing the rows as it generates them. The client is only receiving the pushed rows; there is no way for it to tell the server to stop sending rows. The client is “drinking from the fire hose,” so to speak. (Yes, that’s a technical term.)

    Most libraries that connect to MySQL let you either fetch the whole result set and buffer it in memory, or fetch each row as you need it. The default behavior is generally to fetch the whole result and buffer it in memory. This is important because until all the rows have been fetched, the MySQL server will not release the locks and other resources required by the query. The query will be in the “Sending data” state (explained in “Query states”). When the client library fetches the results all at once, it reduces the amount of work the server needs to do: the server can finish and clean up the query as quickly as possible.

    Most client libraries let you treat the result set as though you’re fetching it from the server, although in fact you’re just fetching it from the buffer in the library’s memory. This works fine most of the time, but it’s not a good idea for huge result sets that might take a long time to fetch and use a lot of memory. You can use less memory, and start working on the result sooner, if you instruct the library not to buffer the result. The downside is that the locks and other resources on the server will remain open while your application is interacting with the library. [40]

    Let’s look at an example using PHP. First, here’s how you’ll usually query MySQL from PHP:

    <?php
    $link = mysql_connect('localhost', 'user', 'p4ssword');
    // Buffered: the whole result set is copied into client memory here.
    $result = mysql_query('SELECT * FROM HUGE_TABLE', $link);
    while ( $row = mysql_fetch_array($result) ) {
        // Do something with result
    }
    ?>

    The code seems to indicate that you fetch rows only when you need them, in the loop. However, the code actually fetches the entire result into a buffer with the function call. The loop simply iterates through the buffer. In contrast, the following code doesn’t buffer the results, because it uses instead of :

    <?php
    $link = mysql_connect('localhost', 'user', 'p4ssword');
    // Unbuffered: rows are read from the server as the loop asks for them.
    $result = mysql_unbuffered_query('SELECT * FROM HUGE_TABLE', $link);
    while ( $row = mysql_fetch_array($result) ) {
        // Do something with result
    }
    ?>

    Programming languages have different ways to override buffering. For example, the Perl driver requires you to specify the C client library’s attribute (the default is ). Here’s an example:

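    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;
    my $dbh = DBI->connect('DBI:mysql:;host=localhost', 'user', 'p4ssword');
    # mysql_use_result => 1 streams rows from the server instead of buffering them.
    my $sth = $dbh->prepare('SELECT * FROM HUGE_TABLE', { mysql_use_result => 1 });
    $sth->execute();
    # fetchrow_array returns a list; assign it to an array, not a scalar.
    while ( my @row = $sth->fetchrow_array() ) {
        # Do something with result
    }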

    Notice that the call to specified to “use” the result instead of “buffering” it. You can also specify this when connecting, which will make every statement unbuffered:

    my $dbh = DBI->connect('DBI:mysql:;mysql_use_result=1', 'user', 'p4ssword');

    Each MySQL connection, or thread, has a state that shows what it is doing at any given time. There are several ways to view these states, but the easiest is to use the command (the states appear in the column). As a query progresses through its lifecycle, its state changes many times, and there are dozens of states. The MySQL manual is the authoritative source of information for all the states, but we list a few here and explain what they mean:

    The thread is waiting for a new query from the client.

    The thread is either executing the query or sending the result back to the client.

    The thread is waiting for a table lock to be granted at the server level. Locks that are implemented by the storage engine, such as InnoDB’s row locks, do not cause the thread to enter the state.


    The thread is checking storage engine statistics and optimizing the query.

    The thread is processing the query and copying results to a temporary table, probably for a , for a filesort, or to satisfy a . If the state ends with “on disk,” MySQL is converting an in-memory table to an on-disk table.

    The thread is sorting a result set.

    This can mean several things: the thread might be sending data between stages of the query, generating the result set, or returning the result set to the client.

    It’s helpful to at least know the basic states, so you can get a sense of “who has the ball” for the query. On very busy servers, you might see an unusual or normally brief state, such as , begin to take a significant amount of time. This usually indicates that something is wrong.

    Before even parsing a query, MySQL checks for it in the query cache, if the cache is enabled. This operation is a case-sensitive hash lookup. If the query differs from a similar query in the cache by even a single byte, it won’t match, and the query processing will go to the next stage.

    If MySQL does find a match in the query cache, it must check privileges before returning the cached query. This is possible without parsing the query, because MySQL stores table information with the cached query. If the privileges are OK, MySQL retrieves the stored result from the query cache and sends it to the client, bypassing every other stage in query execution. The query is never parsed, optimized, or executed.
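    A toy model of the lookup in Python (the QueryCache class and fake_execute() are hypothetical, and the model ignores privileges, invalidation, and memory limits). Because the key is the exact query text, two statements that differ only in letter case or spacing never share a cache entry.

```python
class QueryCache:
    """Toy query cache: results keyed by the exact bytes of the query text."""

    def __init__(self):
        self._results = {}
        self.hits = 0
        self.misses = 0

    def get_or_run(self, sql, execute):
        key = sql.encode("utf-8")          # no normalization: byte-for-byte match
        if key in self._results:
            self.hits += 1
            return self._results[key]
        self.misses += 1
        result = execute(sql)              # parse, optimize, and execute on a miss
        self._results[key] = result
        return result


def fake_execute(sql):
    # Hypothetical stand-in for the real execution path.
    return [(1,)]


cache = QueryCache()
cache.get_or_run("SELECT * FROM t WHERE id = 1", fake_execute)
cache.get_or_run("SELECT * FROM t WHERE id = 1", fake_execute)   # hit
cache.get_or_run("select * from t where id = 1", fake_execute)   # miss: case differs
cache.get_or_run("SELECT *  FROM t WHERE id = 1", fake_execute)  # miss: extra space
print(cache.hits, cache.misses)                                   # 1 3
```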

    You can learn more about the query cache in Chapter 5.

    The Query Optimization Process

    The next step in the query lifecycle turns a SQL query into an execution plan for the query execution engine. It has several sub-steps: parsing, preprocessing, and optimization. Errors (for example, syntax errors) can be raised at any point in the process. We’re not trying to document the MySQL internals here, so we’re going to take some liberties, such as describing steps separately even though they’re often combined wholly or partially for efficiency. Our goal is simply to help you understand how MySQL executes queries so that you can write better ones.

    The parser and the preprocessor

    To begin, MySQL’s parser breaks the query into tokens and builds a “parse tree” from them. The parser uses MySQL’s SQL grammar to interpret and validate the query. For instance, it ensures that the tokens in the query are valid and in the proper order, and it checks for mistakes such as quoted strings that aren’t terminated.

    The preprocessor then checks the resulting parse tree for additional semantics that the parser can’t resolve. For example, it checks that tables and columns exist, and it resolves names and aliases to ensure that column references aren’t ambiguous.

    Next, the preprocessor checks privileges. This is normally very fast unless your server has large numbers of privileges. (See Chapter 12 for more on privileges and security.)

    The parse tree is now valid and ready for the optimizer to turn it into a query execution plan. A query can often be executed many different ways and produce the same result. The optimizer’s job is to find the best option.

    MySQL uses a cost-based optimizer, which means it tries to predict the cost of various execution plans and choose the least expensive. The unit of cost is a single random four-kilobyte data page read. You can see how expensive the optimizer estimated a query to be by running the query, then inspecting the session variable:

    mysql>
    +----------+
    | count(*) |
    +----------+
    |     5462 |
    +----------+
    mysql>
    +-----------------+-------------+
    | Variable_name   | Value       |
    +-----------------+-------------+
    | Last_query_cost | 1040.599000 |
    +-----------------+-------------+

    This result means that the optimizer estimated it would need to do about 1,040 random data page reads to execute the query. It bases the estimate on statistics: the number of pages per table or index, the cardinality (number of distinct values) of indexes, the length of rows and keys, and key distribution. The optimizer does not include the effects of any type of caching in its estimates—it assumes every read will result in a disk I/O operation.

    The optimizer may not always choose the best plan, for many reasons:

    • The statistics could be wrong. The server relies on storage engines to provide statistics, and they can range from exactly correct to wildly inaccurate. For example, the InnoDB storage engine doesn’t maintain accurate statistics about the number of rows in a table, because of its MVCC architecture.

    • The cost metric is not exactly equivalent to the true cost of running the query, so even when the statistics are accurate, the query may be more or less expensive than MySQL’s approximation. A plan that reads more pages might actually be cheaper in some cases, such as when the reads are sequential so the disk I/O is faster, or when the pages are already cached in memory.

    • MySQL’s idea of optimal might not match yours. You probably want the fastest execution time, but MySQL doesn’t really understand “fast”; it understands “cost,” and as we’ve seen, determining cost is not an exact science.

    • MySQL doesn’t consider other queries that are running concurrently, which can affect how quickly the query runs.

    • MySQL doesn’t always do cost-based optimization. Sometimes it just follows the rules, such as “if there’s a full-text search clause, use a full-text index if one exists.” It will do this even when it would be faster to use a different index and a non-full-text query with an ordinary filtering clause.

    • The optimizer doesn’t take into account the cost of operations not under its control, such as executing stored functions or user-defined functions.

    • As we’ll see later, the optimizer can’t always estimate every possible execution plan, so it may miss an optimal plan.

    MySQL’s query optimizer is a highly complex piece of software, and it uses many optimizations to transform the query into an execution plan. There are two basic types of optimizations, which we call static and dynamic. Static optimizations can be performed simply by inspecting the parse tree. For example, the optimizer can transform the WHERE clause into an equivalent form by applying algebraic rules. Static optimizations are independent of values, such as the value of a constant in a WHERE clause. They can be performed once and will always be valid, even when the query is reexecuted with different values. You can think of these as “compile-time optimizations.”

    In contrast, dynamic optimizations are based on context and can depend on many factors, such as which value is in a WHERE clause or how many rows are in an index. They must be reevaluated each time the query is executed. You can think of these as “runtime optimizations.”

    The difference is important in executing prepared statements or stored procedures. MySQL can do static optimizations once, but it must reevaluate dynamic optimizations every time it executes a query. MySQL sometimes even reoptimizes the query as it executes it. [41]

    Here are some types of optimizations MySQL knows how to do:

    Reordering joins

    Tables don’t always have to be joined in the order you specify in the query. Determining the best join order is an important optimization; we explain it in depth in “The join optimizer.”

    Converting OUTER JOINs to INNER JOINs

    An OUTER JOIN doesn’t necessarily have to be executed as an OUTER JOIN. Some factors, such as the WHERE clause and table schema, can actually cause an OUTER JOIN to be equivalent to an INNER JOIN. MySQL can recognize this and rewrite the join, which makes it eligible for reordering.

    Applying algebraic equivalence rules

    MySQL applies algebraic transformations to simplify and canonicalize expressions. It can also fold and reduce constants, eliminating impossible constraints and constant conditions. For example, the term (5=5 AND a>5) will reduce to just a>5. Similarly, (a<b AND b=c) AND a=5 becomes b>5 AND b=c AND a=5. These rules are very useful for writing conditional queries, which we discuss later in the chapter.

    COUNT(), MIN(), and MAX() optimizations

    Indexes and column nullability can often help MySQL optimize away these expressions. For example, to find the minimum value of a column that’s leftmost in a B-Tree index, MySQL can just request the first row in the index. It can even do this in the query optimization stage, and treat the value as a constant for the rest of the query. Similarly, to find the maximum value in a B-Tree index, the server reads the last row. If the server uses this optimization, you’ll see “Select tables optimized away” in the plan. This literally means the optimizer has removed the table from the query plan and replaced it with a constant.
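    Here is a minimal Python sketch of why MIN() and MAX() are cheap on a B-Tree index. It uses a plain sorted list as a stand-in for the index’s ordered leaf entries. A real B-Tree is a tree of pages, and MySQL performs this lookup through the storage engine rather than like this:

```python
from bisect import insort


def build_index(values):
    """Hypothetical stand-in for a B-Tree index: a list kept in key order."""
    index = []
    for value in values:
        insort(index, value)  # insert while preserving sorted order
    return index


def index_min(index):
    # The leftmost entry is the minimum, so no scan is needed.
    return index[0] if index else None  # None plays the role of SQL NULL


def index_max(index):
    # The rightmost entry is the maximum.
    return index[-1] if index else None


idx = build_index([42, 7, 19, 88, 3])
print(index_min(idx), index_max(idx))  # prints: 3 88
```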

    Likewise, COUNT(*) queries without a WHERE clause can often be optimized away on some storage engines (such as MyISAM, which keeps an exact count of rows in the table at all times). See “Optimizing COUNT() Queries” under “Optimizing Specific Types of Queries” for details.

    Evaluating and reducing constant expressions

    When MySQL detects that an expression can be reduced to a constant, it will do so during optimization. For example, a user-defined variable can be converted to a constant if it’s not changed in the query. Arithmetic expressions are another example.

    Perhaps surprisingly, even something you might consider to be a query can be reduced to a constant during the optimization phase. One example is a MIN() on an index. This can even be extended to a constant lookup on a primary key or unique index. If a WHERE clause applies a constant condition to such an index, the optimizer knows MySQL can look up the value at the beginning of the query. It will then treat the value as a constant in the rest of the query. Here’s an example:

    mysql> EXPLAIN SELECT film.film_id, film_actor.actor_id
        -> FROM sakila.film
        ->    INNER JOIN sakila.film_actor USING(film_id)
        -> WHERE film.film_id = 1;
    +----+-------------+------------+-------+----------------+-------+------+
    | id | select_type | table      | type  | key            | ref   | rows |
    +----+-------------+------------+-------+----------------+-------+------+
    |  1 | SIMPLE      | film       | const | PRIMARY        | const |    1 |
    |  1 | SIMPLE      | film_actor | ref   | idx_fk_film_id | const |   10 |
    +----+-------------+------------+-------+----------------+-------+------+

    MySQL executes this query in two steps, which correspond to the two rows in the output. The first step is to find the desired row in the film table. MySQL’s optimizer knows there is only one row, because there’s a primary key on the film_id column, and it has already consulted the index during the query optimization stage to see how many rows it will find. Because the query optimizer has a known quantity (the value in the WHERE clause) to use in the lookup, this table’s ref type is const.

    In the second step, MySQL treats the film_id column from the row found in the first step as a known quantity. It can do this because the optimizer knows that by the time the query reaches the second step, it will know all the values from the first step. Notice that the film_actor table’s ref type is const, just as the film table’s was.

    Another way you’ll see constant conditions applied is by propagating a value’s constant-ness from one place to another if there is a USING, WHERE, or ON clause that restricts them to being equal. In this example, the optimizer knows that the USING clause forces film_id to have the same value everywhere in the query: it must be equal to the constant value given in the WHERE clause.

    Covering indexes

    MySQL can sometimes use an index to avoid reading row data, when the index contains all the columns the query needs. We discussed covering indexes at length in Chapter 3.

    Subquery optimization

    MySQL can convert some types of subqueries into more efficient alternative forms, reducing them to index lookups instead of separate queries.

    Early termination

    MySQL can stop processing a query (or a step in a query) as soon as it fulfills the query or step. The obvious case is a LIMIT clause, but there are several other kinds of early termination. For instance, if MySQL detects an impossible condition, it can abort the entire query. You can see this in the following example:

    mysql> EXPLAIN SELECT film.film_id FROM sakila.film WHERE film_id = -1;
    +----+...+-----------------------------------------------------+
    | id |...| Extra                                               |
    +----+...+-----------------------------------------------------+
    |  1 |...| Impossible WHERE noticed after reading const tables |
    +----+...+-----------------------------------------------------+

    This query stopped during the optimization step, but MySQL can also terminate execution sooner in some cases. The server can use this optimization when the query execution engine recognizes the need to retrieve distinct values, or to stop when a value doesn’t exist. For example, the following query finds all movies without any actors: [42]

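    mysql> SELECT film.film_id
        -> FROM sakila.film
        ->    LEFT OUTER JOIN sakila.film_actor USING(film_id)
        -> WHERE film_actor.film_id IS NULL;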

    This query works by eliminating any films that have actors. Each film might have many actors, but as soon as it finds one actor, it stops processing the current film and moves to the next one because it knows the WHERE clause prohibits outputting that film. A similar “Distinct/not-exists” optimization can apply to certain kinds of DISTINCT, NOT EXISTS(), and LEFT JOIN queries.
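    To illustrate the early-termination idea, here is a Python sketch of the “films without actors” query. The dictionary standing in for the idx_fk_film_id index is hypothetical, and so are the sample rows. The point is that each film costs at most one index read, because the inner loop stops at the first matching actor:

```python
from collections import defaultdict

# Hypothetical sample rows: (film_id, actor_id) pairs, as in sakila.film_actor.
film_ids = [1, 2, 3, 4]
film_actor = [(1, 10), (1, 11), (1, 12), (3, 10)]

# Hypothetical stand-in for the idx_fk_film_id index: film_id -> actor_ids.
idx_fk_film_id = defaultdict(list)
for film_id, actor_id in film_actor:
    idx_fk_film_id[film_id].append(actor_id)


def films_without_actors(film_ids, index):
    """LEFT JOIN ... WHERE film_actor.film_id IS NULL, with early termination.

    Returns the surviving film_ids and the number of index entries read.
    """
    result = []
    rows_read = 0
    for film_id in film_ids:
        found = False
        for _actor_id in index.get(film_id, []):
            rows_read += 1
            found = True
            break  # one matching actor is enough to reject this film
        if not found:
            result.append(film_id)
    return result, rows_read


print(films_without_actors(film_ids, idx_fk_film_id))  # ([2, 4], 2)
```

    Without the break, the same query would read all four film_actor entries instead of two.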

    Equality propagation

    MySQL recognizes when a query holds two columns as equal (for example, in a JOIN condition) and propagates WHERE clauses across equivalent columns. For instance, in the following query:

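    mysql> SELECT film.film_id
        -> FROM sakila.film
        ->    INNER JOIN sakila.film_actor USING(film_id)
        -> WHERE film.film_id > 500;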

    MySQL knows that the WHERE clause applies not only to the film table but to the film_actor table as well, because the USING clause forces the two columns to match.

    If you’re used to another database server that can’t do this, you may have been advised to “help the optimizer” by manually specifying the WHERE clause for both tables, like this:

    ... WHERE film.film_id > 500 AND film_actor.film_id > 500

    This is unnecessary in MySQL. It just makes your queries harder to maintain.

    IN() list comparisons

    In many database servers, IN() is just a synonym for multiple OR clauses, because the two are logically equivalent. Not so in MySQL, which sorts the values in the IN() list and uses a fast binary search to see whether a value is in the list. Each lookup is O(log n) in the size of the list, whereas an equivalent series of OR clauses is O(n) in the size of the list (i.e., much slower for large lists).
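    The difference between a sorted IN() list and a chain of OR comparisons can be sketched in Python with the standard bisect module. This models the idea only. It is not MySQL’s code, and the sample values are arbitrary:

```python
from bisect import bisect_left


def make_in_list(values):
    """Sort the constant IN() list once, up front."""
    return sorted(values)


def in_list(sorted_values, x):
    """Binary search: O(log n) comparisons per lookup."""
    i = bisect_left(sorted_values, x)
    return i < len(sorted_values) and sorted_values[i] == x


def or_chain(values, x):
    """x = v1 OR x = v2 OR ...: up to n comparisons per lookup."""
    return any(x == v for v in values)


values = [980, 1, 506, 23, 749, 106]
sorted_values = make_in_list(values)
print(in_list(sorted_values, 506), in_list(sorted_values, 2))  # True False
```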

    The preceding list is woefully incomplete, as MySQL performs more optimizations than we could fit into this entire chapter, but it should give you an idea of the optimizer’s complexity and intelligence. If there’s one thing you should take away from this discussion, it’s don’t try to outsmart the optimizer. You may end up just defeating it, or making your queries more complicated and harder to maintain for zero benefit. In general, you should let the optimizer do its work.

    Of course, as smart as the optimizer is, there are times when it doesn’t give the best result. Sometimes you may know something about the data that the optimizer doesn’t, such as a fact that’s guaranteed to be true because of application logic. Also, sometimes the optimizer doesn’t have the necessary functionality, such as hash indexes; at other times, as mentioned earlier, its cost estimates may prefer a query plan that turns out to be more expensive than an alternative.

    If you know the optimizer isn’t giving a good result, and you know why, you can help it. Some of the options are to add a hint to the query, rewrite the query, redesign your schema, or add indexes.

    Table and index statistics

    Recall the various layers in the MySQL server architecture, which we illustrated in Figure 1-1. The server layer, which contains the query optimizer, doesn’t store statistics on data and indexes. That’s a job for the storage engines, because each storage engine might keep different kinds of statistics (or keep them in a different way). Some engines, such as Archive, don’t keep statistics at all!

    Because the server doesn’t store statistics, the MySQL query optimizer has to ask the engines for statistics on the tables in a query. The engines may provide the optimizer with statistics such as the number of pages per table or index, the cardinality of tables and indexes, the length of rows and keys, and key distribution information. The optimizer can use this information to help it decide on the best execution plan. We see how these statistics influence the optimizer’s choices in later sections.

    MySQL’s join execution strategy

    MySQL uses the term “join” more broadly than you might be used to. In sum, it considers every query a join: not just every query that matches rows from two tables, but every query, period (including subqueries, and even a SELECT against a single table). Consequently, it’s very important to understand how MySQL executes joins.

    Consider the example of a UNION query. MySQL executes a UNION as a series of single queries whose results are spooled into a temporary table, then read out again. Each of the individual queries is a join, in MySQL terminology, and so is the act of reading from the resulting temporary table.

    At the moment, MySQL’s join execution strategy is simple: it treats every join as a nested-loop join. This means MySQL runs a loop to find a row from a table, then runs a nested loop to find a matching row in the next table. It continues until it has found a matching row in each table in the join. It then builds and returns a row from the columns named in the SELECT list. It tries to build the next row by looking for more matching rows in the last table. If it doesn’t find any, it backtracks one table and looks for more rows there. It keeps backtracking until it finds another row in some table, at which point it looks for a matching row in the next table, and so on. [43]

    This process of finding rows, probing into the next table, and then backtracking can be written as nested loops in the execution plan—hence the name “nested-loop join.” As an example, consider this simple query:

    mysql> SELECT tbl1.col1, tbl2.col2
        -> FROM tbl1 INNER JOIN tbl2 USING(col3)
        -> WHERE tbl1.col1 IN(5,6);

    Assuming MySQL decides to join the tables in the order shown in the query, the following pseudocode shows how MySQL might execute the query:

    outer_iter = iterator over tbl1 where col1 IN(5,6)
    outer_row  = outer_iter.next
    while outer_row
       inner_iter = iterator over tbl2 where col3 = outer_row.col3
       inner_row  = inner_iter.next
       while inner_row
          output [ outer_row.col1, inner_row.col2 ]
          inner_row = inner_iter.next
       end
       outer_row = outer_iter.next
    end
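    Here is the same nested-loop plan written as a Python generator. It works under simplifying assumptions: tables are lists of dictionaries holding hypothetical sample rows, and the inner “iterator over tbl2” is a full scan where MySQL would typically use an index lookup on col3:

```python
def nested_loop_join(tbl1, tbl2):
    """SELECT tbl1.col1, tbl2.col2 FROM tbl1 INNER JOIN tbl2 USING(col3)
    WHERE tbl1.col1 IN(5,6), executed as two nested loops."""
    for outer_row in tbl1:                   # outer_iter over tbl1
        if outer_row["col1"] not in (5, 6):  # WHERE tbl1.col1 IN(5,6)
            continue
        for inner_row in tbl2:               # inner_iter over tbl2
            if inner_row["col3"] == outer_row["col3"]:  # USING(col3)
                yield (outer_row["col1"], inner_row["col2"])


# Hypothetical sample rows.
tbl1 = [{"col1": 5, "col3": 1}, {"col1": 6, "col3": 2}, {"col1": 7, "col3": 1}]
tbl2 = [{"col2": "a", "col3": 1}, {"col2": "b", "col3": 1}, {"col2": "c", "col3": 3}]

print(list(nested_loop_join(tbl1, tbl2)))  # [(5, 'a'), (5, 'b')]
```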

    This query execution plan applies as easily to a single-table query as it does to a many-table query, which is why even a single-table query can be considered a join: the single-table join is the basic operation from which more complex joins are composed. It can support OUTER JOINs, too. For example, let’s change the example query as follows:

    mysql> SELECT tbl1.col1, tbl2.col2
        -> FROM tbl1 LEFT OUTER JOIN tbl2 USING(col3)
        -> WHERE tbl1.col1 IN(5,6);

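    Here’s the corresponding pseudocode. The changed part is the if/else branch, which outputs NULL in place of tbl2’s column when tbl2 has no matching row:

    outer_iter = iterator over tbl1 where col1 IN(5,6)
    outer_row  = outer_iter.next
    while outer_row
       inner_iter = iterator over tbl2 where col3 = outer_row.col3
       inner_row  = inner_iter.next
       if inner_row
          while inner_row
             output [ outer_row.col1, inner_row.col2 ]
             inner_row = inner_iter.next
          end
       else
          output [ outer_row.col1, NULL ]
       end
       outer_row = outer_iter.next
    end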
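    The LEFT OUTER JOIN variant can be sketched the same way in Python. It uses hypothetical sample rows, and the inner loop is again a plain scan rather than an index lookup. The only change is a flag that records whether the inner loop matched, so that an unmatched outer row is still output with NULL, shown here as None:

```python
def nested_loop_left_join(tbl1, tbl2):
    """SELECT tbl1.col1, tbl2.col2 FROM tbl1 LEFT OUTER JOIN tbl2 USING(col3)
    WHERE tbl1.col1 IN(5,6), executed as two nested loops."""
    for outer_row in tbl1:
        if outer_row["col1"] not in (5, 6):  # WHERE tbl1.col1 IN(5,6)
            continue
        matched = False
        for inner_row in tbl2:
            if inner_row["col3"] == outer_row["col3"]:
                matched = True
                yield (outer_row["col1"], inner_row["col2"])
        if not matched:
            # No matching tbl2 row: output the outer row padded with NULL.
            yield (outer_row["col1"], None)


# Hypothetical sample rows.
tbl1 = [{"col1": 5, "col3": 1}, {"col1": 6, "col3": 2}, {"col1": 7, "col3": 1}]
tbl2 = [{"col2": "a", "col3": 1}, {"col2": "b", "col3": 1}, {"col2": "c", "col3": 3}]

print(list(nested_loop_left_join(tbl1, tbl2)))  # [(5, 'a'), (5, 'b'), (6, None)]
```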

    Another way to visualize a query execution plan is to use what the optimizer folks call a “swim-lane diagram.” Figure 4-2 contains a swim-lane diagram of our initial query. Read it from left to right and top to bottom.

    Figure 4-2. Swim-lane diagram illustrating retrieving rows using a join

    MySQL executes every kind of query in essentially the same way. For example, it handles a subquery in the FROM clause by executing it first, putting the results into a temporary table, [44] and then treating that table just like an ordinary table (hence the name “derived table”). MySQL executes UNION queries with temporary tables too, and it rewrites all RIGHT OUTER JOIN queries to equivalent LEFT OUTER JOIN queries. In short, MySQL coerces every kind of query into this execution plan.

    It’s not possible to execute every legal SQL query this way, however. For example, a FULL OUTER JOIN can’t be executed with nested loops and backtracking as soon as a table with no matching rows is found, because it might begin with a table that has no matching rows. This explains why MySQL doesn’t support FULL OUTER JOIN. Still other queries can be executed with nested loops, but perform very badly as a result. We look at some of those later.

    MySQL doesn’t generate byte-code to execute a query, as many other database products do. Instead, the query execution plan is actually a tree of instructions that the query execution engine follows to produce the query results. The final plan contains enough information to reconstruct the original query. If you execute EXPLAIN EXTENDED on a query, followed by SHOW WARNINGS, you’ll see the reconstructed query. [45]

    Any multitable query can conceptually be represented as a tree. For example, it might be possible to execute a four-table join as shown in Figure 4-3.

    Figure 4-3. One way to join multiple tables

    This is what computer scientists call a balanced tree. This is not how MySQL executes the query, though. As we described in the previous section, MySQL always begins with one table and finds matching rows in the next table. Thus, MySQL’s query execution plans always take the form of a left-deep tree, as in Figure 4-4.

    Figure 4-4. How MySQL joins multiple tables

    The most important part of the MySQL query optimizer is the join optimizer, which decides the best order of execution for multitable queries. It is often possible to join the tables in several different orders and get the same results. The join optimizer estimates the cost for various plans and tries to choose the least expensive one that gives the same result.

    Here’s a query whose tables can be joined in different orders without changing the results:

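    mysql> SELECT film.film_id, film.title, film.release_year, actor.actor_id,
        ->    actor.first_name, actor.last_name
        ->    FROM sakila.film
        ->    INNER JOIN sakila.film_actor USING(film_id)
        ->    INNER JOIN sakila.actor USING(actor_id);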

    You can probably think of a few different query plans. For example, MySQL could begin with the film table, use the index on film_id in the film_actor table to find actor_id values, and then look up rows in the actor table’s primary key. This should be efficient, right? Now let’s use EXPLAIN to see how MySQL wants to execute the query:

    *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            table: actor
             type: ALL
    possible_keys: PRIMARY
              key: NULL
          key_len: NULL
              ref: NULL
             rows: 200
            Extra:
    *************************** 2. row ***************************
               id: 1
      select_type: SIMPLE
            table: film_actor
             type: ref
    possible_keys: PRIMARY,idx_fk_film_id
              key: PRIMARY
          key_len: 2
              ref: sakila.actor.actor_id
             rows: 1
            Extra: Using index
    *************************** 3. row ***************************
               id: 1
      select_type: SIMPLE
            table: film
             type: eq_ref
    possible_keys: PRIMARY
              key: PRIMARY
          key_len: 2
              ref: sakila.film_actor.film_id
             rows: 1
            Extra:

    This is quite a different plan from the one suggested in the previous paragraph. MySQL wants to start with the actor table (we know this because it’s listed first in the EXPLAIN output) and go in the reverse order. Is this really more efficient? Let’s find out. The STRAIGHT_JOIN keyword forces the join to proceed in the order specified in the query. Here’s the EXPLAIN output for the revised query:

    mysql> EXPLAIN SELECT STRAIGHT_JOIN film.film_id...\G
    *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            table: film
             type: ALL
    possible_keys: PRIMARY
              key: NULL
          key_len: NULL
              ref: NULL
             rows: 951
            Extra:
    *************************** 2. row ***************************
               id: 1
      select_type: SIMPLE
            table: film_actor
             type: ref
    possible_keys: PRIMARY,idx_fk_film_id
              key: idx_fk_film_id
          key_len: 2
              ref: sakila.film.film_id
             rows: 1
            Extra: Using index
    *************************** 3. row ***************************
               id: 1
      select_type: SIMPLE
            table: actor
             type: eq_ref
    possible_keys: PRIMARY
              key: PRIMARY
          key_len: 2
              ref: sakila.film_actor.actor_id
             rows: 1
            Extra:

    This shows why MySQL wants to reverse the join order: doing so will enable it to examine fewer rows in the first table. [46] In both cases, it will be able to perform fast indexed lookups in the second and third tables. The difference is how many of these indexed lookups it will have to do:

    • Placing film first will require about 951 probes into film_actor and actor, one for each row in the first table.

    • If the server scans the actor table first, it will have to do only 200 index lookups into later tables.

    In other words, the reversed join order will require less backtracking and rereading. To double-check the optimizer’s choice, we executed the two query versions and looked at the Last_query_cost variable for each. The reordered query had an estimated cost of 241, while the estimated cost of forcing the join order was 1,154.

    This is a simple example of how MySQL’s join optimizer can reorder queries to make them less expensive to execute. Reordering joins is usually a very effective optimization. There are times when it won’t result in an optimal plan, and for those times you can use STRAIGHT_JOIN and write the query in the order you think is best, but such times are rare. In most cases, the join optimizer will outperform a human.

    The join optimizer tries to produce a query execution plan tree with the lowest achievable cost. When possible, it examines all potential combinations of subtrees, beginning with all one-table plans.

    Unfortunately, a join over n tables will have n-factorial combinations of join orders to examine. This is called the search space of all possible query plans, and it grows very quickly: a 10-table join can be executed up to 3,628,800 different ways! When the search space grows too large, it can take far too long to optimize the query, so the server stops doing a full analysis. Instead, it resorts to shortcuts such as “greedy” searches when the number of tables exceeds the limit specified by the optimizer_search_depth variable.
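    The arithmetic is easy to check, and a toy comparison shows why a greedy shortcut is so much cheaper. This Python sketch does not reproduce MySQL’s search algorithm. The cost function is a hypothetical stand-in that counts only the rows in the outermost table. The actor and film row counts come from the EXPLAIN output above, and the film_actor figure is a made-up placeholder:

```python
import math
from itertools import permutations


def all_join_orders(tables):
    """Exhaustive search space: every permutation of the tables."""
    return list(permutations(tables))


def greedy_join_order(tables, cost):
    """Greedy shortcut: repeatedly append the remaining table that makes
    the partial plan look cheapest, never revisiting earlier choices.
    `cost` is a caller-supplied estimate for a partial join order."""
    order, remaining, evaluated = [], list(tables), 0
    while remaining:
        best = min(remaining, key=lambda t: cost(order + [t]))
        evaluated += len(remaining)  # partial plans examined at this step
        order.append(best)
        remaining.remove(best)
    return order, evaluated


# Hypothetical row estimates; this toy cost only looks at the outermost
# table, i.e. how many probes into the later tables the plan needs.
ROWS = {"film": 951, "film_actor": 5000, "actor": 200}


def outer_probes(prefix):
    return ROWS[prefix[0]]


tables = ["film", "film_actor", "actor"]
print(math.factorial(10))            # 3628800 orders for a 10-table join
print(len(all_join_orders(tables)))  # 6
order, evaluated = greedy_join_order(tables, outer_probes)
print(order[0], evaluated)           # actor 6
```

    For 10 tables this greedy loop examines 10 + 9 + ... + 1 = 55 partial plans, compared with 3,628,800 complete orders for an exhaustive search. The price is that it can commit early to a choice that a full search would have rejected.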

    MySQL has many heuristics, accumulated through years of research and experimentation, that it uses to speed up the optimization stage. This can be beneficial, but it can also mean that MySQL may (on rare occasions) miss an optimal plan and choose a less optimal one because it’s trying not to examine every possible query plan.

    Sometimes queries can’t be reordered, and the join optimizer can use this fact to reduce the search space by eliminating choices. A LEFT JOIN is a good example, as are correlated subqueries (more about subqueries later). This is because the results for one table depend on data retrieved from another table. These dependencies help the join optimizer reduce the search space by eliminating choices.

    Sorting results can be a costly operation, so you can often improve performance by avoiding sorts or by performing them on fewer rows.

    We showed you how to use indexes for sorting in Chapter 3. When MySQL can’t use an index to produce a sorted result, it must sort the rows itself. It can do this in memory or on disk, but it always calls this process a filesort, even if it doesn’t actually use a file.

    If the values to be sorted will fit into the sort buffer, MySQL can perform the sort entirely in memory with a quicksort. If MySQL can’t do the sort in memory, it performs it on disk by sorting the values in chunks. It uses a quicksort to sort each chunk and then merges the sorted chunks into the results.
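    Here is a Python sketch of the chunk-and-merge strategy. The sort_buffer_rows parameter is a hypothetical stand-in for the sort buffer’s capacity. Python’s sorted() uses Timsort rather than quicksort, and the chunks stay in memory instead of being written to disk, so this shows the shape of the algorithm, not MySQL’s implementation:

```python
import heapq


def filesort(rows, sort_buffer_rows):
    """Sort rows, falling back to chunked sorting plus a merge.

    sort_buffer_rows is a hypothetical stand-in for the sort buffer's
    capacity, measured in rows rather than bytes.
    """
    if len(rows) <= sort_buffer_rows:
        return sorted(rows)  # everything fits: one in-memory sort
    runs = []
    for start in range(0, len(rows), sort_buffer_rows):
        # Sort one buffer-sized chunk; a real filesort writes it to disk.
        runs.append(sorted(rows[start:start + sort_buffer_rows]))
    # Merge the sorted runs into a single sorted result.
    return list(heapq.merge(*runs))


data = [5, 3, 9, 1, 7, 2, 8]
print(filesort(data, sort_buffer_rows=3))  # [1, 2, 3, 5, 7, 8, 9]
```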

    There are two filesort algorithms:

    Two passes (old)

    Reads row pointers and columns, sorts them, and then scans the sorted list and rereads the rows for output.

    The two-pass algorithm can be quite expensive, because it reads the rows from the table twice, and the second read causes a lot of random I/O. This is especially expensive for MyISAM, which uses a system call to fetch each row (because MyISAM relies on the operating system’s cache to hold the data). On the other hand, it stores a minimal amount of data during the sort, so if the rows to be sorted are completely in memory, it can be cheaper to store less data and reread the rows to generate the final result.

    Single pass (new)

    Reads all the columns needed for the query, sorts them by the ORDER BY columns, and then scans the sorted list and outputs the specified columns.

    This algorithm is available only in MySQL 4.1 and newer. It can be much more efficient, especially on large I/O-bound datasets, because it avoids reading the rows from the table twice and trades random I/O for more sequential I/O. However, it has the potential to use a lot more space, because it holds all desired columns from each row, not just the columns needed to sort the rows. This means fewer tuples will fit into the sort buffer, and the filesort will have to perform more sort merge passes.
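    The trade-off between the two algorithms can be sketched in Python with a hypothetical three-row table keyed by row pointer. The two-pass version sorts only (sort key, row pointer) pairs and then rereads each row. The single-pass version sorts whole rows and needs no second read:

```python
# Hypothetical table: row pointer (row id) -> row.
TABLE = {
    101: {"title": "ZORRO ARK", "length": 50},
    102: {"title": "ACE GOLDFINGER", "length": 48},
    103: {"title": "MUSIC BOONDOCK", "length": 129},
}


def two_pass_sort(table, key):
    """Old algorithm: sort small (sort key, row pointer) records, then
    reread every row by pointer to produce the output."""
    pairs = sorted((row[key], row_id) for row_id, row in table.items())
    rereads = 0
    output = []
    for _sort_key, row_id in pairs:
        output.append(table[row_id])  # random I/O in a real storage engine
        rereads += 1
    return output, rereads


def single_pass_sort(table, key):
    """New algorithm: sort whole rows (larger sort records), no reread."""
    output = sorted(table.values(), key=lambda row: row[key])
    return output, 0


for algorithm in (two_pass_sort, single_pass_sort):
    rows, rereads = algorithm(TABLE, "title")
    print(algorithm.__name__, [row["title"] for row in rows], rereads)
```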

    MySQL may use much more temporary storage space for a filesort than you’d expect, because it allocates a fixed-size record for each tuple it will sort. These records are large enough to hold the largest possible tuple, including the full length of each column. Also, if you’re using UTF-8, MySQL allocates three bytes for each character. As a result, we’ve seen cases where poorly optimized schemas caused the temporary space used for sorting to be many times larger than the entire table’s size on disk.
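    A back-of-the-envelope calculation makes the point concrete. This Python sketch uses a simplified model: every column takes its maximum declared size, MySQL’s utf8 uses 3 bytes per character, and per-record overhead is ignored. The column list is hypothetical:

```python
UTF8_MAX_BYTES_PER_CHAR = 3  # MySQL's utf8 character set: up to 3 bytes/char


def sort_record_bytes(columns):
    """Simplified model: each sort record reserves every column's maximum
    size. columns is a list of (max_length, is_character_column) pairs,
    where max_length is in characters for character columns and in bytes
    otherwise. Per-record overhead and length prefixes are ignored."""
    total = 0
    for max_length, is_char in columns:
        total += max_length * UTF8_MAX_BYTES_PER_CHAR if is_char else max_length
    return total


# Hypothetical row: two VARCHAR(255) utf8 columns plus a 4-byte INT.
columns = [(255, True), (255, True), (4, False)]
print(sort_record_bytes(columns))  # 1534
```

    If those two VARCHAR values actually average 20 ASCII characters each, a row holds roughly 44 bytes of data. Under this model, each sort record would then be about 35 times larger than the data it carries.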

    When sorting a join, MySQL may perform the filesort at two stages during the query execution. If the ORDER BY clause refers only to columns from the first table in the join order, MySQL can filesort this table and then proceed with the join. If this happens, EXPLAIN shows “Using filesort” in the Extra column. Otherwise, MySQL must store the query’s results into a temporary table and then filesort the temporary table after the join finishes. In this case, EXPLAIN shows “Using temporary; Using filesort” in the Extra column. If there’s a LIMIT, it is applied after the filesort, so the temporary table and the filesort can be very large.

    See “Optimizing for Filesorts” for more on how to tune the server for filesorts and how to influence which algorithm the server uses.

    The Query Execution Engine

    The parsing and optimizing stage outputs a query execution plan, which MySQL’s query execution engine uses to process the query. The plan is a data structure; it is not executable byte-code, which is how many other databases execute queries.

    In contrast to the optimization stage, the execution stage is usually not all that complex: MySQL simply follows the instructions given in the query execution plan. Many of the operations in the plan invoke methods implemented by the storage engine interface, also known as the handler API. Each table in the query is represented by an instance of a handler. If a table appears three times in the query, for example, the server creates three handler instances. Though we glossed over this before, MySQL actually creates the handler instances early in the optimization stage. The optimizer uses them to get information about the tables, such as their column names and index statistics.

    The storage engine interface has lots of functionality, but it needs only a dozen or so “building-block” operations to execute most queries. For example, there’s an operation to read the first row in an index, and one to read the next row in an index. This is enough for a query that does an index scan. This simplistic execution method makes MySQL’s storage engine architecture possible, but it also imposes some of the optimizer limitations we’ve discussed.
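    To show how little an index scan needs, here is a hypothetical Python analog of the two building-block operations. The method names are loosely modeled on the storage engine interface’s “read first” and “read next” index operations. The class itself is only an illustration, not MySQL’s C++ handler API:

```python
class IndexHandler:
    """Hypothetical analog of a storage engine handler over one index.

    Only two building-block operations are provided: read the first
    entry in the index, and read the entry after the current one.
    """

    def __init__(self, keys):
        self._keys = sorted(keys)  # index entries are kept in key order
        self._pos = -1             # no current position yet

    def index_first(self):
        self._pos = 0
        return self._current()

    def index_next(self):
        self._pos += 1
        return self._current()

    def _current(self):
        if 0 <= self._pos < len(self._keys):
            return self._keys[self._pos]
        return None  # past the end of the index


def full_index_scan(handler):
    """An index scan needs nothing beyond first/next."""
    rows = []
    key = handler.index_first()
    while key is not None:
        rows.append(key)
        key = handler.index_next()
    return rows


print(full_index_scan(IndexHandler([30, 10, 20])))  # [10, 20, 30]
```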


    Not everything is a handler operation. For example, the server manages table locks. The handler may implement its own lower-level locking, as InnoDB does with row-level locks, but this does not replace the server’s own locking implementation. As explained in Chapter 1, anything that all storage engines share is implemented in the server, such as date and time functions, views, and triggers.

    To execute the query, the server just repeats the instructions until there are no more rows to examine.

    Returning Results to the Client

    The final step in executing a query is to reply to the client. Even queries that don’t return a result set still reply to the client connection with information about the query, such as how many rows it affected.

    If the query is cacheable, MySQL will also place the results into the query cache at this stage.

    The server generates and sends results incrementally. Think back to the single-sweep multijoin method we mentioned earlier. As soon as MySQL processes the last table and generates one row successfully, it can and should send that row to the client.

    This has two benefits: it lets the server avoid holding the row in memory, and it means the client starts getting the results as soon as possible. [47]
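    The incremental behavior can be sketched with a Python generator performing a hypothetical two-table equality join over small integer lists. The caller receives the first joined row after only the first outer row has been examined, which is the effect described above:

```python
def nested_loop_join(outer, inner, log):
    """Yield each joined row as soon as it exists (a generator), rather
    than building the whole result set before returning anything."""
    for outer_value in outer:
        log.append(f"outer row {outer_value}")
        for inner_value in inner:
            if inner_value == outer_value:
                yield (outer_value, inner_value)


log = []
rows = nested_loop_join([1, 2, 3], [1, 2, 3], log)
first = next(rows)  # the "client" receives its first row
print(first, log)   # (1, 1) ['outer row 1']
# Outer rows 2 and 3 are examined only when the client asks for more.
rest = list(rows)
print(rest, len(log))  # [(2, 2), (3, 3)] 3
```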

    MySQL’s “everything is a nested-loop join” approach to query execution isn’t ideal for optimizing every kind of query. Fortunately, there are only a limited number of cases where the MySQL query optimizer does a poor job, and it’s usually possible to rewrite such queries more efficiently.


    The information in this section applies to the MySQL server versions to which we have access at the time of this writing—that is, up to MySQL 5.1. Some of these limitations will probably be eased or removed entirely in future versions, and some have already been fixed in versions not yet released as GA (generally available). In particular, there are a number of subquery optimizations in the MySQL 6 source code, and more are in progress.

    MySQL sometimes optimizes subqueries very badly. The worst offenders are IN() subqueries in the WHERE clause. As an example, let’s find all films in the Sakila sample database’s sakila.film table whose casts include the actress Penelope Guiness (actor_id=1). This feels natural to write with a subquery, as follows:

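    mysql> SELECT * FROM sakila.film
        -> WHERE film_id IN(
        ->    SELECT film_id FROM sakila.film_actor WHERE actor_id = 1);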

    It’s tempting to think that MySQL will execute this query from the inside out, by finding a list of film_id values and substituting them into the IN() list. We said an IN() list is generally very fast, so you might expect the query to be optimized to something like this:

    -- SELECT GROUP_CONCAT(film_id) FROM sakila.film_actor WHERE actor_id = 1;
    -- Result: 1,23,25,106,140,166,277,361,438,499,506,509,605,635,749,832,939,970,980
    SELECT * FROM sakila.film
    WHERE film_id
    IN(1,23,25,106,140,166,277,361,438,499,506,509,605,635,749,832,939,970,980);

    Unfortunately, exactly the opposite happens. MySQL tries to “help” the subquery by pushing a correlation into it from the outer table, which it thinks will let the subquery find rows more efficiently. It rewrites the query as follows:

    SELECT * FROM sakila.film
    WHERE EXISTS (
       SELECT * FROM sakila.film_actor WHERE actor_id = 1
       AND film_actor.film_id = film.film_id);

    Now the subquery requires the film_id from the outer film table and can’t be executed first. EXPLAIN shows the result as DEPENDENT SUBQUERY (you can use EXPLAIN EXTENDED to see exactly how the query is rewritten):

    mysql> EXPLAIN SELECT * FROM sakila.film
        -> WHERE film_id IN(
        ->    SELECT film_id FROM sakila.film_actor WHERE actor_id = 1);
    +----+--------------------+------------+--------+------------------------+
    | id | select_type        | table      | type   | possible_keys          |
    +----+--------------------+------------+--------+------------------------+
    |  1 | PRIMARY            | film       | ALL    | NULL                   |
    |  2 | DEPENDENT SUBQUERY | film_actor | eq_ref | PRIMARY,idx_fk_film_id |
    +----+--------------------+------------+--------+------------------------+

    According to the EXPLAIN output, MySQL will table-scan the film table and execute the subquery for each row it finds. This won’t cause a noticeable performance hit on small tables, but if the outer table is very large, the performance will be extremely bad. Fortunately, it’s easy to rewrite such a query as a JOIN:

    mysql> SELECT film.* FROM sakila.film
        ->    INNER JOIN sakila.film_actor USING(film_id)
        -> WHERE actor_id = 1;

    Another good optimization is to manually generate the IN() list by executing the subquery as a separate query with GROUP_CONCAT(). Sometimes this can be faster than a JOIN.

    MySQL has been criticized thoroughly for this particular type of subquery execution plan. Although it definitely needs to be fixed, the criticism often confuses two different issues: execution order and caching. Executing the query from the inside out is one way to optimize it; caching the inner query’s result is another. Rewriting the query yourself lets you take control over both aspects. Future versions of MySQL should be able to optimize this type of query much better, although this is no easy task. There are very bad worst cases for any execution plan, including the inside-out execution plan that some people think would be simple to optimize.
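    The distinction between execution order and caching can be sketched in Python. The rows are hypothetical sample data, not the real Sakila contents. The correlated form reruns the subquery once per outer row, while the inside-out form runs it once and caches the matching film_id values in a set:

```python
# Hypothetical sample data, not the real Sakila contents.
FILM_IDS = list(range(1, 11))
FILM_ACTOR = [(1, 1), (23, 1), (3, 2), (5, 1)]  # (film_id, actor_id)


def correlated_exists(film_ids, film_actor, actor_id):
    """Outside-in: re-run the EXISTS subquery once per outer film row."""
    result, subquery_runs = [], 0
    for film_id in film_ids:
        subquery_runs += 1
        if any(f == film_id and a == actor_id for f, a in film_actor):
            result.append(film_id)
    return result, subquery_runs


def inside_out(film_ids, film_actor, actor_id):
    """Inside-out: run the subquery once and cache its result as a set."""
    matching = {f for f, a in film_actor if a == actor_id}
    return [film_id for film_id in film_ids if film_id in matching], 1


print(correlated_exists(FILM_IDS, FILM_ACTOR, 1))  # ([1, 5], 10)
print(inside_out(FILM_IDS, FILM_ACTOR, 1))         # ([1, 5], 1)
```

    Inside-out wins here because the cached result is tiny. If the subquery returned millions of rows, building and holding that cache could become the expensive part, which is the kind of worst case mentioned above.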

    When a correlated subquery is good

    MySQL doesn’t always optimize correlated subqueries badly. If you hear advice to always avoid them, don’t listen! Instead, benchmark and make your own decision. Sometimes a correlated subquery is a perfectly reasonable, or even optimal, way to get a result. Let’s look at an example:

    mysql> EXPLAIN SELECT film_id, language_id FROM sakila.film
        -> WHERE NOT EXISTS(
        ->    SELECT * FROM sakila.film_actor
        ->    WHERE film_actor.film_id = film.film_id
        -> )\G
    *************************** 1. row ***************************
               id: 1
      select_type: PRIMARY
            table: film
             type: ALL
    possible_keys: NULL
              key: NULL
          key_len: NULL
              ref: NULL
             rows: 951
            Extra: Using where
    *************************** 2. row ***************************
               id: 2
      select_type: DEPENDENT SUBQUERY
            table: film_actor
             type: ref
    possible_keys: idx_fk_film_id
              key: idx_fk_film_id
          key_len: 2
              ref: film.film_id
             rows: 2
            Extra: Using where; Using index

    The standard advice for this query is to write it as a LEFT OUTER JOIN instead of using a subquery. In theory, MySQL’s execution plan will be essentially the same either way. Let’s see:

    mysql> EXPLAIN SELECT film.film_id, film.language_id
        -> FROM sakila.film
        ->    LEFT OUTER JOIN sakila.film_actor USING(film_id)
        -> WHERE film_actor.film_id IS NULL\G
    *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            table: film
             type: ALL
    possible_keys: NULL
              key: NULL
          key_len: NULL
              ref: NULL
             rows: 951
            Extra:
    *************************** 2. row ***************************
               id: 1
      select_type: SIMPLE
            table: film_actor
             type: ref
    possible_keys: idx_fk_film_id
              key: idx_fk_film_id
          key_len: 2
              ref: sakila.film.film_id
             rows: 2
            Extra: Using where; Using index; Not exists

    The plans are nearly identical, but there are some differences:

    • The SELECT type against film_actor is DEPENDENT SUBQUERY in one query and SIMPLE in the other. This difference simply reflects the syntax, because the first query uses a subquery and the second doesn’t. It doesn’t make much difference in terms of handler operations.

    • The second query doesn’t say “Using where” in the Extra column for the film table. That doesn’t matter, though: the second query’s USING clause is the same thing as a WHERE clause anyway.

    • The second query says “Not exists” in the film_actor table’s Extra column. This is an example of the early-termination algorithm we mentioned earlier in this chapter. It means MySQL is using a not-exists optimization to avoid reading more than one row in the film_actor table’s idx_fk_film_id index. This is equivalent to a NOT EXISTS() correlated subquery, because it stops processing the current row as soon as it finds a match.

    So, in theory, MySQL will execute the queries almost identically. In reality, benchmarking is the only way to tell which approach is really faster. We benchmarked both queries on our standard setup. The results are shown in Table 4-1.

    Table 4-1. NOT EXISTS versus LEFT OUTER JOIN

    Query                  Result in queries per second (QPS)
    NOT EXISTS subquery    360 QPS
    LEFT OUTER JOIN        425 QPS

    Our benchmark found that the subquery is quite a bit slower!

    However, this isn’t always the case. Sometimes a subquery can be faster. For example, it can work well when you just want to see rows from one table that match rows in another table. Although that sounds like it describes a join perfectly, it’s not always the same thing. The following join, which is designed to find every film that has an actor, will return duplicates because some films have multiple actors:

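    mysql> SELECT film.film_id FROM sakila.film
        ->    INNER JOIN sakila.film_actor USING(film_id);

    We need to use DISTINCT or GROUP BY to eliminate the duplicates:

    mysql> SELECT DISTINCT film.film_id FROM sakila.film
        ->    INNER JOIN sakila.film_actor USING(film_id);

    But what are we really trying to express with this query, and is it obvious from the SQL? The EXISTS operator expresses the logical concept of “has a match” without producing duplicated rows and avoids a GROUP BY or DISTINCT operation, which might require a temporary table. Here’s the query written as a subquery instead of a join:

    mysql> SELECT film_id FROM sakila.film
        ->    WHERE EXISTS(SELECT * FROM sakila.film_actor
        ->    WHERE film.film_id = film_actor.film_id);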
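The difference in results between the join, the deduplicated join, and the EXISTS form can be checked with a small sketch. It uses Python’s built-in sqlite3 module as a portable stand-in, so it only shows what each query returns, not how MySQL plans or times it; the tables and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE film (film_id INTEGER PRIMARY KEY);
    CREATE TABLE film_actor (actor_id INTEGER, film_id INTEGER,
                             PRIMARY KEY (actor_id, film_id));
    INSERT INTO film VALUES (1), (2), (3);
    -- Hypothetical data: film 1 has two actors, film 2 one, film 3 none.
    INSERT INTO film_actor VALUES (10, 1), (11, 1), (10, 2);
    """
)

# Plain inner join: one output row per matching actor, so film 1 repeats.
join_rows = conn.execute(
    "SELECT film.film_id FROM film "
    "INNER JOIN film_actor USING(film_id) ORDER BY film.film_id"
).fetchall()

# DISTINCT removes the duplicates after the join has produced them.
distinct_rows = conn.execute(
    "SELECT DISTINCT film.film_id FROM film "
    "INNER JOIN film_actor USING(film_id) ORDER BY film.film_id"
).fetchall()

# EXISTS only asks whether each film has a match, so no duplicates appear.
exists_rows = conn.execute(
    "SELECT film_id FROM film WHERE EXISTS ("
    "  SELECT * FROM film_actor WHERE film_actor.film_id = film.film_id"
    ") ORDER BY film_id"
).fetchall()

print(join_rows)      # [(1,), (1,), (2,)]
print(distinct_rows)  # [(1,), (2,)]
print(exists_rows)    # [(1,), (2,)]
```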

    Again, we benchmarked to see which strategy was faster. The results are shown in Table 4-2.

    Table 4-2. EXISTS versus INNER JOIN

    Query              Result in queries per second (QPS)
    INNER JOIN         185 QPS
    EXISTS subquery    325 QPS

    In this example, the subquery performs much faster than the join.

    We showed this lengthy example to illustrate two points: you should not heed categorical advice about subqueries, and you should use benchmarks to prove your assumptions about query plans and execution speed.

    MySQL sometimes can’t “push down” conditions from the outside of a UNION to the inside, where they could be used to limit results or enable additional optimizations.

    If you think any of the individual queries inside a UNION would benefit from a LIMIT, or if you know they’ll be subject to an ORDER BY clause once combined with other queries, you need to put those clauses inside each part of the UNION. For example, if you UNION together two huge tables and LIMIT the result to the first 20 rows, MySQL will store both huge tables into a temporary table and then retrieve just 20 rows from it. You can avoid this by placing LIMIT 20 on each query inside the UNION.
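Why is it safe to repeat the LIMIT inside each branch? A minimal Python sketch makes the argument: `heapq.nsmallest(n, ...)` plays the role of ORDER BY ... LIMIT n, and the two lists are hypothetical sort keys from two large tables. Each row in the overall top 20 is necessarily in the top 20 of its own branch, so limiting each branch first and then limiting again on the outside yields the same rows while the merge step sees only 40 of them. The outer ORDER BY and LIMIT are still required.

```python
import heapq
import random


def limit_after_union(a, b, n):
    # No push-down: materialize both inputs (the temporary table), then limit.
    combined = list(a) + list(b)
    return heapq.nsmallest(n, combined), len(combined)


def limit_pushed_down(a, b, n):
    # Push-down: ORDER BY ... LIMIT n inside each branch, then again outside.
    partial = heapq.nsmallest(n, a) + heapq.nsmallest(n, b)
    return heapq.nsmallest(n, partial), len(partial)


random.seed(42)
# Hypothetical sort-key values for two large tables.
big_a = [random.randint(0, 10**6) for _ in range(50_000)]
big_b = [random.randint(0, 10**6) for _ in range(50_000)]

naive, naive_rows = limit_after_union(big_a, big_b, 20)
pushed, pushed_rows = limit_pushed_down(big_a, big_b, 20)
print(naive == pushed, naive_rows, pushed_rows)  # True 100000 40
```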

    Index merge optimizations

    Index merge algorithms, introduced in MySQL 5.0, let MySQL use more than one index per table in a query. Earlier versions of MySQL could use only a single index, so when no single index was good enough to help with all the restrictions in the WHERE clause, MySQL often chose a table scan. For example, the film_actor table has an index on film_id and an index on actor_id, but neither is a good choice for both conditions in this query:

    mysql> SELECT film_id, actor_id FROM sakila.film_actor
        -> WHERE actor_id = 1 OR film_id = 1;

    In older MySQL versions, that query would produce a table scan unless you wrote it as the UNION of two queries:

    mysql> SELECT film_id, actor_id FROM sakila.film_actor WHERE actor_id = 1
        -> UNION ALL
        -> SELECT film_id, actor_id FROM sakila.film_actor WHERE film_id = 1
        ->    AND actor_id <> 1;

    In MySQL 5.0 and newer, however, the query can use both indexes, scanning them simultaneously and merging the results. There are three variations on the algorithm: union for OR conditions, intersection for AND conditions, and unions of intersections for combinations of the two. The following query uses a union of two index scans, as you can see by examining the Extra column:

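    mysql> EXPLAIN SELECT film_id, actor_id FROM sakila.film_actor
        -> WHERE actor_id = 1 OR film_id = 1\G
    *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            table: film_actor
             type: index_merge
    possible_keys: PRIMARY,idx_fk_film_id
              key: PRIMARY,idx_fk_film_id
          key_len: 2,2
              ref: NULL
             rows: 29
            Extra: Using union(PRIMARY,idx_fk_film_id); Using where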
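The union variant can be pictured as a streaming merge. The following is a minimal Python sketch under one stated assumption: each index scan returns row identifiers in ascending order (the plain union algorithm depends on that ordering). The row ids are hypothetical, and this is an illustration of the idea, not MySQL’s implementation.

```python
import heapq


def index_merge_union(*rowid_streams):
    """Union of several index scans that each yield row ids in ascending order.

    heapq.merge interleaves the sorted streams; skipping repeated ids removes
    rows that satisfied more than one OR branch, so each row is fetched once.
    """
    last = None
    for rowid in heapq.merge(*rowid_streams):
        if rowid != last:
            yield rowid
            last = rowid


# Hypothetical row ids matching actor_id = 1 (via PRIMARY) and
# film_id = 1 (via idx_fk_film_id); row 7 satisfies both conditions.
rows_by_actor = [3, 7, 12, 20]
rows_by_film = [1, 7, 15]

merged = list(index_merge_union(rows_by_actor, rows_by_film))
print(merged)  # [1, 3, 7, 12, 15, 20]
```

When one of the scans returns many rows, this merge, plus the row lookups that follow it, is where the extra CPU and memory cost comes from.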

    MySQL can use this technique on complex WHERE clauses, so you may see nested operations in the Extra column for some queries. This often works very well, but sometimes the algorithm’s buffering, sorting, and merging operations use lots of CPU and memory resources. This is especially true if not all of the indexes are very selective, so the parallel scans return lots of rows to the merge operation. Recall that the optimizer doesn’t account for this cost; it optimizes just the number of random page reads. This can make it “underprice” the query, which might in fact run more slowly than a plain table scan. The intensive memory and CPU usage also tends to impact concurrent queries, but you won’t see this effect when you run the query in isolation. This is another reason to design realistic benchmarks.

    If your queries run more slowly because of this optimizer limitation, you can work around it by disabling some indexes with IGNORE INDEX, or just fall back to the old UNION tactic.

    Equality propagation can have unexpected costs sometimes. For example, consider a huge IN() list on a column the optimizer knows will be equal to some columns on other tables, due to a WHERE, ON, or USING clause that sets the columns equal to each other.

    The optimizer will “share” the list by copying it to the corresponding columns in all related tables. This is normally helpful, because it gives the query optimizer and execution engine more options for where to actually execute the IN() check. But when the list is very large, it can result in slower optimization and execution. There’s no built-in workaround for this problem at the time of this writing: you’ll have to change the source code if it’s a problem for you. (It’s not a problem for most people.)

    MySQL can’t execute a single query in parallel on many CPUs. This is a feature offered by some other database servers, but not MySQL. We mention it so that you won’t spend a lot of time trying to figure out how to get parallel query execution on MySQL!

    MySQL can’t do true hash joins at the time of this writing: everything is a nested-loop join. However, you can emulate hash joins using hash indexes. If you aren’t using the Memory storage engine, you’ll have to emulate the hash indexes, too. We showed you how to do this in “Building your own hash indexes,” in our discussion of hash indexes.
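The difference between the two join strategies fits in a few lines of Python. This is a minimal sketch over hypothetical rows stored as dicts, not a model of MySQL internals: the nested-loop version rescans the inner table for every outer row, while the hash version builds a hash table on the join key once (the role a hash index plays when you emulate a hash join) and then does one lookup per outer row.

```python
from collections import defaultdict


def nested_loop_join(outer, inner, key):
    # For each outer row, scan every inner row: len(outer) * len(inner) comparisons.
    return [(o, i) for o in outer for i in inner if o[key] == i[key]]


def hash_join(outer, inner, key):
    # Build phase: hash the inner rows once on the join key.
    buckets = defaultdict(list)
    for i in inner:
        buckets[i[key]].append(i)
    # Probe phase: one hash lookup per outer row.
    return [(o, i) for o in outer for i in buckets.get(o[key], [])]


# Hypothetical rows.
films = [{"film_id": 1, "title": "A"}, {"film_id": 2, "title": "B"}]
film_actor = [
    {"film_id": 1, "actor_id": 10},
    {"film_id": 1, "actor_id": 11},
    {"film_id": 3, "actor_id": 12},
]

nested = nested_loop_join(films, film_actor, "film_id")
hashed = hash_join(films, film_actor, "film_id")
print(nested == hashed, len(hashed))  # True 2
```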

    MySQL has historically been unable to do loose index scans, which scan noncontiguous ranges of an index. MySQL’s index scans generally require a defined start point and a defined end point in the index, even if only a few noncontiguous rows in the middle are really desired for the query. MySQL will scan the entire range of rows within these end points.

    An example will help clarify this. Suppose we have a table with an index on columns (a, b), and we want to run the following query:

    mysql> SELECT ... FROM tbl WHERE b BETWEEN 2 AND 3;

    Because the index begins with column a, but the query’s WHERE clause doesn’t specify column a, MySQL will do a table scan and eliminate the nonmatching rows with a WHERE clause, as shown in Figure 4-5.

    Figure 4-5. MySQL scans the entire table to find rows

    It’s easy to see that there’s a faster way to execute this query. The index’s structure (but not MySQL’s storage engine API) lets you seek to the beginning of each range of values, scan until the end of the range, and then backtrack and jump ahead to the start of the next range. Figure 4-6 shows what that strategy would look like if MySQL were able to do it.

    Notice the absence of a WHERE clause, which isn’t needed because the index alone lets us skip over the unwanted rows. (Again, MySQL can’t do this yet.)

    Figure 4-6. A loose index scan, which MySQL cannot currently do, would be more efficient
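The strategy in Figure 4-6 can be sketched in Python, assuming a hypothetical index on (a, b) kept as a sorted list of tuples and a query that filters only on b BETWEEN 2 AND 3. `bisect` stands in for a B-tree seek. The sketch counts index entries compared, not the cost of each seek, so it illustrates the idea rather than measuring a real engine.

```python
from bisect import bisect_left, bisect_right


def full_scan(index, lo, hi):
    """Read every (a, b) entry and filter on b afterwards."""
    out, examined = [], 0
    for entry in index:
        examined += 1
        if lo <= entry[1] <= hi:
            out.append(entry)
    return out, examined


def loose_index_scan(index, lo, hi):
    """For each distinct a, seek to (a, lo), read while b <= hi, then jump ahead."""
    out, examined, pos = [], 0, 0
    while pos < len(index):
        a = index[pos][0]
        # Seek to the first entry of this a-group with b >= lo.
        pos = bisect_left(index, (a, lo), pos)
        while pos < len(index) and index[pos][0] == a:
            examined += 1
            if index[pos][1] > hi:
                break  # past the b range for this value of a
            out.append(index[pos])
            pos += 1
        # Jump to the first entry whose a is larger than the current one.
        pos = bisect_right(index, (a, float("inf")), pos)
    return out, examined


# Hypothetical index on (a, b): a in 1..4, b in 1..10, in sorted order.
index = sorted((a, b) for a in range(1, 5) for b in range(1, 11))

full_rows, full_examined = full_scan(index, 2, 3)
loose_rows, loose_examined = loose_index_scan(index, 2, 3)
print(len(full_rows), full_examined, loose_examined)  # 8 40 12
```

Both functions return the same eight entries; the loose scan compares 12 entries instead of 40, and the gap widens as each a-group grows.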

    This is admittedly a simplistic example, and we could easily optimize the query we’ve shown by adding a different index. However, there are many cases where adding another index can’t solve the problem. One example is a query that has a range condition on the index’s first column and an equality condition on the second column.

    Beginning in MySQL 5.0, loose index scans are possible in certain limited circumstances, such as queries that find maximum and minimum values in a grouped query:

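    mysql> EXPLAIN SELECT actor_id, MAX(film_id)
        -> FROM sakila.film_actor
        -> GROUP BY actor_id\G
    *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            table: film_actor
             type: range
    possible_keys: NULL
              key: PRIMARY
          key_len: 2
              ref: NULL
             rows: 396
            Extra: Using index for group-by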

    The “Using index for group-by” information in this plan indicates a loose index scan. This is a good optimization for this special purpose, but it is not a general-purpose loose index scan. It might be better termed a “loose index probe.”

    Until MySQL supports general-purpose loose index scans, the workaround is to supply a constant or list of constants for the leading columns of the index. We showed several examples of how to get good performance with these types of queries in our indexing case study in the previous chapter.

    MySQL doesn’t optimize certain MIN() and MAX() queries very well. Here’s an example:

    mysql> SELECT MIN(actor_id) FROM sakila.actor WHERE first_name = 'PENELOPE';

    Because there’s no index on first_name, this query performs a table scan. If MySQL scans the primary key, it can theoretically stop after reading the first matching row, because the primary key is strictly ascending and any subsequent row will have a greater actor_id. However, in this case, MySQL will scan the whole table, which you can verify by profiling the query. The workaround is to remove the MIN() and rewrite the query with a LIMIT, as follows:

    mysql> SELECT actor_id FROM sakila.actor USE INDEX(PRIMARY)
        -> WHERE first_name = 'PENELOPE' LIMIT 1;
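Why the rewrite can stop early is easy to show with a minimal Python sketch. It assumes rows are visited in ascending primary-key order; the names and rows are hypothetical. Computing the minimum over all matching rows has to look at every row, while taking the first match in key order returns the same key as soon as it is found.

```python
def min_by_full_scan(rows, predicate):
    # MIN(pk) ... WHERE predicate: every row must be examined.
    examined, best = 0, None
    for pk, row in rows:
        examined += 1
        if predicate(row) and (best is None or pk < best):
            best = pk
    return best, examined


def first_match_in_pk_order(rows_in_pk_order, predicate):
    # ... WHERE predicate LIMIT 1, scanning in ascending key order:
    # the first match is the minimum, so the scan can stop there.
    examined = 0
    for pk, row in rows_in_pk_order:
        examined += 1
        if predicate(row):
            return pk, examined
    return None, examined


def is_carol(row):
    return row["first_name"] == "CAROL"


# Hypothetical actor rows keyed 1..6, already in primary-key order.
names = ["ANN", "BOB", "CAROL", "DAVE", "CAROL", "ERIN"]
actors = [(pk, {"first_name": name}) for pk, name in enumerate(names, start=1)]

print(min_by_full_scan(actors, is_carol))         # (3, 6)
print(first_match_in_pk_order(actors, is_carol))  # (3, 3)
```

The shortcut is only valid because the scan order matches the key order; if rows arrived in any other order, the first match would not necessarily be the minimum.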

    This general strategy often works well when MySQL would otherwise choose to scan more rows than necessary. If you’re a purist, you might object that this query is missing the point of SQL. We’re supposed to be able to tell the server what we want and it’s supposed to figure out how to get that data, whereas, in this case, we’re telling MySQL how to execute the query and, as a result, it’s not clear from the query that what we’re looking for is a minimal value. True, but sometimes you have to compromise your principles to get high performance.

    SELECT and UPDATE on the same table

    MySQL doesn’t let you SELECT from a table while simultaneously running an UPDATE on it. This isn’t really an optimizer limitation, but knowing how MySQL executes queries can help you work around it. Here’s an example of a query that’s disallowed, even though it is standard SQL. The query updates each row with the number of similar rows in the table:

    mysql> UPDATE tbl AS outer_tbl
        ->    SET cnt = (
        ->       SELECT count(*) FROM tbl AS inner_tbl
        ->       WHERE inner_tbl.type = outer_tbl.type
        ->    );
    ERROR 1093 (HY000): You can't specify target table 'outer_tbl' for update in FROM clause

    To work around this limitation, you can use a derived table, because MySQL materializes it as a temporary table. This effectively executes two queries: one SELECT inside the subquery, and one multitable UPDATE with the joined results of the table and the subquery. The subquery opens and closes the table before the outer UPDATE opens the table, so the query will now succeed:

    mysql> UPDATE tbl
        ->    INNER JOIN(
        ->       SELECT type, count(*) AS cnt
        ->       FROM tbl
        ->       GROUP BY type
        ->    ) AS der USING(type)
        -> SET tbl.cnt = der.cnt;

    In this section, we give advice on how to optimize certain kinds of queries. We’ve covered most of these topics in detail elsewhere in the book, but we wanted to make a list of common optimization problems that you can refer to easily.

    Most of the advice in this section is version-dependent, and it may not hold for future versions of MySQL. There’s no reason why the server won’t be able to do some or all of these optimizations itself someday.

    Optimizing COUNT() Queries

    The COUNT() aggregate function and how to optimize queries that use it is probably one of the top 10 most misunderstood topics in MySQL. You can do a web search and find more misinformation on this topic than we care to think about.

    Before we get into optimization, it’s important that you understand what COUNT() really does.

    COUNT() is a special function that works in two very different ways: it counts values and rows. A value is a non-NULL expression (NULL is the absence of a value). If you specify a column name or other expression inside the parentheses, COUNT() counts how many times that expression has a value. This is confusing for many people, in part because values and NULL are confusing. If you need to learn how this works in SQL, we suggest a good book on SQL fundamentals. (The Internet is not necessarily a good source of accurate information on this topic, either.)

    The other form of COUNT() simply counts the number of rows in the result. This is what MySQL does when it knows the expression inside the parentheses can never be NULL. The most obvious example is COUNT(*), which is a special form of COUNT() that does not expand the * wildcard into the full list of columns in the table, as you might expect; instead, it ignores columns altogether and counts rows.

    One of the most common mistakes we see is specifying column names inside the parentheses when you want to count rows. When you want to know the number of rows in the result, you should always use COUNT(*). This communicates your intention clearly and avoids poor performance.
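The two behaviors are easy to see side by side. This minimal sketch uses Python’s sqlite3 module as a stand-in because the NULL-skipping semantics of COUNT() are standard SQL; the table `t` and its rows are hypothetical, and nothing here reflects MySQL’s performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, col TEXT)")
# Hypothetical rows: two of the five have no value in col.
conn.executemany(
    "INSERT INTO t (col) VALUES (?)",
    [("a",), (None,), ("b",), (None,), ("a",)],
)

# COUNT(*) counts rows; COUNT(col) counts only rows where col has a value.
rows, values = conn.execute("SELECT COUNT(*), COUNT(col) FROM t").fetchone()
print(rows, values)  # 5 3
```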

    A common misconception is that MyISAM is extremely fast for COUNT() queries. It is fast, but only for a very special case: COUNT(*) without a WHERE clause, which merely counts the number of rows in the entire table. MySQL can optimize this away because the storage engine always knows how many rows are in the table. If MySQL knows col can never be NULL, it can also optimize a COUNT(col) expression by converting it to COUNT(*) internally.

    MyISAM does not have any magical speed optimizations for counting rows when the query has a WHERE clause, or for the more general case of counting values instead of rows. It may be faster than other storage engines for a given query, or it may not be. That depends on a lot of factors.

    You can sometimes use MyISAM’s COUNT(*) optimization to your advantage when you want to count all but a very small number of rows that are well indexed. The following example uses the standard World database to show how you can efficiently find the number of cities whose ID is greater than 5. You might write this query as follows:

    mysql> SELECT COUNT(*) FROM world.City WHERE ID > 5;

    If you profile this query with SHOW STATUS, you’ll see that it scans 4,079 rows. If you negate the conditions and subtract the number of cities whose IDs are less than or equal to 5 from the total number of cities, you can reduce that to five rows:

    mysql> SELECT (SELECT COUNT(*) FROM world.City) - COUNT(*)
        -> FROM world.City WHERE ID <= 5;

    This version reads fewer rows because the subquery is turned into a constant during the query optimization phase, as you can see with EXPLAIN:

    +----+-------------+-------+...+------+------------------------------+
    | id | select_type | table |...| rows | Extra                        |
    +----+-------------+-------+...+------+------------------------------+
    |  1 | PRIMARY     | City  |...|    6 | Using where; Using index     |
    |  2 | SUBQUERY    | NULL  |...| NULL | Select tables optimized away |
    +----+-------------+-------+...+------+------------------------------+
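The rewrite depends only on simple arithmetic: the rows with ID greater than 5 are all rows minus the rows with ID up to 5. The following minimal sketch checks that the two forms agree, using sqlite3 and a hypothetical 100-row City table. It does not reproduce the speedup, which comes from MyISAM keeping an exact row count for the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE City (ID INTEGER PRIMARY KEY, Name TEXT)")
# Hypothetical data: IDs 1..100.
conn.executemany(
    "INSERT INTO City (ID, Name) VALUES (?, ?)",
    [(i, f"city{i}") for i in range(1, 101)],
)

direct = conn.execute("SELECT COUNT(*) FROM City WHERE ID > 5").fetchone()[0]
# Total row count minus the small complement (ID <= 5).
negated = conn.execute(
    "SELECT (SELECT COUNT(*) FROM City) - COUNT(*) FROM City WHERE ID <= 5"
).fetchone()[0]
print(direct, negated)  # 95 95
```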

    A frequent question on mailing lists and IRC channels is how to retrieve counts for several different values in the same column with just one query, to reduce the number of queries required. For example, say you want to create a single query that counts how many items have each of several colors. You can’t use an OR (e.g., SELECT COUNT(color = 'blue' OR color = 'red') FROM items;), because that won’t separate the different counts for the different colors. And you can’t put the colors in the WHERE clause (e.g., SELECT COUNT(*) FROM items WHERE color = 'blue' AND color = 'red';), because the colors are mutually exclusive. Here is a query that solves this problem:

    mysql> SELECT SUM(IF(color = 'blue', 1, 0)) AS blue,
        -> SUM(IF(color = 'red', 1, 0)) AS red FROM items;

    And here is another that’s equivalent, but instead of using SUM() uses COUNT() and ensures that the expressions won’t have values when the criteria are false:

    mysql> SELECT COUNT(color = 'blue' OR NULL) AS blue,
        -> COUNT(color = 'red' OR NULL) AS red FROM items;
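Both techniques can be tried out with Python’s sqlite3 module, as a stand-in that shows results rather than MySQL performance. This minimal sketch uses a hypothetical items table and the standard CASE expression in place of MySQL’s IF(), which SQLite may not provide. The COUNT() form works because (condition OR NULL) is NULL unless the condition is true, and COUNT() skips NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, color TEXT)")
# Hypothetical rows, including one with no color at all.
conn.executemany(
    "INSERT INTO items (color) VALUES (?)",
    [("blue",), ("red",), ("blue",), ("green",), (None,)],
)

# SUM form: each row contributes 1 to a column when it matches, else 0.
sum_form = conn.execute(
    "SELECT SUM(CASE WHEN color = 'blue' THEN 1 ELSE 0 END) AS blue,"
    "       SUM(CASE WHEN color = 'red' THEN 1 ELSE 0 END) AS red FROM items"
).fetchone()

# COUNT form: the expression is 1 for a match and NULL otherwise.
count_form = conn.execute(
    "SELECT COUNT(color = 'blue' OR NULL) AS blue,"
    "       COUNT(color = 'red' OR NULL) AS red FROM items"
).fetchone()

print(sum_form, count_form)  # (2, 1) (2, 1)
```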

    More complex optimizations

    In general, COUNT() queries are hard to optimize because they usually need to count a lot of rows (i.e., access a lot of data). Your only other option for optimizing within MySQL itself is to use a covering index, which we discussed in Chapter 3. If that doesn’t help enough, you need to make changes to your application architecture. Consider summary tables (also covered in Chapter 3), and possibly an external caching system such as memcached. You’ll probably find yourself faced with the familiar dilemma, “fast, accurate, and simple: pick any two.”

    This topic is actually spread throughout most of the book, but we mention a few highlights:

    • Make sure there are indexes on the columns in the ON or USING clauses. See “Indexing Basics” for more about indexing. Consider the join order when adding indexes. If you’re joining tables A and B on column c and the query optimizer decides to join the tables in the order B, A, you don’t need to index the column on table B. Unused indexes are extra overhead. In general, you need to add indexes only on the second table in the join order, unless they’re needed for some other reason.

    • Try to ensure that any GROUP BY or ORDER BY expression refers only to columns from a single table, so MySQL can try to use an index for that operation.

    • Be careful when upgrading MySQL, because the join syntax, operator precedence, and other behaviors have changed at various times. What used to be a normal join can sometimes become a cross product, a different kind of join that returns different results, or even invalid syntax.

    The most important advice we can give on subqueries is that you should usually prefer a join where possible, at least in current versions of MySQL. We covered this topic extensively earlier in this chapter.

    Subqueries are the subject of intense work by the optimizer team, and upcoming versions of MySQL may have more subquery optimizations. It remains to be seen which of the optimizations we’ve seen will end up in released code, and how much difference they’ll make. Our point here is that “prefer a join” is not future-proof advice. The server is getting smarter all the time, and the cases where you have to tell it how to do something instead of what results to return are becoming fewer.

    Optimizing GROUP BY and DISTINCT

    MySQL optimizes these two kinds of queries similarly in many cases, and in fact converts between them as needed internally during the optimization process. Both types of queries benefit from indexes, as usual, and that’s the single most important way to optimize them.

    MySQL has two kinds of strategies when it can’t use an index: it can use a temporary table or a filesort to perform the grouping. Either one can be more efficient for any given query. You can force the optimizer to choose one method or the other with the SQL_BIG_RESULT and SQL_SMALL_RESULT optimizer hints.

    If you need to group a join by a value that comes from a lookup table, it’s usually more efficient to group by the lookup table’s identifier than by the value. For example, the following query isn’t as efficient as it could be:

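    mysql> SELECT actor.first_name, actor.last_name, COUNT(*)
        -> FROM sakila.film_actor
        ->    INNER JOIN sakila.actor USING(actor_id)
        -> GROUP BY actor.first_name, actor.last_name;

    The query is more efficiently written as follows:

    mysql> SELECT actor.first_name, actor.last_name, COUNT(*)
        -> FROM sakila.film_actor
        ->    INNER JOIN sakila.actor USING(actor_id)
        -> GROUP BY film_actor.actor_id;

    Grouping by actor.actor_id could be more efficient than grouping by film_actor.actor_id. You should profile and/or benchmark on your specific data to see.

    This query takes advantage of the fact that the actor’s first and last name are dependent on the actor_id, so it will return the same results, but it’s not always the case that you can blithely select nongrouped columns and get the same result. You may even have the server’s SQL_MODE configured to disallow it. You can use MIN() or MAX() to work around this when you know the values within the group are distinct because they depend on the grouped-by column, or if you don’t care which value you get: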


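    Purists will argue that you’re grouping by the wrong thing, and they’re right. A spurious MIN() or MAX() is a sign that the query isn’t structured correctly. However, sometimes your only concern will be making MySQL execute the query as quickly as possible. The purists will be satisfied with the following way of writing the query:

    mysql> SELECT actor.first_name, actor.last_name, c.cnt
        -> FROM sakila.actor
        ->    INNER JOIN (
        ->       SELECT actor_id, COUNT(*) AS cnt
        ->       FROM sakila.film_actor
        ->       GROUP BY actor_id
        ->    ) AS c USING(actor_id);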

    But sometimes the cost of creating and filling the temporary table required for the subquery is high compared to the cost of fudging pure relational theory a little bit. Remember, the temporary table created by the subquery has no indexes.

    It’s generally a bad idea to select nongrouped columns in a grouped query, because the results will be nondeterministic and could easily change if you change an index or the optimizer decides to use a different strategy. Most such queries we see are accidents (because the server doesn’t complain), or are the result of laziness rather than being designed that way for optimization purposes. It’s better to be explicit. In fact, we suggest that you set the server’s SQL_MODE configuration variable to include ONLY_FULL_GROUP_BY so it produces an error instead of letting you write a bad query.

    MySQL automatically orders grouped queries by the columns in the GROUP BY clause, unless you specify an ORDER BY clause explicitly. If you don’t care about the order and you see this causing a filesort, you can use ORDER BY NULL to skip the automatic sort. You can also add an optional DESC or ASC keyword right after the GROUP BY clause to order the results in the desired direction by the clause’s columns.


    A variation on grouped queries is to ask MySQL to do superaggregation within the results. You can do this with a WITH ROLLUP clause, but it might not be as well optimized as you need. Check the execution method with EXPLAIN, paying attention to whether the grouping is done via filesort or temporary table; try removing the WITH ROLLUP and seeing if you get the same group method. You may be able to force the grouping method with the hints we mentioned earlier in this section.

    Sometimes it’s more efficient to do superaggregation in your application, even if it means fetching many more rows from the server. You can also nest a subquery in the FROM clause or use a temporary table to hold intermediate results.

    The best approach may be to move the functionality into your application code.

    Optimizing LIMIT and OFFSET

    Queries with LIMITs and OFFSETs are common in systems that do pagination, nearly always in conjunction with an ORDER BY clause. It’s helpful to have an index that supports the ordering; otherwise, the server has to do a lot of filesorts.

    A frequent problem is having a high value for the offset. If your query looks like LIMIT 10000, 20, it is generating 10,020 rows and throwing away the first 10,000 of them, which is very expensive. Assuming all pages are accessed with equal frequency, such queries scan half the table on average. To optimize them, you can either limit how many pages are permitted in a pagination view, or try to make the high offsets more efficient.

    One simple technique to improve efficiency is to do the offset on a covering index, rather than the full rows. You can then join the result to the full row and retrieve the additional columns you need. This can be much more efficient. Consider the following query:

    mysql> SELECT film_id, description FROM sakila.film ORDER BY title LIMIT 50, 5;

    If the table is very large, this query is better written as follows:

    mysql> SELECT film.film_id, film.description
        -> FROM sakila.film
        ->    INNER JOIN (
        ->       SELECT film_id FROM sakila.film
        ->       ORDER BY title LIMIT 50, 5
        ->    ) AS lim USING(film_id);

    This works because it lets the server examine as little data as possible in an index without accessing rows, and then, once the desired rows are found, join them against the full table to retrieve the other columns from the row. A similar technique applies to joins with LIMIT clauses.
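Here is a minimal sketch that checks that the “deferred join” returns the same page as the direct query. It uses sqlite3 as a stand-in with a hypothetical film table and an index on title, so it demonstrates equivalence of results, not MySQL’s speedup. SQLite accepts LIMIT 5 OFFSET 50, which means the same as MySQL’s LIMIT 50, 5. The outer ORDER BY is added because a join does not by itself guarantee output order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT,
                       description TEXT);
    CREATE INDEX idx_title ON film (title);
    """
)
# Hypothetical rows: zero-padded titles sort in the same order as film_id.
conn.executemany(
    "INSERT INTO film VALUES (?, ?, ?)",
    [(i, f"title {i:04d}", "long description " * 20) for i in range(1, 1001)],
)

direct = conn.execute(
    "SELECT film_id, description FROM film ORDER BY title LIMIT 5 OFFSET 50"
).fetchall()

# Deferred join: find the page using only the narrow (title, film_id) index,
# then fetch the wide columns for just those five rows.
deferred = conn.execute(
    "SELECT film.film_id, film.description FROM film "
    "INNER JOIN (SELECT film_id FROM film ORDER BY title LIMIT 5 OFFSET 50) "
    "AS lim USING(film_id) ORDER BY film.title"
).fetchall()

print([row[0] for row in deferred])  # [51, 52, 53, 54, 55]
```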

    Sometimes you can also convert the limit to a positional query, which the server can execute as an index range scan. For example, if you precalculate and index a position column, you can rewrite the query as follows:

    mysql> SELECT film_id, description FROM sakila.film
        -> WHERE position BETWEEN 50 AND 54 ORDER BY position;

    Ranked data poses a similar problem, but usually mixes GROUP BY into the fray. You’ll almost certainly need to precompute and store ranks.

    If you really need to optimize pagination systems, you should probably use precomputed summaries. As an alternative, you can join against redundant tables that contain only the primary key and the columns you need for the ORDER BY. You can also use Sphinx; see Appendix C for more information.

    Optimizing SQL_CALC_FOUND_ROWS

    Another common technique for paginated displays is to add the SQL_CALC_FOUND_ROWS hint to a query with a LIMIT, so you’ll know how many rows would have been returned without the LIMIT. It may seem that there’s some kind of “magic” happening here, whereby the server predicts how many rows it would have found. But unfortunately, the server doesn’t really do that; it can’t count rows it doesn’t actually find. This option just tells the server to generate and throw away the rest of the result set, instead of stopping when it reaches the desired number of rows. That’s very expensive.

    A better design is to convert the pager to a “next” link. Assuming there are 20 results per page, the query should then use a LIMIT of 21 rows and display only 20. If the 21st row exists in the results, there’s a next page, and you can render the “next” link.
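The “next” link design takes only a few lines of application code. The following is a minimal sketch with sqlite3 standing in for the server; the `items` table, the `fetch_page` helper, and `PAGE_SIZE` are hypothetical names. It requests one row more than a page, shows at most `PAGE_SIZE` rows, and uses the extra row only as a signal that another page exists.

```python
import sqlite3

PAGE_SIZE = 20


def fetch_page(conn, page):
    """Return (rows, has_next) for a 0-based page number.

    Asks for PAGE_SIZE + 1 rows; the extra row is never displayed, it only
    tells us whether to render a "next" link.
    """
    rows = conn.execute(
        "SELECT id FROM items ORDER BY id LIMIT ? OFFSET ?",
        (PAGE_SIZE + 1, page * PAGE_SIZE),
    ).fetchall()
    return rows[:PAGE_SIZE], len(rows) > PAGE_SIZE


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
# Hypothetical data: 45 items, so pages 0 and 1 are full and page 2 has 5.
conn.executemany("INSERT INTO items (id) VALUES (?)", [(i,) for i in range(1, 46)])

first, first_has_next = fetch_page(conn, 0)
last, last_has_next = fetch_page(conn, 2)
print(len(first), first_has_next, len(last), last_has_next)  # 20 True 5 False
```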

    Another possibility is to fetch and cache many more rows than you need—say, 1,000—and then retrieve them from the cache for successive pages. This strategy lets your application know how large the full result set is. If it’s fewer than 1,000 rows, the application knows how many page links to render; if it’s more, the application can just display “more than 1,000 results found.” Both strategies are much more efficient than repeatedly generating an entire result and discarding most of it.

    Even when you can’t use these tactics, using a separate query to find the number of rows can be much faster than SQL_CALC_FOUND_ROWS, if it can use a covering index.

    MySQL always executes UNION queries by creating a temporary table and filling it with the UNION results. MySQL can’t apply as many optimizations to UNION queries as you might be used to. You might have to help the optimizer by manually “pushing down” WHERE, LIMIT, ORDER BY, and other conditions (i.e., copying them, as appropriate, from the outer query into each SELECT in the UNION).

    It’s important to always use UNION ALL, unless you need the server to eliminate duplicate rows. If you omit the ALL keyword, MySQL adds the distinct option to the temporary table, which uses the full row to determine uniqueness. This is quite expensive. Be aware that the ALL keyword doesn’t eliminate the temporary table, though. MySQL always places results into a temporary table and then reads them out again, even when it’s not really necessary (for example, when the results could be returned directly to the client).
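The semantic difference behind that cost is easy to see. This minimal sketch uses sqlite3 with two hypothetical one-column tables. Plain UNION has to compare whole rows to remove duplicates, including duplicates that were already inside one branch. UNION ALL simply appends one result to the other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (x INTEGER);
    INSERT INTO a VALUES (1), (2), (2);
    INSERT INTO b VALUES (2), (3);
    """
)

# UNION (implicitly DISTINCT): duplicates across and within branches are removed.
union_distinct = sorted(
    conn.execute("SELECT x FROM a UNION SELECT x FROM b").fetchall()
)
# UNION ALL: no duplicate elimination at all.
union_all = sorted(
    conn.execute("SELECT x FROM a UNION ALL SELECT x FROM b").fetchall()
)

print(union_distinct)  # [(1,), (2,), (3,)]
print(union_all)       # [(1,), (2,), (2,), (2,), (3,)]
```

If your branches cannot produce overlapping rows, or duplicates do not matter to the application, the two forms return the same information and UNION ALL avoids the extra work.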

    MySQL has a few optimizer hints you can use to control the query plan if you’re not happy with the one MySQL’s optimizer chooses. The following list identifies these hints and indicates when it’s a good idea to use them. You place the appropriate hint in the query whose plan you want to modify, and it is effective for only that query. Check the MySQL manual for the exact syntax of each hint. Some of them are version-dependent. The options are:


    The HIGH_PRIORITY and LOW_PRIORITY hints tell MySQL how to prioritize the statement relative to other statements that are trying to access the same tables.

    HIGH_PRIORITY tells MySQL to schedule a SELECT statement before other statements that may be waiting for locks, so they can modify data. In effect, it makes the SELECT go to the front of the queue instead of waiting its turn. You can also apply this modifier to INSERT, where it simply cancels the effect of a global LOW_PRIORITY server setting.

    LOW_PRIORITY is the reverse: it makes the statement wait at the very end of the queue if there are any other statements that want to access the tables, even if the other statements are issued after it. It’s rather like an overly polite person holding the door at a restaurant: as long as there’s anyone else waiting, it will starve itself! You can apply this hint to SELECT, INSERT, UPDATE, REPLACE, and DELETE statements.

    These hints are effective on storage engines with table-level locking, but you should never need them on InnoDB or other engines with fine-grained locking and concurrency control. Be careful when using them on MyISAM, because they can disable concurrent inserts and greatly reduce performance.

    The HIGH_PRIORITY and LOW_PRIORITY hints are a frequent source of confusion. They do not allocate more or fewer resources to queries to make them “work harder” or “not work as hard”; they simply affect how the server queues statements that are waiting for access to a table.

    The DELAYED hint is for use with INSERT and REPLACE. It lets the statement to which it is applied return immediately and places the inserted rows into a buffer, which will be inserted in bulk when the table is free. This is most useful for logging and similar applications where you want to insert a lot of rows without making the client wait, and without causing I/O for each statement. There are many limitations; for example, delayed inserts are not implemented in all storage engines, and LAST_INSERT_ID() doesn’t work with them.

    The STRAIGHT_JOIN hint can appear either just after the SELECT keyword in a SELECT statement, or in any statement between two joined tables. The first usage forces all tables in the query to be joined in the order in which they’re listed in the statement. The second usage forces a join order on the two tables between which the hint appears.

    The STRAIGHT_JOIN hint is useful when MySQL doesn’t choose a good join order, or when the optimizer takes a long time to decide on a join order. In the latter case, the thread will spend a lot of time in “Statistics” state, and adding this hint will reduce the search space for the optimizer.

    You can use EXPLAIN to see what order the optimizer would choose, then rewrite the query in that order and add STRAIGHT_JOIN. This is a good idea as long as you don’t think the fixed order will result in bad performance for some WHERE clauses. You should be careful to revisit such queries after upgrading MySQL, however, because new optimizations may appear that will be defeated by STRAIGHT_JOIN.


    The SQL_SMALL_RESULT and SQL_BIG_RESULT hints are for SELECT statements. They tell the optimizer how and when to use temporary tables and sort in GROUP BY or DISTINCT queries. SQL_SMALL_RESULT tells the optimizer that the result set will be small and can be put into indexed temporary tables to avoid sorting for the grouping, whereas SQL_BIG_RESULT indicates that the result will be large and that it will be better to use temporary tables on disk with sorting.

    The SQL_BUFFER_RESULT hint tells the optimizer to put the results into a temporary table and release table locks as soon as possible. This is different from the client-side buffering we described in “The MySQL Client/Server Protocol” earlier in this chapter. Server-side buffering can be useful when you don’t use buffering on the client, as it lets you avoid consuming a lot of memory on the client and still release locks quickly. The tradeoff is that the server’s memory is used instead of the client’s.

    The SQL_CACHE and SQL_NO_CACHE hints instruct the server that the query either is or is not a candidate for caching in the query cache. See the next chapter for details on how to use them.

    The SQL_CALC_FOUND_ROWS hint tells MySQL to calculate a full result set when there’s a LIMIT clause, even though it returns only LIMIT rows. You can retrieve the total number of rows it found via FOUND_ROWS() (but see “Optimizing SQL_CALC_FOUND_ROWS” earlier in this chapter for reasons why you shouldn’t use this hint).


    The FOR UPDATE and LOCK IN SHARE MODE hints control locking for SELECT statements, but only for storage engines that have row-level locks. They enable you to place locks on the matched rows, which can be useful when you want to lock rows you know you are going to update later, or when you want to avoid lock escalation and just acquire exclusive locks as soon as possible.

    These hints are not needed for INSERT ... SELECT queries, which place read locks on the source rows by default in MySQL 5.0. (You can disable this behavior, but it’s not a good idea; we explain why in Chapters 8 and 11.) MySQL 5.1 may lift this restriction under certain conditions.

    At the time of this writing, only InnoDB supports these hints, and it’s too early to say whether other storage engines with row-level locks will support them in the future. When using these hints with InnoDB, be aware that they may disable some optimizations, such as covering indexes. InnoDB can’t lock rows exclusively without accessing the primary key, which is where the row versioning information is stored.

    USE INDEX, IGNORE INDEX, and FORCE INDEX

    These hints tell the optimizer which indexes to use or ignore for finding rows in a table (for example, when deciding on a join order). In MySQL 5.0 and earlier, they don’t influence which indexes the server uses for sorting and grouping; in MySQL 5.1 the syntax can take an optional FOR ORDER BY or FOR GROUP BY clause.

    FORCE INDEX is the same as USE INDEX, but it tells the optimizer that a table scan is extremely expensive compared to the index, even if the index is not very useful. You can use these hints when you don’t think the optimizer is choosing the right index, or when you want to take advantage of an index for some reason, such as implicit ordering without an ORDER BY. We gave an example of this earlier in this chapter, where we showed how to get a minimum value efficiently with LIMIT.

    In MySQL 5.0 and newer, there are also some system variables that influence the optimizer:

    The optimizer_search_depth variable tells the optimizer how exhaustively to examine partial plans. If your queries are taking a very long time in the “Statistics” state, you might try lowering this value.

    The optimizer_prune_level variable, which is enabled by default, lets the optimizer skip certain plans based on the number of rows examined.

    Both options control optimizer shortcuts. These shortcuts are valuable for good performance on complex queries, but they can cause the server to miss optimal plans for the sake of efficiency. That’s why it sometimes makes sense to change them.
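    A sketch of inspecting and changing both variables for the current session only. The values are examples. Setting optimizer_search_depth to 0 asks MySQL to choose a depth automatically. Setting optimizer_prune_level to 0 turns the pruning heuristic off, which can make planning slower.

```sql
-- Inspect the current settings.
SHOW VARIABLES LIKE 'optimizer_search_depth';
SHOW VARIABLES LIKE 'optimizer_prune_level';

-- Session-only changes; other connections are unaffected.
-- 0 = let MySQL pick a search depth automatically.
SET SESSION optimizer_search_depth = 0;

-- 0 = disable the row-count-based pruning heuristic (1, the default, enables it).
SET SESSION optimizer_prune_level = 0;
```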

    It’s easy to forget about MySQL’s user-defined variables, but they can be a powerful technique for writing efficient queries. They work especially well for queries that benefit from a mixture of procedural and relational logic. Purely relational queries treat everything as unordered sets that the server somehow manipulates all at once. MySQL takes a more pragmatic approach. This can be a weakness, but it can be a strength if you know how to exploit it, and user-defined variables can help.

    User-defined variables are temporary containers for values, which persist as long as your connection to the server lives. You define them by simply assigning to them with a SET or SELECT statement.[48]


    You can then use the variables in most places an expression can go.
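    A minimal sketch of defining variables and then using them, assuming the Sakila sample database. The variable names are illustrative. MySQL 8.0 deprecates assigning variables inside expressions outside SET, so the sketch also shows SELECT ... INTO as an alternative.

```sql
-- Assign with SET; both := and = are accepted in SET.
SET @one       := 1;
SET @last_week := CURRENT_DATE - INTERVAL 1 WEEK;
SET @min_actor := (SELECT MIN(actor_id) FROM sakila.actor);

-- Assign inside a SELECT: only := works here (= means comparison).
-- MySQL 8.0 deprecates this form; SELECT ... INTO is the alternative.
SELECT @max_actor := MAX(actor_id) FROM sakila.actor;
SELECT MAX(actor_id) INTO @max_actor FROM sakila.actor;

-- Use the variables wherever an expression is allowed.
SELECT actor_id, first_name, last_name
FROM sakila.actor
WHERE actor_id BETWEEN @min_actor AND @max_actor
  AND last_update <= @last_week;

SELECT @one + 1 AS two;
```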


    Before we get into the strengths of user-defined variables, let’s take a look at some of their peculiarities and disadvantages and see what things you can’t use them for:

    • They prevent query caching.

    • You can’t use them where a literal or identifier is needed, such as for a table or column name, or in the LIMIT clause.

    • They are connection-specific, so you can’t use them for interconnection communication.

    • If you’re using connection pooling or persistent connections, they can cause seemingly isolated parts of your code to interact.

    • They are case sensitive in MySQL versions prior to 5.0, so beware of compatibility issues.

    • You can’t explicitly declare these variables’ types, and the point at which types are decided for undefined variables differs across MySQL versions. The best thing to do is initially assign a value of 0 for variables you want to use for integers, 0.0 for floating-point numbers, or '' (the empty string) for strings. A variable’s type changes when it is assigned to; MySQL’s user-defined variable typing is dynamic.

    • The optimizer might optimize away these variables in some situations, preventing them from doing what you want.

    • Order of assignment, and indeed even the time of assignment, can be nondeterministic and depend on the query plan the optimizer chose. The results can be very confusing, as you’ll see later.

    • The assignment operator has lower precedence than any other operator, so you have to be careful to parenthesize explicitly.

    • Undefined variables do not generate a syntax error, so it’s easy to make mistakes without knowing it.
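    A quick sketch of the precedence and undefined-variable pitfalls. The results in the comments follow from := binding more loosely than the comparison operator.

```sql
-- := binds more loosely than >, so this stores the comparison result.
SELECT @flag := 5 > 3;   -- returns 1
SELECT @flag;            -- 1, not 5

-- Parenthesize to assign first and compare afterward.
SELECT (@n := 5) > 3;    -- returns 1
SELECT @n;               -- 5

-- Misspelled or never-assigned variables raise no error; they are NULL.
SELECT @never_assigned;  -- NULL
```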

    One of the most important features of variables is that you can assign a value to a variable and use the resulting value at the same time. In other words, an assignment is an L-value. Here’s an example that simultaneously calculates and outputs a “row number” for a query:
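    A minimal sketch of a row-numbering query of this kind, assuming the Sakila sample database's actor table. The ORDER BY on the primary key is an addition. It normally lets InnoDB read rows in order without a filesort, so the numbering follows actor_id. The sketch also shows MySQL 8.0's ROW_NUMBER() window function as the modern equivalent.

```sql
-- Assumes Sakila's actor table.
SET @rownum := 0;

-- ORDER BY on the primary key normally needs no filesort, so each row
-- is numbered as it is read.
SELECT actor_id, @rownum := @rownum + 1 AS rownum
FROM sakila.actor
ORDER BY actor_id
LIMIT 3;

-- MySQL 8.0 and newer: the window-function equivalent.
SELECT actor_id, ROW_NUMBER() OVER (ORDER BY actor_id) AS rownum
FROM sakila.actor
ORDER BY actor_id
LIMIT 3;
```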

    +----------+--------+
    | actor_id | rownum |
    +----------+--------+
    |        1 |      1 |
    |        2 |      2 |
    |        3 |      3 |
    +----------+--------+

    This example isn’t terribly interesting, because it just shows that we can duplicate the table’s primary key. Still, it has its uses—one of which is ranking. Let’s write a query that returns the 10 actors who have played in the most movies, with a rank column that gives actors the same rank if they’re tied. We start with a query that finds the actors and the number of movies:
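    A query of this shape can be sketched as follows, assuming Sakila's film_actor table, which holds one row per actor per film.

```sql
-- Assumes Sakila's film_actor table (one row per actor per film).
SELECT actor_id, COUNT(*) AS cnt
FROM sakila.film_actor
GROUP BY actor_id
ORDER BY cnt DESC
LIMIT 10;
```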

    +----------+-----+
    | actor_id | cnt |
    +----------+-----+
    |      107 |  42 |
    |      102 |  41 |
    |      198 |  40 |
    |      181 |  39 |
    |       23 |  37 |
    |       81 |  36 |
    |      106 |  35 |
    |       60 |  35 |
    |       13 |  35 |
    |      158 |  35 |
    +----------+-----+

    Now let’s add the rank, which should be the same for all the actors who played in 35 movies. We use three variables to do this: one to keep track of the current rank, one to keep track of the previous actor’s movie count, and one to keep track of the current actor’s movie count. We change the rank when the movie count changes. Here’s a first try:
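    A sketch of such a first try, assuming the same film_actor table. The variable assignments live in the same query block as GROUP BY and ORDER BY. The rank alias is quoted with backticks because RANK became a reserved word in MySQL 8.0.2.

```sql
SET @curr_cnt := 0, @prev_cnt := 0, @rank := 0;

-- Assignments share a query block with GROUP BY and ORDER BY, so they
-- may run before grouping and sorting finish (EXPLAIN: Using temporary;
-- Using filesort).
SELECT actor_id,
       @curr_cnt := COUNT(*) AS cnt,
       @rank     := IF(@prev_cnt <> @curr_cnt, @rank + 1, @rank) AS `rank`,
       @prev_cnt := @curr_cnt AS dummy
FROM sakila.film_actor
GROUP BY actor_id
ORDER BY cnt DESC
LIMIT 10;
```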

    +----------+-----+------+-------+
    | actor_id | cnt | rank | dummy |
    +----------+-----+------+-------+
    |      107 |  42 |    0 |     0 |
    |      102 |  41 |    0 |     0 |
    ...

    Oops—the rank and count never got updated from zero. Why did this happen?

    It’s impossible to give a one-size-fits-all answer. The problem could be as simple as a misspelled variable name (in this example it’s not), or something more involved. In this case, EXPLAIN shows there’s a temporary table and filesort, so the variables are being evaluated at a different time from when we expected.

    This is the type of inscrutable behavior you’ll often experience with MySQL’s user-defined variables. Debugging such problems can be tough, but it can really pay off. Ranking in SQL normally requires quadratic algorithms, such as counting the distinct number of actors who played in a greater number of movies. A user-defined variable solution can be a linear algorithm—quite an improvement.

    An easy solution in this case is to add another level of temporary tables to the query, using a subquery in the FROM clause:
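    A sketch of the derived-table version, assuming Sakila's film_actor table. MySQL documents the evaluation order of expressions involving user variables as undefined, so this is a practical trick, not a guaranteed result. On MySQL 8.0, DENSE_RANK() gives the same tie handling directly.

```sql
SET @curr_cnt := 0, @prev_cnt := 0, @rank := 0;

SELECT actor_id,
       @curr_cnt := cnt AS cnt,
       -- Bump the rank only when the count changes, so ties share a rank.
       @rank     := IF(@prev_cnt <> @curr_cnt, @rank + 1, @rank) AS `rank`,
       @prev_cnt := @curr_cnt AS dummy
FROM (
    -- Grouping, sorting, and LIMIT finish here first, so the outer
    -- assignments see the rows already in count order.
    SELECT actor_id, COUNT(*) AS cnt
    FROM sakila.film_actor
    GROUP BY actor_id
    ORDER BY cnt DESC
    LIMIT 10
) AS der;

-- MySQL 8.0 and newer: the same tie handling with a window function.
SELECT actor_id, COUNT(*) AS cnt,
       DENSE_RANK() OVER (ORDER BY COUNT(*) DESC) AS `rank`
FROM sakila.film_actor
GROUP BY actor_id
ORDER BY cnt DESC
LIMIT 10;
```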

    +----------+-----+------+-------+
    | actor_id | cnt | rank | dummy |
    +----------+-----+------+-------+
    |      107 |  42 |    1 |    42 |
    |      102 |  41 |    2 |    41 |
    |      198 |  40 |    3 |    40 |
    |      181 |  39 |    4 |    39 |
    |       23 |  37 |    5 |    37 |
    |       81 |  36 |    6 |    36 |
    |      106 |  35 |    7 |    35 |
    |       60 |  35 |    7 |    35 |
    |       13 |  35 |    7 |    35 |
    |      158 |  35 |    7 |    35 |
    +----------+-----+------+-------+

    Most problems with user variables come from assigning to them and reading them at different stages in the query. For example, it doesn’t work predictably to assign them in the SELECT list and read from them in the WHERE clause. The following query might look like it will just return one row, but it doesn’t:
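    A sketch of a query of this kind, assuming Sakila's actor table:

```sql
SET @rownum := 0;

-- Assigns in the SELECT list but reads in WHERE. For each row, WHERE is
-- checked before the SELECT list runs, so it sees the previous row's
-- value and typically lets more than one row through.
SELECT actor_id, @rownum := @rownum + 1 AS cnt
FROM sakila.actor
WHERE @rownum <= 1;
```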

    +----------+------+
    | actor_id | cnt  |
    +----------+------+
    |        1 |    1 |
    |        2 |    2 |
    +----------+------+

    This happens because the WHERE and SELECT are different stages in the query execution process. This is even more obvious when you add another stage to execution with an ORDER BY.
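    A sketch of the query with an ORDER BY added, assuming Sakila's actor table:

```sql
SET @rownum := 0;

-- With a filesort, WHERE typically filters every row before any
-- SELECT-list assignment runs, so @rownum is still 0 and all rows pass.
SELECT actor_id, @rownum := @rownum + 1 AS cnt
FROM sakila.actor
WHERE @rownum <= 1
ORDER BY first_name;
```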


    This query returns every row in the table, because the ORDER BY added a filesort and the WHERE is evaluated before the filesort. The solution to this problem is to assign and read in the same stage of query execution:
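    A sketch of that fix, assuming Sakila's actor table. The WHERE clause both increments and tests the counter, and the SELECT list only reads it.

```sql
SET @rownum := 0;

-- Increment and test in the same stage; the SELECT list only reads.
SELECT actor_id, @rownum AS rownum
FROM sakila.actor
WHERE (@rownum := @rownum + 1) <= 1;
```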

    +----------+--------+
    | actor_id | rownum |
    +----------+--------+
    |        1 |      1 |
    +----------+--------+

    Pop quiz: what will happen if you add the ORDER BY back to this query? Try it and see. If you didn’t get the results you expected, why not? What about the following query, where the ORDER BY changes the variable’s value and the WHERE clause evaluates it?
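    A sketch of such a query, assuming Sakila's actor table. The assignment is wrapped in LEAST(0, ...), which returns 0 once the counter is positive, so it does not change the sort order.

```sql
SET @rownum := 0;

-- ORDER BY increments the counter; LEAST(0, n) returns 0 for any
-- positive n, so the sort is effectively by first_name alone.
-- WHERE reads the counter.
SELECT actor_id, first_name, @rownum AS rownum
FROM sakila.actor
WHERE @rownum <= 1
ORDER BY first_name, LEAST(0, @rownum := @rownum + 1);
```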


    The answer to most unexpected user-defined variable behavior can be found by running EXPLAIN and looking for “Using where,” “Using temporary,” or “Using filesort” in the Extra column.
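    For example, a sketch of checking a query that assigns in the SELECT list, filters in WHERE, and sorts with ORDER BY, assuming Sakila's actor table:

```sql
-- Check the Extra column for "Using where", "Using temporary", and
-- "Using filesort" to see which stages the query goes through.
EXPLAIN
SELECT actor_id, @rownum := @rownum + 1 AS cnt
FROM sakila.actor
WHERE @rownum <= 1
ORDER BY first_name;
```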

    The last example introduced another useful hack: we placed the assignment in the LEAST() function, so its value is effectively masked and won’t skew the results of the ORDER BY (as we’ve written it, the function will always return 0). This trick is very helpful when you want to do variable assignments solely for their side effects: it lets you hide the return value and avoid extra columns, such as the dummy column we showed in a previous example. Functions such as COALESCE() are also useful for this purpose, alone and in combination, because they have special behaviors. For instance, COALESCE() stops evaluating its arguments as soon as one has a defined value.

    You can put variable assignments in all types of statements, not just SELECT statements. In fact, this is one of the best uses for user-defined variables. For example, you can rewrite expensive queries, such as rank calculations with subqueries, as cheap once-through UPDATE statements.

    It can be a little tricky to get the desired behavior, though. Sometimes the optimizer decides to consider the variables compile-time constants and refuses to perform assignments. Placing the assignments inside a function like LEAST() will usually help. Another tip is to check whether your variable has a defined value before executing the containing statement. Sometimes you want it to, but other times you don’t.
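    A sketch of a once-through ranking UPDATE, using a hypothetical scores table (not from the original text). A single-table UPDATE accepts ORDER BY, so rows are visited from the highest score down. In this simple form, ties receive different ranks.

```sql
-- Hypothetical table for illustration only.
CREATE TABLE scores (
    player_id  INT PRIMARY KEY,
    score      INT NOT NULL,
    score_rank INT NULL
) ENGINE=InnoDB;

-- Give the counter a defined starting value first.
SET @rank := 0;

-- One pass: visit rows from the highest score down and number them.
UPDATE scores
SET score_rank = (@rank := @rank + 1)
ORDER BY score DESC;

-- @rank now holds the number of rows ranked.
SELECT @rank AS rows_ranked;
```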

    With a little experimentation, you can do all sorts of interesting things with user-defined variables. Here are some ideas:

    • Calculate running totals and averages

    • Emulate FIRST() and LAST() functions for grouped queries

    • Do math on extremely large numbers

    • Reduce an entire table to a single MD5 hash value

    • “Unwrap” a sampled value that wraps when it increases beyond a certain boundary

    • Emulate read/write cursors
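    A sketch of the first idea, a running total, assuming Sakila's payment table. Ordering by the primary key normally avoids a filesort, so the assignments run in payment_id order. MySQL does not guarantee the evaluation order of user variables, so the sketch also shows the MySQL 8.0 window-function form.

```sql
SET @total := 0;

-- Ordering by the primary key normally avoids a filesort, so the
-- assignment runs in payment_id order.
SELECT payment_id, amount,
       @total := @total + amount AS running_total
FROM sakila.payment
ORDER BY payment_id
LIMIT 10;

-- MySQL 8.0 and newer: a window function with a defined result.
SELECT payment_id, amount,
       SUM(amount) OVER (ORDER BY payment_id) AS running_total
FROM sakila.payment
ORDER BY payment_id
LIMIT 10;
```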

    Be Careful with MySQL Upgrades

    As we’ve said, trying to outsmart the MySQL optimizer usually is not a good idea. It generally creates more work and increases maintenance costs for very little benefit. This is especially relevant when you upgrade MySQL, because optimizer hints used in your queries might prevent new optimizer strategies from being used.

    The way the MySQL optimizer uses indexes is a moving target. New MySQL versions change how existing indexes can be used, and you should adjust your indexing practices as these new versions become available. For example, we’ve mentioned that MySQL 4.0 and older could use only one index per table per query, but MySQL 5.0 and newer can use index merge strategies.

    Besides the big changes MySQL occasionally makes to the query optimizer, each incremental release typically includes many tiny changes. These changes usually affect small things, such as the conditions under which an index is excluded from consideration, and let MySQL optimize more special cases.

    Although all this sounds good in theory, in practice some queries perform worse after an upgrade. If you’ve used a certain version for a long time, you have likely tuned certain queries just for that version, whether you know it or not. These optimizations may no longer apply in newer versions, or may degrade performance.

    If you care about high performance, you should have a benchmark suite that represents your particular workload, which you can run against the new version on a development server before you upgrade the production servers. Also, before upgrading, you should read the release notes and the list of known bugs in the new version. The MySQL manual includes a user-friendly list of known serious bugs.

    Most MySQL upgrades bring better performance overall; we don’t mean to imply otherwise. However, you should still be careful.

    Get High Performance MySQL, 2nd Edition now with O’Reilly online learning.



    Query mysql time

    Welcome to a short tutorial on how to measure the time taken to execute a query in MySQL. So just how long did a query take to run in MySQL? Which query is better and faster? There are actually quite a number of ways to measure and benchmark queries in MySQL:

      1. The simplest way is to run the query in the mysql command-line client, which shows the query execution time by default.
      2. Similarly, tools like MySQL Workbench and phpMyAdmin show the execution time after running the query.
      3. Use MySQL query profiling.
        • Turn profiling on with SET profiling = 1.
        • Run your statements.
        • See the timings with SHOW PROFILES.
      4. Use the BENCHMARK() function.
      5. Manually write a test script to benchmark the performance of the query.
      6. Lastly, use a benchmarking tool such as Sysbench.

    That covers the quick basics; read on for the detailed examples!



    All right, let us now get into the various ways and examples of getting the query execution time in MySQL.


    This is probably the simplest way to check a query execution time, without having to install anything else. Just fire up the command line (or terminal), run the mysql client, and execute the SQL query. The client shows the time taken after each statement by default, for example “1 row in set (0.00 sec)”.


    If you are using the “common MySQL tools” such as MySQL Workbench and phpMyAdmin, both of them also show the query execution time by default. No sweat.


    Now, some of you code ninjas may be thinking “what if I want to check multiple queries?” Introducing MySQL query profiling.

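    • Turn on the “profiling mode” with SET profiling = 1.
    • Run all your SQL queries and statements as usual.
    • Then run SHOW PROFILES to see how long each query takes.
    • You can also use SHOW PROFILE FOR QUERY n (where n is a Query_ID from SHOW PROFILES) to see more details for an individual query.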
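    Here's a quick sketch of a whole profiling session, assuming the Sakila sample database for the test queries. Keep in mind that SHOW PROFILE and SHOW PROFILES are deprecated in favor of the Performance Schema. Also, by default only the last 15 statements are kept (profiling_history_size).

```sql
-- Turn profiling on for this session.
SET profiling = 1;

-- Run the statements to time (illustrative queries).
SELECT COUNT(*) FROM sakila.film;
SELECT * FROM sakila.actor WHERE last_name = 'DAVIS';

-- One row per recent statement: Query_ID, Duration (seconds), Query.
SHOW PROFILES;

-- Per-stage breakdown for one statement; use a Query_ID from above.
SHOW PROFILE FOR QUERY 1;
SHOW PROFILE CPU, BLOCK IO FOR QUERY 1;

-- Turn profiling off again.
SET profiling = 0;
```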


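    Again, some of you sharp code ninjas may be thinking “running the query one time is not conclusive”. So yes, there is also a native MySQL function for benchmarking: BENCHMARK(count, expr). For example, SELECT BENCHMARK(1000, expr) evaluates expr 1000 times. The function itself returns 0; the client reports the total elapsed time, which you can divide by the count to get the average per run.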

    But please take extra note, as indicated in the MySQL manual itself: the BENCHMARK() function is meant for measuring scalar expressions. Meaning, the expression must return a single value; a subquery must return a single column and at most a single row. Testing a query that returns multiple rows and columns will not work. A big bummer, but there are other ways to benchmark queries.
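    A quick sketch of what does and doesn't work, assuming the Sakila sample database for the subquery. The MySQL manual notes that the reported time is elapsed time on the client end, not server CPU time. Run it a few times and consider how busy the server is.

```sql
-- A scalar expression: MD5() is evaluated one million times.
-- BENCHMARK() returns 0; the mysql client prints the elapsed time.
SELECT BENCHMARK(1000000, MD5('measure me'));

-- A subquery works only if it returns one column and at most one row.
SELECT BENCHMARK(1000, (SELECT COUNT(*) FROM sakila.actor));

-- This fails: SELECT * on actor returns several columns and many rows.
-- SELECT BENCHMARK(1000, (SELECT * FROM sakila.actor));
```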



    Following up with the above “average query speed”, we can write a simple test script to loop through the query multiple times. Then calculate the total time taken and the average per run.

    P.S. This one is in PHP, but it can really be any other language that can connect to MySQL… You can even write a stored function or procedure in “pure MySQL” if you want.
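    As the P.S. says, the loop can also live in “pure MySQL”. Here's a minimal stored-procedure sketch, assuming MySQL 5.6.4 or newer (for DATETIME(6) and SYSDATE(6)) and Sakila's film table for the query under test. The procedure name bench_query is hypothetical. DELIMITER is a mysql client command, so run this from the command-line client. On MySQL 5.7 and older, keep in mind that the query cache may answer repeated identical SELECTs.

```sql
DROP PROCEDURE IF EXISTS bench_query;

DELIMITER //

-- Hypothetical procedure: run one query `runs` times and report timings.
CREATE PROCEDURE bench_query(IN runs INT)
BEGIN
    DECLARE i       INT DEFAULT 0;
    DECLARE t_start DATETIME(6);
    DECLARE t_end   DATETIME(6);
    DECLARE dummy   INT;

    -- SYSDATE(6) reads the clock when called, unlike NOW(), which is
    -- fixed at the time the statement began.
    SET t_start = SYSDATE(6);

    WHILE i < runs DO
        -- The query under test; INTO discards the result set.
        SELECT COUNT(*) INTO dummy FROM sakila.film WHERE rating = 'PG';
        SET i = i + 1;
    END WHILE;

    SET t_end = SYSDATE(6);

    -- Total seconds and average seconds per run.
    SELECT TIMESTAMPDIFF(MICROSECOND, t_start, t_end) / 1000000        AS total_sec,
           TIMESTAMPDIFF(MICROSECOND, t_start, t_end) / 1000000 / runs AS avg_sec;
END //

DELIMITER ;

CALL bench_query(1000);
```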


    Lastly, before the toxic trolls start to rage about “proper benchmarking”: yes, there are a ton of database benchmarking tools. I am not going to list every single one here, but here is what the official MySQL community recommends:

    • DBT2
    • SysBench
    • flexAsynch – Used to test MySQL Cluster. Already included in the source.




    Thank you for reading, and we have come to the end. I hope it has helped you better understand how to measure query time in MySQL, and if you want to share anything about this guide, please feel free to comment below. Good luck and happy coding!


    MySQL Query Analyzer

    The MySQL Query Analyzer enables developers and DBAs to quickly improve the performance of their database applications by monitoring query performance. MySQL Query Analyzer lets you accurately pinpoint SQL code that is the root cause of a slowdown. Rich graphs that drill down into detailed query information provide significantly better visibility into database performance issues. With the MySQL Query Analyzer, developers can improve SQL code during active development as well as continuously monitor and tune queries running on production systems.

    Improve MySQL Performance: Find and Fix Problem Queries

    The MySQL Query Analyzer provides a consolidated view of query activities and execution details enabling developers and DBAs to quickly find performance tuning opportunities. MySQL Query Analyzer enables developers to:

    • Quickly identify expensive queries that impact the performance of their applications
    • Visualize query activity to gain further insight into performance beyond query statistics
    • Filter for specific query problems like full table scans and bad indexes using advanced global search options
    • Fix the root causes of poor performance directly in the SQL code

    Visually Correlate Query Execution with MySQL Server Activity

    Correlated graphs allow developers and DBAs to visually compare MySQL server activity concurrently with executing queries.

    • Drag and Select a region on a graph to display the queries being executed during the selected time period
    • Combine timeframes with numerous filtering options such as Query Type, Execution Counts, First Seen and more to further target tuning opportunities

    Drill Down into Detailed Query Information

    Fine-grained query statistics take the guesswork out of finding queries that can be tuned for better performance.

    • Query Analyzer Table provides aggregated summary information for all executing queries
    • Query Response Time Index (QRTi) identifies queries with unacceptable response times
    • Response Statistics provide Execution Time and Row Statistics, Time Span and First Seen
    • Example Query provides a sample query for further review
    • Explain Query provides insight into the execution plan used for the query
    • Graphs provides key metrics such as Execution Time, Executions, Rows, and Kilobytes for a window of time for a specific query



    How to measure actual MySQL query time?

    To measure actual MySQL query time, we can use profiling, which must be enabled by setting the profiling variable to 1 before executing the query.

    The steps, in order:

    Set profiling to 1, then execute the query, then run SHOW PROFILES.

    Now, I am applying the above order to get the actual MySQL query time −

    mysql> SET profiling = 1;
    Query OK, 0 rows affected, 1 warning (0.00 sec)

    After that I am executing the following query −

    mysql> SELECT * from MilliSecondDemo;

    The following is the output:

    +-------------------------+
    | MyTimeInMillSec         |
    +-------------------------+
    | 2018-10-08 15:19:50.202 |
    +-------------------------+
    1 row in set (0.00 sec)

    To know the actual time of the above query, use the following query

    mysql> SHOW PROFILES;

    After executing the above query, we will get the output as shown below −

    +----------+------------+-------------------------------+
    | Query_ID | Duration   | Query                         |
    +----------+------------+-------------------------------+
    |        1 | 0.00051725 | SELECT * from MilliSecondDemo |
    +----------+------------+-------------------------------+
    1 row in set, 1 warning (0.00 sec)
