OurSQL Episode 36: It's Not Our (De)fault! Part 1

Nick Pisarro, aka @highenergybeams, suggested we do a podcast about the MySQL defaults we recommend always having, as MySQL's pre-configured option files are not particularly helpful. So we have put together a 2-part podcast on the options that we change.

About the Configuration File
Location and precedence of configuration files for Windows, Unix and Mac OS X:
http://dev.mysql.com/doc/refman/5.5/en/option-files.html

Note: We use the term "directive" to refer to the groups of options that begin with [groupname]. Upon reading the manual page again, we realized that the term is actually "option group". (Sheeri takes complete blame for this -- "directive" is actually the term for what !includedir and !include are...)

Sheeri's blog post about MySQL being too helpful - group vs. group_concat_max_len saga

Links to the manual for the commands we usually change:
Directories
datadir
tmpdir

Overall Server Settings
default_storage_engine

We mentioned default-character-set but it has been deprecated in MySQL 5.1 and removed in MySQL 5.5, in favor of using character-set-server. As we mentioned in the beginning, check your error logs when you restart mysqld and fix any warnings -- deprecated features will definitely show up there.

skip-name-resolve - it may be helpful to read Sheeri's Blog post on "What is an unauthenticated user?" and the MySQL manual description of How MySQL Uses DNS.

MyISAM Variables
concurrent_insert - even if you are not planning on using MyISAM, just in case. There is a more descriptive manual page about concurrent inserts if you want to learn more.

key_buffer_size - you can also read more about MyISAM's multiple key cache feature.

InnoDB Variables
innodb_file_per_table

innodb_buffer_pool_size - Also the more detailed manual page on Using Per-Table Tablespaces.

innodb_flush_method

Ear Candy
log_slow_slave_statements - log slow queries that come through replication

log_slow_admin_statements

Part 2 of the series

Where you can see us
On Monday March 7th at 7pm PT, the Seattle MySQL meetup will happen with Brian Aker talking about Drizzle 7 and the upcoming GA version! http://www.meetup.com/seattlemysql/events/16489408/

Sheeri will be at Collaborate in Orlando, Florida from Sunday April 10th through Thursday April 14th. She is organizing the Community dinner, eastern US edition on Sunday April 10th at 5 pm at Maggiano's Little Italy near the Convention Center.

Sarah will be at the O'Reilly MySQL Conference in Santa Clara, California from Monday, April 11th through Thursday April 14th, including the community dinner, western US edition on Monday April 11th at 7 pm at Pedro's.

Sheeri will be speaking about monitoring MySQL efficiently with Nagios at the Professional IT Community Conference, otherwise known as PICC, in New Brunswick, New Jersey. The conference is Friday, April 29 – Saturday April 30, 2011.

Sheeri will be at OpenDBCamp, Fri May 6th - Sun May 8th in Sardinia, Italy.

Feedback
Facebook group
e-mail: podcast at technocation.org
voicemail using phone/Skype: +1-617-674-2369
twitter: @oursqlcast
or Tweet about @oursqlcast



Transcript:

Sheeri: Welcome to OurSQL episode 36

Sarah: It’s not our default!

Sheeri: Today we’re going to talk about

Sarah: What we would like to see as defaults for configurations

Sheeri: part 1

[Theme song]

Sarah: Today is Tuesday, March 1st, 2011
Sheeri: I’m Sheeri
Sarah: And I’m Sarah
Sheeri: And this is OurSQL
Sarah: The MySQL Database Community Podcast

Sarah: So, today we are talking about the MySQL configuration files. And what we really wish was closer to the defaults. But Sheeri, I know you’ve got some notes here about kind of the broader – how things happen, and how things are read in the configuration files, in what order and such. So, why don’t you go ahead and start with those.

Sheeri: Some versions of MySQL ship with a support_files directory that have a few different configuration files, usually called my-small.cnf and my-large.cnf, but even those are not quite useful for folks. And as machines get more powerful and memory gets bigger they are more and more out of date.

Sarah: Pretty much we default to my-huge as our starting point on anything.

Sheeri: Oh, I just default to a blank file. Actually what I default to is a known good configuration for that particular client on a different server – something that is close to it. So if we’re making a slave from a master, I’ll usually copy the master.

Sarah: You’ll start with the master’s.

Sheeri: mm-hmm. So talking about the configuration file. Well, what is the configuration file? Many people know it as my.cnf, or /etc/my.cnf. There’s actually several different places a configuration file can be. There’s about four different places on Windows, but we won’t go through that there, you can look that up in the manual. In Unix, there are several different places that you can have a configuration file too.

By default, MySQL will look:
· first in /etc/my.cnf
· then in /etc/mysql/my.cnf
· in the environment variable MYSQL_HOME/my.cnf
· if that environment variable isn’t actually set, it’s going to use the value of DATADIR, if it’s specified
· and if not, then it will use the value of BASEDIR
· and then you can also set a variable called defaults-extra-file with a path to a file, so that will be after that
· and then the last one that it will look at is in your home directory, a .my.cnf file.

Now, from the first file I mentioned to the last, any variables that come later overwrite what has come earlier. So if you have a variable in /etc/my.cnf and a variable in ~/.my.cnf, the one that’s going to be used is the one in ~/.my.cnf (in your home directory).

I’ve run into issues where there (for whatever reason) has in the data directory been a my.cnf file. Some folks, I think, do that to have it with their backups or something? I’m not quite sure why. But certainly, the standard is usually to have it in /etc/my.cnf. So that’s something I don’t recommend using more than one – even a ~/.my.cnf can get annoying at times because you don’t really know where the variable is coming from if you’ve set variables in the ~/.my.cnf. Usually people use the ~/.my.cnf (which means your home directory) – for client-based things like username or port or socket.

Sarah: Or prompt

Sheeri: Or prompt, there you go.

Sarah: The customizations that you want specifically.

Sheeri: Right, the init-connect, things like that. So I don’t recommend using it in general, using more than one file, although usually there are some reasons for a ~/.my.cnf. Just be very careful if you know you want to use that.

There is a way to tell MySQL to only use one configuration file, and that’s if you use the --defaults-file option, and you could say --defaults-file=my.cnf or anything, it doesn’t have to be my.cnf, you could say --defaults-file=/etc/sheeri/foo.bar if you really want. If you do that the server will ONLY use that though, so if you do have some kind of a user based .my.cnf it won’t be used. So, if you want to keep that personal option then you don’t want to do that. But this might be something that you might consider as a policy around your whole organization, to say “OK, we’re just not allowing those personal config files. It’s just gonna be this, here’s the location and here’s where it’s going to be.”

By default, upgrades don’t usually overwrite /etc/my.cnf.

Sarah: Which is really handy. It will make a copy. Specifically with RPM installs it will put the my.cnf.new which is really nice.

So, laying out the configuration file, there’s little groupings, which is really handy as far as how the configuration file is laid out. Sheeri has brought to my attention that they’re actually called directives. These are the groupings that are put in the my.cnf with bracketed names. So you’d have [mysqld] or [mysqld_safe] or [client] and the client is specific for mysqladmin, or Perl, or anything that is actually making a connection into the database. So you’ve got your Perl connectors, your PHP connectors. You can also have different groupings like [mysqldump] or [myisamchk] for those of you who might still be using MyISAM for some reason. Or need to repair something that’s there.

We went looking and investigating to see if there’s a definitive list of directives and we haven’t been able to run into one. So maybe someone will tell us if that’s not true, but we didn’t find documentation on that very easily.

Sheeri: Yeah, I’m not sure if it’s in the source code or if it’s just in the tools themselves, to say, OK, hey look and see if there’s a matching directive.

There is one thing in that there is a directive called [client] and there’s also a directive called [mysql]. So the [mysql] directive is used by the mysql commandline client. The [client] directive is used by all clients, for example, the mysql commandline client but also for PHP connections, or mysqladmin, any kind of client program that’s used will actually take those client directives…

Sarah: PHPmyadmin

Sheeri: Yes, PHPmyadmin will do that. But I was looking at mysqladmin and then PHP or Perl or whatever and my brain merged it into PHPmyadmin.

Sarah: So the key thing when always firing up your server the first time is making sure that you have your configuration options in the right directive, and checking to make sure that your error log is clean, because you don’t want to be putting the wrong thing in the wrong place, or getting errors and having to troubleshoot those much later. So that’s one of the “always” best practices is, if you fire up a new server or any time you make a change to your my.cnf files, that you want to make sure that you have a clean error log.

Sheeri: One thing that a lot of people leave in is warnings about deprecated features, um, for an option that’s going to be deprecated later on. For example…

Sarah: Just fix it. Just fix it.

Sheeri: Yeah, exactly. Like in MySQL 5.1 the name of one of the variables changed from table_cache to table_open_cache. And it will work but you’ll get a warning. But it’s just one of those “just make it so you don’t have any warnings.” It’s just the easiest thing possible.

One interesting thing is that if you have a variable that MySQL does not understand – and by MySQL I mean whatever is reading that configuration directive doesn’t understand – you’ll get an error. For example, if you put something wrong in your init-connect string, you won’t be able to connect, unless you are SUPER user, which doesn’t use that string.

Sarah: That was one of our ear candy things.

Sheeri: That WAS one of our ear candy things. That’s why I used it as an example, because I figured our audience would be familiar with that as an example. Another example, if you spell something wrong, mysqld might not start. So you might say “why didn’t mysqld start?” well maybe I put lo-error instead of log-error as one of the options in there.

One thing that can be a little tricky with that, is that sometimes you don’t necessarily get an error, you might get a warning, or you might get something that is – you might not get anything, but you might get some behavior that you don’t really want. By default, for some reason MySQL….well, it’s not for some reason. MySQL wants to be easy to use, that was one of its, you know, three core goals, was fast, reliable, easy to use. And so it does a lot of things where it helps you out and is being really nice to you and does things that you probably don’t want to, for example, if you insert a value that’s too large for the datatype or if you insert like a text data for a number, it will actually say “well, maybe you meant, you know, 0 instead of, you know, the text. Or maybe even this field only goes up to 255, maybe, when you inserted 1000, maybe you meant 255, so I’m going to put that in there.”

MySQL does the same thing with variables in the config file. What it will do is, if it doesn’t understand an option, it actually will try to expand it to a longer variable name. I don’t know what the full list of expansions are and I don’t recommend using it on purpose because it can be confusing and if a future version adds a similar named variable you end up getting an error or unexpected behavior.

I actually ran into this the hard way. I was trying to be really diligent about making sure that MySQL started with the correct operating system user and operating system group. So I put user=mysql and group=mysql in two places – I put it in the [mysqld] directive and the [mysqld_safe] directive. What I didn’t realize was that mysqld does not actually support the option called group. It does support an option called group_concat_max_len which is the maximum length that the output of the GROUP_CONCAT() function can be. That value is a number. Now, the string “mysql” isn’t a number, so in addition to helpfully saying “oh, you said group, maybe you mean group_concat_max_len”, it actually very helpfully converted the string “mysql” to a number – which ended up being 0. And I was getting warnings in the log saying “uh, you can’t set group_concat_max_len to 0 because the minimum value is 4, so I’m going to set it to 4 for you.” So that’s three helpful things that it did.

First, it changed group to group_concat_max_len, then it changed mysql to 0 and then it said “well the minimum value is 4 so I’m going to set it to that.” And I had NO IDEA why I was getting those warnings. No clue.

11:05 Sarah: That would be a really entertaining, “Why are you – why do you think you’re doing this for me? But, thanks!”

Sheeri: And since nobody was really using GROUP_CONCAT() at all, it wasn’t really an issue of changing it. It was only when – I don’t know how I realized it, but I do have a blog post about it, and we will link to that in the show notes. The point is that if you’re doing this and you see any warnings it’s useful to try to track down the reason of it and knowing that MySQL does group expansion might actually help if you get a similar warning that you have no idea what it’s talking about.

Sarah: Yeah, that is a really helpful trick to know, because then you’re like “group what? I didn’t set that! Oh but I did set group, maybe that’s not right.”

So the next thing we had on our list is the, another just really standard best practice, is hash is a comment. And I’m one who liberally comments in my.cnf files even if it’s a default, and Sheeri and I differ on this slightly. But if it’s a default, I’m still going to say “I did this on purpose” when I put it in and I also put in notes as to how I came to a conclusion for a variable change, if it’s not a default.

12:17 Sheeri: I try to put in comments, too, but I do realize that we can’t necessarily force everyone to put comments in, particularly because

Sarah: Sure we can!

Sheeri: Well…

This is our podcast! We can force everyone to.

Sheeri: It’s true, but I’m trying to use the best practices anyway. I mean, the best practice would be to comment stuff in, you know, put in ticket numbers as to why it is, or a brief description as to why and who changed it. Theoretically, you know, you could source control your config file and put even longer detailed descriptions about what changes you’ve made. What I like to do is I like to at least make it self-documenting such that if you explicitly set a variable to the default, then you know you did that on purpose, even though you’re like “Why would you not change that?”

13:02 The reason that I say I don’t necessarily comment EVERYTHING is that there’s a lot of things that are the default and are fine as the default, that I won’t necessarily put in. I mean, there’s over 300 server variables and most of them, you don’t need to change right away, so I’m not necessarily going to say, “Oh, log_bin_trust_creators for stored routines” - or I think it’s called log_function_trust_creators or something – “that’s default to off, and that’s good for security, so I’m leaving that exactly like it is. For things that you don’t normally change I don’t normally put it in the default configuration file otherwise the configuration file would be hundreds of lines long. So that’s why I say I don’t necessarily put EVERYTHING that I’m OK with the default in there, but if I’ve specifically looked at it and said hrm, you know, should we use it, should we not, then I’ll probably put it in the configuration file to say “oh, this value is the default.”

Sarah: Yeah, it’s the “I’ve thought about this and the default is the right choice right now.”

14:00 Sheeri: so that’s commenting. So now we can get into the real meat of what we wanted to talk about, which is

Sarah: Which is options. Options in our directives, now that we’ve defined our directives. I’ll go ahead and start with directories. For example, datadir. In different directories, on different OS’s, there are different datadirs defined. But since we’re – Sheeri and I are both – very Linux centric, we will mention mostly those that are Unix and Linux centric. So the data directory defaults to either /var/lib/mysql or /usr/local/mysql/data, depending on whether it’s a precompiled RPM version or if you’re compiling it with different flags. I am inclined to keep things in /var/lib/mysql whenever I can, and I like setting it even as a default because I think Sheeri’s going to mention this same thing – we have lots and lots of clients and it is so handy when somebody’s having a crisis to be able to just grep for datadir in the my.cnf to find out where your files are expected to be – where to go look. We’ll talk a little bit later about logging, and it’s why I like having logs there as well.

15:13 Sheeri: Yep, and I agree wholeheartedly that, you know, grep datadir /etc/my.cnf is completely my friend all the time. I do change the data directory depending on the client installation. We have a lot of clients that are using either large storage or they’re using Amazon EC2 and so they’ll have something like a /data directory that’s /data/mysql is going to be the data directory or something like that. So in most cases we’ll set the data directory to something like that, as opposed to doing something like /var/lib/mysql and then having a symlink or something. Which is possible but I tend not to like to do that unless I actually absolutely have to.

Sarah: Well then you’re depending on a symlink and that’s scary. I am one who symlinks /var/lib/mysql but if it’s not on that filesystem, if that’s not where it actually is, I will change the datadir option as well.

16:06 Sheeri: Great! The next directory that I like to look at and/or change and definitely put in even if it’s the default, is the tmpdir. Which defaults to /tmp – now, the temporary directory, most of what it holds is temporary tables, it also holds temporary files, if you’re doing like a LOAD DATA INFILE. But mostly it’s the temporary tables that are too large to store on the disk* when you’re doing – when you need it for an intermediate temporary table for a large query. Or a query that’s, you know, joining or sorting a large amount of data.

[* this should be “too large to store in memory”. That’s why it gets saved to disk, in tmpdir]

So you need to make sure it’s big enough. If you have, you know, a 1 Gb /tmp directory that might not be big enough. You also want to make sure that if your /tmp is memory-backed, if it’s like a swap partition, that that’s exactly what you want, when you’re doing it. A lot of times you want it to be able to persist upon reboot if you need replication to start up from where it was, or something like that.

If you’re using for example Amazon EC2 you might not want it to be /tmp, you might want it to be on a partition that

17:14 Sarah: EBS volume

Sheeri: That! I remembered it started with E

Sarah: Yeah you don’t want it to be temporal if you don’t really intend it to be temporal storage.

Sheeri On the other hand if you have slower kind of a disk for example you've got a SAN or some kind of network attached storage you might want to consider keeping it in a local directory like /tmp because it's going to be a little faster when you're doing all those temporary tables. So it doesn't actually apply to the temporary tables that mysql makes when it's doing offline alter tables statements and rebuilding tables - that is actually in the data directory - but it does apply for the intermediate temporary tables that are created when you have a query that needs them.

Sarah: If they max out, say, the memory space to try to do a join or something.

Sheeri: Yep, so I would set tmpdir no matter what, explicitly in the my.cnf

Sarah: another handy thing to be able to just go and grep out so that you know where to look for it.

18:20 Sheeri: Exactly. Or, you know right away, if there's some kind of directory /foo having issues and you grep for foo in /etc/my.cnf and you see it's the temporary directory and you say, "Oh, I think I might know what the source of that problem is." Another one that I like to set is the default storage engine

Sarah: Which is already fixed, Yay! It's fixed in 5.5.

Sheeri: But I like to set it explicitly. Even if you're using InnoDB in 5.5 I would explicitly set it, just so that you know that's what you want. Because having that there says "yes, I want this, even though it's not the default."

I actually had a thing earlier today - I had an issue where I actually had to convert some tables from a TokuDB MySQL instance. The developers are so used to creating new tables as InnoDB that they just created a new ones as InnoDB so I actually had to convert them from InnoDB to TokuDB. So that's the case where I ended up putting default_storage_engine=TokuDB on that server. It wouldn't have helped in that case because it was a 5.1 server anyway, so if they hadn't have put InnoDB in the CREATE TABLE, it would have been MyISAM anyway.

19:24 So it wouldn't have helped in that case - it's just something to keep in mind that people are just so used to the default not being there, that they will automatically put InnoDB, even when it's not necessarily appropriate.

Sarah: Next on our wish list is default character set. Again it's one that changes between 5.1 and 5.5 - which is awesome - it's moving more toward the better choices. I've run into this before where the default character set prior to 5.5 is latin1 and you can stuff all sorts of really weird characters into latin1 even though they won't really render properly later. Moving to 5.5 though, it is now utf8. But it's still one of the things that is really nice to have explicitly set in your my.cnf. And it'll save you all sorts of pain later especially if you're running 5.1 right now. If you start with it as utf8. Because character set conversions are evil.

Sheeri: It's also something where, if you've made a business decision and you're saying "hey, no, we're going to use latin1 and we're OK with that - and then later on you do end up converting - at least you know, hey, you're going to expect things to be in latin1 because it says - right here - default_charset=latin1 in the configuration file.

So, again, this is one of those things that, EVEN if you're using the default, I would totally recommend putting that line in the configuration file.

20:40 Sarah: So, one of my favorite things to throw into the configuration file is skip-name-resolve. There are all sorts of curiosities and challenges and troubleshooting pains trying to understand and make sure - if something hasn't been set up the way you expect it to - being able to match logins with the username and host designation, for permissions. So skip-name-resolve simplifies this and simplifies permission granting because it won't resolve host names. So, every grant that is done or every user that is set up in the users table is set up with a username and an IP address. So there's no longer the question of, you know, "first I start with the DNS name, and then I move to the IP address if there's no DNS name for what that host is on login, and then I move to the IP wildcard to go ahead and resolve the username-hostname combination." If you use skip-name-resolve you only look through IP address first and then IP wildcard to match on username/host.

20:50 And that will resolve, of course, to the most specific user available so it would be sheeri@10.0.0.1.

Sheeri: One of the big problems that we see a lot in clients too, is that if your DNS ends up getting slow and you don't have skip-name-resolve and for some reason, you know, the client is just connecting and it's got to do a reverse lookup on the hostname. If it's connecting via hostname, MySQL for security sake does a reverse lookup to make sure it matches. If your DNS is slow you can have all of these connections trying to authenticate. They're not authenticated yet because they're waiting for the return of DNS.

And you see a lot of these users in the process list and it says unauthenticated user, which is A) a little scary and B) not the desired result - you can run up against your max_connections limit very easily in that kind of situation, and you don't really want a slow DNS to bring your entire database down in that kind of a way. So skip-name-resolve is something that, for high-volume organizations or high-volume databases, you know, you may end up get to a point where you're going to end up turning it off anyway. So you may as well turn it off sooner rather than later so that you don't have to then go and convert all these users that only have host names to IP addresses.

23:05 Sarah: And we have a link in our show notes about how DNS works in MySQL.

Sheeri: Yep.

Sarah: All sorts of fun stuff. And we've even put the 5.5 notes in there.

Sheeri: Yep!

Sarah: We'll put the 5.1 notes in too.

Sheeri: Yeah. So, that was a, you know, kind of, a section about, you know, generic stuff - we start with directory. So if you're using MyISAM there are few variables that I would set. One of them is the key_buffer_size. This is a cache for indexes and the default is 8 Mb.

So if you are using myISAM heavily - if you have a reporting server or something, you probably do want to change that, because you don't want your your cache to only have 8 megabytes in it for the index. There's no data cache with MyISAM, it uses the filesystem cache, so you definitely don't want to fill up, using all of your memory with the index cache. You want to save some for the file systems data cache. But probably 8 Mb is going to be way too small if you're using MyISAM tables at all. Oh, and you can also have more than one key cache if you want so there's an overall key buffer you can actually have named key caches if you want, as well. So you may end up wanting to keep key_buffer_size at 8 Mb because everything else maybe a named key cache but in that case I would also recommend specifying yes, key_buffer_size=8M and that's OK.

Sarah: I actually don't do concurrent insert things very often. I've not run into it, so tell us more.

Sheeri: concurrent_insert is a MyISAM specific parameter that is 1 by default. And it's one of these parameters that the developers of MySQL decided "hey, you know, we need to be able to insert things into MyISAM concurrently." Because MyISAM does table level locking, if you do an update or a delete of a MyISAM table, it locks the whole table. Well, for a while in MySQL if you did an INSERT it also locked the whole table. So concurrent_insert is a way to basically say, just because you're doing an INSERT doesn't mean you can't do a second INSERT." You can still insert more than one thing at a time at the same time.

25:09
You can't read at the same time because an INSERT gets an exclusive lock not a shared lock. But you can do INSERT INSERT INSERT because it's MyISAM - so it's Indexed Sequential Access Method (ISAM). So you're just saying "okay, I'm going to go here, and then I'm going to insert something there, and then I'm going to insert something in the next position, and the next position. So it makes it possible.

Now, with concurrent_insert set to 1, what happened was, if there were no gaps in your data - so if you didn't delete or update any variable-length rows - then concurrent_insert will actually insert concurrently. If you do have gaps though even with concurrent_insert set to 1, you won't be able to do concurrent inserts. So I usually set this to 2, whether or not I know I'm going to use MyISAM - definitely if you know you're going to use MyISAM. Because what that does it says even if there are gaps, it will insert at the end. So even if your table is fragmented it's going to insert at the end.

26:05
Sarah: Nifty. I learn something new every time we do this podcast it's awesome",

Sheeri: There you go.

Sarah: So, innodb_file_per_table is one of my favorite options. It's off by default [NOTE: this has changed and is ON by default starting in MySQL version 5.6 and higher] but I much prefer to have the separate data files per database table, as opposed to having an aggregated data file for all of the InnoDB tables.
Sheeri: You're going to have an ibdata file no matter what - usually called ibdata1 - because that's where the InnoDB data dictionary is. But the data dictionary is very very small and if you don't use file-per-table - meaning, you leave it as the default is off, then all of your data and indexes are going to be stored in this one ibdata file. Which is going to be huge!

26:47 Sarah: Yes. And in way older versions of operating systems - if anyone happens to be running 32-bit operating systems out there - thaehuge data files, you end up having to have multiples of them. When you do a delete within InnoDB and it's set with file-per-table off, you're going to be deleting big chunks of data from somewhere inside that large file. And you end up with more fragmentation but you also have a harder time recovering that space. So in the case of "oops I've run out of disk space and I really need some additional disk space on this filesystem for something" even if you delete you know that 20 gig temp table that you had built in innodb that 20 gigs will still be taken up by the ibdata file.

Sheeri: Right, and it will be able to be reused, it just, you won't be able to see it. And it's because InnoDB has estimated metadata, you're not going to get an exact answer on how much data you'll have. Additionally if you're not dropping a table - if you just, say, defragment a table - the space can only be used for that table. Because, it's basically, when you defragment, it kind of moves the table to like the beginning of that section. So if you go from say 20 gigs to 10 gigs in a table, you'll end up having like a 10 gig section of that ibdata file and then 10 gigs worth of space, which can only then be used for that table.

28:25 Sarah: Yeah, it's not as flexible for disk utilization in my experience so I really prefer file-per-table.

Sheeri: Right, now, 95% of the time, this is what you want. Because I know there are gonig to be people who say, "Well, if you have one file per table, and you have 5 million tables, then you're going to CAUSE fragmentation. You're going to have an operating system that can't handle all those files in the directory."

We totally understand that, but for 95% of the organizations out there, and the people using MySQL, innodb_file_per_table is a huge win - if only because when you defragment a table you actually get the space recovered back into the filesystem.

29:00 Another one of the big things that I like is that you can actually see how big the index and data file is on disk. So you get the .ibd file, and you can actually see how big it is on disk. Now when you do that, you can say "Oh look, here's this file, it's 20 gigs on disk and then you can look in the metadata, in INFORMATION_SCHEMA, and you can look at the INDEX_LENGTH and the DATA_LENGTH, add them together and see what that size is.

Now, remember that it's an estimate. But if the estimate is way off then you know you have some fragmentation. For example, if you have this 20 gig file on disk and the metadata says that the data plus the indexes are 10 gig, then you know you have pretty bag fragmentation.
Sarah: Yeah, it's one of the things that you want to be able to watch for, and it's much harder to estimate with a single file.

Sheeri: Well, you can estimate it you just don't know which table might be causing all of it. You can just add it all up...

Sarah: It's much harder to estimate how to fix it, and actually you can't in some cases fix it when it's only a single file.

29;58 Sheeri: Right. And again, it's hard to know how much you have fixed when you've done it because the space isn't reclaimed.

Sarah: Just complicated. We like innodb_file_per_table unless of course you're going to blow out your file system etcetera etcetera.

Sheeri: Right.

Sarah: Our little fine print - asterisk!

Sheeri: Exactly. Well, I mean, you know, it's one of those things that I wouldn't even mention. But I know that people are going to comment and say "no but that's not good in EVERY case". Not everything is good in every case but, you know, we're saying what the defaults we want to be, are, because it's what we found with, you know, all of our - not all of our clients, maybe - but certainly for the vast majority of people - that these are the kind of tuning and optimizations that they need.

Sheeri: So speaking of which innodb_buffer_pool_size. Now the default for the innodb_buffer_pool_ size used to be 8 megabytes currently it's 128 megabytes. Both of which are vastly unsuitable if people are actually using InnoDB tables at all. The InnoDB buffer pool is the cache for both data and indexes for all InnoDB tables.

31:00 So unless you're going to have a very small system - if you're running MySQL on your iPod or something - even then, 128 megs is tiny for cache, even on your iPod, right? If you have a 32 gig ipod or whatever "kids these days have now".

Sarah: Get off my lawn! Take your database off my iPod!

Sheeri: Exactly, exactly.

Sarah: The InnoDB buffer pool is something that you want to be pretty much as big as your memory can allow it to be, if you're running InnoDB as the primary data store.

Sheeri: Yeah, I usually do a kind of a rough guesstimate if I'm using all InnoDB tables. I'll do a guesstimate of say, maybe about 75-80% of the memory on the system. Obviously for larger memory systems I might use a different kind of percentage. For example if something is 32 gigs I'll probably do something like a 28 gig innodb_buffer_pool_size

32:00 So the next InnoDB parameter that we want to talk about was innodb_flush_method, which we usually set to O_DIRECT.

Sarah: O_DIRECT is the access to disk that bypasses filesystem caches. Sheeri and I had a conversation about which operating systems this was viable on and it's unclear on whether the documentation is correct it seems to change between 5.1 and 5.5 and I know that we at Blue Gecko use this a lot, in order to make sure that we have the fastest access to disk and we know exactly where data is cached. So that in the event of a catastrophic failure we have limited the number of places that we can lose data. So we do actually run innodb_flush_method=O_DIRECT on Linux boxes in our default configurations here for anything that's doing any volume of transaction. Or any place that we need extreme confidence that we have ACID compliance

33:03 Sheeri:
So that's all really that we have for InnoDB, it's only 3 variables that we change. It was interesting when we were preparing the show notes for this, we both thought there would be tons more of all these parameters that we would do. And we still have a bunch more parameters to go through. But we both "Wow we're going to talk for hours on this!" And there just really aren't that many variables to change. Which was kind of nice.

Sarah - I always thought that we had more - that the defaults were so so far off, but when we went and got down to the nut of it for the basic everyday "just start with these" kind of space it wasn't as many as I thought it was. It was very interesting

Sheeri: That's it for part 1! In part 2 we talk about default variables we usually set for logging, replication slave, timeouts, connections, and errors. The logging includes slow query logs and binary logging defaults.

34:00
Sheeri:
So this week's near candy is talking about the slow query login and replication.

Sarah: We've got a couple of specific suggestions - and this is super quick of course. We want to suggest that you enable log_slow_slave_statements.

Sheeri: What that does is, if you have a slave and there's a slow query that comes through replication - for example if you have an update statement that takes 20 minutes to run, for example - and you know your long query time is set to something less than 20 minutes you actually won't get that log in the slow query log on the slave. Which can run into problems because you might want to look at a slave to say "hey why was this slave slow?" And it won't actually put in all of the queries that were making it slow at the time. It will only put in the queries that were done on the slave itself. So if you have for example a read slave, it might show that - you know - a read was locked for a long time, and couldn't proceed. But you won't see why it was locked because…

34:57 Sarah: Right because that was happening on the replication thread.

Sheeri: Exactly. So what log_slow_slave_statements does is make sure that ALL statements that are slow - which is longer than long_query_time or uses log_queries_not_using_indexes and all that kind of stuff, and min_examined_row_limit - will be logged to the slow query log.

Now that's something that - again - you can decide whether you want that to default to being used or not - I think it's actually pretty useful.

Sarah: And that's available only in 5.1.45 and up correct?

Sheeri: But if people are using 5.1 they should be using a version greater than that.

Sarah: So, we also have the log_slow_admin_statements option which we recommend that you throw into your slow logging options because it also requests logging of administrative statements such as OPTIMIZE TABLE, ANALYZE TABLE, or ALTER TABLE to the slow query log. So that's another sort of thing that you didn't get to see in your slow query log before and sometimes you see it - just, you know - you'd have an update that was really slow for some reason but you could never identify why and it might have been one of these administrative statements that was blocking it.

36:06 Sheeri: Yep! And that's not whether or not it comes through replication - that's not whether or not it comes through replication, it's whether or not it's done the host - right? - because if you have OPTIMIZE TABLE it might not be put in the slow query log unless you have that parameter of the log_slow_slave_statements.

36:28 Sheeri:
So where you can see us in the next few months…

Sarah: On Monday March 7th at 7 p.m. Pacific time, the Seattle MySQL meetup is happening and Brian Aker will be talking about Drizzle 7 and the upcoming GA versions. So those will be coming out mid-March, I think, if you're actually interested in playing with Drizzle there's going to be - there are, actually - some Amazon AMIs that Blue Gecko built. And there's a post on the Blue Gecko blog from Patrick Galbraith that he has developed 2 AMIs - one that's the box that's all ready to build and develop Drizzle on. Basically you do a check out from their repository and off you go. It's got all the prereqs and then there's another one that has Drizzle installed and all the plug-in goodies from the RPM's that we built on top of the stock AMI. So playing with Drizzle, a little thing there.

37:16 Sheeri: I will be at Collaborate in Orlando Florida from Sunday April 10th through Thursday April 14th I'm organizing the Eastern US edition of the community dinner which Sarah will also be at, on Sunday April 10th at 5 pm at Maggiano's Little Italy near the convention center.

Sarah: So, after I join Sheeri at the community dinner, I will be heading directly to Santa Clara for the O'Reilly MySQL conference. That is Monday April 11th through Thursday April 14th and that includes the MySQL community dinner Western edition - and that's Monday April 11th at 7 p.m. at Pedro's in Santa Clara.

37:52 Sheeri: Also in April I will be speaking about Monitoring MySQL Effectively with Nagios at the Professional IT Community Conference - the PICC conference - in New Brunswick, New Jersey. And that conference is Friday April 29th to Saturday April 30th. And then I will be at Open Database Camp from Friday May 6th to Sunday May 8th in Sardinia Italy.

38:35 Sarah: And that, I think, covers our week.
Sheeri: That's it!
Sarah: As always you can email feedback to podcast at technocation dot org
Sheeri: Or call the comment line at +1 617-674-2369
Sarah: And we'll actually play your comment on our podcast if it's not really too loquacious
Sheeri: Well, even if it is, we'll edit it, that's what editing's for.
Sarah: Yeah, we can edit it down. Or you can tweet to us using @oursqlcast
Sheeri: Or join the OurSQL Facebook group
Sarah: And please continue to give us feedback via iTunes. We're starting to see reviews and little stars.

38:59 Sheeri: Yes! Yes! We have - last time I looked we had five stars, so I don't know if we still have 5 stars, but there were 5 whole ratings and we were at five stars so that was pretty exciting for me!
Sarah: Yeah, and we'll bribe you with swag.
Sheeri: Yeah we had - we still only had one review the last I looked - from Tom Krouper and we're in the process of sending some swag over there - but if you submit a review we will send you some swag.
Sarah: Sweet! So, you have been listening to ourSQL.
Sheeri: The MySQL database community podcast
Sarah: Featuring Sarah Novotny of Blue Gecko
Sheeri: And Sheeri Cabral of PalominoDB
Sarah: This podcast is sponsored by the Oracle Technology Network
Sheeri: The OurSQL sound engineer and producer is Rich Goyette
Sarah: And show notes and past episodes can be found online at www.oursqlcast.com
Sheeri: thanks for listening.
Sarah: yay, thanks!

[theme music]

[blooper]
Sheeri: today is Tuesday March 1st 2011 and I am not Sarah
Sarah: and I am not Sheeri
Sheeri: all right, let's try that again. I'm sorry I'm stealing your lines.
Sarah: It doesn't matter, we could.
Sheeri: I guess it doesn't matter, it's just - it's weird though, isn't it? I don't know. It says you should do it, so you should do it.

[blooper]
[phone rings]
Sheeri: that - hold on a second, oh -
Sarah: Brrrring!