quick method to clone a MySQL database

let’s say you have a MySQL database on db1.db and you want to clone it to db2.db

the “official” way to do this is to run a “mysqldump” on db1.db and then import the resulting .sql file into the db2.db server.

There are problems with this approach:

  • mysqldump locks the source database, making it inaccessible while the dump is happening.
  • mysqldump creates files which may be many times the size of the source database’s binary files, potentially exhausting the space on your source server before it’s even done.
  • the resulting file then needs to be imported into the target server, which could take hours depending on the size.

I needed to clone some databases in a hurry that are about 20G in size. The method I used ended up taking less than half an hour to complete, and the source database (db1.db) only had to be down for less than a minute, instead of the potential /hours/ in the mysqldump method.

  1. use rsync on db2.db to copy the data directories from db1.db to db2.db:
    cd /var/lib/ && rsync root@db1.db:/var/lib/mysql ./ -rva –progress –delete
  2. use rsync on db2.db to copy binary logs from db1.db to db2.db:
    cd /var/log/ && rsync root@db1.db:/var/log/mysql ./ -rva –progress –delete
  3. repeat 1&2 (the first time around would take some time. the second time around will be quick)
  4. on db1.db, stop the database
    service mysqld stop
  5. on db2.db, repeat 1&2 one last time
  6. on db1.db, start the database again, and start the slave service if you need to
    service mysqld start
  7. on db2.db, remove auto.cnf and any innodb log files
    cd /var/lib/mysql/ && rm -f auto.cnf ib_logfile*
  8. start the database, and start the slave if needed
    service mysqld start

With the above method, your source database will be down for only a minute or so (steps 4-6).

The reason that 1&2 are repeated 3 times:

  1. clone the db1.db database from scratch. this will take a while
  2. because it took so long to run #1, there are probably a lot of changes. repeat to get those changes
  3. when you stop db1.db, some files will get final changes as they are changed. grab those after db1.db has been stopped

You need to delete any existing innodb logs (step 7) which might cause the system to attempt to “fix” some tables it might think are broken. but, because we did a clean shutdown in step 4, this is not necessary. so delete the log files (they will be recreated automatically).

If you are doing the clone because you want to create a new slave database, then the database needs a new internal ID that it will send to the master. By deleting auto.cnf, you force the MySQL server to create a new unique ID.

Salting Passwords

The simplest way to store a password in a database is as a plain string

insert into users set email="kae@kvsites.ie", password="password";
["kae@kvsites.ie", "password"]

But, if someone hacks into the server, or you have a malicious admin, then those passwords can be stolen. This is a big security risk as passwords tend to be re-used by people for other purposes, such as PayPal, etc.

So, the next stage is to encrypt the password using a hash such as MD5:

insert into users set email="kae@kvsites.ie", password=md5("password");
["kae@kvsites.ie", "5f4dcc3b5aa765d61d8327deb882cf99"]

That /looks/ secure, but there are huge databases on the Internet with MD5 translations of all words, so it is trivial to hack these.
https://www.google.ie/search?q=5f4dcc3b5aa765d61d8327deb882cf99

The next stage is to “salt” the password by adding a prefix to it before hashing. For example, let’s use “123ghjzxc” as the salt key.

insert into users set email="kae@kvsites.ie", password=md5(concat("123ghjzxc", "password"));
["kae@kvsites.ie", "9f400bac0b5a9b3d66c9c98aae09fab5"]

This is much more secure now. A search for the MD5 hash will not return any results at all (well, this page… but you know what I mean).
https://www.google.ie/search?q=9f400bac0b5a9b3d66c9c98aae09fab5

Another method is to hash the password before prefixing it with the salt, then hashing again. This may be a bit more secure again.

insert into users set email="kae@kvsites.ie", password=md5(concat("123ghjzxc", md5("password")));
["kae@kvsites.ie", "d1dddda63a6dde54fb1740dffe3faa27"];

As an extra step, do all the MD5ing outside the database, so the password is not sent over the wire to the database.

multi-tenanted CMS architecture

Last week, I did a talk at the Dublin Google buildings titled “Multi-tenanted CMS Architecture using PHP”.

Here are the slides that I used:

While talking with Google’s Brian Brazil, he explained that it is actually more efficient to use one database and many separate tables, than to separate each installation into a separate database, so one point I made (that KV-WebME uses separate databases per site) will change in the future.

I think the talk went down well, by the number of questions afterwards.

Last year, I gave a similar talk, and made the mistake of including way too much PHP in it – I had assumed that the audience would be composed of PHP developers. This year, there is just one slide of PHP, and that’s just to illustrate one possible way to build a proxy config.

Lesson’s learned for this time:

  • Talk slower. When I’m explaining something, I tend to try to get as much in as possible, so speak very fast. This makes it hard to hear what I’m saying.
  • More pictures, less words!
  • Stats. Some of the questions were around how efficient certain parts of the method were – particularly on the overhead of piping a file through a script as opposed to simply delivering it via Apache. I need to come up with numbers for that.

Overall, I was happy with this presentation.

CMS Design using PHP and jQuery

I’m happy this week. Last week, I spent some time and organised myself a bit more. In work, things are going smoothly – managed to get over a tricky piece of work and the rest is simply a list of small tasks.

For the last few weeks, I’ve been emailing and messaging Darshana at Packt Publishing, about writing a second book (jQuery 1.3 with PHP is going very well – list of reviews).

I initially wanted to write about file management, to explain how KFM works, and to help force me to improve on it. But there’s just not enough of an interested market in that – it’s too specialised.

So instead, I’ll be writing about CMS design using PHP and jQuery.

We (webworks.ie) have a CMS engine which we’ve written and improved for the last 6 or so years. We’ve open-sourced it a number of times, but never managed to generate much interest in it. We never had the time to spend on publicising it.

The book will not be specifically about that engine, but rather about the concepts that went into creating it – how a CMS works, how to manage plugins, administration, user management, and all the other little bits and pieces that every PHP developer needs to eventually address.

By way of explanation, I will be demonstrating various parts of our CMS, and explaining how and why it was built that way. I will be closely examining the other major CMSes as well, and giving alternative methods where good ones exist.

The proposed chapter list is:

  1. Introduction
  2. CMS core design
  3. User management and access control
  4. Page creation and Navigation
  5. Template Management
  6. Plugins
  7. Form creation
  8. Image Gallery
  9. Panels
  10. Search and Polls
  11. RSS and News
  12. Online Store
  13. Products

I’m really excited about this project!

a tale of prefixes

I debugged a problem for a friend of mine over the last day. He had installed bbPress on a server which also had WordPress on it, and his WordPress site just suddenly stopped serving comments and pages, and started killing the database server instead.

In the end (after a day of considering this and that and a lot of head-scratching), it turns out that the bbPress installation had been done using the same table prefix as the WordPress one.

See, when a web app is installed, it usually installs a load of database tables. The way to avoid these tables overwriting each other is to “namespace” them. The most common method is to simply place a unique prefix in front of the table names.

For example, WordPress and bbPress both have tables for posts. The default installation is to name them wp_posts and bb_posts, respectively, using wp_ and bb_ as prefixes. KFM does that as well, using kfm_ by default.

In the current case, what happened is that the bbPress installation was done using wp_ as the prefix, and some of the WordPress tables were then changed to match what bbPress expected.

In the case of wp_posts, the field post_status was changed from a varchar holding values such as “publish” or “draft”, to a smallint holding just ‘1’ or ‘0’. This had the immediate effect of unpublishing over 6000 articles on the blog. (aside: oi, WordPress – why are you not using enums?)

A few hours, spread over a day or so, were spent figuring that out then fixing that.

So, what actually happened? Why was wp_ used as the prefix?

Let’s look at the WordPress installation.

wordpress

When you install a copy of WordPress, the first thing you do is fill in the database details. This is the field for the prefix:

<input id="prefix" type="text" size="25" value="wp_" name="prefix"/>

Notice the name, “prefix”. A lot of browsers these days include auto-fill, which is sometimes useful. When encountering a form, the browser would fill in the details that it already knows, such as email address, etc.

The problem, though, is that sometimes, autofill might fill in details that really should be left blank or at their default states. An example of this is in phpBB, the popular forum software – whenever an admin goes to edit a user’s details, autofill in Firefox tends to fill in the “password” field in the user form, but it really shouldn’t do that, and can potentially break a user’s account.

This is actually quite easy to fix, though, in the installer – the installation program just needs to change to use a different name for its input box:

<input id="prefix" type="text" size="25" value="wp_" name="wp_prefix"/>

Now the value should only be autofilled by values recorded from other WordPress installations, and should not affect future installations of other software.

The thing is, this is not what broke the system I was fixing.

Let’s look at the bbPress installation.

bbpress

When writing bbPress, more thought appears to have gone into the installation. The only fields visible here are database name, user, and password. These are the most commonly changed details, so are shown prominently.

There is an “advanced” checkbox. When clicked, this shows fields that should only be changed if you are sure of what they do:

bbpress2

The fields shown here rarely need to be changed. They include database host, collation, character set and table prefix.

The default character set is UTF8, and I can’t imagine why anyone would need anything other than that (that also covers collation). My own opinion is that these ones shouldn’t even be in advanced – they should be hard-coded.

There are not many reasons why the table prefix would ever need to be changed, but in the case I was troubleshooting, the database host was held on a separate server to the site itself.

So, because the database host needed to be changed, that explains why the admin opened the Advanced section in the first place.

If the prefix input box was named “prefix”, this would be the end of the story – I might have assumed that the problem was autofill (as I did for a while), and suggest the above solution to them.

But, the authors of bbPress have obviously come across the autofill problem themselves, because they named their prefix button “bb_table_prefix”, exactly as I suggest every installer should do (now do it with WordPress, guys!):

<input [...] name="bb_table_prefix" class="text" value="bb_" tabindex="8" />

So, the problem is not the software.

In this case, I believe it’s human error. What follows is my own guess at what happened, in hind-sight and without the benefit of watching the proceedings in person.

I think what happened is that our admin’s techie had left some notes about the settings used for the WordPress installation, including database username, password, etc. Then the admin had gone to install bbPress, in the correct assumption that installation should be straightforward and compatible with WordPress.

However, I believe the problem is that the techie had left a note about table prefix, but had not explained to the admin that all the other settings must be as written but the prefix must not.

The admin had simply filled in what was written, and bbPress did what it does, and changed WordPress’s database tables.

So, who’s at fault?

I believe the problem here was a misunderstanding – to all of us who have done a million installations of a thousand different software applications, it’s very obvious what the table prefix is, and what it does. However, I don’t believe I’ve ever seen it written down, or the dangers of using an existing value.

Let’s look at the bbPress installation again, this time with the ‘?’ expanded:

bbpress3

It says “If you are running multiple bbPress sites in a single database, you will probably want to change this.” While literally correct, this actually doesn’t explain what the prefix is or why it should be changed. And most importantly, it does not say “This value must be unique in your database, or it may overwrite your existing database tables.” The help text in bbPress’s case does not actually explain the danger of using existing values for the prefix.

The techie is also at fault. If leaving notes for someone, always make sure to explain anything which needs explaining. In this case, it should be something like “If installing new software, use these database access details, but make sure that you do not use these database table prefixes.” In fact, the table prefix should not have been in the notes at all – if the value is a default, then don’t write it down.

The admin is also at fault, for following the notes to the letter and filling in the prefix where the default value was actually perfect. I won’t fault him for installing software when the techie wasn’t at hand to point out mistakes, as there really shouldn’t have been any – the prefix error was actually quite easy to make; it just happened to have devastating effects.

However, these are all minor faults. They’re all easy mistakes to make, and it was the combination of all three that caused the error.

Having said all that, I realise that I have some installers to check now to make sure I follow my own advice!

Mastering phpMyAdmin 3.1

Last year, I reviewed Mastering phpMyAdmin 2.11 for Packt, and said it was overall a comprehensive book, but with a few points that I would change.

I was sent a copy of Mastering phpMyAdmin 3.1 a few weeks ago and after reading it, I can see that most of the points I mentioned have been addressed, and I know from other sources that the points that were not changed have good reasons for them (screenshots and we/our speech are very good for people that are not familiar with the subject).

There are many reviews of this book already, so I will mostly describe how this edition is different from the previous one. In short, though, I would say that this version is much more readable, and is still the best book to buy if you want a book on phpMyAdmin. The reviews I’ve read here all agree with my own assessment that phpMyAdmin is a surprise – right when you think you know what it does, a book points out a load of stuff you didn’t know it did.

Things that are improved

I suggested last year that the 5 page date-by-date history of phpMyAdmin’s delivery be moved from the front of the book to an appendix. At the time, the book didn’t have an appendix. The new edition has two appendices – the first one is for the history, as suggested, and the second contains the old book’s chapter 20 on Troubleshooting. The short history of MySQL at the beginning of the MySQL 5.0 chapter has also been removed – it had no purpose.

A few of the unnecessary screenshots have been removed. I understand the need for screenshots, as they sometimes describe better than mere text what the author means when he says something, but my complaint from last year was that screenshots that were simply not necessary had been put in apparently just to take up space. This has been improved this year, with many of the redundant images removed.

SQL code screenshots have been replaced with just the text of the SQL. This makes the examples much easier to read.

Language issues, such as switching collations, had originally been a full chapter, but is now spread out in the book, where appropriate. This is good, as when exporting a backup (for example), you don’t want to have an important part of the explanation, collation, hidden away at the back of the book.

A lot of the “since 2.6.1”, “since 2.6.0” wording has been removed. This is good, as the reader of the book is obviously using version 3.1 or higher. The author does sometimes mention the version number when adding new text, and I think that’s unnecessary – the reader really should not need to know what version of phpMyAdmin a new feature was introduced in. If you really want, add another appendix listing dates of improvements, but it’s really not needed; that’s what changelogs are for.

phpMyAdmin itself has a number of new minor features that are detailed in the new book. For example, exports (SQL dumps) can be done to Texy! text format, the search wizard has been improved to allow searches for empty/non-empty fields. Support for the PBXT storage engine is added, public query bookmarks, the default browse view’s query can be customised

There are a number of major features which also make their debut here. New features include database partitioning, scheduled events, streamable blobs (for videos, etc). I’m looking forward to making use of streaming blobs!

Things that are not improved

The table of contents is even longer. At 13 pages (versus 12 last year), I still feel it’s too large. My comment last year was that maybe just the higher level headers should be shown, with more detailed contents at the beginning of each chapter. As an example, under the user management section, there is a header for adding a user. That should be enough. Instead, we then have headers describing the username, the password, the hostname, etc. That sort of thing makes the front of the book very heavy. I feel that most readers when opening the book for the first time wonder to themselves when will the contents end and the content begin.

Most of the screenshots have been redone, but in some cases, shots which were perfectly sized in the old book have been exploded up to larger images in the new one, making them huge relative to the content they portray. For example, on page 78, the top screenshot has a total of about four lines of text in it, but takes up almost two inches of page space (and is pixelated), compared to the original on page 75 of the old book at only one inch. I can see this in a lot of cases in the new book (another is the 3 inch bottom image on 132 vs the perfectly legible 2 inch one in the old book’s page 128) and wonder if that helped to add to the number of pages. The new book has 326 pages versus the old book’s 318. On the good side, in some cases, the new screenshots reduced the size of the old ones.

I didn’t mention it last year, but remember puzzling over it at the time. The last part of the Troubleshooting appendix (chapter 20 back then) discusses future enhancements to phpMyAdmin, but the goals are very vague. “improved support of mysql”, “internal code improvements”; these should not be considered goals – they should be considered givens! Besides which, it’s not productive to discuss future events that may or may not happen. They’re not available yet, so shouldn’t be written about in a manual which is about /existing/ work.

Conclusion

I noted a number of improvements, and a number of failures. The improvements outnumber the failures. Each edition of this book gets better and better.

If you are new to phpMyAdmin and are looking for a book, I really do recommend buying this.

bootnote: I have my own Packt book coming out in a few months. I hope that reviewers of that book (PHP And jQuery) will be as critical as I try to be in my own reviews – I feel that criticism is good, and can only help to improve future editions. This year’s edition of Mastering phpMyAdmin is better than last year’s, and I feel that it is because the author was receptive to criticism and improved his book based on it. I hope I can be just as receptive when my own time comes!

optimise all tables in all mysql dbs on your server

A quick script to do so:

<?php
mysql_connect('localhost','root','YOURPASSWORD');
$alldbs=mysql_query("SHOW DATABASES");
while ($dbs = mysql_fetch_assoc($alldbs)) {
	foreach($dbs as $db_name){
		if($db_name=='information_schema' || $db_name=='mysql')continue;
		mysql_select_db($db_name);
		echo '<strong>database: '.$db_name.'</strong><br />';
		$alltables = mysql_query("SHOW TABLES");
		while ($table = mysql_fetch_assoc($alltables)) {
			foreach ($table as $db => $tablename) {
				echo 'optimising '.$tablename.'<br />';
				mysql_query("OPTIMIZE TABLE `".$tablename."`") or die(mysql_error());
			}
		}
	}
}

Based on this original script for optimising all tables in a specific database.

webme, step 2: upgrading and user accounts

Last time, we wrote a basic installer and nothing else. This time, we’ll create user/admin accounts and discuss how upgrading can be done easily.

With user accounts, it is tempting to create two separate tables of users – one of admins and another of ordinary users, but that’s redundant. The same information is saved in both cases, so it makes more sense to have one table dedicated to users as a whole and some way of distinguishing the ordinary users from the administrators.

The way we’ll handle this is through use of Roles. The simplest being to have a table of users and a table of groups, and if a user is a member of the “admins” groups, then they have admin privileges. This is an example of ACL, which can get very complex if you dig into it. I’d recommend sticking to just users and groups unless there is a really good reason to get more in-depth than that.

So, the next thing we need in the CMS is the ability to set up one admin from the installer, and the tables needed to support it.

Logging in will be done via email and password.

Up until recently, the most common method was logging in via username and password, but that has a disadvantage – when you forget your password, it’s also possible that you forgot which email address you registered with, or you no longer have access to the email address.

Using your email address as the login, you are reminded every time you login what email address the new password will be sent to if you forget the old one. And if you know you’ve lost access to the old email address, you can take measures to change the login email before it becomes impossible to do so.

Adapting the installer to ask for these details was not difficult so I won’t detail it.

Upgrades are an interesting problem.

In a non-database environment, upgrades consist of merely unzipping the new version over the old one (or running “svn up” in the root of the site).

Upgrades become much trickier when databases are involved. Upgrading the files might cause an inconsistency, where the scripts may reference database tables that don’t exist or have been changed.

One project I worked on got around this by having a directory which held database patch scripts which an upgrader should run before the site should be used again. That was messy.

Other projects get around this by requiring you to upgrade your files, and then go to a URL (/upgrade.php for example) which will do the database upgrade for you. Better, but still messy.

I think the solution is to have a database version number kept in a file which is upgraded (the front index.php for example), and another version number kept in the site config file, which is not upgraded. When someone visits the site index, it’s a non-expensive process to simply compare those numbers, and if they are different, then an upgrade is automatically performed, and the config file is adjusted with the new version number.

In that way, upgrading is simply a matter of unzipping the new files, and the next time anyone visits the site, it is upgraded automatically.

Anyway – that’s enough for this step. You still can’t create any pages, but we’re almost ready for it.

As usual, you can get this code from the google Webworks-Webme repository. No zipped package available yet, although the next article might be the first which warrants it (we’ll create an actual page and discuss templates).

case-insensitive authentication in postfix-mysql

Just a quick note. If you find that your customers can log into their postfix account ok, but can’t read their email, check the directory their email is in:

[root@postfix /]# cd /home/vmail/domainname.ie/
[root@postfix domainname.ie]# ls -l
drwx------  5 postfix postfix 4096 Sep  4 11:52 joe.smith
drwx------  9 postfix postfix 4096 Sep  9 09:30 jane.smyth
drwx------  9 postfix postfix 4096 Sep  9 21:09 Jane.Smyth

The problem appears to be that postfix authenticates using MySQL, which is case-insensitive by default, then creates a default email skeleton directory named after the login username if it doesn’t find one already.

You can see in the above example that our user has logged in using uppercase letters in the username, MySQL verified the user had a right to be there, then postfix created the user’s directory using the login credentials, even though a lower-case version of the directory name already existed…

Obviously, this incorrect account will not receive email – email will be sent to the correct one.

There are a few solutions:

  • you can remove the Jane.Smyth directory and tell the user to change their authentication to use a lowercase username.
  • you can remove the Jane.Smyth directory then create a symbolic link from jane.smyth to Jane.Smyth.
  • change your MySQL installation to use a case-sensitive collation.

Personally, i go for the first solution.