3am, daughter wide awake, not allowed to sleep – what’s a guy to do? Let’s do an experiment.

Let’s say we want to efficiently select all points in an area from a database. This has real-world applications – I’ll be using it in a geographical project very soon.

First, create a simple table in MySQL. I’ve created mine in a database called ‘geodb’.

CREATE TABLE points(x INT, y INT, INDEX(x), INDEX(y));

Note that x and y are both indexed.

Then, seed that table using some PHP (I had to up the max_execution_time on my laptop to 300 for this).

<?php mysql_connect('localhost','username','password'); mysql_select_db('geodb'); for($i=0;$i<10000000;++$i){ $x=rand(-1000000,1000000); $y=rand(-1000000,1000000); mysql_query("INSERT INTO points (x,y) VALUES ($x,$y)"); }

Ok. For the rest of the experiment, we’ll be trying various ways to extract the number of points within a radius of 100000 from (0,0). The goal is to have the lowest working time. Each method should return the exact same result.

First, from the console, do a straight `select`

statement.

time echo "SELECT COUNT(x) FROM points WHERE SQRT(x*x+y*y)<100000" | mysql -uusername -ppassword geodb

Returned result 7993 in .414 seconds. Wow – pretty quick already… but, my daughter is still awake, so let’s continue.

We can improve this by avoiding the math on points that we are certain can not be in the area. For example, in a radius of 100000 from (0,0), and points with x<-100000, x>100000, y<-100000, y>100000 can definitely not be in the circle.

time echo "SELECT COUNT(x) FROM (SELECT x,y FROM points WHERE x>-100000 AND x<100000 AND y>-100000 AND y<100000) AS sub1 WHERE SQRT(x*x+y*y)<100000" | mysql -uusername -ppassword geodb

.326 seconds. Better. However, there are two calculations being performed on each value. Let’s reduce that.

time echo "SELECT COUNT(x) FROM (SELECT x,y FROM points WHERE ABS(x)<=100000 AND ABS(y)<=100000) AS sub1 WHERE SQRT(x*x+y*y)<100000" | mysql -uusername -ppassword geodb

.297 – more than 25% fster than the original.

It should be possible to reduce that further. For example, we can be certain that x,y values which are both less than (100000*Cos(Pi/4)) are contained inside the circle, so that’s another < comparison, reducing the number of maths operations. I’d test that one as well, but my daughter is finally asleep in my left arm as type.

I’d be interested in knowing how (on your system) would this work:

time echo “SELECT COUNT(x) FROM (SELECT x,y FROM points WHERE ABS(x)<=100000 AND ABS(y)<=100000) AS sub1 WHERE (x*x+y*y)<(100000*100000)” | mysql -uusername -ppassword geodb

.226 – nice spot! I should have noticed that optimisation as well. That’s what I get for testing at 3am. Almost down to 50% of the time for the unoptimised version.

I have a little work to do now, but I’ll try the center square idea (the last one I mentioned) when I’m finished, to see if it offers any further optimisation. My guess is that it will only shave a few ms off it.

In case it isn’t clear to others, calculating a square root is a rather “expensive” operation for the database engine – not to mention that in this case it is entirely unnecessary. If we wanted to know “how far” each point is from our reference point, the calculation would be necessary. However, since we only want to find out if it is “nearer than” the given radius, we can actually factor out the calculation of distance, and based on Pythagoras’s theorem decide that if “x to the second power”+”y to the second power” is less than “radius to the second power”, the point is in fact within the circle we’re interested in.

Damn I wish I’d paid attention in my maths classes while in school. These tidbits come in handy from time to time, and I really hope I’d have managed to get more of them in my head when they were given for free ğŸ™‚

s prazdnikov vasl

I use this solution:

CREATE TABLE points (y INT , x INT, y INT)..

id = y * (yMax) + x;

SELECT FROM points WHERE id IN (xxx);

wery fast …