innerHTML in php-dom

DOM does not officially have an innerHTML property, but it’s incredibly useful. I needed something similar while working on some DOM code, so I had to write my own version.

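// Returns the chunk of $long_string that is missing from $short_string,
// assuming $short_string is $long_string with one contiguous chunk removed.
function string_getInsertedString($long_string,$short_string,$is_html=false){
  if(strlen($short_string)>=strlen($long_string))return false;
  $insertion_length=strlen($long_string)-strlen($short_string);
  // find the first position where the two strings differ
  for($i=0;$i<strlen($short_string);++$i){
    if($long_string[$i]!=$short_string[$i])break;
  }
  $inserted_string=substr($long_string,$i,$insertion_length);
  // if the removed element was followed by another tag, the common prefix
  // swallows that tag's '<', so the result is shifted by one character
  if($is_html && $inserted_string[$insertion_length-1]=='<'){
    $inserted_string='<'.substr($inserted_string,0,$insertion_length-1);
  }
  return $inserted_string;
}
// Returns the outer HTML of $element by comparing the document's HTML
// with and without the element, then puts the element back where it was.
function DOMElement_getOuterHTML($document,$element){
  $parent=$element->parentNode;
  $next=$element->nextSibling;
  $html=$document->saveHTML();
  $parent->removeChild($element);
  $html2=$document->saveHTML();
  $parent->insertBefore($element,$next); // restore the element (appends if $next is null)
  return string_getInsertedString($html,$html2,true);
}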

Okay, this is outerHTML, not inner, but if you want the innerHTML, just strip off the element’s opening and closing tags with something like this:

// the 's' modifier lets (.*) match content that spans several lines
$innerHTML=preg_replace('/^<[^>]*>(.*)<\/[^>]*>$/s','$1',DOMElement_getOuterHTML($document,$element));

There is possibly a better way of doing this, but the above worked for me in the “throwaway” code I was writing.
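One possibly better way, if you are on PHP 5.3.6 or later: DOMDocument::saveHTML() accepts a node argument and serialises only that node, which avoids the remove-and-compare trick entirely. A minimal sketch (written for PHP 7+ because of the type hints); the names dom_outerHTML and dom_innerHTML are my own, not part of the DOM extension:

```php
<?php
// Sketch: outerHTML and innerHTML using DOMDocument::saveHTML($node),
// which serialises just the given node (PHP 5.3.6 or later).

function dom_outerHTML(DOMNode $node): string {
    // the node's own tags plus everything inside it
    return $node->ownerDocument->saveHTML($node);
}

function dom_innerHTML(DOMNode $node): string {
    $html = '';
    // serialise each child in turn, leaving out the node's own tags
    foreach ($node->childNodes as $child) {
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

// usage
$doc = new DOMDocument();
$doc->loadHTML('<div id="box"><p>Hello <b>world</b></p></div>');
$box = $doc->getElementsByTagName('div')->item(0);
echo dom_innerHTML($box), "\n"; // the <p> element inside the div
```

Unlike DOMElement_getOuterHTML(), this never modifies the document, so it is safe to call in a loop over elements.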

pre-parsing HTML for incorrectly-sized images

Every now and then, I get a call from a client who is puzzled about why their site is running slowly. I look at their page and see an innocuous image inserted into a paragraph. When I examine the image, though, I see that the client has artificially resized it using HTML.

One recent example was displayed on-screen as a 300px-wide image. When I examined it, it was actually about 3000px wide. As I explained to the client, ten times the width and ten times the height means a hundred times the pixels, so the browser had to use about 100 times more RAM to hold the image (not counting the overhead of scaling it down to 300px wide), and the download was slower as well.
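A rough worked version of that arithmetic, assuming the browser keeps each decoded pixel in 4 bytes (RGBA; real browsers may differ) and using hypothetical heights, since only the widths are known:

```php
<?php
// Rough estimate of the memory needed to hold a decoded bitmap.
// Assumption: 4 bytes per pixel (RGBA).
function decodedBytes(int $width, int $height, int $bytesPerPixel = 4): int {
    return $width * $height * $bytesPerPixel;
}

// Hypothetical dimensions: a 3000x2000 original displayed at 300x200.
$original  = decodedBytes(3000, 2000); // 24,000,000 bytes
$displayed = decodedBytes(300, 200);   //    240,000 bytes
printf("%d times more memory\n", $original / $displayed); // 100
```

Whatever the real height and bytes-per-pixel, the ratio only depends on the scale factor squared, so a 10x oversize always costs about 100x.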

One solution to all this is to teach all clients how to resize images before they upload them. I did that in this case. But it’s not the easiest solution, and people forget how to do things.

Another solution was proposed by Ken: parse any submitted HTML for images and check that the size each image claims to be is actually its real size. He said he’d had the idea ages ago but never implemented it. I think its time has come, so let’s do it.

There are four ways that images can get resized: through HTML attributes, inline CSS, selector-based CSS (stylesheets) and JavaScript. We will address the first two, as the others would be too complex to solve in a small application.

Here is how it will work: if a resized image is detected, its ‘src’ attribute in the HTML is changed to point to a pre-created, correctly sized version of the image. The entire script runs when the HTML is submitted to a CMS, before the HTML is saved to the database or published to a file.

First, we need to detect image sources and their assigned sizes.

Here is some sample HTML with images from this site.

<p><img src="http://verens.com/wp-content/themes/mandigo-14/images/green/head.jpg" width="76" height="24" /></p>
<p><img src="/wp-content/themes/mandigo-14/images/green/head.jpg" style="width:76px;height:24px" /></p>

What we want is a function which, when fed that HTML, returns the same HTML except that any image whose declared width and height differ from its real size has its src changed to point to a pre-resized version, created using ImageMagick.

Here it is:

define('WORKDIR_IMAGERESIZES',$_SERVER['DOCUMENT_ROOT'].'/demos/html_imageresizer/f/');
define('WORKURL_IMAGERESIZES','/demos/html_imageresizer/f/');
function html_fixImageResizes($src){
	// checks for image resizes done with HTML attributes or inline CSS
	//   and redirects those images to pre-resized versions held elsewhere

	preg_match_all('/<img\s[^>]*>/i',$src,$matches);
	// $matches always holds one array per pattern, so test the list of matches itself
	if(!count($matches[0]))return $src;
	foreach($matches[0] as $match){
		$width=0;
		$height=0;
		// size set with HTML attributes, e.g. width="76" height="24"
		if(preg_match('/width="[0-9]+"/i',$match) && preg_match('/height="[0-9]+"/i',$match)){
			$width=preg_replace('/.*width="([0-9]+)".*/is','\1',$match);
			$height=preg_replace('/.*height="([0-9]+)".*/is','\1',$match);
		}
		// size set with inline CSS, e.g. style="width:76px;height:24px"
		// (the lookbehind stops max-width or line-height being read as the size)
		else if(preg_match('/style="[^"]*(?<![\w-])width: *[0-9]+px/i',$match) && preg_match('/style="[^"]*(?<![\w-])height: *[0-9]+px/i',$match)){
			$width=preg_replace('/.*style="[^"]*(?<![\w-])width: *([0-9]+)px.*/is','\1',$match);
			$height=preg_replace('/.*style="[^"]*(?<![\w-])height: *([0-9]+)px.*/is','\1',$match);
		}
		if(!$width || !$height)continue;
		if(!preg_match('/\ssrc="([^"]*)"/i',$match,$srcMatch))continue;
		$imgsrc=$srcMatch[1];

		// get absolute address of img (naive, but will work for most cases)
		if(!preg_match('/^http/i',$imgsrc))$imgsrc=preg_replace('#^/*#','http://'.$_SERVER['HTTP_HOST'].'/',$imgsrc);

		// real size of the image; skip it if unreadable or already correctly sized
		$size=@getimagesize($imgsrc);
		if(!$size)continue;
		list($x,$y)=$size;
		if($x==$width && $y==$height)continue;

		// create address of resized image and update HTML
		$dir=md5($imgsrc);
		$newURL=WORKURL_IMAGERESIZES.$dir.'/'.$width.'x'.$height.'.png';
		$newImgHTML=preg_replace('/(\ssrc=")[^"]*(")/i','${1}'.$newURL.'${2}',$match);
		$src=str_replace($match,$newImgHTML,$src);

		// create cached image (once per source image and size)
		$imgdir=WORKDIR_IMAGERESIZES.$dir;
		if(!is_dir($imgdir))mkdir($imgdir,0755,true);
		$imgfile=$imgdir.'/'.$width.'x'.$height.'.png';
		if(file_exists($imgfile))continue;
		// escapeshellarg() stops a crafted src from injecting shell commands;
		// the trailing '!' tells ImageMagick to resize to exactly WxH
		$cmd='convert '.escapeshellarg($imgsrc).' -resize '.escapeshellarg($width.'x'.$height.'!').' '.escapeshellarg($imgfile);
		exec($cmd);
	}

	return $src;
}

Calling that function on the sample HTML above returns this:

<p><img src="/demos/html_imageresizer/f/6bf7dd2b8232448e85d7fa9cd1009b44/76x24.png" width="76" height="24" /></p>

<p><img src="/demos/html_imageresizer/f/6bf7dd2b8232448e85d7fa9cd1009b44/76x24.png" style="width:76px;height:24px" /></p>

Here is an example of it running, and here is the source of that demo.
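As an aside, the detection step could also be done with PHP’s DOMDocument instead of regular expressions, which copes better with attribute order, single-quoted attributes and multi-line tags. A sketch under that assumption (html_getDeclaredImageSizes is a hypothetical name, not part of the demo above); it only reports each image’s declared size, leaving the getimagesize() check and the ImageMagick step as in html_fixImageResizes():

```php
<?php
// Sketch: collect each <img>'s src and declared size using DOMDocument.
// Like html_fixImageResizes(), it only looks at HTML attributes and
// inline style; selector-based CSS and JavaScript are ignored.

function html_getDeclaredImageSizes(string $html): array {
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // user-submitted HTML is often sloppy; collect parser warnings quietly
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // plain pixel counts only; values such as "50%" are treated as unset
    $px = function (string $value): int {
        return preg_match('/^\d+$/', $value) ? (int) $value : 0;
    };

    $sizes = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $width  = $px($img->getAttribute('width'));
        $height = $px($img->getAttribute('height'));
        $style  = $img->getAttribute('style');
        // inline CSS fallback; the lookbehind skips max-width, line-height etc.
        if (!$width && preg_match('/(?<![\w-])width\s*:\s*(\d+)px/i', $style, $m)) {
            $width = (int) $m[1];
        }
        if (!$height && preg_match('/(?<![\w-])height\s*:\s*(\d+)px/i', $style, $m)) {
            $height = (int) $m[1];
        }
        if ($width && $height) {
            $sizes[] = [
                'src'    => $img->getAttribute('src'),
                'width'  => $width,
                'height' => $height,
            ];
        }
    }
    return $sizes;
}

// usage, with the second sample image from above
print_r(html_getDeclaredImageSizes(
    '<p><img src="/wp-content/themes/mandigo-14/images/green/head.jpg" style="width:76px;height:24px" /></p>'
));
```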

mounting an LVM volume partition

Let’s say you have a hard drive to mount, and you try it:

mount /dev/sdc2 /mnt/hdc2

…and it reports an error.

mount: unknown filesystem type 'lvm2pv'

It took me a while to work this out, but basically, /dev/sdc2 is not a filesystem at all. It’s an LVM physical volume: a container that belongs to a volume group (VolGroup01 below), and the filesystem you actually want lives on a logical volume /inside/ that group.

So, the way you do this is:

[root@localhost ~]# pvscan
  PV /dev/sdc2   VG VolGroup01   lvm2 [74.41 GB / 32.00 MB free]
  Total: 1 [74.41 GB] / in use: 1 [74.41 GB] / in no VG: 0 [0   ]
[root@localhost ~]# lvchange --available y /dev/VolGroup01/LogVol00
[root@localhost ~]#

Then mount the logical volume by its device name (I found LogVol00 by looking in /dev/VolGroup01/; lvscan will also list the logical volumes).

[root@localhost ~]# mount /dev/VolGroup01/LogVol00 /mnt/hdc2
[root@localhost ~]#

et violin, you’re set.

review: Pro PHP – Patterns, Frameworks, Testing and More


Author: Kevin MacArthur, Publisher: Apress

Overview: this book is absolutely jam-packed with information useful to the medium-advanced PHP coder. SPL is described over a few chapters, and a quick intro to Zend’s MVC framework is provided. Of particular interest to me were the final chapters, to do with certificate-based authentication, and a chapter near the beginning describing the upcoming features of PHP6. Great book – I really enjoyed it.

Technically, this book is hard to fault. Kevin is very knowledgeable about his subject and puts that knowledge across easily. It was a real pleasure to read. There were a lot of things in the book that I had only the vaguest idea about beforehand, like Phing and Xinc, and I will definitely be sitting down to read more about those tools when I get the time.

The book covers SPL, MVC, PHP6, and discusses issues such as continuous integration, web 2.0, source repositories, and digital certificate authorisation.

Kevin states at the beginning that this book was written for advanced PHP developers. I would posit that it is better suited to intermediate developers who are looking to develop their project management skills: a lot of pages are devoted to tools and methods that are very useful for managing medium to large projects (continuous integration, MVC).

It is very hard to find fault with this book, but I’ll do my best!

While the title of the book mentions “frameworks”, only the Zend Framework is actually looked at. Not a single other framework was named, although it was mentioned that they exist. I think this is just not on – at the least, Kevin should have provided a few reasons why he chose to describe Zend over everything else. I was looking forward to reading more about such things as Cake, Symfony, et al.

The testing and continuous integration sections were not long enough: the author practically raced through continuous integration without spending much time on it. I was hoping for some discussion of issues such as keeping databases up to date throughout development. In a book with this much information, it’s hard to cover everything in depth, but I think more time should have been spent on this crucial problem in development.

The “web 2.0” section covered the JavaScript XMLHttpRequest object, but gave little attention to higher-level tools such as JavaScript frameworks and how to integrate them with PHP. I like the Sajax and Xajax libraries myself; other people might go for the more mainstream (and complete) frameworks such as Dojo, MooTools and jQuery. Either way, I think this section should have covered integration with PHP in more depth or been left out altogether.

SOAP was covered in the Web Services section, but XML-RPC, REST, etc. got little mention. The book also doesn’t point out that JSON (my favourite object representation, described elsewhere in the book) can be used as a transport format for web services as well. This is the same problem as the Zend section: Kevin chose a single technology to describe, without giving a good reason for that choice or even saying what the alternatives are.

Minor details aside, I have to admit that that was a damned fine read. I would buy the book, and if you’re a serious PHP developer, so should you.