a working computer

My home computer had been freezing randomly pretty regularly, so my boss divvied out the cash to get new parts to replace the broken bits (it helps that I do a lot of work at home (some of which is unsolicited)).

As I was upgrading the motherboard, and the new system wouldn’t work with my CPU and RAM, I got a handy speed boost out of it as well – out with the 512MB & AMD K6 1.7GHz, and in with 1GB & Intel Celeron D 2.66GHz.

Must say – my initial thoughts are: “holy shit, this is comfortable!”.

I have my x server running with lots of translucency, which would have caused my old system to slow to a crawl, but this one just shrugs and carries on! Bloody fantastic!

At last, I can see what the xcomposite hackers were raving about. Can’t wait for all the other gl-style goodies to get integrated! If only I was a C programmer, I’d jump in and try help out.

Now, I’m gonna see if I can get Thief 3 running through wine, before getting a bit of work done.

How the jsquash compressor works

Last week, I released a compression application for JavaScript files. The algorithm, while still a little buggy, turns out to be very strong (in comparison to other JavaScript compressors), so Christof Donat and myself decided we’d try translating it into C++, so it could be used for, say, adding to a context menu so you could right-click on a JavaScript file in a file manager, and tell it to zip itself.

The algorithm I used is fairly simple – you find groups of characters that are the same, and replace them with “code” characters which stand for those characters.

For example, say you want to compress the following line (only using lowercase letters and a few punctuation marks, for simplicity):

alert(‘peter piper picked a peck of pickled pepper.\na peck of pickled pepper peter piper picked.\nif peter piper picked a peck of pickled pepper,\nwhere’s the peck of pickled pepper peter piper picked?’);

Using the application, we come up with this:

var a='/-+*&%$'.split('')
var b='$ p$eter piper picked$eck of pickled pepper$\\\\$\\'$'.split(a[a.length-1]),c='alert(%p+ a-*.&na-*-+.&nif-+ a-*,&nwhere%s the-*-+?%);'
for(e in a)c=c.split(a[e]).join(b[e])
eval(c);

The first thing the application does is to get rid of all the unnecessary spaces, comments and semi-colons in the original. I won’t bother explaining that, as it’s simple to understand from the source, and can’t really be improved much (well, the implementation probably can, but I think the results are pretty much ideal). The goal of this article is to explain the compression algorithm, so that readers can potentially help improve the compression.

The first thing the algorithm did was to find all characters from ASCII character 32 (space) upwards to 255, that did not appear in the original. In this case, this included ‘/-+*&%$’ and a whole lot more.

From those characters, a delimiter is chosen, a replacement for backslashes (if they’re used), and a replacement for the apostrophe, which will be used to contain strings in the encoder ($, & and %). Then the compression starts.

This method follows the algorithm used by Chris’s application here. His method basically starts finding strings of length x (user-defined – 15 is recommended), and figuring out how much space would be saved if the copies found were replaced by single characters. Then, the algorithm starts again with x-1, and so on.

There is a problem with that approach, which becomes clear when you try it from the other way around (start with 2 and move up from there).

So, we start at 2. First, we check if the string composed of the first and second characters of the original (‘al’, in our example) appear multiple times. If not, we move on and test the 2nd and 3rd letters (‘le’).

‘le’ does appear a few times, so we can run a quick calculation to see how many characters would be saved if we replaced ‘el’ with ‘*’ (the next unused character in our list). This is 2 (the length of ‘le’) multiplied by 5 (the amount of occurances of the string) minus 5 (the number of replacement characters) minus 2 (two delimiters) – 3 (dunno how to describe this one – the replacement character plus the replaced string, in the end-results encoded strings).

The same test is then run with the 3rd and 4th characters, etc. Whenever a string replacement is found that comes up with a higher score than the current highest, it becomes new highest, and is recorded as the “best” replacement, for when we do the actual replacement.

Note that there is already some inefficiency here – there is no point doing the calculation test when you are testing the third occurance of ‘li’, for example – it’s already been calculated. So, we only do the calculation test if the occurance you’re testing is the second one. This ensures that A) the occurance is not unique (which would be a waste of time calculating), and B) that you only ever test a particular string once. Another way to do this would be to keep a hash array of already-tested strings, but it’s debateable whether that would be quicker (and I can’t be bothered testing it…).

Another point of inefficiency, is that if the original is 20 characters long, and your current highest is a 2 character string that occurs 3 times, then there is no point testing strings made from the 15th and 16th characters and beyond, as there’s no way this could improve the score – so, we record a “stop searching at” variable, which tells where to stop looking from. This point is the length of the string minus the length of the best match multipled by the amount of occurances of the best match.

After reaching the end of the test for strings of length 2, we start again with 3.

Another efficiency note, which explains why it’s better to start from 2 and work upwards, than to start from 15 and work down – since we already know that the string ‘al’ only occurs once in the original, there is no point even looking at the string ‘ale’. To work with this, we don’t just start with the first character, test that, move to the second, test that, etc – instead, we start off at 2 characters with an array of starting points, which coincidentally, happens to be every single character in the original. A second array is created for the next “level” (when we start testing length 3), and each time a multiple-occuring string is found, its position is added to the second array. At the end of the length 2 round, we have an array of starting points for length 3.

This dramatically speeds up the search, but also tells when the algorithm can’t proceed any further (when the new array is empty, there is no point continuing).

When we come across the point where the new array is empty, we then do the actual replacement, with the current “best” match. In our case, this is “eck of pickled pepper”. I’m not sure why it’s not “peck of pickled pepper”, but that’s probably why I need help with the algorithm (I’ve developed it beyond my own ability to understand 😉 ).

After that, we simply repeat, until the “best” match results in a negative score, at which point any further replacements will make the result grow instead of shrink.

Then we package it up prettily into that piece of JavaScript and that’s it!

If there are any questions or suggestions, please comment.

opera 9 beta

Opera have announced the first beta of Opera 9.

A quick look through the changelog shows these tasty titbits

  • Allowed self closing SCRIPT tags in HTML documents. This is one thing that’s annoyed me since the release of XHTML1 – that <script type="text/javascript" src="blah.js" /> was totally legal, and made perfect sense, yet would screw up in some browsers.
  • Changed default UserAgent string to identify as Opera. Thank you! This will help reduce the size of my browser sniffer object – at the moment, it recognises IE as anything that claims to be IE, and does not also claim to be Opera, the latter being an idea of very questionable genius on Opera’s part.
  • Opera now passes the Acid 2 test. Three major browsers down, Firefox and IE to go.
  • Added support for CSS 3 opacity. That’s all the major browsers that can do this now, although IE still needs a filter hack to support it.
  • Added support for CSS 3 attribute and UI selectors. Not sure what the UI selectors are, but the attribute thing would allow you to, say, bolden links to a certain page with a[href="http://blah.blah/test.php"]{font-weight:bold}
  • Allowed positioned elements to appear in front of iframes and objects. Although I don’t use iframes and objects much, anything that disobeys the z-index directives are looking to be smacked. IE7 fixed their own version of this bug, which involved the <selct> object.
  • Implemented designMode for rich text editing. Hallelujah, there is a god! Tonight, I will be hacking at FCKEditor to get it working in Opera 9. Now the only browser lacking in this is Konqueror.
  • Implemented support for canvas, as described in the Web Applications 1.0 draft, as well as the opera-2dgame context. I have not used this much, but this should promise for some very interesting Flash-like effects using plain old JavaScript. There are analogues of this in Firefox, Konqueror and IE. I don’t know if Safari has a Canvas element yet, but I’m sure it won’t be long!
  • Added support for onreadystatechange events, and the readyState property. very important for Ajax

There are some not quite so great things in there as well, though.

  • Body element now uses margin instead of padding by default. – wtf? surely anything which is outside the <body> should not be seen at all?
  • Removed support for “javascript:” URLs in CSS. – aww! I never even got to play with that!

In summary, this looks like a very major upgrade to the browser.

jsquash, a javascipt compressor

Last week, I wrote the first part of a JavaScript compressor – it basically removed any unneeded spaces and semi-colons.

This week, I decided to write the compression part. I believe I’ve done pretty well, even if it’s quite slow to run.

try it out! (tested in Firefox. if this breaks your browser, …sorry!)

The two engines I was influenced by are the Dean Edwards’ “Packer”, and dithered.com Chris’s compressor (I don’t know his surname). Dean’s engine is incredibly fast, but I felt that the method used by Chris’s was a bit easier to understand and extend.

There were some design considerations, other than “compress the javascript”. My main goals were:

  • don’t hog the CPU. I carefully broke loops apart into setTimeout blocks, to simulate threading. this makes the script very slow to finish (hours in the case of large scripts!), but it means that the browser is usable while it’s working away.
  • don’t restrict the users’ code. The script figures out for itself what characters it can use in its substitutions, so you don’t need to avoid using any characters.

Surprisingly, the script appears to create stronger compression than both Dean’s and Chris’s scripts. I am surprised by this because Dean’s algorithms are usually beyond my touch.

A handy thing about this script is that you do not have to wait for it to finish – at any time, you can click the “save” button to export a copy of the current compression – the script will continue in the background.

The dithered algorithm runs in the opposite direction to mine. In that script, you tell it the maximum size string to search for, and it searches for replacements with that length and then gradually lower. Mine starts with a default of 2, and searches upwards, depending on whether there is a reasonable chance that there are strings of a larger length to be found.

Now, to work on speeding it up!

laptop unusable

*sigh* The backlight in my Dell C610 died today. I could sense its imminent demise for the last week, but it finally shuffled off its electric coil today.

From what I’ve read, it’s difficult-to-impossible to replace a broken backlight, so it may be time for a new lappie.

Goodnight, sweet lappie; may flights of angels wing thee to thy rest.

using FCKeditor’s file manager for your own CMS

I’ve just finished round one of the debugging of our new product, WebME 1.0 (which I’ll be announcing soon).

Part of the debugging process was to try reduce the redundancy in the various parts of the backend.

One annoying aspect was the amount of image and file upload mechanisms I have in the admin area, for various parts. So, as I was rewriting one part of it today anyway, I thought I’d try attach FCKeditor‘s file manager to my own forms.

This ended up being vastly easier than I thought it would be!

the demo

For example, let’s say your fckeditor is located in /demos/FCKeditor/, the following will add the location of a file into an input box:

var input=document.createElement('input');
input.id="image";
input.style.width='400px';
var f=function(){
	window.SetUrl=function(value){
		document.getElementById('image').value=value;
	}
	var filemanager='/demos/FCKeditor/editor/filemanager/browser/default/browser.html';
	var connector='connectors/php/connector.php';
	window.open(filemanager+'?Connector='+connector+'&Type=Image','fileupload','modal,width=600,height=400');
}
input.onclick=f;
document.body.appendChild(input);

You can then use that value in your form.

This takes all the hassle of file uploads away, as you allow the FCKeditor’s file manager to handle all that ickiness.

javascript compressor

Update: Further work has since been done on this

I was a bit bored when I got home, so decided to start on a javascript compressor. There are many of them out there already (I particularly like these two), but there is nothing quite as satisfying as your own work.

The work so far is here.

The project so far has been fascinating. I realised immediately upon starting that I couldn’t simply apply rules like “swap all multiple spaces with single spaces” and such, as that would break any strings or regular expressions in the code.

So, I’ve basically had to write a JavaScript parser – the function reads the code in, one character at a time, and decides whether to keep it or drop it.

An advantage to this is that I can segment the code into different “types” of code, allowing me to parse strings, regexps and “command”-style code differently.

I’m sure this is probably old hat to those college types who had to write compilers in class, but it’s pretty fascinating to me.

Tomorrow, I hope to detect code such as the following:

var a=function(){alert('test');}
alert(a);

And convert it to this:

var a=function(){alert('test')};alert(a);

In the above, note the addition of a ‘;’ after the function declaration – as far as I know, no other javascript compression code actually corrects badly written code like that. I’d like mine to be the first.

I have other plans for this, but will wait to see if I am still interested in the morning, before working on them ;-).

contact juggling

It’s been a few years since I gave my site ContactJuggling.Org away, because of emerging mental difficulties (I eventually was diagnosed with depression, and am currently on drugs to prevent relapse), but I sometimes look in, to see how things are going.

Please view this video. It’s an example of what I always thought was so beautiful about the art. While the moves are not especially difficult, the artist manages to make a stunning performance. Ignore the shaky camera work – I’m sure the photographer has been shot or has apologised 😉

I must get back into this – it was really a fantastic period in my life, to see the reactions on people’s faces when they saw this for the first time.