idea for music recognition, conversion and composition using artificial neural networks

I had this idea while walking the kids to school. Starting from a simple network that can classify music styles as rock/metal/classical/folk/etc, I think that it would be possible to adapt the same algorithm to convert a music file from one style to another, and even write music from scratch in whatever style you want. And if I’m right, I think it would be very simple to write.

Recognition

This is the simplest task. To recognise the style of a music file, all you need is a feed-forward network with a few thousand inputs, at least one hidden layer, and one output for each style you want to recognise.

A standard data rate for recorded music is 160kbps. That means that every second, there are 10,240 separate wave heights (160*1024/16) that need to be examined. Of course, you can recognise music using lower bps values, but let’s use the same setting for the whole process (160 will be wanted for later parts).

So, the input layer would need 10,240*n inputs, where n is the number of seconds you want the network to sample in order to determine the style. In some cases (metal/classical), you may get away with sampling just a single second, but for better results, you might want a larger value. I’ll be setting n to 300, so it samples the entire song in most cases. This makes it easier to be accurate about the result, but will also be useful in a later stage.
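A quick check of the arithmetic above, taking the 160kbps figure at face value. The constant names are mine, and the 16-bits-per-sample assumption is the one implied by the "/16" in the calculation. (A compressed bitrate doesn't really map one-to-one onto wave heights; uncompressed CD audio has 44,100 samples per second per channel. So treat this as a rough sizing exercise only.)

```python
BITRATE_KBPS = 160     # data rate used in the post
BITS_PER_SAMPLE = 16   # assumption: each "wave height" is one 16-bit value
SECONDS = 300          # n, the number of seconds the network samples

# Wave heights per second, exactly as calculated in the text.
samples_per_second = BITRATE_KBPS * 1024 // BITS_PER_SAMPLE

# Size of the input layer for an n-second network.
input_count = samples_per_second * SECONDS

print(samples_per_second)  # 10240
print(input_count)         # 3072000
```

So with n at 300, the input layer ends up at roughly three million nodes.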

The output layer needs to have one node per tag you want to measure. For example, you might have an output that measures how “rock” a song is, and another that measures how “baroque” it is. You could use output nodes that return a simple Yes/No result, but there is a good reason to return a more linear certainty instead (which we’ll get to).

The hidden layer needs at least one neuron, obviously, but I don’t think there is any way to say exactly how many it needs, so it would be better to use a network model which grows automatically as it learns (I don’t know the technical term – I just build the things!).

After building the network, you need to train it. This is the easiest part – you just need a large database of music, and tags for every one of those tunes.

One handy idea: if you’re training a 5 second network (for example), then a 3 minute song has at least 36 completely separate training sets for you to sample. All you need to do is start linking to the inputs at a different offset each time (second 0, 0.5, 1, 2, etc), and the network will see what it thinks (initially) is a completely different data set.
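The windowing idea can be sketched like this. It's a minimal illustration, not a finished loader: `training_windows` is a name I've made up, and the "song" is just a list of numbers at a toy rate of 10 samples per second to keep the example small.

```python
def training_windows(samples, window_size, step):
    """Yield every full window of `window_size` samples,
    moving the start forward by `step` samples each time."""
    for start in range(0, len(samples) - window_size + 1, step):
        yield samples[start:start + window_size]


# Toy data: a "3 minute song" at 10 samples per second.
RATE = 10
song = list(range(180 * RATE))
five_seconds = 5 * RATE

# Windows that don't overlap at all: one every 5 seconds.
separate = list(training_windows(song, five_seconds, five_seconds))

# Windows that start at every whole second, so they overlap.
overlapping = list(training_windows(song, five_seconds, 1 * RATE))

print(len(separate))     # 36
print(len(overlapping))  # 176
```

Starting at every second instead of every fifth second gives nearly five times as many training examples from the same song.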

After training this for a while, you should be able to run a few seconds of a song through the network and have fairly accurate results of how “funk” or “jazz” a song is.

Conversion

After figuring out the above, I started thinking of alternative uses for the idea, and one surprising idea took hold.

Let’s say that you have a “folk” song played on guitar and violin. How would you go about making it “metal”? You could start by fuzzing the violin and distorting the guitar, and maybe adding some drums in.

I think it would be possible to write a program which lets you convert a song from one style to another literally at the click of a button.

Remember I mentioned that the output neurons should say how metal/classical/etc a song is, not just that it is or is not.

If the network is written with enough precision, then adjusting one or more of the input values should give a different value in the outputs.

As an example, let’s say you have a folk tune that you want to convert to neo-punk. Adjusting the inputs such that the sounds are more distorted (clipping high values, for example), or faster (shifting later inputs to the left, maybe) might change the tune’s “neo-punk” output from 0.00024 to 0.00025.

If you repeat this over and over (automatically, obviously), discarding changes that reduce the output and repeating changes that increase the output, until the “neo-punk” output reaches an acceptable threshold such as .9, then you have just created an automatic way to convert a tune from one style to another.
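The repeat-and-discard loop is a simple hill climb. Here is a minimal sketch of it, with loud assumptions: `convert`, `toy_score` and `toy_mutate` are names I've invented, and the "network" is replaced by a toy scoring function (average loudness) purely so that the loop has something to climb. A real version would call the trained network's "neo-punk" output instead.

```python
import random


def convert(samples, score, mutate, threshold=0.9, max_steps=100_000, rng=None):
    """Keep making small random changes, keeping only the ones
    that raise the score, until the score reaches the threshold."""
    rng = rng or random.Random(0)
    best = list(samples)
    best_score = score(best)
    for _ in range(max_steps):
        if best_score >= threshold:
            break
        candidate = mutate(best, rng)
        candidate_score = score(candidate)
        if candidate_score > best_score:   # discard anything that doesn't help
            best, best_score = candidate, candidate_score
    return best, best_score


def toy_score(samples):
    """Stand-in for the network output: mean absolute sample value (0 to 1)."""
    return sum(abs(s) for s in samples) / len(samples)


def toy_mutate(samples, rng):
    """Nudge one randomly chosen sample, clipping it to the range -1..1."""
    changed = list(samples)
    i = rng.randrange(len(changed))
    changed[i] = max(-1.0, min(1.0, changed[i] + rng.uniform(-0.2, 0.2)))
    return changed


start = [0.1] * 64
result, final_score = convert(start, toy_score, toy_mutate)
print(toy_score(start), "->", final_score)
```

With a real network in place of `toy_score`, each step would cost a full forward pass, so the number of steps needed matters a lot more than it does here.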

I think this has a lot of applications. For example, let’s say you want to convert a piano tune to guitar? You train your network to recognise what piano and guitar tunes sound like, and then simply convert as above!

Composition

This may be the simplest of the lot.

After creating the above programs, try inputting a sound sample of pure static into the conversion program, and tell it to convert the static to piano. I think it would come up with some interesting tunes. Maybe not completely accurate tunes, but they would be interesting.

I think the network would automatically learn rules about harmony and rhythm, but don’t think it would learn about structure. For example, you could train a network to recognise a 3/4 rhythm, but I don’t know if you could write something that recognises a fugue.

New clavichord project

My last clavichord project failed at the last moment, but not through lack of momentum. I got right to the point where I could play a full scale on it, but had to stop there, because I had learned enough from the project to realise that it would not work properly in the end.

The project I had envisioned was a clavichord built from very easily-sourced material: plywood. And the strings were made from high-tensile wire, using only one strand for the higher notes, and two or more for lower-frequency notes.

I didn’t actually expect to get as far as I did. My main intention with this project was to figure out exactly how clavichords work. This was a practice run.

Some lessons learned from the last project:

  • The key tangent positions are crucial. If they’re off by even the slightest amount, you will miss the string or (even worse) the keys will overlap with each other.
  • When the strings are on, the tension created can warp the clavichord, making it bow in the middle, thus wrecking all your careful measurements and tunings.
  • It’s very hard to find explanations online about how /exactly/ sound-boxes work, such as how to make sure all notes sound equally loud, where to place ribs (if needed), the effects of the various measurements and materials.
  • Tuning is hard.

The new project will address each of these problems, and will also allow me to test a few things I’m unsure of and change things easily.

Firstly, the body will not be built as a solid rectangular block, as the last one was.

Instead, it will be built as a lightweight scaffold from metal rods bolted together. This allows me to easily re-arrange it if needed.

To stop the bowing, I will build a truss rod into the base, so if the strings cause the body to bend upwards, I can counter this by tightening the truss rod, pulling it back into shape.

To counter the tangent position problems, each key will be an adjustable three-part lever, which can be bent into shape, then “bolted” once it is correct.

Because the sound-box will probably be the hardest thing to get right, I have the idea of a removable box, so I can experiment with different materials and shapes. To make this possible, the bridge (which connects the strings to the sound-box) will be raise-able in its entirety, so the sound-box can be slipped out under it.

In a traditional “double-strung” clavichord, the string is looped around a tuning peg on the right side (next to the sound-box), pulled across the bridge, across the body, and looped around a pin, then back across the body, across the bridge, and looped to another tuning peg. I don’t really like the design of this, so will be changing it in mine.

In mine, each string will have a “ball end”, like a guitar string. The ball end hooks to the right end of the clavichord, and the string is then stretched over the bridge, across the body, then around a positioning pin and into a machine head. Machine heads are much easier to tune than tuning pins. This method also makes it easier to single-, double- or even triple-string different parts of the clavichord. In pianos, for example, the bass notes are single-strung using very heavy wire, and the treble notes are triple-strung using light wire.

I will also be adding a microphone and jack to mine, so the clavichord can be optionally amplified.

Some even more far-out ideas:

  • Add a touch screen and small computer (Raspberry Pi?) which can be used to display sheet music.
  • This could also be used to display a tuner, such as the awesome DaTuner Pro for Android.
  • And the most difficult: automatic tuning. A robot mechanism for turning the machine heads and picking/tapping the strings automatically to tune to Well-tempered, Pythagorean, Mean-tone, or any other tuning.

Well – that’s the plan! Now to watch some Red Dwarf and forget all about this madness…

straightening an image of horizontal lines

I’m working on a mobile app for photographing sheet music and then playing it.

When I first approached this, I considered using a Hough transform, which is a mathematical tool for finding lines in an image. It produces a matrix based on MC space (the slope, m, and y-intercept, c, of each candidate line).

I could then use the matrix to figure out what was being shown on the sheet.

That method is very computationally expensive.

After considering it, trying it, then abandoning it, a better solution came to me while I was thinking about something totally different.

Sheet music is composed of mostly horizontal lines, while everything that is not a horizontal line is part of the notation itself.

So, all I need to do is first locate the horizontal lines, and everything else will be easy to find.

The first problem, then, is how to make sure that the sheet itself is level.

How I ended up doing this was to take the average colour of each row (each ‘y’ coordinate) of the image, measure the mean difference between those row averages and the overall average, and try offsetting one side of the sheet up and down until I reached the maximum mean difference.

This is easier to understand visually.

Let’s consider this image:

As humans, we find it easy to spot the skew and fix it, but a computer is not so intuitive.

Here is the same image with the pixels along each row (each “y” coordinate) averaged out across “x” (motion-blurred, basically)

That’s a simple average of the “x” coordinates, and there already appears to be a pattern.

Next we shift/skew one side of the image up or down a few pixels and test it again. In my tests, I use a naive “brute-force” test of all offsets from -15 to +15. Here are blurs of a -11 offset and a +11 offset:


-11

+11

Obviously, the correct one is the -11 one. But how do we tell a computer what the “obvious” solution is?

Well, the right answer is probably to come up with a way to measure which one is more “noisy”, but I couldn’t think of a simple way to do that.

Instead, what I did was to measure the average colour in each image, use that average to find the mean difference in each image (how far from the average “gray” each line is), and the one we are looking for is the one with the highest mean difference.
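A minimal sketch of that measurement, under some assumptions of mine: the image is a plain list of rows of grayscale values (0 = black, 255 = white), an offset of k means the right-hand edge is shifted down by k pixels with the columns in between shifted proportionally, and the function names are made up for illustration. It is not the app's actual code.

```python
def sheared_row_means(image, offset):
    """Average each row after sliding column x down by offset * x / (width - 1) pixels."""
    height, width = len(image), len(image[0])
    sums = [0.0] * height
    counts = [0] * height
    for x in range(width):
        shift = round(offset * x / (width - 1)) if width > 1 else 0
        for y in range(height):
            src = y - shift                 # where this pixel comes from
            if 0 <= src < height:
                sums[y] += image[src][x]
                counts[y] += 1
    return [s / c for s, c in zip(sums, counts) if c]


def mean_difference(row_means):
    """How far, on average, each row's colour is from the overall average."""
    mean = sum(row_means) / len(row_means)
    return sum(abs(m - mean) for m in row_means) / len(row_means)


def best_offset(image, max_offset=15):
    """Brute-force every offset and keep the one with the highest mean difference."""
    return max(range(-max_offset, max_offset + 1),
               key=lambda k: mean_difference(sheared_row_means(image, k)))
```

When the lines are level, each one collapses into a single very dark row among very light rows, which is what pushes the mean difference up. When they're tilted, the darkness is smeared over many rows and everything drifts towards the same gray.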

Having found the right offset (-11), we then simply shift the pixels in the image by that much (in Y and X space), and end up with these images:


original image

straightened

The next task is to fix skewing, but it will use basically the same technique.

demo

tech support conversation…

Conversation I had with tech support at onlinesheetmusic.com

I would normally not print this kind of stuff, but it really irritated me that everything I wrote had to be repeated two or three times.

Please wait, an operator will be with you shortly.
Your request is important to us. Please wait, an operator will be with you shortly.
You are now chatting with Bill A (Customer Support) - Customer Support
13:18 Bill A: Hi
13:18 Kae Verens: hi - having trouble printing a sheet I just purchased. invoice number 80300
13:18 Kae Verens: I use Linux. not Windows or Max
13:18 Kae Verens: Mac, I mean
13:20 Kae Verens: the Instant Print does nothing, and the Online Sheet Music Viewer requires Windows or Mac to run (I don't use either)
13:21 Bill A: Can you please try to print it,Mac or P.C
13:22 Kae Verens: I DO NOT USE MAC OR PC. I've said that twice already.
13:22 Bill A: I have reset you print rights , whenever you want print your music, you can easily take your print.
13:25 Kae Verens: clicking Print on the big green button does not work. I get a 20...40...60...80... notification, then "ERROR Please contact support at help@onlinesheetmusic.com"
13:25 Kae Verens: As I said already, "the Instant Print does nothing"
13:26 Bill A: Please open the viewer by using your Email address (that you registered with us) and password (that you use at the time of login through our website). Then click on the synchronize button that is under the Score list looks like (Z). Then you can transpose, play and print your purchased songs.
13:26 Kae Verens: What viewer?
13:26 Bill A: Please Install the Online Sheet Music Viewer 1) Download the viewer - ( http://www.onlinesheetmusic.com/download.aspx ) 2) Run the installer
13:26 Bill A: After installing open the viewer by using your Email address and password. Then you can play, transpose and print your music sheet.
13:27 Kae Verens: I am going to say this once more: I AM NOT USING PC OR MAC. I use Linux. the link you are pointing at gives downloads for PC and for Mac. There is no download for Linux. I cannot install your software.
13:28 Kae Verens: just give me the PDF. I can print it out directly...
13:28 Bill A: Please hold for a moment.
13:31 Bill A: Unfortunately, You can only take the prints, that’s why we do not have PDF.
13:32 Kae Verens: Then I would like a refund. There was no notification before my purchase that I would need to install software to print my purchase.
13:33 Bill A: Do you want refund?
13:33 Kae Verens: I just said that.
13:34 Bill A: I have refunded your money back to your account. Please allow 1-2 business days for the refund to appear on your statement.
13:34 Kae Verens: Thank you.
13:34 Bill A: You are most welcome.
13:35 Bill A: Thanks for chatting with us. Have a nice time. Bye!

At times, I thought I was conversing with a bot.

This year’s Féile Oriel sucked

Féile Oriel this year is crap, in my opinion. We went into town today to see what was going on.

The Market House had a few violins in it. Well, my house has a few violins as well. There were two interesting violins. One had a long neck and only one string. I imagine it’s played something like the Chinese erhu. I asked what it was. The guys that were managing the exhibition didn’t know. I then spotted a violin that had a very interesting shape for its top plate – there was a deep scoop just inside the arches. I asked why that was. I was told “I don’t know – they’re just different”.

We were looking forward to the “try it out” shop that they’d had last year, where the owner of a local instrument shop would bring a load of things into a vacant shop and let visitors come in and try them out. We were then told that it wouldn’t be on this year.

So, I asked Bronwyn if the website had said anything about what’s on. She said no, that there /were/ some things mentioned, but generally things that you have to pay into.

We found some music finally outside the Westenra hotel. Boann had a great time dancing.

Then we noticed there was a session going on inside the hotel and went in, in the hope that we could sit down for a few minutes with a coke or lemonade and listen. The musicians were all in the reception area, where we couldn’t stand and listen as that’s where people come in and out. So I took the kids in to the seated area. We couldn’t hear the musicians at all from there – just some football that was on a TV. The kids wanted some food, so we got sandwiches and then went home.

I then checked the website, and found that Bronwyn was right:

– under Musical Events, it mentions /one/ thing on today and /one/ thing on tomorrow, and doesn’t give a time for either.
– There is a link for Sessions, and the link is to a broken page.
– under Other Activities, there’s a busking competition mentioned. Well, my guess is that there won’t be any winners this year, because there weren’t any buskers that I could see!

All-in-all, the day sucked.

musical intervals trainer, web version

last weekend, I wrote an intervals trainer app for practicing recognising intervals.

I want other people to use it, but haven’t got a Google development account yet so can’t upload an app.

So, today, I improved the app and made a web-accessible version.

try it out!

it’s designed to move up from very simple intervals (major/minor 2nd intervals, with only natural notes) to more difficult intervals (diminished/augmented, with double sharps and flats), but it’s also designed to only get more difficult at a rate that /you/ can manage.

to do this, the app uses a “levels” system, where each level has one more extra type of interval or note type, and you are tested on them. over 50% of the time, the question will be from the level you’re on, and the rest of the time, the question will be randomly chosen from every other level that you have already passed.

get 10 in a row correct, and you go to the next level.

but, get 5 wrong in a row, and you go down a level.
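Here is a rough sketch of how a levels system like this can be written. It isn't the app's code: the `Trainer` class is invented for illustration, and the 60% chance of asking from the current level is my guess at "over 50%".

```python
import random


class Trainer:
    def __init__(self, max_level=24, current_level_chance=0.6, rng=None):
        self.level = 1
        self.max_level = max_level
        self.current_level_chance = current_level_chance  # assumed value
        self.right_streak = 0
        self.wrong_streak = 0
        self.rng = rng or random.Random()

    def question_level(self):
        """Usually ask from the current level, otherwise from a passed level."""
        if self.level == 1 or self.rng.random() < self.current_level_chance:
            return self.level
        return self.rng.randint(1, self.level - 1)

    def answer(self, correct):
        """Record an answer: 10 right in a row moves up, 5 wrong in a row moves down."""
        if correct:
            self.right_streak += 1
            self.wrong_streak = 0
            if self.right_streak >= 10:
                self.level = min(self.level + 1, self.max_level)
                self.right_streak = 0
        else:
            self.wrong_streak += 1
            self.right_streak = 0
            if self.wrong_streak >= 5:
                self.level = max(self.level - 1, 1)
                self.wrong_streak = 0


trainer = Trainer(rng=random.Random(1))
for _ in range(10):
    trainer.answer(True)
print(trainer.level)  # 2
```

Resetting both streaks on a level change is a choice I've made here; it means you always need a fresh run of 10 to move up again.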

at the moment, there are 24 levels – all the way up to augmented 8ths – can you get through all the levels?

give it a try!

2010

I’ve the most awful memory.

While trying to remember what the hell I’d done in the last year, I came up with nothing.

Luckily, I have a spare brain in the form of my facebook friends, who came up with this list for me:

  • I started a new company, KV Sites, which will be up and running properly within a month or so, and will be selling affordable CMS websites and programming.
  • I got grade 2 in piano. I’m still waiting for an examiner for grade 3 (which I wanted to do in September). I’ll be doing grade 4 in March.
  • I got my first grading in Genbukan Ninjutsu.
  • I finished another book, CMS Design using PHP and jQuery. I hope it is as well-received as the previous book, jQuery 1.3 with PHP. btw, Packt would like me to remind people that the book “Mastering phpMyAdmin […] for effective mysql management” (reviewed here and here) has been updated to version 3.3.x.
  • I am building up to a new release of my CMS, WebME, which, despite the last downloadable version being from early 2009, has actually been very actively updated. It should be ready for release tomorrow, right on time for 2011!
  • I wrote and released two jQuery plugins: k3dCarousel, and SaorFM (which I hope to vastly improve in 2011).
  • I also built a first attempt at a clavichord made from plywood. I’ve got some new tuning pegs and redesigned the keyboard, so will hopefully be able to record on it soon.

I’m hoping 2011 turns out to be awesomer and that my head will be able to remember it all!

clavichord fretting

My clavichord project stalled when I realised it was just not going to work the way I’d done it.

This is partly because I’d naively gone for the full un-fretted design in the beginning, then later realised this would put too much pressure on the cheap bodywork and cause it to implode.

Changing the design afterwards to a fretted design wasn’t going to work either, because of how the keys were laid out.

So, the new plan is to rebuild the keyboard on the current clavichord, and hopefully get the thing finished as a triple-fretted, single-strung design.

Now, to explain…

If you have one string per note, this is called “unfretted”. In this design, every string is only ever hit by one key.

The usual way to design a clavichord is “double-strung”, which means that every note actually has two strings. This makes double-strung clavichords louder than single-strung clavichords, because the combination of the two strings’ waves tends to alternately strengthen and dampen what’s happening at the soundboard.

Back to fretting – consider a guitar. Despite only having six strings, a guitar can play many more than only six notes. This is accomplished by “fretting” the strings. When you play a “G” on an “E” string, what happens is that you are shortening the vibrating length of the string (with the fret and your finger), which shortens the wavelength and causes it to play a different note (G) than it would play if it was unfretted (E).

With a clavichord, the very act of striking the string with the key (or “tangent”, as the striking edge is called) causes fretting. The strings are damped at the ends with cloth or felt so when the key is not touching the string, the string doesn’t vibrate.

When you design your keyboard to multi-fret the strings, you need to do some calculation – let’s say you have a note, C, which is struck on the strings 100cm from the bridge. If your fretting involves the C# hitting the same string, then that key’s tangent must hit the string at about 94.4cm (100cm divided by the twelfth root of 2, assuming equal temperament).

That is a gap of only about 5.6cm between the two tangents on a 100cm string, and the gap shrinks in proportion to the string length. So if you decide to triple-fret all your notes, the keys for the high notes will be very close together, and the lower notes will be further apart (lower notes have larger wavelengths, so the distance between semi-tone frets increases as you get lower).

That explains the following image (a double-strung, triple-fretted clavichord):

Note that the keys are all squashed together on the right side where the high notes are, and the spaces gradually increase as you move further left.

Notice as well that at the extreme left, the increase in spacing stops and all the keys are together again.

The reason for this is that when the notes get too low, there’s simply no more room for multi-fretting, so instead, the lower notes are all one per string.
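The tangent spacing described above can be worked out in a few lines. This is a sketch that assumes equal temperament (each semitone shortens the sounding length by a factor of the twelfth root of 2); `tangent_positions` is a name I've made up, and a historical tuning would give slightly different numbers.

```python
SEMITONE = 2 ** (1 / 12)   # equal-temperament frequency ratio of one semitone


def tangent_positions(open_length_cm, semitones):
    """Distance from the bridge at which each tangent must strike,
    for the lowest note on the string and `semitones` notes above it."""
    return [open_length_cm / SEMITONE ** n for n in range(semitones + 1)]


# A triple-fretted string: C, C# and D all strike the same string.
positions = tangent_positions(100.0, 2)
gaps = [a - b for a, b in zip(positions, positions[1:])]

print([round(p, 1) for p in positions])  # [100.0, 94.4, 89.1]
print([round(g, 1) for g in gaps])       # [5.6, 5.3]
```

Halve the string length and every gap halves too, which is why the tangents crowd together at the treble end.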

There’s one more point to make about the keys.

Let’s say you create a key, which has its tangent 25cm from the fulcrum (a clavichord key is a lever). When the key is pressed, the tangent arcs up and strikes the string. It is still 25cm from the fulcrum in a 3D sense, but when measuring x/y from a top-down view of the clavichord, if the string is 4cm above the tangent (with key at rest), then the tangent strikes the string about 24.7cm from the fulcrum (the square root of 25² − 4²).

This must be taken into account when you design where the strings will contact the bridge and the hitchpins, as getting this wrong will cause the tangents to miss. Yes, you could just place the tangents after doing the strings, but my goal here is to be as perfect as possible. (there’s also the added problem that the tangent’s top is a certain height (3cm, say) above the level of the fulcrum, but you get the picture)
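The arc geometry can be checked with Pythagoras. This is a simplified model of my own: the tangent tip is treated as a point swinging on a circle around the fulcrum, `strike_distance` is a made-up name, and the optional rest height is one way of handling the tangent-height problem mentioned in the brackets above. Under this model the 25cm and 4cm example comes out at about 24.7cm.

```python
import math


def strike_distance(radius_cm, rise_cm, rest_height_cm=0.0):
    """Top-down distance from the fulcrum to the point where the tangent
    meets the string, for a tangent tip that stays `radius_cm` from the
    fulcrum and rises `rise_cm` from a resting height of `rest_height_cm`."""
    height = rest_height_cm + rise_cm
    return math.sqrt(radius_cm ** 2 - height ** 2)


print(round(strike_distance(25, 4), 2))     # 24.68
print(round(strike_distance(25, 4, 3), 2))  # 24.0
```

The shift looks small, but it is the same order of size as the gaps between neighbouring tangents at the treble end, so it is worth including in the layout.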

I’ve explained some of the problems to do with designing a fretted keyboard and string layout. Now, I’m off to write a program to design one automatically!

more music scams…

last year, I wrote about some scams where people claimed to be looking for music lessons for their son or daughter.

So far, I have not had one single student for guitar come to me through email or the Internet. Every single request has been a scam.

Here is an example email I received today from andrewbarton67@yahoo.com (Andrew Barton):

Hello,

I’m Andrea Barton during my search for a Music Instrument Lessons teacher that would always take my Daughter (Gwyn) and I found your advert.Your advert looks great and it is very okay to me since you specialize in the area I am seeking for her. My daughter will be coming to your Country before the middle of July for 2 Months. She is just 15yrs Old, a beginner, I want you to help me teach her music during her stay in the Country because i will not want her to less busy, i want her to engage in something to keep her busy during her stay.

So, kindly let me know your charges cost per week in order for me to arrange for the payment before she travels down to your country.I would also like to know if there is any Text Book you will recommend for her as a beginner so that she will be reading privately at home after the lesson during her stay.

Please Advise back on;

(1) Your charges per 1 hour twice a week for 2 Months?

(2) The Day and time you will be available to teach her During the week?

(3) Tuition address?

I will be looking forward to read from you soonest.

Best Regards.

There are a few things about this which should immediately strike anyone:

  • People don’t usually mis-spell their own name. Is it Andrea (in the text) or Andrew (in the email address)?
  • There is no mention of what instrument the girl is supposed to be learning. Guitar? Piano? Didgeridoo?
  • The weird capitalisation says to me that translation software has been used, and only for some specific words. I can imagine a template that goes something like this: “I’m ________ during my search for a ________________ teacher that would always take my ________ (____) and I found your advert”. Every one of the blanked out words was inserted with capital letters.
  • There’s a lot of talk about countries – “your Country”, “the Country”, “down to your country”. This person obviously does not know what country I am in, yet knows that his/her daughter will be coming to it?
  • As for that, “My daughter will be coming to your Country before the middle of July for 2 Months.” The email arrived at 2 in the morning today. It’s the 18th of July. A real request for upcoming lessons would surely arrive weeks or months before the trip had already started?

There is a quirky little urge in me to take this as far as I can. However, I’m also not made of time, so I won’t bother.

So here’s the warning: NEVER trust an email from anyone you don’t know.

Here’s how this would pan out if I took it seriously:

  1. We agree price and dates.
  2. They send a cheque and urge me to cash it. I go to the bank and do so.
  3. I suddenly receive an urgent email saying there’s been an error and they sent me too much, and to please send back the extra money.
  4. Of course, that involves me writing and sending a cheque of my own.
  5. They then cash my cheque.
  6. Their cheque then bounces….
  7. The student never turns up.

So don’t be an idiot. Either throw these emails in the spam directory (or delete them), or have fun trying to get the guy to do ridiculous things, but never take them seriously.

Btw: here’s an example of this same exact person being a bit over enthusiastic with the attempts – 9 copy/paste messages, with two separate daughters, Rita and Marsha – this guy should probably have got the kids lessons when they were younger…