Blastanova

August 18, 2010

Sambaverse Alpha

Filed under: flash/flex,music video games,projects — admin @ 12:05 am

Finally! I’m at the point where I have a demo-able application. Sambaverse is the first phase of my master plan for Blastanova.

Quite some time ago, I got interested in creating music based Flash games. I created a prototype, and it worked pretty well – but it was severely lacking in one aspect. That aspect was the ability to look ahead in music, to know what’s coming up 5 seconds from now or even 5 milliseconds from now. Or to be able to look back at what happened in the music before. Or to be a little smarter about logical song breaks.

While Flash has some decent realtime audio processing capabilities, realtime just wasn’t good enough – my music games needed mystical, psychic powers…to be able to see into the future and know how the musician who composed the piece thinks.

In real terms, I needed a tool that was smart enough to load an MP3 and take a good stab at automatically detecting beats, breaks, loud sections, and different sections of the song – like verses and choruses.

So, I set out to create Sambaverse. A person using this application loads up an MP3. Right away the gears start turning – an audio waveform and song navigator is brought up in a few seconds. You can browse around, and listen to snippets. You can properly visualize your song. That’s not very special though….you can do that in any audio editor. This is why I made some music visualization modes…

Using the Application
At the upper right of the application is the “Playback Visualization” menu. You can choose one or multiple options here. You have the ability to view the song beats, look where the quiet periods of the song are, where the most intense portions of the song are, and find logical breaks (sections) in the music like where a chorus ends and the bridge begins.

Unfortunately, automatically finding logical breaks in the song was a little tough, and it will never be perfect. I’d describe my break detection as a good start, but I have quite a ways to go. This is why I’ve added two drop down options. The first is the “custom song sections”. This mirrors the automatically detected segments, until you drag them around to place them where you want. You can fully customize and refine these segments.

Likewise, I added a song exclamation overlay. Using this mode will place little stars on the timeline. The purpose of these stars is to signify special musical events that occur very rarely. Like when that singer does that one scream in the song “YAHHHHHH!” or there’s a sudden gong sound effect…just something wacky that occurs in the song against the norm.

So you have a few view modes at your disposal. Even more than that, pressing the orange “Analyzation Settings” button will bring up the settings window. Here you can change how quiet your quiet breaks need to be or how long they are in order for them to be identified as quiet breaks.
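
To make those two settings concrete, here is a minimal sketch of how a quiet-break pass could work. It is written in TypeScript rather than ActionScript so it can be run outside the Flash player, and it is not the actual Sambaverse code: the names `findQuietBreaks`, `maxLevel`, and `minWindows` are made up for illustration, and it assumes the song has already been boiled down to one loudness value per analysis window.

```typescript
// Hypothetical settings, named after the two knobs described above.
interface QuietSettings {
  maxLevel: number;   // a window at or below this level counts as quiet (0..1)
  minWindows: number; // how many quiet windows in a row make a "break"
}

interface QuietBreak {
  startWindow: number; // index of the first quiet window
  endWindow: number;   // index one past the last quiet window
}

// levels: one loudness value per analysis window (for example, peak amplitude).
function findQuietBreaks(levels: number[], settings: QuietSettings): QuietBreak[] {
  const breaks: QuietBreak[] = [];
  let runStart = -1; // -1 means "not currently inside a quiet run"

  // Loop one step past the end so a quiet run that reaches the end gets closed.
  for (let i = 0; i <= levels.length; i++) {
    const isQuiet = i < levels.length && levels[i] <= settings.maxLevel;
    if (isQuiet && runStart < 0) {
      runStart = i; // a quiet run begins here
    } else if (!isQuiet && runStart >= 0) {
      if (i - runStart >= settings.minWindows) {
        breaks.push({ startWindow: runStart, endWindow: i });
      }
      runStart = -1; // the run is over whether or not it was long enough
    }
  }
  return breaks;
}
```

Raising `maxLevel` makes more of the song count as quiet, and raising `minWindows` throws away the short dips, which is the trade-off the settings window exposes.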

There are a few settings for every view mode. I’ll be tweaking these as I go along to see what works best, and I’ll probably change names and defaults.

Finally, you can save this data as XML. Actually – I’m not really saving it yet….just posting it to a webpage, and you can save it if you want. Whatever. I know I’m psyched to play around with my new app, tweak some songs, and make some cool games.

Under the Hood

I learned a lotta cool stuff making this. The biggest thing of course was how to manipulate and use audio for my bidding in Flash. I used Pixel Bender to get a huge speed increase when loading the sound initially. That was a lot of fun once I figured out how to get it working.

Although a little annoying at the time, I was using the latest releases of Flex and Swiz. As I’ve been working on this app for a little while, I had to go through and upgrade from Flex Gumbo to Beta 1 to Beta 2 to RC to Release. Stuff changed every time. Likewise, I had an annoying evening transitioning from Swiz 0.64 to 1.0.

But I’m happy I did this. Flex skinning is awesome. Swiz allowed some nice shortcuts and organization in my application. I even tapped into Flash Catalyst for the initial application design. Catalyst seemed a little limiting after the initial graphics dump. After that, for quick and dirty graphics, I just cracked open Illustrator and exported to FXG, which I then copied as text into my MXML skins.

Future Improvements

I’m going to take a break from this application for a bit and focus on some games. I’m sure I’ll tweak as I go along and I find limitations that I need to solve as I’m working on the games. One huge feature I’d like to add when I get back to this is to be able to run a Fast-Fourier Transform on the audio and drop frequencies I don’t care about. This way, I can zero in on the drums or the bass guitar and only analyze those. In rock music especially, the frequencies are all over the place at different tempos and can make it a little hard to find a rhythm.

I’d also like to make it so you can load in a previously saved XML file to get all your work back for a particular song.

Lastly I’d like to get a sample game/animation you can preview as you work to visualize the beats better than the Flash lights I have.

So that’s Sambaverse! Check it out! (but be nice, it’s only an alpha) .

May 23, 2010

Music Visualization and Papervision

Filed under: flash/flex,music video games,personal — admin @ 8:46 pm

After watching Simon Free’s Papervision demo at NCDevCon today, I just had to post an old demo I did when I first got my hands on Papervision a couple of years ago. It’s a really cool project (Papervision that is), and I hope to play with it again real soon.
In this demo, you drive around a submarine through the sea, and various fish come into your view. But then a James Brown song comes on, and all the fish start dancing to the music. Different fish listen to different frequencies. It’s fun, but driving the submarine is very hard, because you can lose it off screen….

Anyway – spacebar to make the submarine go, and arrow keys to make it turn (arrow keys are relative to which way the sub is facing)

Papervision Beats (with Dancing Fish)

(by the way, thanks to Troy and Iris Stratton for getting me started with PV3D and Blender!)

May 22, 2010

Do You Belieeeeve in Flash Autotuning

Filed under: flash/flex,music video games,projects — admin @ 2:52 am

No, I didn’t accomplish the mythical Flash autotune in time for my “Audio Manipulation in Flash and Flex” presentation for NCDevCon this Sunday.  But I don’t see why it’s not possible in the least.

I didn’t accomplish it, not because of Flash, but because of my lack of digital audio experience.

About a month ago, I was thinking what effect I could dazzle my attendees with, and ONE thought popped into my head:  AUTOTUNE!!!  This led me down the rabbit hole of crazy amounts of math, signal processing theory, code optimization, and more.  I even started reading a free PDF 600 page book by Stephen Smith http://www.dspguide.com/.

I’ve learned about what Fourier Transforms REALLY are, that they are really useful beyond getting frequency data, how to do low pass filters, and tons of other stuff.

I really didn’t think it would be this involved.  This is like an entire field of expertise.  I honestly thought I could steal some algorithm online and be good to go.  Nope, I’m swimming with using the FFT, then back with an iFFT, convolving signals together, and all this crazy stuff.  It’s pretty awesome though, the things you can do and learn from audio signals, and any signal in general.  I’ll definitely be finishing that 600 page book (probably 3-4 times over so I understand it).

Anyway, I was able to accomplish pitch shifting in a couple different ways, and riding a voice on top of another tone (which sounds really close to autotune if you could just get rid of that damn tone!)

I didn’t even think that one of the things that autotuning did was to detect the frequency of a sung note, and step it up or down to the correct frequency.  I thought at the beginning that when T-Pain did his thing, he just sung whatever, and the software would push the voice to whatever the producer wanted.
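
The “step it up or down to the correct frequency” part is the easy half, and it can be sketched in a few lines. This is TypeScript rather than ActionScript so it runs outside Flash, and it rests on assumptions that are mine, not anything from Antares: equal temperament, A4 tuned to 440 Hz, and a detected frequency that some other code has already found. The function names are made up for illustration.

```typescript
const A4_HZ = 440; // assumed tuning reference

// Returns the frequency of the equal-tempered note closest to detectedHz.
function nearestNoteHz(detectedHz: number): number {
  // Distance from A4 in semitones; fractional when the singer is off key.
  const semitones = 12 * Math.log(detectedHz / A4_HZ) / Math.LN2;
  // Snap to the closest whole semitone, then convert back to a frequency.
  return A4_HZ * Math.pow(2, Math.round(semitones) / 12);
}

// Ratio a pitch shifter would need to apply to land on the snapped note.
function correctionRatio(detectedHz: number): number {
  return nearestNoteHz(detectedHz) / detectedHz;
}
```

A slightly sharp 450 Hz note snaps back down to 440 Hz, so the shifter would be asked for a ratio just under 1. Detecting the sung frequency in the first place, and shifting without artifacts, is where all the hard work lives.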

With Flash, I’ve seen the Audio Processing Library for Flash detect notes in a sound, and as I’ve said, I can now pitch shift!  In truth, the real Autotune by Antares Audio Technologies is said to use a “phase vocoder”, which I’m still not up on my theory enough to know what it is.  It’s probably a combination of smart pitch-shifting coupled with a flange like effect to go all robot sounding.

I finally downloaded 10.1 and got my microphone working – I recorded in from the microphone, and played back via the sound buffer (all through the sample data event).  I pitchshifted first by just speeding up the tempo so things got all high pitched and fast.  But then I grabbed a PitchShifting class by Stephan M. Bernsee  that was ported to Actionscript from C# by Arnaud Gatouillat.  Using this was VERY processor intensive.  In fact, in debug mode, my entire computer was overheating, and it was a crapshoot whether the sound would actually come out right.  And then of course the Windows sound buffer kept doing weird snapping/popping noises every so often until I restarted.  However, running as a release build seems to work just fine.  I now know why Andre Michelle’s AudioTool has a warning against using the debug player.
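
The first, crude kind of pitch shift (just playing the samples back faster) boils down to resampling. Here is a minimal sketch in TypeScript rather than ActionScript, so it can run outside Flash. It is not the Bernsee class mentioned above, which keeps the tempo intact; `resample` is a hypothetical helper that uses plain linear interpolation and changes pitch and speed together.

```typescript
// Reads the input `ratio` times faster than normal.
// ratio 2 = one octave up and half as long; ratio 0.5 = one octave down, twice as long.
function resample(input: Float32Array, ratio: number): Float32Array {
  const outLength = Math.floor(input.length / ratio);
  const output = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const pos = i * ratio;           // fractional read position in the input
    const left = Math.floor(pos);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    // Linear interpolation between the two neighbouring samples.
    output[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return output;
}
```

Because the output gets shorter as the pitch goes up, this is the “high pitched and fast” effect; keeping the duration fixed needs a smarter algorithm.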

But all in all, it was a great learning experience.  I’m embedding the demo I’ll be showing at NCDevCon on Sunday.  And of course I’ll keep learning to become a DSP master, and someday get my Autotune working (hopefully someone beats me to it, and shares the code).

Autotune Attempt – You Need Flash 10.1 and a Microphone

March 24, 2010

Intro to Pixel Bender

Filed under: flash/flex,music video games — admin @ 11:37 pm

So this is my intro to using Pixel Bender – if you don’t know what it is or why to use it check out the documentation or my first post.

Actually the documentation is a good place to start.  Go to http://www.adobe.com/devnet/pixelbender/ before you begin.  Go on….I’ll wait…

Once there you’ll want to download the Pixel Bender Toolkit.  It’s a simple and light program – not subject to the lengthy installs of other Adobe software.  Don’t bother downloading the PDF documentation, there are links in the help menu to these documents once you run the toolkit.

The best part of the documentation – which actually turns into the most depressing part once you learn it, is that you ignore around half of it if you’re developing Flash shaders.  Most of the more advanced functionality only applies to Photoshop and After Effects shaders.

So crack open the toolkit!  What to do now?  Well first, go to the file menu and choose “new kernel”.  Kernels are basically the “programs” you’re creating that compile to shaders.  A new kernel will look like this:

kernel NewFilter
<
    namespace : "Your Namespace";
    vendor : "Your Vendor";
    version : 1;
    description : "your description";
>
{
    input image4 src;
    output pixel4 dst;
 
    void
    evaluatePixel()
    {
        dst = sampleNearest(src,outCoord());
    }
}

Don’t worry about that top part – it’s just noting the author of the script.

So the first thing to worry about are the two variables at the top. There’s “image4 src” and “pixel4 dst”. You might guess that you’re defining a source image and a destination image. But what’s up with the funny syntax?

Well, first of all, Pixel Bender is one of those languages where the data type is in front of the variable. So you have a variable “src” of type image4, and a variable “dst” of type pixel4. Image and pixel datatypes might make sense. The “4” is what threw me off at first, but don’t worry, it makes sense.

PxB mainly deals in floating point numbers. And no automatic type conversion! Doing float var = 2 is no good, but doing float var = 2.0 is OK. There are basically 4 types of floating point numbers: float, float2, float3, and float4. A float4 is basically an array of 4 floating point numbers. Example: float4 myfloat = float4(2.0, 2.0, 2.0, 2.0);

It only goes up to 4. Once you realize that the main point of PxB is to manipulate pixels, you begin to see why. Red + Green + Blue + Alpha = 4 channels, and so an array of 4 floating point numbers.

I’ve found that I can use pixels and floats interchangeably (maybe I’m wrong). Images are reserved for an entire image comprised of pixel4s/float4s. Pixel Bender also supports integers and booleans (each with 4 or fewer values).

OK that was my rant on variables. Let’s move on to “evaluatePixel”. This is THE method that everything PxB revolves around. In fact, in Flash, you can’t even create other methods in your kernel (PS and AE allow this though).

Every PxB kernel is designed to do one thing and one thing only: take in a source image, go pixel by pixel, and create an output image pixel by pixel from the “evaluatePixel” method.

That’s easy enough to understand – but what about that wonky syntax they start you off with?

dst = sampleNearest(src,outCoord());

So, let’s work from the inner to the outer, starting with outCoord(). This method gets the current coordinates that Pixel Bender is analyzing at that moment. Hint: it’s a float2, containing both X and Y values.

That was the easy part – the hard part is “sampleNearest()”. For us Flash folks, this introduces you to the wholly confusing notion of sampling on half pixels and pixel ratios and other such nonsense. But then you realize that it’s Flash, and all pixels are square, and sampling a pixel samples the entirety of the pixel. At this point you realize that sampleNearest is just how PxB works – but it’s entirely unnecessary for Flash.

So in other image editing software (and apparently this holds true especially for video), pixels don’t have to be square. Pixels can have a different height and width, giving them an aspect ratio, which you can actually check for in PxB.

But then there’s PxB…it will scan each pixel in the image as IF THEY WERE square. So you end up with coordinates that could be x:4.56, y:1.567. When you do “sampleNearest”, you’re sampling the nearest pixel to these fractional values to end up with nice locked-in PxB world coordinates like x:4, y:2. You can also call “sampleLinear” which takes the average of the surrounding pixels when you ask for something on a half pixel.

Betcha feel smart now, don’t you? Well forget everything you just learned. If you are doing things in Flash, all pixels are square, and all pixels match to the PxB world coordinate system perfectly. So “sampleNearest” is just something you have to do to get the red, green, blue, and alpha values of the pixel.

So – in the end…you’re just taking the pixel you’ve come to, evaluating the 4 channels, and dumping those right back into the destination pixels. In other words, you’re doing nothing.

At this point though, it becomes easy to start manipulating an image. Go to the file menu again, and load an image. PxB has some sample ones to use, like this one:

Now change dst = sampleNearest(src,outCoord()); to:

dst = sampleNearest(src,outCoord()) * float4(0.25, 1.0, 1.0, 1.0);

Now click “run”

Congratulations! You just went into every pixel and turned the red down to 25%.
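
If the float4 multiply feels like magic, it may help to see the same arithmetic spelled out by hand. This is a small sketch in TypeScript rather than Pixel Bender, so it can be run anywhere; `Pixel4`, `multiplyPixel`, and `applyToImage` are names I made up, and channels are assumed to be 0 to 1 values in red, green, blue, alpha order.

```typescript
// One pixel: red, green, blue, alpha, each assumed to be in the 0..1 range.
type Pixel4 = [number, number, number, number];

// Component-wise multiply: each channel is scaled by its own factor.
function multiplyPixel(pixel: Pixel4, factor: Pixel4): Pixel4 {
  return [
    pixel[0] * factor[0], // red
    pixel[1] * factor[1], // green
    pixel[2] * factor[2], // blue
    pixel[3] * factor[3], // alpha
  ];
}

// What the kernel does conceptually: the same operation on every pixel.
function applyToImage(pixels: Pixel4[], factor: Pixel4): Pixel4[] {
  const out: Pixel4[] = [];
  for (let i = 0; i < pixels.length; i++) {
    out.push(multiplyPixel(pixels[i], factor));
  }
  return out;
}
```

With a factor of `[0.25, 1.0, 1.0, 1.0]`, only the red channel changes, which is the same thing the one-line kernel change does.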

How bout a weird cross-hatch type effect?

dst = sampleNearest(src,outCoord()) * float4(sin(outCoord()[0] * 4.0), cos(outCoord()[1] * 4.0), sin(outCoord()[0] * 4.0), 1.0);

Play around, try different things. If you break anything, Pixel Bender will give you red error messages of varying usefulness on the right side.

The hardest thing to get used to is always typing numbers with decimals and usually performing operations not with one set of numbers but with a set of 4. If you run into any trouble, keep asking yourself these two questions:

  1. Am I performing a mathematical operation on two different data types?  Float2 * Float4 = Error!
  2. Am I performing a mathematical operation on a float using a number with no “point zero” on the end?  Float * 2 = Error!  Float * 2.0 = Good!

So that’s the basics of PxB!  You can manipulate surrounding pixels if you like by performing operations on surrounding pixels.  Just add or subtract X and/or Y to your outCoord(), and sample that pixel.  Combine and average surrounding pixels to get a blur effect for example.
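
As a rough picture of what “combine and average surrounding pixels” means, here is a 3x3 box blur written out the slow way. It is TypeScript rather than Pixel Bender, works on a single grey channel to keep it short, and clamps at the edges, which is my own assumption and not necessarily what Pixel Bender does when you sample outside the image. `boxBlur3x3` is a hypothetical name.

```typescript
// image: row-major grey values, width * height entries.
function boxBlur3x3(image: number[], width: number, height: number): number[] {
  const out: number[] = [];
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      let sum = 0;
      // Visit the pixel itself and its 8 neighbours.
      for (let dy = -1; dy <= 1; dy++) {
        for (let dx = -1; dx <= 1; dx++) {
          // Clamp so edge pixels reuse their nearest in-bounds neighbour.
          const sx = Math.min(width - 1, Math.max(0, x + dx));
          const sy = Math.min(height - 1, Math.max(0, y + dy));
          sum += image[sy * width + sx];
        }
      }
      out.push(sum / 9); // average of the 9 samples
    }
  }
  return out;
}
```

In a kernel the two outer loops disappear, because Pixel Bender calls evaluatePixel once per output pixel for you; only the nine samples and the divide remain.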

Here’s an example of taking a big image, and downsampling the image to a tiny corner in the upper left of the destination:

        // Sum a 3x3 block of source pixels around the scaled-up coordinate
        float4 colorAccumulator = float4(0.0,0.0,0.0,0.0);
        colorAccumulator += sampleNearest(src, outCoord() * float2(9.0, 9.0) + float2(-1.0, -1.0));
        colorAccumulator += sampleNearest(src, outCoord() * float2(9.0, 9.0) + float2(0.0, -1.0));
        colorAccumulator += sampleNearest(src, outCoord() * float2(9.0, 9.0) + float2(1.0, -1.0));
        colorAccumulator += sampleNearest(src, outCoord() * float2(9.0, 9.0) + float2(-1.0, 0.0));
        colorAccumulator += sampleNearest(src, outCoord() * float2(9.0, 9.0));
        colorAccumulator += sampleNearest(src, outCoord() * float2(9.0, 9.0) + float2(1.0, 0.0));
        colorAccumulator += sampleNearest(src, outCoord() * float2(9.0, 9.0) + float2(-1.0, 1.0));
        colorAccumulator += sampleNearest(src, outCoord() * float2(9.0, 9.0) + float2(0.0, 1.0));
        colorAccumulator += sampleNearest(src, outCoord() * float2(9.0, 9.0) + float2(1.0, 1.0));
 
        dst = colorAccumulator/9.0;

My next post will be about taking Pixel Bender and using it for non-image data processing. Stay tuned!

March 19, 2010

Pixel Bending for Speed in Flash and Flex

Filed under: flash/flex,music video games — admin @ 1:17 am

Lately, as I’ve been progressing with a Flex 4 based application which is largely an audio visualizer, I’ve felt the pain of slow response times from my UI.

The slow response is due to the fact that I’m loading a 3 minute MP3, extracting the audio to a byte array, and then processing the data in that byte array.  Consider a 180 second song with 44,100 samples per second.  That’s 7.9 million numbers to process as fast as I can.

This has caused me to do some complicated things.  The best of my efforts entailed processing the audio in segments on every enterframe handler.  I’d tell flash to process a reasonable amount of data, stop, and then process more the next round until everything was finished.  If I remember correctly, this took around 20-25 seconds or so, and STILL made my UI very sluggish as I was processing the data.  And of course this was in my development environment – in real life, a user would be waiting even more time while an MP3 loads.

So, I had two basic problems.  The first was that data takes too long to process.  I can hide this potentially with a good user experience – after all people expect that loading a large file can take some time.  But this leads to my second problem – Actionscript runs on one thread.  Only one thing can happen at a time.  If I’m processing audio, I can’t be updating/refreshing my UI.  And the whole application becomes unresponsive.  The more responsive I make my UI, the less data I process in a frame – but this just makes the whole darn thing take longer to process. Worse yet, if I process the whole thing in one go, Flash can timeout from inactivity at 15 seconds.
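
The “process a bit, stop, process more next frame” approach looks roughly like this. It is a sketch in TypeScript rather than ActionScript so it runs outside Flash, and the class is hypothetical: `ChunkedJob` just sums absolute sample values as a stand-in for real analysis, and `step()` is what an enterframe handler would call.

```typescript
class ChunkedJob {
  private samples: Float32Array;
  private chunkSize: number;
  private position: number = 0;
  private total: number = 0; // stand-in result: sum of absolute sample values

  constructor(samples: Float32Array, chunkSize: number) {
    this.samples = samples;
    this.chunkSize = chunkSize;
  }

  // Call once per frame. Returns true when every sample has been processed.
  step(): boolean {
    const end = Math.min(this.position + this.chunkSize, this.samples.length);
    for (let i = this.position; i < end; i++) {
      this.total += Math.abs(this.samples[i]);
    }
    this.position = end;
    return this.position >= this.samples.length;
  }

  // Fraction of the work finished so far, handy for a progress bar.
  progress(): number {
    return this.samples.length === 0 ? 1 : this.position / this.samples.length;
  }

  result(): number {
    return this.total;
  }
}
```

The `chunkSize` argument is the whole dilemma in one number: make it small and the UI stays snappy but the job needs more frames, make it big and it finishes in fewer frames while the UI stutters.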

Basically it boils down to running it all at once, and potentially doing this:

or this….

Fortunately, there are two Flash technologies that promised to help me out.  First there’s Alchemy.  Alchemy is an Adobe research project that compiles C and C++ code to a Flash SWC.  Supposedly it can run code up to 10x faster than code compiled with Actionscript.

But, I haven’t touched C++ in a while, and still had the problem of an unresponsive UI for the time it takes to process the audio (whatever that time is).  It was tempting, but I had my sights set on trying Pixel Bender.

Pixel Bender is a cross-product Adobe technology.  It runs in Flash, Photoshop, and After Effects.  Basically it allows developers to write their own image filters.  In Flash, you can apply this image filter to images, animations, video, components…..and well anything that displays on the stage.

The best part?  It can run in different threads, on different CPU cores, and on your GPU.  Well, actually, strike that last part….you can’t run it in Flash on your GPU, but Photoshop and After Effects are cool.

This means that you can run a Pixel Bender shader on an image, and your UI doesn’t slow down.  Maybe you wrote something insanely complicated, well…then your PxB shader will suffer performance, but your UI won’t!

It doesn’t stop there – you don’t even have to have an image as your…erm…source image.  Yes, PxB assumes red, green, blue, and alpha channels.  But you can easily lie to PxB, and have it assume that your custom data is RGBA data.
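
“Lying” to PxB mostly means laying your numbers out four at a time so each group looks like one RGBA pixel. Here is a sketch of that packing step in TypeScript rather than ActionScript, so it can be run outside Flash. The `packAsRgba` helper and the `PackedImage` shape are my own invention for illustration, and padding the last pixel with zeros is an assumption about what is convenient, not a rule.

```typescript
interface PackedImage {
  width: number;      // pixels per row
  height: number;     // number of rows
  data: Float32Array; // width * height * 4 floats, read as R, G, B, A
}

// Lays audio samples out as a fake image, four samples per "pixel".
function packAsRgba(samples: Float32Array, width: number): PackedImage {
  const pixelsNeeded = Math.ceil(samples.length / 4);
  const height = Math.max(1, Math.ceil(pixelsNeeded / width));
  // A new Float32Array is zero-filled, so any unused tail is padding.
  const data = new Float32Array(width * height * 4);
  for (let i = 0; i < samples.length; i++) {
    data[i] = samples[i];
  }
  return { width: width, height: height, data: data };
}
```

The kernel then runs once per fake pixel and touches four samples at a time, and the caller ignores whatever comes back in the padded slots.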

This brings me full circle back to my problem.  I have something that supposedly processes data quickly, and on a different thread.  Hooray!  In fact, there are a few projects going on that use PxB for audio processing already.

Come to the March RDAUG meeting on Tuesday to find out more!  I’ll be presenting, and quickly talking about what I covered here, but also diving into the nitty gritty.  You know….how to actually use this stuff in your work.

I’ll also be following up this blog post with a second part next week covering what I went over in my presentation.

July 14, 2009

Composing Music with the Flash ByteArray

Filed under: flash/flex,music video games — admin @ 11:43 pm

Demo here, view source for code

(view source for code, and who knows what bugs there are)

I’m getting into unfamiliar territory these days with exploring sound in Flash.

Flash 10 gives you the ability to take a sound object, and extract the entire thing into the raw data….a byte array.  How do you interpret this data? How can you process it into something meaningful?  Well I don’t know.

What I do know is that to get a firmer grasp on how I can utilize this, I need to understand what the sound object is at a basic level.  Probably the worst way to understand it is to take an entire song (mp3) file and let my eyes glaze over at the stream of numbers coming from the sound.extract feature.

No – the best way, I thought, would be to compose my own notes and chords, and work up from there.  To do this, I needed to understand what a note is, and what a sound is at a basic level.

My first thought was to picture sound as the nifty looking sound visualization that comes with music players these days…you know, those dancing bars:

Turns out that this is the worst way to imagine sound for this purpose.  My first question was….”OK, how do I get the value of the low frequency?”, “how about something in the midrange?”, “what about the highend?”.

Yes you can imagine sound this way, but only after processing your sound data with a Fourier Transform.  If you know how to do this, stop reading this right now, cause you’re way smarter than I am at this point in my sound exploration.

The BEST way to imagine sound is to picture a sine wave:

If you look at how tall the wave is, that is how loud the sound is, or the amplitude.   How close the peaks and valleys are in this picture is the frequency.

Frequency, at this very basic level can’t be thought of as high pitched, low pitched, etc – it’s only how far apart these peaks and valleys are.  Now think of this as a side-scrolling wave that goes on forever.  If you were to scroll at a constant speed, each peak would hit that vertical center line at a constant rate.  This rate would be the frequency.  If it goes faster, the frequency is higher – slower, the frequency is lower.  And, of course, this directly relates to how you hear it.  The more peaks and valleys at a given time, the higher the tone you hear.

Now think about how this relates to our Flash sound object’s byte array.  We could actually draw a sine wave with numbers.  This is nothing new to programmers.  Folks use trigonometry all the time.  But, I personally, never thought to use it for sound.

Try this AS3 code:

for (var i:int=0; i < 100; i++ ) {
var sample:Number;
// i / 100 sweeps from 0 to 1, so the loop traces one full cycle of the wave
sample = Math.sin(i / 100 * 2 * Math.PI) * 50;
}

This will basically plot a picture like the one above – a sinewave.

If we made this into a sound, it would almost, but not quite, be a tone.  However, if we pop over to google, we can actually look up the frequency of, say…..a middle C note, or a middle A.

http://www.phy.mtu.edu/~suits/notefreqs.html

There’s one more piece to this puzzle though, and that’s sampling rate.  A digital audio file could have all the sinewaves, peaks and valleys in the world, but if your audio playback is super slow, it’s gonna sound like garbage.  That’s why we tell our audio software to read our sound at a certain speed.

If you’ve heard the term 44.1 kHz when folks talk about sample rate – you can look back at our little for loop that drew the sine wave, and realize that you need more than the 100 points of data we drew: you need 44.1k, or 44,100 PER SECOND.

So let’s rewrite that code:

for (var i:int=0; i < durationinseconds * 44100; i++ ) {
var sample:Number;
sample = Math.sin((i) * 2*Math.PI/44100 * 440) * volume;
}

OK!  So now, we’re creating a middle A (4th octave) that lasts durationinseconds seconds.  Our sample rate in Flash is always going to be 44100 if coming from a flash sound object (though that’s not true for any audio file).  We learned from the handy note frequency chart we found on google that 440 Hz is a middle A.

As a side note, let’s say we want a different octave.  To go lower, halve the frequency to get 220.  To go lower still, halve that.  Higher?  Double it.
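
The halving and doubling rule generalizes to every note on the chart. Here is a small sketch in TypeScript rather than ActionScript, so it runs outside Flash. It assumes equal temperament with A4 at 440 Hz, which is what charts like the one linked above use; the function names are made up for illustration.

```typescript
// Each octave up doubles the frequency; each octave down halves it.
function shiftOctaves(hz: number, octaves: number): number {
  return hz * Math.pow(2, octaves);
}

// Frequency of the note `semitones` away from A4 (negative = lower).
// Twelve equal semitone steps make up one doubling.
function noteFromA4(semitones: number): number {
  return 440 * Math.pow(2, semitones / 12);
}
```

Middle C sits nine semitones below A4, so `noteFromA4(-9)` comes out at about 261.63 Hz, and `shiftOctaves(440, -1)` gives the 220 from the paragraph above.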

Let’s think about the voice now.  The nice sine wave will give a nice, round tone.  For a dirtier tone that sounds like it has sharp edges….well our sine wave needs to have sharp edges – which is actually a square wave.  Our nice round peaks and valleys would be just straight corners.

To change the tone in the code, try this:

sample = Math.sin((i) * 2 * Math.PI / 44100 * frequency) > 0 ? volume : -volume;

As for more voices, well, I haven’t tried it, but a real world instrument would have a hard strike and then some falloff.  So our nice sine wave would be less round at the start of each peak, but comes back down way slower.
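
One cheap way to fake that “hard strike, then falloff” shape is to leave the sine wave alone and multiply it by a volume envelope that starts at full strength and fades. This is an untested-in-Flash sketch in TypeScript rather than ActionScript; the exponential decay and the `decayPerSecond` parameter are my own assumptions, and real instruments are a lot messier than this.

```typescript
// Builds a mono tone whose volume fades out exponentially after the strike.
function pluckedTone(
  hz: number,
  seconds: number,
  sampleRate: number,
  decayPerSecond: number // bigger = faster fade; 0 = plain sine wave
): Float32Array {
  const count = Math.floor(seconds * sampleRate);
  const out = new Float32Array(count);
  for (let i = 0; i < count; i++) {
    const t = i / sampleRate;                       // time in seconds
    const envelope = Math.exp(-decayPerSecond * t); // 1.0 at the strike, fading after
    out[i] = Math.sin(2 * Math.PI * hz * t) * envelope;
  }
  return out;
}
```

Something like `pluckedTone(440, 1, 44100, 5)` starts near full volume and is down to under one percent of it by the end of the second.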

Here’s my final code (and keep in mind that I’m writing the sample 2 times, one for the left channel and one for the right):

var returnBytes:ByteArray = new ByteArray();
for (var i:int=0; i < _duration * 44100; i++ ) {
var sample:Number;
// phase in radians: i / 44100 is the time in seconds
var phase:Number = i * 2 * Math.PI / 44100 * this.frequency;
if (_voice==VOICE_SQUARE) {
sample = Math.sin(phase) > 0 ? _amplitude : -_amplitude;
} else {
sample = Math.sin(phase) * _amplitude;
}
returnBytes.writeFloat(sample); // left channel
returnBytes.writeFloat(sample); // right channel
}

Now, what do we do with that byte array?  Well, as of Flash 10, our sound object has a sample data event.  Whenever the sound needs new bytes, it fires this event and calls your custom sample data method.

What I do – and what you may choose to do, or not choose to do, is make my whole byte array first, and then read sequential chunks of data out of that byte array each time the handler is called:

soundBytes.position = 0;
dynamicSound = new Sound();
dynamicSound.addEventListener(SampleDataEvent.SAMPLE_DATA, addSoundBytesToSound, false, 0, true);
soundchannel = dynamicSound.play();

private function addSoundBytesToSound(event:SampleDataEvent):void
{
var bytes:ByteArray = new ByteArray();
soundBytes.readBytes(bytes, 0, Math.min(soundBytes.bytesAvailable, 8 * 8192));
event.data.writeBytes(bytes, 0, bytes.length);
}

To explain, about the 8 * 8192….

8192 is the maximum number of samples you can write for each sample data event.  However….each sample is a left and a right 4 byte float.  So that’s 8…..times 8192.
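
Here is the same byte math written out, in TypeScript rather than ActionScript so it can be run outside Flash. The constant and function names are mine; the numbers (8192 samples, two channels, 4 byte floats) come straight from the explanation above.

```typescript
const MAX_SAMPLES_PER_EVENT = 8192; // upper limit per sample data event
const CHANNELS = 2;                 // left and right
const BYTES_PER_FLOAT = 4;
// 8192 * 2 * 4 = 65536 bytes, the "8 * 8192" in the handler
const MAX_BYTES_PER_EVENT = MAX_SAMPLES_PER_EVENT * CHANNELS * BYTES_PER_FLOAT;

// Works out which slice of the pre-built byte array to hand over next.
function nextChunk(totalBytes: number, position: number): { start: number; length: number } {
  const remaining = Math.max(0, totalBytes - position);
  return { start: position, length: Math.min(remaining, MAX_BYTES_PER_EVENT) };
}
```

A one second tone is 44100 * 8 = 352,800 bytes, so it takes five full chunks plus a smaller sixth one to play it back.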

There’s tons of cool stuff you can do with this.  If you don’t believe me, look up Andre Michelle – he’s THE MAN when it comes to this stuff.

February 11, 2009

Music Inspired Gameplay

Filed under: music video games — admin @ 10:14 pm

This is the last part of my recent musings.  Over the last two posts I’ve argued that digital storage has completely changed the way we listen to music, unfortunately in some ways for the worse.  We’re less inclined to replay an album in whole, and more inclined to put it all on shuffle.  A friend of mine wondered how well Pink Floyd’s Dark Side of the Moon would work today given our habits.

I wondered how we can increase replayability of music and get people to appreciate a song or collection of songs more and thought we can use games to achieve this.  After all 30 year olds like myself grew up on video games with soundtracks that repeat every time you play a level.  Why not use casual games to increase your band’s audience – to get people to listen to your music over and over again and get the tune stuck in their heads?

Of course, it’s easy to slap a soundtrack on an online game – but how can we make the game and soundtrack work together for a great experience?  How can we create musical escapism in games, just like MTV or musical theater?  What is the gaming equivalent of breaking out into dance, or fast camera cuts timed to the rhythm?

As you can tell, I’ve been thinking a lot about this lately, and I’ve recently read This is Your Brain on Music by Dr. Daniel J Levitin to better understand how our brains perceive music and how the various flavors of music appeal to us in different ways.

The most obvious way to tie a game to music is tempo or rhythm.  There are many rhythm based games on the market today – most notably “Rock Band”.  Music has many more attributes, obviously, like the tone, contour, timbre, loudness, meter, theme, key, harmony and melody….not to mention that rhythm and tempo are two different things.

Relating any of these aspects to games can be challenging given where we are with mainstream realtime audio-processing technology.  But even putting technology aside, how can we relate these to a piece of music in a systematic way?

Let’s start with tempo.  Tempo is the overall speed of the music.  It’s fairly constant for long periods of time – but can be an excellent baseline metric for establishing a musical pulse and to possibly tie it up with a visual pulse in your game.  When people tap their toes, dance, or bop their heads – it’s generally to this tempo, or the beat of the song.  So tying visual elements to this beat is probably the most effective tool I can think of.

And this is good, because a beat can be measured in pop music by picking the range of frequencies that the bass drum is on and listening to the volume of this frequency over time.  So tying visual elements to a beat is a very real tool that can be done automatically in music game creation.
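
To show what “listening to the volume of this frequency over time” might look like, here is a rough sketch in TypeScript rather than ActionScript, so it runs outside Flash. It swaps in a simple one-pole low-pass filter to stand in for “the bass drum range”, then flags any window that is much louder than the windows just before it. All three function names, and the `history` and `sensitivity` knobs, are assumptions of mine rather than an established recipe.

```typescript
// Keeps low frequencies and smooths away high ones. alpha near 0 = heavier filtering.
function lowPass(samples: Float32Array, alpha: number): Float32Array {
  const out = new Float32Array(samples.length);
  let prev = 0;
  for (let i = 0; i < samples.length; i++) {
    prev = prev + alpha * (samples[i] - prev);
    out[i] = prev;
  }
  return out;
}

// Average energy (squared amplitude) of each fixed-size window.
function windowEnergies(samples: Float32Array, windowSize: number): number[] {
  const energies: number[] = [];
  for (let start = 0; start + windowSize <= samples.length; start += windowSize) {
    let sum = 0;
    for (let i = start; i < start + windowSize; i++) {
      sum += samples[i] * samples[i];
    }
    energies.push(sum / windowSize);
  }
  return energies;
}

// A window counts as a beat when it is `sensitivity` times louder than
// the average of the `history` windows right before it.
function findBeats(energies: number[], history: number, sensitivity: number): number[] {
  const beats: number[] = [];
  for (let w = history; w < energies.length; w++) {
    let avg = 0;
    for (let k = w - history; k < w; k++) {
      avg += energies[k];
    }
    avg /= history;
    if (energies[w] > avg * sensitivity) {
      beats.push(w);
    }
  }
  return beats;
}
```

Chained together it would read `findBeats(windowEnergies(lowPass(samples, 0.05), 1024), 43, 1.5)`, where every one of those numbers is a guess that needs tuning per song.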

Tempo only goes so far though.  It’s a very predictable musical trait.  Levitin argues that our musical tastes can somewhat be centered around complexity and predictability.  Children, for example, enjoy very simple and predictable music.  As we get older, and the more we listen to music, the more boring this simple and predictable music becomes.  So we listen to more complex music – but constantly strive for the right amount of predictability and the right amount of complexity.  As we listen to more, the level of complexity of our favorite music will probably go up too.  This is one of the reasons that the music of other cultures, or classical music, or jazz can be distasteful for people with pop only listening habits.  The more unfamiliar we are with music, the less predictable it is.  And if it’s not predictable at all – as another culture’s music can be to us – then it can really be unpleasant.

So, going beyond the beat to capture other nuances of the musical piece should be very important in manipulating visuals or gameplay elements.  ONLY syncing a predictable visual beat with a complex soundtrack would be such a shame because you lose all that which makes the music meaningful – and you can lose the connection between the two.  Alternately – having lots of movement that has nothing to do with the flow of the piece but loosely tie to frequency at a particular time, can create a similar disconnect because you’re picking up unpredictable parts of the piece to tie them together visually with the music.

So what types of musical nuances can we pick up and use?

Rhythm is another form of timing, like tempo, but it’s about how notes are grouped together into phrases. Guitar Hero and Rock Band use rhythm effectively. I believe Harmonix actually does this by transcribing the notes themselves for each instrument, and grouping them together to have you play out a phrase at a time. They’re not picking out random notes from the guitarist’s score and having you play them – no, they’re taking the most meaningful notes that make sense to put together the phrase with the limited number of notes you can play.

Tone and frequency, for example, are aspects that may be quite hard to integrate effectively. Considering an overall musical piece, many instruments are playing at different pitches and frequencies all at once. The bass guitar has a very low frequency, while a flute can have a very high frequency – whether they’re playing the same note (in different octaves) or different notes. Taken out of context, different frequencies/tones in music don’t make us tap our toes.

Pitch, harmony, contour, key, and melody – taken together can be an entirely separate but equal way to tie visuals to your music. Unfortunately – it can be very difficult (if not impossible) with mainstream game technologies like Flash to take this into account in an automatic way. I hope I’m wrong – and I hope somebody PROVES I’m wrong and lets me in on the secret, but consider this….

Using Fast Fourier Transform (FFT) methods, I can grab the volume of any frequency at any time. Can this provide me with what note is playing? Well, maybe….but only if a single instrument is playing. Unfortunately many instruments are playing at many different frequencies, and many can bleed into (if not use) another’s frequency. I know some smart computer scientists have developed pitch detection – but it hasn’t made its way into any code libraries I know of yet (though I should look beyond Flash to see what I can find).
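Here is a small Python sketch of what “grabbing the volume of a frequency” means. To keep it short it is not a real FFT: it measures one frequency directly by correlating the samples against a sine and cosine at that frequency (a single DFT bin). The function name, sample rate, and test tones are assumptions for illustration.

```python
import math


def band_volume(samples, sample_rate, target_hz):
    """Estimate the amplitude of `target_hz` in `samples` (a single DFT bin, not a full FFT)."""
    real = 0.0
    imag = 0.0
    for n, x in enumerate(samples):
        angle = 2.0 * math.pi * target_hz * n / sample_rate
        real += x * math.cos(angle)
        imag -= x * math.sin(angle)
    # Scale so that a pure sine of amplitude A reads back as roughly A.
    return 2.0 * math.hypot(real, imag) / len(samples)


SAMPLE_RATE = 8000  # assumed sample rate in Hz
# Hypothetical mix: a loud 110 Hz "bass" tone plus a quieter 440 Hz "lead" tone.
mix = [
    1.0 * math.sin(2.0 * math.pi * 110 * n / SAMPLE_RATE)
    + 0.5 * math.sin(2.0 * math.pi * 440 * n / SAMPLE_RATE)
    for n in range(800)
]

for hz in (110, 220, 440):
    print(hz, round(band_volume(mix, SAMPLE_RATE, hz), 3))
```

This shows how loud 110 Hz and 440 Hz are, and that 220 Hz is essentially silent. It says nothing about which instrument produced each one, which is exactly the problem described above.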

Also, on a technical level, using MIDI files, if the music is transcribed correctly (or if it originated on a sequencer or a computer in the first place), we would have access to all the notes separated out into the different instrumental tracks.

So what if all this technical stuff presented no barrier?  How could we use it?

One of the most interesting parts of Dr. Levitin’s book for me was Appendix B: Chords and Harmony. Music, if you look at it in a very dull light, is all about patterns and manipulating people’s ears by breaking in and out of the predictability of those patterns.

Talking about pop music, there is a verse, a chorus, and a bridge.  A verse is normally a melody and tempo played over a few times.  At the end of the verse, we get into a similar thing with a chorus, which is generally shorter.  Going from chorus to verse breaks one pattern, but picks up another musical pattern which uses slightly different chords with a slightly different tempo.  So predictability is broken, but not very much.

Another way patterns are established is with long-standing musical tradition. The blues is somewhat defined by going from I Major to IV Major to I Major and then IV or V Major, and then back to I Major – this also happens to be the basis for rock music. Now, this is another established pattern – and when we break from this pattern, we’re introducing unpredictability.

The most interesting pattern Levitin mentions is when a chord is either resolved or unresolved. In Western music, our ears have been trained to consider a tritone, or an augmented 4th, the most unresolved interval we could possibly hear. In fact, as Levitin recounts, this interval was banned in the Catholic church and named “Diabolus in musica”, citing this interval as the work of the devil. On the other hand, a simple major chord is considered resolved. So when we play a chord containing a tritone, we expect, and almost demand, that it be resolved by something like a major chord.

Why do I bring these up?  Well, this type of musical behavior can start to paint a more visual or motion based picture.  When music breaks expectations either by changing tempo or with different chords, or even moving the notes up or down an octave – likewise our gameplay elements should match this level of broken predictability.

Likewise, when a chord is left unresolved – so too should our visuals on screen.  You know something is going to happen, but you don’t know what.

To spell it out more clearly, we can establish a visual and animated baseline in our games by listening to the beat of the music, and timing elements to this.  Assuming a pop song, our baseline is a song verse.

Deviating in minor ways while the verse is played, maybe by changing the tempo, going up an octave, or using different instrumentation, will produce a minor change in the visuals or animations.

Going from verse to chorus however is a less subtle change, and often implies changing the tempo, chords, or otherwise quite drastically.  In this, the gameplay needs to change in the same fashion.

Hitting a chord that needs to be resolved creates musical suspense – and so too should it create visual or gameplay suspense. Something on screen about to fall, or teetering from side to side.

Going back to technical implementation, however, assuming we had access to the entire score of notes in a musical piece, can we feel out these suspenseful moments or feel out when a song is predictable and when it becomes unpredictable?

I believe the answer is yes, but it would take quite a bit of work to look for augmented fourths, or other unresolved intervals, or to run pattern matching algorithms on our melodies.
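The interval-hunting half of that is at least easy to sketch. Here is a minimal Python example, assuming the score is already available as chords of MIDI note numbers (60 is middle C) with timestamps. That input format and the function names are assumptions for illustration. A tritone is six semitones, so the check is just a comparison on every pair of notes in a chord.

```python
from itertools import combinations

TRITONE = 6  # an augmented 4th spans six semitones


def has_tritone(chord):
    """chord: MIDI note numbers sounding together. True if any pair is a tritone apart."""
    return any(abs(a - b) % 12 == TRITONE for a, b in combinations(chord, 2))


def unresolved_moments(score):
    """score: list of (time_in_seconds, chord) pairs. Returns the times that contain a tritone."""
    return [time for time, chord in score if has_tritone(chord)]


# Hypothetical three-chord score.
score = [
    (0.0, [60, 64, 67]),      # C major (C, E, G): resolved
    (2.0, [67, 71, 74, 77]),  # G7 (G, B, D, F): B to F is a tritone
    (4.0, [60, 64, 67]),      # back to C major
]
print(unresolved_moments(score))  # [2.0]
```

The hard part is everything this skips: getting a clean score in the first place, and deciding which unresolved moments actually matter to a listener.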

A more pragmatic approach might run an FFT analysis to find beats, and possibly even do some light pattern matching to seek out changes in tempo. Meanwhile, do some manual markup of subtle changes in verse, and then manually mark where verse goes to chorus, chorus to verse, and where the bridge is. We can also manually mark where we’re creating suspense, and then where we resolve that suspense.

In this fashion, we could create a sort of music markup language to map a timeline of all these events and use this to create a musical gameplay experience.
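As a rough idea of what such a timeline could look like in code, here is a small Python sketch. The marker kinds, field names, and the look-ahead helper are all hypothetical. The point is only that once the events are written down with timestamps, a game can ask what is coming up in the next few seconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Marker:
    time: float      # seconds from the start of the song
    kind: str        # e.g. "beat", "section", "suspense", "resolve" (made-up vocabulary)
    label: str = ""  # optional detail, such as the section name


def upcoming(markers, now, lookahead):
    """Return the markers that fall after `now` and within `lookahead` seconds, in time order."""
    hits = [m for m in markers if now < m.time <= now + lookahead]
    return sorted(hits, key=lambda m: m.time)


# Hypothetical hand-marked timeline for the start of a pop song.
timeline = [
    Marker(0.0, "section", "verse"),
    Marker(0.5, "beat"),
    Marker(1.0, "beat"),
    Marker(14.0, "suspense"),
    Marker(16.0, "resolve"),
    Marker(16.0, "section", "chorus"),
]

# Twelve seconds in, what should the game start preparing for?
for marker in upcoming(timeline, now=12.0, lookahead=5.0):
    print(marker.time, marker.kind, marker.label)
```

Beats found automatically and sections marked by hand can live in the same list, so the game code doesn’t need to care where a marker came from.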

I guess my work is cut out for me!

January 25, 2009

Consumption of Music on Demand Part 2

Filed under: music video games — admin @ 10:47 pm

In part one of this post, I took a look at how digital video and audio recorders and portable devices have changed the way I consume and perceive media. What I’m most interested in is music – and I discussed how, because I can hold my entire music library in one place and get music from many sources free on demand, I’ve started listening to music on a very superficial level, and don’t give albums consecutive repeated listens.

If I have this problem – others probably do too, and may even be perfectly happy to be oblivious to this.

This is also a problem for musicians.  If people give music only superficial listens, and are just as happy to move onto the next song by a different artist, why would a musician have fans that bother coming to shows, buy t-shirts, or buy follow-up works by that musician?  Isn’t it easier just to turn on an internet stream and listen on shuffle than to seek out and buy an album or see who’s playing in your town on Friday night?  It’s as though being a fan is becoming a lot less fanatical.

My concern over this is that music will be written to appreciate on a superficial level – will only make you nod your head to a beat.  Of course it’s already been happening in pop music for years.  Musicians will only get their “radio friendly” hit song played wide-spread.  But many times, this radio-friendly song will be the gateway drug that leads you into the album, and get you to repeatedly listen to all songs by the artist.  Other songs on the album could provide deeper enjoyment.

Whether this problem is new and exacerbated by our digital devices, or an old problem that is just becoming apparent to me as I change my listening habits, I have to wonder what we can do to change these habits and find ways to make people want to listen over and over again and come to a deeper appreciation for the music and the artist. Listening to music over and over again can even make things more memorable. How many times has listening to a song for the first time in years brought you back to old times when you were listening to that song constantly? It can be very nostalgic.

I’ve been thinking a lot lately about music and video games (as you can infer from my blog posts). I think that games like Dance Dance Revolution, Guitar Hero, Rock Band, etc. do go quite a way in encouraging repeated listening habits in music. Pressing buttons in time to a beat, however, is just one direction to take this – I would call this a music creation game (even though you aren’t actually creating the music, there’s the illusion that you are). I’m quite interested in exploring a game where music is creating the gameplay environment that you’re in.

Another way to put this is I’d like to explore what games can be to music, in the way that music videos are to music. The best music videos, in my mind, produce another world where people break into singing and dance. A mailbox will spring to life on the street and dance with Bjork. Christopher Walken will dance his way down an escalator and fly through the air in grand choreographed moves. Weezer will give a show to screaming members of the Happy Days cast. You get my escapist point (and my Spike Jonze fixation). I’m not a musical theater fan, though Gilbert and Sullivan have a lot of experience with this too, and they go a little farther back than MTV. From what I understand, even the ancient Greeks had musical theater.

Bringing it back to games – a lot of great work has been put into cinematic gaming soundtracks, but to my knowledge, they are just soundtracks and don’t really change what happens in the game. A change in tempo doesn’t make more bad guys come out to hurt you, and a minor chord struck doesn’t usually signify coming doom. I shouldn’t say never. I’m not an avid gamer – I just haven’t ever heard of this attention to musical detail. In fact – it’s quite the opposite – more bad guys coming out will change the tempo of the music – and coming doom will change the music to a minor key. Game composers will typically write tiny segments that can be switched depending on what happens in the gameplay.

So, how can we produce a truly musical game? One that gives a level of escapism straight from Broadway and MTV, gives music a forum to be listened to again and again, and is fun? One where the music isn’t changed for its gameplay, but where gameplay revolves around the music?

(Yah, I’ll answer that question next, or at least try to)

January 11, 2009

Consumption of Music on Demand

Filed under: music video games — admin @ 8:09 pm

Tivo and iPods (or more generally, digital video recorders and digital audio players) have changed the way we consume media. It’s a pretty obvious statement I know – but hear me out. I want to talk about music, but let me first say something about tv/film.

I was a latecomer to both Battlestar Galactica and 24. I didn’t start watching either one until a couple seasons in. I recorded a bunch of episodes of 24 with my DVR, and bought a few seasons of Battlestar Galactica on DVD. Either way, I wound up watching entire seasons of a show in a week or two, in contrast to the six or nine months it usually takes to watch a season of a show.

Watching habits were changed from when the network demanded it to whenever I wanted to see it.  In this, you could almost argue that the story itself behind the show has changed.    There’s no more downtime between episodes to think and ponder, and be on the edge of your seat about what happens next.

In fact, my DVR, Netflix, and DVD collection have changed my attention span on shows when I do have to wait to access them. When I was watching my 24 marathon, I was really into it. Then, when I had to wait a week to see what happened, I was less into it. And now, I’m not even sure how much I care about 24 now that it’s been a whole year.

This is in stark contrast to how we used to watch TV. Either we were chained to our sets when our favorite shows were on, or our favorite shows just happened to be those that we had the specific nights free for.

The point is that having constant, on demand access to TV and film can ultimately change the story as we perceive it. Watching TV can become more like reading a book. You can rewatch/reread, read/watch when you want, and you don’t have to read/watch the whole chapter/episode in a sitting, or you can finish the entire thing in one sitting. This is especially true as we can watch video on our mobile devices now.

Music can be a little more complex.  In addition to being short form, where consumption of a song can take only 3 or 5 minutes, there’s not usually a cliff-hanger or a compelling reason to listen to the next song on the same album for most people.

This became apparent when your single disc CD player became a 5 disc CD changer back in the 90s. I found it was easier to put my 5 discs on shuffle to get a little variety in each music listening session.

My car was a major place that I’d listen to music, and it was especially great to have my music on long rides. My car, however, had just a single disc CD player. That means when I put a CD in, it would stay in. And because I was driving, it was a bit complicated to remove and change it unless there was a break in traffic. I also only had a limited selection to choose from, as I’d usually only take the last 5 or 6 CDs I bought with me on a normal car trip.

This means when I bought new music, I’d get very familiar with it.  I’d listen to it over and over again.

I suspect that most people’s music habits had the same limiting factors.  Either you listened to what the technology at the time allowed you to hear, or what the radio allowed you to hear – and each of these things led to the same thing, you’d listen to a song over and over again.

Of course, with digital audio players with lots of storage, this has all changed. I have a Microsoft Zune with 120 Gigabytes of storage capacity. This means that I can put my wife’s music collection and my music collection (probably 500-600 CDs) on my Zune in its entirety 3 or 4 times over. Many music players have at least 1 Gigabyte of memory, allowing 10-15 albums at a time.

Suddenly, it becomes more convenient just to put every song you own on shuffle than to find the last albums you bought to listen over and over again – and given people’s tendency towards variety, a big shuffled list sounds more appealing when it comes time to listen to music.

So, listening habits have changed – and so has how we perceive music as a result.

Have you ever listened to an album or song, and didn’t really like it at first, but maybe after listening to it for the 5th time or the 10th time it’s one of your favorites? It used to happen to me all the time. But it really doesn’t happen to me much anymore. Music listening has become sort of superficial to me. I buy some new music, listen to it once, and then it appears on shuffle sometimes as I’m buying newer music.

It’s somewhat of a personal goal to give my new music a better listen from now on, despite the fact that my entire collection fits in the palm of my hand.

So as things are getting long, I’m going to continue things in a second post where I think about how people can listen to music on a less superficial level despite the technology we have.

November 28, 2008

“PopFly” Reactive Music Game Prototype

Filed under: flash/flex,music video games,projects — admin @ 6:08 pm

It’s been a busy past month just getting a new job and a new car – I felt like I didn’t have time for anything. Well, it’s the Friday after Thanksgiving today, and I feel like I have all the time in the world this coming weekend, so I’m starting to get excited about my latest pet project.

I’ve been busy early this fall with writing “reactive music games”. If you read my blog, which you probably don’t, you’d know that I couldn’t shut up about music games all summer. I even did a little research into the different types. The reactive game is where music affects the things that happen in your game environment – but you don’t have to play along (as you do in Guitar Hero or Rock Band).

There are actually no popular music games I can think of that ARE reactive. But I like to think of reactive games really…. like an interactive music video. I even watched a bunch of Spike Jonze videos before I started sketching. I have 10-15 games to do in the short term, and most are designed to be pretty easy to develop and generic – ESPECIALLY since this summer I’ve been working on a mini-framework in Flash AS3 that listens to beats in music.

The first game I’ve done is basically an alpha, and if the music doesn’t load, just hit refresh (sometimes my security sandbox seems to go a little wacky). I’ve codenamed it “PopFly” in my sketches, ’cause the balls kinda pop and fly out with the beats.

I’ve tried to pay special attention to things that I’ve noticed I didn’t like in other games. Specifically, trying to tie visual events to music better. Balls pop out, and when they reach their peak, it’s not very tied to the music because it’s a split second after the beat. This is why I put a subtle glow burst at the bottom when the balls do pop out. I also gave the side gutters some ambient glow – just to have something additional on-screen that’s tied to the music in the game. I also made some bigger balls pop up when the volume gets high – and it nicely adds a little crazy mosh effect when the music goes crazy.

I can’t wait to get through some more games, and see where this takes me, and hopefully way down the line, do a nice in-depth game with some characters.

But for now – here’s the PopFly game
