Blastanova

March 24, 2010

Pixel Bender Roundup

Filed under: Uncategorized — admin @ 11:46 pm

Hey everyone,

I just gave a Pixel Bender presentation at our local Adobe User Group yesterday.  Here’s a roundup of the blog posts I wrote as I was collecting my thoughts:

First I wrote about the problems I wanted to solve, and why Pixel Bender seemed like the right choice:

http://www.blastanova.com/blog/2010/03/19/pixel-bending-for-speed-in-flash-and-flex/

Next, an overview and intro of Pixel Bender:

http://www.blastanova.com/blog/2010/03/24/intro-to-pixel-bender/

And then, a post about using Pixel Bender for data instead of images:

http://www.blastanova.com/blog/2010/03/24/using-pixel-bender-to-process-lots-of-data/

Finally, some speed tests along with a sample application I wrote to run the speed tests:

http://www.blastanova.com/blog/2010/03/24/pixel-bender-speed-tests/

Happy Pixel Bending!

Pixel Bender Speed Tests

Filed under: Uncategorized — admin @ 11:38 pm

This is my last in a series of blog posts on Pixel Bender.  I’ll probably be doing one more PxB post when I upgrade to Swiz 1.0 and we’re talking about PxB processors with Swiz, but until then….

This post focuses mostly on speed: how Pixel Bender compares with doing similar operations in plain ActionScript.

Pixel Bender is fast pixel by pixel, but some operations make PxB less efficient.  In my tests, a few specific operations make these inefficiencies much worse.

The worst offender is sampling.  In PxB, sampling one pixel is fine.  But the more pixels you sample as you evaluate one pixel, the more this problem is exacerbated.

Why sample more than one pixel?  Well, a blur effect is a great example.  When you blur, you’re probably averaging the center pixel and the 8 surrounding pixels.  Or downsampling an image…you’d be sampling a whole mess of surrounding pixels and reducing to 1 pixel.  In my audio visualization application, I’m not reasonably going to view 2.5 million pixels.  No, I’m going to reduce down to something that can be displayed – maybe a few thousand, depending on how far I zoom.

In any of these scenarios, I’m sampling more than one pixel.  The more pixels I sample, the worse the performance gets.  In my experiments, it can get to the point where PxB is no better than just doing things in ActionScript.  In these cases, at least Pixel Bender has the whole multi-threaded feature going for it, so your UI performance won’t suffer.

Another horrible aspect of sampling more than one pixel is that PxB for Flash doesn’t support loops and functions.  So, if your kernel needs to sample 100 neighboring pixels, you have to write the 100 lines of code to go with it.
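Since the repeated lines follow a pattern, one way around the typing is to generate them with a small script and paste the result into the kernel.  Here is a rough Python sketch of that idea.  The helper name `unrolled_average`, the accumulator name `acc`, and the choice to average horizontal neighbors starting at the current pixel are all assumptions made for illustration, not something taken from the demo application.

```python
def unrolled_average(count, image="src", acc="acc"):
    """Return Pixel Bender source lines that average `count` horizontal neighbors."""
    lines = [f"float4 {acc} = float4(0.0, 0.0, 0.0, 0.0);"]
    for i in range(count):
        # Offsets are written as floats (0.0, 1.0, ...) because the kernel
        # language does no automatic int-to-float conversion.
        lines.append(
            f"{acc} += sampleNearest({image}, outCoord() + float2({float(i)}, 0.0));"
        )
    lines.append(f"dst = {acc} / {float(count)};")
    return lines


if __name__ == "__main__":
    # Print a small unrolled body; use 100 for the case described above.
    print("\n".join(unrolled_average(4)))
```

The generated text is only the body of evaluatePixel; it still has to be pasted into a kernel and compiled in the toolkit.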

For audio processing, sampling surrounding pixels is the whole point.  The key is finding the right number of pixels to sample where you get a good average, but still have your Pixel Bender shader running lean and mean.

With that said, here’s a sample application that demos performing math operations in Pixel Bender vs performing the same operations in ActionScript 3.

Pixel Bender Speed Test Application

Keep in mind that all the loading of the MP3 and the extraction of the byte array happen at the very start of the application.  These actually take a fairly long time (almost a full second to extract the audio), but are a necessary evil whether you use Pixel Bender or not.  Let’s look at the results:

Simple Copy Operation

Take data, and return the same data (but make sure to copy each and every value to a byte array one at a time)

Flash AS3: 1.715 seconds

PxB: 0.475 seconds

Power of 2 Operation (Square a Value)

Take each value, square it, and copy into a new byte array

Flash AS3: 2.81 seconds

PxB: 0.825 seconds

Multiply by 12.54 (a random number I picked for this test case)

Multiply each number by a value and copy to a new byte array

Flash AS3: 1.747 seconds

PxB: 0.495 seconds

Do a Complex Math Operation

Take each value, multiply by 12.54, divide by 100, take the sin of that, and then take the square root of the result.

Flash AS3: 3.53 seconds

PxB: 0.713 seconds

Average 10, 20, 50, and 100 values

This is where Pixel Bender should start getting less effective, as we are sampling more and more surrounding pixels.

Average 10 – Flash AS3: 3.727 seconds

Average 10 – PxB: 0.878 seconds

Average 20 – Flash AS3: 3.954 seconds

Average 20 – PxB: 1.153 seconds

Average 50 – Flash AS3: 3.613 seconds

Average 50 – PxB: 2.047 seconds

Average 100 – Flash AS3: 3.636 seconds

Average 100 – PxB: 3.469 seconds

As you can see, Pixel Bender does VERY well until you start sampling more and more neighbors.  As we reach 100 neighboring pixels sampled, there really is no speed advantage to Pixel Bender.  Fortunately though, we’re still talking about an operation on a different thread that won’t slow down the UI.  So even if it does take 3 and a half seconds, it’s 3 and a half seconds of time that our UI is as snappy as ever.
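To put numbers on that, here is a small Python sketch that turns the timings listed above into speedup ratios.  The timings are copied from the results; the `TIMINGS` table and the `speedup` helper are just illustrative names.

```python
# Timings in seconds, copied from the results above: (Flash AS3, Pixel Bender).
TIMINGS = {
    "copy": (1.715, 0.475),
    "square": (2.81, 0.825),
    "multiply": (1.747, 0.495),
    "complex math": (3.53, 0.713),
    "average 10": (3.727, 0.878),
    "average 20": (3.954, 1.153),
    "average 50": (3.613, 2.047),
    "average 100": (3.636, 3.469),
}


def speedup(as3_seconds, pxb_seconds):
    """How many times faster the Pixel Bender run was than the AS3 run."""
    return as3_seconds / pxb_seconds


if __name__ == "__main__":
    for name, (as3, pxb) in TIMINGS.items():
        print(f"{name:>12}: {speedup(as3, pxb):.2f}x")
```

The ratio peaks at roughly 5x for the complex math test and falls to about 1.05x for the 100-value average.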

Remember before when I talked about extracting your audio?  Well, you don’t have to grab 60 seconds all at once and lock up your UI for a full second.  You could grab less, and process the audio in increments.  With PxB, you can just keep running shader job after shader job.  You’ll have to determine what’s best for your UI vs how fast you want the operation to complete.
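As a rough illustration of processing in increments, this Python sketch splits a song into fixed-length chunks and reports the byte range of each one.  It assumes the extracted audio is 44,100 sample frames per second with a left and a right 4 byte float per frame; `chunk_ranges` is a hypothetical helper, not part of the demo application.

```python
SAMPLE_RATE = 44100      # sample frames per second
BYTES_PER_FRAME = 2 * 4  # left + right channel, one 4 byte float each (assumed)


def chunk_ranges(total_seconds, chunk_seconds):
    """Yield (start_byte, end_byte) for each chunk of extracted audio."""
    total_frames = int(total_seconds * SAMPLE_RATE)
    frames_per_chunk = int(chunk_seconds * SAMPLE_RATE)
    for start in range(0, total_frames, frames_per_chunk):
        # The last chunk may be shorter than the others.
        end = min(start + frames_per_chunk, total_frames)
        yield start * BYTES_PER_FRAME, end * BYTES_PER_FRAME


if __name__ == "__main__":
    for start, end in chunk_ranges(200, 60):
        print(start, end)
```

Each range could feed one shader job; smaller chunks mean shorter pauses but more jobs to run.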

Another thing that I’ve hidden (though I didn’t mean to) in this demo application is the time it takes to create the Shader.  I’ve found that it takes around one second to create a Shader.  This means that before you use a Pixel Bender kernel, you must take 1 second of unresponsive UI time to have the shader created.  Creating a ShaderJob, on the other hand, takes about a millisecond.  So you could spend that 1 additional second once, at the beginning of your application, to initialize the shader.  It’s one second that AS3-only code never has to spend.  However, the more ShaderJobs you run to split up the work, the more this 1 second becomes a moot point.

An additional note of interest is the shaderJob.start method.  By default, the parameter you pass in here is “false”.  This boolean indicates whether to “waitForCompletion”.  If you don’t wait for completion, then you are running the job asynchronously – meaning you assign a listener for the complete event and wait for the operation to finish while your code does other things.  If you specify the “waitForCompletion” flag as true, then all Flash operations will halt until your shader job finishes, and you don’t have to set a complete listener because the next line of your AS3 code will just execute after the job completes.  I’ve read that this can give you a slight speed increase, but I haven’t had this experience.  Plus it will lock up your Flash code execution until it completes, meaning your UI will be unresponsive.

And that completes my series of posts on Pixel Bender for now.  Kevin Goldsmith at Adobe has been doing some pretty cool stuff as well.  He does audio processing live, or rather each time Flash reaches into the audio buffer to get the next sound samples.  So he actually uses Pixel Bender to alter audio AS YOU PLAY IT.  So that’s very cool, check it out here.

There is much more to this Pixel Bender stuff.  Find it online, and especially read up on the Pixel Bender Twitter account.  For now though, I’m done.  Happy Bending!

Using Pixel Bender to Process Lots of Data

Filed under: Uncategorized — admin @ 11:38 pm

In my previous post, I talked about using Pixel Bender in the way it was intended.  By this, I mean to input an image, change the pixels in some way, and output an image.  As you might be aware, this can happen very quickly and be used to create some great run time visual effects in Flash and Flex.

What piqued MY interest in PxB though, was the fact that you didn’t have to send it an image to process.  It could be any ByteArray object.  My interest was a little bit more specific – a ByteArray extracted from a flash.media.Sound object.  At 44,100 samples per second, with typical MP3 files being 180-200 seconds, we’re talking about pushing something on the order of 10 million samples.

My project involves visualizing the waveform of a song, so I want to process all of these samples at once, and as quickly as I can.

So, let’s look at how you would normally apply a PxB shader.  The following example is taken from Mike Chambers’ blog:

import flash.display.Shader;
import flash.filters.*;
import flash.utils.ByteArray;

//the file that contains the binary bytes of the PixelBender filter
[Embed("testfilter.pbj", mimeType="application/octet-stream")]
private var TestFilter:Class;

private function onApplicationComplete():void {
    //Pass the loaded filter to the Shader as a ByteArray
    var shader:Shader = new Shader(new TestFilter() as ByteArray);
    //"amount" is a parameter defined by this particular kernel
    shader.data.amount.value = [100];
    var filter:ShaderFilter = new ShaderFilter(shader);

    //add the filter to the image
    im.filters = [filter];
}

So, Mike’s got the right idea here. He creates a shader, makes a shader filter out of it, and then applies it as a filter to the image.

We’re going to be doing something slightly different. Instead of a ShaderFilter, we’ll be creating a ShaderJob. And we’ll also need to tell our ShaderJob certain things that our ShaderFilter knew automatically. Specifically, we’re talking about the height, width, and input source of the image. After setting these properties you add an event listener to the job, and tell it to start.

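import flash.display.ShaderJob;
import flash.events.Event;
import flash.utils.ByteArray;

var output:ByteArray = new ByteArray();
data.position = 0;

var channels:int = 4;       // floats per pixel (image4)
var bytesPerFloat:int = 4;  // each sample is a 4 byte float
var width:int = 2000;
// assigning to an int truncates, so a partial last row is dropped
var height:int = data.length / (width * channels * bytesPerFloat);

shader.data.src.width = width;
shader.data.src.height = height;
shader.data.src.input = data;

var shaderJob:ShaderJob = new ShaderJob(shader, output, width, height);
shaderJob.addEventListener(Event.COMPLETE, onComplete, false, 0, true);
shaderJob.start();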

OK, I’ll back up…it’s a lot to start with. Let’s talk about the input data first – our byte array.  Because I’m talking specifically here about a byte array extracted from sound, I know that my byte array is a long list of 4 byte floating point numbers.  This particular list of numbers alternates from left channel to right channel, but that’s neither here nor there until we get into the Pixel Bender kernel.

So, now we’re talking the language of Pixel Bender – a byte array of 4 byte floats.  Typically, we’re dealing with 4 of these 4 byte floats per pixel (formerly known as red, green, blue, and alpha).  I believe that PxB is SUPPOSED to accept image1, image2, image3, or image4 type inputs, however I’ve read there are bugs associated with doing anything less than image3 as the input.  Even if we wanted to take advantage of an image2 – think about it.  PxB is designed to process a pixel as fast as it can.  Why not take advantage and create the largest pixel we can?  Little decisions like this multiply in big ways across your entire data set, and you could be talking about saving hundreds of milliseconds.

So, each pixel is 16 bytes.  In my sample code, I’m calling out 4 “channels”.  This is an important variable when thinking about our height and width.

Let’s talk a little about height and width now.  As you know, an image has a height and width.  But how does this apply to a one dimensional list of items for PxB?

It all comes down to speed and optimization.  You could certainly give your data input a height of 1 and a width of 8192 (the height and width limit) and treat your data as a one-dimensional list.  It’s certainly easier to picture that way, but then you wouldn’t be taking advantage of PxB’s built in speed and optimization.

Pixel Bender is multi-threaded, multi-core, and GPU enabled.  While that last part doesn’t hold true for Flash, the first two parts are important.  Not only can PxB itself operate on a different thread and CPU core, so can each row of data.  So, PxB can be processing many rows of pixels all at the same time.  If you only do one row per job (a height of 1), you’re throwing away most of the speed benefits of Pixel Bender!  The data itself will wrap around to the next row, just like a text field wraps text to the next row.
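To picture the wrapping, here is a short Python sketch that maps a position in the flat list of floats to a pixel column, row, and channel.  It assumes 4 floats per pixel and rows filled left to right, top to bottom, like the text field comparison above; `locate` is an illustrative name.

```python
CHANNELS = 4  # floats per pixel (an image4 input)


def locate(float_index, width):
    """Map a position in the flat float list to (x, y, channel)."""
    pixel_index, channel = divmod(float_index, CHANNELS)
    y, x = divmod(pixel_index, width)
    return x, y, channel


if __name__ == "__main__":
    # With a width of 2000, float 8000 is the first value of the second row.
    print(locate(8000, 2000))
```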

How do you know what width and height to give your data?  Well, with a possible width and height of 8,192, you’re giving yourself a possible 67 million pixels to work with in one job.  Each one of these pixels holds 4 channels.  Which means, in one job you can process 268 million floating point numbers.

Earlier I had talked about each MP3 file being around 10 million samples.  I’m personally conducting my own speed tests, but I’ve found that processing 60 seconds of an MP3 file at a time is a little better overall for speed.  So we’re only talking about 2.5 million samples that I would personally process at a time.  So, now the question becomes: how can we reasonably distribute our data as an image?  Well, if we max out our width, we may not be producing many rows.  As a result, speed may be reduced because we’re not utilizing multiple threads as efficiently as we could.  With my input data, I’ve found that a width of 2000 works well for my application.  Your mileage may vary, so feel free to try different things.  Just remember, you’ll want to take advantage of many rows for PxB optimization.

And then of course, you can work out the height from your width and the overall length of your byte array.
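Here is that calculation as a Python sketch.  It assumes 4 floats per pixel, 4 bytes per float, and the 8,192 limit on each side; `job_dimensions` is a hypothetical helper, and the 60 second stereo input in the example is an assumption about the extracted data.

```python
CHANNELS = 4         # floats per pixel
BYTES_PER_FLOAT = 4
MAX_SIDE = 8192      # width and height limit mentioned above


def job_dimensions(byte_length, width=2000):
    """Return (width, height, leftover_bytes) for one job."""
    if not 0 < width <= MAX_SIDE:
        raise ValueError("width out of range")
    bytes_per_row = width * CHANNELS * BYTES_PER_FLOAT
    # Only whole rows count toward the height; the remainder is reported.
    height, leftover = divmod(byte_length, bytes_per_row)
    if height > MAX_SIDE:
        raise ValueError("too much data for one job at this width")
    return width, height, leftover


if __name__ == "__main__":
    sixty_seconds = 60 * 44100 * 2 * BYTES_PER_FLOAT  # stereo floats (assumed)
    print(job_dimensions(sixty_seconds))
```

The leftover value shows how many bytes fall outside the last full row, which is worth knowing if you need every sample processed.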

After all has been said and done, you set the event listener, start the job, and wait for it to complete.

Next up, I’ll be posting some speed tests!

Intro to Pixel Bender

Filed under: flash/flex,music video games — admin @ 11:37 pm

So this is my intro to using Pixel Bender – if you don’t know what it is or why to use it check out the documentation or my first post.

Actually the documentation is a good place to start.  Go to http://www.adobe.com/devnet/pixelbender/ before you begin.  Go on….I’ll wait…

Once there you’ll want to download the Pixel Bender Toolkit.  It’s a simple and light program – not subject to the lengthy installs of other Adobe software.  Don’t bother downloading the PDF documentation, there are links in the help menu to these documents once you run the toolkit.

The best part of the documentation – which actually turns into the most depressing part once you learn it – is that you can ignore around half of it if you’re developing Flash shaders.  Most of the more advanced functionality only applies to Photoshop and After Effects shaders.

So crack open the toolkit!  What to do now?  Well first, go to the file menu and choose “new kernel”.  Kernels are basically the “programs” you’re creating that compile to shaders.  A new kernel will look like this:

<languageVersion : 1.0;>

kernel NewFilter
<
    namespace : "Your Namespace";
    vendor : "Your Vendor";
    version : 1;
    description : "your description";
>
{
    input image4 src;
    output pixel4 dst;

    void
    evaluatePixel()
    {
        dst = sampleNearest(src,outCoord());
    }
}

Don’t worry about that top part – it’s just noting the author of the script.

So the first thing to worry about is the pair of variables at the top. There’s “image4 src” and “pixel4 dst”. You might guess that you’re defining a source image and a destination image. But what’s up with the funny syntax?

Well, first of all, Pixel Bender is one of those languages where the data type is in front of the variable. So you have a variable “src” of type image4, and a variable “dst” of type pixel4. Image and pixel datatypes might make sense. The “4” is what threw me off at first, but don’t worry, it makes sense.

PxB mainly deals in floating point numbers. And no automatic type conversion! Doing float var = 2 is no good, but doing float var = 2.0 is OK. There are basically 4 types of floating point numbers: float, float2, float3, and float4. A float4 is basically an array of 4 floating point numbers. Example: float4 myfloat = float4(2.0, 2.0, 2.0, 2.0);

It only goes up to 4. Once you realize that the main point of PxB is to manipulate pixels, you begin to see why. Red + Green + Blue + Alpha = 4 channels, which fit in an array of 4 floating point numbers.

I’ve found that I can use pixels and floats interchangeably (maybe I’m wrong). Images are reserved for an entire image comprised of pixel4s/float4s. Pixel Bender also supports integers and booleans (each with 4 or fewer values).

OK that was my rant on variables. Let’s move on to “evaluatePixel”. This is THE method that everything PxB revolves around. In fact, in Flash, you can’t even create other methods in your kernel (PS and AE allow this though).

Every PxB kernel is designed to do one thing and one thing only. Take in a source image, go pixel by pixel, and create an output image pixel by pixel from the “evaluatePixel” method.

That’s easy enough to understand – but what about that wonky syntax they start you off with?

dst = sampleNearest(src,outCoord());

So, let’s work from the inner to the outer, starting with outCoord(). This method gets the current coordinates that Pixel Bender is analyzing at that moment. Hint: it’s a float2, containing both X and Y values.

That was the easy part – the hard part is “sampleNearest()”. For us Flash folks, this introduces you to the wholly confusing notion of sampling on half pixels and pixel ratios and other such nonsense. But then you realize that it’s Flash, and all pixels are square, and sampling a pixel samples the entirety of the pixel. At this point you realize that sampleNearest is just how PxB works – but it’s entirely unnecessary for Flash.

So in other image editing software (and apparently this holds true especially for video), pixels don’t have to be square. Pixels can have a different height and width, giving them an aspect ratio, which you can actually check for in PxB.

But then there’s PxB…it will scan each pixel in the image AS IF THEY WERE square. So you end up with coordinates that could be x:4.56, y:1.567. When you do “sampleNearest”, you’re sampling the pixel whose center is nearest to these fractional values, so you end up with nice locked-in PxB world coordinates like x:4.5, y:1.5. You can also call “sampleLinear”, which blends the surrounding pixels when you ask for something that falls between pixel centers.

Betcha feel smart now, don’t you? Well forget everything you just learned. If you are doing things in Flash, all pixels are square, and all pixels match to the PxB world coordinate system perfectly. So “sampleNearest” is just something you have to do to get the red, green, blue, and alpha values of the pixel.

So – in the end…you’re just taking the pixel you’ve come to, evaluating the 4 channels, and dumping those right back into the destination pixels. In other words, you’re doing nothing.

At this point though, it becomes easy to start manipulating an image. Go to the file menu again, and load an image. PxB has some sample ones to use.

Now change dst = sampleNearest(src,outCoord()); to:

dst = sampleNearest(src,outCoord()) * float4(0.25, 1.0, 1.0, 1.0);

Now click “run”

Congratulations! You just went into every pixel and turned the red down to 25%.
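If the float4 multiplication feels abstract, here is a tiny Python sketch of what happens to one pixel.  The `multiply` helper and the sample pixel values are made up for illustration; the kernel does the same thing to every pixel in the image.

```python
def multiply(pixel, factors):
    """Component-wise product, like float4 * float4 in a kernel."""
    if len(pixel) != len(factors):
        # Mixing vector sizes is an error in Pixel Bender as well.
        raise ValueError("mismatched vector sizes")
    return tuple(p * f for p, f in zip(pixel, factors))


if __name__ == "__main__":
    rgba = (0.8, 0.5, 0.25, 1.0)  # red, green, blue, alpha
    print(multiply(rgba, (0.25, 1.0, 1.0, 1.0)))
```

Only the first component (red) changes; green, blue, and alpha are multiplied by 1.0 and pass through untouched.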

How about a weird cross-hatch type effect?

dst = sampleNearest(src,outCoord()) * float4(sin(outCoord()[0] * 4.0), cos(outCoord()[1] * 4.0), sin(outCoord()[0] * 4.0), 1.0);

Play around, try different things. If you break anything, Pixel Bender will give you red error messages of varying usefulness on the right side.

The hardest thing to get used to is always typing numbers with decimals and usually performing operations not with one set of numbers but with a set of 4. If you run into any trouble, keep asking yourself these two questions:

  1. Am I performing a mathematical operation on two different data types?  Float2 * Float4 = Error!
  2. Am I performing a mathematical operation on a float using a number with no “point zero” on the end?  Float * 2 = Error!  Float * 2.0 = Good!

So that’s the basics of PxB!  You can also work with surrounding pixels if you like.  Just add or subtract X and/or Y to your outCoord(), and sample that pixel.  Combine and average surrounding pixels to get a blur effect, for example.

Here’s an example of taking a big image, and downsampling the image to a tiny corner in the upper left of the destination:

        // Accumulate a 3x3 block of source pixels centered on outCoord() * 9.0
        float2 center = outCoord() * float2(9.0, 9.0);
        float4 colorAccumulator = float4(0.0,0.0,0.0,0.0);
        colorAccumulator += sampleNearest(src, center + float2(-1.0, -1.0));
        colorAccumulator += sampleNearest(src, center + float2(0.0, -1.0));
        colorAccumulator += sampleNearest(src, center + float2(1.0, -1.0));
        colorAccumulator += sampleNearest(src, center + float2(-1.0, 0.0));
        colorAccumulator += sampleNearest(src, center);
        colorAccumulator += sampleNearest(src, center + float2(1.0, 0.0));
        colorAccumulator += sampleNearest(src, center + float2(-1.0, 1.0));
        colorAccumulator += sampleNearest(src, center + float2(0.0, 1.0));
        colorAccumulator += sampleNearest(src, center + float2(1.0, 1.0));

        // Divide by the number of samples to get the average color
        dst = colorAccumulator/9.0;
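For comparison, the same 3x3 average can be sketched in Python on a plain grid of numbers, where loops are allowed.  This is only an illustration: it works on a single channel, `box_average` is a made-up name, and clamping at the edges is a choice made here, not a statement about how Pixel Bender treats out-of-range samples.

```python
def box_average(image, x, y):
    """Average the 3x3 neighborhood around (x, y), clamping at the edges."""
    height, width = len(image), len(image[0])
    total = 0.0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            # Clamp so that neighbors outside the grid reuse the edge value.
            sx = min(max(x + dx, 0), width - 1)
            sy = min(max(y + dy, 0), height - 1)
            total += image[sy][sx]
    return total / 9.0


if __name__ == "__main__":
    grid = [[1.0, 2.0, 3.0],
            [4.0, 5.0, 6.0],
            [7.0, 8.0, 9.0]]
    print(box_average(grid, 1, 1))
```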

My next post will be about taking Pixel Bender and using it for non-image data processing. Stay tuned!

March 19, 2010

Pixel Bending for Speed in Flash and Flex

Filed under: flash/flex,music video games — admin @ 1:17 am

Lately, as I’ve been progressing with a Flex 4 based application which is largely an audio visualizer, I’ve felt the pain of slow response times from my UI.

The slow response is due to the fact that I’m loading a 3 minute MP3, extracting the audio to a byte array, and then processing the data in that byte array.  Consider a 180 second song with 44,100 samples per second.  That’s 7.9 million numbers to process as fast as I can.
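The arithmetic is easy to check with a quick Python sketch.  `sample_count` is just an illustrative helper, and the stereo case is an assumption about counting both channels separately.

```python
SAMPLE_RATE = 44100  # samples per second, per channel


def sample_count(seconds, channels=1):
    """Number of samples in a clip of the given length."""
    return seconds * SAMPLE_RATE * channels


if __name__ == "__main__":
    print(sample_count(180))     # one channel
    print(sample_count(180, 2))  # both channels counted (assumed stereo)
```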

This has caused me to do some complicated things.  The best of my efforts entailed processing the audio in segments on every enterframe handler.  I’d tell flash to process a reasonable amount of data, stop, and then process more the next round until everything was finished.  If I remember correctly, this took around 20-25 seconds or so, and STILL made my UI very sluggish as I was processing the data.  And of course this was in my development environment – in real life, a user would be waiting even more time while an MP3 loads.

So, I had two basic problems.  The first was that data takes too long to process.  I can hide this potentially with a good user experience – after all people expect that loading a large file can take some time.  But this leads to my second problem – ActionScript runs on one thread.  Only one thing can happen at a time.  If I’m processing audio, I can’t be updating/refreshing my UI.  And the whole application becomes unresponsive.  The more responsive I make my UI, the less data I process in a frame – but this just makes the whole darn thing take longer to process. Worse yet, if I process the whole thing in one go, Flash can time out from inactivity at 15 seconds.

Basically it boils down to running it all at once, and potentially doing this:

or this….

Fortunately, there are two Flash technologies that had the promise of helping me out.  First there’s Alchemy.  Alchemy is an Adobe research project that compiles C++ code to a Flash SWC.  Supposedly it can run code up to 10x faster than code compiled from ActionScript.

But, I haven’t touched C++ in a while, and still had the problem of an unresponsive UI for the time it takes to process the audio (whatever that time is).  It was tempting, but I had my sights set on trying Pixel Bender.

Pixel Bender is a cross-product Adobe technology.  It runs in Flash, Photoshop, and After Effects.  Basically it allows developers to write their own image filters.  In Flash, you can apply this image filter to images, animations, video, components…..and well anything that displays on the stage.

The best part?  It can run in different threads, on different CPU cores, and on your GPU.  Well, actually, strike that last part….you can’t run it in Flash on your GPU, but Photoshop and After Effects are cool.

This means that you can run a Pixel Bender shader on an image, and your UI doesn’t slow down.  Maybe you wrote something insanely complicated, well…then your PxB shader will suffer performance, but your UI won’t!

It doesn’t stop there – you don’t even have to have an image as your…erm…source image.  Yes, PxB assumes red, green, blue, and alpha channels.  But you can easily lie to PxB, and have it assume that your custom data is RGBA data.

This brings me full circle back to my problem.  I have something that supposedly processes data quickly, and on a different thread.  Hooray!  In fact, there are a few projects going on that use PxB for audio processing already.

Come to the March RDAUG meeting on Tuesday to find out more!  I’ll be presenting, and quickly talking about what I covered here, but also diving into the nitty gritty.  You know….how to actually use this stuff in your work.

I’ll also be following up this blog post with a second part next week covering what I went over in my presentation.
