Seumas's Programming Page
Copyright © 1998-2003,
True-color software blending:
Here are a couple of techniques for Alpha-Blending (mixing between source and destination based on an "Alpha" blend factor) and Additive-Blending (adding the source and destination colors using saturation arithmetic). They won't be as fast as MMX code, and probably not as fast as hand-written Assembly, but they should be pretty fast for optimized C. If you know of any different or better techniques, please let me know.
Alpha-Blending two pixels:
First, mask off each color component of the source and destination pixels. Multiply the source component by the Alpha value and the destination component by the Inverse Alpha value (the maximum Alpha minus the Alpha), then add the two products. Divide the sum by the maximum Alpha value (in this case 256, so the divide is just a bit shift of 8 to the right) and bit-and with the component mask again to clear any bits that spilled below the component. Finally, bit-or the three color components back together to produce the output pixel value. (Note: an unsigned char Alpha value only goes up to 255, so even if the alpha channel is all 255, the colors of the source image will be ever so slightly reduced from their full intensity.)
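The steps above can be sketched roughly as follows. This is a minimal sketch, not the original listing: the RGB565 masks, the function name alpha_blend, and the choice of passing alpha as a 0..256 integer are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical RGB565 masks; other non-overlapping 16-bit masks should work too. */
#define RED_MASK   0xF800u
#define GREEN_MASK 0x07E0u
#define BLUE_MASK  0x001Fu

/* Blend one masked component: (src*alpha + dst*inv) / 256, then re-mask. */
static uint32_t blend_component(uint32_t src, uint32_t dst,
                                uint32_t alpha, uint32_t mask)
{
    uint32_t inv = 256u - alpha;               /* inverse alpha */
    uint32_t sum = (src & mask) * alpha + (dst & mask) * inv;
    return (sum >> 8) & mask;                  /* divide by 256, drop stray low bits */
}

/* alpha ranges 0..256: 0 keeps the destination, 256 gives the pure source. */
static uint16_t alpha_blend(uint16_t src, uint16_t dst, uint32_t alpha)
{
    assert(alpha <= 256u);
    return (uint16_t)(blend_component(src, dst, alpha, RED_MASK) |
                      blend_component(src, dst, alpha, GREEN_MASK) |
                      blend_component(src, dst, alpha, BLUE_MASK));
}
```

With these masks, blending white (0xFFFF) over black at alpha 255 gives 0xF7DE rather than 0xFFFF, which is the slight intensity loss mentioned in the note above.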
Additive-Blending two pixels:
This is one of the methods I've figured out for Saturated (values don't wrap when they hit the top) Additive Blending. The biggest problem is the saturation arithmetic (why do CPUs seem to only have modulo arithmetic instructions?), which in the general case appears to be handled best by a test and branch. Luckily it is possible to test directly against the mask for the current color component: if the sum of the two masked components is greater than the mask, the component has overflowed and is set to the mask value. Addition also doesn't pollute the lower bits of the result (any carry only moves upward), so no additional bit-and with the mask is required. Depending on your C compiler, a slightly different arrangement of the operations might be faster.
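A minimal sketch of the test-against-the-mask idea, assuming RGB565 masks and 16-bit pixels widened to 32 bits so the top component has room to overflow. The masks and the names add_component and additive_blend are hypothetical, not taken from the original code.

```c
#include <stdint.h>

/* Hypothetical RGB565 masks; other non-overlapping 16-bit masks should work too. */
#define RED_MASK   0xF800u
#define GREEN_MASK 0x07E0u
#define BLUE_MASK  0x001Fu

/* Add one masked component and clamp it to the mask on overflow. */
static uint32_t add_component(uint32_t src, uint32_t dst, uint32_t mask)
{
    uint32_t sum = (src & mask) + (dst & mask);
    if (sum > mask)      /* overflowed past the top of this component */
        sum = mask;      /* saturate to the maximum component value */
    return sum;          /* low bits are untouched, so no re-mask needed */
}

static uint16_t additive_blend(uint16_t src, uint16_t dst)
{
    return (uint16_t)(add_component(src, dst, RED_MASK) |
                      add_component(src, dst, GREEN_MASK) |
                      add_component(src, dst, BLUE_MASK));
}
```

For example, adding 0x8410 to itself overflows all three components and saturates to 0xFFFF, while 0x0801 plus 0x0802 stays in range and gives 0x1003.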
Other forms of Blending:
Other forms of blending are also possible, such as multiplicative, subtractive, divisive, maximum, minimum, etc., though they are slower and/or more difficult to implement on variable-color-mask pixels, which are required for most 16-bit high-color graphics. If you can work with byte-per-component pixels (24-bit or 32-bit) then all the possible software blending modes become very easy and (relatively) fast to implement.
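To illustrate how simple these modes get with byte-per-component pixels, here is a rough sketch for 32-bit pixels. It assumes a 0x00RRGGBB layout with the top byte ignored. The function names are hypothetical, and the exact definitions (multiply normalized by 255, subtract as destination minus source clamped at zero) are one reasonable choice rather than the only one.

```c
#include <stdint.h>

/* Per-channel operations on 8-bit components (s = source, d = destination). */
static uint8_t blend_multiply(uint8_t s, uint8_t d) { return (uint8_t)((s * d) / 255); }
static uint8_t blend_subtract(uint8_t s, uint8_t d) { return (uint8_t)(d > s ? d - s : 0); }
static uint8_t blend_max(uint8_t s, uint8_t d)      { return s > d ? s : d; }
static uint8_t blend_min(uint8_t s, uint8_t d)      { return s < d ? s : d; }

typedef uint8_t (*channel_op)(uint8_t, uint8_t);

/* Apply one channel operation to the three color bytes of a 0x00RRGGBB pixel. */
static uint32_t blend_pixel(uint32_t src, uint32_t dst, channel_op op)
{
    uint32_t out = 0;
    for (int shift = 0; shift < 24; shift += 8) {
        uint8_t s = (uint8_t)(src >> shift);
        uint8_t d = (uint8_t)(dst >> shift);
        out |= (uint32_t)op(s, d) << shift;
    }
    return out;   /* the top byte is left as zero */
}
```

The function pointer keeps the sketch short. In a real inner loop you would more likely write each mode out directly so the compiler can inline it.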
In general, pixel blending is much slower than plain solid or transparent bitmap blitting, so for high frame rate applications such as games, it's best to limit blending to a small portion of the screen. With other applications such as image processing, the speed penalty for blending large bitmaps together isn't much of an issue, and you can go to town.
February 10th, 1999 Update:
An interesting speed-up for Alpha Blending, if you don't mind using a fixed alpha map (no interactive fade outs, etc.), is to pre-multiply the source bitmap RGB values by their corresponding Alpha values. That eliminates half of the multiplies per pixel, as you just have to multiply the destination by the inverse alpha and add the result to the pre-multiplied source. You could also store the inverse alpha in your source alpha channel to avoid having to calculate it per pixel. Thanks to Michael Tanczos at Game Programming '99.
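A rough sketch of the pre-multiplied approach, assuming RGB565 masks and a 0..256 alpha range. The names premultiply and blend_premultiplied are hypothetical. The first function would run once when the bitmap is loaded, the second once per pixel at draw time.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical RGB565 masks; other non-overlapping 16-bit masks should work too. */
#define RED_MASK   0xF800u
#define GREEN_MASK 0x07E0u
#define BLUE_MASK  0x001Fu

/* Scale one masked component by factor/256 and re-mask. */
static uint32_t scale_component(uint32_t pixel, uint32_t factor, uint32_t mask)
{
    return (((pixel & mask) * factor) >> 8) & mask;
}

/* One-time step: scale every source component by its alpha (0..256). */
static uint16_t premultiply(uint16_t src, uint32_t alpha)
{
    assert(alpha <= 256u);
    return (uint16_t)(scale_component(src, alpha, RED_MASK) |
                      scale_component(src, alpha, GREEN_MASK) |
                      scale_component(src, alpha, BLUE_MASK));
}

/* Per-pixel step: only the destination is multiplied, by the stored inverse alpha.
   Each component sum cannot exceed its mask, because alpha + inv_alpha == 256. */
static uint16_t blend_premultiplied(uint16_t pre_src, uint16_t dst, uint32_t inv_alpha)
{
    assert(inv_alpha <= 256u);
    return (uint16_t)(pre_src + (scale_component(dst, inv_alpha, RED_MASK) |
                                 scale_component(dst, inv_alpha, GREEN_MASK) |
                                 scale_component(dst, inv_alpha, BLUE_MASK)));
}
```

The whole-pixel add at the end is safe only when inv_alpha is exactly 256 minus the alpha used in premultiply. If the two can drift apart, the components would need to be added and clamped separately.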
April 4th, 1999 Update:
What's a good way to say "gee, that seems so obvious in retrospect"? :) Thanks to Thomas Mauer and Matias Ignacio Suarez Ornani for pointing out that you can remove half of the multiplies per pixel without any pre-processing steps by doing the following: dest = dest + (source - dest) * alpha, where the product is divided by the maximum alpha value as before. It's basically taking the difference between the source and the dest, scaling that by the alpha value, and adding it to the dest, using only one multiply per color channel. It takes a few more bit-ands when using arbitrary color masks than the previous code, but the drop in multiplies really wins out (a multiply takes 9 clocks on a Pentium, while a bit-and takes half a clock).
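One possible arrangement of the single-multiply formula is sketched below, again assuming RGB565 masks and a 0..256 alpha. The names are hypothetical. The difference (source - dest) can be negative, so the sketch uses signed arithmetic and shifts the destination up by 8 before adding, which keeps the value being shifted right non-negative. That detail is an assumption of this sketch, not something the formula requires.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical RGB565 masks, written as signed constants for signed math. */
#define RED_MASK   0xF800
#define GREEN_MASK 0x07E0
#define BLUE_MASK  0x001F

/* dest + (source - dest) * alpha / 256 for one masked component. */
static int32_t lerp_component(int32_t src, int32_t dst, int32_t alpha, int32_t mask)
{
    int32_t s = src & mask;
    int32_t d = dst & mask;
    /* (d << 8) + (s - d) * alpha equals s*alpha + d*(256 - alpha), so it is >= 0. */
    return (((d << 8) + (s - d) * alpha) >> 8) & mask;
}

/* alpha ranges 0..256: 0 keeps the destination, 256 gives the pure source. */
static uint16_t alpha_blend_one_multiply(uint16_t src, uint16_t dst, int32_t alpha)
{
    assert(alpha >= 0 && alpha <= 256);
    return (uint16_t)(lerp_component(src, dst, alpha, RED_MASK) |
                      lerp_component(src, dst, alpha, GREEN_MASK) |
                      lerp_component(src, dst, alpha, BLUE_MASK));
}
```

Arranged this way, the result matches the two-multiply formula bit for bit, with one multiply per color channel instead of two.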
Comments? Questions? Know of a good programming page? Send me email!
Copyright 1998-1999 by Seumas McNally.