from: http://jacksondunstan.com/articles/1864
Speed Up Alpha Textures With Stage3D By 4x
Now that we know how to use textures with an alpha channel in rendering Stage3D
scenes, let’s see if we can cut the performance cost so we can use them more often. Today’s article will show some tricks to optimize your rendering loop.
The following test app started with the test app from last time and has some modifications made to it:
- New option to switch between “original” and “fast” sorting
- Rendering is now in two stages. First, view frustum culling is done to form a Vector of visible objects. Second, the visible objects are drawn.
- Opaque texture and sorting options removed for simplicity’s sake
- enableErrorChecking no longer set on the Context3D
The “fast” sorting option is at the heart of this article’s optimization. You’ll recall that using alpha textures necessitates a back-to-front sort of the 3D objects in the scene. There are two ways that the “fast” sorting option speeds this up:
- Use Skyboy’s fastSort rather than Vector.sort to sort the 3D objects on a cached “distance from camera” field of the cube
- Sort only the 3D objects that pass the view frustum culling step. Don’t bother sorting objects that will never be drawn.
Both of these are important optimizations, but the second is the major algorithmic change. Here’s the difference between the “original” and “fast” sorts: (pseudo-code)
    ///////////
    // Original
    ///////////

    // Sort all cubes
    allCubes.sort(backToFront);

    // Draw all cubes that are in the view frustum
    for each (cube in allCubes)
    {
        if (cube.isInViewFrustum())
        {
            draw(cube);
        }
    }

    ///////
    // Fast
    ///////

    // Make a list of all cubes that are in the view frustum
    visibleCubes = [];
    for each (cube in allCubes)
    {
        if (cube.isInViewFrustum())
        {
            visibleCubes.push(cube);
        }
    }

    // Sort just those cubes
    visibleCubes.sort(backToFront);

    // Draw the sorted, visible cubes
    for each (cube in visibleCubes)
    {
        draw(cube);
    }
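The pseudo-code above can be tried outside of Flash with a small sketch. This one is in TypeScript rather than ActionScript 3, and it makes some loud assumptions: the `Cube` shape, the `isVisible` test (a simple distance cutoff standing in for `camera.isSphereInFrustum`), and the function names are all hypothetical and are not part of the test app.

```typescript
// Hypothetical stand-ins for the article's Cube and camera position.
interface Vec3 { x: number; y: number; z: number; }
interface Cube extends Vec3 { camDist: number; }

// Squared distance from the camera (no square root needed for ordering).
function distSq(p: Vec3, cam: Vec3): number {
  const dx = p.x - cam.x, dy = p.y - cam.y, dz = p.z - cam.z;
  return dx * dx + dy * dy + dz * dz;
}

// ASSUMPTION: a distance cutoff replaces the real view frustum test.
function isVisible(cube: Cube, cam: Vec3, range: number): boolean {
  return distSq(cube, cam) <= range * range;
}

// "Original": sort every cube back-to-front, then cull while drawing.
function originalOrder(all: Cube[], cam: Vec3, range: number) {
  const sorted = all.slice().sort((a, b) => distSq(b, cam) - distSq(a, cam));
  const drawn = sorted.filter((c) => isVisible(c, cam, range));
  return { drawn, sortedCount: sorted.length };
}

// "Fast": cull first, cache the distance, then sort only the survivors.
function fastOrder(all: Cube[], cam: Vec3, range: number) {
  const visible: Cube[] = [];
  for (const cube of all) {
    if (isVisible(cube, cam, range)) {
      cube.camDist = distSq(cube, cam);
      visible.push(cube);
    }
  }
  visible.sort((a, b) => b.camDist - a.camDist);
  return { drawn: visible, sortedCount: visible.length };
}
```

Both functions should produce the same back-to-front draw list. The difference is `sortedCount`: the original version always sorts every cube, while the fast version sorts only the cubes that survive culling.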
There are two main “wins” here. First, sorting fewer 3D objects is clearly going to be faster. Second, good comparison-based sorting algorithms take on the order of N * log2(N) steps, where N is the number of objects to sort. So each 3D object that’s being sorted adds more than one step to the sort, and the cost of each extra object grows as the number of 3D objects increases.
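To put rough numbers on the N * log2(N) estimate, here is a small TypeScript sketch. The function name is hypothetical and the result is only an order-of-magnitude estimate, not a count of what `Vector.sort` or fastSort actually do. The 4096 figure is a made-up example of one eighth of the cubes being visible.

```typescript
// Rough step estimate for a comparison sort: N * log2(N).
// This is an order-of-magnitude model, not a measurement.
function estimatedSortSteps(n: number): number {
  return n <= 1 ? 0 : Math.round(n * Math.log2(n));
}

const allCubes = 32 * 32 * 32;       // 32768 cubes in the test app
const visibleExample = allCubes / 8; // 4096, a hypothetical visible count

console.log(estimatedSortSteps(allCubes));       // 491520
console.log(estimatedSortSteps(visibleExample)); // 49152
```

Under this model, sorting one eighth of the cubes takes about one tenth of the steps, so the saving is somewhat better than linear.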
Now let’s take a look at the test app:
    package
    {
        import skyboy.utils.fastSort;
        import com.adobe.utils.*;
        import flash.display.*;
        import flash.display3D.*;
        import flash.display3D.textures.*;
        import flash.events.*;
        import flash.geom.*;
        import flash.text.*;
        import flash.utils.*;

        /**
        * Test of faster ways of drawing alpha textures with Stage3D
        * @author Jackson Dunstan, http://JacksonDunstan.com
        */
        public class FasterAlphaTextures extends Sprite
        {
            /** UI Padding */
            private static const PAD:Number = 5;

            /** Number of cubes per dimension (X, Y, Z) */
            private static const NUM_CUBES:int = 32;

            /** Number of total cubes */
            private static const NUM_CUBES_TOTAL:int = NUM_CUBES*NUM_CUBES*NUM_CUBES;

            /** Positions of all cubes' vertices */
            private static const POSITIONS:Vector.<Number> = new <Number>[
                // back face - bottom tri
                -0.5, -0.5, -0.5,   -0.5, 0.5, -0.5,   0.5, -0.5, -0.5,
                // back face - top tri
                -0.5, 0.5, -0.5,   0.5, 0.5, -0.5,   0.5, -0.5, -0.5,

                // front face - bottom tri
                -0.5, -0.5, 0.5,   -0.5, 0.5, 0.5,   0.5, -0.5, 0.5,
                // front face - top tri
                -0.5, 0.5, 0.5,   0.5, 0.5, 0.5,   0.5, -0.5, 0.5,

                // left face - bottom tri
                -0.5, -0.5, -0.5,   -0.5, 0.5, -0.5,   -0.5, -0.5, 0.5,
                // left face - top tri
                -0.5, 0.5, -0.5,   -0.5, 0.5, 0.5,   -0.5, -0.5, 0.5,

                // right face - bottom tri
                0.5, -0.5, -0.5,   0.5, 0.5, -0.5,   0.5, -0.5, 0.5,
                // right face - top tri
                0.5, 0.5, -0.5,   0.5, 0.5, 0.5,   0.5, -0.5, 0.5,

                // bottom face - bottom tri
                -0.5, -0.5, 0.5,   -0.5, -0.5, -0.5,   0.5, -0.5, 0.5,
                // bottom face - top tri
                -0.5, -0.5, -0.5,   0.5, -0.5, -0.5,   0.5, -0.5, 0.5,

                // top face - bottom tri
                -0.5, 0.5, 0.5,   -0.5, 0.5, -0.5,   0.5, 0.5, 0.5,
                // top face - top tri
                -0.5, 0.5, -0.5,   0.5, 0.5, -0.5,   0.5, 0.5, 0.5
            ];

            /** Texture coordinates of all cubes' vertices */
            private static const TEX_COORDS:Vector.<Number> = new <Number>[
                // back face - bottom tri
                1, 1,   1, 0,   0, 1,
                // back face - top tri
                1, 0,   0, 0,   0, 1,

                // front face - bottom tri
                0, 1,   0, 0,   1, 1,
                // front face - top tri
                0, 0,   1, 0,   1, 1,

                // left face - bottom tri
                0, 1,   0, 0,   1, 1,
                // left face - top tri
                0, 0,   1, 0,   1, 1,

                // right face - bottom tri
                1, 1,   1, 0,   0, 1,
                // right face - top tri
                1, 0,   0, 0,   0, 1,

                // bottom face - bottom tri
                0, 0,   0, 1,   1, 0,
                // bottom face - top tri
                0, 1,   1, 1,   1, 0,

                // top face - bottom tri
                0, 1,   0, 0,   1, 1,
                // top face - top tri
                0, 0,   1, 0,   1, 1
            ];

            /** Triangles of all cubes */
            private static const TRIS:Vector.<uint> = new <uint>[
                2, 1, 0,    // back face - bottom tri
                5, 4, 3,    // back face - top tri
                6, 7, 8,    // front face - bottom tri
                9, 10, 11,  // front face - top tri
                12, 13, 14, // left face - bottom tri
                15, 16, 17, // left face - top tri
                20, 19, 18, // right face - bottom tri
                23, 22, 21, // right face - top tri
                26, 25, 24, // bottom face - bottom tri
                29, 28, 27, // bottom face - top tri
                30, 31, 32, // top face - bottom tri
                33, 34, 35  // top face - top tri
            ];

            [Embed(source="flash_logo_alpha.png")]
            private static const TEXTURE:Class;

            private static const TEMP_DRAW_MATRIX:Matrix3D = new Matrix3D();

            private var context3D:Context3D;
            private var vertexBuffer:VertexBuffer3D;
            private var vertexBuffer2:VertexBuffer3D;
            private var indexBuffer:IndexBuffer3D;
            private var program:Program3D;
            private var texture:Texture;
            private var camera:Camera3D;
            private var cubes:Vector.<Cube> = new Vector.<Cube>();

            private var fps:TextField = new TextField();
            private var lastFPSUpdateTime:uint;
            private var lastFrameTime:uint;
            private var frameCount:uint;
            private var driver:TextField = new TextField();
            private var draws:TextField = new TextField();

            private var tempCameraPosX:Number;
            private var tempCameraPosY:Number;
            private var tempCameraPosZ:Number;

            private var fastSorting:Boolean;
            private var visibleCubes:Vector.<Cube> = new <Cube>[];

            public function FasterAlphaTextures()
            {
                stage.align = StageAlign.TOP_LEFT;
                stage.scaleMode = StageScaleMode.NO_SCALE;
                stage.frameRate = 60;

                var stage3D:Stage3D = stage.stage3Ds[0];
                stage3D.addEventListener(Event.CONTEXT3D_CREATE, onContextCreated);
                stage3D.requestContext3D(Context3DRenderMode.AUTO);
            }

            protected function onContextCreated(ev:Event): void
            {
                // Setup context
                var stage3D:Stage3D = stage.stage3Ds[0];
                stage3D.removeEventListener(Event.CONTEXT3D_CREATE, onContextCreated);
                context3D = stage3D.context3D;
                context3D.configureBackBuffer(
                    stage.stageWidth,
                    stage.stageHeight,
                    0,
                    true
                );

                // Setup camera
                camera = new Camera3D(
                    0.1, // near
                    100, // far
                    stage.stageWidth / stage.stageHeight, // aspect ratio
                    40*(Math.PI/180), // vFOV
                    -6, -8, 6, // position
                    0, 0, 0, // target
                    0, 1, 0 // up dir
                );

                // Setup cubes
                for (var i:int; i < NUM_CUBES; ++i)
                {
                    for (var j:int = 0; j < NUM_CUBES; ++j)
                    {
                        for (var k:int = 0; k < NUM_CUBES; ++k)
                        {
                            cubes.push(new Cube(i*2, j*2, -k*2));
                        }
                    }
                }

                // Setup UI
                fps.background = true;
                fps.backgroundColor = 0xffffffff;
                fps.autoSize = TextFieldAutoSize.LEFT;
                fps.text = "Getting FPS...";
                addChild(fps);

                driver.background = true;
                driver.backgroundColor = 0xffffffff;
                driver.text = "Driver: " + context3D.driverInfo;
                driver.autoSize = TextFieldAutoSize.LEFT;
                driver.y = fps.height;
                addChild(driver);

                draws.background = true;
                draws.backgroundColor = 0xffffffff;
                draws.text = "Getting draws...";
                draws.autoSize = TextFieldAutoSize.LEFT;
                draws.y = driver.y + driver.height;
                addChild(draws);

                var buttonsTopY:Number = makeButtons(
                    "Move Forward", "Move Backward", null,
                    "Move Left", "Move Right", null,
                    "Move Up", "Move Down", null,
                    "Yaw Left", "Yaw Right", null,
                    "Pitch Up", "Pitch Down", null,
                    "Roll Left", "Roll Right"
                );

                var fastSortingCB:Sprite = makeCheckBox(
                    "Fast Sorting?:",
                    fastSorting,
                    onFastSortingChecked
                );
                fastSortingCB.x = PAD;
                fastSortingCB.y = buttonsTopY - fastSortingCB.height - PAD;
                addChild(fastSortingCB);

                var assembler:AGALMiniAssembler = new AGALMiniAssembler();

                // Vertex shader
                var vertSource:String = "m44 op, va0, vc0\nmov v0, va1\n";
                assembler.assemble(Context3DProgramType.VERTEX, vertSource);
                var vertexShaderAGAL:ByteArray = assembler.agalcode;

                // Fragment shader
                var fragSource:String = "tex oc, v0, fs0 <2d,linear,mipnone>";
                assembler.assemble(Context3DProgramType.FRAGMENT, fragSource);
                var fragmentShaderAGAL:ByteArray = assembler.agalcode;

                // Shader program
                program = context3D.createProgram();
                program.upload(vertexShaderAGAL, fragmentShaderAGAL);

                // Setup buffers
                vertexBuffer = context3D.createVertexBuffer(36, 3);
                vertexBuffer.uploadFromVector(POSITIONS, 0, 36);
                vertexBuffer2 = context3D.createVertexBuffer(36, 2);
                vertexBuffer2.uploadFromVector(TEX_COORDS, 0, 36);
                indexBuffer = context3D.createIndexBuffer(36);
                indexBuffer.uploadFromVector(TRIS, 0, 36);

                // Setup textures
                var bmd:BitmapData = (new TEXTURE() as Bitmap).bitmapData;
                texture = context3D.createTexture(
                    bmd.width,
                    bmd.height,
                    Context3DTextureFormat.BGRA,
                    true
                );
                texture.uploadFromBitmapData(bmd);

                // Start the simulation
                addEventListener(Event.ENTER_FRAME, onEnterFrame);
            }

            private function makeButtons(...labels): Number
            {
                var curX:Number = PAD;
                var curY:Number = stage.stageHeight - PAD;
                for each (var label:String in labels)
                {
                    // A null label starts a new row of buttons
                    if (label == null)
                    {
                        curX = PAD;
                        curY -= button.height + PAD;
                        continue;
                    }

                    var tf:TextField = new TextField();
                    tf.mouseEnabled = false;
                    tf.selectable = false;
                    tf.defaultTextFormat = new TextFormat("_sans");
                    tf.autoSize = TextFieldAutoSize.LEFT;
                    tf.text = label;
                    tf.name = "lbl";

                    var button:Sprite = new Sprite();
                    button.buttonMode = true;
                    button.graphics.beginFill(0xF5F5F5);
                    button.graphics.drawRect(0, 0, tf.width+PAD, tf.height+PAD);
                    button.graphics.endFill();
                    button.graphics.lineStyle(1);
                    button.graphics.drawRect(0, 0, tf.width+PAD, tf.height+PAD);
                    button.addChild(tf);
                    button.addEventListener(MouseEvent.CLICK, onButton);
                    if (curX + button.width > stage.stageWidth - PAD)
                    {
                        curX = PAD;
                        curY -= button.height + PAD;
                    }
                    button.x = curX;
                    button.y = curY - button.height;
                    addChild(button);

                    curX += button.width + PAD;
                }

                return curY - button.height;
            }

            public static function makeCheckBox(
                label:String,
                checked:Boolean,
                callback:Function,
                labelFormat:TextFormat=null): Sprite
            {
                var sprite:Sprite = new Sprite();

                var tf:TextField = new TextField();
                tf.autoSize = TextFieldAutoSize.LEFT;
                tf.text = label;
                tf.background = true;
                tf.backgroundColor = 0xffffff;
                tf.selectable = false;
                tf.mouseEnabled = false;
                tf.setTextFormat(labelFormat || new TextFormat("_sans"));
                sprite.addChild(tf);

                var size:Number = tf.height;

                var background:Shape = new Shape();
                background.graphics.beginFill(0xffffff);
                background.graphics.drawRect(0, 0, size, size);
                background.x = tf.width + PAD;
                sprite.addChild(background);

                var border:Shape = new Shape();
                border.graphics.lineStyle(1, 0x000000);
                border.graphics.drawRect(0, 0, size, size);
                border.x = background.x;
                sprite.addChild(border);

                var check:Shape = new Shape();
                check.graphics.lineStyle(1, 0x000000);
                check.graphics.moveTo(0, 0);
                check.graphics.lineTo(size, size);
                check.graphics.moveTo(size, 0);
                check.graphics.lineTo(0, size);
                check.x = background.x;
                check.visible = checked;
                sprite.addChild(check);

                sprite.addEventListener(
                    MouseEvent.CLICK,
                    function(ev:MouseEvent): void
                    {
                        checked = !checked;
                        check.visible = checked;
                        callback(checked);
                    }
                );

                return sprite;
            }

            private function onButton(ev:MouseEvent): void
            {
                var mode:String = ev.target.getChildByName("lbl").text;
                switch (mode)
                {
                    case "Move Forward":
                        camera.moveForward(1);
                        break;
                    case "Move Backward":
                        camera.moveBackward(1);
                        break;
                    case "Move Left":
                        camera.moveLeft(1);
                        break;
                    case "Move Right":
                        camera.moveRight(1);
                        break;
                    case "Move Up":
                        camera.moveUp(1);
                        break;
                    case "Move Down":
                        camera.moveDown(1);
                        break;
                    case "Yaw Left":
                        camera.yaw(-10);
                        break;
                    case "Yaw Right":
                        camera.yaw(10);
                        break;
                    case "Pitch Up":
                        camera.pitch(-10);
                        break;
                    case "Pitch Down":
                        camera.pitch(10);
                        break;
                    case "Roll Left":
                        camera.roll(10);
                        break;
                    case "Roll Right":
                        camera.roll(-10);
                        break;
                }
            }

            private function onFastSortingChecked(checked:Boolean): void
            {
                // Use the check box's state directly instead of toggling blindly
                fastSorting = checked;
            }

            /** Compare function for Vector.sort: farthest from the camera first */
            private function sortByCameraDistance(a:Cube, b:Cube): int
            {
                var deltaX:Number = a.posX - tempCameraPosX;
                var deltaY:Number = a.posY - tempCameraPosY;
                var deltaZ:Number = a.posZ - tempCameraPosZ;
                var aDist:Number = deltaX*deltaX + deltaY*deltaY + deltaZ*deltaZ;

                deltaX = b.posX - tempCameraPosX;
                deltaY = b.posY - tempCameraPosY;
                deltaZ = b.posZ - tempCameraPosZ;
                var bDist:Number = deltaX*deltaX + deltaY*deltaY + deltaZ*deltaZ;

                return bDist - aDist;
            }

            private function sortFast(): void
            {
                // Cache camera position
                tempCameraPosX = camera.positionX;
                tempCameraPosY = camera.positionY;
                tempCameraPosZ = camera.positionZ;

                // Only add cubes that pass frustum culling to visible list
                var numVisibleCubes:int;
                visibleCubes.length = 0;
                for each (var cube:Cube in cubes)
                {
                    if (camera.isSphereInFrustum(cube.sphere))
                    {
                        visibleCubes[numVisibleCubes++] = cube;

                        // Compute distance of cube to camera
                        var deltaX:Number = cube.posX - tempCameraPosX;
                        var deltaY:Number = cube.posY - tempCameraPosY;
                        var deltaZ:Number = cube.posZ - tempCameraPosZ;
                        cube.camDist = deltaX*deltaX + deltaY*deltaY + deltaZ*deltaZ;
                    }
                }

                // Sort all visible cubes
                fastSort(visibleCubes, "camDist", Array.NUMERIC);
            }

            private function sortOriginal(): void
            {
                // Sort all cubes
                tempCameraPosX = camera.positionX;
                tempCameraPosY = camera.positionY;
                tempCameraPosZ = camera.positionZ;
                cubes.sort(sortByCameraDistance);

                // Only add cubes that pass frustum culling to visible list
                var numVisibleCubes:int;
                visibleCubes.length = 0;
                for each (var cube:Cube in cubes)
                {
                    if (camera.isSphereInFrustum(cube.sphere))
                    {
                        visibleCubes[numVisibleCubes++] = cube;
                    }
                }
            }

            private function onEnterFrame(ev:Event): void
            {
                // Set up rendering
                context3D.setProgram(program);
                context3D.setVertexBufferAt(0, vertexBuffer, 0, Context3DVertexBufferFormat.FLOAT_3);
                context3D.setVertexBufferAt(1, vertexBuffer2, 0, Context3DVertexBufferFormat.FLOAT_2);
                context3D.setTextureAt(0, texture);
                context3D.clear(0.5, 0.5, 0.5);
                context3D.setBlendFactors(
                    Context3DBlendFactor.SOURCE_ALPHA,
                    Context3DBlendFactor.ONE_MINUS_SOURCE_ALPHA
                );

                // Cull and sort
                var beforeCullingTime:int = getTimer();
                if (fastSorting)
                {
                    sortFast();
                }
                else
                {
                    sortOriginal();
                }
                var afterCullingTime:int = getTimer();

                // Draw visible cubes
                var worldToClip:Matrix3D = camera.worldToClipMatrix;
                var drawMatrix:Matrix3D = TEMP_DRAW_MATRIX;
                var numDraws:int;
                for each (var cube:Cube in visibleCubes)
                {
                    cube.mat.copyToMatrix3D(drawMatrix);
                    drawMatrix.prepend(worldToClip);
                    context3D.setProgramConstantsFromMatrix(
                        Context3DProgramType.VERTEX,
                        0,
                        drawMatrix,
                        false
                    );
                    context3D.drawTriangles(indexBuffer, 0, 12);
                    numDraws++;
                }
                context3D.present();

                // Update stat displays
                draws.text = "Draws: " + numDraws + " / " + NUM_CUBES_TOTAL
                    + " (" + (100*(numDraws/NUM_CUBES_TOTAL)).toFixed(1) + "%)\n"
                    + "Culling Time: " + (afterCullingTime-beforeCullingTime);

                frameCount++;
                var now:int = getTimer();
                var elapsed:int = now - lastFPSUpdateTime;
                if (elapsed > 1000)
                {
                    var framerateValue:Number = 1000 / (elapsed / frameCount);
                    fps.text = "FPS: " + framerateValue.toFixed(1);
                    lastFPSUpdateTime = now;
                    frameCount = 0;
                }
                lastFrameTime = now;
            }
        }
    }

    import flash.geom.*;

    /** A cube in the scene, with a bounding sphere for frustum culling */
    class Cube
    {
        private static var NEXT_ID:int = 0;

        public var id:int = NEXT_ID++;
        public var posX:Number;
        public var posY:Number;
        public var posZ:Number;
        public var mat:Matrix3D;
        public var sphere:Vector3D;
        public var camDist:Number;

        public function Cube(x:Number, y:Number, z:Number)
        {
            posX = x;
            posY = y;
            posZ = z;

            mat = new Matrix3D(
                new <Number>[
                    1, 0, 0, x,
                    0, 1, 0, y,
                    0, 0, 1, z,
                    0, 0, 0, 1
                ]
            );

            // Bounding sphere: center in x/y/z, radius in w
            sphere = new Vector3D(x, y, z, 2);
        }
    }
I ran this test app in the following environment:
- Flex SDK (MXMLC) 4.6.0.23201, compiling in release mode (no debugging or verbose stack traces)
- Release version of Flash Player 11.2.202.235
- 2.4 GHz Intel Core i5
- Mac OS X 10.7.4
- NVIDIA GeForce GT 330M 256 MB
And here are the results I got:
Cubes Drawn | Original Culling Time (ms) | Fast Culling Time (ms) |
32768 | 53 | 30 |
0 | 40 | 10 |
These two tests show the two optimizations in full effect. When all of the cubes are visible (first test), both approaches end up sorting all the cubes since they all pass the view frustum check. Therefore the only optimization being applied is the switch from sorting with Vector.sort (which uses a compare function) to sorting with Skyboy’s fastSort function (which uses a “distance from camera” field). This alone makes sorting nearly twice as fast as it otherwise was (53 versus 30).
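One plausible reason a cached field helps is that a compare function recomputes both distances on every comparison, while a cached key is computed once per object. The TypeScript sketch below counts distance calculations under each approach. It is not a reimplementation of fastSort, whose internals are not shown in this article. It still sorts through a comparator, only on the cached value, and all names in it are hypothetical.

```typescript
interface Vec3 { x: number; y: number; z: number; }
interface Item extends Vec3 { camDist: number; }

// Squared distance to the camera; bumps a counter so the work can be compared.
function distSq(p: Vec3, cam: Vec3, counter: { calls: number }): number {
  counter.calls++;
  const dx = p.x - cam.x, dy = p.y - cam.y, dz = p.z - cam.z;
  return dx * dx + dy * dy + dz * dz;
}

// Compare-function style: two distance calculations per comparison.
function sortWithCompare(items: Item[], cam: Vec3): number {
  const counter = { calls: 0 };
  items.sort((a, b) => distSq(b, cam, counter) - distSq(a, cam, counter));
  return counter.calls;
}

// Cached-key style: one distance calculation per item, then sort on the field.
function sortWithCachedKey(items: Item[], cam: Vec3): number {
  const counter = { calls: 0 };
  for (const item of items) {
    item.camDist = distSq(item, cam, counter);
  }
  items.sort((a, b) => b.camDist - a.camDist);
  return counter.calls;
}
```

For N items the cached version always performs exactly N distance calculations. The compare-function version performs two per comparison, which is at least 2 * (N - 1) and typically far more on unsorted input.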
The second case is where I’ve pointed the camera away from the cubes and none of them pass the view frustum check. In this case, zero cubes are being sorted in the “fast” method and all 32768 are being sorted in the “original” method. This results in a 3x speedup over the “fast” approach with all of the cubes present and a 4x speedup over the “original” method.
The above optimizations are just a couple of ways of improving performance when alpha textures are used in a 3D scene. If you have more techniques to suggest or have simply spotted a bug or have a suggestion, post a comment and let me know!
https://github.com/skyboy/AS3-Utilities/blob/master/skyboy/utils/fastSort.as