Why Starling (or any other 2D framework on top of Stage3D)?

from: http://blog.kaourantin.net/?p=138

 

Let me try to answer a pressing question from the community: Why did Adobe not accelerate the classic display list APIs to support the GPU instead of inventing a new API called Starling?

Well, we have done that (accelerating the classic display list APIs, that is) and we have learned a lot from it. In fact, we did it twice, with two completely different approaches:

Approach #1: Back in Flash Player 10 (early 2008) we introduced ‘wmode=gpu’, which accelerated compositing of display list objects using the GPU. It did so by pre-rendering display list objects on the CPU and uploading them as textures to the GPU, where they were then composited. It worked in some cases, but in the end we discovered that only a handful of sites were using this mode, as no one could figure out how to create faster content with it. Worse, in some cases it looked like the mode had been enabled by accident, as the site ran much faster in non-GPU mode. Designing content for GPUs is non-obvious, as I will outline below. For these reasons, and because GPU code is generally very expensive for Adobe to maintain, we decided to pull that rendering mode from Flash Player 10.3 and let it fall back to ‘wmode=direct’.

Approach #2: On mobile, which includes Android and iOS, we have ‘renderMode=gpu’. Unlike ‘wmode=gpu’ on the desktop, this mode renders vector graphics directly on the GPU using OpenGL ES 2. This mode is still available to you today on Android and iOS, and we see some content using it. Content which uses ‘renderMode=gpu’ successfully sticks to a very small subset of the classic display list APIs, one which looks eerily close to the subset Starling provides. And yet the overall cost inside the Flash Player is higher than if you just used Starling, because of the many layers involved in emulating some classic display list features. In short: you are likely better off using Starling for new content going forward.

So what is the problem with using the classic display list APIs? The essence is that the classic display list APIs were designed for a software renderer. They do not easily scale to being pushed to a GPU for faster rendering.

- The classic display list has many legacy features which are tied to the specific way our software rasterizer works. That includes vector masks and scale-9, for instance. You will see that with Starling you will have to find a different way to get the same effects.

- A lot of other classic display list features cannot be easily expressed on a GPU without going through slow and complex code paths and, more importantly, without loss of fidelity. That includes blend modes, filters, some forms of transformations, and device text, among many others. In some of those cases we have to fall back to software. That makes creating well-performing SWF content difficult, to say the least. You need to understand exactly what happens under the hood of the Flash Player to get well-performing content. Documenting the exact behavior of the Flash Player without access to the actual Flash Player code is very difficult, as there are simply too many special cases. That documentation could be nothing more than the actual Flash Player code. And reading a large C++ code base might not be your thing either. ;-)

- GPUs like flat display hierarchies. Deeply nested MovieClips are a big no-no. You might think this could be easily optimized behind the scenes. I can tell you that without hints about the original application data structure layout this is not possible. It’s the classic problem where each additional abstract API layer in an application introduces more entropy, until in the end you are unable to figure out the original intent of the application, which is what you need in order to apply meaningful optimizations. I see too much content where excessive use of nested MovieClips makes it impossible to figure out what the content is actually doing on the screen.

Let me put this into an analogy you might be able to understand better: Let’s say the Flash Player had no APIs to draw strings or text, only APIs to draw individual characters. Drawing strings would be implemented by some AS3 code. OK, fine, but drawing individual characters is actually 10x slower than drawing complete strings for the internal Flash Player code. That means the Flash Player would have to guess in reverse what the original string or text was, which is expensive and sometimes not possible.
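To make the flat-hierarchy point concrete, here is a minimal TypeScript sketch. It is an illustrative model only: `DisplayNode`, `DrawCall`, `flatten` and `countBatches` are hypothetical names, not Flash Player or Starling APIs. It assumes nodes carry only an x/y offset, and it counts texture switches as a stand-in for GPU draw batches, which is a common GPU technique but not something the post itself describes.

```typescript
// Illustrative model only: these types are hypothetical, not Flash Player or Starling classes.
interface DisplayNode {
  name: string;
  x: number;            // offset relative to the parent
  y: number;
  texture?: string;     // only nodes that carry a texture are actually drawn
  children: DisplayNode[];
}

interface DrawCall {
  name: string;
  texture: string;
  x: number;            // absolute position after accumulating all parent offsets
  y: number;
}

// Walk the tree once, accumulating parent offsets, and emit a flat draw list.
function flatten(node: DisplayNode, parentX = 0, parentY = 0, out: DrawCall[] = []): DrawCall[] {
  const x = parentX + node.x;
  const y = parentY + node.y;
  if (node.texture !== undefined) {
    out.push({ name: node.name, texture: node.texture, x, y });
  }
  for (const child of node.children) {
    flatten(child, x, y, out);
  }
  return out;
}

// Consecutive draw calls that share a texture are counted as one batch.
function countBatches(calls: DrawCall[]): number {
  let batches = 0;
  let current: string | null = null;
  for (const call of calls) {
    if (call.texture !== current) {
      batches++;
      current = call.texture;
    }
  }
  return batches;
}

const scene: DisplayNode = {
  name: "root", x: 10, y: 20, children: [
    { name: "hud", x: 5, y: 5, children: [
      { name: "coin", x: 1, y: 2, texture: "atlas", children: [] },
      { name: "heart", x: 30, y: 2, texture: "atlas", children: [] },
    ] },
    { name: "background", x: 0, y: 0, texture: "bg", children: [] },
  ],
};

const calls = flatten(scene);        // coin at (16, 27), heart at (45, 27), background at (10, 20)
const batches = countBatches(calls); // 2: one for "atlas", one for "bg"
```

Every extra level of nesting adds work to this walk on every frame, and once real display list features (masks, filters, per-clip state) are attached to intermediate nodes, the runtime can no longer collapse the tree this simply. Content that is authored flat from the start avoids the problem.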

- GPUs like bitmaps. Rendering vectors either has to be done on the CPU, which means you incur texture upload costs for each frame, or it creates a lot of vertex data, which is a problem on mobile GPUs (and Intel desktop GPUs ;-) ). Rendering gradients has its own challenges, as pre-rendering a radial gradient into a bitmap can be faster than using pixel shader code on most GPUs. This seems counter-intuitive but makes sense once you realize that texture fetches are implemented in a dedicated part of the silicon, whereas a pixel shader has to run in the ALU.

- Mouse events are implemented with perfect hit testing in the classic display list API, i.e. it is based on the actual vector graphics shapes. If you have a circular vector shape as a button a mouse click will not activate that button unless it is within that circle. This makes sense on the desktop where you have a precise mouse cursor but is extremely wasteful on mobile where you really want to deal with simple large rectangular touch areas. Each additional computation cycle for detecting mouse hits increases the perceived lag of a SWF. What’s worse is that if you want to express large touch areas which extend over the graphic representation of the button you would do this by adding another MovieClip with a transparent vector rectangle to the display list which further impacts overall performance.

- The classic display list API is a giant state machine which needs to be evaluated for every frame. Just x/y translating an object can trigger expensive recalculations and re-rendering without you knowing it. The classic example here is cacheAsBitmap which is probably the most misunderstood and misused feature in the Flash runtime. With Starling the state changes from frame to frame are not hidden but plainly visible in ActionScript which means you have a chance to see what is actually going on.
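The hit-testing point in the list above can be sketched in a few lines of TypeScript. This is a simplified model under stated assumptions, not Flash Player or Starling code: `Circle`, `Rect`, `hitCircle`, `touchArea` and `hitRect` are hypothetical names, and a circle stands in for an arbitrary vector shape, whose exact test would cost far more than the one shown here.

```typescript
// Hypothetical shapes for illustration; not Flash Player or Starling types.
interface Circle { cx: number; cy: number; r: number; }
interface Rect { x: number; y: number; width: number; height: number; }

// Shape-accurate test: the point must lie inside the circle itself.
function hitCircle(c: Circle, px: number, py: number): boolean {
  const dx = px - c.cx;
  const dy = py - c.cy;
  return dx * dx + dy * dy <= c.r * c.r;
}

// Bounding box of the circle, grown by `padding` on every side for finger-sized targets.
function touchArea(c: Circle, padding: number): Rect {
  const side = 2 * (c.r + padding);
  return { x: c.cx - c.r - padding, y: c.cy - c.r - padding, width: side, height: side };
}

// Rectangle test: four comparisons, independent of how complex the artwork is.
function hitRect(r: Rect, px: number, py: number): boolean {
  return px >= r.x && px <= r.x + r.width && py >= r.y && py <= r.y + r.height;
}

const roundButton: Circle = { cx: 50, cy: 50, r: 20 };
const area = touchArea(roundButton, 10);              // x: 20, y: 20, width: 60, height: 60

// A tap near the corner of the button misses the circle but lands inside the touch area.
const tapX = 68;
const tapY = 68;
const exactHit = hitCircle(roundButton, tapX, tapY);  // false: 18*18 + 18*18 = 648 > 400
const rectHit = hitRect(area, tapX, tapY);            // true: 68 lies between 20 and 80
```

The rectangle is computed from the button's own geometry, so no extra transparent object has to be added to the scene just to enlarge the touch target.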

I could go on and on, but I hope this answers some questions of why we are offering Starling.

Long term, I hope that most games and multimedia content will move to Stage3D and use the classic display list for what it’s really good at, which is creating high fidelity vector graphics on the fly, rendering text, pixel processing and many other things. It certainly won’t go away, and we will continue to add features and optimize performance. If you have fixed graphics assets, it is usually better to bring these in externally as bitmaps and stick with Stage3D.
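As a rough illustration of why fixed assets are better handled as bitmaps, the TypeScript sketch below pre-renders a radial gradient once and then reads it back with a plain lookup. It is a CPU-side analogy, not GPU or Flash Player code: `GrayBitmap`, `radialGradientAt`, `prerenderGradient` and `sample` are hypothetical names, and the linear falloff is an arbitrary choice made for the example.

```typescript
// Hypothetical model: a grayscale "bitmap" stored as a flat array, one value per pixel.
interface GrayBitmap { size: number; pixels: Float32Array; }

// Evaluate the gradient analytically: 1 at the center, falling linearly to 0 at the edge.
function radialGradientAt(size: number, px: number, py: number): number {
  const center = (size - 1) / 2;
  const dx = px - center;
  const dy = py - center;
  const dist = Math.sqrt(dx * dx + dy * dy);
  return Math.max(0, 1 - dist / center);
}

// Pay the per-pixel math once, up front.
function prerenderGradient(size: number): GrayBitmap {
  const pixels = new Float32Array(size * size);
  for (let y = 0; y < size; y++) {
    for (let x = 0; x < size; x++) {
      pixels[y * size + x] = radialGradientAt(size, x, y);
    }
  }
  return { size, pixels };
}

// Afterwards each read is a plain lookup, loosely analogous to a texture fetch.
// Reads outside the bitmap fall back to 0.
function sample(bitmap: GrayBitmap, px: number, py: number): number {
  return bitmap.pixels[py * bitmap.size + px] ?? 0;
}

const gradient = prerenderGradient(5);
const centerValue = sample(gradient, 2, 2);  // 1
const midValue = sample(gradient, 2, 1);     // 0.5
const cornerValue = sample(gradient, 0, 0);  // 0
```

The trade-off is memory and an upload cost for the pre-rendered bitmap in exchange for no per-pixel math at draw time, which pays off when the asset does not change from frame to frame.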

I strongly believe that with Stage3D and Starling we are way ahead of other web technologies, which still have to go through the same learning experience we went through over the last 4 or so years.

 

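Summary:
Many people have been asking the question above (why not add GPU support directly to the display list?). In fact, Adobe has been exploring this path for several years.
1. Flash Player 10, released in 2008, added a ‘wmode=gpu’ option, which sent display objects pre-rendered on the CPU to the GPU as textures for compositing. In practice very few sites enabled it, because no one could figure out how to create faster content with it. Worse, in some cases CPU rendering turned out to be faster than GPU rendering (explained below). For these reasons, and because GPU support is expensive to maintain, Adobe removed the option in Flash Player 10.3 and fell back to ‘wmode=direct’.
2. On mobile devices, including Android and iOS, we provide ‘renderMode=gpu’. Unlike the desktop wmode setting, it renders vector graphics directly on the GPU through OpenGL ES 2. This mode is still available today, and some content uses it successfully.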

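So what is the problem with rendering the classic display list on the GPU? The classic display list was designed around software rendering, and it is not easy to make it faster simply by moving it to the GPU.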

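1. The classic display list offers features such as vector masks and scale-9 grids; in Starling you have to find other ways to get similar results.
2. Many classic display list features are hard to move to the GPU efficiently, including blend modes, filters, some transformations, and device text. For the parts that cannot be done on the GPU, we still have to fall back to software rendering.
3. GPUs handle deep nesting poorly. Nesting one MovieClip inside another is common in Flash, but it is a big no-no for GPU rendering. You might think this would be easy to optimize, but in practice it is not.
4. GPUs like bitmaps. Vector graphics either have to be rendered on the CPU, which means uploading textures every frame, or they generate a lot of vertex data, which is a problem on mobile GPUs and Intel desktop GPUs.
5. Mouse events in the classic display list use exact hit testing against the actual vector shapes, which is wasteful on mobile, where simple large rectangular touch areas are what you want.
6. The classic display list is a giant state machine that has to be evaluated on every frame.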

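I hope these explanations answer some readers' questions.
I hope that in the future most games and multimedia content will move to Stage3D, while the classic display list continues to be used for things like high fidelity vector graphics, text rendering, and pixel processing. The classic display list will certainly not go away; we will continue to add features and optimize performance.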

 

 
