Excerpted from: http://sourceforge.net/p/vector-agg/mailman/vector-agg-general/?viewmonth=200308
First of all, let me thank you for the suggestions, which were perfectly explained. The other thing is that the "rectangle" optimization doesn't make much sense compared with the "hline" one. It complicates things a lot for a meager gain; in fact, the low-level rectangle() uses hline(). I agree with you completely that solid polygons can be optimized, and that it can be done in the way you described.

But first let me tell you about the rasterization algorithm. Honestly, I don't completely understand the math (the calculating part, outline_aa::render_scanline), and I guess we'll have to ask David Turner to explain it if we need to. The algorithm consists of three phases: decomposing the path into square cells (pixels), sorting the cells, and "sweeping" the scanlines.
When you call rasterizer.move_to(), the rasterizer accumulates the cells (pixels) that are crossed by the path. It also calculates the coverage value for each cell (the area of the cell that the polygon covers). After that is done, all the cells are sorted by Y and then by X with a modified quick-sort (actually, I sort pointers to the cells). Finally, the render() method "sweeps" the sorted cells and converts them into a number of scanlines. It sums all the cells with the same coordinates in order to calculate the right cover value (this happens when a pixel is crossed more than once by different edges of the path). This guarantees correct cover values even if a complex polygon is so small that it falls into one pixel. This is the main advantage of the algorithm, because it allows you to render very thin objects correctly (simulating very thin lines with proper fading).

Function render() creates a scanline that consists of a number of horizontal spans and then calls renderer::render(), which actually draws the scanline. I see two major possibilities to optimize rendering.
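The sort and sweep phases described above can be illustrated with a minimal sketch. It is not AGG's code: the Cell and Pixel structs and the sweep() function are hypothetical names made up for illustration, std::sort stands in for the modified quick-sort, and the real area/cover arithmetic is left out. It only shows sorting pointers by Y and then X, and summing cells that land on the same pixel.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical cell record: pixel coordinates plus a coverage contribution.
// The cells AGG really stores carry more information than this.
struct Cell { int x; int y; int cover; };

// One output pixel with the coverage accumulated from all cells that hit it.
struct Pixel { int x; int y; int cover; };

// Sort pointers to the cells by Y, then by X, and merge cells that share
// the same coordinates by summing their cover values.
std::vector<Pixel> sweep(const std::vector<Cell>& cells)
{
    // Sort pointers rather than the cells themselves, as the post mentions.
    std::vector<const Cell*> sorted;
    sorted.reserve(cells.size());
    for (const Cell& c : cells) sorted.push_back(&c);

    std::sort(sorted.begin(), sorted.end(),
              [](const Cell* a, const Cell* b) {
                  return a->y != b->y ? a->y < b->y : a->x < b->x;
              });

    // After sorting, cells on the same pixel are adjacent, so one pass
    // is enough to accumulate them.
    std::vector<Pixel> out;
    for (const Cell* c : sorted) {
        if (!out.empty() && out.back().x == c->x && out.back().y == c->y)
            out.back().cover += c->cover;
        else
            out.push_back({c->x, c->y, c->cover});
    }
    return out;
}
```

With the cells {3,1,40}, {2,0,10}, {3,1,25}, {2,1,5} as input, the two cells at pixel (3,1) are merged into one entry, so the sweep produces three pixels ordered by scanline.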
1. Using agg::scanline_p32 instead of agg::scanline_u8. Here 'p' and 'u' refer to 'packed' and 'unpacked'. 'Packed' means that all the adjacent cells in the scanline that have an equal cover value are represented as one horizontal line with x, y, length, and cover. 'Unpacked' means that every cell in the scanline has its own cover value, even if they are all the same. So, the straightforward way is to use agg::scanline_p32 and to optimize renderer_scanline::render().

There are two caveats. First, it makes sense only if the area of the objects is large enough; rendering small glyphs is more efficient with agg::scanline_u8 because the code has fewer branches. Second, it works only for solid color filling. For gradients, images, and Gouraud shading we'll have to process the scanline pixel by pixel anyway. From this point of view using agg::scanline_p32 doesn't make much sense either.

Still, solid filling is a very common operation and it'd be very good to optimize it. We'll get the best result when the color is opaque. But we can optimize translucent colors too! Here we can use the fact that the background image is relatively coherent. For a solid span we calculate (alpha-blend) the first pixel and then check whether the next background pixel has the same color; if it does, we simply write the previously calculated value.

It's all very good, but it doesn't help much to speed up drawing thin strokes. In this case the distribution of the time spent is quite different. The most time-consuming operation is the quick-sort. You can ultra-optimize the scanline rendering but you won't achieve much, because the scanlines in this case are very short and they don't play any considerable role. There's a second possibility for optimization.
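To make item 1 concrete, here is a minimal sketch of the two ideas it describes: packing a run of equal cover values into one span, and reusing a blended value while the background stays the same. It is an illustration under assumptions, not agg::scanline_p32 or AGG's renderer: the Span struct and the functions pack_scanline() and blend_solid_span() are hypothetical names, and the blend works on a single 8-bit channel to keep it short.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical packed span: `len` adjacent pixels starting at `x`,
// all sharing one cover value.
struct Span { int x; int len; std::uint8_t cover; };

// Convert an unpacked scanline (one cover per pixel, first pixel at x0)
// into packed spans. Pixels with zero cover are skipped.
std::vector<Span> pack_scanline(int x0, const std::vector<std::uint8_t>& covers)
{
    std::vector<Span> spans;
    for (std::size_t i = 0; i < covers.size(); ++i) {
        const std::uint8_t c = covers[i];
        if (c == 0) continue;
        const int x = x0 + static_cast<int>(i);
        // Extend the last span if this pixel is adjacent and has equal cover.
        if (!spans.empty() && spans.back().cover == c &&
            spans.back().x + spans.back().len == x)
            ++spans.back().len;
        else
            spans.push_back({x, 1, c});
    }
    return spans;
}

// Blend a solid single-channel color `src` with opacity `alpha` over `len`
// background pixels. The blend is recomputed only when the background
// value differs from the previous pixel's original value.
void blend_solid_span(std::uint8_t* dst, int len,
                      std::uint8_t src, std::uint8_t alpha)
{
    bool have_cached = false;
    std::uint8_t cached_bg = 0;   // background value the cache was built from
    std::uint8_t cached_out = 0;  // blended result for that background value
    for (int i = 0; i < len; ++i) {
        if (!have_cached || dst[i] != cached_bg) {
            cached_bg = dst[i];
            cached_out = static_cast<std::uint8_t>(
                (src * alpha + dst[i] * (255 - alpha) + 127) / 255);
            have_cached = true;
        }
        dst[i] = cached_out;
    }
}
```

A solid-color renderer could then draw each packed span with a single hline-style call, and fall back to blend_solid_span() for translucent colors. Whether the cached blend pays off depends on how coherent the background really is.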
2. We cannot get rid of rasterizing and calculating cells (outline_aa::render_line), but we can play with the quick-sort. Namely, we don't have to sort the whole path at once, only each scanline separately. If the sorting algorithm had exactly linear complexity this wouldn't make sense, but it's faster to sort 100 arrays of 10 elements each than one array of 1000 elements. When rasterizing a thin line we usually have 2, 3, or 4 cells per scanline, and a simple insertion sort works faster than the quick-sort in these cases. It looks rather promising, but it requires some kind of smart memory management (it requires reallocs, and I wouldn't rely on them being fast and painless).

agg::scanline_p32 is not finished yet. The '32' refers to the maximal capacity of the coverage value, 32 bits, but I'd add one more template argument in order to use 8-bit values.

McSeem

--- eric jones <eric@...> wrote:
> Hey Group,
>
> Thinking out loud and long winded...
>
> I've just started exploring the process for building an optimized
> renderer for general (anti-aliased, thick, etc.) vertical/horizontal
> lines and rectangles. Since I haven't poked around much in this part
> of agg, it is all high-level. Forgive me if this all falls under the
> category of "obvious," but I have not done much with low level graphic
> algorithms before. I am just trying to figure out what the important
> abstractions are and hopefully get some ideas about where to plug such
> ideas into agg.
>
> The easiest place to start looking is at drawing a single, solid
> (i.e. non-dashed but potentially semi-transparent),
> vertical/horizontal line with arbitrary thickness and end-caps. The
> line can be broken into three basic regions: the two end regions and
> the middle region. For a horizontal line, the regions can be labeled
> as so:
>
>   End1          Middle          End2
>
> Let's look at the middle region of pixels first because it is the
> simplest to render. For example, if our horizontal line is 5 pixels
> (scanlines) thick, the "cover" value for all the pixels on a single
> scanline will be the same because the antialiasing algorithm would
> return the same alpha value for all of these pixels (I hope I am using
> the cover term correctly here). For example, the 5 scanlines of the
> middle region might have alpha values (assume 0-9 as the full alpha
> range) of 1, 5, 9, 5, and 1 respectively, resulting in the following
> alpha mask for the middle region, applied to the line's color value.
>
>   111111111111111111111111111111111111
>   555555555555555555555555555555555555
>   999999999999999999999999999999999999
>   555555555555555555555555555555555555
>   111111111111111111111111111111111111
>
> Based on this, we only have to calculate alpha once for the line and
> then call the hline() method (with the alpha blending patch) 5 times,
> once for each scanline. I'm guessing this would be a decent speed win
> over the current algorithm. Is this a correct assumption, McSeem?
> Also, it makes hline() and vline() great candidates for platform
> dependent optimization in the future (SSE2, the Intel Image library,
> or whatever) because making them fast would speed up a large amount of
> the general cases.
>
> As for the end regions, they need a "complex" anti-aliasing algorithm
> applied to them where the "cover" value of each pixel is treated
> independently. This is similar to the current rendering approach, but
> we can't just treat the end-caps as polygons and feed them into the
> current path renderer because antialiasing is applied from one side
> (left on End1, right on End2). McSeem, is this right or is there some
> way for the current path renderer to handle this?
>
> Here are the alpha values of my (fictitious) width=5 line with rounded
> end-caps broken out by the region in which they are rendered.
>
>   End1                 Middle                 End2
>         111111111111111111111111111111111111
>   123   555555555555555555555555555555555555   321
>  1257   999999999999999999999999999999999999   7521
>   123   555555555555555555555555555555555555   321
>         111111111111111111111111111111111111
>
>
> Hmmm. I guess, with thicker lines, there would really be another
> region of interest:
>
>           Top-Middle
>   End1    Center-Middle    End2
>           Bottom-Middle
>
> Here, the ends are rendered the same way as before. The Top-Middle and
> Bottom-Middle regions would be the anti-aliasing "blend-in" regions of
> the line and rendered as previously discussed for the "Middle" region.
> The Center-Middle section would be the area of the line that has a
> constant alpha cover of 9 and could be filled with a call to the
> rectangle() primitive. So, breaking out the Top, Center and Bottom
> regions, assuming a new line of width 10, we would have something
> like:
>
>   111111111111111111111111111111111111   Top-Middle
>   555555555555555555555555555555555555
>
>   999999999999999999999999999999999999
>   999999999999999999999999999999999999
>   999999999999999999999999999999999999   Center-Middle
>   999999999999999999999999999999999999
>   999999999999999999999999999999999999
>   999999999999999999999999999999999999
>
>   555555555555555555555555555555555555   Bottom-Middle
>   111111111111111111111111111111111111
>
> So, I guess this can all be generalized by saying there are three
> major types of regions for anti-aliased rendering of any type of
> object, be it a thick line, a rectangle, or an arbitrary path:
>
>   1. Quickly varying areas where the alpha is calculated for each
>      pixel.
>   2. Slowly varying areas where alpha is calculated for an entire row
>      (vertical or horizontal) at a time.
>   3. Constant areas where the alpha doesn't change. This could be, but
>      doesn't have to be, a rectangular region.
>
> It happens that it is fairly simple to break horizontal/vertical
> lines and rectangles into these regions. The vertical line is the same
> as the horizontal if we exchange "scan-column" for "scanline." As for
> a rectangle, we have to deal with the joins in the corners. Its
> regions would break down as follows:
>
>   TL-Corner     Top-Middle      TR-Corner
>   Left-Middle   Center-Middle   Right-Middle
>   BL-Corner     Bottom-Middle   BR-Corner
>
> Here, the Corners are all "quickly varying," the Top and Bottom Middle
> are "slowly varying" using calls to the hline() primitive, the Left
> and Right Middle are "slowly varying" using calls to the vline()
> primitive, and the Center-Middle is, again, constant.
>
> I'm most interested in the cases that are described above, but it
> occurs to me that it is possible to decompose arbitrary paths into
> these three types of regions prior to calling a renderer. This
> domain-decomposition might be so expensive that it swamps any benefits
> in some cases -- I am not experienced with such algorithms. I would
> think that there is some way to do it, perhaps on a per scanline basis
> instead of on the entire path, that would provide a speed improvement.
>
> Back to horz/vert lines and rectangles. I still need to handle dashed
> lines. I'm guessing the way to do this is to pass this through