dark_sylinc wrote:No Boost please. Most of it is bloated for a performance intensive application (and for compile times!). Not to mention the executable size skyrockets by unbelievable factors (10x is not uncommon); and debugging the call stack of a thread created with Boost is a major pita compared to the clean call stacks and thread names in the Thread window when using native thread creation. Using Boost can be very tempting, but we do everyone a service by refraining from using it in Ogre.
stealth977 wrote:So, I would strongly suggest dropping Boost and instead using our own code for the very little static functionality OGRE needs...
saejox wrote:that was nice read. thanks
what alternative do you propose for boost::thread?
Klaim wrote:so I wanted to ask if you took a look at the recent containers in Boost, some of which, I think, could be very helpful in the design you are suggesting, like flat containers (basically built around a vector)? They're imperfect with VS Debugger data visualization though.
Memory layout diagrams from the original post (empty slots shown as "_"):
AAA_ C___
AAAA CCC_ A___
AAAA A___ CCC_
DanielSefton wrote:Ideally Ogre requires a complete rewrite from the core. It's matured around an architecture designed over a decade ago. What worries me is who's going to be up to the challenge to essentially write a brand new engine with Data-Oriented paradigms? Chipping away at the current codebase would be like painting a rotting fence.
Edit: I haven't kept up with discussions on Ogre 2.0; I'm guessing the plan was to start afresh anyway?
dark_sylinc wrote:Encapsulating native thread creation into generic classes should be enough. It's not hard at all. "CreateThread" is more than enough.[...]
Yes, flat containers are just vectors using std::lower_bound and insert.
Note that "flat" containers are for fast lookup, fast iteration, but slow removal, slow insertion (which is usually what Ogre wants).
The Resource system has its limitations, and it's not very threadable. It could have a rewrite too. But... let's just keep it one step at a time.
The plan is to start fresh.
2.0 -> Cache misses, DX11 & OGL4 RS
2.1 -> Scene manager redesign: scene traversal & processing
2.3 -> FF (fixed function) -> "states"
2.4 -> Vertex format enhancements
Well, I think it should be done in 2.1, since the "scene manager refactor" will be mixed with the "new compositor manager", so after having that done plus the changes from 2.0 it'll be far easier/more guided. Note that the bone system (skeletal animation) may be updated in either 2.0 or 2.1.
Great. Definitely all the memory layout must be specifiable at compile time (not all processors have 64-byte cache lines, etc.). More info: viewtopic.php?f=4&t=30250&start=175#p454275 (Important: care about OGRE_DOUBLE_PRECISION before starting to code it). I think that should be done for 2.0, with 2.1 then taking benefit from it (in 2.0 just the SoA classes, which shouldn't be complex). We can create the system using SoA adjusted to 1 component (XYZXYZXYZ); when we finish 2.1, we can change at compile time to 4 components (XXXXYYYYZZZZ).
+1. Also I doubt this is going to give a big boost: the idea is doing occlusion culling, so in the end we're not going to render big queues. Note that 2.4 can be done at any time, and it can be done by GSoC students.
CABAListic wrote:saejox wrote:that was nice read. thanks
what alternative do you propose for boost::thread?
Given the kind of threading design that would benefit the Ogre core, we'll need something along the lines of Intel's tbb. Now, given tbb's unpopular license, it won't become a dependency. But since people might have their own task-based threading system in place which they'd like Ogre to use, a possible way might be to use an abstracted task scheduling system that, as its backend, can use tbb or a custom backend. We could then, as the default choice for people who are not using either, offer a more light-weight alternative based on e.g. cpptasks or perhaps even our own implementation in Ogre.
But this is just speculation on my part on what might work.
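A rough sketch of what such an abstracted task scheduling layer could look like (Task and TaskScheduler are hypothetical names, not an agreed API; the backend behind the interface could be tbb, the user's own system, or a lightweight default):

//Hypothetical abstraction: Ogre submits Tasks; the backend decides how
//and on which threads they actually run.
class Task
{
public:
    virtual ~Task() {}
    virtual void execute() = 0;
};

class TaskScheduler
{
public:
    virtual ~TaskScheduler() {}
    //Enqueue a task; it will run "at some point in the future".
    virtual void enqueue( Task *task ) = 0;
    //Block until every enqueued task has finished.
    virtual void waitForAll() = 0;
};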
Klaim wrote:The Resource system has its limitations, and it's not very threadable. It could have a rewrite too. But... let's just keep it one step at a time.
There were discussions at some point about making it a component or something optional, because some of us (me included) need to set up a game/app-specific resource system which would then inject resources into Ogre itself. Can't remember if there was some agreement on this. Anyway, it's a concern that is not directly linked to all you said, so I guess it's OK to put it on the side or to have some work done on this in 1.x.
Herb wrote:Couple of thoughts.... I like the idea in principle of separating Ogre into more, smaller components, but I also realize there can be more complexity with that model "if" you're using a majority of the components. More DLLs to load and register to use the components. I guess I'm speaking more towards a person who's new to Ogre, as there are so many other components to integrate already before even thinking about integrating components within Ogre. If nothing else, it's a thought to consider if that moves forward.
As for Boost, I agree with the comments. I actually like the fact that I can select which threading library to use; for example, I use POCO instead of Boost. Really, if Boost is a requirement, then we should actually "use" its features throughout the library.
But, as for threading, has anyone looked at the threading support in C++11? I thought threading support was baked into that and should be cross-platform, pending Visual Studio having it implemented (most things I find the GNU guys have already baked in).
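For reference, C++11 does indeed standardize threading (std::thread, std::mutex, std::async and friends); whether Ogre can rely on it is a question of compiler support. A trivial example, assuming a C++11-capable toolchain:

#include <thread>
#include <iostream>

static void backgroundWork()
{
    std::cout << "running on a worker thread" << std::endl;
}

int main()
{
    std::thread worker( backgroundWork ); //spawns a native thread
    worker.join();                        //wait for it to finish
    return 0;
}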
Klaim wrote:There are two things I see:
1. the call to renderOneFrame()
2. resource loading
1) can currently only be done from the same thread every time (can that be fixed?)
Some parts of 1) can be spawned as asynchronous tasks (animation update? etc.).
2) can be all asynchronous tasks.
The user controls which thread is calling 1), so it might be the main thread or another thread.
My understanding is that:
A. Ogre itself doesn't need to spawn threads (I mean the core). 1) is controlled by the user code; 2) should pass control to the user's task scheduler (to avoid oversubscription)
B. Ogre needs to provide potentially asynchronous tasks to be crunched by worker threads (which means potentially linear execution if there is only the main thread running)
C. Ogre can provide an implementation of a task scheduler (which spawns and manages worker thread(s)) IF the user doesn't explicitly provide his own. As it would be optional (but the default), it would be a component (almost as now?)
lunkhound wrote:I have some concerns about the whole SoA thing though. I worry that it may be a lot of developer pain for very little gain. Considering that:
1. SoA isn't necessary to fix the cache-misses. Cache misses can be fixed by reorganizing how data is laid out but without interleaving vector components of different vectors together.
2. SoA isn't necessary to improve performance of vector math using SIMD. OK maybe you don't get the full benefit of SIMD and not in all cases but you can probably get 70% of SoA performance simply by using a SIMD-ified vector library.
3. SoA is not easy to work with. Code is harder to read, harder to debug, more effort to maintain going forward. Imagine inspecting Node structures in the debugger when the vector components are interleaved in memory with other Nodes...
I think SoA is best for a limited-scope, highly optimized, tight loop where every cycle counts, and only affecting a small amount of code. Kind of like assembly language, SoA comes with a cost in developer time and I'm just not sure it would be worth it.
Thanks again for all the work on those slides. I'm really glad to see these issues being raised.
for( int i=0; i<numNodes; i += 4 ) //numNodes: number of nodes at this level
{
    /* prefetch() around here */
    //We're updating 4 elements here.
    const SoA_Vector3 &parentPos = mChunks[level+0].pos[i];
    SoA_Vector3 &localPos = mChunks[level+1].pos[i];
    SoA_Vector3 &derivedPos = mChunks[level+1].derivedPos[i];
    const SoA_Quaternion &parentRot = mChunks[level+0].rot[i];
    SoA_Quaternion &localRot = mChunks[level+1].rot[i];
    SoA_Quaternion &derivedRot = mChunks[level+1].derivedRot[i];
    const SoA_Vector3 &parentScale = mChunks[level+0].scale[i];
    SoA_Vector3 &localScale = mChunks[level+1].scale[i];
    SoA_Vector3 &derivedScale = mChunks[level+1].derivedScale[i];
    SoA_Matrix4 &derivedTransform = mChunks[level+1].transform[i];

    derivedPos = parentPos + parentRot * (parentScale * localPos);
    derivedRot = parentRot * localRot; //fsel() to see if parentRot should be the identity rot.
    derivedScale = parentScale * localScale; //fsel() here too.
    derivedTransform = NonTemporal( SoA_Matrix4( derivedPos, derivedRot, derivedScale ) );
}
for( int i=0; i<numNodes; i += 4 ) //Actually, it's not "+= 4", but rather += compile_time_number_of_simd_elements_macro
{
    /* prefetch() around here */
    //We're updating 4 elements here.
    SoA_Vector3 &localPos = mChunks[level+1].pos[i];
    SoA_Quaternion &localRot = mChunks[level+1].rot[i];
    SoA_Vector3 &localScale = mChunks[level+1].scale[i];
    const SoA_Matrix4 &parentTransform = mChunks[level+0].transform[i];
    SoA_Matrix4 &derivedTransform = mChunks[level+1].transform[i];

    SoA_Matrix4 localTransform = SoA_Matrix4( localPos, localRot, localScale ); //Use fsel for rot & scale
    derivedTransform = NonTemporal( parentTransform * localTransform );
}
dark_sylinc wrote:My view is that renderOneFrame() is called from one thread. The user may want to update its logic & physics in the same thread, or in another one.
As for Ogre's management of threads:
- The CompositorManager must have a high degree of control over its batch threads.
- The animation & scene node transform update may have their own threads. Because their jobs are fairly trivial (and there are many ways to split the work), the idea of a TaskScheduler provided by the user seems fine to me.
Note that all components (including, and especially, the CompositorManager) should accept a hint on the number of threads they can spawn, in order to prevent oversubscription (i.e. the user wants to run many threads for himself, unrelated to Ogre).
Well, actually I don't think it's fair to compare Unity to Ogre that way. Unity is a full game engine, very featured, with an awesome editor perfectly married to the engine. Also quite optimized, especially the last year's versions. Ogre is a render engine, just that, which urgently needs a redesign focused on optimization and DX11/OGL4 architecture. It's not cool seeing that a complex scene runs twice as fast in UDK or even Unity. It's not very cool either how each compositor render_scene pass culls the whole scene again, etc.
_tommo_ wrote:the docs are great but the biggest setback Ogre has in regard of the said engines (and Unity, which strangely was not mentioned even if it is the greatest Ogre-killer between AAs) are TOOLS. lots of excellent tools for artists and designers.
What kind of tools do you want to see? A scene editor? Material editor? But that's again the same story: Ogre is just a render engine and should not provide any kind of high-level tool. Just mesh/material importers/exporters and mesh optimization tools, not much more, IMHO.
_tommo_ wrote:Thinking that devs are leaving Ogre because it is "not fast enough for AAAs" means completely missing the point.
Actually this is my case. Of course I'm not leaving Ogre, but I'm quite concerned about 1.8.X/1.9.X performance. I think any Ogre developer using some compositors (especially ones involving render_scene passes), high quality shadows (cascaded shadow mapping with 3 or 4 shadow maps, for example) and any kind of water system (which will need at least 2 more render passes: reflection and refraction; the depth map may be shared with other depth-based effects, like those used for DOF and similar) shares my concerns about Ogre performance.
_tommo_ wrote:The DICE papers might be good for their very restrictive use cases (next gen consoles and PCs) but fail quite badly when you try to make, say, an Android game.
IMHO PCs and next gen consoles are not a very restrictive use case. But indeed, it would be nice to put some attention on ARM, although mobile SoCs are evolving very fast. Anyway, I think the development should be focused on "next-gen" PC and console architecture (aka DX11) rather than on limited mobile ones (GLES2 / 3?).
saejox wrote:There are many opportunities for SSE2+ and cache friendly structures as mentioned in the paper.
Ogre is already the most usable open source rendering engine, it just needs to be faster and less resource hungry to be more competitive.
+1!
Xavyiy wrote:Although I think the development should be focused on "next-gen" PC and console architecture (aka DX11) rather than on limited mobile ones (GLES2 / 3?).
Xavyiy wrote:Unity is a full game engine, very featured, with an awesome editor perfectly married to the engine. Also quite optimized, especially the last year's versions. Ogre is a render engine, just that, which urgently needs a redesign focused on optimization
saejox wrote:Does 2.0 aim for better performance or better usability?
saejox wrote:Ogre is already the most usable open source rendering engine, it just needs to be faster and less resource hungry to be more competitive.
_tommo_ wrote:to me, Ogre, as a pure graphics engine, needs to NOT expose any threading system.
Xavyiy wrote:Well, actually I don't think it's fair to compare Unity to Ogre that way. Unity is a full game engine, very featured, with an awesome editor perfectly married to the engine. Also quite optimized, especially the last year's versions. Ogre is a render engine, just that, which urgently needs a redesign focused on optimization and DX11/OGL4 architecture. It's not cool seeing that a complex scene runs twice as fast in UDK or even Unity. It's not very cool either how each compositor render_scene pass culls the whole scene again, etc.
Xavyiy wrote:What kind of tools do you want to see? A scene editor? Material editor? But that's again the same story: Ogre is just a render engine and should not provide any kind of high-level tool. Just mesh/material importers/exporters and mesh optimization tools, not much more, IMHO.
Xavyiy wrote:I've the feeling that whatever the 2.0 roadmap will be, it'll not be ideal for the whole community. I would like to read concrete solutions rather than "general ideas", since I see a very low SNR in all Ogre redesign threads (of course! each person has their own interests, but things must move ahead!)
_tommo_ wrote:to me, Ogre, as a pure graphics engine, needs to NOT expose any threading system.
I don't like at all the idea that a renderer will "take a life of its own" and start spawning threads unless I do some arcane forms of control (i.e. subclassing the default task manager class).
The default should be simplicity.
As for Ogre's management of threads:
The CompositorManager must have a high degree of control over its batch threads.
I don't understand this. To me, whatever the kind of parallel work, it should work with the task scheduler underneath, the same way parallel_for in tbb will spawn tasks for each batch of cycles.
_tommo_ wrote:PS: imo all of Ogre 2.0 should aim at being a pure graphics library, focusing on simplicity. And this imo means dropping a lot of existing functionality, and becoming more passive about which role Ogre takes in a game engine architecture.
Basically everyone that approaches Ogre feels the urge to place it as the cornerstone of their engine (with no decoupling of maths, threading, and scene management between rendering & logic), and Ogre is responsible for this because of the current all-encompassing architecture.
_tommo_ wrote:PPS: the docs are great but the biggest setback Ogre has in regard of the said engines (and Unity, which strangely was not mentioned even if it is the greatest Ogre-killer between AAs) are TOOLS. lots of excellent tools for artists and designers.
So along with a simplification of the graphics library itself, there should be a serious effort in making the engine useful, as in, in the real world. Thinking that devs are leaving Ogre because it is "not fast enough for AAAs" means completely missing the point.
_tommo_ wrote:Thinking that devs are leaving Ogre because it is "not fast enough for AAAs" means completely missing the point.
_tommo_ wrote:PPPS: most of the proposed ways of "optimizing" by bruteforcing jumps or switching full-on to SoA + SIMD just ignore that Ogre today needs to run energy-efficiently on cheap ARMs much more than it needs to squeeze SSE2 archs, and are probably best ignored; they are indeed an ugly case of optimizing without even thinking about what the use case will be.
The DICE papers might be good for their very restrictive use cases (next gen consoles and PCs) but fail quite badly when you try to make, say, an Android game.
saejox wrote:If it is going to be thread-safe, it means hundreds of mutexes in every function.
Goodbye performance.
saejox wrote:Ogre already has many shared_ptrs and locks, even though it is not thread-safe.
I think all those useless locks and shared_ptr should be removed.
No need to wait for a big release for that.
Mako_energy wrote:I hear this often among Ogre users more experienced than I, however I can't really see how this is true. Ogre does SO MUCH, I feel it is half-way to a game engine and as I have stated in some other posts the resource system is a large part of that. I completely agree that what you are saying is how Ogre should be...but I can't at all agree that's what it is. Breaking off more things into components or plugins is needed. Starting with the resource system, imo.
Xavyiy wrote:I would like to read concrete solutions rather than "general ideas", since I see a very low SNR in all ogre redesign threads
dark_sylinc wrote:lunkhound wrote:I have some concerns about the whole SoA thing though. I worry that it may be a lot of developer pain for very little gain. Considering that:
1. SoA isn't necessary to fix the cache-misses. Cache misses can be fixed by reorganizing how data is laid out but without interleaving vector components of different vectors together.
2. SoA isn't necessary to improve performance of vector math using SIMD. OK maybe you don't get the full benefit of SIMD and not in all cases but you can probably get 70% of SoA performance simply by using a SIMD-ified vector library.
3. SoA is not easy to work with. Code is harder to read, harder to debug, more effort to maintain going forward. Imagine inspecting Node structures in the debugger when the vector components are interleaved in memory with other Nodes...
I think SoA is best for a limited-scope, highly optimized, tight loop where every cycle counts, and only affecting a small amount of code. Kind of like assembly language, SoA comes with a cost in developer time and I'm just not sure it would be worth it.
Thanks again for all the work on those slides. I'm really glad to see these issues being raised.
You're right about your concerns. So let me address them:
1. It is true that there are other ways to optimize the data. However, transformation and culling are actually fairly trivial operations, which are done sequentially on a massive number of elements. Note that the interleaving is for SIMD. An arrangement of "XYZXYZXYZ" is possible by specifying 1 float per object at compile time.
The performance gains of using SoA for critical elements such as positions & matrices are documented in SCEE's paper (reference 4)
struct StructureOfArrays
{
float x[numVertices];
float y[numVertices];
float z[numVertices];
...
};
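For contrast with the big per-component arrays above, a sketch of the interleaved arrangement being described (names are illustrative; the block width of 4 would really be a compile-time constant):

//Interleaved "XXXXYYYYZZZZ" layout: 4 objects share one block, so four
//positions load straight into a SIMD register with no shuffling.
struct SoAPositionChunk
{
    float x[4]; //x of objects 0..3
    float y[4]; //y of objects 0..3
    float z[4]; //z of objects 0..3
};
//Chunks are laid out contiguously: XXXX YYYY ZZZZ | XXXX YYYY ZZZZ | ...
//Scalar access to one object still touches a single 48-byte region,
//unlike the struct above, where x[i], y[i] & z[i] live in distant arrays.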
dark_sylinc wrote:2. We already do SIMD math and try to do our best. There are huge margins to gain using SoA + SIMD, because the access patterns and the massive number of operations to perform fit exactly the way SSE2 works. There's a lot of overhead in unpacking & packing.
DICE's Culling the Battlefield slides show the big gains of using SoA + SIMD (reference 3)
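To make that concrete, a small sketch of the kind of inner loop the DICE slides describe, assuming SSE2 and centres already stored as aligned x[4]/y[4]/z[4] blocks (not actual Ogre or DICE code; names are made up):

#include <emmintrin.h> //SSE2 intrinsics

//Signed distance of 4 sphere centres to one plane, computed at once.
//x, y, z point to 16-byte aligned blocks of 4 floats (XXXX YYYY ZZZZ);
//no unpack/pack is needed because the data is already SIMD-friendly.
void planeDistance4( const float *x, const float *y, const float *z,
                     float px, float py, float pz, float pd,
                     float *outDist )
{
    __m128 vx = _mm_load_ps( x );
    __m128 vy = _mm_load_ps( y );
    __m128 vz = _mm_load_ps( z );
    __m128 dot = _mm_add_ps(
            _mm_add_ps( _mm_mul_ps( vx, _mm_set1_ps( px ) ),
                        _mm_mul_ps( vy, _mm_set1_ps( py ) ) ),
            _mm_mul_ps( vz, _mm_set1_ps( pz ) ) );
    _mm_store_ps( outDist, _mm_add_ps( dot, _mm_set1_ps( pd ) ) );
}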
dark_sylinc wrote:It's very true that debugging becomes much harder, especially when examining a single Entity or SceneNode.
I see two complementary solutions:
- getPosition() would retrieve the scalar version; it can be called from the watch window (as long as we ensure it's fully const...)
- There are a few MSVC features (I don't remember if they had to be installed, or if they were defined through pragmas) that tell MSVC how to read objects while debugging. I'm sure gdb probably has something similar.
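A sketch of what the first solution could look like (all names hypothetical): a const, side-effect-free gather of one object's position out of interleaved XXXXYYYYZZZZ storage, which a watch window can evaluate:

#include <cstddef>

struct Vector3
{
    float x, y, z;
    Vector3( float _x, float _y, float _z ) : x(_x), y(_y), z(_z) {}
};

//Gathers object i's position from interleaved blocks of 4 objects
//(12 floats per XXXXYYYYZZZZ block). Const and side-effect free.
Vector3 getPositionScalar( const float *chunkBase, std::size_t i )
{
    const std::size_t block = i / 4; //which XXXXYYYYZZZZ block
    const std::size_t slot  = i % 4; //lane inside the block
    const float *base = chunkBase + block * 12;
    return Vector3( base[slot], base[4 + slot], base[8 + slot] );
}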
Some programmer pops up, decides he needs X thing implemented to render his stuff the way he wants, without investing much time in checking whether there was already a way of achieving the same result; then he submits his change and it gets into the core
lunkhound wrote:Sorry, I didn't make myself clear. I agree that SCEE paper is exactly the sort of thing that we ought to be doing, but it doesn't mention SoA as I understand it. When I see "SoA" I think of this: http://software.intel.com/en-us/articles/how-to-manipulate-data-structure-to-optimize-memory-use-on-32-bit-intel-architecture
struct StructureOfArrays
{
    float x[numVertices];
    float y[numVertices];
    float z[numVertices];
    ...
};
Intel has been telling everyone to swizzle their data like this ever since they came out with MMX. My comments were ONLY directed at this Intel-style swizzling, and not at the sort of grouping of homogeneous data structures featured in the SCEE reference. I will refer to it as "swizzling" and not "SoA" for clarity.
class SceneNode
{
    Vector3 *Position;   //ptr
    Quaternion *qRot;    //ptr
    Vector3 *Scale;      //ptr
    Matrix4 *matrix;     //ptr
};
lunkhound wrote:I think looking at those DICE slides again actually convinced me that there is very little to gain from keeping stuff in swizzled format in memory. Just swizzle the frustum planes on the fly and a bit of optimized SIMD code will yield great performance.
If there are any performance gains to be had from swizzling the SceneNodes in memory, I would expect them to be tiny and not at all worth the trouble it would cause every user who has to examine a SceneNode in the debugger.
However, I'm sure there are cases where it would make sense, like a particle-system.
spookyboo wrote:Some programmer pops up, decides he needs X thing implemented to render his stuff the way he wants, without investing much time in checking whether there was already a way of achieving the same result; then he submits his change and it gets into the core
This is indeed the disadvantage of open source. If you want to redesign Ogre, you need a dedicated team that sticks 'till the end' and has a clear vision.
Klaim wrote:As for Ogre's management of threads:
- The CompositorManager must have a high degree of control over its batch threads.
I don't understand this. To me, whatever the kind of parallel work, it should work with the task scheduler underneath, the same way parallel_for in tbb will spawn tasks for each batch of cycles.
dark_sylinc wrote:To prevent oversubscription, tell Ogre at startup how many threads it can spawn at max.
dark_sylinc wrote:lunkhound wrote:Sorry, I didn't make myself clear. I agree that SCEE paper is exactly the sort of thing that we ought to be doing, but it doesn't mention SoA as I understand it. When I see "SoA" I think of this: http://software.intel.com/en-us/articles/how-to-manipulate-data-structure-to-optimize-memory-use-on-32-bit-intel-architecture
struct StructureOfArrays
{
    float x[numVertices];
    float y[numVertices];
    float z[numVertices];
    ...
};
Intel has been telling everyone to swizzle their data like this ever since they came out with MMX. My comments were ONLY directed at this Intel-style swizzling, and not at the sort of grouping of homogeneous data structures featured in the SCEE reference. I will refer to it as "swizzling" and not "SoA" for clarity.
Oh I see. SCEE's is technically "SoA" which stands for Structure of Arrays (or ptrs). If we look at the SceneNode declaration from SCEE's, it is:
class SceneNode
{
    Vector3 *Position;   //ptr
    Quaternion *qRot;    //ptr
    Vector3 *Scale;      //ptr
    Matrix4 *matrix;     //ptr
};
dark_sylinc wrote:Indeed, Intel's proposal since the introduction of MMX sucked hard. Because when we need to go scalar (we know that happens sooner or later), reading X, Y & Z takes three cache fetches, because they're too far apart. It's horrible. Not to mention very inflexible.
That's why I came out with the idea of interleaving the data as XXXXYYYYZZZZ: When we go scalar, it is still one fetch (in systems that fetch 64-byte lines).
lunkhound wrote:I think looking at those DICE slides again actually convinced me that there is very little to gain from keeping stuff in swizzled format in memory. Just swizzle the frustum planes on the fly and a bit of optimized SIMD code will yield great performance.
If there are any performance gains to be had from swizzling the SceneNodes in memory, I would expect them to be tiny and not at all worth the trouble it would cause every user who has to examine a SceneNode in the debugger.
However, I'm sure there are cases where it would make sense, like a particle-system.
Actually, we have nothing to lose and possibly something to gain (performance). And I'll tell you why:
Regardless of whether you want to swizzle in memory or swizzle using instructions, we still have to write the code that ensures all memory is contiguous. Even if we don't use SSE at all (we would use the XYZXYZ model, that is, specifying one float instead of four at compile time), we need contiguity, and the ability to load from memory without data dependencies.
My idea is that on PC systems we default to four floats, and use SSE. However, if you really, really think debugging is going to be a big problem (even with MSVC's custom data display; I admit not everyone uses MSVC), then compile using one float; and there can also be a "SoA_Vector3" implementation that uses packing instructions to swizzle the memory onto the registers on the fly.
After all, SoA_Vector3 & co. are platform dependent. On PCs with 4 floats per object, they will use SSE intrinsics. On ARM with 2 & 4 floats per object, they will use NEON.
On PCs with 1 float per object, they can use scalar operations... or packing+shuffling SSE intrinsics and still operate on 4 objects at a time, like you suggest.
So, it is a win-win situation. We can have it my way and your way too, with minimal effort (other than writing multiple versions of SoA_Vector3, SoA_Quaternion & SoA_Matrix4). The magic happens in the memory manager that will dictate how the SoA memory gets allocated & arranged. The rest of the systems are totally abstracted from the number of floats interleaved.
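A sketch of that compile-time selection (the macro and type names are hypothetical):

//Hypothetical compile-time switch for how many floats are interleaved
//per object; only the SoA_Vector3 implementation changes behind it.
struct SoA_Vector3_SSE2   { /* holds 4 interleaved objects, SSE intrinsics */ };
struct SoA_Vector3_Scalar { /* holds 1 object, plain float math, easy to debug */ };

#define OGRE_SOA_FLOATS_PER_OBJECT 4

#if OGRE_SOA_FLOATS_PER_OBJECT == 4
    typedef SoA_Vector3_SSE2 SoA_Vector3;   //PC: XXXXYYYYZZZZ, 4 objects at a time
#else
    typedef SoA_Vector3_Scalar SoA_Vector3; //XYZXYZXYZ, scalar fallback
#endif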
Sqeaky wrote:It may be easier to understand if you think of all multithreaded code as providing and expecting guarantees. Different task/workunit scheduling algorithms expect different amounts of thread-safety from their workunits and interact with their workunits based on these assumptions. Some schedulers require no thread safety, some require just re-entrancy, some require full data write isolation, and there are other, more esoteric requirements that are possible. Tasks/WorkUnits will also implicitly make assumptions about their schedulers. They are written differently if workunits finish in a known order, if two workunits are guaranteed not to access the same resources, if every data access needs to be wrapped in a mutex/atomic CAS, and based on what information the scheduler provides the workunit.
If the default Ogre task scheduler provides certain guarantees and the game developer provides a task scheduler of his own, it must provide at least the same guarantees. If the new scheduler provides more, the Ogre tasks/workunits will not be able to take full advantage of them, because they are already written. If it provides fewer guarantees, it will likely introduce race conditions or deadlocks.
For a more concrete example, please consider Apple's libdispatch ( http://libdispatch.macosforge.org/ ), which uses a custom barrier primitive, custom semaphores and communication with the scheduler to ensure data consistency, and an Apache WorkQueue ( http://cxf.apache.org/javadoc/latest/or ... Queue.html ), which implicitly assumes that the work unit will provide its own consistency mechanism without any communication from the Queue, and implements a timeout to provide other guarantees. It would be very difficult to write a work unit that would work ideally in both places.
The parallel_for construct in Threading Building Blocks really is a different class of construct than a scheduler. It is a threading primitive designed to parallelize obviously parallelizable problems. I do not know, but I suspect that many parts of Ogre are not obviously parallelizable, and I suspect that any threading algorithm used in Ogre must be carefully designed to get maximum performance.
dark_sylinc wrote:What I meant is that the control over the batch (worker) threads is too advanced. Creating a generic task scheduler that everything would run on is not a trivial issue at all. May be something for the far future, IF it seems to be viable.
To prevent oversubscription, tell Ogre at startup how many threads it can spawn at max.
Do you have a few links to similar implementations of what you have in mind? Because I think I'm not seeing what you see.
Klaim wrote:So far, my understanding is that all task scheduler implementations (even a synchronous one) provide at least the guarantee that "at some point in the future, the provided task will be executed".
Klaim wrote:The default task scheduler should do exactly this: execute the task now, synchronously (call the task execution immediately in the same thread). It's the simplest one and doesn't need any dependency.
I don't see the relationship between race conditions/deadlocks and task schedulers, because to me it's the user code that has to protect shared data (or not share it).
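A sketch of that trivial default, reusing the kind of hypothetical Task/TaskScheduler interface sketched earlier in the thread:

class Task
{
public:
    virtual ~Task() {}
    virtual void execute() = 0;
};

class TaskScheduler
{
public:
    virtual ~TaskScheduler() {}
    virtual void enqueue( Task *task ) = 0;
    virtual void waitForAll() = 0;
};

//Default scheduler: no threads, no dependencies. "At some point in the
//future" degenerates to "right now, on the calling thread".
class SynchronousScheduler : public TaskScheduler
{
public:
    virtual void enqueue( Task *task ) { task->execute(); }
    virtual void waitForAll() {} //nothing is ever pending
};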
masterfalcon wrote:From what I understand of our requirements, libdispatch is nearly perfect. But I don't know about its platform support at this time.
jwatte wrote:1) manage mesh->entity objects
2) implement spatial hierarchy
3) do visibility culling
4) do state management/sorting
5) do pass management/sorting
6) load terrain data
7) generate terrain/heightmaps
8) manage sets of (lights, cameras, billboards, animations, entities)
9) do sky rendering
tuan kuranes wrote: Transform Stage -> Ogre transforms the buffers (handling paging/locality/etc.), filling Cull buffers
Cull Stage -> Culling per render target fills Ogre renderqueues
Shading Stage -> Shade/Render each renderqueue according to its "shading/rendering" type into a dx/gl/etc. command buffer
Execute Stage -> merge (or dispatch between GPU/Tiles/etc) all command buffer and execute them (asynchronously)
Herb wrote:Couple of thoughts.... I like the idea in principle of separating Ogre into more, smaller components, but I also realize there can be more complexity with that model "if" you're using a majority of the components. More DLLs to load and register to use the components. I guess I'm speaking more towards a person who's new to Ogre, as there are so many other components to integrate already before even thinking about integrating components within Ogre. If nothing else, it's a thought to consider if that moves forward.
Regarding my view on "separate components":
However, this doesn't mean that each "component" can go into its own DLL or static lib. There are multiple issues that can't be easily tackled; this level of modularization is something that sounds nice, but ends up being impractical. Furthermore, the bigger the project, the bigger the chances there's some dependency between seemingly independent modules. I could've written about these dependencies in the slides, but they're very implementation-specific, and it would mislead from the main topic. When objects are highly modular, they're easier to maintain, refactor, or even replace; but that doesn't mean each has to go in its own DLL or lib.
It seems everybody agrees on Transforms -> Cull -> Complex Visuals -> Render, but which stages depend on each other for data, and which connections are essential? How many of these become 1 Task/WorkUnit, and how many become some data-driven number of Tasks/WorkUnits?
DanielSefton wrote:If you're building a game for windows, choose DirectX at compile time, if you're building for Mac, choose OpenGL at compile time etc. There's no benefit for allowing the user to swap between the two.
Klaim wrote:DanielSefton wrote:If you're building a game for windows, choose DirectX at compile time, if you're building for Mac, choose OpenGL at compile time etc. There's no benefit for allowing the user to swap between the two.
+++
Definitely agree. That being said, the renderer switching feature is still useful for some kinds of applications (like samples?), so if it is still considered necessary, I don't see how to avoid it...
DanielSefton wrote:If you're building a game for windows, choose DirectX at compile time, if you're building for Mac, choose OpenGL at compile time etc. There's no benefit for allowing the user to swap between the two.
masterfalcon wrote:My understanding of some of the plans is a little sketchy. I haven't been keeping up with it as much as I should have. But I believe one of the main goals is to speed up scene graph updates via threads.
Sqeaky wrote:Sqeaky's whole post
Sqeaky wrote:It seems everybody agrees on Transforms -> Cull -> Complex Visuals -> Render, but which stages depend on each other for data, and which connections are essential? How many of these become 1 Task/WorkUnit, and how many become some data-driven number of Tasks/WorkUnits?
tuan kuranes wrote:For Each scene: const shared Nodes Buffer -> Transformed Nodes Buffer
For Each Viewport: const shared Transformed Nodes Buffer -> Culled Transformed Node Buffer (threadable)
For Each renderTarget: const shared Culled Transformed Node Buffer -> RenderQueue (threadable)
For Each RenderQueue: const shared Render Queue -> Command Buffer
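A sketch of that dataflow as plain signatures (all names hypothetical): each stage reads a const shared input and fills its own output buffer, which is what makes the per-viewport and per-target stages threadable:

#include <vector>

struct Node;            //scene graph input
struct TransformedNode; //world-space result
struct RenderableItem;  //render queue entry
struct CommandBuffer;   //API-level output

//One per scene:
void transformStage( const std::vector<Node> &nodes,
                     std::vector<TransformedNode> &outTransformed );
//One per viewport, reading the same const transformed buffer (threadable):
void cullStage( const std::vector<TransformedNode> &transformed,
                std::vector<TransformedNode> &outCulled );
//One per render target (threadable):
void buildQueueStage( const std::vector<TransformedNode> &culled,
                      std::vector<RenderableItem> &outQueue );
//One per render queue:
void renderStage( const std::vector<RenderableItem> &queue,
                  CommandBuffer &outCommands );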
sparkprime wrote: sparkprime's whole post
sparkprime wrote:What I would propose is doing the work from back to front. First fixing the rendersystem API (...)
sparkprime wrote:(...) It needs cleaning up too, with the removal of fixed function functionality. It seems we're coming close to this with the GL4 and D3D11 work nearing completion.
masterfalcon wrote:start off with a thread for each feature or task so we can have some great, distinct discussions. For example: Threading, RenderSystem design and changes, SceneManager, RTSS, etc.
sparkprime wrote:What I would propose is doing the work from back to front. First fixing the rendersystem API (...)
dark_sylinc wrote:Export integration. So far the biggest complaint I get from artists & indie companies is that the export process pipeline sucks. UDK & Unity do this pretty well. There are too many steps involved in getting from 3DS Max/Maya/Blender into the actual game. This is because:
- It usually involves setting up a material file (by hand, using text! artists don't like text!), being careful not to overwrite a previous material file
- Exporting all the right submeshes; and placing the .mesh file into the right folder (or setting up resources.cfg)
- Setting up an additional file for the custom game engine to link the GameObject with a given Mesh
- Getting a preview means doing all of the above (+ launching a custom preview tool, Ogre Meshy, or even loading the full game, depending on each case). It's a pita if something was wrong and the above steps must be followed all over again. It cuts the iteration process
That said, most of us don't have the time to work on tools, because tools involve GUI code (many find it boring & frustrating). Making a good GUI is an art, and requires a lot of co-developing with artists & designers (after all, those are the users). This forum sadly lacks artists.
It's a chicken and egg problem; we can't make appealing tools because we don't have artists to work with, and we don't have artists because we have no appealing tools.
(ex: Kojack / Spark / Klaim / Tommo ...)
Brocan wrote:What about dropping the current mesh/material formats and adopting a "more-standard" format like FBX?
And making some kind of library for tools, that can be easily used for adding the node/material/whatever editor to your engine editor?
dark_sylinc wrote:Brocan wrote:What about dropping the current mesh/material formats and adopting a "more-standard" format like FBX?
And making some kind of library for tools, that can be easily used for adding the node/material/whatever editor to your engine editor?
What we should aim at is an import/export simplification process just like Unity & UDK have. Not adopting a foreign format as our own.
Wolfmanfx wrote:No, Unity has no direct FBX support - the Unity editor has an FBX importer, which is a different thing; the runtime has not - a third-party runtime importer exists for $300.
I recently wrote a runtime importer for Unity for a custom format, but it sucked too: you cannot use it with Unity Flash because of the sandboxing and stuff like that.
Half written and never finished - it's super unrealistic to say that we'd be faster with a scratch rewrite; we forget the man hours/years in testing and every workaround done over the years to make Ogre that stable.
Wolfmanfx wrote:When someone reads this thread it sounds like OGRE is unusable (too slow)
I agree with you, but it is not an attack on the engine; these are propositions to make it better now that we have better knowledge of what should be done on today's hardware. Given what we know now, it is easier and faster to rewrite than to modify.
madmarx wrote:I'd be completely against it.
Well, I guess it's just a personal opinion which doesn't represent the whole team, although I think (and hope) it's the same too.
Thanks for the clear answer! I think it's better for the community to hear clear answer like that from the team, than to have no answer on that subject.
TheSHEEEP wrote:Hmm.. would such a thing (pipeline tools) be an interesting GSoC project?
How many team members are there btw?
Kojack wrote:How many team members are there btw?
Dev team: pjcast, Noman, Praetor, Wolfmanfx, Assaf Raman, CABAListic, masterfalcon, Mattan Furst, spacegaier, TheSHEEEP, Nir Hasson, jbuck
Xavyiy wrote:2.0 -> Cache misses, DX11 & OGL4 RS
2.1 -> Scene manager redesign: scene traversal & processing
2.3 -> FF (fixed function) -> "states"
2.4 -> Vertex format enhancements
2.5 - 2.9 -> Fix bugs. Remaining stuff
3.0 -> First stable version of the "new ogre"
drwbns wrote:Ah ok. I was wondering, is there any reason why Ogre can't have a donation system for feature changes / additions? Or is that not really the question?