Ogre 2.0 doc (slides) - Updated 1st dec 2012


Re: Ogre 2.0 doc (slides)

by Wolfmanfx » Mon Nov 19, 2012 7:06 am

Awesome work!
1. We need to update our roadmaps, but more importantly we need to create work packages, regardless of whether we do the work ourselves or mentor it through GSoC.
2. Jira should be used for that task planning; also, the wiki is outdated.
3. We should start looking for students for next year's GSoC, maybe right now with a campaign ;)

Re: Ogre 2.0 doc (slides)

by syedhs » Mon Nov 19, 2012 7:08 am

Awesome effort to at least document the to-dos... :)

Re: Ogre 2.0 doc (slides)

by DanielSefton » Mon Nov 19, 2012 11:25 am

This is excellent, it addresses everything wrong with Ogre in comparison to modern techniques. I lol'd at the confusion memes. :)

Ideally Ogre requires a complete rewrite from the core. It's matured around an architecture designed over a decade ago. What worries me is who's going to be up to the challenge to essentially write a brand new engine with Data-Oriented paradigms? Chipping away at the current codebase would be like painting a rotting fence.

Edit: I haven't kept up with discussions on Ogre 2.0; I'm guessing the plan was to start afresh anyway?

Re: Ogre 2.0 doc (slides)

by Klaim » Mon Nov 19, 2012 12:13 pm

Nice work!

dark_sylinc wrote: No Boost please. Most of it is bloated for a performance intensive application (and compile times!). Not to mention the executable size sky rockets by unbelievable factors (10x is not uncommon); and debugging the call stack of a thread created with boost is a major pita when compared to clean callstacks and thread names in the Thread window when using native thread creation. Using Boost can be very tempting, but we do everyone a service by refraining ourselves from using it in Ogre.



I kind of agree with you on this, but not for everything in Boost (the libraries I use don't have these effects, but I choose them with great care and I also use the C++11 standard library). So I wanted to ask whether you have looked at the recent containers in Boost, some of which I think could be very helpful in the design you are suggesting, like the flat containers (basically built around a vector)? Their data visualization in the VS debugger is imperfect, though. Anyway, at least the concept (or the code) could be reused in Ogre, because the purpose of these containers is mainly to improve search and traversal performance by making data contiguous. That makes insertion and removal slower, but I think it is more appropriate for modeling data (while not appropriate for building lists every frame, for example).

Re: Ogre 2.0 doc (slides)

by stealth977 » Mon Nov 19, 2012 12:48 pm

My 50 cents about external libraries, and especially Boost:

If OGRE is going to use only a small part of a library's functionality, adding that library as a dependency is usually overkill. That kind of functionality usually doesn't need maintenance, so a little time can be spent on OGRE's own implementation without the burden of a huge dependency.

Especially Boost, which is a great library, but OGRE makes use of maybe 1% of it, and maybe even less. Also, what OGRE uses from Boost is static functionality. What I mean by static is: once OGRE implements its own code instead of Boost's, that part of the code will most of the time not need any further modification, and the functionality we borrow from Boost already has well-tested and easy-to-implement example code all over the net...

So, I would strongly suggest dropping Boost and instead using our own code for the very little static functionality OGRE needs...

Re: Ogre 2.0 doc (slides)

by Klaim » Mon Nov 19, 2012 12:52 pm

stealth977 wrote:So, i would strongly suggest dropping boost and instead using own code for the very little static functionality OGRE needs...


Currently, Boost isn't a dependency of Ogre; it's optional, for threading support. Shared pointers and the like were, if I remember correctly, copy/pasted into Ogre from Boost (or other libraries) a long time ago, as you suggest. I think that's the right way to go too (even if, as pointed out, the shared pointers in Ogre are not very good).

Re: Ogre 2.0 doc (slides)

by saejox » Mon Nov 19, 2012 1:54 pm

that was nice read. thanks :)

what alternative do you propose for boost::thread?

Re: Ogre 2.0 doc (slides)

by CABAListic » Mon Nov 19, 2012 2:22 pm

saejox wrote:that was nice read. thanks :)

what alternative do you propose for boost::thread?


Given the kind of threading design that would benefit the Ogre core, we'll need something along the lines of Intel's tbb. Now, given tbb's unpopular license, it won't become a dependency. But since people might have their own task-based threading system in place which they'd like Ogre to use, a possible way might be to use an abstracted task scheduling system that, as its backend, can use tbb or a custom backend. We could then, as the default choice for people who are not using either, offer a more light-weight alternative based on e.g. cpptasks or perhaps even our own implementation in Ogre.
But this is just speculation on my part on what might work.
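
A minimal sketch of the abstraction described above, assuming a small virtual interface that the engine talks to. All names here (TaskScheduler, SerialScheduler, CountingScheduler, Engine, updateAnimations) are hypothetical and are not Ogre, tbb or cpptasks API; the default backend simply runs tasks on the calling thread.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical interface: the engine only ever talks to this.
class TaskScheduler {
public:
    virtual ~TaskScheduler() {}
    // Queue a unit of work; the backend decides where and when it runs.
    virtual void submit(std::function<void()> task) = 0;
    // Return only after every task submitted so far has finished.
    virtual void waitAll() = 0;
};

// Lightweight default backend: runs everything on the calling thread.
class SerialScheduler : public TaskScheduler {
public:
    void submit(std::function<void()> task) override {
        mPending.push_back(std::move(task));
    }
    void waitAll() override {
        // Index loop, so a task may safely submit follow-up tasks.
        for (std::size_t i = 0; i < mPending.size(); ++i) {
            std::function<void()> task = std::move(mPending[i]);
            task();
        }
        mPending.clear();
    }
private:
    std::vector<std::function<void()>> mPending;
};

// Example of a user-supplied backend: runs tasks immediately and counts them.
class CountingScheduler : public TaskScheduler {
public:
    int submitted = 0;
    void submit(std::function<void()> task) override { ++submitted; task(); }
    void waitAll() override {}
};

// Stand-in for the engine core (hypothetical, not Ogre::Root).
class Engine {
public:
    // Pass nullptr to fall back to the built-in serial backend.
    explicit Engine(TaskScheduler* userScheduler = nullptr)
        : mScheduler(userScheduler ? userScheduler : &mDefault) {}

    // Pretend "animation update": one task per item, then a reduction.
    int updateAnimations(int count) {
        assert(count >= 0);
        std::vector<int> results(static_cast<std::size_t>(count), 0);
        for (int i = 0; i < count; ++i)
            mScheduler->submit([&results, i] { results[i] = i * 2; });
        mScheduler->waitAll();
        int sum = 0;
        for (int r : results) sum += r;
        return sum;
    }
private:
    SerialScheduler mDefault;
    TaskScheduler*  mScheduler;
};
```

An application with its own task system would implement the interface once and pass it in; everyone else gets the serial fallback without having to configure anything.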

Re: Ogre 2.0 doc (slides)

by dark_sylinc » Mon Nov 19, 2012 2:42 pm

saejox wrote:that was nice read. thanks :)

what alternative do you propose for boost::thread?

Encapsulating native thread creation into generic classes should be enough. It's not hard at all. "CreateThread" is more than enough.
Boost has a few tempting solutions that are meant to spawn a thread at any time, and execute it immediately or delay its execution until a "submit" command (aka. tasks). Fortunately we won't miss that, because such a pattern leads to obscure race conditions, lack of control over what's going on, and other weird bugs. Furthermore, spawning a new thread is quite expensive. The problem with tasks is that they're hard to track, so their design has to be thought out in advance.
In 90% of cases, the best solution is to spawn worker threads at startup and keep them suspended until a certain condition is met (using sync barriers or condition variables).
Writing worker threads that rely on a core class and shared data, and that execute a predefined algorithm with synchronization classes, works better in the long run. It's not very different from OpenCL's approach, except that threads are dormant.

I haven't tried posix threads on Win32, but afaik on Windows they just wrap to OS native functions (except for functionality that isn't present).

The only feature we would miss is the mutex family of Boost. Note that in Win32 most threading functions are completely broken (except for those added after Vista).
PulseEvent is broken, and SignalObjectAndWait does not do what the name suggests (it signals and then waits, but the two operations together are not atomic).
This makes fast condition variables impossible on WinXP, and makes a sync barrier tricky (but possible!). Implementations that claim to do condition variables on XP are either using a mutex, or wrongly relying on SignalObjectAndWait to be atomic; or they are using a custom-made driver running in kernel mode (I've never seen that though).
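
The "spawn workers at startup and keep them dormant" pattern can be sketched as follows. This is only an illustration: it uses C++11 std::thread and std::condition_variable as a stand-in for the native wrappers the post has in mind, and WorkerPool, runOnAll and parallelSum are hypothetical names. It assumes a single thread calls runOnAll.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

class WorkerPool {
public:
    // All threads are created once, here, and then sleep until needed.
    explicit WorkerPool(std::size_t numThreads)
        : mGeneration(0), mRemaining(0), mQuit(false) {
        for (std::size_t i = 0; i < numThreads; ++i)
            mThreads.emplace_back([this, i] { workerLoop(i); });
    }
    ~WorkerPool() {
        {
            std::lock_guard<std::mutex> lock(mMutex);
            mQuit = true;
        }
        mWake.notify_all();
        for (std::thread& t : mThreads) t.join();
    }
    std::size_t size() const { return mThreads.size(); }

    // Wakes every worker, runs job(workerIndex) on each, returns when all are done.
    void runOnAll(const std::function<void(std::size_t)>& job) {
        std::unique_lock<std::mutex> lock(mMutex);
        mJob = job;
        mRemaining = mThreads.size();
        ++mGeneration;
        mWake.notify_all();
        mDone.wait(lock, [this] { return mRemaining == 0; });
    }
private:
    void workerLoop(std::size_t index) {
        std::size_t seen = 0;
        std::unique_lock<std::mutex> lock(mMutex);
        for (;;) {
            // Dormant until new work is published or shutdown is requested.
            mWake.wait(lock, [&] { return mQuit || mGeneration != seen; });
            if (mQuit) return;
            seen = mGeneration;
            std::function<void(std::size_t)> job = mJob;
            lock.unlock();
            job(index);             // the predefined algorithm runs unlocked
            lock.lock();
            if (--mRemaining == 0) mDone.notify_one();
        }
    }
    std::mutex mMutex;
    std::condition_variable mWake, mDone;
    std::function<void(std::size_t)> mJob;
    std::size_t mGeneration, mRemaining;
    bool mQuit;
    std::vector<std::thread> mThreads;
};

// Each worker sums its own strided slice into its own output slot.
long parallelSum(WorkerPool& pool, const std::vector<int>& data) {
    const std::size_t n = pool.size();
    std::vector<long> partial(n, 0);
    pool.runOnAll([&](std::size_t w) {
        long sum = 0;
        for (std::size_t i = w; i < data.size(); i += n) sum += data[i];
        partial[w] = sum;
    });
    long total = 0;
    for (long p : partial) total += p;
    return total;
}
```

Calling parallelSum repeatedly reuses the same threads; no thread is created or destroyed between calls.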

Re: Ogre 2.0 doc (slides)

by dark_sylinc » Mon Nov 19, 2012 3:00 pm

Klaim wrote:so I wanted to ask if you took a look at the recent containers in boost which, I think some could be very helpful in the design you are suggesting, like flat containers? (basically built around a vector) It's unperfect with VS Debugger data visualization though.

Yes, flat containers are just vectors using std::lower_bound and insert. That's all. In fact, I use lower_bound directly in Distant Souls, no debugger visualization issues :) (Boost already had the linky, that's very nice of them)
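
As a rough illustration of "a vector plus std::lower_bound and insert", here is a minimal sketch of a flat set. FlatSet is a hypothetical name and this is not Boost's implementation, only the basic idea.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical minimal "flat set": a vector that is kept sorted at all times.
template <typename T>
class FlatSet {
public:
    // O(log n) search plus O(n) shifting of the tail; false on duplicates.
    bool insert(const T& value) {
        auto it = std::lower_bound(mData.begin(), mData.end(), value);
        if (it != mData.end() && !(value < *it)) return false;
        mData.insert(it, value);
        return true;
    }
    // O(log n) binary search over contiguous memory.
    bool contains(const T& value) const {
        auto it = std::lower_bound(mData.begin(), mData.end(), value);
        return it != mData.end() && !(value < *it);
    }
    // O(n): everything after the removed element moves down by one.
    bool erase(const T& value) {
        auto it = std::lower_bound(mData.begin(), mData.end(), value);
        if (it == mData.end() || value < *it) return false;
        mData.erase(it);
        return true;
    }
    // Iteration is a plain linear walk over the vector.
    const std::vector<T>& data() const { return mData; }
private:
    std::vector<T> mData;
};
```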

Note that "flat" containers are for fast lookup, fast iteration, but slow removal, slow insertion (which is usually what Ogre wants). Note that the HighLevelCull may want to keep stuff ordered for SIMD, with sorting rules that don't have to match the standard sort. For example:
We have objects A & C. "[b]" is for blank spaces. Three As get created, then one C. The high level cull organizes them like this for efficient SSE2:
AAA[b] C[b][b][b]


Then two As get created, then another two Cs:
AAAA CCC[b] A[b][b][b]


If we were using a flat container, it would force us to move all Cs by inserting the As in the middle, which is unnecessary:
AAAA A[b][b][b] CCC[b]


Both outputs are legal for the HighLevelCull as they satisfy the locality and SIMD rules. Which one is better, however, depends on profiling. My wild guess is the former will beat the latter every time. But I could be proven wrong.
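
The two layouts can be reproduced with a small sketch. It assumes blocks of four slots (one SSE2 register of floats); BlockLayout and its members are hypothetical names. With sortedByType set to false a new block is appended at the end; with true it is inserted in type order, as a flat container would do, and the number of blocks shifted is counted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

const int kSimdWidth = 4;  // assumed: four slots per block

struct Block {
    char type;  // object type stored in this block
    int  used;  // occupied slots, 1..kSimdWidth
};

class BlockLayout {
public:
    explicit BlockLayout(bool sortedByType)
        : mSorted(sortedByType), mBlocksMoved(0) {}

    void add(char type) {
        // First try to fill a blank slot in an existing block of this type.
        for (Block& b : mBlocks)
            if (b.type == type && b.used < kSimdWidth) { ++b.used; return; }
        // Otherwise a new block is needed: append, or insert in type order.
        std::size_t pos = mBlocks.size();
        if (mSorted) {
            pos = 0;
            while (pos < mBlocks.size() && mBlocks[pos].type <= type) ++pos;
        }
        mBlocksMoved += mBlocks.size() - pos;  // blocks shifted to make room
        Block fresh = { type, 1 };
        mBlocks.insert(mBlocks.begin() + pos, fresh);
    }

    std::size_t blocksMoved() const { return mBlocksMoved; }

    // Renders the layout in the notation used above, "[b]" being a blank slot.
    std::string toString() const {
        std::string out;
        for (std::size_t i = 0; i < mBlocks.size(); ++i) {
            if (i) out += ' ';
            out.append(static_cast<std::size_t>(mBlocks[i].used), mBlocks[i].type);
            for (int s = mBlocks[i].used; s < kSimdWidth; ++s) out += "[b]";
        }
        return out;
    }
private:
    bool mSorted;
    std::size_t mBlocksMoved;
    std::vector<Block> mBlocks;
};
```

Feeding the creation order A, A, A, C, A, A, C, C into both variants gives the appended layout with no block moved and the sorted layout with one block moved. Which one wins in practice is, as said, a matter of profiling.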

Also, note that while doing (e.g.) transformations, we don't have to iterate through the Entity pointers and read their SoA_Vector3s. We can just read the mChunk of the positions, rotations & matrices and iterate over them directly (because following the pointers to get each SoA_Vector3 address would add an extra indirection for the CPU).
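
A small sketch of that last point, under the assumption that positions live in one contiguous array per component and an entity only remembers its slot. PositionChunk, Entity and translateAll are hypothetical names, not the proposed Ogre classes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical SoA storage: one contiguous array per component.
struct PositionChunk {
    std::vector<float> x, y, z;

    std::size_t size() const { return x.size(); }

    // Stores a position and returns the slot it was given.
    std::size_t push(float px, float py, float pz) {
        x.push_back(px);
        y.push_back(py);
        z.push_back(pz);
        return x.size() - 1;
    }
};

// An entity only records which slot of the chunk belongs to it.
struct Entity {
    std::size_t slot;
};

// Translates every position by walking the arrays linearly.
// No Entity is touched, so there is no pointer to chase per object.
void translateAll(PositionChunk& chunk, float dx, float dy, float dz) {
    assert(chunk.x.size() == chunk.y.size() && chunk.y.size() == chunk.z.size());
    for (std::size_t i = 0; i < chunk.size(); ++i) {
        chunk.x[i] += dx;
        chunk.y[i] += dy;
        chunk.z[i] += dz;
    }
}
```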

Re: Ogre 2.0 doc (slides)

by dark_sylinc » Mon Nov 19, 2012 3:19 pm

DanielSefton wrote:Ideally Ogre requires a complete rewrite from the core. It's matured around an architecture designed over a decade ago. What worries me is who's going to be up to the challenge to essentially write a brand new engine with Data-Oriented paradigms? Chipping away at the current codebase would be like painting a rotting fence.

Edit: I haven't kept up with discussions on Ogre 2.0, I'm guessing the plan was to start a fresh anyway?

The plan is to start fresh.

My view is that we can keep the Script parser (i.e. Materials) and the Mesh format (although with some modifications). Vector, Quaternion & Matrix are great (we just need to add their SoA counterparts), the Shadow camera setup math still works, and the Camera works (the only change is the Frustum check code). The Vertex declaration & binding is fine (although we could remove the "start offset" parameter and enforce declaration order as important; that would be less error prone and easier to manage, I believe; anyway it's not a major change). There's a lot of stuff in RenderSystems that can be reused (from Ogre-related stuff like Depth sharing & RTT management to HW management like window creation & device lost), and even the Resource system.
The Resource system has its limitations and is not very threadable. It could have a rewrite too. But... let's just take it one step at a time. The only thing that needs a rewrite is our "read only" initialization pattern, which I talked about in the DX11 thread: DX11 requires that read-only memory be initialized on creation, but we first create, then lock() and fill the data once; this is holding DX11 back in terms of optimization.
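
To make the difference between the two initialization patterns concrete, here is a mock sketch. MockBuffer, BU_DYNAMIC and BU_IMMUTABLE are hypothetical names; this is neither the Ogre hardware buffer API nor Direct3D 11, it only mimics the rule that an immutable buffer must receive its contents at creation and cannot be locked afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <vector>

enum BufferUsage { BU_DYNAMIC, BU_IMMUTABLE };  // hypothetical names

class MockBuffer {
public:
    // "Submit on creation": contents may be supplied up front.
    MockBuffer(BufferUsage usage, std::size_t bytes, const void* initialData = nullptr)
        : mUsage(usage), mBytes(bytes) {
        if (usage == BU_IMMUTABLE && !initialData)
            throw std::invalid_argument("immutable buffer needs initial data");
        if (initialData && bytes)
            std::memcpy(mBytes.data(), initialData, bytes);
    }

    // "Create, then lock() and fill": only legal for writable buffers.
    void* lock() {
        if (mUsage == BU_IMMUTABLE)
            throw std::logic_error("cannot lock an immutable buffer");
        return mBytes.data();
    }

    std::size_t size() const { return mBytes.size(); }
    const unsigned char* contents() const { return mBytes.data(); }
private:
    BufferUsage mUsage;
    std::vector<unsigned char> mBytes;
};
```

With an interface of this shape, read-only data can go straight into an immutable buffer, while the existing create-then-lock code path keeps working for dynamic buffers.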

Re: Ogre 2.0 doc (slides)

by Klaim » Mon Nov 19, 2012 3:43 pm

dark_sylinc wrote:Encapsulating native thread creation into generic classes should be enough. It's not hard at all. "CreateThread" is more than enough.[...]


I agree with Cabalistic on this: you can't let Ogre spawn worker threads for its own use and ignore the rest of the application's usage. Providing a way for tasks to be pushed into whatever task scheduler the client code uses would help a lot.

Yes, flat containers are just vectors using std::lower_bound and insert.


As far as I understand, it's a bit more efficient than that, but I might be wrong.

Note that "flat" containers are for fast lookup, fast iteration, but slow removal, slow insertion (which is usually what Ogre wants).


My understanding was that part of Ogre wants fast insertion (while building rendering lists) and other parts of it want (very) fast lookup. Am I wrong?

(I'm not saying we have to use these flat containers, just that they are good examples that could be used in Ogre if Ogre needs them; otherwise forget about it, it's just a comment.)

The Resource system has it's limitations, and not very threadable. It could have a rewrite too. But... let's just keep it one step at a time.


There were discussions at some point about making it a component or something optional, because some of us (me included) need to set up a game/app-specific resource system which would then inject resources into Ogre itself. I can't remember if there was any agreement on this. Anyway, it's a concern that is not directly linked to all you said, so I guess it's OK to put it aside or to have some work done on it in 1.x.

Re: Ogre 2.0 doc (slides)

by Xavyiy » Mon Nov 19, 2012 4:11 pm

After reading the slides, I cannot agree more with you.
But I don't know how the ogre team/community must deal with that.

The plan is to start fresh.

Rewriting Ogre from scratch would be a tremendous amount of work (who is going to do it? how much time would it take? We'll need new demos, and a lot of testing). It would also break compatibility with almost all Ogre-based apps, which is not bad considering that Ogre 2.0 is going to be a completely new version, but the "break" may be just too big (I think we should keep the new API as similar to the current one as possible).

Instead, I would suggest splitting the redesign into 4-6 parts and doing it step by step. Considering that this rewrite might take several months or even years, maybe we can target Ogre 2.X for the redesign and think of it as a transition to a stable 3.X. For example:

2.0 -> Cache misses, DX11 & OGL4 RS
2.1 -> Scene manager redesign: scene traversal & processing
2.3 -> FF -> "states"
2.4 -> Vertex format enhancements
2.5 - 2.9 -> Fix bugs. Remaining stuff

3.0 -> First stable version of the "new ogre"

Well, this is just an idea, but I really think doing it from scratch without releasing intermediate versions may be quite dangerous.

Xavier

Re: Ogre 2.0 doc (slides)

by PhilipLB » Mon Nov 19, 2012 4:34 pm

I agree with a big refactoring instead of a rewrite.
Think of Netscape back then, this broke their neck: http://www.joelonsoftware.com/articles/ ... 00069.html

Re: Ogre 2.0 doc (slides)

by dark_sylinc » Mon Nov 19, 2012 4:39 pm

Good, Xavyiy, that's the feedback I was looking for. To be honest, I share the same fears as you.

2.0 -> Cache misses, DX11 & OGL4 RS

Good! We can create the system and use SoA adjusted for 1 component (XYZXYZXYZ); when we finish 2.1, we can change at compile time to use 4 components (XXXXYYYYZZZZ).
The DX11 RS will require the submit on creation change though.
Note that the bone system (skeletal animation) may be updated in either 2.0 or 2.1.
Also, we may want to start stripping down data from MovableObject & Renderable that isn't used as often.
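
The compile-time switch between the two layouts can be sketched with a template parameter for the packing width. PackedVector3Array and its members are hypothetical names; a real build would presumably pick the width once through a typedef or a build option.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Width = 1 gives XYZXYZXYZ, Width = 4 gives XXXXYYYYZZZZ.
template <std::size_t Width>
class PackedVector3Array {
public:
    // Storage is rounded up to a whole number of Width-sized groups.
    explicit PackedVector3Array(std::size_t count)
        : mCount(count),
          mData(((count + Width - 1) / Width) * Width * 3, 0.0f) {}

    // Offset of component c (0 = x, 1 = y, 2 = z) of element i in the flat array.
    static std::size_t offset(std::size_t i, std::size_t c) {
        const std::size_t block = i / Width;  // which group of Width elements
        const std::size_t lane  = i % Width;  // position inside the group
        return block * Width * 3 + c * Width + lane;
    }

    void set(std::size_t i, float x, float y, float z) {
        assert(i < mCount);
        mData[offset(i, 0)] = x;
        mData[offset(i, 1)] = y;
        mData[offset(i, 2)] = z;
    }

    float get(std::size_t i, std::size_t c) const {
        assert(i < mCount && c < 3);
        return mData[offset(i, c)];
    }

    const std::vector<float>& raw() const { return mData; }
private:
    std::size_t mCount;
    std::vector<float> mData;
};
```

Code written against set/get (or against offset) does not change when the width does, which is what makes it possible to ship the 1-component layout first and switch later.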

2.1 -> Scene manager redesign: scene traversal & processing
2.3 -> FF -> "states"
2.4 -> Vertex format enhancements

Nice, looks like a plan. Note that 2.4 can be done at anytime, and it can be done by GSoC students.
I wouldn't leave 2.0 and/or 2.1 to students, unless they're really good, for example.

Re: Ogre 2.0 doc (slides)

by DanielSefton » Mon Nov 19, 2012 7:18 pm

Yeah, I meant that ideally it would require a rewrite, but refactoring is a wiser option. Specific areas can definitely be targeted one at a time. It's just that it's not as easy as "fixing" cache misses without restructuring the code, and the whole of Ogre's core works in a totally anti-data-oriented manner. Anyway, it's a step in the right direction.

For those unfamiliar with DOD, here's a whole bunch of handy articles: https://plus.google.com/115950681746193428612/posts

Re: Ogre 2.0 doc (slides)

by Xavyiy » Mon Nov 19, 2012 7:50 pm

Good!

Note that the bone system (skeletal animation) may be updated either in 2.0 & 2.1
Well, I think it should be done in 2.1, since the "scene manager refactor" will be mixed with the "new compositor manager", so after having that done plus the changes from 2.0 it'll be far easier and better guided.

We can create the system and use SoA adjusted for 1 component (XYZXYZXYZ) when we finish 2.1, we can change at compile time to use 4 component (XXXXYYYYZZZZ)
Great. Definitely, all the memory layout must be specifiable at compile time (not all processors have 64+ byte cache lines, etc.). More info: viewtopic.php?f=4&t=30250&start=175#p454275 (Important: take care of OGRE_DOUBLE_PRECISION before starting to code it). I think that should be done for 2.0, and then 2.1 can take advantage of it (in 2.0 just the SoA classes; it shouldn't be complex).

Note that 2.4 can be done at anytime, and it can be done by GSoC students.
+1. Also, I doubt this is going to give a big boost: the idea is to do occlusion culling, so in the end we won't be rendering big queues.

-------------------

IMHO, these are the next steps:
  • 1. Add masterfalcon's OGL3+ RS to 1.9
  • 2. 1.9 RC before the end of the year
  • 3. Release 1.9.0 in ~February

----------- Ogre 2.0 development starts here

  • 1. Reduce cache misses everywhere: Node, Frustum, Entity, etc. ---> Profiling a lot!
  • 2. Write all SoA classes
  • 3. Improve the DX11&OGL4 Render systems. Make them the default in PC platforms.
  • 4. ThreadManager?

----------- Ogre 2.1 development starts here

  • 1. Scene manager redesign ---> Role decomposition + SoA for all nodes, etc. (SoA classes are here since 2.0)
  • 2. etc.

------------- > > >

Ideally, these 2.X versions should each have a lifetime of <=3 months: enough for developing them and, most importantly, enough to allow users to update their apps. Otherwise it'll be a real pain in the ass to update a more or less complex app from Ogre 1.9 to Ogre 2.

Note: this will require a lot of support from the Ogre team, and also a change in their way of working: rather than working in their particular areas of interest, the work must be very focused on the current version's roadmap (just until 2.4-3.0, of course). (Sorry if that sounds rude, but I think it's the reality.)

Personally, this university year I won't be able to help with Ogre development since I'm very busy, but I think I'll have much more free time next year (my 'backup' year: master project + some little stuff). Also, next September I'll go commercial with the Paradise Engine, so after that I'll actively contribute to Ogre, since I'm very interested in these "next-gen" changes. By the way, I'll be able to test these changes/performance improvements in a lot of different scenes, which will help.

Xavier

Re: Ogre 2.0 doc (slides)

by Mako_energy » Tue Nov 20, 2012 12:56 am

Sorry I'm late to the show, I just want to add my input on some of the things covered earlier in the thread.

CABAListic wrote:
saejox wrote:that was nice read. thanks :)

what alternative do you propose for boost::thread?


Given the kind of threading design that would benefit the Ogre core, we'll need something along the lines of Intel's tbb. Now, given tbb's unpopular license, it won't become a dependency. But since people might have their own task-based threading system in place which they'd like Ogre to use, a possible way might be to use an abstracted task scheduling system that, as its backend, can use tbb or a custom backend. We could then, as the default choice for people who are not using either, offer a more light-weight alternative based on e.g. cpptasks or perhaps even our own implementation in Ogre.
But this is just speculation on my part on what might work.


Regarding the thread discussion, I have a co-worker who is working on the threading bits for our engine and is separating them out into their own library for others to use. I can't give too many details just yet, but it should be available in the not too distant future (well before Ogre 2.0), and we could offer that library as an alternative to TBB if the Ogre team is at all interested. One thing to mention is that it is a lockless task-based threading library. Not sure if that fits into your existing plans or not.

Klaim wrote:
The Resource system has it's limitations, and not very threadable. It could have a rewrite too. But... let's just keep it one step at a time.


There were discussions at some point to make it a component or something optional because some of us (me included) need to setup a game/app-specific resource system which then would inject resources into Ogre itself. Can't remember if there was some agreement on this. Anywyay it's a concern that is not directly linked to all you said so I guess it's ok to put on the side or to have some work done on this in 1.x


1000x this. The resource system is very unfriendly to people that need/want to implement their own solutions, and dancing around the existing framework to integrate with it is more work than it should be. The resource system (imo) really needs to be separated out into its own component.

Regarding the refactor/rewrite discussion... ultimately, if the end destination is the same (or very similar), then I won't care too much either way. But one thing I do worry about, when talking about using the 2.0 series of Ogre as a transition to the rebuild, is that if Ogre uses the existing release schedule then we are talking 3+ years until this rewrite is complete. Admittedly I don't know the background to most of this stuff (btw Dark_Sylinc, I appreciate you dumbing down the issues as much as you did in your slides), so maybe that is an inevitability, but I agree with Dark_Sylinc's comment in his original post. These changes need to occur as fast as possible.

Edit:
I made my post before I could see Xavyiy's post that is above mine. He seems to address my last concern. If that is the plan that goes forward then I retract my previous statement regarding the release schedule.

Re: Ogre 2.0 doc (slides)

by dark_sylinc » Tue Nov 20, 2012 2:28 am

I was thinking on my way back home from University, and I think we should follow the Blender approach: for a long time the 2.5x branch was "unstable" or "alpha".
Nothing was set in stone (particularly Python interfaces) while still trying to be as backwards compatible as possible. It was "use at your own risk".

The Ogre 2.x you're proposing could work much like that.
We have to temporarily break some stuff. For example, to fix the cache misses we start with SoA, then remove if( dirty ) update(). But then we change how SceneNodes are traversed. Then how SceneNodes are stored & traversed (breadth first) and updated in one single place.
At this point, the OctreeSceneManager will be incompatible. Therefore Ogre 2.0 would ship without a high level culling system, but with efficient cache usage.
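
The "stored breadth first and updated in one single place" idea can be sketched as a flat array where every parent is stored before its children, so a single linear pass recomputes all derived transforms without any dirty flag. Node and updateAll are hypothetical names, and a 1D offset stands in for a full transform to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node {
    int   parent;    // index of the parent in the same array, -1 for a root
    float localX;    // offset relative to the parent
    float derivedX;  // world-space result, rewritten on every update
};

// Requires breadth-first order: every parent is stored before its children.
// One forward pass is then enough, with no if( dirty ) checks and no recursion.
void updateAll(std::vector<Node>& nodes) {
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        Node& n = nodes[i];
        assert(n.parent < static_cast<int>(i));  // parent must come earlier
        n.derivedX = (n.parent < 0)
            ? n.localX
            : nodes[static_cast<std::size_t>(n.parent)].derivedX + n.localX;
    }
}
```

Because the parent's derived value has always been refreshed by the time a child is visited, moving the root and calling updateAll once is enough to propagate the change to every descendant.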

For 2.1 we start moving a few bits of SceneNode management from (e.g.) SceneManager (or wherever we stored it) into HighLevelCull. Then HighLevelCull starts the bookkeeping system, etc.

The point is, some updates will unavoidably break a few features, only to get them back in the next release and lose something else, temporarily.

Re: Ogre 2.0 doc (slides)

by tuan kuranes » Tue Nov 20, 2012 11:49 am

great work.
As the Ogre core source code is a huge and complex monster, I would add refactoring it into small modules, extending the plugin system to core code, prior to any other work.

pros:
  • faster recompilation
  • benchmarking a new version of a module against the old module is easy
  • unit/visual testing/comparison of new code against the old module
  • ability to provide multiple modules for the same "feature" (sw skeletal/hw skeletal/dualquat/no, old morph / texture morph, deferred/forward/lightprepass, culling/chc/occlusion, etc.), like Ogre does for scene management and renderer
  • deeper user customization of the Ogre core: a small scene-based project, just rendering a few objects in a static scene with few interactions, could remove all the culling/skeletal/morph/compositor code just by removing the corresponding module (product shows, small puzzle games, adventure games, etc.)
cons:
  • Boring/administrative work to do on the building system
  • Testing/maintaining build system
  • Too much user choice (much can be automated or documented, but still burdens users with complex choices)

some ideas of modules:
composition - culling - render cull result to render queue - transforms/node - entity - maths - color/images/pixel - render queue execution to render system - etc. or more stage based

Re: Ogre 2.0 doc (slides)

by drwbns » Wed Nov 21, 2012 4:43 pm

I always thought that, since there are a lot of people on both sides of the fence (refactoring and continuing code development vs. a complete rewrite), maybe the two sides should fork, with whoever signs up for a rewrite continuing their work on a completely different branch, maybe called 2.0b. I think it would truly separate the two ideas, but what do I know? :)

Re: Ogre 2.0 doc (slides)

by Herb » Thu Nov 22, 2012 8:18 pm

Couple of thoughts.... I like the idea in principle of separating Ogre into more, smaller components, but I also realize there can be more complexity with that model "if" you're using a majority of the components: more DLLs to load and register in order to use the components. I guess I'm speaking more for a person who's new to Ogre, as there are so many other components to integrate already before even thinking about integrating components within Ogre. If nothing else, it's a thought to consider if that moves forward.

As for Boost, I agree with the comments. I actually like the fact that I can select what threading library to use; for example, I use POCO instead of Boost. Really, if Boost is a requirement, then we should actually "use" its features throughout the library. But, as for threading, has anyone looked at the threading support in C++11? I thought threading support was baked into that, and it should be cross-platform, provided Visual Studio has it implemented (most things I find the GNU guys have already baked in).


Re: Ogre 2.0 doc (slides)

by Mako_energy » Thu Nov 22, 2012 9:28 pm

Herb wrote:As for Boost, I agree with the comments. I actually like the fact that I can select what threading library to use, as for example, I use POCO instead of Boost. Really, if Boost is a requirement, then we should actually "use" it's features throughout the library. But, as for threading, has anyone looked at the threading support in C++11? I thought threading support was baked into that and that should be cross-platform, pending Visual Studio has it implemented (most things I find the GNU guys have already baked in).


C++11 threading support isn't all in there just yet. As of GCC 4.7 std::thread isn't actually implemented in GCC. It'll probably be complete enough by the time we need it for Ogre 2.0, but even then, if Ogre is going for something more task-based, you need more than what C++11 provides. Unless Ogre plans to make its own framework for the tasks and scheduler, that is.

Re: Ogre 2.0 doc (slides)

by syedhs » Thu Nov 22, 2012 11:37 pm

Probably threading should be made part of the core and thus compulsory, not an option?

Re: Ogre 2.0 doc (slides)

by Klaim » Fri Nov 23, 2012 9:04 pm

There are two things I see:

1. the call to renderOneFrame()
2. resource loading

1) can only be done from one and the same thread (currently; can it be fixed?)
Some parts of 1) can be spawned as asynchronous tasks (animation update? etc.).
2) can be all asynchronous tasks.
The user controls which thread is calling 1), so it might be the main thread or another thread.

My understanding is that:
A. Ogre itself (I mean the core) doesn't need to spawn threads. 1) is controlled by the user code; 2) should pass control to the user's task scheduler (to avoid oversubscription)
B. Ogre needs to provide potentially asynchronous tasks to be crunched by worker threads (which potentially means linear execution if only the main thread is running)
C. Ogre can provide an implementation of a task scheduler (which spawns and manages worker thread(s)) IF the user doesn't explicitly provide their own. As it would be optional (but the default), it would be a component (almost as now?)

As soon as you have an abstraction for a task scheduler, you don't need to spawn threads yourself and you can assume (if you don't spawn infinite tasks...) that if the user don't want Ogre to spawn threads, it will not. The fact that there would be a default implementation (whatever how it works) would be only to help get quickly something that run and simplify samples, like the default scene manager provided.

Is my understanding correct? So far that's what I thought Cabalistic was talking about.

Herb wrote:Couple of thoughts.... I like the idea in principle of separating Ogre into more smaller components, but I also realize there can be more complexity with that model "if" you're using a majority of the components. More DLL's to load and register to use for the components. I guess I'm speaking more towards a person who's new to Ogre as there are so many other components to integrate already before even thinking about integrating components within Ogre. If nothing else, it's a thought to consider if that moves forward.


AFAIK most Ogre "Components" (not plugins) are static libs or additional code injected into OgreMain, right?

Herb wrote:As for Boost, I agree with the comments. I actually like the fact that I can select what threading library to use, as for example, I use POCO instead of Boost. Really, if Boost is a requirement, then we should actually "use" its features throughout the library.


Boost has never been a requirement for Ogre, and it was already confirmed that it will not become one.

Herb wrote:But, as for threading, has anyone looked at the threading support in C++11? I thought threading support was baked into that and that should be cross-platform, provided Visual Studio has it implemented (most things I find the GNU guys have already baked in).


I did try to provide a C++11 implementation of the current use of multithreading in Ogre (there is a thread about it somewhere). It was a failure because:

- C++11 doesn't provide any task scheduling features (even std::async is flawed and not appropriate)
- C++11 doesn't provide a multiple-readers-unique-writer mutex, which makes it hard to reach high efficiency (even if in some cases exploiting shared_ptr atomicity fixes this).

Now, as stated above, I don't think there is a need for direct manipulation of threads in Ogre, and C++11 mostly provides only thread basics. Those are an excellent base for future work, but we're not quite there.

Following the approach I described above, with the Ogre core always relying on a task scheduling abstraction for potentially asynchronous tasks, I would suggest that Ogre implement and provide by default the simplest task scheduler ever, one that just executes tasks sequentially. It would be a good test too. Then, optional implementations of the task scheduling interface could be provided for tbb and other popular frameworks. One based on C++11 std::async() would be easy to provide (but really far from ideal or performant, I think; also, there is a leak in the VS2012 implementation).
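
To make the idea concrete, here is a minimal sketch of what such an abstraction could look like. Every name in it (TaskScheduler, SequentialTaskScheduler, updateAll) is hypothetical and made up for illustration; none of this is existing Ogre API. It only shows the shape: core code talks to the interface, and the default implementation never leaves the calling thread.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical interface (not an existing Ogre class): core code would hand
// every potentially asynchronous job to this and never touch threads itself.
class TaskScheduler
{
public:
    typedef std::function<void()> Task;
    virtual ~TaskScheduler() {}
    // Queue a task; an implementation may run it now, later, or on another thread.
    virtual void schedule( Task task ) = 0;
    // Block until every task scheduled so far has finished.
    virtual void waitAll() = 0;
};

// The simplest possible default: run each task immediately on the calling thread.
class SequentialTaskScheduler : public TaskScheduler
{
public:
    SequentialTaskScheduler() : mExecuted( 0 ) {}

    virtual void schedule( Task task )
    {
        assert( task && "a scheduled task must be callable" );
        task();
        ++mExecuted;
    }

    virtual void waitAll() {} // Nothing is ever pending.

    size_t executedCount() const { return mExecuted; }

private:
    size_t mExecuted;
};

// Example of core-side code that only sees the abstraction.
inline void updateAll( TaskScheduler &scheduler, std::vector<int> &values )
{
    for( size_t i = 0; i < values.size(); ++i )
    {
        int *value = &values[i];
        scheduler.schedule( [value]() { *value *= 2; } );
    }
    scheduler.waitAll(); // With a threaded scheduler this is where we would block.
}
```

A tbb-backed or std::async-backed scheduler would derive from the same interface, so user code that wants threads swaps the implementation and user code that doesn't changes nothing.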
NetRush - Klaim's Den
User avatar
Klaim
Deity
 
Posts: 2150
Kudos: 38
Joined: 11 Sep 2005
Location: Lille, France
  • Website
Top

Re: Ogre 2.0 doc (slides)

by lunkhound » Sat Nov 24, 2012 2:29 am

Nice work on the slides. Some great ideas in there, and obviously a lot of thought has gone into them.
I agree with what some others have mentioned earlier about tackling these changes in manageable chunks, i.e. a series of smaller refactors rather than a big rewrite.

I have some concerns about the whole SoA thing though. I worry that it may be a lot of developer pain for very little gain. Considering that:

1. SoA isn't necessary to fix the cache-misses. Cache misses can be fixed by reorganizing how data is laid out but without interleaving vector components of different vectors together.
2. SoA isn't necessary to improve performance of vector math using SIMD. OK maybe you don't get the full benefit of SIMD and not in all cases but you can probably get 70% of SoA performance simply by using a SIMD-ified vector library.
3. SoA is not easy to work with. Code is harder to read, harder to debug, more effort to maintain going forward. Imagine inspecting Node structures in the debugger when the vector components are interleaved in memory with other Nodes...
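
For readers who haven't met the two layouts, here is a small sketch of what is being compared in the points above. The type names (Vec3, SoABlock4, SoAPositions) are invented for this example and are not taken from the slides; the block size of 4 is an assumption matching a 4-wide SSE register.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Contiguous AoS: memory reads "XYZ XYZ XYZ ...". Already cache friendly when
// stored in one array, and each element is a whole, inspectable vector.
struct Vec3 { float x, y, z; };

// SoA in blocks of 4: memory reads "XXXX YYYY ZZZZ", so each run of 4 floats
// holds the same component of 4 different objects.
struct SoABlock4 { float x[4]; float y[4]; float z[4]; };

// Hypothetical container storing N positions as SoA blocks.
class SoAPositions
{
public:
    explicit SoAPositions( size_t count ) :
        mCount( count ), mBlocks( ( count + 3 ) / 4 ) {}

    void set( size_t index, const Vec3 &v )
    {
        assert( index < mCount );
        SoABlock4 &block = mBlocks[index / 4];
        const size_t lane = index % 4;
        block.x[lane] = v.x;
        block.y[lane] = v.y;
        block.z[lane] = v.z;
    }

    // Gathering one object back means touching three separate runs of memory,
    // which is the debugging inconvenience mentioned in point 3.
    Vec3 get( size_t index ) const
    {
        assert( index < mCount );
        const SoABlock4 &block = mBlocks[index / 4];
        const size_t lane = index % 4;
        Vec3 v = { block.x[lane], block.y[lane], block.z[lane] };
        return v;
    }

    size_t size() const { return mCount; }

private:
    size_t mCount;
    std::vector<SoABlock4> mBlocks;
};
```

A plain std::vector<Vec3> gives the cache-friendly layout from point 1 without any interleaving; SoAPositions is the interleaved variant the slides propose.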

I think SoA is best for a limited-scope, highly optimized, tight loop where every cycle counts, and only affecting a small amount of code. Kind of like assembly language, SoA comes with a cost in developer time and I'm just not sure it would be worth it.
Thanks again for all the work on those slides. I'm really glad to see these issues being raised.

Chris
User avatar
lunkhound
Greenskin
 
Posts: 123
Kudos: 15
Joined: 29 Apr 2012
Location: Santa Monica, California
Top

Re: Ogre 2.0 doc (slides)

by dark_sylinc » Sat Nov 24, 2012 8:15 pm

Regarding my view on "separate components":
I'm always keen on abstracting, providing modularity. In fact that's what I tried to achieve with the proposed render flow algorithm.
However, this doesn't mean that each "component" can go into a DLL or static lib. There are multiple issues that can't be easily tackled. This level of modularization is something that sounds nice, but ends up being impractical.
Furthermore, the bigger the project, the bigger the chances there's some dependency between seemingly independent modules. I could've written about these dependencies in the slides, but they're very implementation-specific, and it would have distracted from the main topic.

When objects are highly modular, they're easier to maintain, refactor, or even replace. But that doesn't mean each one has to go in its own DLL or lib.

Klaim wrote:There are two things I see:

1. the call to renderOneFrame()
2. resource loading

1) can currently be done only from one and the same thread (can that be fixed?).
Some parts of 1) can be spawned as asynchronous tasks (animation update, etc.).
2) can consist entirely of asynchronous tasks.
The user controls which thread calls 1), so it might be the main thread or another thread.

My understanding is that:
A. Ogre itself (I mean the core) doesn't need to spawn threads. 1) is controlled by the user code; 2) should pass control to the user's task scheduler (to avoid oversubscription).
B. Ogre needs to provide potentially asynchronous tasks to be crunched by worker threads (which potentially means linear execution if only the main thread is running).
C. Ogre can provide an implementation of a task scheduler (which spawns and manages worker threads) IF the user doesn't explicitly provide their own. As it would be optional (but the default), it would be a component (almost as now?).

My view is that renderOneFrame() is called from one thread. The user may want to update their logic & physics in the same thread, or in another one.

As for Ogre's management of threads:
  • The CompositorManager must have a high degree of control over its batch threads.
  • The animation & scenenode transform update may have their own threads. Because their jobs are fairly trivial (and there are many ways to split the work), the idea of a TaskScheduler provided by the user seems fine to me.
Note that all components (including, and especially, the CompositorManager) should accept a hint on the number of threads they can spawn, in order to prevent oversubscription (i.e. the user wants to run many threads of their own, unrelated to Ogre).

lunkhound wrote:I have some concerns about the whole SoA thing though. I worry that it may be a lot of developer pain for very little gain. Considering that:

1. SoA isn't necessary to fix the cache-misses. Cache misses can be fixed by reorganizing how data is laid out but without interleaving vector components of different vectors together.
2. SoA isn't necessary to improve performance of vector math using SIMD. OK maybe you don't get the full benefit of SIMD and not in all cases but you can probably get 70% of SoA performance simply by using a SIMD-ified vector library.
3. SoA is not easy to work with. Code is harder to read, harder to debug, more effort to maintain going forward. Imagine inspecting Node structures in the debugger when the vector components are interleaved in memory with other Nodes...

I think SoA is best for a limited-scope, highly optimized, tight loop where every cycle counts, and only affecting a small amount of code. Kind of like assembly language, SoA comes with a cost in developer time and I'm just not sure it would be worth it.
Thanks again for all the work on those slides. I'm really glad to see these issues being raised.

You're right about your concerns. So let me address them:

1. It is true that there are other ways to optimize the data. However, transformation and culling are actually fairly trivial operations, which are done sequentially on a massive number of elements. Note that the interleaving is for SIMD. An arrangement of "XYZXYZXYZ" is possible by specifying 1 float per object at compile time.
The performance gains of using SoA for critical elements such as position & matrices are documented in SCEE's paper (reference 4).

2. Ogre already does SIMD math and tries to do its best. There are huge margins to gain using SoA + SIMD because the access patterns and the massive number of operations to perform fit exactly the way SSE2 works. Without SoA, there's a lot of overhead in unpacking & packing.
DICE's Culling the Battlefield slides show the big gains of using SoA + SIMD (reference 3)

3. Without proper planning, it's harder to write. That's true. However my idea is that SoA_Vector3 is encapsulated, including operators (+, -, /, *, etc; using _mm_add_ps & co.).
So, the code would roughly look like one of these two:

a. Keep derivedPos, derivedRot, derivedScale, and WorldTransform matrix (like currently Ogre does):
Code:
// NOTE: the loop bound was lost from the original post; "numObjects" is a placeholder.
for( int i=0; i<numObjects; i += 4 )
{
    /* prefetch() around here */

    //We're updating 4 elements here.
    const SoA_Vector3 &parentPos = mChunk[level+0].pos + i;
    SoA_Vector3 &localPos = mChunk[level+1].pos + i;
    SoA_Vector3 &derivedPos = mChunk[level+1].derivedPos + i;
    const SoA_Quaternion &parentRot = mChunk[level+0].rot + i;
    SoA_Quaternion &localRot = mChunk[level+1].rot + i;
    SoA_Quaternion &derivedRot = mChunk[level+1].derivedRot + i;
    const SoA_Vector3 &parentScale = mChunk[level+0].scale + i;
    SoA_Vector3 &localScale = mChunk[level+1].scale + i;
    SoA_Vector3 &derivedScale = mChunk[level+1].derivedScale + i;

    SoA_Matrix4 &derivedTransform = mChunk[level+1].transform + i;

    derivedPos = parentPos + parentRot * (parentScale * localPos);
    derivedRot = parentRot * localRot; //fsel() to decide whether parentRot should be the identity rotation.
    derivedScale = parentScale * localScale; //fsel() here too.

    derivedTransform = NonTemporal( SoA_Matrix4( derivedPos, derivedRot, derivedScale ) );
}


b. Discard derivedPos, derivedRot, derivedScale; always work with matrices (but harder to retrieve derived rotation & scale on demand):
Code:
// NOTE: the loop bound was lost from the original post; "numObjects" is a placeholder.
for( int i=0; i<numObjects; i += 4 ) //Actually, it's not "+= 4", but rather += compile_time_number_of_simd_elements_macro
{
    /* prefetch() around here */

    //We're updating 4 elements here.
    SoA_Vector3 &localPos = mChunk[level+1].pos + i;
    SoA_Quaternion &localRot = mChunk[level+1].rot + i;
    SoA_Vector3 &localScale = mChunk[level+1].scale + i;

    const SoA_Matrix4 &parentTransform = mChunk[level+0].transform + i;
    SoA_Matrix4 &derivedTransform = mChunk[level+1].transform + i;

    SoA_Matrix4 localTransform = SoA_Matrix4( localPos, localRot, localScale ); //Use fsel for rot & scale
    derivedTransform = NonTemporal( parentTransform * localTransform );
}


No intrinsics whatsoever, they're inside the operators. It's not harder to read either. And with good docs about how the render flow works (which we already have with the slides) it's not hard to "get" what's going on within the loop (i.e., why += 4).
If we expose the SoA nature to users and even Ogre devs, then we're probably doing something wrong. The idea is to work in SoA, not make the users think in SoA (other than very high level knowledge, like "I'm working X objs at the same time", note that X can be reduced to 1)
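
As a rough illustration of "the intrinsics live inside the operators", here is a sketch of how such a wrapper might be laid out. It is only an assumption about the eventual design: the sketch namespace, Float4, load4/store4/add4/mul4 and getScalar are names invented here, and SoA_Vector3 is reduced to addition and component-wise multiplication. _mm_add_ps, _mm_mul_ps, _mm_loadu_ps and _mm_storeu_ps are the real SSE intrinsics; a scalar fallback is included so the sketch also builds where SSE is unavailable.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
    #include <xmmintrin.h>
    #define SKETCH_USE_SSE 1
#else
    #define SKETCH_USE_SSE 0
#endif

namespace sketch
{
#if SKETCH_USE_SSE
    // One SIMD register: the same component of 4 different objects.
    typedef __m128 Float4;
    inline Float4 load4( const float *p )    { return _mm_loadu_ps( p ); }
    inline void store4( float *p, Float4 v ) { _mm_storeu_ps( p, v ); }
    inline Float4 add4( Float4 a, Float4 b ) { return _mm_add_ps( a, b ); }
    inline Float4 mul4( Float4 a, Float4 b ) { return _mm_mul_ps( a, b ); }
#else
    // Scalar fallback with the same interface.
    struct Float4 { float f[4]; };
    inline Float4 load4( const float *p )
    { Float4 r; for( int i = 0; i < 4; ++i ) r.f[i] = p[i]; return r; }
    inline void store4( float *p, Float4 v )
    { for( int i = 0; i < 4; ++i ) p[i] = v.f[i]; }
    inline Float4 add4( Float4 a, Float4 b )
    { Float4 r; for( int i = 0; i < 4; ++i ) r.f[i] = a.f[i] + b.f[i]; return r; }
    inline Float4 mul4( Float4 a, Float4 b )
    { Float4 r; for( int i = 0; i < 4; ++i ) r.f[i] = a.f[i] * b.f[i]; return r; }
#endif

    // Hypothetical stand-in for the proposed SoA_Vector3: 4 vectors at once.
    struct SoA_Vector3
    {
        Float4 x, y, z;

        static SoA_Vector3 load( const float *xs, const float *ys, const float *zs )
        {
            SoA_Vector3 r;
            r.x = load4( xs ); r.y = load4( ys ); r.z = load4( zs );
            return r;
        }

        SoA_Vector3 operator+( const SoA_Vector3 &rhs ) const
        {
            SoA_Vector3 r;
            r.x = add4( x, rhs.x ); r.y = add4( y, rhs.y ); r.z = add4( z, rhs.z );
            return r;
        }

        // Component-wise multiply, as used for scale.
        SoA_Vector3 operator*( const SoA_Vector3 &rhs ) const
        {
            SoA_Vector3 r;
            r.x = mul4( x, rhs.x ); r.y = mul4( y, rhs.y ); r.z = mul4( z, rhs.z );
            return r;
        }

        // Scalar view of one of the 4 vectors, e.g. for a debugger watch window.
        void getScalar( size_t lane, float outXYZ[3] ) const
        {
            assert( lane < 4 );
            float tmp[4];
            store4( tmp, x ); outXYZ[0] = tmp[lane];
            store4( tmp, y ); outXYZ[1] = tmp[lane];
            store4( tmp, z ); outXYZ[2] = tmp[lane];
        }
    };
}
```

With something like this, the loop body reads as ordinary vector math, and a const getScalar() covers the "inspect one object" case.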

It's very true that debugging becomes much harder, especially when examining a single Entity or SceneNode.
I see two complementary solutions:
  • getPosition() would retrieve the scalar version, and it can be called from the watch window (as long as we ensure it's fully const...)
  • There are a few MSVC features (I don't remember if they had to be installed, or if they were defined through pragmas) that tell MSVC how to read objects while debugging. I'm sure gdb probably has something similar.
Last edited by dark_sylinc on Sun Nov 25, 2012 1:07 am, edited 1 time in total.
Twitter: @matiasgoldberg My GSoC2013Tech Blog, Video games & Free Music at Yosoygames.com.ar
User avatar
dark_sylinc
Google Summer of Code Student
Google Summer of Code Student
 
Posts: 718
Kudos: 136
Joined: 21 Jul 2007
Location: Buenos Aires, Argentina
  • Website
Top

Re: Ogre 2.0 doc (slides)

by dark_sylinc » Sat Nov 24, 2012 11:08 pm

I found it!
Visual Studio allows customizing how it shows data in the debugger. All we have to do is to mess with the autoexp.dat (and optionally, use a DLL that can be added using the ADDIN directive) which is usually located in C:\Program Files\Microsoft Visual Studio 9\Common7\Packages\Debugger\Autoexp.dat.

Re: Ogre 2.0 doc (slides)

by Klaim » Sun Nov 25, 2012 1:17 am

dark_sylinc wrote:My view is that renderOneFrame() is called from one thread. The user may want to update their logic & physics in the same thread, or in another one.


Yes, that's what I meant.

dark_sylinc wrote:As for Ogre's management of threads:
  • The CompositorManager must have a high degree of control over its batch threads.


I don't understand this. To me, whatever the kind of parallel work, it should go through the task scheduler underneath, the same way parallel_for in tbb spawns tasks for each batch of iterations.

dark_sylinc wrote:
  • The animation & scenenode transform update may have their own threads. Because their jobs are fairly trivial (and there are many ways to split the work), the idea of a TaskScheduler provided by the user seems fine to me.
Note that all components (including, and especially, the CompositorManager) should accept a hint on the number of threads they can spawn, in order to prevent oversubscription (i.e. the user wants to run many threads of their own, unrelated to Ogre).


Well, I really don't understand why there is a need to spawn threads if you want to prevent oversubscription (as I said too), because the only way to do that is to let the user control the task scheduler and make Ogre agnostic about it. I might misunderstand something, but to me, as soon as a library spawns its own threads, it becomes a candidate for oversubscription.
Threads are too low-level a resource.
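
To show what I mean by batch work going through the scheduler, here is a minimal sketch. It is not tbb's parallel_for and not Ogre code; parallelFor and SubmitFn are hypothetical names. The point is only that the library cuts the range into batches and hands each one to a callable supplied by the user, so the library itself never creates a thread.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical hook: the user supplies this and decides where each batch runs
// (inline, on their own pool, wrapped in a tbb task, ...).
typedef std::function<void( std::function<void()> )> SubmitFn;

// Splits [begin, end) into batches of at most batchSize elements and submits
// one task per batch. Returns the number of batches submitted.
// If submit() is asynchronous, the caller must wait on its own scheduler
// before reading the results.
inline size_t parallelFor( size_t begin, size_t end, size_t batchSize,
                           const SubmitFn &submit,
                           const std::function<void( size_t, size_t )> &body )
{
    assert( batchSize > 0 );
    size_t numBatches = 0;
    size_t start = begin;
    while( start < end )
    {
        const size_t remaining = end - start;
        const size_t stop = ( remaining < batchSize ) ? end : start + batchSize;
        // Copy 'body' into the task so it stays valid if the task runs later.
        submit( [body, start, stop]() { body( start, stop ); } );
        ++numBatches;
        start = stop;
    }
    return numBatches;
}
```

With an inline submit (one that just calls the task) this degenerates to a plain sequential loop, which is exactly the default behaviour I would expect from Ogre.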

    Re: Ogre 2.0 doc (slides)

    by _tommo_ » Sun Nov 25, 2012 1:51 am

    I'll drop this here, even though I haven't been an Ogre user for a long time (and the non-existence of said 2.0 is one of the reasons), because it strikes me that no one sees it this way:

    to me, Ogre, as a pure graphics engine, needs to NOT expose any threading system.
    I don't like at all the idea that a renderer will "take life of its own" and start spawning threads unless I do some arcane forms of control (ie. subclassing the default task manager class).
    The default should be simplicity.

    Threading is something that is clearly orthogonal to all the systems in a complete game engine made of graphics, sound, physics, ai and whatever library;
    so relevant ogre functions need to be clearly defined as thread-safe but Ogre should not attempt to spawn threads or tasks in itself.
    It should leave to the client application all the freedom on when, how and how concurrently run its code while being clear on what can and what cannot be parallelized.

    This should be both faster and more solid to develop, and easier to support as threading needs evolve, because they will; and at the same time it eases the burden on those who need a lean system without bloat.

    PS: imo all of Ogre 2.0 should aim at being a pure graphics library, focusing on simplicity. And this imo means dropping a lot of existing functionality, and becoming more passive about the role Ogre takes in a game engine architecture.
    Basically everyone who approaches Ogre feels the urge to make it the cornerstone of their engine (with no decoupling of maths, threading, and scene management between rendering & logic), and Ogre is responsible for this because of its current all-encompassing architecture.

    PPS: the docs are great, but the biggest setback Ogre has compared to said engines (and Unity, which strangely was not mentioned even if it is the greatest Ogre-killer between AAs) is TOOLS. Lots of excellent tools for artists and designers.
    So along with a simplification of the graphics library itself, there should be a serious effort to make the engine useful, as in, in the real world. Thinking that devs are leaving Ogre because it is "not fast enough for AAAs" means completely missing the point.

    PPPS: most of the proposed ways of "optimizing", by brute-forcing jumps or switching full-on to SoA + SIMD, just ignore that Ogre today needs to run energy-efficiently on cheap ARMs much more than it needs to squeeze SSE2 archs. They are probably best ignored, and are indeed an ugly case of optimizing without even thinking about what the use case will be.
    The DICE papers might be good for their very restrictive use cases (next gen consoles and PCs) but fail quite badly when you try to make, say, an Android game.
    OverMindGames Blog
    IndieVault.it: Il nuovo portale italiano su Game Dev & Indie Games
    User avatar
    _tommo_
    Gnoll
     
    Posts: 677
    Kudos: 3
    Joined: 19 Sep 2006
    • Website
    Top

    Re: Ogre 2.0 doc (slides)

    by saejox » Sun Nov 25, 2012 2:31 am

    Does 2.0 aim for better performance or better usability?

    If it is going to be thread-safe, that means hundreds of mutexes in every function.
    Goodbye performance.

    Ogre already has many shared_ptrs and locks, even though it is not thread-safe.
    I think all those useless locks and shared_ptrs should be removed.
    No need to wait for a big release for that.

    There are many opportunities for SSE2+ and cache-friendly structures, as mentioned in the paper.
    Ogre is already the most usable open source rendering engine; it just needs to be faster and less resource-hungry to be more competitive.

    That's how I see it.
    Nimet - Advanced Ogre3D Mesh/dotScene Viewer
    asPEEK - Remote Angelscript debugger with html interface
    ogreHTML - HTML5 user interfaces in Ogre
    User avatar
    saejox
    Goblin
     
    Posts: 260
    Kudos: 34
    Joined: 25 Oct 2011
    Top

    Re: Ogre 2.0 doc (slides)

    by Xavyiy » Sun Nov 25, 2012 2:46 am

    _tommo_ wrote:and Unity, which strangely was not mentioned even if it is the greatest Ogre-killer between AAs) are TOOLS. lots of excellent tools for artists and designers.
    Well, actually I don't think it's fair to compare Unity to Ogre that way. Unity is a full game engine, very full-featured, with an awesome editor perfectly married to the engine. It's also quite optimized, especially last year's versions. Ogre is a render engine, just that, which urgently needs a redesign focused on optimization and DX11/OGL4 architecture. It's not cool seeing that a complex scene runs twice as fast in UDK or even Unity. It's not very cool either that each compositor render_scene pass culls the whole scene again, etc.

    _tommo_ wrote:the docs are great but the biggest setback Ogre has in regard of the said engines (and Unity, which strangely was not mentioned even if it is the greatest Ogre-killer between AAs) are TOOLS. lots of excellent tools for artists and designers.
    What kind of tools do you want to see? A scene editor? A material editor? But that's again the same story: Ogre is just a render engine, and should not provide any kind of high-level tool. Just mesh/material importers/exporters and mesh optimization tools, not much more, IMHO.

    _tommo_ wrote:Thinking that devs are leaving Ogre because it is "not fast enough for AAAs" means completely missing the point.
    Actually, this is my case. Of course I'm not leaving Ogre, but I'm quite concerned about 1.8.X/1.9.X performance. I think any Ogre developer who uses some compositors (especially ones involving render_scene passes), high quality shadows (cascaded shadow mapping with 3 or 4 shadow maps, for example) and any kind of water system (which needs at least 2 more render passes: reflection and refraction; the depth map may be shared with other depth-based effects, like the one used for DOF and similar) shares my concerns about Ogre performance.

    _tommo_ wrote:The DICE papers might be good for their very restrictive use cases (next gen consoles and PCs) but fail quite badly when you try to make, say, an Android game.
    IMHO PCs and next gen consoles are not a very restrictive use case. But indeed, it would be nice to pay some attention to ARM, although mobile SoCs are evolving very fast. Anyway, I think development should be focused on "next-gen" PC and console architectures (aka DX11) rather than on limited mobile ones (GLES2 / 3?).

    -----------------------------

    I've the feeling that whatever the 2.0 roadmap will be, it'll not be ideal for the whole community. I would like to read concrete solutions rather than "general ideas", since I see a very low SNR in all ogre redesign threads (of course! each person has its own interests, but things must move ahead!)

    As I've said before, hope my posts don't sound rude, that's not my intention at all.

    saejox wrote:There are many opportunities for SSE2+ and cache friendly structures as mentioned in the paper.
    Ogre is already the most usable open source rendering engine, it just need to be faster and less resource hungry to be more competitive.
    +1!

    Xavier

    Edit: Just to clarify: I'm not saying I don't consider mobile platforms important; in fact I think they are very important. What I want to say is that, IMHO, Ogre development should be centered on PC (as it has been until now) and not on mobile.
    Creator of SkyX, Hydrax and Paradise Sandbox.
    Looking for Ogre3D consulting services?
    Follow me: @Xavyiy
    User avatar
    Xavyiy
    OGRE Expert User
    OGRE Expert User
     
    Posts: 830
    Kudos: 77
    Joined: 12 Apr 2005
    Location: Albacete - Spain
    Top

    Re: Ogre 2.0 doc (slides)

    by DanielSefton » Sun Nov 25, 2012 3:38 am

    Xavyiy wrote:Although I think the development should be focussed on "next-gen" PC and consoles architecture(aka DX11) rather than in limited mobile ones(GLES2 / 3?).

    Actually I believe that Ogre's mobile development userbase may even surpass that of traditional PC/console development in the near future.

    Regardless, both should be the focus, just like Unreal and Unity are able to run fast on both PC and mobile. If Ogre becomes more lightweight and cache friendly, then it will naturally benefit speed on mobiles as well as PC, and various team members can make sure the balance of platform specific optimisation is met (like David on iOS, Murat on Android, Assaf on DX11 etc.)

    Xavyiy wrote:Unity is a full game engine, very featured, with an awesome editor perfectly married with the engine. Also quite optimized, specially last year versions. Ogre is a render engine, just that, which urgently needs a redesign focussed on optimization

    Agreed. One of Ogre's main attractions is that it allows developers to create their own tools and engines around it. As we tell users all the time, Ogre is NOT a game engine, it's a graphics library. If you want tools, use/extend Ogitor, or something third party like Xavyiy's Paradise Engine ;)

    saejox wrote:Does 2.0 aim for better performance or better usability?

    You said it yourself:
    saejox wrote:Ogre is already the most usable open source rendering engine, it just need to be faster and less resource hungry to be more competitive.


    And to add my opinion to the threading discussion, parallel architecture should be up to third party engines (Ogre itself shouldn't be using TBB or boost threads). But you may perhaps expose multiple update loops, like scene graph and frame rendering, I guess. There would be nothing stopping us from providing a basic example framework which makes use of TBB.

    However, my research stopped before I got the chance to check out the idea of turning render operations into tasks, like what Sony's Phyre Engine does.

    Re: Ogre 2.0 doc (slides)

    by Mako_energy » Sun Nov 25, 2012 3:52 am

    _tommo_ wrote:to me, Ogre, as pure graphics engine needs to NOT expose any threading system.


    I've been thinking about this more heavily as of late and I am growing into that mentality myself. The more I think about the complications of having Ogre know about and interface with a scheduler, the more I think that it'll just cause issues if anyone has a different idea of how threading should work in their game. Different projects have different needs, and it seems somewhat unrealistic to assume you can put a catch-all into Ogre that will work. Then there is my next point...

    Xavyiy wrote:Well, actually I don't think it's fair to compare Unity to Ogre that way. Unity is a full game engine, very featured, with an awesome editor perfectly married with the engine. Also quite optimized, specially last year versions. Ogre is a render engine, just that, which urgently needs a redesign focussed on optimization and DX11/OGL4 arquitecture. It's not cool seeing that a complex scene runs twice faster in UDK or even Unity. It's not very cool either how each compositor render_scene pass culls the whole scene again, etc.

    Xavyiy wrote:What kind of tools do you want to see? A scene editor? Material editor? But that's again the same story: ogre is just a render engine, should not provide any kind of high level tool. Just mesh/material importer/exporters and mesh optimization tools, not much more, IMHO.


    I hear this often among Ogre users more experienced than I, however I can't really see how this is true. Ogre does SO MUCH, I feel it is half-way to a game engine and as I have stated in some other posts the resource system is a large part of that. I completely agree that what you are saying is how Ogre should be...but I can't at all agree that's what it is. Breaking off more things into components or plugins is needed. Starting with the resource system, imo.

    In addition I don't think more systems should be added to exacerbate the issue. If something lightweight and flexible can't be implemented, then don't try to put threading into Ogre at all. One possibility that comes to mind is the multi-threading in Bullet. It's a very simple class meant to be overridden that I have heard works well with a large number of multi-threading strategies. I personally haven't used it(yet) so I can't comment too much on it, but it's an idea I just wanted to throw out there. Has anyone here used it? Would a similar class be appropriate for Ogre?

    Xavyiy wrote:I've the feeling that whatever the 2.0 roadmap will be, it'll not be ideal for the whole community. I would like to read concrete solutions rather than "general ideas", since I see a very low SNR in all ogre redesign threads (of course! each person has its own interests, but things must move ahead!)


    I think we all want to see this move ahead as fast as possible, but a lot of us have different use cases that must be made known if we hope to arrive at the solution that is most ideal for the community. To that end, maybe expecting people to post in the development forums is asking too much of most of the people out there. If the Ogre team has the time, maybe it would be better to do another survey, one aimed more directly at all the subjects raised here. At least ask enough questions to get a start on the whole thing.

    Re: Ogre 2.0 doc (slides)

    by dark_sylinc » Sun Nov 25, 2012 6:46 am

    _tommo_ wrote:to me, Ogre, as pure graphics engine needs to NOT expose any threading system.
    I don't like at all the idea that a renderer will "take life of its own" and start spawning threads unless I do some arcane forms of control (ie. subclassing the default task manager class).
    The default should be simplicity.

    This is WHY I insist so much on allowing the user to specify how many threads Ogre may spawn. For example, Havok can spawn no threads, or spawn as many as it wants. This value is set during startup.

    If you don't want Ogre to "take on a life of its own", just tell it not to while creating Ogre::Root.
    BTW, some sound systems tend to take on a life of their own without notice, because that's what DirectSound needs, for example. The future is in multi-core, so it's about time we tackle that.
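
A minimal sketch of that kind of startup hint, using C++11 std::thread purely for illustration. ThreadingHint and runJobs are hypothetical names; this is not an existing Ogre::Root parameter nor how Havok does it. A value of 0 means "never spawn anything", in which case all work runs on the calling thread.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical startup setting chosen once by the user.
struct ThreadingHint
{
    size_t maxWorkerThreads; // 0 = the library must not spawn any thread
};

// Runs every job and returns how many threads were spawned to do so.
inline size_t runJobs( const std::vector<std::function<void()> > &jobs,
                       const ThreadingHint &hint )
{
    if( hint.maxWorkerThreads == 0 || jobs.size() < 2 )
    {
        // Respect the hint: everything runs inline on the caller's thread.
        for( size_t i = 0; i < jobs.size(); ++i )
            jobs[i]();
        return 0;
    }

    // Never exceed the user's budget, and never spawn more threads than jobs.
    const size_t numThreads = std::min( hint.maxWorkerThreads, jobs.size() );
    std::atomic<size_t> next( 0 );
    std::vector<std::thread> workers;
    for( size_t t = 0; t < numThreads; ++t )
    {
        workers.emplace_back( [&jobs, &next]()
        {
            // Each worker pulls the next unclaimed job until none are left.
            for( ;; )
            {
                const size_t i = next.fetch_add( 1 );
                if( i >= jobs.size() )
                    break;
                jobs[i]();
            }
        } );
    }
    for( size_t t = 0; t < workers.size(); ++t )
        workers[t].join();

    assert( next.load() >= jobs.size() ); // Every job index was claimed.
    return numThreads;
}
```

A real implementation would keep the workers alive between frames instead of creating them per call; the sketch only shows how a single number set at startup bounds what the library is allowed to do.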

    Klaim wrote:
    dark_sylinc wrote:As for Ogre's management of threads:
    The CompositorManager must have a high degree of control over its batch threads.

    I don't understand this. To me, whatever the kind of parallel work, it should go through the task scheduler underneath, the same way parallel_for in tbb spawns tasks for each batch of iterations.

    What I meant is that the control over the batch (worker) threads is too advanced. Creating a generic task scheduler that it could run on is not a trivial issue at all. It may be something for the far future, IF it seems to be viable.
    To prevent oversubscription, tell Ogre at startup how many threads it can spawn at most.
    Do you have a few links to similar implementations of what you have in mind? Because I think I'm not seeing what you see.

    _tommo_ wrote:PS: imo all of Ogre 2.0 should aim at being a pure graphics library, focusing on simplicity. And this imo means dropping a lot of existing functionality, and becoming more passive on which role Ogre takes in a game engine architecture.
    Basically everyone that approaches Ogre feels the urge to place it at the cornerstone of its engine (with no decoupling between maths, threading, and scene managing between rendering & logic ), and Ogre is responsible of this because of the current all-encompassing architecture.

    You're describing converting Ogre into a set of utility libraries. A rendering engine is composed of exactly that: a math library, a render queue, a batch dispatcher & material manager, and a scene graph.
    The urge IMO comes from all of this being in the same big chunk called "SceneManager" (except for the math part).

    _tommo_ wrote:PPS: the docs are great but the biggest setback Ogre has in regard of the said engines (and Unity, which strangely was not mentioned even if it is the greatest Ogre-killer between AAs) are TOOLS. lots of excellent tools for artists and designers.
    So along of a simplification of the graphic library itself, there should be a serious effort in making the engine useful, as in, in the real world. Thinking that devs are leaving Ogre because it is "not fast enough for AAAs" means completely missing the point.

    I agree on the tools. This is why I added a few slides about making RTSS more node-like. If we make a customizable node system, creating a graphical interactive tool for setting up materials would be very easy. As for the rest, I left them out because they demand a PDF of their own.
    There are a few tools that are too game engine specific (rather than render engine) when compared to Unity & UDK (like DanielSefton said); however these are areas we can develop on:
    • Material editor. A real one. Preferably with node views for setting the relations (that artists would use), and a syntax highlighter to write the shader associated with each node (that programmers would write). Of course, WYSIWYG
    • Compositor editor. Preferably with node views; but a stack based implementation (like GIMP) may work too. WYSIWYG
    • Export integration. So far the biggest complaint I get from artists & indie companies is that the export process pipeline sucks. UDK & Unity do this pretty well. There are too many steps involved in getting from 3DS Max/Maya/Blender into the actual game. This is because:
      1. It usually involves setting a material file (by hand, using text! artists don't like text!), being careful not to overwrite a previous material file
      2. Exporting all the right submeshes; and placing the .mesh file into the right folder (or setting up resources.cfg)
      3. Setting up an additional file for the custom game engine to link the GameObject with a given Mesh
      4. Getting a preview means doing all the above (+ launching a custom preview tool, Ogre Meshy, or even loading the full game depending on each case). It's a pita if something was wrong and above steps must be followed all over again. Cuts the iteration process
    That said, most of us don't have the time to work on tools, because tools involve GUI code (many find it boring & frustrating). Making a good GUI is an art, and requires a lot of co-developing with artists & designers (after all, those are the users). This forum sadly lacks artists.
    It's a chicken and egg problem; we can't make appealing tools because we don't have artists to work with, and we don't have artists because we have no appealing tools.

    _tommo_ wrote:Thinking that devs are leaving Ogre because it is "not fast enough for AAAs" means completely missing the point.

    It's not fast for AAA, nor for indies either. I'm not working for an AAA company and Ogre's limitations are annoying me, as well as other users. The big main problem is that it's lacking scalability. Overclock your CPU from 3GHz to 6GHz and it will only speed up a little because of the cache misses (you can overclock the RAM to increase the bandwidth, but then you'll increase latency...). Throw in a CPU with more cores or a faster GPU and it will run as slow as it did before. In other words we're doomed if we don't change this scenario. Especially since AAA companies are lending their engines to the average Joe (thus competing with Ogre & game engines relying on Ogre).

    Faster means more flexibility. Even small games have to watch the number of bones & entities they spawn, whereas it is much more flexible if they don't have to worry about that at all and can leave the problem to the experts making games who need all the juice they can squeeze.

    Also, appealing to AAA companies may have its perks. If it's good enough, there's the potential for Ogre to start getting sponsors like LuaJIT & Bullet do; because many game companies find it easier to fund an open source project they find useful than to pay >$50.000 per title license.

    _tommo_ wrote:PPPS: most of the proposed ways of "optimizing" by bruteforcing jumps or switching full-on to SoA + SIMD just ignore that Ogre today needs to run energy-efficiently on cheap ARMs much more than squeezing SSE2 archs and are probably best ignored, and are indeed an ugly case of optimizing without even thinking what the use case will be.
    The DICE papers might be good for their very restrictive use cases (next gen consoles and PCs) but fail quite badly when you try to make, say, an Android game.

    It's true that we're bruteforcing. But the current implementation is trying to be smart and fails miserably. Android phones are going multicore, and NEON is the SSE of ARMs.
    In fact the most power hungry element in a phone is the RAM memory. More bandwidth usage = more battery wear. And since we're doing lots of cache misses and wasting lots of RAM for needless variables, and running ultra slow, it's safe to say we're draining energy like a nuclear submarine.

    Optimizing means finishing the frame faster. If the Android game is updated at 5 Hz, a faster frame update means sleeping more between each frame. Most phone optimizations revolve around lazy frame updating (updating only when necessary, or just the parts that need it) or updating elements at different frequencies depending on the object's nature (i.e. a tree vs the main player).
    Lazy frame updating 90% of the time falls out of the scope of a render engine: It has to be done at a higher level, and with a proper Ogre setup (i.e. compositors, multiple visibility layers).
    As for updating elements at different frequencies, it should be much easier, because we plan to separate elements into sets of chunks. Just place the elements into different render queue IDs, and slightly modify Ogre (a design feature that can be evaluated) to update the chunks at different intervals.
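
    A minimal sketch of the "update chunks at different intervals" idea, assuming a hypothetical Chunk type with a per-chunk frame interval. None of these names exist in Ogre; the counter stands in for the real transform or animation update.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical: a group of scene elements that share one update interval.
struct Chunk {
    unsigned interval;     // update every `interval` frames (1 = every frame)
    unsigned updateCount;  // how many times this chunk has been updated
};

// Updates only the chunks whose interval divides the current frame number.
// Returns how many chunks were updated this frame.
inline std::size_t updateChunks(std::vector<Chunk>& chunks, unsigned frame) {
    std::size_t updated = 0;
    for (Chunk& c : chunks) {
        assert(c.interval > 0);
        if (frame % c.interval == 0) {
            ++c.updateCount;  // stand-in for the real per-chunk update
            ++updated;
        }
    }
    return updated;
}
```

    With one chunk at interval 1 (the main player) and one at interval 4 (trees), the second chunk does a quarter of the work over the same number of frames.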

    saejox wrote:If it is going to thread-safe it means hundreds of mutexes in every function.
    Goodbye performance.

    God no! The threading model is about splitting work on objects that aren't being touched at the same time (hence no need for locking except when the job is done), that's all. :D
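
    A rough sketch of that model, assuming a flat array of floats as the data and a hypothetical translateAll helper. Each thread gets its own non-overlapping slice, so the only synchronization point is the join at the end.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: adds `delta` to every element, giving each thread one
// contiguous slice that no other thread touches.
inline void translateAll(std::vector<float>& positions, float delta, unsigned numThreads) {
    assert(numThreads > 0);
    const std::size_t n = positions.size();
    const std::size_t perThread = (n + numThreads - 1) / numThreads;  // round up
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < numThreads; ++t) {
        const std::size_t begin = std::min(n, t * perThread);
        const std::size_t end = std::min(n, begin + perThread);
        if (begin == end) break;  // nothing left to hand out
        workers.emplace_back([&positions, delta, begin, end] {
            for (std::size_t i = begin; i < end; ++i) positions[i] += delta;
        });
    }
    for (std::thread& w : workers) w.join();  // the only synchronization point
}
```

    No mutex appears anywhere because no two slices overlap; waiting for the job to finish is the whole locking story.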

    saejox wrote:Ogre already has many shared_ptrs and locks, even tho it is not thread-safe.
    I think all those useless locks and shared_ptr should be removed.
    No need to wait for a big release for that.

    I cannot agree more! :)

    Mako_energy wrote:I hear this often among Ogre users more experienced than I, however I can't really see how this is true. Ogre does SO MUCH, I feel it is half-way to a game engine and as I have stated in some other posts the resource system is a large part of that. I completely agree that what you are saying is how Ogre should be...but I can't at all agree that's what it is. Breaking off more things into components or plugins is needed. Starting with the resource system, imo.

    Part of this "Ogre does SO MUCH" comes (as a downside?) from Open Source. Some programmer pops up, decides he needs X thing implemented to render his stuff the way he wants, without investing much time in checking whether there was already a way of achieving the same result; then he submits his change and it gets into core. This programmer probably doesn't show up again after that.
    Therefore we end up with redundant ways of doing the same thing.
    Take the variable mVisible for example. It's unnecessary. Why? Because we have visibility masks. Just reserve one layer for making stuff invisible, and problem solved. Instead, we add an extra byte (up to 4 if bad alignment packing happens) where using 1 bit out of the 32 from mVisibilityMask was enough.
    I remember the main reason one of the many Render Listeners was added (I don't recall exactly which one of the listeners) was because someone in the community wanted to control whether some objects were visible in particular passes. Oh wait, that's what mVisibilityMask is for, to selectively render & filter between passes...
    Making Node::getPosition virtual is another example of this habit (I'M JUST GUESSING, but it probably became virtual because someone at some point needed it).
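
    To illustrate the visibility-mask point above, here is a small sketch. It assumes bit 31 of a 32-bit mask is reserved as the "visible at all" layer; the Renderable type and function names are made up for the example and are not Ogre's API.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: bit 31 is reserved as the "visible at all" layer.
constexpr std::uint32_t kVisibleLayer = 1u << 31;

// Hypothetical object carrying only a mask, no separate bool.
struct Renderable {
    std::uint32_t visibilityMask = 0xFFFFFFFFu;

    void setVisible(bool visible) {
        if (visible) visibilityMask |= kVisibleLayer;
        else visibilityMask &= ~kVisibleLayer;
    }
};

// A pass renders the object only if the reserved bit is set and the object
// shares at least one of the remaining 31 layers with the pass.
inline bool isRenderedBy(const Renderable& r, std::uint32_t passMask) {
    const std::uint32_t sharedLayers = r.visibilityMask & passMask & ~kVisibleLayer;
    return (r.visibilityMask & kVisibleLayer) != 0 && sharedLayers != 0;
}
```

    The same 32-bit field answers both "is it visible at all?" and "is it visible in this pass?", which is the redundancy argument made above.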

    It's not bad per se, as someone's work may prove very useful for something totally unrelated to the original intention (that couldn't be achieved with the other preexisting method); but because these programmers tend to implement their contribution in a rush, it leaves little time for thinking about how it all fits together in the grand design. When these contributions start piling up, we end up with half render engine, half game engine; and the "Ogre does SO MUCH" phrase.

    Like I said, it's not necessarily bad. And I don't want to disrespect the Open Source community at all! :)
    But every once in a while we need to clear the mess, tie up the loose ends, and remove what's totally unnecessary. It's not easy though. What looks totally unnecessary to someone may not actually be for a few niches.
    Twitter: @matiasgoldberg My GSoC2013Tech Blog, Video games & Free Music at Yosoygames.com.ar
    User avatar
    dark_sylinc
    Google Summer of Code Student
    Google Summer of Code Student
     
    Posts: 718
    Kudos: 136
    Joined: 21 Jul 2007
    Location: Buenos Aires, Argentina
    • Website
    Top

    Re: Ogre 2.0 doc (slides)

    by Sqeaky » Sun Nov 25, 2012 9:22 am

    I agree that a lighter, more focused Ogre would be nice. Some features are easy enough to avoid, but others, like the resource system, seem very core and hard to avoid if unwanted. Despite this I think that threading should be a core component of Ogre. I think that trying to add an abstraction layer such that Ogre takes advantage of external threading systems is not useful without adding copious complexity. Going the other way and allowing customization by game developers is a good idea, but there is a limit to what can be accomplished. Even if it were just a renderer, because of the renderer's importance to a game, Ogre will either be providing a threading model for the game or will be run separately from the rest of the frame in a game, and in practice only simple configuration changes (like thread count) will be made to its threading. If the Ogre threading model is well thought out, the former will seem a reasonable solution for many. If the Ogre threading model is lightweight enough to be ignored in the microseconds before or after it runs, and the threading can be disabled, allowing the game logic to ignore it and run in other threads, then everyone else can be happy.

    Xavyiy wrote:I would like to read concrete solutions rather than "general ideas", since I see a very low SNR in all ogre redesign threads

    I don't know a huge amount about cache misses, but I am writing a threading library I will gladly re-license to zlib (Currently it is GPL3) for Ogre use. It is not ready for prime time yet, but it has some of the features I want it to have when it is done. I also want to tune it extensively for performance.

    It is a variation on a conventional WorkQueue/Threadpool that moves where synchronization occurs to minimize contention. Currently it is lockless and uses atomic CAS operations to prevent race conditions internally, while exposing a concept of dependencies to make writing workunits easier. In WorkUnit code there is no need to use or know about conventional synchronization primitives, but they can be optionally used if desired. I can describe it in as much detail as anyone would like and you can also see what I have so far at https://github.com/BlackToppStudios/DAGFrameScheduler/ . It also has fairly comprehensive doxygen docs, including a description of the algorithm, located at doc/html/index.html relative to the root of the downloaded repo.
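
    To give a feel for what "exposing a concept of dependencies" can look like, here is a deliberately simplified sketch. It is NOT the DAGFrameScheduler API: the WorkUnit struct and runFrame function are hypothetical, and it runs on a single thread, where a real scheduler would hand ready units to worker threads.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical work unit: a job plus the indices of units that must finish first.
struct WorkUnit {
    std::function<void()> job;
    std::vector<std::size_t> dependsOn;
};

// Runs every unit exactly once, never before its dependencies.
// Returns the order in which the units were executed.
inline std::vector<std::size_t> runFrame(const std::vector<WorkUnit>& units) {
    std::vector<bool> done(units.size(), false);
    std::vector<std::size_t> order;
    bool progressed = true;
    while (order.size() < units.size() && progressed) {
        progressed = false;
        for (std::size_t i = 0; i < units.size(); ++i) {
            if (done[i]) continue;
            bool ready = true;
            for (std::size_t d : units[i].dependsOn) {
                assert(d < units.size());
                if (!done[d]) { ready = false; break; }
            }
            if (!ready) continue;
            units[i].job();  // the job itself needs no locks: its inputs are finished
            done[i] = true;
            order.push_back(i);
            progressed = true;
        }
    }
    assert(order.size() == units.size() && "dependency cycle");
    return order;
}
```

    The point of the sketch is that the work unit code never touches a mutex; ordering is expressed as data and enforced by whoever runs the frame.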
    Need an alternative to a single threaded main loop for a game: https://github.com/BlackToppStudios/DAGFrameScheduler/
    --Sqeaky
    Sqeaky
    Gnoblar
     
    Posts: 24
    Kudos: 0
    Joined: 19 Jun 2010
    Top

    Re: Ogre 2.0 doc (slides)

    by lunkhound » Sun Nov 25, 2012 11:14 am

    dark_sylinc wrote:
    lunkhound wrote:I have some concerns about the whole SoA thing though. I worry that it may be alot of developer-pain for very little gain. Considering that:

    1. SoA isn't necessary to fix the cache-misses. Cache misses can be fixed by reorganizing how data is laid out but without interleaving vector components of different vectors together.
    2. SoA isn't necessary to improve performance of vector math using SIMD. OK maybe you don't get the full benefit of SIMD and not in all cases but you can probably get 70% of SoA performance simply by using a SIMD-ified vector library.
    3. SoA is not easy to work with. Code is harder to read, harder to debug, more effort to maintain going forward. Imagine inspecting Node structures in the debugger when the vector components are interleaved in memory with other Nodes...

    I think SoA is best for a limited-scope, highly optimized, tight loop where every cycle counts, and only affecting a small amount of code. Kind of like assembly language, SoA comes with a cost in developer time and I'm just not sure it would be worth it.
    Thanks again for all the work on those slides. I'm really glad to see these issues being raised.

    You're right about your concerns. So let me address them:

    1. It is true that there are other ways to optimize the data. However, transformation and culling are actually fairly trivial operations, which are done sequentially on massive amounts of elements. Note that the interleaving is for SIMD. An arrangement of "XYZXYZXYZ" is possible by specifying 1 float per object at compile time.
    The performance gains of using SoA for critical elements such as position & matrices are documented in the SCEE's paper (reference 4)


    Sorry, I didn't make myself clear. I agree that SCEE paper is exactly the sort of thing that we ought to be doing, but it doesn't mention SoA as I understand it. When I see "SoA" I think of this: http://software.intel.com/en-us/articles/how-to-manipulate-data-structure-to-optimize-memory-use-on-32-bit-intel-architecture
    Code: Select all
    struct StructureOfArrays
    {
       float x[numVertices];
       float y[numVertices];
       float z[numVertices];
    ...
    };
    

    Intel has been telling everyone to swizzle their data like this ever since they came out with MMX. My comments were ONLY directed at this Intel-style swizzling, and not at the sort of grouping of homogeneous data structures featured in the SCEE reference. I will refer to it as "swizzling" and not "SoA" for clarity.

    dark_sylinc wrote:2. We already do SIMD math and tries to do it's best. There are huge margins to gain using SoA + SIMD because the access patterns and the massive number of operations to perform fit exactly the way SSE2 works. There's a lot of overhead in unpacking & packing.
    DICE's Culling the Battlefield slides show the big gains of using SoA + SIMD (reference 3)


    You'll notice however, that in those DICE slides, they are not actually storing any of their data structures swizzled in memory. They swizzled the frustum planes (on the fly, presumably), and then loop over un-swizzled bounding spheres. That's a great use of SoA/swizzling because no user-facing data structures are swizzled.
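
    A scalar sketch of that arrangement, under some assumptions: only four planes are used (a full frustum has six), plane normals point inward, and all type and function names are made up for illustration. The inner loop over i is what a 4-wide SIMD version would do in one instruction per component.

```cpp
#include <cassert>

struct Sphere { float x, y, z, radius; };  // stored unswizzled, easy to inspect
struct Plane  { float nx, ny, nz, d; };    // normal points to the inside

// Four planes in swizzled form: one array per component, so each array maps
// onto a single 4-wide SIMD register.
struct SwizzledPlanes4 { float nx[4], ny[4], nz[4], d[4]; };

// Swizzle once per frame, on the fly; the user-facing Plane stays untouched.
inline SwizzledPlanes4 swizzle(const Plane (&planes)[4]) {
    SwizzledPlanes4 s{};
    for (int i = 0; i < 4; ++i) {
        s.nx[i] = planes[i].nx;
        s.ny[i] = planes[i].ny;
        s.nz[i] = planes[i].nz;
        s.d[i]  = planes[i].d;
    }
    return s;
}

// A sphere is culled if it lies fully behind any of the four planes.
inline bool isVisible(const Sphere& s, const SwizzledPlanes4& p) {
    for (int i = 0; i < 4; ++i) {  // one SIMD lane per plane
        const float dist = p.nx[i] * s.x + p.ny[i] * s.y + p.nz[i] * s.z + p.d[i];
        if (dist < -s.radius) return false;
    }
    return true;
}
```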

    code looks OK...

    dark_sylinc wrote:It's very true that debugging becomes much harder, specially when examining a single Entity or SceneNode.
    I see two complementary solutions:
    • Use getPosition() would retrieve the scalar version; which can be called from the watch window (as long as we ensure it's fully const...)
    • There are a few MSVC features (I don't remember if they had to be installed, or if they were defined through pragmas) that tell MSVC how to read objects while debugging. I'm sure gdb probably has something similar.


    I think looking at those DICE slides again actually convinced me that there is very little to gain from keeping stuff in swizzled format in memory. Just swizzle the frustum planes on the fly and a bit of optimized SIMD code will yield great performance.
    If there are any performance gains to be had from swizzling the SceneNodes in memory, I would expect them to be tiny and not at all worth the trouble it would cause every user who has to examine a SceneNode in the debugger.
    However, I'm sure there are cases where it would make sense, like a particle-system.
    User avatar
    lunkhound
    Greenskin
     
    Posts: 123
    Kudos: 15
    Joined: 29 Apr 2012
    Location: Santa Monica, California
    Top

    Re: Ogre 2.0 doc (slides)

    by spookyboo » Sun Nov 25, 2012 1:51 pm

    Some programmer pops up, decides he needs X thing implemented to render his stuff the way he wants, without investing much time if there was already a way of achieving the same result; then he submit his change and gets into core

    This is indeed the disadvantage of open source. If you want to redesign Ogre, you need a dedicated team that sticks 'till the end' and has a clear vision. Every change to the core is validated by that team. The problem is that such a team needs time and an incentive (money, no personal life) to stick to the project. That is the difference between Ogre development and companies like Epic and Crytek. Ogre can survive when combined with some kind of commercial activity. This was tried before (by Steve) but I am the first to admit that this is no easy task. Ogre needs at least some substantial gifts from large companies (you know who you are!). Maybe these companies want something in return, but as long as this fits into the team's vision, I don't see a problem.
    User avatar
    spookyboo
    Silver Sponsor
    Silver Sponsor
     
    Posts: 983
    Kudos: 48
    Joined: 06 Jul 2004
    • Website
    Top

    Re: Ogre 2.0 doc (slides)

    by dark_sylinc » Sun Nov 25, 2012 6:43 pm

    lunkhound wrote:Sorry, I didn't make myself clear. I agree that SCEE paper is exactly the sort of thing that we ought to be doing, but it doesn't mention SoA as I understand it. When I see "SoA" I think of this: http://software.intel.com/en-us/articles/how-to-manipulate-data-structure-to-optimize-memory-use-on-32-bit-intel-architecture
    Code: Select all
    struct StructureOfArrays
    {
       float x[numVertices];
       float y[numVertices];
       float z[numVertices];
    ...
    };
    

    Intel has been telling everyone to swizzle their data like this ever since they came out with MMX. My comments were ONLY directed at this Intel-style swizzling, and not at the sort of grouping of homogeneous data structures featured in the SCEE reference. I will refer to it as "swizzling" and not "SoA" for clarity.

    Oh I see. SCEE's is technically "SoA" which stands for Structure of Arrays (or ptrs). If we look at the SceneNode declaration from SCEE's, it is:
    Code: Select all
    class SceneNode
    {
       Vector3 *Position; //ptr
       Quaternion *qRot; //ptr
       Matrix4 *matrix; //ptr
    };


    Indeed, Intel's proposal since the introduction of MMX sucked hard. Because when we need to go scalar (we know that happens sooner or later), reading X, Y & Z takes three cache fetches, because they're too far apart. It's horrible. Not to mention very inflexible.
    That's why I came out with the idea of interleaving the data as XXXXYYYYZZZZ: When we go scalar, it is still one fetch (in systems that fetch 64-byte lines).
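
    A sketch of the XXXXYYYYZZZZ layout, assuming 4 floats per block and hypothetical names (Vec3Block, getPosition, setPosition). One block is 48 bytes, so a scalar read of one object's X, Y and Z stays within a single 64-byte cache line when the block is suitably aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kLanes = 4;  // assumption: 4 floats per block (SSE width)

// One block holds four objects: x0 x1 x2 x3 | y0 y1 y2 y3 | z0 z1 z2 z3.
struct Vec3Block {
    float x[kLanes];
    float y[kLanes];
    float z[kLanes];
};
static_assert(sizeof(Vec3Block) <= 64, "a block should fit in one 64-byte line");

struct Vec3 { float x, y, z; };

// Scalar access: pick the block, then the lane inside it.
inline Vec3 getPosition(const std::vector<Vec3Block>& blocks, std::size_t index) {
    const Vec3Block& b = blocks[index / kLanes];
    const std::size_t lane = index % kLanes;
    return Vec3{ b.x[lane], b.y[lane], b.z[lane] };
}

inline void setPosition(std::vector<Vec3Block>& blocks, std::size_t index, const Vec3& v) {
    assert(index / kLanes < blocks.size());
    Vec3Block& b = blocks[index / kLanes];
    const std::size_t lane = index % kLanes;
    b.x[lane] = v.x;
    b.y[lane] = v.y;
    b.z[lane] = v.z;
}
```

    Compare this with the Intel-style layout quoted above, where x[i], y[i] and z[i] live in three separate arrays that may be thousands of bytes apart.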

    lunkhound wrote:I think looking at those DICE slides again actually convinced me that there is very little to gain from keeping stuff in swizzled format in memory. Just swizzle the frustum planes on the fly and a bit of optimized SIMD code will yield great performance.
    If there are any performance gains to be had from swizzling the SceneNodes in memory, I would expect them to be tiny and not at all worth the trouble it would cause every user who has to examine a SceneNode in the debugger.
    However, I'm sure there are cases where it would make sense, like a particle-system.

    Actually, we have nothing to lose and possibly something to gain (performance). And I'll tell you why:
    Regardless of whether you want to swizzle in memory, or swizzle using instructions; we still have to write the code that ensures all memory is contiguous. Even if we don't use SSE at all (we would use the XYZXYZ model, that is, specifying one float instead of four at compile time) we need contiguity, and to be able to load from memory without data dependencies.

    My idea is that in PC systems, default to four floats, and use SSE. However if you really, really think debugging is going to be a big problem (even with MSVC's custom data display, I admit not everyone uses MSVC), then compile using one float; and there can be also a "SoA_Vector3" implementation that uses packing instructions to swizzle the memory onto the registers on the fly.
    After all, SoA_Vector3 & co. are platform dependent. In PCs with 4 floats per object, it will use SSE intrinsics. In ARM with 2 & 4 floats per object, it will use NEON.
    In PCs with 1 float per object, it can use scalar operations... or packing+shuffling SSE intrinsics and still operate 4 objects at the time, like you suggest.

    So, it is a win-win situation. We can have it my way and your way too, with minimal effort (other than writing multiple versions of SoA_Vector3, SoA_Quaternion & SoA_Matrix4). The magic happens in the memory manager that will dictate how the SoA memory gets allocated & arranged. The rest of the systems are totally abstracted from the number of floats interleaved.
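
    A sketch of how the number of interleaved floats could be a compile-time parameter. SoAVector3Sketch and add are hypothetical stand-ins for the SoA_Vector3 idea, written with plain loops; a per-platform version would replace the loop body with SSE or NEON intrinsics.

```cpp
#include <cassert>
#include <cstddef>

// N = floats interleaved per component: 1 gives plain XYZ, 4 gives XXXXYYYYZZZZ.
template <std::size_t N>
struct SoAVector3Sketch {
    float x[N];
    float y[N];
    float z[N];
};

// Component-wise add over all N objects packed in the value.
// Calling code looks identical for every N.
template <std::size_t N>
SoAVector3Sketch<N> add(const SoAVector3Sketch<N>& a, const SoAVector3Sketch<N>& b) {
    SoAVector3Sketch<N> r{};
    for (std::size_t i = 0; i < N; ++i) {
        r.x[i] = a.x[i] + b.x[i];
        r.y[i] = a.y[i] + b.y[i];
        r.z[i] = a.z[i] + b.z[i];
    }
    return r;
}
```

    Code written against add() does not change when N goes from 1 to 4, which is the sense in which the rest of the system stays abstracted from the interleaving.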
    Top

    Re: Ogre 2.0 doc (slides)

    by Sqeaky » Sun Nov 25, 2012 7:03 pm

    spookyboo wrote:
    Some programmer pops up, decides he needs X thing implemented to render his stuff the way he wants, without investing much time if there was already a way of achieving the same result; then he submit his change and gets into core

    This is indeed the disadvantage of open source. If you want to redesign Ogre, you need a dedicated team that sticks 'till the end' and has a clear vision.


    I disagree that this is a disadvantage; free labor is rarely bad, particularly if you do have a core team as Ogre appears to. Many open source projects have become very successful exactly because of this kind of free labor. But this is clearly off-topic.

    Klaim wrote:
    As for Ogre's managament of threads:
    • The CompositorManager must have a high degree of control over its batch threads.

    I don't understand this. To me, whatever the kind of parallel work, it should work with the task scheduler underneath, the same way parallel_for in tbb will spawn tasks for each batch of cycle.

    It may be easier to understand if you think of all multithreaded code as providing and expecting guarantees. Different task/workunit scheduling algorithms expect different amounts of thread-safety from their workunits and interact with their workunits based on these assumptions. Some schedulers require no thread safety, some require just re-entrancy, some require full data write isolation, and there are other more esoteric requirements that are possible. Tasks/WorkUnits will also implicitly make assumptions about their schedulers. They are written differently if workunits finish in a known order, if two workunits are guaranteed to not access the same resources, if every data access needs to be wrapped in a mutex/atomic CAS, and based on what information the scheduler provides the workunit.

    If the default Ogre task scheduler provides certain guarantees and the game developer provides a task scheduler of his own, it must provide at least the same guarantees. If the new scheduler provides more, the Ogre tasks/workunits will not be able to take full advantage of them because they are already written. If he provides fewer guarantees he will likely introduce race conditions or deadlocks.

    For a more concrete example please consider Apple's libdispatch ( http://libdispatch.macosforge.org/ ), which uses a custom barrier primitive, custom semaphores and communication with the scheduler to ensure data consistency, and an Apache WorkQueue ( http://cxf.apache.org/javadoc/latest/or ... Queue.html ), which implicitly assumes that the work unit will provide its own consistency mechanism without any communication from the Queue, and implements a timeout to provide other guarantees. It would be very difficult to write a work unit that would work ideally in both places.

    The parallel_for construct in Threading Building Blocks really is a different class of construct than a scheduler. It is a threading primitive designed to parallelize obviously parallelizable problems. I do not know, but I suspect that many parts of Ogre are not obviously parallelizable, and I suspect that any threading algorithm used in Ogre must be carefully designed to get maximum performance.

    dark_sylinc wrote:To prevent oversubscription, tell Ogre at startup how many threads it can spawn at max.

    There are likely a few other configurations that can be used when Ogre starts to adjust it, but I agree the thread count is the obvious one. IMHO a good threading design will allow the game developer to interact with the Ogre threading system in at least three ways.
    1. Tight Integration with Ogre's threading - Using the same system would clearly be beneficial for small projects or projects that require similar performance characteristics. This could be supported by exposing whatever classes and functions implement the threading model.
    2. Ignore Ogre by letting it do its work all at once - This would use the thread count to give Ogre full control of all the system resources so it could finish its work swiftly. The game would use its own threading system and simply ignore Ogre's during the rest of the frame while it performed the required game tasks. This can be supported by allowing configuration on the threading models classes (thread count, and maybe others).
    3. Ignore Ogre by letting it do its work on a limited number of threads - Some games may already take full advantage of threading or may require a long time for a single threaded task. This allows the game logic to run the full duration of a frame while Ogre works on its mostly separate task. This would be the hardest of these use cases to support, but I think it could be managed by providing a limited, isolated/thread-safe way to provide updates to Ogre through a small number of functions.
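
    A small sketch of how the thread-count knob could cover use cases 2 and 3. ThreadingConfig and resolveWorkerCount are hypothetical names, and treating 0 as "use every hardware thread" is an assumption made for the example.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical startup option handed to the renderer.
struct ThreadingConfig {
    unsigned maxWorkerThreads;  // 0 = no limit requested (use case 2)
};

// Decides how many worker threads to spawn, never more than the hardware has,
// so the renderer cannot oversubscribe the machine on its own.
inline unsigned resolveWorkerCount(const ThreadingConfig& cfg, unsigned hardwareThreads) {
    if (hardwareThreads == 0) hardwareThreads = 1;  // the query may report 0 when unknown
    if (cfg.maxWorkerThreads == 0) return hardwareThreads;
    return std::min(cfg.maxWorkerThreads, hardwareThreads);  // use case 3: limited budget
}
```

    In practice hardwareThreads would come from something like std::thread::hardware_concurrency(), which is allowed to return 0, hence the guard.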
    Top

    Re: Ogre 2.0 doc (slides)

    by lunkhound » Sun Nov 25, 2012 9:50 pm

    dark_sylinc wrote:
    lunkhound wrote:Sorry, I didn't make myself clear. I agree that SCEE paper is exactly the sort of thing that we ought to be doing, but it doesn't mention SoA as I understand it. When I see "SoA" I think of this: http://software.intel.com/en-us/articles/how-to-manipulate-data-structure-to-optimize-memory-use-on-32-bit-intel-architecture
    Code: Select all
    struct StructureOfArrays
    {
       float x[numVertices];
       float y[numVertices];
       float z[numVertices];
    ...
    };
    

    Intel has been telling everyone to swizzle their data like this ever since they came out with MMX. My comments were ONLY directed at this Intel-style swizzling, and not at the sort of grouping of homogeneous data structures featured in the SCEE reference. I will refer to it as "swizzling" and not "SoA" for clarity.

    Oh I see. SCEE's is technically "SoA" which stands for Structure of Arrays (or ptrs). If we look at the SceneNode declaration from SCEE's, it is:
    Code: Select all
    class SceneNode
    {
       Vector3 *Position; //ptr
       Quaternion *qRot; //ptr
       Matrix4 *matrix; //ptr
    };


    I've never seen that called SoA before. That's a structure of pointers to structures (or a structure of pointers into arrays of structures). I'm not sure if there is an "official" definition for SoA (nothing on Wikipedia). But I've always seen it mentioned in conjunction with SIMD.

    dark_sylinc wrote:Indeed, Intel's proposal since the introduction of MMX sucked hard. Because when we need to go scalar (we know that happens sooner or later), reading X, Y & Z takes three cache fetches, because they're too far apart. It's horrible. Not to mention very inflexible.
    That's why I came out with the idea of interleaving the data as XXXXYYYYZZZZ: When we go scalar, it is still one fetch (in systems that fetch 64-byte lines).

    lunkhound wrote:I think looking at those DICE slides again actually convinced me that there is very little to gain from keeping stuff in swizzled format in memory. Just swizzle the frustum planes on the fly and a bit of optimized SIMD code will yield great performance.
    If there are any performance gains to be had from swizzling the SceneNodes in memory, I would expect them to be tiny and not at all worth the trouble it would cause every user who has to examine a SceneNode in the debugger.
    However, I'm sure there are cases where it would make sense, like a particle-system.

    Actually, we have nothing to lose and possibly something to gain (performance). And I'll tell you why:
    Regardless of whether you want to swizzle in memory, or swizzle using instructions; we still have to write the code that ensures all memory is contiguous. Even if we don't use SSE at all (we would use the XYZXYZ model, that is, specifying one float instead of four at compile time) we need contiguity, and to be able to load from memory without data dependencies.

    My idea is that in PC systems, default to four floats, and use SSE. However if you really, really think debugging is going to be a big problem (even with MSVC's custom data display, I admit not everyone uses MSVC), then compile using one float; and there can be also a "SoA_Vector3" implementation that uses packing instructions to swizzle the memory onto the registers on the fly.
    After all, SoA_Vector3 & co. are platform dependent. In PCs with 4 floats per object, it will use SSE intrinsics. In ARM with 2 & 4 floats per object, it will use NEON.
    In PCs with 1 float per object, it can use scalar operations... or packing+shuffling SSE intrinsics and still operate 4 objects at the time, like you suggest.

    So, it is a win-win situation. We can have it my way and your way too, with minimal effort (other than writing multiple versions of SoA_Vector3, SoA_Quaternion & SoA_Matrix4). The magic happens in the memory manager that will dictate how the SoA memory gets allocated & arranged. The rest of the systems are totally abstracted from the number of floats interleaved.


    I've used the MSVC autoexp.dat stuff before, and it works OK, but it is an extra hassle. For one thing it's global, so if you have different projects with different needs you'll have to merge it all into the global autoexp.dat file somewhere in your "Program Files" directories. Also its syntax may vary with different versions of MSVC (see warning here). We'd probably need to put up a wiki page to help people configure their debuggers. My point is simply that this swizzling of data inside user-facing data structures DOES have a cost. And it's a cost that will be paid by everyone who tries to debug their Ogre based application (assuming the default is 4 floats per object). If there is no measurable performance gain to be had from it, then it is a net loss.
    As you say, we still have to write the code to ensure the memory is contiguous, I would suggest we start with that part. Once that is done, SceneNode memory will be abstracted behind a manager of some kind, and it should be pretty easy to try out the swizzling to see if it is a net gain or not. Only then would we be able to say whether the swizzling of SceneNodes is worthwhile. I just think that the "bang for the buck" on this is low if the DICE folks aren't bothering with it.

    Re: Ogre 2.0 doc (slides)

    by Klaim » Mon Nov 26, 2012 1:11 am

    Sqeaky wrote:It may be easier to understand if you think of all multithreaded code as providing and expecting guarantees. Different task/workunit scheduling algorithms expect different amounts of thread-safety from their workunits and interact with their workunits based on these assumptions. Some schedulers require no thread safety, some require just re-entrancy, some require full data write isolation, and there are other more esoteric requirements that are possible. Tasks/WorkUnits will also implicitly make assumptions of their schedulers. They are written differently if workunits finish in a known order, if two workunits are guaranteed to not access the same resources, if every data access needs to be wrapped in a mutex/atomic CAS, and based on what information the scheduler provides the workunit.


    So far, my understanding is that all task scheduler implementations (even a synchronous one) provide at least an "at some point in the future, the provided task will be executed" guarantee.

    If the default Ogre task scheduler provides certain guarantees and the game developer provides a task scheduler of his own, it must provide at least the same guarantees. If the new scheduler provides more, the Ogre tasks/workunits will not be able to take full advantage of them because they are already written. If he provides fewer guarantees he will likely introduce race conditions or deadlocks.


    The default task scheduler should do exactly this: execute the task now, synchronously (call the task execution immediately in the same thread). It's the simplest one and doesn't need any dependency.
    I don't see the relationship between race conditions/deadlocks and task schedulers, because to me it's the user code that has to protect shared data (or not share it).

    For a more concrete example please consider Apple's LibDispatch ( http://libdispatch.macosforge.org/ ), which uses a custom barrier primitive, custom semaphores and communication with the scheduler to ensure data consistency, and an Apache WorkQueue ( http://cxf.apache.org/javadoc/latest/or ... Queue.html ), which implicitly assumes that the work unit will provide its own consistency mechanism without any communication from the Queue, and implements a timeout to provide other guarantees. It would be very difficult to write a work unit that would work ideally in both places.


    My understanding is that LibDispatch is not what I mean by "task scheduler".

    The parallel_for construct in Threading Building Blocks really is a different class of construct than a scheduler. It is a threading primitive designed to parallelize obviously parallelizable problems. I do not know, but I suspect that many parts of Ogre are not obviously parallelizable, and I suspect that any threading algorithm used in Ogre must be carefully designed to get maximum performance.


    parallel_for from TBB does exactly that (I just checked the code again to be sure I'm correct): it creates a hierarchy of tasks and spawns them (in the global task scheduler). The fact that it's a hierarchy of tasks helps the scheduler manage the tasks' real execution time (and more importantly will force each child task to be allocated in separate enough memory addresses to avoid false sharing and other related performance problems), but it's still tasks pushed into the task scheduler.

    Also, what I don't understand is why Ogre should control this aspect of performance. It will be different between platforms anyway and that's why tbb and the like are used, because they have algorithms that know how to exploit different contexts efficiently. How would any library be able to do the same without relying on such a library?

    I kind of agree with others around here that Ogre shouldn't do anything related to threading itself, only help or provide the different parts that CAN be parallelized, optionally. At this point, other than by decomposing renderOneFrame() into different functions, there is no other way to let the user decide how to manage thread resources than to let him provide some kind of function or interface implementation which would decide (or not) to spawn tasks in worker threads.

    dark_sylinc wrote:What I meant is that the control over the batch (worker) threads is too advanced. Creating a generic task scheduler that would run on is not a trivial issue at all. Maybe something for the far future, IF it seems to be viable.

    Which is why I think Ogre shouldn't provide an asynchronous task scheduler, only an interface and a synchronous implementation. Let the user plug his solution in.
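
    A minimal sketch of what "only an interface and a synchronous implementation" could look like. TaskSchedulerSketch, SynchronousSchedulerSketch and updateAll are hypothetical names invented for this example, not an Ogre API, and the single "done when wait() returns" guarantee is an assumption chosen for illustration.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical interface: the engine would only ever talk to this.
class TaskSchedulerSketch
{
public:
    virtual ~TaskSchedulerSketch() {}
    // Assumed guarantee: the task has been executed by the time wait() returns.
    virtual void schedule(const std::function<void()> &task) = 0;
    virtual void wait() = 0;
};

// Default implementation: run the task immediately on the calling thread.
// No threads, no external dependency.
class SynchronousSchedulerSketch : public TaskSchedulerSketch
{
public:
    void schedule(const std::function<void()> &task) override { task(); }
    void wait() override {} // nothing can be pending
};

// Example engine-side code that depends only on the interface.
inline void updateAll(TaskSchedulerSketch &scheduler, std::vector<int> &values)
{
    for (std::size_t i = 0; i < values.size(); ++i)
        scheduler.schedule([&values, i]() { values[i] *= 2; });
    scheduler.wait();
}
```

    A user who wants real parallelism would derive from the interface and forward schedule() to his own thread pool or library. Note that each task here touches a different element, so the example itself needs no locking whichever implementation is plugged in.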

    To prevent oversubscription, tell Ogre at startup how many threads it can spawn at max.


    I disagree because it is definitely hardcore to define an algorithm that would decide how many threads to use depending on other factors, like the hardware resources. tbb does that though, but it assumes that it's the only task scheduler running.

    Do you have a few links of similar implementations of what you have in mind? Because I think I'm not seeing what you see.


    I agree that there might be bad communication here (I might not be using the right words in fact, I'm not an academic in this domain, or any actually). I don't have a public or well known example but I see a "simple" (maybe simplistic) way to do it, that could be a good starting point.
    I'm currently travelling and will be back home in a few days. I'll try to set up some kind of proposal (basically an interface and some explanation of use) to explain what I meant. Even if it's rejected as a solution for Ogre, it would help point out why it would not be a good solution.

    Re: Ogre 2.0 doc (slides)

    by masterfalcon » Mon Nov 26, 2012 1:35 am

    From what I understand our requirements would be, libdispatch is nearly perfect. But I don't know about its platform support at this time.

    I like the schedule put forth on the previous page. The next step is really to break it down into individual tasks and assign them.

    Re: Ogre 2.0 doc (slides)

    by Sqeaky » Tue Nov 27, 2012 4:05 am

    Klaim wrote:So far, my understanding is that all task scheduler implementations (even a synchronous one) provide at least an "at some point in the future, the provided task will be executed" guarantee.

    The whole concept of 'task scheduler' is still under heavy research. Just for example, on arXiv and noted as recent, in the section "computer science/Distributed, Parallel, and Cluster Computing", there is at least one paper clearly talking about work scheduling and about a dozen others covering tangentially related topics. There are many kinds of possible scheduling algorithms, like the two I posited in my earlier post. I intentionally picked two similar, in-production constructs to demonstrate that even when similar, achieving optimal performance with different queues/schedulers would be hard.

    Klaim wrote:The default task scheduler should do exactly this: execute the task now, synchronously (call the task execution immediately in the same thread). It's the simplest one and doesn't need any dependency.
    I don't see the relationship between race conditions/deadlocks and task schedulers, because to me it's the user code that has to protect shared data (or not share it).

    This could cost a good deal of performance without tangible benefit. I think evidence/data, theory, or at least use cases should be looked at before asserting how Ogre or any piece of software should be designed. Synchronization mechanisms, mutexes/semaphores, even atomic CAS are many orders of magnitude slower than single threaded code. If a task scheduler provides an environment where synchronization mechanisms could be avoided/minimized it would provide a performance boost as well as allowing code to be simpler.

    I am trying to build a scheduler that provides these kinds of guarantees, in part to make usable data. Based on my experience that mutexes are slow, and theory that says many points of synchronization slow code execution (many simple benchmarks and a few sophisticated ones back this up) it seems logical to move this synchronization to various places in the scheduling algorithm to minimize its impact. I am intending it for use in games, and I have structured it very carefully (at least I think so), to make writing code easy and allow good performance. Despite this effort, it might not be ideal for Ogre, or even my own use, but I am willing to update and modify the design until it is ideal for my use case.

    masterfalcon wrote:From what I understand our requirements would be, libdispatch is nearly perfect. But I don't know about its platform support at this time.

    Even though libdispatch is an ideal example of a modern task scheduler, I don't think it would work on most platforms Ogre does. Apple provides implementations on OS X and iOS, and it has been ported to FreeBSD, but "C compiler support for blocks is required" according to http://libdispatch.macosforge.org/ . As far as I know this means Clang, which leaves Windows completely unsupported and would be difficult to use even on Linux.

    I am not aware of a better choice which is complete (why I decided to try making one). I know what platforms Ogre works on, I am familiar with some parts of Ogre, but I only have a rough idea of the data structures inside ogre that would need synchronization. MasterFalcon, could you fill in some of the key details and design goals for threading?

    *Edit* added 'slower'
    Need an alternative to a single threaded main loop for a game: https://github.com/BlackToppStudios/DAGFrameScheduler/
    --Sqeaky

    Re: Ogre 2.0 doc (slides)

    by masterfalcon » Tue Nov 27, 2012 6:09 am

    My understanding of some of the plans is a little sketchy. I haven't been keeping up with it as much as I should have. But I believe one of the main goals is to speed up scene graph updates via threads. Really, the word task is a better choice and has been used quite a bit because we're not talking about very long-lived processes. Instead we're firing off a task to aid in things like animation, node or compositor updating. Immediately dispatch_once comes to mind but as you noted libdispatch does not have good platform support.

    Re: Ogre 2.0 doc (slides)

    by Sqeaky » Wed Nov 28, 2012 4:23 am

    Judging by page 55/56 of the slides, determining the kinds/amounts of Tasks/WorkUnits up front should be straightforward. It seems that some pieces of code lend themselves to being put into tasks, others into a parallel_for of some kind, and others just need to be run after others finish.

    I think that UpdateAllTransforms() and UpdateAllAnimations() should monopolize CPU resources for a brief period; it seems they could be put into an iterable collection of some kind and could be knocked out with a single parallel_for or similar construct. Then each of the boxes with rounded corners would be one or more Tasks/WorkUnits with data dependencies to keep them from stepping on each other's data (even though it is marked as read-only, the arrows and cylinders represent something).

    My frame scheduler was designed with the assumption that all the Tasks/WorkUnits would be known at the beginning of a frame. So if I am wrong about being able to know the amount of Tasks/WorkUnits up front then my library stops looking like a custom built solution. As for how to handle the parallel_for, I have no idea. I might implement a simple one in my scheduler, because it is such a commonly used primitive. However, it might even be better to construct a simple threading routine based on the data in those systems and the foreknowledge of what they need to do. At this point I do not know enough about Ogre to say which way would be best.
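
    To illustrate "all work units known at the beginning of a frame, with data dependencies between them", here is a rough single-threaded sketch. WorkUnitSketch and runFrame are hypothetical names for this example only; this is not the API of my scheduler, and a real one would hand ready units to worker threads instead of running them in a loop.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical work unit: a callable plus the indices of units that must finish first.
struct WorkUnitSketch
{
    std::string name;
    std::function<void()> work;
    std::vector<std::size_t> dependsOn;
};

// Runs every unit once, never before its dependencies.
// Returns the names in execution order. Units that are part of a cycle, or that
// name a non-existent dependency, are never run and are missing from the result.
inline std::vector<std::string> runFrame(const std::vector<WorkUnitSketch> &units)
{
    std::vector<bool> done(units.size(), false);
    std::vector<std::string> order;
    bool progressed = true;
    while (progressed)
    {
        progressed = false;
        for (std::size_t i = 0; i < units.size(); ++i)
        {
            if (done[i])
                continue;
            bool ready = true;
            for (std::size_t d = 0; d < units[i].dependsOn.size(); ++d)
            {
                const std::size_t dep = units[i].dependsOn[d];
                if (dep >= units.size() || !done[dep])
                {
                    ready = false;
                    break;
                }
            }
            if (!ready)
                continue;
            if (units[i].work)
                units[i].work();
            done[i] = true;
            order.push_back(units[i].name);
            progressed = true;
        }
    }
    return order;
}
```

    Because the whole set is known up front, the order (and in a threaded version, which units may overlap) can be derived from the dependency lists alone, without the work units doing any locking themselves.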

    In case anybody here cares, since I posted the link to my scheduler I have tested and fixed bugs with it on Mac OS X. So now it compiles and passes some basic tests on Ubuntu x64 with GCC 4.6.3 or clang 3.0-6, Mac OS X 10.6.8 with GCC 4.2.1 and on Windows XP 32 with MinGW or vs10.

    Whatever scheduling system we go with, we are going to need to break Ogre into Tasks/WorkUnits of some kind. I did try to research what converting the current system into Tasks/WorkUnits would vaguely look like:
    From viewtopic.php?f=4&t=30250
    jwatte wrote:1) manage mesh->entity objects
    2) implement spatial hierarchy
    3) do visibility culling
    4) do state management/sorting
    5) do pass management/sorting
    6) load terrain data
    7) generate terrain/heightmaps
    8) manage sets of (lights,cameras,billboards,animations,entities)
    9) do sky rendering


    From viewtopic.php?f=4&t=30250&p=453791#p453845
    tuan kuranes wrote: Transform Stage -> Ogre transforms the buffers (handling page/locality/etc) filling the Cull buffers
    Cull Stage -> Culling per render target fills Ogre renderqueues
    Shading Stage -> Shade/Render each renderqueue according to its "shading/rendering" type into a dx/gl/etc. command buffer
    Execute Stage -> merge (or dispatch between GPU/Tiles/etc) all command buffers and execute them (asynchronously)


    It seems everybody agrees on Transforms -> Cull -> Complex Visuals -> Render, but which stages depend on each other for data and which connections are essential? How many of these become 1 Task/WorkUnit and how many become some data-driven amount of Tasks/WorkUnits?

    Re: Ogre 2.0 doc (slides)

    by tuan kuranes » Wed Nov 28, 2012 4:59 pm

    My above post about more, smaller components and separated stages is about the Data Oriented Design part, which is not about performance but about design, as in leading to simpler, smaller code, each piece directly and only related to the data it controls, which I didn't find in the slides.

    wrote:
    Couple of thoughts.... I like the idea in principle of separating Ogre into more, smaller components, but I also realize there can be more complexity with that model "if" you're using a majority of the components. More DLLs to load and register to use for the components. I guess I'm speaking more towards a person who's new to Ogre as there are so many other components to integrate already before even thinking about integrating components within Ogre. If nothing else, it's a thought to consider if that moves forward.

    Regarding my view on "separate components":
    However, this doesn't mean that each "component" can go into a DLL or static lib. There are multiple issues that can't be easily tackled. This level of modularization is something that sounds nice, but ends up being impractical. Furthermore the bigger the project, the bigger the chances there's some dependency between seemingly independent modules. I could've written about these dependencies in the slides, but it's very implementation-specific, and it would distract from the main topic. When objects are highly modular, they're easier to maintain, refactor, or even replace. But that doesn't mean each has to go in its own dll or lib.


    I was just advocating a first step, as in just splitting the huge Ogre core lib into sub-libs, more like code reorganisation than refactoring really. It could start with just some folder reorganisation. Only then would the next steps be easier & faster, as spotting candidates for dependency minimization and component In & Out, and therefore DOD refactoring, would be much more obvious. (The "switchable/pluggable" sub-libs are just a bonus side-effect of that and will be possible after the DOD refactoring is done, really not a goal in itself.)

    Just have a look at https://github.com/openscenegraph/osg/tree/master/src and then at https://bitbucket.org/sinbad/ogre/src/3 ... reMain/src ?at=default and try to find how Ogre does animation against how openscenegraph does it. (Note that it's easier if you already know some class names in Ogre, but for a beginner...)


    It seems everybody agrees on Transforms -> Cull -> Complex Visuals -> Render, but which stages depend on each other for data and which connections are essential? How many of these become 1 Task/WorkUnit and how many become some data-driven amount of Tasks/WorkUnits?


    The idea is to make it much simpler than the current structures, which are shared by too many stages, by copying only the relevant data between stages using buffers.

    For Each scene: const shared Nodes Buffer -> Transformed Nodes Buffer
    For Each Viewport: const shared Transformed Nodes Buffer -> Culled Transformed Node Buffer (threadable)
    For Each renderTarget: const shared Culled Transformed Node Buffer -> RenderQueue (threadable)
    For Each RenderQueue: const shared Render Queue -> Command Buffer
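
    A toy sketch of the buffer chain listed above, reduced to one dimension so it stays short. All type and function names (NodeSketch, transformStage, cullStage, commandStage) are hypothetical, and the RenderQueue and Command Buffer steps are merged into one function here for brevity. The only point is that each stage reads a const buffer and writes a new one.

```cpp
#include <cstddef>
#include <vector>

struct NodeSketch            { float localX; float parentX; float radius; };
struct TransformedNodeSketch { float worldX; float radius; };
struct DrawCommandSketch     { std::size_t nodeIndex; };

// Per scene: Nodes Buffer -> Transformed Nodes Buffer.
inline std::vector<TransformedNodeSketch> transformStage(const std::vector<NodeSketch> &nodes)
{
    std::vector<TransformedNodeSketch> out;
    out.reserve(nodes.size());
    for (std::size_t i = 0; i < nodes.size(); ++i)
    {
        TransformedNodeSketch t = { nodes[i].parentX + nodes[i].localX, nodes[i].radius };
        out.push_back(t);
    }
    return out;
}

// Per viewport: Transformed Nodes Buffer -> indices of nodes overlapping [viewMinX, viewMaxX].
// The input is const, so several viewports could run this at the same time.
inline std::vector<std::size_t> cullStage(const std::vector<TransformedNodeSketch> &nodes,
                                          float viewMinX, float viewMaxX)
{
    std::vector<std::size_t> visible;
    for (std::size_t i = 0; i < nodes.size(); ++i)
    {
        if (nodes[i].worldX + nodes[i].radius >= viewMinX &&
            nodes[i].worldX - nodes[i].radius <= viewMaxX)
            visible.push_back(i);
    }
    return visible;
}

// Per render queue: culled buffer -> command buffer.
inline std::vector<DrawCommandSketch> commandStage(const std::vector<std::size_t> &visible)
{
    std::vector<DrawCommandSketch> commands;
    commands.reserve(visible.size());
    for (std::size_t i = 0; i < visible.size(); ++i)
    {
        DrawCommandSketch c = { visible[i] };
        commands.push_back(c);
    }
    return commands;
}
```

    Since every intermediate buffer is plain data, each stage can be fed a recorded buffer and checked on its own.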

    This seems like much more memory usage, but using clever structures and memory operations, it leads to much better locality, memory access and simpler code.
    It gives a much more Data Oriented Design and easier testing/recording/debugging of each stage (buffers being serialisable, you can debug each stage independently).

    So the Component/Stage idea is really about minimizing dependencies between stages/components with clear data In & Out at each stage/component, as in Data Oriented code.

    Re: Ogre 2.0 doc (slides)

    by sparkprime » Wed Nov 28, 2012 10:17 pm

    Wow, what a thread. The elephant in the room (ogre's performance) has been given a good kick in the balls. I can report the same kinds of problems that dark_sylinc has reported, in the Grit Engine ( http://www.gritengine.com) -- a very high per batch cost and a near idle GPU. And thanks to him for also putting together those slides, must have taken an age. I have also realised that Ogre's design is the limiting factor, and in fact if I knew 5 years ago what I know now, I probably would have written my own GL-based renderer from scratch instead of using Ogre. Over the years, I have dropped more and more functionality, starting with the background loading system, the scripting framework, particles, billboards, the compositor framework, and now, the overlay component. I'm actually anticipating dropping or heavily altering scene management and renderqueues (not with a smile on my face I can assure you) because of these issues under discussion.

    So, I have spent literally hours reading those slides and this thread and I surely haven't taken it all in 100%, so please bear with me.

    We certainly want optimised culling algorithms, whether that is using this or that parallel programming framework, or SIMD, doesn't really matter, as long as it is state-of-the-art. We want a render queue that can keep the driver busy while doing sorting, and whatever else work it needs to do per batch. Broadly all of this sounds like a good idea. But this is years of work.

    What I would propose is doing the work from back to front. First fixing the rendersystem API -- make it the perfect render system API for the modern world. If we have a rendersystem abstraction that is cross platform, uses d3d/gl completely transparently, and has the right performance for modern workloads, that is already immensely useful. At the moment, as far as I can tell, it is not possible to get the right performance even if you are doing explicit draws, using the RenderSystem API, because of the way it handles shaders, for instance, and the lack of key functionality like hardware PCF and normal map compression. Also there are irritations with leaky abstractions from the two render systems, such as render target flipping and projection matrices. I can't help thinking some of these could be hidden without losing appreciable performance. It needs cleaning up too, with the removal of fixed function functionality. It seems we're coming close to this with the GL4 and D3D11 work nearing completion. But I'd like to make sure it is 'all the way' and not just 'what is useful now given the current infrastructure on top'. There should be performance tests demonstrating that rendering explicit draws is fast enough, at this level.

    Following on from that, some sort of render queue that can offload the actual draw calls into a background thread (pipelining them) seems to be appropriate. Then at least you can use 2 cores for ogre -- one for doing all the waste during scene management, and one for pushing draws out at max speed. The render system is essentially isolated from the user threads at this point. Actual loads of resources (as opposed to prepares) would have to be performed by the hidden thread. I don't see why this thread cannot be created with a simple abstraction around the win32/posix thread library. It's not a short-lived task -- it is continual and contains synchronisation points.

    Optimising the actual culling and construction of the list of draws is then the final task. Of course if you have a SIMDised octree cull or whatever, you'd also want the naive version ifdeffed out alongside it for testing. That way if people get weird behaviour they can just switch to the naive versions for debugging. This is good practice with any highly performant code, because you always sacrifice clarity for performance. I think anything particularly advanced, using microthreading programming frameworks for instance, should maybe be kept out of the default compile. Seems like tuning it will be a pain in the arse anyway, for unbalanced workloads. I've seen things like (naive) fibonacci implementations with hardcoded constants as to how much of a subtree to do in a single task. SIMD will have to assume SSE, I presume, so that can't be on by default either. I say, do relatively clean cache-friendly algorithms for these kernels (principally culling, it seems) and leave more hardcore implementations as optional extras.
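
    A small sketch of keeping the naive version next to the optimised one behind an ifdef. Everything here is made up for illustration (the SKETCH_USE_NAIVE_CULL macro, the struct and function names), and the "batched" version contains no real SIMD intrinsics; it only processes four spheres per iteration the way a 4-wide kernel would.

```cpp
#include <cstddef>
#include <vector>

struct SphereSketch { float x, y, z, radius; };
struct PlaneSketch  { float nx, ny, nz, d; }; // inside: nx*x + ny*y + nz*z + d >= 0

// Reference version: one sphere at a time, written for clarity.
inline void cullNaive(const std::vector<SphereSketch> &spheres, const PlaneSketch &plane,
                      std::vector<unsigned char> &visible)
{
    visible.resize(spheres.size());
    for (std::size_t i = 0; i < spheres.size(); ++i)
    {
        const SphereSketch &s = spheres[i];
        const float dist = plane.nx * s.x + plane.ny * s.y + plane.nz * s.z + plane.d;
        visible[i] = (dist >= -s.radius) ? 1 : 0;
    }
}

// Stand-in for the optimised kernel: four spheres per iteration plus a scalar tail.
inline void cullBatched(const std::vector<SphereSketch> &spheres, const PlaneSketch &plane,
                        std::vector<unsigned char> &visible)
{
    visible.resize(spheres.size());
    std::size_t i = 0;
    for (; i + 4 <= spheres.size(); i += 4)
    {
        float dist[4];
        for (std::size_t lane = 0; lane < 4; ++lane)
        {
            const SphereSketch &s = spheres[i + lane];
            dist[lane] = plane.nx * s.x + plane.ny * s.y + plane.nz * s.z + plane.d;
        }
        for (std::size_t lane = 0; lane < 4; ++lane)
            visible[i + lane] = (dist[lane] >= -spheres[i + lane].radius) ? 1 : 0;
    }
    for (; i < spheres.size(); ++i)
    {
        const SphereSketch &s = spheres[i];
        const float dist = plane.nx * s.x + plane.ny * s.y + plane.nz * s.z + plane.d;
        visible[i] = (dist >= -s.radius) ? 1 : 0;
    }
}

// Callers use this; defining SKETCH_USE_NAIVE_CULL switches back to the reference version.
inline void cull(const std::vector<SphereSketch> &spheres, const PlaneSketch &plane,
                 std::vector<unsigned char> &visible)
{
#ifdef SKETCH_USE_NAIVE_CULL
    cullNaive(spheres, plane, visible);
#else
    cullBatched(spheres, plane, visible);
#endif
}
```

    Having both versions compiled in also makes it cheap to run them against the same input and compare the results in a test.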


    Perhaps there is some lower hanging fruit in these front end stages of the pipe, but I'd rather have a solid backend that gives people the opportunity to build their own higher level layers on top. That way, if the work stalls, it is always possible to hack up an application-specific layer on top of the rendersystems and get AAA performance while Ogre remains incomplete. This may even *inform* the design of the upper layers.


    It is good to see these issues being discussed so intensely, it gives me some hope that things could get fixed. :)


    Now, a couple of rants:

    I see a big divide in programmers, tools, applications, and approaches between cellphone graphics programming and pc/console. I am unsure why anyone would want to do both with the same library. I'd take a mansion or a yacht over a houseboat, any day of the week. Some code / diskformat sharing would be good, but let's not overdesign the core APIs for compatibility between cells and pcs! No one is porting an optimised pc game to the cellphone by simply recompiling it with a different rendersystem. All the rendering techniques, materials, and assets will have to change. How about two different rendersystem APIs, two different scene management APIs, a shared resource system, and shared math/scene culling APIs? Bring all the appropriate bits together to make a complete library for your given system.

    Nobody wants DLLs. Splitting a huge binary into pieces is not a solution. They just make a mess on the filesystem, and a configuration headache. Stuff should be if-deffed out and that's all you need. Ignore pressure from package maintainers trying to create canned versions of everything. This is not libjpeg. Static linking creates a little wasted disk space. So what? The alternative is a complete mess and potential performance hazards. If it's easy to drop unused functionality in the build, the binary sizes will be reasonable anyway. And people doing serious apps on Ogre will probably be using their own fork, or an unreleased trunk. Static linking is the answer for libraries like Ogre.


    Re: Ogre 2.0 doc (slides)

    by Klaim » Thu Nov 29, 2012 12:04 am

    sparkprime> I basically agree with all you said, but the part about DLLs, "Static linking is the answer for libraries like Ogre.", doesn't hold true for everybody. In most cases OK, but I have examples where this cannot be done, or an alternative would be equivalent to having DLLs anyway (the most common case is when you want to allow native extensions of your app that need to know the graphics API). That being said, it's still a minor issue as Ogre can be compiled whatever way the user wants. To me, separating into components isn't a problem if it's only Ogre that is compiled with its components always static. Is that what you mean? If yes, I agree, forget what I just said.

    DanielSefton wrote:If you're building a game for windows, choose DirectX at compile time, if you're building for Mac, choose OpenGL at compile time etc. There's no benefit for allowing the user to swap between the two.


    +++
    Definitely agree. That being said, the renderer switching feature is still useful for some kinds of applications (like samples?), so if they are still considered necessary, I don't see how to avoid it...

    Re: Ogre 2.0 doc (slides)

    by Kojack » Thu Nov 29, 2012 3:18 am

    Klaim wrote:
    DanielSefton wrote:If you're building a game for windows, choose DirectX at compile time, if you're building for Mac, choose OpenGL at compile time etc. There's no benefit for allowing the user to swap between the two.


    +++
    Definitely agree. That being said, the renderer switching feature is still useful for some kinds of applications (like samples?), so if they are still considered necessary, I don't see how to avoid it...

    Wild Magic (the engine Ogre borrowed some of its maths code from) used to use runtime render system choosing like we do, but around version 5 they changed to a compile time choice to avoid the virtuals. I don't know how much it helped though (I've never used it). I have some vague ideas of how we could support both, but I haven't played around with it yet.
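
    For reference, a compile time choice without virtuals can be as simple as the sketch below. This is not how Wild Magic or Ogre does it; the class names and the SKETCH_USE_D3D11 macro are hypothetical and only show the general technique of selecting a concrete type with the preprocessor and writing engine code against it.

```cpp
#include <string>

// Two stand-in render systems with the same non-virtual interface.
struct GLRenderSystemSketch
{
    int drawCalls;
    GLRenderSystemSketch() : drawCalls(0) {}
    std::string name() const { return "GL"; }
    void draw() { ++drawCalls; } // resolved at compile time, can be inlined
};

struct D3D11RenderSystemSketch
{
    int drawCalls;
    D3D11RenderSystemSketch() : drawCalls(0) {}
    std::string name() const { return "D3D11"; }
    void draw() { ++drawCalls; }
};

// The build picks exactly one backend.
#if defined(SKETCH_USE_D3D11)
typedef D3D11RenderSystemSketch ActiveRenderSystem;
#else
typedef GLRenderSystemSketch ActiveRenderSystem;
#endif

// Engine-side code only requires that the type provides draw().
template <class RenderSystemT>
void renderFrame(RenderSystemT &rs, int batches)
{
    for (int i = 0; i < batches; ++i)
        rs.draw();
}
```

    Code that uses ActiveRenderSystem gets direct calls, and the template still accepts either backend explicitly, which could be handy for comparing samples in a single build.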

    Re: Ogre 2.0 doc (slides)

    by Klaim » Thu Nov 29, 2012 10:34 am

    Personally I wouldn't mind having only compile time choice (I think it's a good idea). I just wonder if others will have problems with it (in particular when comparing samples).

    Re: Ogre 2.0 doc (slides)

    by lunkhound » Thu Nov 29, 2012 7:39 pm

    DanielSefton wrote:If you're building a game for windows, choose DirectX at compile time, if you're building for Mac, choose OpenGL at compile time etc. There's no benefit for allowing the user to swap between the two.


    What if you want your PC game to support DX11 but also DX9 as a fallback (for Windows XP, or lower end GPUs)? It is pretty common for current PC games to support both.

    edit: To answer my own question, I guess in this case, one could compile OGRE as a pair of DLLs, one for DX11 and the other for DX9. The game executable then selects between the two at runtime. Each DLL is statically linked to its rendersystem and all of the other components it needs.

    Re: Ogre 2.0 doc (slides)

    by masterfalcon » Thu Nov 29, 2012 9:41 pm

    As you may have noticed, we've created a new forum for 2.0 design and related discussion. Because there are so many aspects to the project using 1 thread is just unreasonable and frankly, getting confusing.

    So let's start off with a thread for each feature or task so we can have some great, distinct discussions. For example: Threading, RenderSystem design and changes, SceneManager, RTSS, etc.

    Also, if you find a comment in one of the existing threads that is particularly helpful or relevant to your comment please include a link to the comment or quote it in the new threads.

    Thanks!

    Masterfalcon

    Re: Ogre 2.0 doc (slides)

    by syedhs » Fri Nov 30, 2012 5:03 am

    I do support the idea (been thinking about it myself for quite some time) of doing away with the pluggable Rendering System. There is currently too much effort going into creating a standardized interface for OpenGL, DirectX, DirectX11, OpenGLES.. Maybe there is common ground for OpenGLXX, but with DirectX11 it seems a bit too much effort is spent so that it adheres to the previous Ogre design (which was wonderful when we only had like 2 renderers).

    But we need to move forward; a simpler yet faster design is preferred. I am not an expert in Graphics Programming, but that is what I see from a higher level point of view. This way, we can also aim toward 'native' DirectX/OpenGL, which makes all available features in DirectX/OpenGL usable.

    Re: Ogre 2.0 doc (slides)

    by dark_sylinc » Sat Dec 01, 2012 7:13 pm

    First of all, regarding the Threading/tasks issue. I quickly wrote an addendum to make a few things clearer. I'm updating the original post so it can be viewed there too.
    I didn't include references & stuff because they're more like notes, and I didn't have the time.

    masterfalcon wrote:My understanding of some of the plans is a little sketchy. I haven't been keeping up with it as much as I should have. But I believe one of the main goals is speed up scene graph updates via threads.

    Discussion in the forum topic went straight to threading. But as for the slides, the topics were:
    • Eliminate cache misses & cache pollution through pointers to relevant data
    • Smarter behavior about culling & traversing the scene
    • Data-level parallelization (SIMD)
    • Execution level parallelization (Threading)
    • Other engine design changes.

    Sqeaky wrote:Sqeaky's whole posts

    Sqeaky seems to have a good understanding of my intentions. Also he seems to have a good grasp of the parallel world hazards. I agree with everything he said.

    Sqeaky wrote:It seems everybody agrees on Transforms -> Cull -> Complex Visuals -> Render, but which stages depend on each other for data and which connections are essential? How many of these become 1 Task/WorkUnit and how many become some data-driven number of Tasks/WorkUnits?

    I'm going to assume "Complex Visuals = Sorting & Material preparation".
    There's no ultimate answer. "It depends". For example, if the game consists of a simple scene with one pass and no shadows, then cull -> transform is preferred, because you only update what's visible. This was your typical 1999 game.
    But if you have shadows, environment maps, and/or multiple passes, then cull -> transform may be counterproductive, because you may need to transform the same object again in each pass where it appears (i.e. no reuse). Transforming before culling may iterate over a few objects that end up culled in all passes, but it tends to scale much better because, statistically, the number of objects that survive culling in at least one pass is much higher.
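
    The trade-off above can be put into numbers with a toy model. This is only a sketch under stated assumptions: `Visibility`, `transformsCullFirst` and `transformsTransformFirst` are made-up names, and the cull-first variant is assumed to share no transform results between passes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// visible[p][i] is true when object i survives culling in pass p.
using Visibility = std::vector<std::vector<bool>>;

// Cull -> transform with no reuse between passes:
// every pass transforms each object it sees.
std::size_t transformsCullFirst(const Visibility& visible) {
    std::size_t count = 0;
    for (const auto& pass : visible)
        for (bool v : pass)
            if (v) ++count;
    return count;
}

// Transform -> cull: each object is transformed exactly once,
// whether or not any pass ends up seeing it.
std::size_t transformsTransformFirst(const Visibility& visible) {
    if (visible.empty()) return 0;
    for (const auto& pass : visible)
        assert(pass.size() == visible.front().size());
    return visible.front().size();
}
```

    With one pass that sees 1 of 4 objects, cull-first does 1 transform against 4. With three passes that see 3, 2 and 3 of the same 4 objects, cull-first does 8 against 4.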

    tuan kuranes wrote:For Each scene: const shared Nodes Buffer -> Transformed Nodes Buffer
    For Each Viewport: const shared Transformed Nodes Buffer -> Culled Transformed Node Buffer (threadable)
    For Each renderTarget: const shared Culled Transformed Node Buffer -> RenderQueue (threadable)
    For Each RenderQueue: const shared Render Queue -> Command Buffer

    Transforming the nodes is threadable. Baking the command buffer is threadable in D3D11 (apparently not in GL?).
    I'd like to start thinking of "compositor output" rather than Viewport, since it's easier to visualize and work with. Viewports are still useful for setting up render area & some old school tricks; and ultimately each compositor outputs to a viewport.
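
    The stage chain quoted from tuan kuranes can be sketched as plain functions that each read a shared, read-only buffer and produce a new one. Everything here is a toy assumption (1-D positions, a `Range` standing in for a frustum, sorting by position as a stand-in for depth sorting), not Ogre's actual data structures, and the command buffer stage is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Node { float localX; float parentX; };        // toy 1-D transform inputs
struct Transformed { std::size_t id; float worldX; };
struct Range { float minX; float maxX; };            // toy 1-D "frustum"

// Stage 1, once per scene: nodes -> transformed nodes.
std::vector<Transformed> transformNodes(const std::vector<Node>& nodes) {
    std::vector<Transformed> out;
    out.reserve(nodes.size());
    for (std::size_t i = 0; i < nodes.size(); ++i)
        out.push_back(Transformed{i, nodes[i].parentX + nodes[i].localX});
    return out;
}

// Stage 2, once per compositor output: read-only transformed nodes -> culled list.
std::vector<Transformed> cull(const std::vector<Transformed>& in, Range view) {
    assert(view.minX <= view.maxX);
    std::vector<Transformed> out;
    for (const Transformed& t : in)
        if (t.worldX >= view.minX && t.worldX <= view.maxX) out.push_back(t);
    return out;
}

// Stage 3, once per render target: culled list -> render queue of node ids.
std::vector<std::size_t> buildRenderQueue(std::vector<Transformed> culled) {
    std::sort(culled.begin(), culled.end(),
              [](const Transformed& a, const Transformed& b) { return a.worldX < b.worldX; });
    std::vector<std::size_t> ids;
    ids.reserve(culled.size());
    for (const Transformed& t : culled) ids.push_back(t.id);
    return ids;
}
```

    Because stages 2 and 3 only read their input, several compositor outputs could run them concurrently on the same transformed buffer.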

    sparkprime wrote: sparkprime's whole post

    I agree with Sparkprime. And his rant is hilarious.
    Although I don't agree with his view on cell phones. The next generation of phones is going to be a beast (the Samsung Galaxy SIII's computing power is quite impressive) and GL ES 3.0 is a real step forward (maybe the Khronos Group is doing something good with GL for once?).
    As for the PC vs Phone vs Tablet "who will win?" debate, it is fairly subjective. The media tends to hype "the PC is dying" because it's investors' lingo for "sales increasing slower than before".
    If we look at history, live theaters took quite a hit when movies appeared. And radio took quite a hit too when TV appeared. And TV has been "dying" since broadband internet became mainstream. People now spend more time on the PC than on the TV.
    And the movie industry was predicted to be "dead by now" because of online piracy. But there are a few movies that sold quite well for a dead medium.
    Radio wasn't replaced by TV, it had to share a similar market.

    Most likely PC vs Phone/Tablet will follow the same pattern. Phones will grow too big to be ignored, hurting the PC; but the PC won't disappear. I could be wrong though. Exceptions happen.

    Therefore having two different APIs is a tremendous amount of work we're not able to cope with. Of course, the Ogre user at a higher level can't aim at plug 'n play ports between PC & phones unless computing power required for his game isn't big. But he still needs to work out the shader quirks (as we don't provide uber shaders like UDK).


    As for the static render systems, that's an interesting approach. I don't know how bad the virtuals will be after we separate states from SceneManager & RenderSystem; but certainly phones & consoles are the most affected.
    Fallback support for DX11 & DX9 isn't really hard. Just look at what other games do: include 2 executables + 1 launcher. The launcher detects DX support (or the GL option) and launches the right program.

    sparkprime wrote:What I would propose is doing the work from back to front. First fixing the rendersystem API (...)

    Well, you propose the reverse of the order I would follow. I'm writing the series of tasks regarding scene management, but it is a bit lacking on the RenderSystem side. You could help out a little by adding those tasks.
    They can be done at the same time. I prefer focusing on the tasks I propose, then Render APIs; but there is no reason they can't be done concurrently if enough volunteers are found. I know what I'll be working on. But I can't speak for the rest of the team and contributors.

    sparkprime wrote:(...) It needs cleaning up too, with the removal of fixed function functionality. It seems we're coming close to this with the GL4 and D3D11 work nearing completion.

    Well, one of the reasons I came up with the idea of States (which isn't new, btw) is not only performance and usability (since RTSS should be able to build shaders for newbies based on their states) but also the fear that FF makes a comeback. You never really know. A new architecture. A new technique. Who knows.
    A year ago Forward Rendering was dying. Today Deferred Shading using GBuffers has its days numbered. In 2006 FF was dying, but Nintendo chose to launch the Wii with FF. And in 2010 FF appeared to be a dead end, but it was a good idea for the tessellator (since Geometry Shaders sucked at that).

    Re: Ogre 2.0 doc (slides) - Updated 1st dec 2012

    by Xavyiy » Sat Dec 01, 2012 7:58 pm

    I'm in a hurry so this is going to be quick; I just want to share my opinion about dark_sylinc's addendum:

    As some of you may know, 3 years ago I spent 2 months building a general-purpose task library, which I use in the Paradise Engine. In that scope, where a whole framework has to be parallelized (engine, editor, ocean simulation, ...), something like that is needed, since it's very useful to define a list of tasks, define dependencies between them if needed, and then execute them relying on the task scheduler.

    But the scheduler itself adds some overhead when executing tasks (this is where task granularity becomes very important), since it needs to signal the worker thread, execute the task, notify the scheduler back when finished, etc. The overhead depends a lot on the scheduler's features (whether or not it supports data dependencies between tasks, etc.) but it will always be slower than a simple barrier system like the one dark_sylinc is proposing.

    What am I trying to say? That if UpdateAllTransforms() & UpdateAllAnimation() are the most relevant candidates for parallelizing, I really suggest doing it in a simple, clear and efficient way: the barrier system. Easy to debug, easy to modify and, in the end, more efficient than using a task-based lib. Less overhead. Granularity could be a problem, but nothing stops us from doing some real-time profiling and adapting the range of objects each thread updates. Also, a great thing here is that the time needed to update each node is constant (it won't be the same for animations, but it's not going to be a huge difference), so that helps a lot! (In a task-based system each task may have a very different execution time and you can mix little tasks with big tasks, so granularity/profiling becomes very important to avoid wasting CPU resources.)
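
    A minimal sketch of that barrier-style update, assuming a toy `Transforms` struct and a per-node cost that is constant: the node range is split into one contiguous chunk per thread, and joining all workers plays the role of the barrier. The names are hypothetical, and a real implementation would keep persistent worker threads instead of spawning them every frame.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Toy per-node data: world = parent + local.
struct Transforms {
    std::vector<float> local, parent, world;
};

// Work done by one thread on its own slice [begin, end).
void updateRange(Transforms& t, std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i)
        t.world[i] = t.parent[i] + t.local[i];
}

// Split [0, n) into one contiguous range per thread, run them, wait for all.
void updateAllTransforms(Transforms& t, unsigned numThreads) {
    assert(numThreads > 0);
    assert(t.local.size() == t.parent.size());
    assert(t.local.size() == t.world.size());
    const std::size_t n = t.local.size();
    const std::size_t chunk = (n + numThreads - 1) / numThreads;

    std::vector<std::thread> workers;
    for (unsigned i = 0; i < numThreads; ++i) {
        const std::size_t begin = std::min(n, i * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        if (begin == end) break;  // fewer nodes than threads
        workers.emplace_back([&t, begin, end] { updateRange(t, begin, end); });
    }
    for (std::thread& w : workers) w.join();  // acts as the barrier
}
```

    The slices never overlap, so no locking is needed inside `updateRange`; adapting `chunk` from profiling data would be the granularity tuning mentioned above.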

    Xavier

    Re: Ogre 2.0 doc (slides) - Updated 1st dec 2012

    by dark_sylinc » Sat Dec 01, 2012 8:02 pm

    OK!! I added the tasks needed for Ogre 2.x
    They're in the WIKI PAGE.

    I may have missed a lot of tasks, so feel free to add more (debate if unsure).
    Some of you will probably argue about the order in which this has to be done. There's no magic bullet. This is the order *I* would do it. And since I'm going to commit a bit of my time in the next year for Ogre that's what I'll be working on.

    If someone else feels like working on the RS side of things, for example, he can do it in parallel with my work of course :)
    For example task "Replace "create read only -> lock" pattern for "initialize on creation" pattern in order to ease D3D11 development. " is VERY important in order to let progress of D3D11 rendersystem go on smoothly; which is why I assigned it to 2.0
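
    The "initialize on creation" pattern mentioned in that task could look roughly like the sketch below. `ImmutableBuffer` is a hypothetical class, not an Ogre or D3D11 type: the contents are handed over in the constructor and there is no lock()/unlock() pair for writing afterwards. As far as I know this matches how D3D11 treats immutable resources, which must receive their initial data when they are created.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical read-only buffer: contents are supplied at construction,
// so there is never a "created but not yet filled" state.
class ImmutableBuffer {
public:
    explicit ImmutableBuffer(std::vector<float> initialData)
        : data_(std::move(initialData)) {
        assert(!data_.empty());  // a read-only buffer with no data is useless
    }

    const float* data() const { return data_.data(); }
    std::size_t size() const { return data_.size(); }

private:
    const std::vector<float> data_;  // never modified after construction
};
```

    With the old pattern the buffer exists in an empty state between creation and the first lock, which is the part a backend that wants the data up front cannot express directly.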

    Re: Ogre 2.0 doc (slides)

    by Sqeaky » Sun Dec 02, 2012 2:50 am

    masterfalcon wrote:start off with a thread for each feature or task so we can have some great, distinct discussions. For example: Threading, RenderSystem design and changes, SceneManager, RTSS, etc.


    I made a new thread for threading... I feel like I need an xzibit picture that says something witty. I included a link to the wiki page and my responses to a few things said here.
    viewtopic.php?f=25&t=75628
    Need an alternative to a single threaded main loop for a game: https://github.com/BlackToppStudios/DAGFrameScheduler/
    --Sqeaky

    Re: Ogre 2.0 doc (slides) - Updated 1st dec 2012

    by madmarx » Sun Dec 09, 2012 10:39 pm

    In the past, with Ogre I rewrote: an exporter (!), a resource system, a threading system, my own node management... I made direct calls into the rendersystem to get my own effects working.

    sparkprime wrote:What I would propose is doing the work from back to front. First fixing the rendersystem API (...)

    I think that your post describes exactly what should be done. And I would add that rewriting is what should really be done.
    1/ write a render system encapsulated in C calls (not C++).
    2/ maths functions and basic containers.
    3/ rebuild on top of that. You know, writing a queue, a node system etc... is not hard. Everything oriented with data representations.
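
    Step 1/ could be sketched as an opaque handle plus free functions with C linkage. All of the `rs_*` names below are invented for illustration and the "device" only counts draw calls; the point is just the shape of such an interface, not a proposal for the real one.

```cpp
#include <cassert>
#include <cstddef>

// Public interface: C linkage, opaque handle, no classes or virtuals exposed.
extern "C" {
struct rs_device;
rs_device* rs_create_device(void);
void rs_destroy_device(rs_device* device);
int rs_draw(rs_device* device, std::size_t vertexCount);  // 0 on success, -1 on error
std::size_t rs_draw_call_count(const rs_device* device);
}

// Implementation side: free to use whatever it wants internally.
struct rs_device {
    std::size_t drawCalls;
};

extern "C" {
rs_device* rs_create_device(void) { return new rs_device{0}; }

void rs_destroy_device(rs_device* device) { delete device; }

int rs_draw(rs_device* device, std::size_t vertexCount) {
    if (device == nullptr || vertexCount == 0) return -1;
    ++device->drawCalls;  // a real backend would submit work to the GPU here
    return 0;
}

std::size_t rs_draw_call_count(const rs_device* device) {
    assert(device != nullptr);
    return device->drawCalls;
}
}
```

    Callers only ever hold an `rs_device*`, so the struct layout can change without touching code built on top of it.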

    I think it's a mistake to try to fix the scenemanager/resource system/etc. as they are now. It will take tons of effort, while it would be vastly easier to rewrite everything. The only thing is that the team should agree on the structs that contain the data, and then the community provides the functions/bodies that use them.

    I tried both. My conclusion is: fixing Ogre is too time-consuming for very little gain. Rewriting is easy, and it's more motivating than the other option. The interfaces just need to be "canon'd" by the team. The community can handle the rest IMO.

    From my point of view, the best plan would be: provide a simple render system interface for everyone, and then give a clear priority order for describing the data necessary to represent everything else. BTW I am not saying that the team is not doing a great job, I think they are great and do a lot. (But I wouldn't mind if Kojack / Tommo / Klaim were to propose something too :wink: ).

    Question to the team: if a good first shot at the data structures is proposed by someone not from the team (ex: Kojack/ Spark / Klaim / Tommo ... sorry for the ones I forgot, but I mean someone not completely unrelated to Ogre), how can it get accepted as a base for Ogre 2.0 so that people can work on it for real? What minimum functionality should it offer to be a satisfying base for 2.0?

    Best regards,

    Pierre
    Tutorials + Ogre searchable API + more for Ogre1.7 : http://sourceforge.net/projects/so3dtools/
    Corresponding thread : viewtopic.php?f=1&t=57693&start=0
    User avatar
    madmarx
    OGRE Expert User
    OGRE Expert User
     
    Posts: 1632
    Kudos: 41
    Joined: 21 Jan 2008
    Top

    Re: Ogre 2.0 doc (slides)

    by Brocan » Sun Dec 09, 2012 11:44 pm

    dark_sylinc wrote:[*]Export integration. So far the biggest complaint I get from artists & indie companies is that the export process pipeline sucks. UDK & Unity do this pretty well. There are too many steps involved into getting from 3DS Max/Maya/Blender into the actual game. This is because:
    1. It usually involves setting a material file (by hand, using text! artists don't like text!), being careful not to overwrite a previous material file
    2. Exporting all the right submeshes; and placing the .mesh folder into the right folder (or setup resources.cfg)
    3. Setting up an additional file for the custom game engine to link the GameObject with a given Mesh
    4. Getting a preview means doing all the above (+ launching a custom preview tool, Ogre Meshy, or even loading the full game depending on each case). It's a pita if something was wrong and above steps must be followed all over again. Cuts the iteration process
    Having that said, most of us don't have the time to work on tools, because tools involve GUI code (many find it boring & frustrating). Making good GUI is an art, and requires a lot of co-developing with artists & designers (after all, those are the users). This forum sadly lacks artists.
    It's a chicken and egg problem; we can't make appealing tools because we don't have artists to work with, and we don't have artists because we have no appealing tools.


    What about dropping the current mesh/material formats and adopting a "more-standard" format like fbx?

    And making some kind of library for tools, which could easily be used to add a node/material/whatever editor to your engine's editor?
    User avatar
    Brocan
    Orc
     
    Posts: 441
    Kudos: 9
    Joined: 01 Aug 2006
    Location: Spain!!
    Top

    Re: Ogre 2.0 doc (slides) - Updated 1st dec 2012

    by Klaim » Mon Dec 10, 2012 1:59 am

    (ex: Kojack/ Spark / Klaim / Tommo ... s


    Just to be clear: I don't think I understand part of what dark_sylinc and others are talking about regarding multithreading. To me the whole threading system should be managed outside Ogre, or even not mentioned by Ogre at all, as sparkprime suggested (if my memory is correct). IF Ogre needs to provide support for multitasking, my current understanding is that Ogre's rendering process should be cut up in a way that allows client code to organize the rendering in parallel if it wants, in the way it wants, with the tools it wants. Any way of automating this (other than optionally) from Ogre itself could cause problems for people using Ogre in not-so-common contexts.

    Now, that makes me feel like a noob, which I feel I am on the subject, so I don't think my input or work could help.

    Maybe on other subjects. I will have to set up my resource management system very soon...
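
    One way to read the "client code organizes the parallelism" idea is an engine function that describes the work as a range job and leaves the execution policy to a callback supplied by the client. This is a sketch under that assumption only; `Executor`, `RangeJob` and both example policies are invented names, and the second policy runs its halves sequentially where a real client would hand them to its own scheduler.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using RangeJob = std::function<void(std::size_t begin, std::size_t end)>;
// Supplied by the client: decides how a job over [0, count) gets run.
using Executor = std::function<void(std::size_t count, const RangeJob& job)>;

// "Engine" side: describes the work but never creates a thread itself.
void updateWorldPositions(const std::vector<float>& local, float parentX,
                          std::vector<float>& world, const Executor& run) {
    assert(local.size() == world.size());
    run(local.size(), [&](std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i) world[i] = parentX + local[i];
    });
}

// Client policy A: run everything on the calling thread.
void serialExecutor(std::size_t count, const RangeJob& job) {
    job(0, count);
}

// Client policy B: split the range in two independent halves.
void twoHalvesExecutor(std::size_t count, const RangeJob& job) {
    const std::size_t mid = count / 2;
    job(0, mid);
    job(mid, count);
}
```

    The engine code is identical under both policies, which is the property that keeps unusual setups (custom schedulers, single-threaded platforms) working.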

    Re: Ogre 2.0 doc (slides)

    by dark_sylinc » Mon Dec 10, 2012 4:02 am

    Brocan wrote:What about dropping the current mesh/material formats and adopting a "more-standard" format like fbx?

    And making some kind of library for tools, which could easily be used to add a node/material/whatever editor to your engine's editor?

    No sane engine would choose as its native format one over which its developers have no control, whose specs depend on a 3rd party with interests that may not be aligned with ours, and where opening the file does not guarantee 100% that the model & animation will look exactly as they do in the modeling tool.

    What we should aim for is a simplified import/export process, just like Unity & UDK do. Not adopting a foreign format as our own.

    Re: Ogre 2.0 doc (slides) - Updated 1st dec 2012

    by mkultra333 » Mon Dec 10, 2012 6:52 am

    Technical details are all well above my pay grade, but as a user I fear the following about a massive, non-incremental rewrite.

    They take ages, and if the main few involved lose interest, need to move on for financial or other reasons, or get hit by a meteor, then it's very hard for others to come in and pick up where they left off. So Ogre could simply die.

    With a more incremental approach, improvements ready to use come a lot faster, and if there's any turnover in the main movers and shakers it's a lot less devastating.

    I can live with an Ogre that isn't absolutely as perfect and fast as it could be, I couldn't live with a half-written super-fast Ogre that is never completed.
    "In theory there is no difference between practice and theory. In practice, there is." - Psychology Textbook.

    For this message the author mkultra333 has received kudos
    User avatar
    mkultra333
    Beholder
     
    Posts: 1520
    Kudos: 61
    Joined: 08 Mar 2009
    Top

    Re: Ogre 2.0 doc (slides)

    by Brocan » Mon Dec 10, 2012 9:17 am

    dark_sylinc wrote:
    Brocan wrote:What about dropping the current mesh/material formats and adopting a "more-standard" format like fbx?

    And making some kind of library for tools, which could easily be used to add a node/material/whatever editor to your engine's editor?


    What we should aim for is a simplified import/export process, just like Unity & UDK do. Not adopting a foreign format as our own.


    I understand your point of view, but Unity, for example, has direct FBX import, which greatly simplifies the artist pipeline because you get direct integration with a lot of 3D programs and tools.

    Re: Ogre 2.0 doc (slides) - Updated 1st dec 2012

    by Wolfmanfx » Mon Dec 10, 2012 9:38 am

    Half-written and never finished: it's super unrealistic to say that we'd be faster with a from-scratch rewrite. We forget the man-hours/years of testing and every workaround added over the years to make Ogre this stable. With a fresh codebase you make the same mistakes: we think we are much smarter and more elegant, then we realize that we forgot this edge case and that other one too, and the codebase grows again...
    When someone reads this thread it sounds like OGRE is unusable (too slow), but the world is not just black and white. We are not perfect, but who is (and I am talking about open source)? It's naive to say we did not catch up with AAA engines... they have paychecks and hundreds of people working on them full time, and their codebases are not perfect either (I have already worked with a few).

    I agree we should refactor the render system interfaces and move to a more thread-friendly design which processes GPU commands, like Gamebryo and other engines do.

    Regarding threading: I think the best approach is to tell Ogre how many threads it is allowed to use, 0 - N or auto, and that's it, so the user can control how much is spawned. What is the benefit of making this pluggable as well?

    All in all the thread is awesome, don't get me wrong :) and I can't wait to begin with the planning inside Jira.
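
    The "0 - N or auto" setting suggested above could be resolved with something like the following. The sentinel `kAutoThreads`, the function name and the "leave one core for the main thread" policy are all assumptions made for this sketch; only `std::thread::hardware_concurrency()` is a real standard library call, and it may return 0 when the core count cannot be determined.

```cpp
#include <cassert>
#include <thread>

constexpr int kAutoThreads = -1;  // hypothetical sentinel meaning "auto"

// Returns how many worker threads to spawn.
// 0 means: run everything on the calling thread.
unsigned resolveWorkerCount(
    int requested,
    unsigned hardwareThreads = std::thread::hardware_concurrency()) {
    assert(requested >= kAutoThreads);
    if (requested != kAutoThreads) return static_cast<unsigned>(requested);
    // Auto: leave one hardware thread for the main thread; fall back to 0
    // workers when the hardware count is unknown (0) or there is one core.
    return hardwareThreads > 1 ? hardwareThreads - 1 : 0;
}
```

    Passing the hardware count as a parameter keeps the function deterministic for testing; callers would normally rely on the default.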

    Re: Ogre 2.0 doc (slides) - Updated 1st dec 2012

    by Wolfmanfx » Mon Dec 10, 2012 9:40 am

    No, Unity has no direct FBX support. The Unity editor has an FBX importer, which is a different thing; the runtime does not. A third-party runtime importer exists for $300.
    I recently wrote a runtime importer for Unity for a custom format, but it sucked too: you cannot use it with Unity Flash because of the sandboxing and stuff like that.

    Re: Ogre 2.0 doc (slides) - Updated 1st dec 2012

    by Brocan » Mon Dec 10, 2012 9:46 am

    Wolfmanfx wrote:No, Unity has no direct FBX support. The Unity editor has an FBX importer, which is a different thing; the runtime does not. A third-party runtime importer exists for $300.
    I recently wrote a runtime importer for Unity for a custom format, but it sucked too: you cannot use it with Unity Flash because of the sandboxing and stuff like that.


    You are right, but the effect is the same: you can use fbx directly without needing to export to Unity's internal format. :P

    The problem, as you said, is that the runtime lacks this feature due to the heavy weight of the libraries that fbx uses, so importing meshes at runtime is quasi-impossible (which sucks). :|

    Re: Ogre 2.0 doc (slides) - Updated 1st dec 2012

    by Wolfmanfx » Mon Dec 10, 2012 9:53 am

    You would have to use NeoAxis or another Ogre-based game engine which comes with these features. Btw, OC3 released an fbx to ogre converter.

    Re: Ogre 2.0 doc (slides) - Updated 1st dec 2012

    by al2950 » Mon Dec 31, 2012 3:33 pm

    I have only just caught up with this topic and I would just like to say thank you to dark_sylinc and anyone else who has had input into the slides and this forum, it has been a very interesting read. :D

    I will soon be adding a few notes about RTSS.
    al2950
    Gnome
     
    Posts: 389
    Kudos: 16
    Joined: 11 Dec 2008
    Location: Bristol, UK
    Top

    Re: Ogre 2.0 doc (slides) - Updated 1st dec 2012

    by madmarx » Mon Dec 31, 2012 5:49 pm

    Wolfmanfx wrote:Half written and never finished - it's super unrealistic to say that we are faster with a scratch rewrite, we forget the man hours/years in testing and every workaround which is done over the years to make ogre that stable.

    Well, this is where we disagree :) . For example, concerning background loading in Ogre, I was pretty sure I would have done better than sinbad on the first go (had I known the internals), the reason being that I developed such a multithreaded application in pure OpenGL in 2005. I read many posts reporting bugs in this feature from Ogre 1.5 to 1.7, which clearly shows that too much effort has been put into that element. Also, writing a "not-as-full-featured" graphics engine is quite straightforward: let's forget about the high-level stuff (shadows, batching) at first, and do it only after the core works. I suspect that most of the tricky code which was hard to stabilize is in the platform management. These are my opinions. Now, you have a different experience from mine, which leads you to warn about "the man hours/years in testing and every workaround" that we could lose.

    Wolfmanfx wrote:When someone read this thread it sounds that OGRE is unusable (too slow)
    I agree with you; but it is not an attack on the engine. These are propositions to make it better, now that we have better knowledge of what should be done on today's hardware. Given what we know now, it is easier and faster to rewrite than to modify.

    The problem IMO is that if someone proposes an already working, fully rewritten Ogre, we don't know if the team will agree to continue with it, or if the team will throw it away without discussion :? .

    Re: Ogre 2.0 doc (slides) - Updated 1st dec 2012

    by drwbns » Tue Jan 01, 2013 6:39 am

    How many of the team members are actually interested in a complete rewrite?
    Check out my site - https://sites.google.com/site/realtimecpu/
    drwbns
    Gnoll
     
    Posts: 651
    Kudos: 21
    Joined: 18 Jan 2010
    Location: Costa Mesa, California
    Top

    Re: Ogre 2.0 doc (slides) - Updated 1st dec 2012

    by TheSHEEEP » Tue Jan 01, 2013 11:08 am

    I'd be completely against it.
    The best file format for any engine is one that is tailored to its needs. And we already have that.

    Though it would be great to have an official importer/converter that can convert from some standard formats into our own. And if we had that we could do it on-the-fly like Unity, etc.
    Of course that would come at a performance cost (converting fbx->mesh/skeleton at runtime), but if that would be okay for some projects, why not? It would at least be interesting for artists, so they could try out their assets in the engine without having to convert them manually.
    But there is nothing we can do here without having some official team members being responsible for pipeline tools.

    Hmm.. would such a thing (pipeline tools) be an interesting GSoC project?
    My site! - Have a look :)
    Also on Twitter - extra fluffy

    For this message the author TheSHEEEP has received kudos
    TheSHEEEP
    OGRE Team Member
    OGRE Team Member
     
    Posts: 844
    Kudos: 50
    Joined: 02 Jun 2008
    Location: Berlin 
    Top

    Re: Ogre 2.0 doc (slides) - Updated 1st dec 2012

    by Xavyiy » Tue Jan 01, 2013 12:21 pm

    madmarx wrote:
    I'd be completely against it.

    Thanks for the clear answer! I think it's better for the community to hear a clear answer like that from the team than to have no answer on that subject.
    Well, I guess it's just a personal opinion which doesn't represent the whole team, although I think (and hope) the team feels the same.

    As Wolfmanfx has said, to me a rewrite is absolutely unrealistic: in the best case, it would take more than 3-5 years (I'm talking about a full-featured engine: animation, compositors, materials/shaders, resource system, scene management, render system, shadow framework, mesh & entity/object system, terrain, and a long etc.), and that's simply unaffordable, for Ogre itself and for Ogre users in particular. (Hell! More than 5 years!... and the worst part: I'm sure the best practices now are not going to be the same in 6-10 years, so in another 10 years we're going to be in the same situation.)

    Apart from being realistic or unrealistic, to me the most important reason for not doing it is simple: the users. What are you going to say to all the medium/big projects using Ogre 1.X? Simply to die? Refactoring some parts between releases, like the 2.X plan, is completely affordable, great and necessary, but "upgrading" Ogre from 1.X to another version written from scratch with a different API/core features... just NO! (And also, who will care about Ogre in 3-5 years if the current branch is not continued?)

    Personally, I would have to create an Ogre fork and maintain/update it, since switching to a new engine (maybe called Ogre, but not Ogre at all!) in 3-5 years is absolutely not viable for my project(s). And I'm sure I would not be the only one forced to do this.

    So please, before taking any decision, think about the current ogre user base and projects. Do not forget them.

    Xavier

    Re: Ogre 2.0 doc (slides) - Updated 1st dec 2012

    by spacegaier » Tue Jan 01, 2013 12:49 pm

    TheSHEEEP wrote:Hmm.. would such a thing (pipeline tools) be an interesting GSoC project?

    This is OT, but I think we once planned to do so, but GSoC did not allow it (only work on the core project); that might have changed, or I could be wrong. Back to topic now...

    I also agree with those who are against a rewrite. As Xavyiy outlined quite nicely, that is just not manageable (in terms of effort, time and resources) and also not preferable for the community/user base.
    Ogre Admin [DevTeamMember, PR, Finance, Wiki, etc.] | BasicOgreFramework | AdvancedOgreFramework
    Don't know what to do in your sparetime? Help the Ogre wiki grow!
    User avatar
    spacegaier
    OGRE Team Member
    OGRE Team Member
     
    Posts: 3771
    Kudos: 80
    Joined: 04 Feb 2008
    Location: Germany
    Top

    Re: Ogre 2.0 doc (slides) - Updated 1st dec 2012

    by drwbns » Tue Jan 01, 2013 4:57 pm

    I'd like to see something on the wiki roadmap saying what's started, finished, or not yet started, if any features are being worked on or added at all. I'm not sure if the team spends most of their time fixing bugs, but maybe a chart of everything except bug fixes would help. Or a forum dedicated to it, if there isn't a private one already. Just something for us, the public, to get an idea of where Ogre is headed. How many team members are there btw?

    Re: Ogre 2.0 doc (slides) - Updated 1st dec 2012

    by Kojack » Tue Jan 01, 2013 5:36 pm

    How many team members are there btw?

    Dev team: pjcast, Noman, Praetor, Wolfmanfx, Assaf Raman, CABAListic, masterfalcon, Mattan Furst, spacegaier, TheSHEEEP, Nir Hasson, jbuck
    User avatar
    Kojack
    OGRE Moderator
    OGRE Moderator
     
    Posts: 5947
    Kudos: 348
    Joined: 25 Jan 2004
    Location: Brisbane, Australia
    Top

    Re: Ogre 2.0 doc (slides) - Updated 1st dec 2012

    by drwbns » Tue Jan 01, 2013 5:38 pm

    Ah ok. I was wondering, is there any reason why Ogre can't have a donation system for feature changes / additions? Or is that not really the question?

    Re: Ogre 2.0 doc (slides) - Updated 1st dec 2012

    by spacegaier » Tue Jan 01, 2013 5:39 pm

    Kojack wrote:
    How many team members are there btw?

    Dev team: pjcast, Noman, Praetor, Wolfmanfx, Assaf Raman, CABAListic, masterfalcon, Mattan Furst, spacegaier, TheSHEEEP, Nir Hasson, jbuck

    Or for future easy reference: http://www.ogre3d.org/about/team

    Re: Ogre 2.0 doc (slides) - Updated 1st dec 2012

    by PhilipLB » Wed Jan 09, 2013 9:16 pm

    Here is an interesting presentation about cache-misses and their optimization:
    http://harmful.cat-v.org/software/OO_pr ... CAP_09.pdf
    Google Summer of Code 2012 Student
    Topic: "Volume Rendering with LOD aimed at terrain"
    Project links: Project thread, WIKI page, Code fork for the project
    Mentor: Mattan Furst


    Volume GFX, accepting donations.
    PhilipLB
    Google Summer of Code Student
    Google Summer of Code Student
     
    Posts: 506
    Kudos: 96
    Joined: 04 Jun 2009
    Location: Berlin
    Top

    Re: Ogre 2.0 doc (slides)

    by sleo » Fri Jan 18, 2013 8:53 pm

    Awesome, if this gets implemented!

    Xavyiy wrote:2.0 -> Cache misses, DX11 & OGL4 RS
    2.1 -> Scene manager redesign: scene traversal & processing
    2.3 -> FF -> "states"
    2.4 -> Vertex format enhancements
    2.5 - 2.9 -> Fix bugs. Remaining stuff

    3.0 -> First stable version of the "new ogre"


    No, finish the refactoring in 2.0 please :) The problems and most of the answers are already in the slides and this thread; they just need to be implemented.
    1.9.0 -> Scene manager redesign: scene traversal & processing
    1.9.1 -> FF -> "states"
    1.9.2 -> Vertex format enhancements
    1.9.3 - 1.9.9 -> Fix bugs. Remaining stuff

    drwbns wrote:Ah ok. I was wondering, is there any reason why Ogre can't have a donation system for feature changes / additions? Or is that not really the question?

    Yes, I think the only problem is attracting investment. But I don't know if a donation system will help; we need open investors :) Maybe companies like RunicGames, KingArt, Dead Mage Inc., Deck13, etc. would be able to return 10% of their profit to the project. The percentage depends on whether they return it as cash or as code improvements and, of course, on their eagerness, because it is just a "can", not a "must". There are a lot of open source projects that have already benefited from commercial organizations: Wayland Windowing System, Gallium3D, Qt, jQuery, etc.
