OBIEE Coordinated Caching Strategy

Hello, 

I can only speak for the BI Server cache. To my knowledge, it is the only cache that can be flushed in a deterministic, piecemeal fashion. 

The most basic way to handle cache flushing is to set the Cache Persistence of each physical table to an appropriate value. For real-time or near-real-time data, you'd want a very small number, if not zero. If your subject area sits on top of an OLTP/real-time database, I'd just turn off caching for every physical table that changes in real time. There is no need to mess with Event Polling or any other mechanism (see below) if the data is going to change every minute, because a cache entry would be stale almost as soon as it was written.

If you have data that changes, say, every 15 or 30 minutes, then you need to carefully consider the cache behavior you want. For example, let's say your main fact source is updated every 15-30 minutes (the exact period is unknown, and fluctuates between those endpoints). Some questions to ask are: 

1) Is the model fast enough with caching completely disabled? If so, you can leave it disabled and be sure that every query will return the most-recent data. 

2) If not, what is the impact of a user seeing stale data? If the impact is low, then you can use a Cache Persistence setting of, say, 15 minutes. This ensures that the data seen by users will never be more than 15 minutes stale. You can use a smaller number, but below about 10 minutes, in my opinion, you start to question how useful the cache is: what do you gain from an entry that is only available for 8 minutes? If I had hundreds of users hitting the data almost constantly, then an 8-minute cache lifespan might still be useful. If the usage is sporadic, it may not make sense to keep a cache entry that serves only one or two hits before being flushed. 

3) If the impact of users seeing stale data is high, then using Cache Persistence settings will not be sufficient and you'll have to look at either Event Polling or a triggered batchfile/cmdfile. 
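
The three questions above boil down to a small decision tree. Here is a rough Python sketch of it, purely as an illustration: the class, the function name, and the returned labels are all made up for this post and are not part of OBIEE, and the "persistence equals the shortest refresh interval" rule is just my reading of the 15-minute example.

```python
from dataclasses import dataclass


@dataclass
class CacheProfile:
    """Hypothetical description of one fact source (not an OBIEE object)."""
    fast_enough_uncached: bool   # question 1
    stale_impact_high: bool      # questions 2 and 3
    refresh_minutes: float       # shortest expected gap between source updates


def choose_cache_strategy(profile: CacheProfile) -> str:
    """Walk the three questions in order and return a label for the approach."""
    # Real-time data, or a model that is fast without cache: skip caching.
    if profile.refresh_minutes <= 1 or profile.fast_enough_uncached:
        return "disable"
    # Stale data is costly: purge when the ETL finishes
    # (Event Polling or a triggered batch file).
    if profile.stale_impact_high:
        return "purge-on-etl"
    # Otherwise let entries expire after the shortest refresh interval.
    return f"persistence-{profile.refresh_minutes:g}-min"


if __name__ == "__main__":
    print(choose_cache_strategy(CacheProfile(False, False, 15)))
```

The order of the checks matters: the cheapest answer (no cache at all) is tried first, and Cache Persistence is only the fallback when stale data is tolerable.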

Event Polling is well documented. The basic idea is that your ETL process inserts a row into a special table, which OBIEE is configured to poll every X minutes. When OBIEE detects a new row in that table, it purges the cache entries related to a physical database, or even just a specific table. There is still a small risk of stale data with this method, because a query can hit the cache between the ETL finishing and the next poll, though you can set the polling interval to 1 minute to minimize the staleness. 
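
To make the ETL side of Event Polling concrete, here is a minimal Python sketch of the "insert a row" step. Treat the details as assumptions: the table name S_NQ_EPT and its columns are the layout I remember from Oracle's event polling documentation, and the UPDATE_TYPE value of 1 is likewise from memory, so check both against your own installation. The in-memory SQLite database is only a stand-in so the example runs anywhere; a real ETL would use its own database driver, whose placeholder style may differ from the `?` used here.

```python
import sqlite3
from datetime import datetime

# Assumed event polling table layout (verify against your installation).
INSERT_EVENT_SQL = (
    "INSERT INTO S_NQ_EPT "
    "(UPDATE_TYPE, UPDATE_TS, DATABASE_NAME, CATALOG_NAME, "
    "SCHEMA_NAME, TABLE_NAME, OTHER_RESERVED) "
    "VALUES (?, ?, ?, ?, ?, ?, ?)"
)


def event_row(database, schema, table, catalog=None, now=None):
    """Build the values for one 'this physical table changed' event."""
    ts = (now or datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
    # UPDATE_TYPE = 1 is an assumption; OTHER_RESERVED is left NULL.
    return (1, ts, database, catalog, schema, table, None)


def demo():
    """Insert one event into a stand-in table and read it back."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE S_NQ_EPT (UPDATE_TYPE INTEGER, UPDATE_TS TEXT, "
        "DATABASE_NAME TEXT, CATALOG_NAME TEXT, SCHEMA_NAME TEXT, "
        "TABLE_NAME TEXT, OTHER_RESERVED TEXT)"
    )
    # The names must match the physical layer names in the repository.
    conn.execute(INSERT_EVENT_SQL, event_row("Sales Transactions", "dbo", "SalesFacts"))
    conn.commit()
    rows = conn.execute(
        "SELECT DATABASE_NAME, SCHEMA_NAME, TABLE_NAME FROM S_NQ_EPT"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(demo())
```

The last step of the ETL job would run the insert once per table it refreshed; the BI Server then picks the rows up at its next poll.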

The triggered batch method uses the SAPurgeAllCache(), SAPurgeCacheByDatabase(), and/or SAPurgeCacheByTable() methods of the API. The idea here is that at the end of your ETL process, you run a batch file that contains a command similar to this: 

nqcmd -d [OBIEE dsn] -u [user] -p [password] -s [sql file] 

The [sql file] referenced above contains one or more commands like this: 

{call SAPurgeCacheByDatabase('Sales Transactions')};
{call SAPurgeCacheByTable('Sales Transactions','','dbo','SalesFacts')};
{call SAPurgeAllCache()};

This method has the benefit of minimizing any chance of stale data (the cache is purged essentially immediately on completion of your ETL), and it doesn't require OBIEE to poll an Event Polling table every X minutes. We recently converted all of our Event Polling-based cache operations to the batch file method. 
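
If your ETL is orchestrated from a script rather than a plain batch file, the same idea can be sketched in Python. This is only an illustration under a few assumptions: the helper names and the purge_cache.sql file name are made up, nqcmd is assumed to be on the PATH, and the only flags used are the -d, -u, -p and -s shown above. Names containing single quotes are not escaped.

```python
import subprocess
from pathlib import Path


def purge_by_table(database, catalog, schema, table):
    """Format one SAPurgeCacheByTable call (no escaping of quotes)."""
    args = ",".join(f"'{a}'" for a in (database, catalog, schema, table))
    return "{call SAPurgeCacheByTable(" + args + ")};"


def nqcmd_args(dsn, user, password, sql_file):
    """Build the nqcmd command line as an argument list."""
    return ["nqcmd", "-d", dsn, "-u", user, "-p", password, "-s", str(sql_file)]


def purge_after_etl(dsn, user, password, statements,
                    sql_file="purge_cache.sql", run=True):
    """Write the purge statements to a file and hand it to nqcmd."""
    path = Path(sql_file)
    path.write_text("\n".join(statements) + "\n")
    args = nqcmd_args(dsn, user, password, path)
    if run:
        # check=True raises if nqcmd exits with a non-zero status.
        subprocess.run(args, check=True)
    return args


if __name__ == "__main__":
    stmts = [purge_by_table("Sales Transactions", "", "dbo", "SalesFacts")]
    # run=False only writes the file and shows the command that would run.
    print(purge_after_etl("[OBIEE dsn]", "[user]", "[password]", stmts, run=False))
```

Purging by table keeps the rest of the cache warm; SAPurgeAllCache() is the blunt fallback when you can't say which tables the ETL touched.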

One final note: if your OBIEE model is laid on top of a true OLTP system (that is, you don't have an ETL process, and users are entering data directly), then it could be difficult to set up either Event Polling or a triggered batch file. Which user process would trigger these methods? You may have to rely on proper Cache Persistence settings and live with occasional stale data.
