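4.1.4 In Memory Format
IMap has a configurable in-memory storage format. By default, Hazelcast stores data in memory in its serialized binary form. Sometimes, however, it is more efficient to store entries in their object form, especially for local processing such as queries and entry processing. By setting the in-memory format in the map configuration, you decide how the data is stored in memory. The available options are: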
- BINARY (default): This is the default option. The data will be stored in serialized binary format. You can use this option if you mostly perform regular map operations like put and get.
- OBJECT: The data will be stored in deserialized form. This configuration is good for maps where entry processing and queries form the majority of all operations and the objects are complex, so that the serialization cost is comparatively high. By storing objects, entry processing will not incur the deserialization cost.
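Regular operations like get rely on the object instance. When the OBJECT format is used and a get is performed, the map does not return the stored instance but creates a clone of it. The whole get operation therefore includes a serialization first (on the node owning the instance) and then a deserialization (on the node calling the instance). When the BINARY format is used, only a deserialization is required, which is faster.
Similarly, a put operation is faster when the BINARY format is used. With the OBJECT format, the map creates a clone of the instance, so there is a serialization first and then a deserialization. With the BINARY format, the value is stored in its serialized form, so that extra deserialization step is skipped.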
NOTE: If a value is stored in OBJECT format, a change on a returned value does not affect the stored instance. In this case, the returned instance is not the actual one but a clone. Therefore, changes made on an object after it is returned will not be reflected in the actual stored data. Similarly, when a value is written to a map and the value is stored in OBJECT format, it will be a copy of the put value. So changes made on the object after it is stored will not be reflected in the actual stored data.
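The copy semantics in the note above can be illustrated without Hazelcast at all. The following is a minimal sketch, assuming a hypothetical `ObjectFormatSketch` class that imitates the described OBJECT-format behavior with plain Java serialization. It is not Hazelcast's implementation, only a model of why changes made to a returned or stored object do not reach the map.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a map that clones values on put and on get,
// as the OBJECT in-memory format is described to do. Not Hazelcast code.
final class ObjectFormatSketch {

    static final class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        String name;
        Person(String name) { this.name = name; }
    }

    private final Map<Long, Person> store = new HashMap<>();

    // Clone by serializing and then deserializing, the two steps the text mentions.
    static Person copy(Person p) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(p);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Person) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // The map keeps a copy of the value that was put.
    void put(Long key, Person value) { store.put(key, copy(value)); }

    // The caller receives a copy of the stored value, never the stored instance.
    Person get(Long key) {
        Person stored = store.get(key);
        return stored == null ? null : copy(stored);
    }
}
```

In this model, setting `name` on a `Person` after `put`, or on a `Person` returned by `get`, leaves the stored entry unchanged until the object is put again.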
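4.1.5 Map Persistence
Hazelcast allows you to load and store distributed map entries from and to a persistent data store such as a relational database. To do so, you implement the MapStore or MapLoader interface.
When a MapLoader implementation is provided and a requested entry (IMap.get()) does not exist in memory, MapLoader's load or loadAll method loads that entry from the data store. The loaded entry is placed into the map and stays there until it is removed.
When a MapStore implementation is provided, entries are also written to the data store automatically.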
NOTE: Data store needs to be a centralized system that is accessible from all Hazelcast Nodes. Persisting to local file system is not supported.
Here is an example:
import static java.lang.String.format;

import com.hazelcast.core.MapStore;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class PersonMapStore implements MapStore<Long, Person> {
    private final Connection con;

    public PersonMapStore() {
        try {
            con = DriverManager.getConnection("jdbc:hsqldb:mydatabase", "SA", "");
            con.createStatement().executeUpdate(
                "create table if not exists person (id bigint, name varchar(45))");
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    }

    public synchronized void delete(Long key) {
        System.out.println("Delete:" + key);
        try {
            con.createStatement().executeUpdate(
                format("delete from person where id = %s", key));
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    }

    public synchronized void store(Long key, Person value) {
        try {
            con.createStatement().executeUpdate(
                format("insert into person values(%s,'%s')", key, value.name));
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    }

    // Stores the entries one by one.
    public synchronized void storeAll(Map<Long, Person> map) {
        for (Map.Entry<Long, Person> entry : map.entrySet())
            store(entry.getKey(), entry.getValue());
    }

    // Deletes the keys one by one.
    public synchronized void deleteAll(Collection<Long> keys) {
        for (Long key : keys) delete(key);
    }

    // Returns null when the key is not in the database.
    public synchronized Person load(Long key) {
        try {
            ResultSet resultSet = con.createStatement().executeQuery(
                format("select name from person where id =%s", key));
            try {
                if (!resultSet.next()) return null;
                String name = resultSet.getString(1);
                return new Person(name);
            } finally {
                resultSet.close();
            }
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    }

    public synchronized Map<Long, Person> loadAll(Collection<Long> keys) {
        Map<Long, Person> result = new HashMap<Long, Person>();
        for (Long key : keys) result.put(key, load(key));
        return result;
    }

    // Returning null means nothing is loaded when the map is first used.
    public Set<Long> loadAllKeys() {
        return null;
    }
}
NOTE: Loading process is performed on a thread different than the partition threads using ExecutorService.
RELATED INFORMATION
For more MapStore/MapLoader code samples please see here.
Hazelcast supports read-through, write-through and write-behind persistence modes, which are explained in the subsections below.
Next come the caching mechanisms (readers who are not familiar with them can refer to this online example: http://www.jdon.com/repository/cache.html).
Read-Through
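If an entry requested by the application does not exist in memory, Hazelcast asks your loader implementation to load that entry from the data store. If the entry exists there, the loader implementation gets it and hands it to Hazelcast, and Hazelcast puts it into memory. This is the read-through persistence mode.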
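The read-through flow can be sketched in a few lines of plain Java. This is a minimal model under stated assumptions: `ReadThroughSketch` and its `loaderCalls` counter are hypothetical names, and the `Function` passed to the constructor merely stands in for `MapLoader.load(key)`. It is not how Hazelcast implements the mode internally.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical read-through model: on a miss, ask the loader and keep the result.
final class ReadThroughSketch<K, V> {
    private final Map<K, V> memory = new HashMap<>();
    private final Function<K, V> loader; // stands in for MapLoader.load(key)
    int loaderCalls = 0;                 // how often the data store was consulted

    ReadThroughSketch(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        V value = memory.get(key);
        if (value == null) {
            loaderCalls++;
            value = loader.apply(key);   // null if the data store has no such key
            if (value != null) {
                memory.put(key, value);  // keep it in memory for later reads
            }
        }
        return value;
    }
}
```

A second `get` for the same key is answered from memory, so the loader is consulted only once per existing key.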
Write-Through
MapStore can be configured as write-through by setting the write-delay-seconds property to 0. In this mode, entries are put into the data store synchronously.
When the map.put(key,value) call returns, you can be sure that:
- MapStore.store(key,value) is successfully called so the entry is persisted.
- In-Memory entry is updated.
- In-Memory backup copies are successfully created on other JVMs (if backup-count is greater than 0).
The same behavior applies to a map.remove(key) call. The only difference is that MapStore.delete(key) is called when the entry is deleted.
If MapStore throws an exception, the exception is propagated back to the original put or remove call in the form of a RuntimeException.
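The guarantees above can be modeled with a small plain-Java sketch. `WriteThroughSketch` is a hypothetical class, and the `BiConsumer` stands in for `MapStore.store(key, value)`. The choice to call the store before touching memory is an assumption of this sketch, made so that a failed store leaves the in-memory entry unchanged.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical write-through model: the store is called synchronously inside put.
final class WriteThroughSketch<K, V> {
    private final Map<K, V> memory = new HashMap<>();
    private final BiConsumer<K, V> store; // stands in for MapStore.store(key, value)

    WriteThroughSketch(BiConsumer<K, V> store) {
        this.store = store;
    }

    void put(K key, V value) {
        // Persist first. If the store throws, the exception reaches the caller
        // and, in this sketch, the in-memory entry is left unchanged.
        store.accept(key, value);
        memory.put(key, value);
    }

    V get(K key) {
        return memory.get(key);
    }
}
```

When `put` returns normally, both the data store and the in-memory entry hold the new value. When the store throws, the caller sees the exception directly.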
Write-Behind
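MapStore can be configured as write-behind by setting the write-delay-seconds property to a value greater than 0. This means the modified entries are put into the data store asynchronously, after the configured delay in seconds.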
NOTE: In write-behind mode, by default Hazelcast coalesces updates on a specific key, i.e. applies only the last update on it. But you can set MapStoreConfig#setWriteCoalescing to FALSE, and then all updates performed on a key are stored to the data store.
NOTE: When you set MapStoreConfig#setWriteCoalescing to FALSE, after you reach the per-node maximum write-behind-queue capacity, subsequent put operations will fail with ReachedMaxSizeException. This exception is thrown to prevent uncontrolled growth of write-behind queues. You can set the per-node maximum capacity with GroupProperty#MAP_WRITE_BEHIND_QUEUE_CAPACITY.
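The difference between coalescing and non-coalescing write-behind can be shown with a small model. This is only a sketch under stated assumptions: `WriteBehindSketch` is a hypothetical class, `IllegalStateException` stands in for `ReachedMaxSizeException`, and `flush()` stands in for the delayed write to the data store. It does not reproduce Hazelcast's actual queue implementation.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical write-behind queue model. Not Hazelcast code.
final class WriteBehindSketch {
    private final boolean coalescing;
    private final int capacity; // only enforced when coalescing is off, as in the note
    private final Map<String, String> latestByKey = new LinkedHashMap<>();
    private final Queue<Map.Entry<String, String>> allUpdates = new ArrayDeque<>();

    WriteBehindSketch(boolean coalescing, int capacity) {
        this.coalescing = coalescing;
        this.capacity = capacity;
    }

    void put(String key, String value) {
        if (coalescing) {
            latestByKey.put(key, value); // a later update replaces the pending one
            return;
        }
        if (allUpdates.size() >= capacity) {
            // Stands in for ReachedMaxSizeException.
            throw new IllegalStateException("write-behind queue is full");
        }
        allUpdates.add(new SimpleEntry<>(key, value));
    }

    // Returns the writes that would be sent to the data store, in order.
    List<String> flush() {
        List<String> writes = new ArrayList<>();
        if (coalescing) {
            for (Map.Entry<String, String> e : latestByKey.entrySet()) {
                writes.add(e.getKey() + "=" + e.getValue());
            }
            latestByKey.clear();
        } else {
            while (!allUpdates.isEmpty()) {
                Map.Entry<String, String> e = allUpdates.poll();
                writes.add(e.getKey() + "=" + e.getValue());
            }
        }
        return writes;
    }
}
```

With coalescing on, two updates to the same key result in a single write of the last value. With coalescing off, both updates are written, and the queue rejects further puts once it is full.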
In this mode, when the map.put(key,value) call returns, you can be sure that:
- In-Memory entry is updated.
- In-Memory backup copies are successfully created on other JVMs (if backup-count is greater than 0).
- The entry is marked as dirty so that after write-delay-seconds, it can be persisted with a MapStore.store(key,value) call.
The same behavior applies to a map.remove(key) call. The only difference is that MapStore.delete(key) is called when the entry is deleted.
If MapStore throws an exception, Hazelcast tries to store the entry again. If the entry still cannot be stored, a log message is printed and the entry is re-queued.
For batch write operations, Hazelcast calls MapStore.storeAll(map) and MapStore.deleteAll(collection) to perform all writes in a single call.
NOTE: If a map entry is marked as dirty, i.e. it is waiting to be persisted to the MapStore in a write-behind scenario, the eviction process forces the entry to be stored. This way, you have control over the number of entries waiting to be stored, so that a possible OutOfMemory exception can be prevented.
NOTE: MapStore or MapLoader implementations should not use Hazelcast Map/Queue/MultiMap/List/Set operations. Your implementation should only work with your data store. Otherwise, you may get into deadlock situations.
Here is an example:
...
MapStoreFactory and MapLoaderLifecycleSupport Interfaces
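As is well known, wildcards can be used to apply one configuration to multiple maps (Please see Using Wildcard), which means the configuration is shared among those maps. But a MapStore does not know which map's entries it is storing when one configuration is applied to multiple maps. To overcome this, Hazelcast provides the MapStoreFactory interface.
Using the MapStoreFactory, a MapStore can be created for each map when a wildcard configuration is used.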
Config config = new Config();
MapConfig mapConfig = config.getMapConfig( "*" );
MapStoreConfig mapStoreConfig = mapConfig.getMapStoreConfig();
mapStoreConfig.setFactoryImplementation( new MapStoreFactory
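The idea behind a store factory can be sketched independently of the Hazelcast API. In the following model, `StoreFactorySketch` and `TableStore` are hypothetical names, and the `Function` from map name to store only imitates the role that a MapStoreFactory plays for a wildcard configuration: each map gets its own store, created from the map's name.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical model of a per-map store factory used with a wildcard configuration.
final class StoreFactorySketch {

    // A store here is just something that knows which table it writes to.
    static final class TableStore {
        final String table;
        TableStore(String table) { this.table = table; }
    }

    private final Function<String, TableStore> factory;
    private final Map<String, TableStore> storesByMap = new HashMap<>();

    StoreFactorySketch(Function<String, TableStore> factory) {
        this.factory = factory;
    }

    // Called when a map matching the wildcard is first used;
    // later calls for the same map name reuse the store created earlier.
    TableStore storeFor(String mapName) {
        return storesByMap.computeIfAbsent(mapName, factory);
    }
}
```

For instance, a factory such as `name -> new StoreFactorySketch.TableStore("tbl_" + name)` would give the maps "orders" and "users" two separate stores even though they share one configuration.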
In addition, if the configured implementation also implements MapLoaderLifecycleSupport, you can control how the MapLoader is initialized, since the map name, the configuration properties and the Hazelcast instance are passed to it. Here is the interface:
public interface MapLoaderLifecycleSupport {

    /**
     * Initializes this MapLoader implementation. Hazelcast will call
     * this method when the map is first used on the
     * HazelcastInstance. Implementation can
     * initialize required resources for the implementing
     * mapLoader such as reading a config file and/or creating
     * database connection.
     */
    void init( HazelcastInstance hazelcastInstance, Properties properties, String mapName );

    /**
     * Hazelcast will call this method before shutting down.
     * This method can be overridden to cleanup the resources
     * held by this map loader implementation, such as closing the
     * database connections etc.
     */
    void destroy();
}
Initialization on startup
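The MapLoader.loadAllKeys API is used for pre-populating the in-memory map when the map is first touched/used. If MapLoader.loadAllKeys returns null, nothing is loaded. Your MapLoader.loadAllKeys implementation can return all or only some of the keys. For instance, you may choose to return only the hot keys. This is also the fastest way of pre-populating the map, because Hazelcast optimizes the loading process by having each node load only the entries it owns.
In addition, there is an InitialLoadMode configuration parameter in the MapStoreConfig class. This parameter has two values: LAZY and EAGER. If InitialLoadMode is set to LAZY, data is not loaded during the map creation. If it is set to EAGER, the whole data is loaded while the map is being created, and everything is ready to use. Also, if you add indexes to your map with the MapIndexConfig class or the addIndex method, InitialLoadMode is overridden and MapStoreConfig behaves as if EAGER mode were on.
Here is the MapLoader initialization flow: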
- When getMap() is first called from any node, initialization will start depending on the value of InitialLoadMode. If it is set as EAGER, initialization starts. If it is set as LAZY, initialization actually does not start but data is loaded at each time a partition loading is completed.
- Hazelcast will call MapLoader.loadAllKeys() to get all your keys on each node.
- Each node will figure out the list of keys it owns.
- Each node will load all its owned keys by calling MapLoader.loadAll(keys).
- Each node puts its owned entries into the map by calling IMap.putTransient(key,value).
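The steps above can be sketched as a small plain-Java model. Everything here is an assumption made for illustration: `InitialLoadSketch` is a hypothetical class, ownership is decided by a simplified "hash modulo node count" rule rather than Hazelcast's real partitioning, and the `Function` argument stands in for `MapLoader.loadAll(keys)`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical model of the initial load steps: every node sees all keys,
// keeps the ones it owns, and loads only those. Not Hazelcast code.
final class InitialLoadSketch {

    // Simplified ownership rule (an assumption): key hash modulo node count.
    static int ownerOf(Object key, int nodeCount) {
        return Math.floorMod(key.hashCode(), nodeCount);
    }

    // Returns what a single node would put into its part of the map.
    static Map<Long, String> loadOwned(int nodeId, int nodeCount,
                                       Set<Long> allKeys,
                                       Function<List<Long>, Map<Long, String>> loadAll) {
        List<Long> owned = new ArrayList<>();
        for (Long key : allKeys) {                  // result of loadAllKeys()
            if (ownerOf(key, nodeCount) == nodeId) {
                owned.add(key);                     // keys this node owns
            }
        }
        return new HashMap<>(loadAll.apply(owned)); // loadAll(keys) for owned keys only
    }
}
```

Because every node loads only its own share, the total work of the initial load is split across the cluster and no entry is loaded twice.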
NOTE: If the load mode is LAZY and the clear() method is called (which triggers MapStore.deleteAll()), Hazelcast will remove ONLY the loaded entries from your map and datastore. Since the whole data is not loaded in this case (LAZY mode), please note that there may still be entries in your datastore.
Forcing All Keys To Be Loaded
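The loadAll method loads some or all of the keys from the data store into the map. It is meant for optimizing multiple load operations. The method has two signatures (i.e. the same method can take two different parameter lists): one loads the given keys and the other loads all keys. Here is an example:
import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;

public class LoadAll {

    // createNewConfig and populateMap are helper methods not shown in this excerpt.
    public static void main(String[] args) {
        final int numberOfEntriesToAdd = 1000;
        final String mapName = LoadAll.class.getCanonicalName();
        final Config config = createNewConfig(mapName);
        final HazelcastInstance node = Hazelcast.newHazelcastInstance(config);
        final IMap<Integer, Integer> map = node.getMap(mapName);

        populateMap(map, numberOfEntriesToAdd);
        System.out.printf("# Map store has %d elements\n", numberOfEntriesToAdd);

        // Remove the entries from memory only; the map store keeps them.
        map.evictAll();
        System.out.printf("# After evictAll map size\t: %d\n", map.size());

        // Load all keys back from the map store, replacing existing values.
        map.loadAll(true);
        System.out.printf("# After loadAll map size\t: %d\n", map.size());
    }
}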
Post Processing Map Store
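In some scenarios, you may need to modify the object after storing it into the map store. For example, your database may generate an ID or a version number automatically, and you then need to modify the object stored in the distributed map so that the database and the data grid do not get out of sync. You can do that by implementing the PostProcessingMapStore interface, so that the modified object is put into the distributed map. This causes an extra serialization step, so use it only when needed (this explanation is only valid for maps configured as write-through).
Here is an example:
class ProcessingStore implements MapStore<Integer, Employee>, PostProcessingMapStore {

    // saveEmployee is a helper not shown in this excerpt;
    // the remaining MapStore methods are omitted as well.
    @Override
    public void store( Integer key, Employee employee ) {
        EmployeeId id = saveEmployee();
        employee.setId( id.getId() );
    }
}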
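The effect of post-processing can be modeled without Hazelcast. In this minimal sketch, `PostProcessingSketch`, its `Employee` class and the `AtomicLong` sequence are all hypothetical. The sequence only imitates a database-generated ID, and the point is that the map ends up holding the object as modified by the store.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of a post-processing store: the store assigns an id and the
// map keeps the object as modified by the store. Not Hazelcast code.
final class PostProcessingSketch {

    static final class Employee {
        final String name;
        Long id; // filled in by the store, like a database-generated id
        Employee(String name) { this.name = name; }
    }

    private final AtomicLong sequence = new AtomicLong(100);
    private final Map<Integer, Employee> memory = new HashMap<>();

    // Stands in for MapStore.store(key, value) backed by a database sequence.
    private void store(Integer key, Employee employee) {
        employee.id = sequence.incrementAndGet();
    }

    void put(Integer key, Employee employee) {
        store(key, employee);      // write-through: store first
        memory.put(key, employee); // then keep the post-processed object
    }

    Employee get(Integer key) {
        return memory.get(key);
    }
}
```

After `put`, a `get` for the same key returns an employee whose `id` is already set, which is the behavior the PostProcessingMapStore marker is meant to enable.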
4.1.6 Near Cache
Map entries in Hazelcast are partitioned across the cluster. Imagine that you read key k many times and k is owned by another member of the cluster. Each map.get(k) is then a remote operation, which means a lot of network trips. If your map is mostly read, you should consider creating a Near Cache for it, so that reads are much faster and consume less network traffic. These benefits do not come for free, though. When using a near cache, you should consider the following issues:
- JVM will have to hold extra cached data so it will increase the memory consumption.
- If invalidation is turned on and entries are updated frequently, then invalidations will be costly.
- Near cache breaks the strong consistency guarantees; you might be reading stale data.
To repeat, a Near Cache is a good choice for maps that are mostly read. Here is a near cache configuration example for a map:
...
NOTE: Programmatically, near cache configuration is done by using the class NearCacheConfig. This class is used both in nodes and clients. To create a near cache in a client (native Java client), use the method addNearCacheConfig in the class ClientConfig (please see Java Client section). Please note that near cache configuration is specific to the node or client itself: a map in a node may not have near cache configured while the same map in a client may have.
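The trade-offs listed for the near cache can be made concrete with a small model. This is only a sketch under stated assumptions: `NearCacheSketch` is a hypothetical class, the `remote` map stands in for data owned by another cluster member, and `remoteReads` counts the simulated network trips. It does not use NearCacheConfig or any other Hazelcast API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical near cache model: a local copy in front of a "remote" owner map.
final class NearCacheSketch {
    private final Map<String, String> remote; // data owned by another member
    private final Map<String, String> local = new HashMap<>();
    private final boolean invalidateOnChange;
    int remoteReads = 0;                      // counts the simulated network trips

    NearCacheSketch(Map<String, String> remote, boolean invalidateOnChange) {
        this.remote = remote;
        this.invalidateOnChange = invalidateOnChange;
    }

    String get(String key) {
        String cached = local.get(key);
        if (cached != null) {
            return cached;                    // served locally, possibly stale
        }
        remoteReads++;
        String value = remote.get(key);
        if (value != null) {
            local.put(key, value);            // extra memory held by this JVM
        }
        return value;
    }

    // Simulates an update made through the owner of the key.
    void remoteUpdate(String key, String value) {
        remote.put(key, value);
        if (invalidateOnChange) {
            local.remove(key);                // invalidation: drop the cached copy
        }
    }
}
```

Without invalidation, a read after `remoteUpdate` still returns the old value from the local copy. With invalidation, the next read goes to the owner again, which keeps the data fresh at the cost of an extra remote read per update.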
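Stay tuned for the following sections.
A note on this translation: it is intended for study and discussion only. If you find any errors, please point them out. Thank you!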
---------------------------------------------------------------------------------------------------------------------------------------------
Original source: http://docs.hazelcast.org/docs/3.3/manual/html-single/hazelcast-documentation.html#map-persistence
http://docs.hazelcast.org/docs/3.3/manual/html-single/hazelcast-documentation.html#near-cache