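A project of mine needed to query a collection of entity objects by several different fields. Traditional data structures such as List and HashMap did not fit the requirement well, so after some searching online I settled on CQEngine.
This post records the basic concepts of CQEngine and some of its usage, with small examples along the way.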
CQEngine, short for Collection Query Engine, is, as the name suggests, a query engine for collections. With CQEngine we can query Java collections efficiently using SQL-like statements.
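Specifically, CQEngine has the following characteristics:
Throughput on the order of millions of queries per second, with query latency on the order of microseconds.
Traditionally, searching a container for objects means iterating over it, which is very inefficient. Optimizing such a search requires a deep understanding of how the container is built.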
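CQEngine, by contrast, performs very well. The chart below compares CQEngine with iteration on range-type queries:
[Figure: benchmark results, CQEngine vs. iteration, range-type queries]
For the test setup and more on the CQEngine benchmark, see CQEngine Benchmark.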
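CQEngine builds indexes on the fields of the objects held in a collection and applies algorithms based on the rules of set theory to reduce the time complexity of a search. As a result it beats iteration on both scalability and latency.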
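To make the difference between iterating and using an index concrete, here is a minimal sketch that uses only the Java standard library. It is not CQEngine code: the class `FieldIndexSketch` and its methods are hypothetical names made up for this illustration, and a plain `HashMap` stands in for a real index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of CQEngine.
final class FieldIndexSketch {

    /** Minimal entity used only for this illustration. */
    static final class Car {
        final int carId;
        final String manufacturer;

        Car(int carId, String manufacturer) {
            this.carId = carId;
            this.manufacturer = manufacturer;
        }
    }

    /** Baseline: examine every element, so each query costs O(n). */
    static List<Car> scan(List<Car> cars, String manufacturer) {
        List<Car> result = new ArrayList<>();
        for (Car car : cars) {
            if (car.manufacturer.equals(manufacturer)) {
                result.add(car);
            }
        }
        return result;
    }

    /** Build the index once: field value -> all cars having that value. */
    static Map<String, List<Car>> buildIndex(List<Car> cars) {
        Map<String, List<Car>> index = new HashMap<>();
        for (Car car : cars) {
            index.computeIfAbsent(car.manufacturer, k -> new ArrayList<>()).add(car);
        }
        return index;
    }

    /** Indexed lookup: a single hash probe instead of a full pass. */
    static List<Car> lookup(Map<String, List<Car>> index, String manufacturer) {
        return index.getOrDefault(manufacturer, Collections.emptyList());
    }

    public static void main(String[] args) {
        List<Car> cars = Arrays.asList(
                new Car(1, "Ford"), new Car(2, "Honda"), new Car(3, "Ford"));
        Map<String, List<Car>> index = buildIndex(cars);
        System.out.println(scan(cars, "Ford").size());    // prints: 2
        System.out.println(lookup(index, "Ford").size()); // prints: 2
    }
}
```

The index costs extra memory and has to be kept up to date on every write, which is the trade-off any index-based approach makes in exchange for faster reads.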
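Specifically, CQEngine indexes allow the following optimization:
Multiple indexes can be added on the same field, each optimized for a different type of query (equality, range, and so on).
CQEngine provides three implementations of its indexed collection, which support different levels of concurrency and transaction isolation. They share a common interface:
IndexedCollection: the common interface. Its addIndex method adds an index for queries to use, which improves query efficiency.
ConcurrentIndexedCollection: reads and writes (add/remove) are thread-safe with respect to each other. It is not thread-safe, however, when several threads add or remove the same element of the collection (the same instance, or objects with the same hashCode whose equals method returns true) at the same time. The indexes can then get out of sync and queries may return inconsistent results. In that case, use its subclass ObjectLockingIndexedCollection, which keeps writes thread-safe with respect to each other at all times.
ObjectLockingIndexedCollection: a subclass of ConcurrentIndexedCollection that uses striped locks.
TransactionalIndexedCollection: a subclass of ConcurrentIndexedCollection that supports transactions (READ_COMMITTED transaction isolation). In particular, it has a dedicated locking mechanism that keeps concurrent add/remove of the same object thread-safe. It is based on MVCC (multi-version concurrency control).
With MVCC, a version number can be added to an object through a long value, as shown below: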
static final AtomicLong VERSION_GENERATOR = new AtomicLong();
final long version = VERSION_GENERATOR.incrementAndGet();
@Override
public boolean equals(Object o) {
if (this == o) { return true; }
if (null == o || getClass() != o.getClass()) return false;
Car car = (Car) o;
if (carId != car.carId) { return false; }
if (this.version != car.version) return false;
return true;
}
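The fragment above only shows the version field and equals(). As a self-contained sketch of the same idea, the class below (a hypothetical `VersionedCar`, not taken from CQEngine) shows the effect: two objects with the same carId but created at different times are different versions and therefore not equal, while hashCode() can stay based on carId alone and still satisfy the equals/hashCode contract.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration of a versioned entity; not part of CQEngine.
final class VersionedCar {
    static final AtomicLong VERSION_GENERATOR = new AtomicLong();

    final int carId;
    final String name;
    // Every new instance gets a fresh, strictly increasing version number.
    final long version = VERSION_GENERATOR.incrementAndGet();

    VersionedCar(int carId, String name) {
        this.carId = carId;
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) { return true; }
        if (o == null || getClass() != o.getClass()) { return false; }
        VersionedCar other = (VersionedCar) o;
        // Same id but different version means "a different version of the same car".
        return carId == other.carId && version == other.version;
    }

    @Override
    public int hashCode() {
        // Based on carId only: equal objects still have equal hash codes.
        return carId;
    }

    public static void main(String[] args) {
        VersionedCar v1 = new VersionedCar(1, "ford focus");
        VersionedCar v2 = new VersionedCar(1, "ford focus (repainted)");
        System.out.println(v1.equals(v2));                  // prints: false
        System.out.println(v1.hashCode() == v2.hashCode()); // prints: true
    }
}
```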
For more on transaction isolation, see TransactionIsolation.
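CQEngine needs to access the fields of an object in order to build indexes and retrieve values. It does not do this through reflection, but through a concept called attributes.
An attribute is an accessor object that can read the value of a field from a POJO.
Below is an example of a CAR_ID attribute that reads carId: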
public static final Attribute<Car, Integer> CAR_ID = new SimpleAttribute<Car, Integer>("carId") {
public Integer getValue(Car car, QueryOptions queryOptions) { return car.carId; }
};
Another example, this time using a lambda expression (here a method reference):
public static final Attribute<Car, String> FEATURES = com.googlecode.cqengine.query.QueryFactory.attribute(String.class, "features", Car::getFeatures);
If the data may contain null, use SimpleNullableAttribute or MultiValueNullableAttribute instead of SimpleAttribute or MultiValueAttribute; otherwise a NullPointerException may be thrown.
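The accessor idea, and why null needs separate treatment, can be sketched with the standard library alone. Everything in this block is hypothetical: `FieldReader` and `valuesOf` are names invented for the illustration, and the block does not claim to show how CQEngine implements its attribute classes.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of "accessor objects"; not part of CQEngine.
final class AccessorSketch {

    static final class Car {
        final int carId;
        final String name; // may be null in this illustration

        Car(int carId, String name) {
            this.carId = carId;
            this.name = name;
        }
    }

    /** Reads one field from an object directly, without reflection. */
    interface FieldReader<O, A> {
        A read(O object);
    }

    static final FieldReader<Car, Integer> CAR_ID = car -> car.carId;
    static final FieldReader<Car, String> NAME = car -> car.name;

    /** Null-tolerant view: a null field contributes no values at all. */
    static <O, A> List<A> valuesOf(FieldReader<O, A> reader, O object) {
        A value = reader.read(object);
        return value == null ? Collections.<A>emptyList() : Collections.singletonList(value);
    }

    public static void main(String[] args) {
        System.out.println(valuesOf(NAME, new Car(1, "ford focus"))); // prints: [ford focus]
        System.out.println(valuesOf(NAME, new Car(2, null)));         // prints: []
    }
}
```

Treating a null field as "no value" means code that consumes the values, such as an index builder, never sees a null and has nothing to dereference.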
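By default, the data in a collection is persisted on-heap. The available persistence options are shown below.
On-heap (the default):
IndexedCollection<Car> cars = new ConcurrentIndexedCollection<Car>();
Off-heap:
IndexedCollection<Car> cars = new ConcurrentIndexedCollection<Car>(OffHeapPersistence.onPrimaryKey(Car.CAR_ID));
Disk:
IndexedCollection<Car> cars = new ConcurrentIndexedCollection<Car>(DiskPersistence.onPrimaryKey(Car.CAR_ID));
Disk, stored in a file at a specified path: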
IndexedCollection<Car> cars = new ConcurrentIndexedCollection<Car>(DiskPersistence.onPrimaryKeyInFile(Car.CAR_ID, new File("cars.dat")));
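Indexes also come in three persistence flavours: on-heap, off-heap, and on disk. An index does not have to use the same persistence as the data, and different indexes on the same collection can even use different persistence. In order, the three calls below add an on-heap, an off-heap, and a disk index: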
cars.addIndex(NavigableIndex.onAttribute(Car.MANUFACTURER));
cars.addIndex(OffHeapIndex.onAttribute(Car.MANUFACTURER));
cars.addIndex(DiskIndex.onAttribute(Car.MANUFACTURER));
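Note: when working with a ResultSet backed by persisted data, use it inside a try block (for example try-with-resources) so that it is closed even if an exception is thrown.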
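IndexedCollection is an interface that extends java.util.Set. It also provides two additional methods:
addIndex, which adds an index to the collection.
retrieve, which takes a Query and returns a ResultSet. The ResultSet can be traversed with an iterator or with a Java 8 Stream; once it is converted to a Stream, it can conveniently be processed with lambda expressions.
… the object type of the IndexedCollection and the type of index added.
When the IndexedCollection or one of its indexes is persisted off-heap or on disk, the ResultSet must be closed at the end. It can be closed as follows:
try (ResultSet<Car> results = cars.retrieve(equal(Car.MANUFACTURER, "Ford"))) {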
results.forEach(System.out::println);
}
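The guarantee that try-with-resources gives can be checked with a small stand-alone sketch. The `Results` class here is a hypothetical stand-in for a result set that holds an external resource; it is not CQEngine's ResultSet.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration of try-with-resources; not part of CQEngine.
final class CloseableResultsSketch {

    /** Stand-in for a result set that must be closed after use. */
    static final class Results implements Iterable<String>, AutoCloseable {
        private final List<String> rows;
        private boolean closed;

        Results(List<String> rows) {
            this.rows = rows;
        }

        @Override
        public Iterator<String> iterator() {
            return rows.iterator();
        }

        @Override
        public void close() {
            closed = true;
        }

        boolean isClosed() {
            return closed;
        }
    }

    /** Fails while iterating and returns the resource so its state can be inspected. */
    static Results consumeAndFail() {
        Results results = new Results(Arrays.asList("row 1", "row 2"));
        try (Results r = results) {
            for (String row : r) {
                throw new IllegalStateException("simulated failure on " + row);
            }
        } catch (IllegalStateException expected) {
            // close() has already been called by the time this handler runs.
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(consumeAndFail().isClosed()); // prints: true
    }
}
```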
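CQEngine integrates with the Java 8+ Stream API, so the CQEngine API itself does not provide grouping or aggregation; these are done with lambda expressions instead.
A ResultSet can be converted to a Java Stream with ResultSet.stream(). For the best query performance, however, express as much of the query as possible in CQEngine's own query statements, and use Java Stream only where necessary.
Below is an example that converts a ResultSet to a Stream: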
public static void main(String[] args) {
IndexedCollection<Car> cars = new ConcurrentIndexedCollection<>();
cars.addAll(CarFactory.createCollectionOfCars(10));
cars.addIndex(NavigableIndex.onAttribute(Car.MANUFACTURER));
Set<Car.Color> distinctColorsOfFordCars = cars.retrieve(equal(Car.MANUFACTURER, "Ford"))
.stream()
.map(Car::getColor)
.collect(Collectors.toSet());
System.out.println(distinctColorsOfFordCars); // prints: [GREEN, RED]
}
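The aggregation example later in this post also shows how to group by a given field (group by) and count.
Here are some commonly used on-heap index APIs. First, the Car.java used in these examples: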
import com.googlecode.cqengine.attribute.Attribute;
import com.googlecode.cqengine.attribute.MultiValueAttribute;
import com.googlecode.cqengine.attribute.SimpleAttribute;
import com.googlecode.cqengine.query.option.QueryOptions;
import java.util.List;
/**
* @author Niall Gallagher
*/
public class Car {
public final int carId;
public final String name;
public final String description;
public final List<String> features;
public Car(int carId, String name, String description, List<String> features) {
this.carId = carId;
this.name = name;
this.description = description;
this.features = features;
}
@Override
public String toString() {
return "Car{carId=" + carId + ", name='" + name + "', description='" + description + "', features=" + features + "}";
}
// -------------------------- Attributes --------------------------
public static final Attribute<Car, Integer> CAR_ID = new SimpleAttribute<Car, Integer>("carId") {
public Integer getValue(Car car, QueryOptions queryOptions) { return car.carId; }
};
public static final Attribute<Car, String> NAME = new SimpleAttribute<Car, String>("name") {
public String getValue(Car car, QueryOptions queryOptions) { return car.name; }
};
public static final Attribute<Car, String> DESCRIPTION = new SimpleAttribute<Car, String>("description") {
public String getValue(Car car, QueryOptions queryOptions) { return car.description; }
};
public static final Attribute<Car, String> FEATURES = new MultiValueAttribute<Car, String>("features") {
public List<String> getValues(Car car, QueryOptions queryOptions) { return car.features; }
};
}
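NavigableIndex, as mentioned above, is an index held on the Java heap. It supports query types such as equal, lessThan, greaterThan, and between.
Example: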
cars.addIndex(NavigableIndex.onAttribute(Car.CAR_ID));
System.out.println("Cars whose id is less than 2:");
Query<Car> query1 = lessThan(Car.CAR_ID, 2);
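Why a sorted index helps with range queries can be seen in a short standard-library sketch. A `TreeMap` plays the role of the sorted index here; the class and method names are hypothetical, and the block is an illustration of the general technique rather than of CQEngine internals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical illustration of a sorted index; not part of CQEngine.
final class SortedIndexSketch {

    /** Sorted index: carId -> car name (ids are unique in this illustration). */
    static NavigableMap<Integer, String> sampleIndex() {
        NavigableMap<Integer, String> index = new TreeMap<>();
        index.put(1, "ford focus");
        index.put(2, "ford taurus");
        index.put(3, "honda civic");
        return index;
    }

    /** All entries with key strictly below the bound, without visiting the rest. */
    static List<String> lessThan(NavigableMap<Integer, String> index, int bound) {
        return new ArrayList<>(index.headMap(bound, false).values());
    }

    /** All entries with lower <= key <= upper. */
    static List<String> between(NavigableMap<Integer, String> index, int lower, int upper) {
        return new ArrayList<>(index.subMap(lower, true, upper, true).values());
    }

    public static void main(String[] args) {
        NavigableMap<Integer, String> index = sampleIndex();
        System.out.println(lessThan(index, 2));   // prints: [ford focus]
        System.out.println(between(index, 2, 3)); // prints: [ford taurus, honda civic]
    }
}
```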
ReversedRadixTreeIndex is an index backed by a ConcurrentReversedRadixTree. It supports query types such as equal and endsWith:
cars.addIndex(ReversedRadixTreeIndex.onAttribute(Car.NAME));
System.out.println("Cars whose name ends with 'vic'");
Query<Car> query1 = endsWith(Car.NAME, "vic");
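The trick behind a "reversed" index is that an ends-with query on the original strings becomes a starts-with query on the reversed strings. The sketch below demonstrates this with a `TreeMap` of reversed keys in place of a radix tree; the substitution and all names in it are assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical illustration of reversed-key indexing; not part of CQEngine.
final class ReversedKeySketch {

    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    /** Index keyed by the reversed name, so that suffixes become prefixes. */
    static NavigableMap<String, String> buildIndex(List<String> names) {
        NavigableMap<String, String> index = new TreeMap<>();
        for (String name : names) {
            index.put(reverse(name), name);
        }
        return index;
    }

    /** endsWith(suffix) on names == startsWith(reverse(suffix)) on the keys. */
    static List<String> endsWith(NavigableMap<String, String> index, String suffix) {
        String prefix = reverse(suffix);
        // Keys in [prefix, prefix + '\uffff') all start with prefix
        // (ignoring the corner case of keys that contain '\uffff' itself).
        return new ArrayList<>(
                index.subMap(prefix, true, prefix + Character.MAX_VALUE, false).values());
    }

    public static void main(String[] args) {
        NavigableMap<String, String> index =
                buildIndex(Arrays.asList("ford focus", "ford taurus", "honda civic"));
        System.out.println(endsWith(index, "vic")); // prints: [honda civic]
        System.out.println(endsWith(index, "us"));  // prints: [ford focus, ford taurus]
    }
}
```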
SuffixTreeIndex is an index backed by a ConcurrentSuffixTree. It supports query types such as equal, endsWith, and contains:
cars.addIndex(SuffixTreeIndex.onAttribute(Car.DESCRIPTION));
System.out.println("Cars whose description contains 'flat tyre'");
Query<Car> query2 = contains(Car.DESCRIPTION, "flat tyre");
HashIndex is an index backed by a ConcurrentHashMap. It supports equality-based query types such as equal and in:
System.out.println("\nCars which have a sunroof or a radio: ");
Query<Car> query3 = in(Car.FEATURES, "sunroof", "radio");
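For a multi-valued field such as features, a hash-style index stores one entry per value, and an in(...) query becomes the union of the matching entries. The following sketch shows that idea with plain collections; the class and its methods are hypothetical and only the data mirrors the three cars used in this post.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration of a multi-value hash index; not part of CQEngine.
final class MultiValueIndexSketch {

    /** Index: feature -> ids of all cars that have that feature. */
    static Map<String, Set<Integer>> buildIndex(Map<Integer, List<String>> featuresByCarId) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (Map.Entry<Integer, List<String>> entry : featuresByCarId.entrySet()) {
            for (String feature : entry.getValue()) {
                index.computeIfAbsent(feature, k -> new TreeSet<>()).add(entry.getKey());
            }
        }
        return index;
    }

    /** in(a, b, ...) is the union of the per-value lookups. */
    static Set<Integer> in(Map<String, Set<Integer>> index, String... features) {
        Set<Integer> result = new TreeSet<>();
        for (String feature : features) {
            result.addAll(index.getOrDefault(feature, Collections.emptySet()));
        }
        return result;
    }

    static Map<Integer, List<String>> sampleData() {
        Map<Integer, List<String>> data = new LinkedHashMap<>();
        data.put(1, Arrays.asList("spare tyre", "sunroof"));
        data.put(2, Arrays.asList("spare tyre", "radio"));
        data.put(3, Arrays.asList("radio"));
        return data;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> index = buildIndex(sampleData());
        System.out.println(in(index, "spare tyre"));       // prints: [1, 2]
        System.out.println(in(index, "sunroof", "radio")); // prints: [1, 2, 3]
    }
}
```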
Example 1 is a simple example that uses ConcurrentIndexedCollection and queries Car objects by several different conditions.
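import com.googlecode.cqengine.ConcurrentIndexedCollection;
import com.googlecode.cqengine.IndexedCollection;
import com.googlecode.cqengine.index.hash.HashIndex;
import com.googlecode.cqengine.index.navigable.NavigableIndex;
import com.googlecode.cqengine.index.radixreversed.ReversedRadixTreeIndex;
import com.googlecode.cqengine.index.suffix.SuffixTreeIndex;
import com.googlecode.cqengine.query.Query;
import java.util.Arrays;
import static com.googlecode.cqengine.query.QueryFactory.*;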
/**
* An introductory example which demonstrates usage using a Car analogy.
*
* @author Niall Gallagher
*/
public class Introduction {
public static void main(String[] args) {
// Create an indexed collection
// (alternatively, CQEngine.copyFrom() can create one from an existing collection)
IndexedCollection<Car> cars = new ConcurrentIndexedCollection<Car>();
// Add some indexes
cars.addIndex(NavigableIndex.onAttribute(Car.CAR_ID));
cars.addIndex(ReversedRadixTreeIndex.onAttribute(Car.NAME));
cars.addIndex(SuffixTreeIndex.onAttribute(Car.DESCRIPTION));
cars.addIndex(HashIndex.onAttribute(Car.FEATURES));
// Add some objects to the collection...
cars.add(new Car(1, "ford focus", "great condition, low mileage", Arrays.asList("spare tyre", "sunroof")));
cars.add(new Car(2, "ford taurus", "dirty and unreliable, flat tyre", Arrays.asList("spare tyre", "radio")));
cars.add(new Car(3, "honda civic", "has a flat tyre and high mileage", Arrays.asList("radio")));
// -------------------------- Queries --------------------------
System.out.println("Cars whose name ends with 'vic' or whose id is less than 2:");
Query<Car> query1 = or(endsWith(Car.NAME, "vic"), lessThan(Car.CAR_ID, 2));
cars.retrieve(query1).forEach(System.out::println);
System.out.println("\nCars whose flat tyre can be replaced:");
Query<Car> query2 = and(contains(Car.DESCRIPTION, "flat tyre"), equal(Car.FEATURES, "spare tyre"));
cars.retrieve(query2).forEach(System.out::println);
System.out.println("\nCars which have a sunroof or a radio but are not dirty:");
Query<Car> query3 = and(in(Car.FEATURES, "sunroof", "radio"), not(contains(Car.DESCRIPTION, "dirty")));
cars.retrieve(query3).forEach(System.out::println);
}
}
CQEngine supports string-based queries in both SQL and CQN (CQEngine Native) format.
SQL example:
SQLParser<Car> parser = SQLParser.forPojoWithAttributes(Car.class, createAttributes(Car.class));
IndexedCollection<Car> cars = new ConcurrentIndexedCollection<Car>();
cars.addAll(CarFactory.createCollectionOfCars(10));
ResultSet<Car> results = parser.retrieve(cars, "SELECT * FROM cars WHERE (" +
"(manufacturer = 'Ford' OR manufacturer = 'Honda') " +
"AND price <= 5000.0 " +
"AND color NOT IN ('GREEN', 'WHITE')) " +
"ORDER BY manufacturer DESC, price ASC");
results.forEach(System.out::println); // Prints: Honda Accord, Ford Fusion, Ford Focus
CQN example:
CQNParser<Car> parser = CQNParser.forPojoWithAttributes(Car.class, createAttributes(Car.class));
IndexedCollection<Car> cars = new ConcurrentIndexedCollection<Car>();
cars.addAll(CarFactory.createCollectionOfCars(10));
ResultSet<Car> results = parser.retrieve(cars,
"and(" +
"or(equal(\"manufacturer\", \"Ford\"), equal(\"manufacturer\", \"Honda\")), " +
"lessThanOrEqualTo(\"price\", 5000.0), " +
"not(in(\"color\", GREEN, WHITE))" +
")");
results.forEach(System.out::println); // Prints: Ford Focus, Ford Fusion, Honda Accord
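CQEngine does not directly implement the count(*) … group by x syntax; the officially recommended approach is to use the Java 8+ Stream API instead.
Below is a complete example that groups cars by color and counts the cars in each group: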
import com.googlecode.cqengine.ConcurrentIndexedCollection;
import com.googlecode.cqengine.IndexedCollection;
import com.googlecode.cqengine.query.parser.sql.SQLParser;
import com.googlecode.cqengine.resultset.ResultSet;
import demos.cqengine.testutils.Car;
import demos.cqengine.testutils.CarFactory;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import static com.googlecode.cqengine.codegen.AttributeBytecodeGenerator.createAttributes;
/**
* Created by chengc on 2018/11/6.
*/
public class SQLAggregationDemo
{
public static void main(String[] args) {
SQLParser<Car> parser = SQLParser.forPojoWithAttributes(Car.class, createAttributes(Car.class));
IndexedCollection<Car> cars = new ConcurrentIndexedCollection<Car>();
cars.addAll(CarFactory.createCollectionOfCars(10));
ResultSet<Car> results = parser.retrieve(cars, "SELECT * FROM cars");
Map<String,List<Car>> carResultMap = results.stream().collect(Collectors.groupingBy(t -> t.getColor().toString()));
for (Map.Entry<String, List<Car>> entry : carResultMap.entrySet()) {
System.out.println("Car.color=" + entry.getKey() + ", count=" + entry.getValue().size());
}
}
}
The Car class used in this example:
import java.util.List;
/**
* @author Niall Gallagher
*/
public class Car {
public enum Color {RED, GREEN, BLUE, BLACK, WHITE}
final int carId;
final String manufacturer;
final String model;
final Color color;
final int doors;
final double price;
final List<String> features;
public Car(int carId, String manufacturer, String model, Color color, int doors, double price, List<String> features) {
this.carId = carId;
this.manufacturer = manufacturer;
this.model = model;
this.color = color;
this.doors = doors;
this.price = price;
this.features = features;
}
public int getCarId() {
return carId;
}
public String getManufacturer() {
return manufacturer;
}
public String getModel() {
return model;
}
public Color getColor() {
return color;
}
public int getDoors() {
return doors;
}
public double getPrice() {
return price;
}
public List<String> getFeatures() {
return features;
}
@Override
public String toString() {
return "Car{" +
"carId=" + carId +
", manufacturer='" + manufacturer + '\'' +
", model='" + model + '\'' +
", color=" + color +
", doors=" + doors +
", price=" + price +
", features=" + features +
'}';
}
@Override
public boolean equals(Object o) {
if (this == o) { return true; }
if (!(o instanceof Car)) { return false; }
Car car = (Car) o;
if (carId != car.carId) { return false; }
return true;
}
@Override
public int hashCode() {
return carId;
}
}
The CarFactory class used in this example:
import com.googlecode.concurrenttrees.common.LazyIterator;
import java.util.*;
import java.util.concurrent.atomic.AtomicInteger;
import static java.util.Arrays.asList;
/**
* @author Niall Gallagher
*/
public class CarFactory {
public static Set<Car> createCollectionOfCars(int numCars) {
Set<Car> cars = new LinkedHashSet<Car>(numCars);
for (int carId = 0; carId < numCars; carId++) {
cars.add(createCar(carId));
}
return cars;
}
public static Iterable<Car> createIterableOfCars(final int numCars) {
final AtomicInteger count = new AtomicInteger();
return new Iterable<Car>() {
@Override
public Iterator<Car> iterator() {
return new LazyIterator<Car>() {
@Override
protected Car computeNext() {
int carId = count.getAndIncrement();
return carId < numCars ? createCar(carId) : endOfData();
}
};
}
};
}
public static Car createCar(int carId) {
switch (carId % 10) {
case 0: return new Car(carId, "Ford", "Focus", Car.Color.RED, 5, 5000.00, noFeatures());
case 1: return new Car(carId, "Ford", "Fusion", Car.Color.RED, 4, 3999.99, asList("hybrid"));
case 2: return new Car(carId, "Ford", "Taurus", Car.Color.GREEN, 4, 6000.00, asList("grade a"));
case 3: return new Car(carId, "Honda", "Civic", Car.Color.WHITE, 5, 4000.00, asList("grade b"));
case 4: return new Car(carId, "Honda", "Accord", Car.Color.BLACK, 5, 3000.00, asList("grade c"));
case 5: return new Car(carId, "Honda", "Insight", Car.Color.GREEN, 3, 5000.00, noFeatures());
case 6: return new Car(carId, "Toyota", "Avensis", Car.Color.GREEN, 5, 5999.95, noFeatures());
case 7: return new Car(carId, "Toyota", "Prius", Car.Color.BLUE, 3, 8500.00, asList("sunroof", "hybrid"));
case 8: return new Car(carId, "Toyota", "Hilux", Car.Color.RED, 5, 7800.55, noFeatures());
case 9: return new Car(carId, "BMW", "M6", Car.Color.BLUE, 2, 9000.23, asList("coupe"));
default: throw new IllegalStateException();
}
}
static List<String> noFeatures() {
return Collections.<String>emptyList();
}
}
Output:
Car.color=RED, count=3
Car.color=WHITE, count=1
Car.color=BLUE, count=2
Car.color=BLACK, count=1
Car.color=GREEN, count=3
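The grouping step itself can be written more compactly by letting the collector do the counting, instead of building lists and calling size() on them. The sketch below uses only the Stream API, with the colors of the ten sample cars hard-coded; `GroupCountSketch` is a hypothetical name, and a TreeMap is used purely so that the output order is deterministic.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical illustration of "count(*) ... group by" with streams.
final class GroupCountSketch {

    enum Color { RED, GREEN, BLUE, BLACK, WHITE }

    /** Equivalent of "select color, count(*) ... group by color". */
    static Map<Color, Long> countByColor(List<Color> colors) {
        return colors.stream().collect(
                Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    /** Colors of cars 0..9 as produced by the CarFactory shown above. */
    static List<Color> sampleColors() {
        return Arrays.asList(
                Color.RED, Color.RED, Color.GREEN, Color.WHITE, Color.BLACK,
                Color.GREEN, Color.GREEN, Color.BLUE, Color.RED, Color.BLUE);
    }

    public static void main(String[] args) {
        // Enum keys sort by declaration order in a TreeMap.
        System.out.println(countByColor(sampleColors()));
        // prints: {RED=3, GREEN=3, BLUE=2, BLACK=1, WHITE=1}
    }
}
```

With a real ResultSet, the same collector could be applied to results.stream() after mapping each Car to its color.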
The official SQL example shown earlier uses no index, so getRetrievalCost() and getMergeCost() both return Integer.MAX_VALUE.
Here is an example with indexes added:
import com.googlecode.cqengine.ConcurrentIndexedCollection;
import com.googlecode.cqengine.IndexedCollection;
import com.googlecode.cqengine.attribute.Attribute;
import com.googlecode.cqengine.index.hash.HashIndex;
import com.googlecode.cqengine.index.navigable.NavigableIndex;
import com.googlecode.cqengine.query.parser.sql.SQLParser;
import com.googlecode.cqengine.resultset.ResultSet;
import demos.cqengine.testutils.Car;
import demos.cqengine.testutils.CarFactory;
import java.util.Map;
import static com.googlecode.cqengine.codegen.AttributeBytecodeGenerator.createAttributes;
/**
* Created by chengc on 2018/11/6.
*/
public class SQLQueryDemo {
public static void main(String[] args) {
Map<String, ? extends Attribute<Car, ?>> attributesMap = createAttributes(Car.class);
SQLParser<Car> parser = SQLParser.forPojoWithAttributes(Car.class, attributesMap);
IndexedCollection<Car> cars = new ConcurrentIndexedCollection<Car>();
cars.addIndex(HashIndex.onAttribute((Attribute<Car, String>) attributesMap.get("manufacturer")));
cars.addIndex(NavigableIndex.onAttribute((Attribute<Car, Double>) attributesMap.get("price")));
cars.addIndex(HashIndex.onAttribute((Attribute<Car, Car.Color>) attributesMap.get("color")));
cars.addAll(CarFactory.createCollectionOfCars(10));
ResultSet<Car> results = parser.retrieve(cars, "SELECT * FROM cars WHERE (" +
"(manufacturer = 'Ford' OR manufacturer = 'Honda') " +
"AND price <= 5000.0 " +
"AND color NOT IN ('GREEN', 'WHITE')) " +
"ORDER BY manufacturer DESC, price ASC");
System.out.println("results.getRetrievalCost()=" + results.getRetrievalCost());
System.out.println("results.getMergeCost()=" + results.getMergeCost());
results.forEach(System.out::println); // Prints: Honda Accord, Ford Fusion, Ford Focus
}
}
Finally, the query cost reported by CQEngine is:
results.getRetrievalCost()=40
results.getMergeCost()=25
Reference: CQEngine - Collection Query Engine