Computing array similarity in PostgreSQL

Scenario: a table has a column that stores an array, and we want to compute how similar the arrays in different rows are.

The table is created as follows:

create table cataract_wt (name text NOT NULL, content float8[] NOT NULL, label float NOT NULL);

The content column is an array of float8 values; it can hold one-dimensional or two-dimensional data.

1. The cube extension

Reference: https://zejn.net/b/2016/06/10/postgresql-tutorial-color-similarity-search/

cube is bundled with PostgreSQL, so it only needs to be enabled:

CREATE EXTENSION cube;

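Similarity is computed with cube_distance as follows (it returns a distance, so a smaller value means more similar, hence the ascending sort):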

SELECT name, smlr
FROM (
    SELECT name,
           cube_distance(cube(content),
                         cube((SELECT content FROM cataract_wt WHERE name = 'C020_20180514_100234_R_CASIA2_LGC_002.jpg'))) AS smlr
    FROM cataract_wt
    WHERE name <> 'C020_20180514_100234_R_CASIA2_LGC_002.jpg'
) x
ORDER BY x.smlr ASC
LIMIT 10;

However, cube cannot handle arrays with too many elements; the query fails with: A cube cannot have more than 100 dimensions.
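
To make it clearer what the query above ranks by, here is a minimal Python sketch of the same idea: Euclidean distance between point vectors, sorted ascending. The constant CUBE_MAX_DIM, the helper names and the sample rows are illustrative assumptions, not part of the cube extension; the limit of 100 is taken from the error message above.

```python
import math

# Assumption: mirrors the limit quoted in the error message above.
CUBE_MAX_DIM = 100


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    # Simplification for this sketch: both vectors must have the same length.
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    if len(a) > CUBE_MAX_DIM:
        raise ValueError("A cube cannot have more than 100 dimensions.")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(rows, target_name, limit=10):
    """Rank every other row by distance to the target row, closest first."""
    target = rows[target_name]
    scored = [
        (name, euclidean_distance(vec, target))
        for name, vec in rows.items()
        if name != target_name
    ]
    return sorted(scored, key=lambda pair: pair[1])[:limit]


# Made-up rows standing in for (name, content) pairs from cataract_wt.
rows = {"a.jpg": [0.0, 0.0], "b.jpg": [3.0, 4.0], "c.jpg": [1.0, 0.0]}
ranking = nearest(rows, "a.jpg")
```

Vectors longer than 100 elements are rejected here in the same way the extension rejects them, which is why this approach does not work for larger feature arrays.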

 

2. The smlar extension

Reference: https://github.com/jirutka/smlar

Installation steps:

1.git clone git://sigaev.ru/smlar
2.make USE_PGXS=1
3.make USE_PGXS=1 install
4.CREATE EXTENSION smlar;

Usage:

SELECT name, smlr
FROM (
    SELECT name,
           smlar(content, (SELECT content FROM cataract_wt WHERE name = 'c0100_20181102_111708_R_CASIA2_LGC_002.jpg')) AS smlr
    FROM cataract_wt
    WHERE name <> 'c0100_20181102_111708_R_CASIA2_LGC_002.jpg'
) x
ORDER BY x.smlr DESC  -- smlar returns a similarity, so higher means more similar
LIMIT 10;
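
As far as I understand the smlar README, the default measure is a set-style cosine: it counts the unique elements the two arrays share and divides by the square root of the product of their unique-element counts, rather than comparing values position by position. The Python sketch below illustrates that formula only; the function name set_cosine is hypothetical, and the behaviour should be checked against the installed smlar version.

```python
import math


def set_cosine(a, b):
    """N_i / sqrt(N_a * N_b), computed over the unique elements of each array."""
    unique_a, unique_b = set(a), set(b)
    if not unique_a or not unique_b:
        return 0.0  # assumption: treat an empty array as having no similarity
    shared = len(unique_a & unique_b)
    return shared / math.sqrt(len(unique_a) * len(unique_b))


# Two of the three unique values are shared between these arrays.
score = set_cosine([1.0, 4.0, 6.0], [5.0, 4.0, 6.0])
```

If that reading is right, elements only count as shared when they are exactly equal, which is worth keeping in mind for float8 data.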

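Important: store one-dimensional arrays only; two-dimensional arrays do not work with smlar. Multi-dimensional data can be flattened to one dimension with numpy before it is inserted.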
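
A minimal sketch of that flattening step, written with the standard library so it runs without extra packages; with numpy, ravel() or flatten() on the array gives the same row-major result. The helper name flatten_2d and the sample matrix are illustrative only.

```python
from itertools import chain


def flatten_2d(matrix):
    """Row-major flatten of a list of rows into one flat list of floats."""
    return [float(value) for value in chain.from_iterable(matrix)]


# A 2x2 matrix becomes a 4-element vector, ready to store in a float8[] column.
flat = flatten_2d([[1, 2], [3, 4]])
```

Every row should be flattened in the same order, otherwise the resulting vectors are not comparable.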

 
