首页 > 代码库 > 实习日记:图像检索算法 LSH 的总结与分析(matlab)

实习日记:图像检索算法 LSH 的总结与分析(matlab)

最开始仿真和精度测试,基于 matlab 完成的。

Demo_MakeTable.m (生成 Hash 表)

%========================================%***********************************%******* 设定参数: *****************%*******  l : hash表个数    ********%*******  k : 各表关键字个数 *******%=========================================clear all; close all; clc;l = 3;k = 15;sData = http://www.mamicode.com/textread(‘./Data/data.txt‘, ‘%s‘);>

lsh_1norm.m (核心函数,我用其组织了整个索引结构生成过程)

function T2 = lsh_1norm(l, k, yy)%==================================% parameters setting && getting% [n d] = size(x);%==================================%************ 数据集预处理 : 转 hamming 空间(维度小于100时使用) ********%   fprintf(‘数据集转换到 Hamming space.\n‘);%   tic%   x = x‘; % n x d   (d < 100)%   C = max(x(:));%   dim = size(x);%   yy = false(dim(1), dim(2) * 255);%   for i = 1 : dim(1)%       for j = 1 : dim(2)%           oneO = false(1,C);%           oneO(1:x(i,j)) = 1;%           yy(i,(j-1)*C+1 : (j-1)*C+C) = oneO; %       end%   end %   clear oneO x;%   toc%   fprintf(‘转换 Hamming space 完成.\n‘);%==================================fprintf(‘初始化 %d 个 Hash 表...\n‘, l);%  matlabpool local 10%  可并行for i = 1 : l    % creat and init Tables[i] = f(k, x);    T1(i) = createTable(k, yy);end%  matlabpool closefprintf(‘初始化完成。\n‘);%=====================================tic;matlabpool local 10% save the index of feature data%  insert(T, x);for i = 1 : l    fprintf(‘数据插入第 %d 个hash表\n‘, i);    T2(i) = insert_data(T1(i), yy);endmatlabpool closetoc;% clc;%======================================

 createTable.m

function T = createTable(k, x)% M = size(x,1)+17; % length of second hashTable (hashTable2)M = 587474;d = size(x, 2);select_d = unidrnd(d, 1, k);I.d = select_d;% I.threshold = unifrnd(0, 1, 1, k) * 255; % value interval [0 255]I.k = k;T.I = I;T.randDigits = unidrnd(M, 1, k);T.buckets = [];T.index = {};T.hashTable2 = cell(M,1); 

 insert_data.m

function T = insert_data(T, x)%  M = size(x, 1) + 17;M = 587474;%  buck01 = x(T.I.d, :)‘ < repmat(T.I.threshold, size(x,2), 1); buck01 = x(:, T.I.d); [uBuck id1 id2] = unique(buck01,‘rows‘); T.buckets = logical(uBuck); T.bucket_cnt = length(id1); key = mod(sum(bsxfun(@times, uBuck, T.randDigits),2), M) + 1; % matalb 下标从 1 开始   T.index = cell(length(id1), 1); for bb = 1 : length(id1)     sameBucket = find(id2 == bb);     T.index{bb} = [T.index{bb}; sameBucket‘];      T.hashTable2{key(bb)} = [T.hashTable2{key(bb)} bb];  end  

Demo_computeError.m (测试精确度)

%  clear all; clc; p = 1;   load([‘..\Data\3_15_Tables.mat‘]); load(‘..\Data\data.mat‘); load(‘..\Data\query.mat‘); Data = http://www.mamicode.com/Data‘;>

 Linear_Search.m

function Index = Linear_Search(q, K, DataSet, p)D = feval(‘lp_norm‘, q, DataSet, p);[~, id] = sort(D);Index = id(1 : K);

 lp_norm.m  (此处使用了 Hamming distance, p = 2 时,可以调整为 l2 范式欧式距离)

%************   get the distance **********************function distance = lp_norm(x0, x, p)tem = repmat(x0, 1, size(x,2));distance = sum((abs(tem - x) .^ p), 1) .^ (1/p);

  lookup.m (LSH 查找)

function Index = lookup(T, q) % x can be removed%========================================================%************     参数解释:     ***********************  %************   T : 哈希表      ***********************%************   x : 总数据集     **********************%************   q: 查询%%========================================================%========================================================index = [];% 可并行for i = 1 : length(T)    tem = getIndex(T(i), q);    index = [index tem];endIndex = unique(index);

 getIndex.m

function tableiIndex = getIndex(T, x0)M = length(T.hashTable2);tableiIndex = [];seq01_x = x0(:, T.I.d); index_x = mod(sum(bsxfun(@times, seq01_x, T.randDigits),2), M) + 1; if ~isempty(T.hashTable2{index_x})     index_bucket = T.hashTable2{index_x};    %******************************************    %for i = 1 : length(index_bucket)    %******************************************    uni_index_bucket = index_bucket(find(all(bsxfun(@eq, seq01_x, T.buckets(index_bucket, :)), 2)));    for i = 1 : length(uni_index_bucket)        tableiIndex = [tableiIndex T.index{uni_index_bucket(i)}];    endend

 Linear_Search.m (线性查找)

function Index = Linear_Search(q, K, DataSet, p)D = feval(‘lp_norm‘, q, DataSet, p);[~, id] = sort(D);Index = id(1 : K);

  

               (横轴为 K-NN 中 K 的值,纵轴为准确度)

相关截图:

 

Algorithm proposed from Papers :   

(Indyk 1999) similarity search in hish dimensions via hashing.

(Indyk 2005) Locality-sensitive hashing scheme based on p-stable distributions.