首页 > 代码库 > 实习日记:图像检索算法 LSH 的总结与分析(matlab)
实习日记:图像检索算法 LSH 的总结与分析(matlab)
最开始仿真和精度测试,基于 matlab 完成的。
Demo_MakeTable.m (生成 Hash 表)
%========================================%***********************************%******* 设定参数: *****************%******* l : hash表个数 ********%******* k : 各表关键字个数 *******%=========================================clear all; close all; clc;l = 3;k = 15;sData = http://www.mamicode.com/textread(‘./Data/data.txt‘, ‘%s‘);>
lsh_1norm.m (核心函数,我用其组织了整个索引结构生成过程)
function T2 = lsh_1norm(l, k, yy)%==================================% parameters setting && getting% [n d] = size(x);%==================================%************ 数据集预处理 : 转 hamming 空间(维度小于100时使用) ********% fprintf(‘数据集转换到 Hamming space.\n‘);% tic% x = x‘; % n x d (d < 100)% C = max(x(:));% dim = size(x);% yy = false(dim(1), dim(2) * 255);% for i = 1 : dim(1)% for j = 1 : dim(2)% oneO = false(1,C);% oneO(1:x(i,j)) = 1;% yy(i,(j-1)*C+1 : (j-1)*C+C) = oneO; % end% end % clear oneO x;% toc% fprintf(‘转换 Hamming space 完成.\n‘);%==================================fprintf(‘初始化 %d 个 Hash 表...\n‘, l);% matlabpool local 10% 可并行for i = 1 : l % creat and init Tables[i] = f(k, x); T1(i) = createTable(k, yy);end% matlabpool closefprintf(‘初始化完成。\n‘);%=====================================tic;matlabpool local 10% save the index of feature data% insert(T, x);for i = 1 : l fprintf(‘数据插入第 %d 个hash表\n‘, i); T2(i) = insert_data(T1(i), yy);endmatlabpool closetoc;% clc;%======================================
createTable.m
function T = createTable(k, x)% M = size(x,1)+17; % length of second hashTable (hashTable2)M = 587474;d = size(x, 2);select_d = unidrnd(d, 1, k);I.d = select_d;% I.threshold = unifrnd(0, 1, 1, k) * 255; % value interval [0 255]I.k = k;T.I = I;T.randDigits = unidrnd(M, 1, k);T.buckets = [];T.index = {};T.hashTable2 = cell(M,1);
insert_data.m
function T = insert_data(T, x)% M = size(x, 1) + 17;M = 587474;% buck01 = x(T.I.d, :)‘ < repmat(T.I.threshold, size(x,2), 1); buck01 = x(:, T.I.d); [uBuck id1 id2] = unique(buck01,‘rows‘); T.buckets = logical(uBuck); T.bucket_cnt = length(id1); key = mod(sum(bsxfun(@times, uBuck, T.randDigits),2), M) + 1; % matalb 下标从 1 开始 T.index = cell(length(id1), 1); for bb = 1 : length(id1) sameBucket = find(id2 == bb); T.index{bb} = [T.index{bb}; sameBucket‘]; T.hashTable2{key(bb)} = [T.hashTable2{key(bb)} bb]; end
Demo_computeError.m (测试精确度)
% clear all; clc; p = 1; load([‘..\Data\3_15_Tables.mat‘]); load(‘..\Data\data.mat‘); load(‘..\Data\query.mat‘); Data = http://www.mamicode.com/Data‘;>
Linear_Search.m
function Index = Linear_Search(q, K, DataSet, p)D = feval(‘lp_norm‘, q, DataSet, p);[~, id] = sort(D);Index = id(1 : K);
lp_norm.m (此处使用了 Hamming distance, p = 2 时,可以调整为 l2 范式欧式距离)
%************ get the distance **********************function distance = lp_norm(x0, x, p)tem = repmat(x0, 1, size(x,2));distance = sum((abs(tem - x) .^ p), 1) .^ (1/p);
lookup.m (LSH 查找)
function Index = lookup(T, q) % x can be removed%========================================================%************ 参数解释: *********************** %************ T : 哈希表 ***********************%************ x : 总数据集 **********************%************ q: 查询%%========================================================%========================================================index = [];% 可并行for i = 1 : length(T) tem = getIndex(T(i), q); index = [index tem];endIndex = unique(index);
getIndex.m
function tableiIndex = getIndex(T, x0)M = length(T.hashTable2);tableiIndex = [];seq01_x = x0(:, T.I.d); index_x = mod(sum(bsxfun(@times, seq01_x, T.randDigits),2), M) + 1; if ~isempty(T.hashTable2{index_x}) index_bucket = T.hashTable2{index_x}; %****************************************** %for i = 1 : length(index_bucket) %****************************************** uni_index_bucket = index_bucket(find(all(bsxfun(@eq, seq01_x, T.buckets(index_bucket, :)), 2))); for i = 1 : length(uni_index_bucket) tableiIndex = [tableiIndex T.index{uni_index_bucket(i)}]; endend
Linear_Search.m (线性查找)
function Index = Linear_Search(q, K, DataSet, p)D = feval(‘lp_norm‘, q, DataSet, p);[~, id] = sort(D);Index = id(1 : K);
(横轴为 K-NN 中 K 的值,纵轴为准确度)
相关截图:
Algorithm proposed from Papers :
(Indyk 1999) similarity search in hish dimensions via hashing.
(Indyk 2005) Locality-sensitive hashing scheme based on p-stable distributions.
声明:以上内容来自用户投稿及互联网公开渠道收集整理发布,本网站不拥有所有权,未作人工编辑处理,也不承担相关法律责任,若内容有误或涉及侵权可进行投诉: 投诉/举报 工作人员会在5个工作日内联系你,一经查实,本站将立刻删除涉嫌侵权内容。