
SQL: An Unusual String-Splitting Problem

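This is not a straightforward string-splitting problem.

The scenario: a table has a column that holds the set of countries affected by an event, separated by commas. Unfortunately, it turns out that some country names themselves contain a comma. When normalizing the data, how can those countries be identified correctly?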

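The code below has some limitations, but it is good enough for most cases. Notably, it only ever merges two adjacent pieces, so it handles names that contain at most one comma.

It uses the analytic functions LAG and LEAD together with CTEs, all of which are supported by SQL Server 2012 and later. Oracle, as far as I recall, has supported the analytic functions since 10g or earlier, although recursive CTEs there require 11g Release 2. The code as written uses SQL Server syntax (CHARINDEX, + for concatenation).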

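Main idea:

A string can be split with an auxiliary numbers table: cross join it with the data to produce copies of each row, then cut the string at the positions where the delimiter occurs to get the pieces we need. (In my solution I use a recursive CTE to find those positions instead.)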
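As a rough illustration of the numbers-table approach just described (not the recursive CTE used in the solution), here is a minimal sketch in SQL Server syntax. The Numbers CTE, the aliases mc and nums, and the inline VALUES sample are assumptions made for this example; the sample strings mirror the test data used later. It assumes strings of at most 200 characters.

```sql
-- Sketch only: Numbers, mc and nums are illustrative names, not part of the solution.
WITH Numbers AS (
    -- Generate 1..200 from a system catalog view (enough for varchar(200))
    SELECT TOP (200) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_objects
)
SELECT mc.rid,
       nums.n AS startpos,
       -- Cut from the start position up to (not including) the next comma
       SUBSTRING(mc.country_name_cc, nums.n,
                 CHARINDEX(',', mc.country_name_cc + ',', nums.n) - nums.n) AS piece
FROM (VALUES (1, 'china,test, public of'),
             (2, 'us, public of,china,Evan, public of')) AS mc(rid, country_name_cc)
CROSS JOIN Numbers AS nums
WHERE nums.n <= LEN(mc.country_name_cc)
  -- A piece starts at n when the character just before n is a comma
  -- (prefixing ',' makes position 1 qualify as well)
  AND SUBSTRING(',' + mc.country_name_cc, nums.n, 1) = ','
ORDER BY mc.rid, nums.n;
```

For rid 1 this yields the naive pieces 'china', 'test' and ' public of', which is exactly why the extra validation step is needed.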

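Here one extra step is needed: validate the split pieces and discard the ones that do not qualify.

The benefit of LAG and LEAD is that no self-join is needed to find the next (or previous) row.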
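To make the comparison concrete, the following sketch computes the "next end position" both ways in a single query. The pieces CTE and its three hard-coded rows are hypothetical stand-ins for the output of the splitting step; the column names nextendpos_lead and nextendpos_join are invented for the example.

```sql
-- Sketch only: pieces holds hypothetical (rid, lvl, endpos) values.
WITH pieces AS (
    SELECT rid, lvl, endpos
    FROM (VALUES (1, 1, 5), (1, 2, 10), (1, 3, 21)) AS p(rid, lvl, endpos)
)
SELECT cur.rid,
       cur.lvl,
       cur.endpos,
       -- Window-function version: look one row ahead within the same rid
       LEAD(cur.endpos, 1) OVER (PARTITION BY cur.rid ORDER BY cur.lvl) AS nextendpos_lead,
       -- Self-join version: fetch the row whose lvl is one higher
       nxt.endpos AS nextendpos_join
FROM pieces AS cur
LEFT JOIN pieces AS nxt
       ON nxt.rid = cur.rid
      AND nxt.lvl = cur.lvl + 1
ORDER BY cur.rid, cur.lvl;
```

Both columns come out the same (10, 21, NULL), but the LEAD version needs no second reference to the table and does not depend on lvl being gap-free.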

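The rule used to solve the problem: if the longer item (a piece joined with the piece that follows it) matches a valid country, take the longer item; otherwise take the shorter one.

The code is as follows: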

-- Prepare the sample tables and data
IF OBJECT_ID('my_countries') IS NOT NULL DROP TABLE my_countries;
IF OBJECT_ID('valid_country') IS NOT NULL DROP TABLE valid_country;

create table my_countries(rid int, country_name_cc varchar(200));
insert into my_countries(rid, country_name_cc) values(1, 'china,test, public of');
insert into my_countries(rid, country_name_cc) values(2, 'us, public of,china,Evan, public of');

create table valid_country(cid int, country_name varchar(30));
insert into valid_country(cid, country_name) values(1, 'china');
insert into valid_country(cid, country_name) values(2, 'test, public of');
insert into valid_country(cid, country_name) values(3, 'Evan, public of');
insert into valid_country(cid, country_name) values(4, 'us, public of');
insert into valid_country(cid, country_name) values(5, 'Evan');

--select * from my_countries;
--select * from valid_country;

 

The query that produces the correct result:

WITH SPLIT_COUNTRY AS
(
    -- Anchor: first piece of every row
    SELECT RID, 1 AS LVL, 1 AS STARTPOS,
           CHARINDEX(',', COUNTRY_NAME_CC + ',') - 1 AS ENDPOS
    FROM MY_COUNTRIES
    UNION ALL
    -- Recursive step: the piece that starts right after the previous comma
    SELECT SC.RID, LVL + 1 AS LVL, ENDPOS + 2,
           CHARINDEX(',', COUNTRY_NAME_CC + ',', ENDPOS + 2) - 1
    FROM MY_COUNTRIES CC
    JOIN SPLIT_COUNTRY SC ON CC.RID = SC.RID
    WHERE CHARINDEX(',', CC.COUNTRY_NAME_CC + ',', ENDPOS + 2) > 0
),
CTE_COUNTRY AS
(
    -- Attach the end position of the following piece
    SELECT RID, LVL, STARTPOS, ENDPOS,
           LEAD(ENDPOS, 1) OVER (PARTITION BY RID ORDER BY LVL) AS NEXTENDPOS
    FROM SPLIT_COUNTRY
),
CTE AS
(
    -- Prefer the longer item (this piece plus the next) when it is a valid country
    SELECT MC.RID, SC.LVL,
           CASE WHEN NEXTENDPOS IS NOT NULL
                 AND EXISTS (SELECT * FROM VALID_COUNTRY VC
                             WHERE VC.COUNTRY_NAME = SUBSTRING(COUNTRY_NAME_CC, STARTPOS, NEXTENDPOS - STARTPOS + 1))
                THEN SUBSTRING(COUNTRY_NAME_CC, STARTPOS, NEXTENDPOS - STARTPOS + 1)
                ELSE SUBSTRING(MC.COUNTRY_NAME_CC, STARTPOS, ENDPOS - STARTPOS + 1)
           END AS COUNTRY
    FROM MY_COUNTRIES MC
    JOIN CTE_COUNTRY SC ON MC.RID = SC.RID
),
CHECK_VALID AS
(
    -- A piece is invalid if the previous row already absorbed it (previous value contains a comma)
    SELECT CASE WHEN CHARINDEX(',', LAG(COUNTRY, 1) OVER (PARTITION BY RID ORDER BY LVL)) > 0
                THEN 0 ELSE 1 END AS ISVALID,
           *
    FROM CTE
)
SELECT CV.RID, CV.COUNTRY, VC.CID
FROM CHECK_VALID CV
JOIN VALID_COUNTRY VC
  ON CV.COUNTRY = VC.COUNTRY_NAME
 AND ISVALID = 1
ORDER BY RID;

 

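Another approach, a slight modification of the first. It drops the CHECK_VALID step and relies on the final join to discard leftover fragments such as ' public of', which match no valid country: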

WITH SPLIT_COUNTRY AS
(
    SELECT RID, 1 AS LVL, 1 AS STARTPOS,
           CHARINDEX(',', COUNTRY_NAME_CC + ',') - 1 AS ENDPOS
    FROM MY_COUNTRIES
    UNION ALL
    SELECT SC.RID, LVL + 1 AS LVL, ENDPOS + 2,
           CHARINDEX(',', COUNTRY_NAME_CC + ',', ENDPOS + 2) - 1
    FROM MY_COUNTRIES CC
    JOIN SPLIT_COUNTRY SC ON CC.RID = SC.RID
    WHERE CHARINDEX(',', CC.COUNTRY_NAME_CC + ',', ENDPOS + 2) > 0
),
CTE_COUNTRY AS
(
    SELECT RID, LVL, STARTPOS, ENDPOS,
           LEAD(ENDPOS, 1) OVER (PARTITION BY RID ORDER BY LVL) AS NEXTENDPOS
    FROM SPLIT_COUNTRY
),
CTE AS
(
    -- COUNTRY is the short item; COUNTRY2 is the long item (NULL for the last piece)
    SELECT MC.RID, SC.LVL,
           SUBSTRING(MC.COUNTRY_NAME_CC, STARTPOS, ENDPOS - STARTPOS + 1) AS COUNTRY,
           SUBSTRING(COUNTRY_NAME_CC, STARTPOS, NEXTENDPOS - STARTPOS + 1) AS COUNTRY2
    FROM MY_COUNTRIES MC
    JOIN CTE_COUNTRY SC ON MC.RID = SC.RID
)
SELECT CTE.RID, VC.COUNTRY_NAME, VC.CID
FROM CTE
JOIN VALID_COUNTRY VC
  ON (CASE WHEN EXISTS (SELECT * FROM VALID_COUNTRY X WHERE X.COUNTRY_NAME = CTE.COUNTRY2)
           THEN CTE.COUNTRY2
           ELSE CTE.COUNTRY
      END) = VC.COUNTRY_NAME;

 

   

   

   
