首页 > 代码库 > Salesforce 数据清洗

Salesforce 数据清洗

新系统上线后,需要导入历史数据,但是旧数据格式,数据缺失,数据错误,奇异值,属性归类与新系统有很大的gap。因此我们需要建立一套数据动态清洗规则给Salesforce系统,通过这些规则自动清洗导入数据,清洗规则可以让function自己配置。而不需要IT负责

 

下面将详细举一个例子如何在salesforce中做数据处理。数据清洗需要分成5个步骤

1,建立2个关联数据的Object的和 一个数据清洗后台设置的Object的
2,数据导入页面csv
3,定义每个字段的范围、属性,如果是错误的则自动重新分配,或者修改成临近值
4,数据清洗合并。
5,导出错误数据到Excel
 
第一步,新建立两个关联的Recruit 和 Recruit Department, 并且建立一个清洗规则的Object,当导入数据后我们可以读取设置的清洗规则,并对导入的数据进行清洗
技术分享
技术分享
技术分享
第二步,对于清洗规则,我们只能有一条规则被激活,因此我们在插入新规则和更改旧规则的时候,我们需要添加一个tirgger针对Data_Washing_Setting,保证规则的唯一性。
 
 1 trigger IsActiveChecking on Data_Washing_Setting__c (before insert,before update) { 2  3     List<Data_Washing_Setting__c> ListOldData =http://www.mamicode.com/[select Id from Data_Washing_Setting__c  4                                                where Active_this_Rule__c = true]; 5     List<Data_Washing_Setting__c> ListNewData =http://www.mamicode.com/trigger.new; 6      7     //system.debug(‘ListNewData:‘+ListNewData.size()); 8     integer itemNum = 0; 9     if(trigger.isInsert)10     {11         if(trigger.isBefore)12         {13             for(Data_Washing_Setting__c dws : trigger.new)14             {15                 if(dws.Active_this_Rule__c)16                 {17                     itemNum++;18                 }19             }20            itemNum +=ListOldData.size();21            22            if(itemNum>1)23            {24                for(Data_Washing_Setting__c dws : trigger.new){   25                     dws.adderror(‘only one record can be actived! pls check your history data and try again.‘);26                 }27            }28         }29     }30     else if(trigger.isUpdate)31     {   32         if(trigger.isBefore)33         {34             // 去掉更新的数据35             for(Data_Washing_Setting__c dws : trigger.new)36             {37                 for(integer i=0;i<ListOldData.size();i++){38                     if(dws.Id== ListOldData[i].Id)39                     {40                         ListOldData.remove(i);41                     }42                 }43                 if(dws.Active_this_Rule__c)44                 {45                     itemNum++;46                 }47             }48            itemNum +=ListOldData.size();49            if(itemNum>1)50            {51                for(Data_Washing_Setting__c dws : trigger.new){   52                     dws.adderror(‘only one record can be actived! pls check your history data and try again.‘);53                 }54            }55           56         }57     }58 }

第三步,我们需要建立导入页面,并添加相应的验证按钮

技术分享

VF的代码

 1 <apex:page controller="BatchInsertByCsvController"> 2     <apex:form > 3     <apex:sectionHeader title="Upload Recruit Data"/> 4    <apex:pageMessages /> 5    <apex:pageblock > 6         <center> 7             <apex:inputFile value="{!contentFile}" fileName="{!fileName}" /> 8             <apex:commandButton action="{!LoadData}" value="Batch Insert"/> 9             <apex:commandButton action="{!LoadBlankList}" value="Filter Blank Data"/>10             <apex:commandButton action="{!ExportBlankToCSV}" value="Export CSV"/>11             12         </center>13     </apex:pageblock>14      <apex:pageBlock title="Import Data">15          <apex:pageblocktable value="{!RecruitList}" var="ReList">16               <apex:column value="{!ReList.Name}" />17               <apex:column value="{!ReList.Position_Name__c}" />18               <apex:column value="{!ReList.Recruit_Department__c}" />19               <apex:column value="{!ReList.Recruit_Type__c}" />20               <apex:column value="{!ReList.Recruit_Number__c}" />21         </apex:pageblocktable>22      </apex:pageBlock>23      <apex:pageBlock title="Blank Data">24          <apex:pageblocktable value="{!BlankList}" var="BList">25               <apex:column value="{!BList.Name}" />26               <apex:column value="{!BList.Position_Name__c}" />27               <apex:column value="{!BList.Recruit_Department__c}" />28               <apex:column value="{!BList.Recruit_Type__c}" />29               <apex:column value="{!BList.Recruit_Number__c}" />30         </apex:pageblocktable>31      </apex:pageBlock>32     </apex:form>33 </apex:page>

后台APEX 导入代码

  1 public class BatchInsertByCsvController {  2       3     public string fileName{get;set;}  4     //Blob:二进制对象类型。通过inputFile选中后的文件在后台获取的时候是一个Blob类型,  5     public Blob contentFile{get;set;}  6     public String[] filelines = new String[]{};  7     public List<Recruit__c> RecruitList{get;set;}  8     public List<Recruit__c> BlankList{get;set;}  9     public List<Recruit__c> invaildList{get;set;} 10     //初始化 11     public PageReference LoadData() 12     { 13         try{ 14             filename = bitToString(contentFile,‘ISO-8859-1‘); 15             filelines = fileName.split(‘\n‘); 16            // ApexPages.Message msgs = new ApexPages.Message(ApexPages.Severity.INFO, ‘import account:‘+filelines.size()); 17            // ApexPages.addMessage(msgs); 18             RecruitList = new List<Recruit__c>(); 19             string[] inputvalues; 20             string SwpNumber; 21              22             for(Integer i=1;i<filelines.size();i++) 23             { 24                 inputvalues = new string[]{}; 25                 inputvalues = filelines[i].split(‘,‘); 26                 Recruit__c recruits = new Recruit__c(); 27                 recruits.Name = inputvalues[0]; 28                 recruits.Position_Name__c = inputvalues[1]; 29                 recruits.Recruit_Department__c = [SELECT Id  30                                 FROM Recruit_Department__c  31                                 WHERE Name =:inputvalues[2] LIMIT 1].Id; 32                 recruits.Recruit_Type__c = inputvalues[3]; 33                 SwpNumber = inputvalues[4]; 34                 recruits.Recruit_Number__c = Decimal.valueOf(SwpNumber.trim()); 35                 RecruitList.add(recruits); 36             } 37         } 38         catch(exception e){ 39             ApexPages.Message errormsg = new ApexPages.Message(ApexPages.Severity.ERROR,‘An error has occured reading the CSV file: ‘+e.getMessage()); 40             ApexPages.addMessage(errormsg); 41         } 42         try{ 43            // insert RecruitList; 44           //   ApexPages.Message successMsg = new ApexPages.Message(ApexPages.severity.INFO,‘import success‘); 45             // ApexPages.addMessage(successMsg); 46         } 47         catch(Exception e) 48         { 49             //ApexPages.Message errormsg = new ApexPages.Message(ApexPages.severity.ERROR,‘An error has occured inserting the records‘+e.getMessage()); 50             //ApexPages.addMessage(errormsg); 51         } 52         return null; 53     } 54     //blob是二进制存储的,String是16进制存储的,所以使用此种方式加上编码解码等操作肯定会更加适应,包括中文 55     private String bitToString(Blob input, String inCharset){ 56          //转换成16进制 57         String hex = EncodingUtil.convertToHex(input); 58          //一个String类型两个字节 32位(bit),则一个String长度应该为两个16进制的长度,所以此处向右平移一个单位,即除以2 59          //向右平移一个单位在正数情况下等同于除以2,负数情况下不等 60          //eg 9  00001001  >>1 00000100   结果为4 61          final Integer bytesCount = hex.length() >> 1; 62          //声明String数组,长度为16进制转换成字符串的长度 63          String[] bytes = new String[bytesCount]; 64          for(Integer i = 0; i < bytesCount; ++i) { 65              //将相邻两位的16进制字符串放在一个String中 66              bytes[i] =  hex.mid(i << 1, 2); 67          } 68          //解码成指定charset的字符串 69          return EncodingUtil.urlDecode(‘%‘ + String.join(bytes, ‘%‘), inCharset); 70      } 71     //筛选空值 72     public PageReference LoadBlankList() 73     { 74         try 75         { 76             BlankList=new list<Recruit__c>(); 77             DataWashingSetting dws=new DataWashingSetting(); 78             string[] flines = dws.AddQuestionsData(filelines); 79             string[] inputvalues; 80             string SwpNumber; 81              82             for(Integer i=0;i<flines.size();i++) 83             { 84                     inputvalues = new string[]{}; 85                     inputvalues = flines[i].split(‘,‘); 86                     Recruit__c recruits = new Recruit__c(); 87                     recruits.Name = inputvalues[0]; 88                     recruits.Position_Name__c = inputvalues[1]; 89                     recruits.Recruit_Department__c = [SELECT Id  90                                     FROM Recruit_Department__c  91                                     WHERE Name =:inputvalues[2] LIMIT 1].Id; 92                     recruits.Recruit_Type__c = inputvalues[3]; 93                     SwpNumber = inputvalues[4]; 94                     recruits.Recruit_Number__c = Decimal.valueOf(SwpNumber.trim()); 95                     BlankList.add(recruits); 96             } 97             ApexPages.Message msgs = new ApexPages.Message(ApexPages.Severity.INFO, ‘blank num:‘+BlankList.size()); 98             ApexPages.addMessage(msgs); 99         }100         catch(Exception e)101         {102             ApexPages.Message errormsg = new ApexPages.Message(ApexPages.Severity.ERROR,‘An error has occured reading the CSV file: ‘+e.getMessage());103             ApexPages.addMessage(errormsg);104         }105         return null;106     }107     public PageReference ExportBlankToCSV()108     {109          return new PageReference(‘/apex/ExportCSV‘);110     }111 }

后台调用的验证清洗代码,可以根据需要任意添加

 1 public class DataWashingSetting { 2  3     //消除重复数据 4     public List<Recruit__c> DelDuplicateData(List<Recruit__c> OriginalList) 5     {         6         set<Recruit__c> myset= new set<Recruit__c>(); 7         List<Recruit__c> result = new List<Recruit__c>(); 8          9         myset.addAll(OriginalList);10         result.addAll(myset);11         12         return result;13     }14     //筛选为空数据15     public string[] AddQuestionsData(string[] filelines)16     {17         string[] result =new string[]{}; 18         string[] inputvalues;19         for(Integer i=1;i<filelines.size();i++)20         {21             inputvalues = new string[]{};22             inputvalues = filelines[i].split(‘,‘);23             if(inputvalues[0] == ‘‘||inputvalues[1] == ‘‘ ||inputvalues[2] == ‘‘ 24                 ||inputvalues[3] == ‘‘ ||inputvalues[4] == ‘‘)25             {26                 result.add(filelines[i]);   27             }28          }29         return result;30     }31     //检测各个字段的合理性32     public string[] CheckFiled(string[] filelines)33     {34         //读取规则35         Data_Washing_Setting__c dws = [select Position_Name_Rule__c,36                                        Recruit_End_Number__c,Recruit_Department_Rule__c,Recruit_Start_Number__c from Data_Washing_Setting__c where Active_this_Rule__c = true];37         string PositionNameRule = dws.Position_Name_Rule__c; //部门规则是否允许重复38         decimal startNumber= dws.Recruit_Start_Number__c; //招聘人数底线39         decimal endNumber= dws.Recruit_End_Number__c; //招聘人数上线40         string department = dws.Recruit_Department_Rule__c;//部门限制41         42         string[] result =new string[]{}; 43         string[] inputvalues;44         for(Integer i=1;i<filelines.size();i++)45         {46             inputvalues = new string[]{};47             inputvalues = filelines[i].split(‘,‘);48             //填写验证代码49          }50         return result; //返回不合格代码51     }52 }

出现问题数据直接导出问题数据到Excel,手动处理后再导入。

 1 <apex:page controller="BatchInsertByCsvController" cache="true" contentType="application/x-excel# BlankList.xls" showHeader="false"> 2  <head> 3       <meta http-equiv="Content-Type" content="text/html;charset=UTF-8" /> 4  </head> 5      <apex:pageBlock > 6      <apex:pageblocktable value="{!BlankList}" var="BList"> 7               <apex:column value="{!BList.Name}" /> 8               <apex:column value="{!BList.Position_Name__c}" /> 9               <apex:column value="{!BList.Recruit_Department__c}" />10               <apex:column value="{!BList.Recruit_Type__c}" />11               <apex:column value="{!BList.Recruit_Number__c}" />12         </apex:pageblocktable>13     </apex:pageBlock>14 </apex:page>

下面就是最终效果:

1,导入数据,自动筛选有缺失值的数据,并支持Excel导出

技术分享

2,后台清洗的规则设置。

技术分享

 

Salesforce 数据清洗