首页 > 代码库 > Salesforce 数据清洗
Salesforce 数据清洗
新系统上线后,需要导入历史数据,但是旧数据格式,数据缺失,数据错误,奇异值,属性归类与新系统有很大的gap。因此我们需要建立一套数据动态清洗规则给Salesforce系统,通过这些规则自动清洗导入数据,清洗规则可以让function自己配置。而不需要IT负责
下面将详细举一个例子如何在salesforce中做数据处理。数据清洗需要分成5个步骤
1,建立2个关联数据的Object的和 一个数据清洗后台设置的Object的
2,数据导入页面csv
3,定义每个字段的范围、属性,如果是错误的则自动重新分配,或者修改成临近值
4,数据清洗合并。
5,导出错误数据到Excel
第一步,新建立两个关联的Recruit 和 Recruit Department, 并且建立一个清洗规则的Object,当导入数据后我们可以读取设置的清洗规则,并对导入的数据进行清洗
第二步,对于清洗规则,我们只能有一条规则被激活,因此我们在插入新规则和更改旧规则的时候,我们需要添加一个tirgger针对Data_Washing_Setting,保证规则的唯一性。
1 trigger IsActiveChecking on Data_Washing_Setting__c (before insert,before update) { 2 3 List<Data_Washing_Setting__c> ListOldData =http://www.mamicode.com/[select Id from Data_Washing_Setting__c 4 where Active_this_Rule__c = true]; 5 List<Data_Washing_Setting__c> ListNewData =http://www.mamicode.com/trigger.new; 6 7 //system.debug(‘ListNewData:‘+ListNewData.size()); 8 integer itemNum = 0; 9 if(trigger.isInsert)10 {11 if(trigger.isBefore)12 {13 for(Data_Washing_Setting__c dws : trigger.new)14 {15 if(dws.Active_this_Rule__c)16 {17 itemNum++;18 }19 }20 itemNum +=ListOldData.size();21 22 if(itemNum>1)23 {24 for(Data_Washing_Setting__c dws : trigger.new){ 25 dws.adderror(‘only one record can be actived! pls check your history data and try again.‘);26 }27 }28 }29 }30 else if(trigger.isUpdate)31 { 32 if(trigger.isBefore)33 {34 // 去掉更新的数据35 for(Data_Washing_Setting__c dws : trigger.new)36 {37 for(integer i=0;i<ListOldData.size();i++){38 if(dws.Id== ListOldData[i].Id)39 {40 ListOldData.remove(i);41 }42 }43 if(dws.Active_this_Rule__c)44 {45 itemNum++;46 }47 }48 itemNum +=ListOldData.size();49 if(itemNum>1)50 {51 for(Data_Washing_Setting__c dws : trigger.new){ 52 dws.adderror(‘only one record can be actived! pls check your history data and try again.‘);53 }54 }55 56 }57 }58 }
第三步,我们需要建立导入页面,并添加相应的验证按钮
VF的代码
1 <apex:page controller="BatchInsertByCsvController"> 2 <apex:form > 3 <apex:sectionHeader title="Upload Recruit Data"/> 4 <apex:pageMessages /> 5 <apex:pageblock > 6 <center> 7 <apex:inputFile value="{!contentFile}" fileName="{!fileName}" /> 8 <apex:commandButton action="{!LoadData}" value="Batch Insert"/> 9 <apex:commandButton action="{!LoadBlankList}" value="Filter Blank Data"/>10 <apex:commandButton action="{!ExportBlankToCSV}" value="Export CSV"/>11 12 </center>13 </apex:pageblock>14 <apex:pageBlock title="Import Data">15 <apex:pageblocktable value="{!RecruitList}" var="ReList">16 <apex:column value="{!ReList.Name}" />17 <apex:column value="{!ReList.Position_Name__c}" />18 <apex:column value="{!ReList.Recruit_Department__c}" />19 <apex:column value="{!ReList.Recruit_Type__c}" />20 <apex:column value="{!ReList.Recruit_Number__c}" />21 </apex:pageblocktable>22 </apex:pageBlock>23 <apex:pageBlock title="Blank Data">24 <apex:pageblocktable value="{!BlankList}" var="BList">25 <apex:column value="{!BList.Name}" />26 <apex:column value="{!BList.Position_Name__c}" />27 <apex:column value="{!BList.Recruit_Department__c}" />28 <apex:column value="{!BList.Recruit_Type__c}" />29 <apex:column value="{!BList.Recruit_Number__c}" />30 </apex:pageblocktable>31 </apex:pageBlock>32 </apex:form>33 </apex:page>
后台APEX 导入代码
1 public class BatchInsertByCsvController { 2 3 public string fileName{get;set;} 4 //Blob:二进制对象类型。通过inputFile选中后的文件在后台获取的时候是一个Blob类型, 5 public Blob contentFile{get;set;} 6 public String[] filelines = new String[]{}; 7 public List<Recruit__c> RecruitList{get;set;} 8 public List<Recruit__c> BlankList{get;set;} 9 public List<Recruit__c> invaildList{get;set;} 10 //初始化 11 public PageReference LoadData() 12 { 13 try{ 14 filename = bitToString(contentFile,‘ISO-8859-1‘); 15 filelines = fileName.split(‘\n‘); 16 // ApexPages.Message msgs = new ApexPages.Message(ApexPages.Severity.INFO, ‘import account:‘+filelines.size()); 17 // ApexPages.addMessage(msgs); 18 RecruitList = new List<Recruit__c>(); 19 string[] inputvalues; 20 string SwpNumber; 21 22 for(Integer i=1;i<filelines.size();i++) 23 { 24 inputvalues = new string[]{}; 25 inputvalues = filelines[i].split(‘,‘); 26 Recruit__c recruits = new Recruit__c(); 27 recruits.Name = inputvalues[0]; 28 recruits.Position_Name__c = inputvalues[1]; 29 recruits.Recruit_Department__c = [SELECT Id 30 FROM Recruit_Department__c 31 WHERE Name =:inputvalues[2] LIMIT 1].Id; 32 recruits.Recruit_Type__c = inputvalues[3]; 33 SwpNumber = inputvalues[4]; 34 recruits.Recruit_Number__c = Decimal.valueOf(SwpNumber.trim()); 35 RecruitList.add(recruits); 36 } 37 } 38 catch(exception e){ 39 ApexPages.Message errormsg = new ApexPages.Message(ApexPages.Severity.ERROR,‘An error has occured reading the CSV file: ‘+e.getMessage()); 40 ApexPages.addMessage(errormsg); 41 } 42 try{ 43 // insert RecruitList; 44 // ApexPages.Message successMsg = new ApexPages.Message(ApexPages.severity.INFO,‘import success‘); 45 // ApexPages.addMessage(successMsg); 46 } 47 catch(Exception e) 48 { 49 //ApexPages.Message errormsg = new ApexPages.Message(ApexPages.severity.ERROR,‘An error has occured inserting the records‘+e.getMessage()); 50 //ApexPages.addMessage(errormsg); 51 } 52 return null; 53 } 54 //blob是二进制存储的,String是16进制存储的,所以使用此种方式加上编码解码等操作肯定会更加适应,包括中文 55 private String bitToString(Blob input, String inCharset){ 56 //转换成16进制 57 String hex = EncodingUtil.convertToHex(input); 58 //一个String类型两个字节 32位(bit),则一个String长度应该为两个16进制的长度,所以此处向右平移一个单位,即除以2 59 //向右平移一个单位在正数情况下等同于除以2,负数情况下不等 60 //eg 9 00001001 >>1 00000100 结果为4 61 final Integer bytesCount = hex.length() >> 1; 62 //声明String数组,长度为16进制转换成字符串的长度 63 String[] bytes = new String[bytesCount]; 64 for(Integer i = 0; i < bytesCount; ++i) { 65 //将相邻两位的16进制字符串放在一个String中 66 bytes[i] = hex.mid(i << 1, 2); 67 } 68 //解码成指定charset的字符串 69 return EncodingUtil.urlDecode(‘%‘ + String.join(bytes, ‘%‘), inCharset); 70 } 71 //筛选空值 72 public PageReference LoadBlankList() 73 { 74 try 75 { 76 BlankList=new list<Recruit__c>(); 77 DataWashingSetting dws=new DataWashingSetting(); 78 string[] flines = dws.AddQuestionsData(filelines); 79 string[] inputvalues; 80 string SwpNumber; 81 82 for(Integer i=0;i<flines.size();i++) 83 { 84 inputvalues = new string[]{}; 85 inputvalues = flines[i].split(‘,‘); 86 Recruit__c recruits = new Recruit__c(); 87 recruits.Name = inputvalues[0]; 88 recruits.Position_Name__c = inputvalues[1]; 89 recruits.Recruit_Department__c = [SELECT Id 90 FROM Recruit_Department__c 91 WHERE Name =:inputvalues[2] LIMIT 1].Id; 92 recruits.Recruit_Type__c = inputvalues[3]; 93 SwpNumber = inputvalues[4]; 94 recruits.Recruit_Number__c = Decimal.valueOf(SwpNumber.trim()); 95 BlankList.add(recruits); 96 } 97 ApexPages.Message msgs = new ApexPages.Message(ApexPages.Severity.INFO, ‘blank num:‘+BlankList.size()); 98 ApexPages.addMessage(msgs); 99 }100 catch(Exception e)101 {102 ApexPages.Message errormsg = new ApexPages.Message(ApexPages.Severity.ERROR,‘An error has occured reading the CSV file: ‘+e.getMessage());103 ApexPages.addMessage(errormsg);104 }105 return null;106 }107 public PageReference ExportBlankToCSV()108 {109 return new PageReference(‘/apex/ExportCSV‘);110 }111 }
后台调用的验证清洗代码,可以根据需要任意添加
1 public class DataWashingSetting { 2 3 //消除重复数据 4 public List<Recruit__c> DelDuplicateData(List<Recruit__c> OriginalList) 5 { 6 set<Recruit__c> myset= new set<Recruit__c>(); 7 List<Recruit__c> result = new List<Recruit__c>(); 8 9 myset.addAll(OriginalList);10 result.addAll(myset);11 12 return result;13 }14 //筛选为空数据15 public string[] AddQuestionsData(string[] filelines)16 {17 string[] result =new string[]{}; 18 string[] inputvalues;19 for(Integer i=1;i<filelines.size();i++)20 {21 inputvalues = new string[]{};22 inputvalues = filelines[i].split(‘,‘);23 if(inputvalues[0] == ‘‘||inputvalues[1] == ‘‘ ||inputvalues[2] == ‘‘ 24 ||inputvalues[3] == ‘‘ ||inputvalues[4] == ‘‘)25 {26 result.add(filelines[i]); 27 }28 }29 return result;30 }31 //检测各个字段的合理性32 public string[] CheckFiled(string[] filelines)33 {34 //读取规则35 Data_Washing_Setting__c dws = [select Position_Name_Rule__c,36 Recruit_End_Number__c,Recruit_Department_Rule__c,Recruit_Start_Number__c from Data_Washing_Setting__c where Active_this_Rule__c = true];37 string PositionNameRule = dws.Position_Name_Rule__c; //部门规则是否允许重复38 decimal startNumber= dws.Recruit_Start_Number__c; //招聘人数底线39 decimal endNumber= dws.Recruit_End_Number__c; //招聘人数上线40 string department = dws.Recruit_Department_Rule__c;//部门限制41 42 string[] result =new string[]{}; 43 string[] inputvalues;44 for(Integer i=1;i<filelines.size();i++)45 {46 inputvalues = new string[]{};47 inputvalues = filelines[i].split(‘,‘);48 //填写验证代码49 }50 return result; //返回不合格代码51 }52 }
出现问题数据直接导出问题数据到Excel,手动处理后再导入。
1 <apex:page controller="BatchInsertByCsvController" cache="true" contentType="application/x-excel# BlankList.xls" showHeader="false"> 2 <head> 3 <meta http-equiv="Content-Type" content="text/html;charset=UTF-8" /> 4 </head> 5 <apex:pageBlock > 6 <apex:pageblocktable value="{!BlankList}" var="BList"> 7 <apex:column value="{!BList.Name}" /> 8 <apex:column value="{!BList.Position_Name__c}" /> 9 <apex:column value="{!BList.Recruit_Department__c}" />10 <apex:column value="{!BList.Recruit_Type__c}" />11 <apex:column value="{!BList.Recruit_Number__c}" />12 </apex:pageblocktable>13 </apex:pageBlock>14 </apex:page>
下面就是最终效果:
1,导入数据,自动筛选有缺失值的数据,并支持Excel导出
2,后台清洗的规则设置。
Salesforce 数据清洗
声明:以上内容来自用户投稿及互联网公开渠道收集整理发布,本网站不拥有所有权,未作人工编辑处理,也不承担相关法律责任,若内容有误或涉及侵权可进行投诉: 投诉/举报 工作人员会在5个工作日内联系你,一经查实,本站将立刻删除涉嫌侵权内容。