首页 > 代码库 > 快速批量导入庞大数据到SQL SERVER数据库(ADO.NET)

快速批量导入庞大数据到SQL SERVER数据库(ADO.NET)

如果你需要在程序中批量插入成千上万行的数据,你会怎么编写代码呢?最近在帮朋友调优这个的时候,总结了几种方法,并对其进行比较。

大概的界面如下,我模拟了一个客户资料表.

image

数据我是放在一个XML文件的,大约6734行。类似下面的格式

<?xml version="1.0" encoding="utf-8" ?> <root><Customers>  <CustomerID>ALFKI</CustomerID>  <CompanyName>Sina</CompanyName>  <ContactName>Maria Anders</ContactName>  <ContactTitle>Sales Representative</ContactTitle>  <Address>Obere Str. 57</Address>  <City>Berlin</City>  <PostalCode>12209</PostalCode>  <Country>Germany</Country>  <Phone>030-0074321</Phone>  <Fax>030-0076545</Fax></Customers><Customers>  <CustomerID>ANATR</CustomerID>  <CompanyName>Ana Trujillo Emparedados y helados</CompanyName>  <ContactName>Ana Trujillo</ContactName>  <ContactTitle>Owner</ContactTitle>  <Address>Avda. de la Constitución 2222</Address>  <City>México D.F.</City>  <PostalCode>05021</PostalCode>  <Country>Mexico</Country>  <Phone>(5) 555-4729</Phone>  <Fax>(5) 555-3745</Fax></Customers><Customers>  <CustomerID>ANTON</CustomerID>  <CompanyName>Antonio Moreno Taquería</CompanyName>  <ContactName>Antonio Moreno</ContactName>  <ContactTitle>Owner</ContactTitle>  <Address>Mataderos  2312</Address>  <City>México D.F.</City>  <PostalCode>05023</PostalCode>  <Country>Mexico</Country>  <Phone>(5) 555-3932</Phone></Customers></root>
<style></style>

 

下面首先在服务器稍微准备一下环境

USE [tempdb]GOSET ANSI_NULLS ONGOSET QUOTED_IDENTIFIER ONGOCREATE TABLE [dbo].[Customers](    [CustomerID] [nchar](5) NOT NULL,    [CompanyName] [nvarchar](40) NOT NULL,    [ContactName] [nvarchar](30) NULL,    [ContactTitle] [nvarchar](30) NULL,    [Address] [nvarchar](60) NULL,    [City] [nvarchar](15) NULL,    [Region] [nvarchar](15) NULL,    [PostalCode] [nvarchar](10) NULL,    [Country] [nvarchar](15) NULL,    [Phone] [nvarchar](24) NULL,    [Fax] [nvarchar](24) NULL) ON [PRIMARY]CREATE PROCEDURE [dbo].[usp_InsertCustomer]    @CustomerID nchar(5),    @CompanyName nvarchar(40),    @ContactName nvarchar(30),    @ContactTitle nvarchar(30),    @Address nvarchar(60),    @City nvarchar(15),    @Region nvarchar(15),    @PostalCode nvarchar(10),    @Country nvarchar(15),    @Phone nvarchar(24),    @Fax nvarchar(24)ASSET NOCOUNT ONINSERT INTO [dbo].[Customers] (    [CustomerID],    [CompanyName],    [ContactName],    [ContactTitle],    [Address],    [City],    [Region],    [PostalCode],    [Country],    [Phone],    [Fax]) VALUES (    @CustomerID,    @CompanyName,    @ContactName,    @ContactTitle,    @Address,    @City,    @Region,    @PostalCode,    @Country,    @Phone,    @Fax)
<style></style>
 
 
我们在tempdb中创建了一个表和一个存储过程
 
首先,我们把数据加载到一个DataSet
        DataSet ds = new DataSet();        private void btLoadData_Click(object sender, EventArgs e)        {            string dataFile = "CustomersData.xml";            ds.ReadXml(dataFile);            bindingSource1.DataSource = ds;            bindingSource1.DataMember = "Customers";            dataGridView1.DataSource = bindingSource1;                    }
<style></style>

然后,我们第一个测试代码是遍历这个DataSet,每一行提交一次

        private string GetConnectionString()        {            return "server=(local);database=tempdb;integrated security=true;";        }        /// <summary>        /// 直接遍历,一个一个的提交给服务器。时间为265毫秒左右        /// 每一行都需要写日志        /// </summary>        /// <param name="sender"></param>        /// <param name="e"></param>        private void btOneByOne_Click(object sender, EventArgs e)        {            SqlConnection conn = new SqlConnection(GetConnectionString());            SqlCommand cmd = conn.CreateCommand();            cmd.CommandText = "usp_InsertCustomer";            cmd.CommandType = CommandType.StoredProcedure;            conn.Open();            TimeSpan startTime = System.Diagnostics.Process.GetCurrentProcess().UserProcessorTime;            foreach (DataRow row in ds.Tables[0].Rows)            {                cmd.Parameters.Clear();                SqlParameter[] param = new SqlParameter[]{                    new SqlParameter("@CustomerID",row[0].ToString()),                    new SqlParameter("@CompanyName",row[1].ToString()),                    new SqlParameter("@ContactName",row[2].ToString()),                    new SqlParameter("@ContactTitle",row[3].ToString()),                    new SqlParameter("@Address",row[4].ToString()),                    new SqlParameter("@City",row[5].ToString()),                    new SqlParameter("@Region",row[6].ToString()),                    new SqlParameter("@PostalCode",row[7].ToString()),                    new SqlParameter("@Country",row[8].ToString()),                    new SqlParameter("@Phone",row[9].ToString()),                    new SqlParameter("@Fax",row[10].ToString())                };                cmd.Parameters.AddRange(param);                cmd.ExecuteNonQuery();            }            conn.Close();            TimeSpan duration = System.Diagnostics.Process.GetCurrentProcess().UserProcessorTime.Subtract(startTime);            MessageBox.Show("已经全部插入成功,所用时间为" + duration.Milliseconds.ToString() + "毫秒");        }
<style></style>

 

接下来,我们使用ADO.NET内置的一个DataAdapter来提交

        /// <summary>        /// 这是使用Adapter的方式,其实还是遍历,而且语法也没有简单        /// 同时,速度甚至更慢。时间为650毫秒左右        /// 每一行都需要写日志        /// </summary>        /// <param name="sender"></param>        /// <param name="e"></param>        private void btUseAdapter_Click(object sender, EventArgs e)        {            SqlDataAdapter adapter = new SqlDataAdapter();            adapter.AcceptChangesDuringUpdate = false;//为了演示目的,把这个开关关掉,以免它在更新完成后把数据集标记为未更改            adapter.UpdateBatchSize = 10;//这个好像也没有什么用                        SqlConnection conn = new SqlConnection(GetConnectionString());            SqlCommand insertCommand = conn.CreateCommand();            insertCommand.CommandText = "usp_InsertCustomer";            insertCommand.CommandType = CommandType.StoredProcedure;            insertCommand.UpdatedRowSource = UpdateRowSource.None;            insertCommand.Parameters.Add("@CustomerID", SqlDbType.NChar, 5, "CustomerID");            insertCommand.Parameters.Add("@CompanyName", SqlDbType.NVarChar, 40, "CompanyName");            insertCommand.Parameters.Add("@ContactName", SqlDbType.NVarChar, 30, "ContactName");            insertCommand.Parameters.Add("@ContactTitle", SqlDbType.NVarChar, 30, "ContactTitle");            insertCommand.Parameters.Add("@Address", SqlDbType.NVarChar, 60, "Address");            insertCommand.Parameters.Add("@City", SqlDbType.NVarChar, 15, "City");            insertCommand.Parameters.Add("@Region", SqlDbType.NVarChar, 15, "Region");            insertCommand.Parameters.Add("@PostalCode", SqlDbType.NVarChar, 10, "PostalCode");            insertCommand.Parameters.Add("@Country", SqlDbType.NVarChar, 15, "Country");            insertCommand.Parameters.Add("@Phone", SqlDbType.NVarChar, 24, "Phone");            insertCommand.Parameters.Add("@Fax", SqlDbType.NVarChar, 24, "Fax");            adapter.InsertCommand = insertCommand;            TimeSpan startTime = System.Diagnostics.Process.GetCurrentProcess().UserProcessorTime;            adapter.Update(ds,"Customers");            TimeSpan duration = System.Diagnostics.Process.GetCurrentProcess().UserProcessorTime.Subtract(startTime);            MessageBox.Show("已经全部插入成功,所用时间为" + duration.Milliseconds.ToString() + "毫秒");                                }
<style></style>

 

最后,我们找到了最快的方法

 

        /// <summary>        /// 使用新的API,批量导入,这个速度很快,大约26毫秒,很显然,这种方式只写一次日志,不会为每一行写日志        /// </summary>        /// <param name="sender"></param>        /// <param name="e"></param>        private void btBCP_Click(object sender, EventArgs e)        {            using (SqlConnection conn = new SqlConnection(GetConnectionString()))            {                SqlBulkCopy bcp = new SqlBulkCopy(conn);                bcp.DestinationTableName = "Customers";                bcp.BatchSize = 100;//这是批尺寸可以调整                for (int i = 0; i < 11; i++)                {                    bcp.ColumnMappings.Add(i, i);                }                TimeSpan startTime = System.Diagnostics.Process.GetCurrentProcess().UserProcessorTime;                conn.Open();                bcp.WriteToServer(ds.Tables[0]);                TimeSpan duration = System.Diagnostics.Process.GetCurrentProcess().UserProcessorTime.Subtract(startTime);                MessageBox.Show("已经全部插入成功,所用时间为" + duration.Milliseconds.ToString() + "毫秒");            }        }
<style></style>

 

还有一种办法是通过在服务器OPENXML,因为XML反复处理效率很差,所以就没有测试了,可以断定它肯定比其他几种还要慢。

另外提示一下,如果不用编程的方式,那么有其他三个可能的途径去做这个事情

1. BCP工具(这是一个命令行,可以做导入和导出,不过来源文件如果不规范,那么可能很费劲)

2. BULK INSERT语句(这是一个T-SQL语句,只能做导入,我们上面使用的SQLBULKCopy应该和他很类似)

3. XML Bulk Load(这是一套COM的对象模型,适合导入XML文档)

 

还有,在做大量的数据导入和导出时,可以考虑微软为SQL Server配套的SSIS(Integration Service)

 

快速批量导入庞大数据到SQL SERVER数据库(ADO.NET)