An Introduction to Data Pump
Starting with Oracle 10g, Oracle provides the more efficient Data Pump (expdp/impdp) for importing and exporting data. The older exp/imp utilities still work but are no longer recommended. Note that expdp/impdp and exp/imp are mutually incompatible: a file exported with exp can only be imported with imp, and a file exported with expdp can only be imported with impdp.
Components of Data Pump
- Client tools: expdp/impdp
- Data Pump API (DBMS_DATAPUMP)
- Metadata API (DBMS_METADATA)
Data Pump roles
- DATAPUMP_EXP_FULL_DATABASE
- DATAPUMP_IMP_FULL_DATABASE
Data Pump data movement methods
- Data file copying: the fastest method. The dump file contains only metadata, and the data files themselves are copied at the operating-system level. Related parameters: TRANSPORT_TABLESPACES, TRANSPORTABLE=ALWAYS
- Direct path load: the fastest method apart from file copying; it is used whenever possible, i.e. unless something rules it out (such as a BFILE column)
- External tables: used only when the first two methods cannot be used
- Conventional path load: used only when none of the methods above is available; its performance is poor
Data Pump jobs
- Master process: controls the entire job and acts as its coordinator.
- Master table: records metadata about the database objects in the dump file. expdp writes it into the dump file when the export finishes, and impdp reads it first so that it knows what the dump file contains.
- Worker processes: do the actual export/import work. Multiple worker processes are created automatically as needed and run in parallel, but never more than the number set by the PARALLEL parameter.
Monitoring job status
A job's status can be checked through the views DBA_DATAPUMP_JOBS, USER_DATAPUMP_JOBS, or DBA_DATAPUMP_SESSIONS. Progress also appears in V$SESSION_LONGOPS, whose relevant columns are:
- USERNAME - job owner
- OPNAME - job name
- TARGET_DESC - job operation
- SOFAR - megabytes transferred thus far during the job
- TOTALWORK - estimated number of megabytes in the job
- UNITS - megabytes (MB)
- MESSAGE - a formatted status message of the form:
- 'job_name: operation_name : nnn out of mmm MB done'
Creating a directory object
- SQL> CREATE DIRECTORY dpump_dir1 AS '/usr/apps/datafiles';
After creating the directory object, you must also grant read and write privileges on it to the user who runs Data Pump, as shown below:
- SQL> GRANT READ, WRITE ON DIRECTORY dpump_dir1 TO hr;
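The two statements above can be bundled into a small setup script. A minimal sketch, reusing the example names from above (dpump_dir1, /usr/apps/datafiles, hr); in practice the script would be run via sqlplus under a privileged account:

```shell
#!/bin/bash
# Sketch: write the directory-object setup into a SQL script.
# Directory name, path, and grantee are the examples from the text above.
cat > setup_dpump_dir.sql <<'EOF'
CREATE DIRECTORY dpump_dir1 AS '/usr/apps/datafiles';
GRANT READ, WRITE ON DIRECTORY dpump_dir1 TO hr;
EOF
# Hypothetical invocation (needs a privileged account):
#   sqlplus / as sysdba @setup_dpump_dir.sql
```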
Export modes
Full mode
Exports the entire database. Note that some system schemas, such as SYS, ORDSYS, and MDSYS, are not included in a full export.
- > expdp user_name FULL=y DUMPFILE=expdat.dmp DIRECTORY=data_pump_dir LOGFILE=export.log
Schema mode
Exports one or more schemas (SCHEMAS parameter). By default the current user's schema is exported; only a user with the DATAPUMP_EXP_FULL_DATABASE role can export other schemas. Example:
- > expdp hr DIRECTORY=dpump_dir1 DUMPFILE=expdat.dmp SCHEMAS=hr,sh,oe
Table mode
- TABLES=[schema_name.]table_name[:partition_name] [, ...]
If schema_name is omitted, tables in the current user's schema are exported. Example:
- expdp hr TABLES=employees,jobs DUMPFILE=dpump_dir1:table.dmp NOLOGFILE=YES
Tablespace mode
- > expdp hr DIRECTORY=dpump_dir1 DUMPFILE=tbs.dmp
- TABLESPACES=tbs_4, tbs_5, tbs_6
Transportable Tablespace mode
- > expdp hr DIRECTORY=dpump_dir1 DUMPFILE=tts.dmp
- TRANSPORT_TABLESPACES=tbs_1 TRANSPORT_FULL_CHECK=YES LOGFILE=tts.log
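Before a transportable-tablespace export, the tablespace must first be placed in READ ONLY mode. A sketch that collects the parameters from the example above into a parameter file (all names are the examples from the text):

```shell
#!/bin/bash
# Sketch: parameter file for the transportable-tablespace export above.
# The tablespace must be made read-only first, e.g.:
#   ALTER TABLESPACE tbs_1 READ ONLY;
cat > tts_exp.par <<'EOF'
DIRECTORY=dpump_dir1
DUMPFILE=tts.dmp
TRANSPORT_TABLESPACES=tbs_1
TRANSPORT_FULL_CHECK=YES
LOGFILE=tts.log
EOF
# Hypothetical invocation: expdp hr PARFILE=tts_exp.par
```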
Filtering during export
Data filtering
The QUERY parameter restricts which rows are exported. Its syntax, followed by an example parameter file:
- QUERY = [schema.][table_name:] query_clause
- QUERY=employees:"WHERE department_id > 10 AND salary > 10000"
- NOLOGFILE=YES
- DIRECTORY=dpump_dir1
- DUMPFILE=exp1.dmp
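Because the QUERY clause contains double quotes and a > sign, it is far easier to pass it through a parameter file than to escape it on the command line. A sketch that assembles the lines above into a parameter file:

```shell
#!/bin/bash
# Sketch: put the QUERY filter from the text above into a parameter
# file, so that the quotes need no shell escaping.
cat > exp_query.par <<'EOF'
QUERY=employees:"WHERE department_id > 10 AND salary > 10000"
NOLOGFILE=YES
DIRECTORY=dpump_dir1
DUMPFILE=exp1.dmp
EOF
# Hypothetical invocation: expdp hr PARFILE=exp_query.par
```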
The SAMPLE parameter specifies the percentage of rows to export; its syntax is:
- SAMPLE=[[schema_name.]table_name:]sample_percent
Here is an example:
- > expdp hr DIRECTORY=dpump_dir1 DUMPFILE=sample.dmp SAMPLE=70
Metadata filtering
- expdp FULL=YES DUMPFILE=expfull.dmp EXCLUDE=SCHEMA:"='HR'"
- > expdp hr DIRECTORY=dpump_dir1 DUMPFILE=hr_exclude.dmp EXCLUDE=VIEW,PACKAGE,FUNCTION
An INCLUDE example:
- SCHEMAS=HR
- DUMPFILE=expinclude.dmp
- DIRECTORY=dpump_dir1
- LOGFILE=expinclude.log
- INCLUDE=TABLE:"IN ('EMPLOYEES', 'DEPARTMENTS')"
- INCLUDE=PROCEDURE
- INCLUDE=INDEX:"LIKE 'EMP%'"
Key parameters
PARALLEL sets the maximum number of worker processes. With PARALLEL > 1, supply enough dump files, for example by using the %U wildcard to generate numbered file names:
- > expdp hr SCHEMAS=hr DIRECTORY=dpump_dir1 DUMPFILE=dpump_dir2:exp1.dmp,exp2%U.dmp PARALLEL=3
ESTIMATE_ONLY: to estimate in advance how much space the dump files will need, without actually performing the export, specify ESTIMATE_ONLY=yes.
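The %U wildcard in a DUMPFILE name expands to a two-digit sequence (01, 02, ...), so DUMPFILE=exp2%U.dmp with PARALLEL=3 can produce exp201.dmp, exp202.dmp, and exp203.dmp. A small illustration of that substitution (this mimics the naming only, it is not expdp itself):

```shell
#!/bin/bash
# Illustration only: mimic how expdp expands the %U wildcard in a
# DUMPFILE template into a two-digit sequence of file names.
template="exp2%U.dmp"
names=()
for i in 01 02 03; do
  names+=("${template/\%U/$i}")
done
printf '%s\n' "${names[@]}"
```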
Import modes
Full mode
Imports the entire contents of the dump file (FULL=y).
Schema mode
SCHEMAS=schema_name [,...]
- > impdp hr SCHEMAS=hr DIRECTORY=dpump_dir1 LOGFILE=schemas.log
- DUMPFILE=expdat.dmp
Table mode
TABLES=[schema_name.]table_name[:partition_name]
> impdp hr DIRECTORY=dpump_dir1 DUMPFILE=expfull.dmp TABLES=employees,jobs
> impdp hr DIRECTORY=dpump_dir1 DUMPFILE=expdat.dmp TABLES=sh.sales:sales_Q1_2012,sh.sales:sales_Q2_2012
Tablespace mode
TABLESPACES=tablespace_name [, ...]
> impdp hr DIRECTORY=dpump_dir1 DUMPFILE=expfull.dmp TABLESPACES=tbs_1,tbs_2,tbs_3,tbs_4
Transportable Tablespace mode
TRANSPORT_TABLESPACES=tablespace_name [, ...]
TRANSPORT_DATAFILES=datafile_name
DIRECTORY=dpump_dir1
NETWORK_LINK=source_database_link
TRANSPORT_TABLESPACES=tbs_6
TRANSPORT_FULL_CHECK=NO
TRANSPORT_DATAFILES='user01/data/tbs6.dbf'
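For a transportable-tablespace import, the data files themselves must already have been copied to the target system; the parameters above then plug them back in. A sketch as a parameter file, using the example names from the listing above:

```shell
#!/bin/bash
# Sketch: parameter file for the transportable-tablespace import above.
# The data file user01/data/tbs6.dbf must already exist on the target;
# with NETWORK_LINK the metadata comes over the link, so no dump file
# is needed.
cat > tts_imp.par <<'EOF'
DIRECTORY=dpump_dir1
NETWORK_LINK=source_database_link
TRANSPORT_TABLESPACES=tbs_6
TRANSPORT_FULL_CHECK=NO
TRANSPORT_DATAFILES='user01/data/tbs6.dbf'
EOF
# Hypothetical invocation: impdp hr PARFILE=tts_imp.par
```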
Filtering during import
The EXCLUDE, INCLUDE, and QUERY parameters work for impdp in the same way as for expdp.
Key parameters
ACCESS_METHOD=[AUTOMATIC | DIRECT_PATH | EXTERNAL_TABLE | CONVENTIONAL]
Selects the import method. It is strongly recommended to leave this at the default, AUTOMATIC.
CONTENT=[ALL | DATA_ONLY | METADATA_ONLY]
Controls whether to import data only, metadata only, or both.
DIRECTORY=directory_object
Specifies the directory object containing the dump files.
DUMPFILE=[directory_object:]file_name [, ...]
Specifies the dump file names; the %U wildcard can be used to match multiple dump files.
HELP=YES
impdp help=y displays help information.
JOB_NAME=jobname_string
Specifies the job name; the default is usually fine.
LOGFILE=[directory_object:]file_name
Specifies the log file name.
MASTER_ONLY=[YES | NO]
Imports only the master table. Since the master table describes the contents of the dump file, this lets you find out what data the dump file contains.
PARALLEL=integer
Specifies the degree of parallelism for the import.
PARFILE=[directory_path]file_name
Specifies a parameter file.
REMAP_DATA=[schema.]tablename.column_name:[schema.]pkg.function
Transforms data during the import, for example regenerating primary keys to avoid conflicts with existing ones.
REMAP_DATAFILE=source_datafile:target_datafile
Resolves differences in data file naming conventions between heterogeneous platforms.
REMAP_SCHEMA=source_schema:target_schema
A very commonly used parameter: it lets you import into a different schema. If target_schema does not exist, it is created automatically during the import. An example:
> expdp system SCHEMAS=hr DIRECTORY=dpump_dir1 DUMPFILE=hr.dmp
> impdp system DIRECTORY=dpump_dir1 DUMPFILE=hr.dmp REMAP_SCHEMA=hr:scott
REMAP_TABLE=[schema.]old_tablename[.partition]:new_tablename
Renames tables or partitions during the import. An example:
> impdp hr DIRECTORY=dpump_dir1 DUMPFILE=expschema.dmp TABLES=hr.employees REMAP_TABLE=hr.employees:emps
REMAP_TABLESPACE=source_tablespace:target_tablespace
Changes the tablespace name during the import. An example:
> impdp hr REMAP_TABLESPACE=tbs_1:tbs_6 DIRECTORY=dpump_dir1 DUMPFILE=employees.dmp
REUSE_DATAFILES=[YES | NO]
Whether to reuse existing data files; the default is NO. Be very careful: once this is set to YES, existing data files with the same names will be overwritten.
SQLFILE=[directory_object:]file_name
If this parameter is specified, no import is actually performed; instead, the DDL statements the import would execute are written to the SQLFILE.
Using NETWORK_LINK with expdp
1. Configure a TNS entry for the source database in tnsnames.ora on the target server:
- source_db =
- (DESCRIPTION =
- (ADDRESS_LIST =
- (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.15)(PORT = 1521))
- )
- (CONNECT_DATA =
- (sid = orcl)
- )
- )
2. Create a database link on the target database:
- SQL>create public database link source_db_link connect to system identified by *** using 'source_db';
- Database link created.
- SQL>select instance_name from v$instance@source_db_link;
- INSTANCE_NAME
- ----------------
- orcl
3. Run expdp on the target server:
- $ expdp system directory=dump_dir network_link=source_db_link tables=test.test dumpfile=test.dmp logfile=expdp_test.log
Here network_link is the name of the db link created in step 2; the dumpfile and logfile are written to dump_dir on the target database side.
Using NETWORK_LINK with impdp
- $ impdp system directory=dump_dir network_link=source_db_link tables=test.test logfile=impdp_test.log
The statement above imports table test.test from the source database directly into the target database. No dump file is produced in between, but a log file is (written to dump_dir on the target database side).
Once a Data Pump job has been started, its current progress can be viewed through v$session_longops.
- SYS@TEST16>select username,opname,sofar,TOTALWORK,UNITS,message from v$session_longops where opname='SYS_EXPORT_FULL_03';
- USERNAME OPNAME SOFAR TOTALWORK UNITS MESSAGE
- --------------- -------------------- ---------- ---------- ----- ------------------------------------------------------------
- SYSTEM SYS_EXPORT_FULL_03 4737 35368 MB SYS_EXPORT_FULL_03: EXPORT : 4737 out of 35368 MB done
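The SOFAR and TOTALWORK columns can be turned into a completion percentage, SOFAR / TOTALWORK, both measured in MB. A quick sketch using the numbers from the query output above:

```shell
#!/bin/bash
# Progress percentage from the v$session_longops row shown above:
# SOFAR and TOTALWORK are both in MB.
sofar=4737
totalwork=35368
pct=$(( sofar * 100 / totalwork ))
echo "${pct}% done"   # integer percentage
```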
Sometimes monitoring alone is not enough; we may also need to modify the job, and for that we use Data Pump's interactive command mode.
There are two ways to enter interactive command mode:
1. Press Ctrl+C while in logging mode
2. expdp or impdp attach=SYSTEM.SYS_EXPORT_FULL_03
The expdp interactive-mode commands are as follows:
| Activity | Command Used |
|---|---|
| Add additional dump files. | ADD_FILE |
| Exit interactive mode and enter logging mode. | CONTINUE_CLIENT |
| Stop the export client session, but leave the job running. | EXIT_CLIENT |
| Redefine the default size to be used for any subsequent dump files. | FILESIZE |
| Display a summary of available commands. | HELP |
| Detach all currently attached client sessions and terminate the current job. | KILL_JOB |
| Increase or decrease the number of active worker processes for the current job. This command is valid only in the Enterprise Edition of Oracle Database 11g. | PARALLEL |
| Restart a stopped job to which you are attached. | START_JOB |
| Display detailed status for the current job and/or set status interval. | STATUS |
| Stop the current job for later restart. | STOP_JOB |
The impdp interactive-mode commands are as follows:
| Activity | Command Used |
|---|---|
| Exit interactive-command mode. | CONTINUE_CLIENT |
| Stop the import client session, but leave the current job running. | EXIT_CLIENT |
| Display a summary of available commands. | HELP |
| Detach all currently attached client sessions and terminate the current job. | KILL_JOB |
| Increase or decrease the number of active worker processes for the current job. This command is valid only in Oracle Database Enterprise Edition. | PARALLEL |
| Restart a stopped job to which you are attached. | START_JOB |
| Display detailed status for the current job. | STATUS |
| Stop the current job. | STOP_JOB |
Below, taking expdp as the example, are a few commonly used commands (if you forget one, type the all-purpose help).
1. status: shows the current job's state, such as percent complete and degree of parallelism; each worker represents one parallel process.
- Export> status
- Job: SYS_EXPORT_FULL_03
- Operation: EXPORT
- Mode: FULL
- State: EXECUTING
- Bytes Processed: 8,357,285,928
- Percent Done: 23
- Current Parallelism: 2
- Job Error Count: 0
- Dump File: /home/oracle/dump/full_%u.dmp
- Dump File: /home/oracle/dump/full_01.dmp
- bytes written: 8,357,294,080
- Dump File: /home/oracle/dump/full_02.dmp
- bytes written: 4,096
- Worker 1 Status:
- Process Name: DW00
- State: EXECUTING
- Object Type: DATABASE_EXPORT/SCHEMA/TABLE/COMMENT
- Completed Objects: 5,120
- Worker Parallelism: 1
- Worker 2 Status:
- Process Name: DW01
- State: EXECUTING
- Object Schema: P95169
- Object Name: GRADE_RCCASE
- Object Type: DATABASE_EXPORT/SCHEMA/TABLE/TABLE_DATA
- Completed Objects: 3
- Total Objects: 1,866
- Completed Rows: 23,505,613
- Worker Parallelism: 1
2. parallel: dynamically adjust the degree of parallelism
- Export> parallel=4
3. add_file: add dump files
- Export> ADD_FILE=hr2.dmp, dpump_dir2:hr3.dmp
4. stop_job, kill_job, start_job
stop_job only pauses the job; it can be restarted later with start_job. kill_job kills the job outright, and it cannot be resumed.
5. continue_client: exit interactive mode and return to logging mode;
exit_client: exit the client session (the job keeps running).