首页 > 代码库 > Oracle 12.1.0.2 New Feature翻译学习【In-Memory column store内存列存储】【原创】

Oracle 12.1.0.2 New Feature翻译学习【In-Memory column store内存列存储】【原创】

翻译没有追求信达雅,不是为了学英语翻译,是为了快速了解新特性,如有语义理解错误可以指正。欢迎加微信12735770或QQ12735770探讨oracle技术问题:)

In-Memory Column Store内存列存储

 

Starting in Oracle Database 12c Release 1 (12.1.0.2), the In-Memory Column Store (IM column store) is an optional, static SGA pool that stores copies of tables and partitions in a special columnar format optimized for rapid scans.

12.1.0.2开始,有一个可选项叫In-Memory column storeIM column store),可使静态SGA存储为了快速扫描优化过的指定列格式(column format)的表和分区的复制。

 

The IM column store does not replace the buffer cache, but acts as a supplement so that both memory areas can store the same data in different formats. By default, only objects specified as INMEMORY using DDL are candidates to be populated in the IM column store.

内存列存储不是为了替代Buffer Cache,而是作为补充可以让所有内存区域存储同样的数据用不同的格式。默认的,需要用DDL对指明的对象用INMEMEORY选项才会加入队列等待存入内存列存储中。

 

It is not necessary for objects populated in the IM column store to be loaded into the database buffer cache.

完全不需要把内存中的填充对象加载到buffer cache中。

柱状格式只存在于内存中。下图展示了SH模式中存储在内存列存储中的三个表:customers,products,sales内存列存储使用列存储数据而不是行。数据库保证柱状格式的数据与buffer cache中的数据是一致的,只是存储格式不同而已。

 

Benefits of the IM Column Store

使用内存列存储的好处

 

The IM column store enables the database to perform scans, joins, and aggregates much faster than when it uses the on-disk format exclusively.

使用内存列存储可以使数据库的扫描scans、连接joins、聚合aggregates比用on-disk format快很多。

 

In particular, the IM column store is useful for:

特别是内存列存储用在:

l Performing fast full scans of large tables

l 执行大表快速扫描;

l Evaluating predicate filters that use operators such as =, <, >, and IN

评估用<>=in操作的谓语过滤;

l Querying a subset of columns in a table, for example, selecting 5 of 100 columns

查询表中列的子集,如:select 5of 100 columns.

l Accelerating joins by converting predicates on small dimension tables into filters on a large fact table

通过转换小的维度表的谓语到一个大的事实表的过滤来加速join

 

Business applications, ad-hoc analytic queries, and data warehouse workloads benefit most. Pure OLTP databases that perform short transactions using index lookups benefit less.

商业应用,即席分析查询,以及数据仓库负载收益最多。完全的OLTP数据库,使用索引查询短事物,收益较小。

 

The IM column store also provides the following advantages:

内存列存储优势:

l 1.All existing database features are supported, including High Availability features (see "Overview of High Availability").

l 支持现有所有数据库特性,包括高可用。

l No application changes are required.

l 不需要更改应用。

l 2.The optimizer automatically takes advantage of the columnar format.

l 优化器自动利用柱状格式。

l 3.Configuration is simple.

l 配置简单。

l The INMEMORY_SIZE initialization parameter specifies the amount of memory reserved for use by the IM column store. DDL statements specify the tablespaces, tables, partitions, or columns to be read into the IM column store.

只需要对INMEMORY_SIZE初始化参数指定为使用IM column store保留的内存大小。然后DDL语句指定表空间、表、分区或者需要被读入IM column store的列就可以。

 

 

 

 

 

l 4.Compression is optimized for query performance.

l 为查询性能优化了压缩。

l These compression techniques increase the effective memory bandwidth by enabling sessions to read more data into memory.

l 这些压缩技术提高了内存带宽的使用效率,使会话可以读入更多数据到内存中。

 

l 5.Fewer indexes, materialized views, and OLAP cubes are required.

更少的需要索引、物化视图和OLAP多维数据集。

l The reduction in the number of pre-built objects results in reduced storage space and significantly less processing overhead.

l 预创建对象数量的减少使存储空间降低并且显著地减少处理开销。

 

DUAL Memory Format:columnar and row

双内存模式:柱模式和行模式

 

When fetching data, Oracle Database can read either the IM column store or the database buffer cache, or both within the same query.

取数据的时候,oracle既可以读取内存列存储的,也可以读取buffer cacho中的,或者同时用他们查询同样的数据。

The database transparently sends OLTP queries (such as primary key lookups) to the buffer cache, and analytic and reporting queries to the IM column store. Thus, dual memory formats provide the best of both worlds.

数据库透明地发送OLTP查询(如主键查询)到buffer cache中,并且也分析、报告基于内存列存储的查询。因此,双内存格式提供比两个单独的(IM column store/buffer cache)方式更好的方式。

 

In the execution plan, the operation TABLE ACCESS IN MEMORY FULL indicates use of the IM column store.

在执行计划中,TABLE ACCESS IN MEMORY FULL表明使用的是内存列存储。

The following figure shows a sample IM column store. The sh.sales table is stored on disk in traditional row format. The SGA stores the data in columnar format in the IM column store, and in row format in the database buffer cache.

下图展示了内存列存储的样例,sh.sales表被存在磁盘上以传统的行格式。SGAIM column store上以柱状格式存储数据,并且在buffer cache中以行格式存储数据。

 

 

 

 

How Background Processes Populate IMCUs

后台进程如何填充IMCUs

 

在填充数据期间,数据库从磁盘上以行的形式读取数据,并且以行为单位创建列,然后压缩数据到In-Memory-Compression-UnitsIMCU),worker processWnnn)进程填充数据到内存列存储,每个worker进程操作一个对象上的数据块子集,填充是一个流机制,压缩并且同时转换数据为柱状格式。

 

The INMEMORY_MAX_POPULATE_SERVERS initialization parameter specifies the maximum number of worker processes to use for IM column store population. By default, the setting is one half of CPU_COUNT. Set this parameter to an appropriate value for your environment. More worker processes result in faster population, but they use more CPU resources. Fewer worker processes result in slower population, which reduces CPU overhead.

INMEMORY_MAX_POPULATE_SERVERS初始化参数指定用来执行内存列存储填充任务的worker进程最大数量。该参数默认为CPU_COUNT的一半。Worker越多,填充越快,但是也越消耗CPU,所以需要根据实际环境设置该参数。

注意,如果该参数设置为0,则意味着禁用了填充功能。

 

Population of the IM Column Store at Instance Startup

实例启动时的内存列存储的填充操作

The database must reconstruct the entire in-memory columnar representation from the on-disk representation every time the database instance restarts. This reconstruction is necessary because the IM column store resides only in memory.

数据库实例每次重启都必须要根据磁盘中的representation来重建内存中的representation。这个重建过程是必须的,因为内存列存储只能存储在内存中。

 

Population of the IM Column Store in Response to Queries

查询响应中的内存列存储填充操作

 

Setting the INMEMORY attribute on an object means that this object is a candidate for population in the IM column store, not that the database immediately populates the object in memory.

在一个对象上设置了INMEMORY属性,意味着这个对象已经加入等待填充到内存列存储的队列中了,并不是立即把这个对象填充进内存。

By default (INMEMORY PRIORITY is set to NONE), the database delays population of a table in the IM column store until the database considers it useful. When the INMEMORY attribute is set for an object, the database may choose not to materialize all columns when the database determines that the memory is better used elsewhere. Also, the IM column store may populate a subset of columns from a table.

INMEMORY PRIORITY默认为NONE,数据库需要先确定使用内存列存储有用才会进行一个表的填充。即使INMEMORY属性已经设置给对象了,但是数据库如果觉得内存用在其他地方更好,则会选择不去物化所有的列。内存列存储可能只会填充表上列的部分子集。

 

 

Example 14-1 Population of an Object in the IM Column Store

 

Assume that you connect to the database as administrator. To determine whether data from the sh.customers table has been populated in the IM column store, you execute the following query :

假设你以SYSDBA权限登录数据库,为了查询是否sh.customres表已经被填充进IMcolumn store中,你执行以下语句:

SQL> SELECT SEGMENT_NAME, POPULATE_STATUS FROM V$IM_SEGMENTS WHERE SEGMENT_NAME = ‘CUSTOMERS‘;

 

no rows selected

 

In this case, no segments are populated in the IM column store because sh.customers has not yet been accessed. You query sh.customers, and then query V$IM_SEGMENTS again:

这种情况为什么没有段被填充进内存列存储中,是因为sh.customers还没有被访问过,当你查询一次sh.customres,然后在查询V$IM_SEGMENTS时候:

SQL> SELECT cust_id, cust_last_name, cust_first_name

  2  FROM sh.customers WHERE cust_city = ‘Hyderabad‘

  3  AND cust_income_level LIKE ‘C%‘ AND cust_year_of_birth > 1960;

 

   CUST_ID CUST_LAST_NAME  CUST_FIRST

---------- --------------- ----------

      6284 Hummer          Liane

     12515 Oppy            Linette

     39514 Ready           Theodric

     33292 Weston          Theodric

     33284 Grigsby         Thatcher

     27062 Patterson       Thatcher

 

6 rows selected.

 

SQL> COL SEGMENT_NAME FORMAT a20

SQL> SELECT SEGMENT_NAME, POPULATE_STATUS FROM V$IM_SEGMENTS WHERE SEGMENT_NAME = ‘CUSTOMERS‘;

 

SEGMENT_NAME         POPULATE_STAT

-------------------- -------------

CUSTOMERS            COMPLETED

 

The following query confirms that the database used the IM column store to retrieve the results:

下面的查询可以证明是否数据库使用了内存列存储来检索结果:

SQL> COL NAME FORMAT a25

SQL> SELECT ul.NAME, ul.DETECTED_USAGES FROM DBA_FEATURE_USAGE_STATISTICS ul

   WHERE ul.VERSION=(SELECT MAX(u2.VERSION) FROM DBA_FEATURE_USAGE_STATISTICS u2

                     WHERE u2.NAME = ul.NAME AND ul.NAME LIKE ‘%Column Store%‘);

 

NAME                      DETECTED_USAGES

------------------------- ---------------

In-Memory Column Store    1

 

Population of the IM Column Store in Oracle RAC

RAC中的IM column store填充操作

 

Each node in an Oracle Real Application Clusters (Oracle RAC) environment has its own IM column store.

RAC每个节点有自己的内存列存储(毕竟每个节点有自己的内存)

By default, each in-memory object is distributed among the Oracle RAC instances, effectively employing a share-nothing architecture for the IM column store. When INMEMORY is specified, the DISTRIBUTE and DUPLICATE keywords control the distribution of objects.

每个内存对象默认被分布在RAC实例中,并且内存列存储都设置为share-nothing。当指定INMEMORYDISTRIBUTEDUPLICATE关键字控制对象的具体分布。

 

注意:可以为每个节点设置IMEMORY_SIZE0,意味着禁用该节点的IM column store

 

In-Memory Columnar Compression

内存柱状压缩

 

The IM column store uses special compression formats optimized for access speed rather than storage reduction.

内存列存储使用为了提高访问速度而不是减少磁盘访问而优化过的特别压缩格式。

The database increases speed in the following ways:

数据库通过以下途径提高访问速度:

The compression formats enable the database to decrease the amount of memory processed for each column. SQL executes directly on the compressed columns.

压缩格式使数据库减少每列的内存进程数量,SQL直接执行在压缩列上。

 

The database uses SIMD Single instruction, multiple datavector (array) instructions to process an array of column values in a single CPU clock cycle. The database can store many values in a vector, which maximizes the performance benefits with SIMD vector processing.

数据库使用SIMD矢量(阵列)指令来处理单个CPU周期中的列值的阵列。数据库可以存储许多矢量中的值,这个矢量最大化了SIMD矢量处理的性能收益。

 

You can use the MEMCOMPRESS subclause of CREATE and ALTER to select different compression options for each table, partition, or column. For example, you might optimize some columns in a table for scan speed, and others for storage.

你可以使用CREATEALTERMEMCOMPRESS子句选择不同的压缩选项给每张表,分区,或者列。例如:你可能需要在这张表上提高扫描速度,而在另一张表上提高存储性能。

 

The default compression option is MEMCOMPRESS FOR QUERY LOW. This option provides the fastest read times because the database does not need to decompress the data. The other compression options, such as FOR QUERY HIGH and FOR CAPACITY, use layers of compression, some of which require decompression. Before storing a table in the IM column store, you can use DBMS_COMPRESSION.GET_COMPRESSION_RATIO to report the compression ratio, and thus estimate how much space is required.

默认压缩选项是MEMCOMPRESS FOR QUERY LOW。这个选项提供最快的读取速度,因为数据库不需要解压数据。

其他压缩选项,例如:FOR QUERY HIGHFOR CAPACITY,不同的压缩层次,有一些需要先解压。在把表存进内存列存储中之前,可以使用DBMS_COMPRESSION.GET_COMPRESSION_RATIO报告压缩比例,然后可以评估还需要多少空间。

 

The in-memory columnar compression is closely related to Hybrid Columnar Compression. Both technologies involve processing sets of column vectors. The primary difference is that the column vectors for the IM column store are optimized for memory storage, whereas the column vectors for Hybrid Columnar Compression are optimized for disk storage.

内存列存储压缩与Oracle exdata的混合柱压缩非常相似。两个技术都包含处理列矢量的集合。主要区别在于内存列存储的列矢量是为内存存储而优化过的,然后混合柱压缩的列矢量是为磁盘存储而优化过的。

 

 

Scan Optimizations for the IM Column Store

内存列存储的扫描优化

 

The columnar format enables queries to scan only necessary columns.

柱状格式使查询仅扫描需要的列。

For example, suppose a user executes the following ad hoc query:

例如,一个用户执行以下即席查询:

SELECT cust_id, time_id, channel_id

FROM   sales

WHERE  prod_id > 14

AND    prod_id < 29

 

When using the buffer cache, the database would typically scan an index to find the product IDs, use the rowids to fetch the rows from disk into the buffer cache, and then discard the unwanted column values. Scanning data in row format in the buffer cache requires many CPU instructions, and can result in poor CPU efficiency.

当使用buffer cache时,数据库会扫描索引去查找product IDs,用rowids从磁盘取出数据放到buffer cache,然后丢掉不要的值。行模式下载buffer cache中扫描数据需要CPU执行许多指令,导致CPU效率很低。

When using the IM column store, the database can scan only the requested sales columns, avoiding expensive disk I/O altogether. Scanning data in columnar format pipelines only necessary columns to the CPU, increasing efficiency. Each CPU core scans local in-memory columns using SIMD vector instructions.

当使用内存列存储时,数据库只需要扫描需要的sales列,完全避免了昂贵的磁盘I/O开销。柱状格式扫描数据仅传输需要的列给CPU,提高了CPU效率。每个CPU扫描本地内存中的列使用SIMD矢量命令。

Oracle 12.1.0.2 New Feature翻译学习【In-Memory column store内存列存储】【原创】