Understanding Oracle ASM Striping

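Oracle ASM uses striping to map the extents of an ASM file onto the allocation units (AUs) of an ASM disk group. How a file is striped is determined by the disk group template, and there are two kinds: coarse-grained and fine-grained.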

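The following uses AU sizes of 1MB and 64MB as examples to explain how striping is implemented in 10g and 11g.
1, Oracle 10g R2 striping
(1), In 10g the extent size is 1MB and the AU size is also 1MB. Both the extent size and the AU size are fixed.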

(2), Coarse-grained file types: data files, archived logs, and temp files. Their extents map one-to-one to AUs, that is, each extent occupies exactly one AU, and the stripe chunk size is 1MB.

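(3), Fine-grained file types: control files, online redo log files, and flashback logs. By default their extents are split and striped round-robin across AUs with _asm_stripewidth=8. Think of it this way: 1MB/8=128KB, so the stripe chunk size is 128KB and each extent is split into 128KB pieces. If the disk group has only one disk, all of the 128KB stripe chunks land in a single AU on that disk, so that AU holds eight 128KB stripe chunks. If the disk group has two disks, the stripe chunks are distributed round-robin and evenly across two AUs on the two disks, and each AU holds four 128KB stripe chunks. Likewise, when the disk group grows to eight disks, one AU on each disk holds a single 128KB stripe chunk.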
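The arithmetic in (3) can be sketched in a few lines of Python. This is only a simplified model of the description above, not ASM's real allocator; the function names chunk_size_kb and chunks_per_disk are hypothetical and chosen for illustration.

```python
def chunk_size_kb(au_size_kb: int = 1024, stripe_width: int = 8) -> int:
    """Stripe chunk size: one AU divided by the stripe width."""
    return au_size_kb // stripe_width


def chunks_per_disk(stripe_width: int = 8, disk_count: int = 1) -> dict:
    """Place the chunks of one extent round-robin over the disks.

    Returns {disk_number: number_of_chunks} for a single extent.
    """
    counts = {}
    for chunk in range(stripe_width):
        disk = chunk % disk_count          # round-robin placement
        counts[disk] = counts.get(disk, 0) + 1
    return counts


print(chunk_size_kb())                     # 1MB AU, width 8
for disks in (1, 2, 8):
    print(disks, chunks_per_disk(disk_count=disks))
```

With one disk every chunk lands on disk 0, with two disks each disk receives four chunks, and with eight disks each disk receives one, which matches the three cases described above.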

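The AU size is determined by au_size. The default comes from _asm_ausize=1048576 bytes (1MB) and is fixed when the diskgroup is created. In 10g neither the AU size nor the extent size can be changed.
Likewise, whether a file is coarse-grained or fine-grained is determined by its template; asmcmd provides no way to change this attribute.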

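2, Oracle 11g R1/R2 striping
(1), 11g supports variable extent sizes, and the AU size can be chosen when the disk group is created.
(2), The default AU size is 1MB, as in 10g, but the extent size grows as the number of extents increases.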

The tables below show the 11g R1 extent sizes for AU=1MB and AU=64MB.

Extent number    AUs per extent    Extent size (AU=1MB)
0-19999          1                 1MB (1 AU)
20000-39999      8                 8MB (8 AUs)
40000+           64                64MB (64 AUs)

Extent number    AUs per extent    Extent size (AU=64MB)
0-19999          1                 64MB (1 AU)
20000-39999      8                 512MB (8 AUs)
40000+           64                4096MB (64 AUs)

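(3), With AU=1MB: every file type except the control file is coarse-grained. While a file is under about 20GB (the first 20,000 extents), each extent is 1MB. As the file grows beyond that, up to about 180GB (another 20,000 extents, or 160GB more), each extent is 8MB, so one extent now spans 8 AUs. Beyond about 180GB each extent is 64MB and spans 64 AUs. Only the control file is fine-grained; it is striped with _asm_stripewidth=8, exactly as in 10g.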

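(4), With AU=64MB: again every file type except the control file is coarse-grained. While a file is under about 1280GB, each extent is 64MB; from there up to about 11520GB each extent is 512MB; beyond that each extent is 4096MB. The fine-grained control file is striped with _asm_stripewidth=8, so the stripe chunk size is 1 AU/8 = 64MB/8 = 8MB. If the disk group has one disk, an extent is split into eight 8MB stripe chunks that all sit in a single AU. With two disks, the extent is spread over two AUs on the two disks, each holding four stripe chunks. The pattern continues as disks are added.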

(5), In 11g R2 the rule changed: the extent size steps through 1, 4, and 16 AUs instead of 1, 8, and 64.
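The size thresholds quoted in (3) and (4) follow directly from the extent tables. The sketch below recomputes them, assuming 20,000 extents per tier and the 11g R1 multipliers (1, 8, 64) from the tables; the function names are hypothetical, and a different multipliers tuple can be passed to model another release.

```python
EXTENTS_PER_TIER = 20000  # extents 0-19999, 20000-39999, then 40000+


def extent_size_in_aus(extent_no: int, multipliers=(1, 8, 64)) -> int:
    """Number of AUs in the extent with the given extent number."""
    tier = min(extent_no // EXTENTS_PER_TIER, len(multipliers) - 1)
    return multipliers[tier]


def tier_limits_mb(au_mb: int, multipliers=(1, 8, 64)) -> list:
    """Cumulative file size in MB at which each of the first two tiers ends."""
    limits, total = [], 0
    for m in multipliers[:-1]:
        total += EXTENTS_PER_TIER * m * au_mb
        limits.append(total)
    return limits


print(extent_size_in_aus(0), extent_size_in_aus(20000), extent_size_in_aus(40000))
print(tier_limits_mb(1))    # AU = 1MB
print(tier_limits_mb(64))   # AU = 64MB
```

For AU=1MB the tiers end at 20,000MB and 180,000MB (roughly 20GB and 180GB); for AU=64MB they end at 1,280,000MB and 11,520,000MB (roughly 1280GB and 11520GB).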

Note that the coarse-grained and fine-grained file sizes above refer to the combined size of all database files stored in ASM.

Coarse-grain striping spreads allocation units across the disks in a disk group. This is what provides load balancing for disk groups. When a file is allocated, ASM spreads allocation units evenly across all of the disks. Sometimes the distribution cannot be perfectly even, but over time it will tend to be nearly equal.

The above diagram shows a file with five allocation units striped across five disks in an external redundancy disk group containing eight disks in total.
For the first 20,000 extents, the extent size is equal to the AU size. After 20,000 extents and up to 40,000 extents, the extent sets are always allocated 8 at a time with the extent size equal to 4*AU size. If the AU size is 1 MB, this means the ASM file will grow 32 MB at a time (8 * 4 * 1 MB). If the file is coarse-grained striped then it is striped across the 8 extent sets with stripes of 1 AU. Striping is always done at the AU level, not at the extent level. Thus every AU of a coarse-grained file is on a different disk than the previous AU of that file no matter how large the file. After 40,000 extents, the extents are still allocated 8 at a time, but with an extent size equal to 16*AU size.

Fine-Grained Striping
Fine-grain striping splits data extents into 128 KB chunks and it is provided to improve latency for certain types of files by spreading the load for each extent across a number of disks. Fine grain striping is used by default for control files and online redo log files.
The diagram on this page shows how fine grain striping works. In this example, the first 1 MB extent of a new file ends up occupying the first 128 KB of 8 different allocation units spread across the eight disks in the disk group. Consequently, a one-megabyte read or write is spread across eight disks instead of one.
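The placement described above can be modelled as a simple offset calculation. The sketch below assumes a 1 MB AU, 128 KB chunks, and a stripe set of 8 AUs; it is a simplified RAID-0 style model for illustration, not ASM's actual allocation code, and the name locate is hypothetical.

```python
AU_SIZE = 1024 * 1024          # 1 MB allocation unit (assumed)
STRIPE_SIZE = 128 * 1024       # 128 KB stripe chunk
STRIPE_WIDTH = 8               # AUs in one stripe set
SET_BYTES = STRIPE_WIDTH * AU_SIZE


def locate(offset: int):
    """Map a file byte offset to (stripe set, AU index in the set, offset inside that AU)."""
    stripe_set, within = divmod(offset, SET_BYTES)
    chunk, in_chunk = divmod(within, STRIPE_SIZE)
    au_index = chunk % STRIPE_WIDTH        # chunks rotate over the 8 AUs
    row = chunk // STRIPE_WIDTH            # chunks already placed in that AU
    return stripe_set, au_index, row * STRIPE_SIZE + in_chunk


# The first 1 MB of the file: one chunk at the start of each of the 8 AUs.
for i in range(8):
    print(i * STRIPE_SIZE, locate(i * STRIPE_SIZE))
```

In this model the second megabyte of the file starts again at AU 0, 128 KB into it, so a sequential 1 MB read or write always touches all eight AUs.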

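Example: create a data file in ASM, then look at its AU distribution (and its mirror copies)
sqlplus / as sysasm
startup
After startup, first check whether the disk groups are mounted: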

SQL> select GROUP_NUMBER,NAME,STATE,ALLOCATION_UNIT_SIZE,TYPE from V$ASM_DISKGROUP;

GROUP_NUMBER NAME       STATE       ALLOCATION_UNIT_SIZE TYPE

           1 DATA       MOUNTED                  1048576 HIGH
           2 FRA        MOUNTED                  1048576 NORMAL

If a disk group is not mounted, mount it with:
alter diskgroup …. mount;

View disk group information
Identify the disk groups in the ASM instance:

SQL> select group_number,name from v$asm_diskgroup;

GROUP_NUMBER NAME

       1 DATA       <- disk group number 1
       2 FRA

View ASM disk information

SQL> select group_number,disk_number,path from v$asm_disk
where group_number=1;

GROUP_NUMBER DISK_NUMBER PATH

       1           0 ORCL:DISK1
       1           1 ORCL:DISK2
       1           2 ORCL:DISK3

Reading the data file
Create a new data file in ASM, then look at its AU distribution.
Create the following tablespace:
create tablespace tbs_tst01 datafile '+DATA/tbs_tst01_00.dbf' size 10M autoextend off;

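We created a 10MB data file, so it should occupy 10 AUs. Because the redundancy level is HIGH (three-way mirroring), there should be 30 AUs in total. First look up its file index number in ASM
(run the following statement in the ASM instance):
SQL> select name,file_number from v$asm_alias where name like 'tbs_tst01%';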

NAME                 FILE_NUMBER

tbs_tst01_00.dbf             270

The FILE_NUMBER column is the file number, usually called the ASM file index number. Here the index number of tbs_tst01_00.dbf is 270.

To see the AU distribution of a particular file, query the X$KFFXP view in the ASM instance.

X$KFFXP
This X$ table contains the mapping between files, extents and allocation units. It lets you track the position of all the extents of a given file striped and mirrored across storage.
Note: RDBMS read operations access only the primary extent of a mirrored pair (unless there is an I/O error). Write operations instead write all mirrored extents to disk.

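This view is easy to understand. Its main columns are:
GROUP_KFFXP : disk group number
NUMBER_KFFXP : file number
PXN_KFFXP : physical extent number
XNUM_KFFXP : logical extent number
LXN_KFFXP : 0=primary, 1=first mirror, 2=second mirror
DISK_KFFXP : disk number
AU_KFFXP : AU number
Normally one AU is one extent. The difference between logical and physical extents is this: if the redundancy mode is Normal with two failgroups, each AU of the file can be called a logical extent. It maps to one AU in each of the two failgroups, and each of those AUs is called a physical extent.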

Reading ASM files with direct OS access
(1) Find the 3 mirrored extents of an ASM file
su - grid
sqlplus / as sysasm

select GROUP_KFFXP,DISK_KFFXP,AU_KFFXP from x$kffxp
where number_kffxp=(select file_number from v$asm_alias where name='tbs_tst01_00.dbf');

GROUP_KFFXP DISK_KFFXP AU_KFFXP

      1          0       3380
      1          1       3380
      1          2       3380
      1          1       3381
      1          2       3381
      1          0       3381
      1          2       3382
      1          1       3382
      1          0       3382
      1          0       3383
      1          1       3383
      1          2       3383
      1          1       3384
      1          2       3384
      1          0       3384
      1          2       3385
      1          1       3385
      1          0       3385
      1          0       3386
      1          1       3386
      1          2       3386
      1          1       3387
      1          2       3387
      1          0       3387
      1          2       3388
      1          1       3388
      1          0       3388
      1          0       3389
      1          1       3389
      1          2       3389
      1          1       3390
      1          2       3390
      1          0       3390

33 rows selected.
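The query returns 33 rows, that is, 11 AUs with 3 copies each. This is one AU more than the 10 AUs expected for a 10MB data file.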
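The 33 rows can be reproduced with a short calculation if we assume that the data file on disk is one database block larger than the requested size (a header block) and that the block size is 8KB. Both values are assumptions, not something shown in the output above, and expected_rows is a hypothetical helper name.

```python
def expected_rows(file_mb: int, au_mb: int = 1, copies: int = 3,
                  block_kb: int = 8, header_blocks: int = 1) -> int:
    """Rows expected in X$KFFXP for one file: AUs needed times mirror copies.

    block_kb and header_blocks are assumptions about the database
    (8KB blocks, one extra header block per data file).
    """
    size_kb = file_mb * 1024 + header_blocks * block_kb
    au_kb = au_mb * 1024
    aus = (size_kb + au_kb - 1) // au_kb   # round up to whole AUs
    return aus * copies


print(expected_rows(10))                    # with the assumed header block
print(expected_rows(10, header_blocks=0))   # the naive 10 AU x 3 estimate
```

Under these assumptions the extra 8KB spills into an eleventh AU, giving 11 * 3 = 33 rows, while the naive estimate without the header block gives 30.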

(2) Find the Oracle ASM disk names
select disk_number,path
from v$asm_disk
where GROUP_NUMBER=1 and disk_number in (0,1,2);

DISK_NUMBER PATH

      0 ORCL:DISK1
      1 ORCL:DISK2
      2 ORCL:DISK3

ASM Failure Groups
Within a disk group, disks may be collected into failure groups. Failure groups are the way a storage or database administrator specifies the hardware boundaries that ASM mirroring operates across.
For example, all the disks attached to a single disk controller could be specified to be within a common failure group. This would lead to file extents being mirrored on disks connected to separate controllers. Additionally, an administrator can configure ASM to choose a default failure group policy. The default policy is that each individual disk is in its own failure group.
You can group disks into failure groups using whatever criteria you need. Failure groups can be used to protect from the failure of individual disks, disk controllers, I/O network components, and even entire storage systems. Typically, an administrator would analyze their storage environment and would organize failure groups to mitigate specific failure scenarios.
It is up to the database or storage administrator to determine what is the best failure group configuration for his or her installation.

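Disk group mirroring and failure groups
A failure group is a set of disks within a particular disk group that share a common resource whose failure must be tolerated. For example, a failure group can be a set of SCSI disks attached to a common SCSI controller.

Before defining the mirroring type for a disk group, you must group the disks into failure groups.
Unless a disk is explicitly assigned to a failure group, each disk in the disk group is assigned to its own failure group.

Once the failure groups are defined, you can define the mirroring for the disk group. The number of failure groups available in the disk group can limit the mirroring types available to it. There are three mirroring types: external redundancy, normal redundancy, and high redundancy.

External redundancy: requires only one disk location. It assumes either that the disk is not critical to ongoing database operation or that the disk is managed externally by high-availability hardware (such as a RAID controller).

Normal redundancy: provides two-way mirroring and requires at least two failure groups in the disk group. The failure of one disk in a failure group causes no downtime or data loss for the disk group, apart from some performance impact on queries against objects in the disk group. When all disks in the failure groups are online, read performance generally improves because the requested data is available on more than one disk.

High redundancy: provides three-way mirroring and requires at least three failure groups in the disk group. Disk failures in any two failure groups are largely invisible to database users, just as a single failure is under normal redundancy.
Summary:
ASM provides three redundancy methods.
external redundancy means Oracle does not manage mirroring for you; an external storage system provides it, for example through RAID.
normal redundancy (the default) means Oracle protects the data with two-way mirroring.
high redundancy means Oracle protects the data with three-way mirroring.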

Stripe and Mirror Example
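Example: ASM failgroups
First, a failgroup must be a subset of a diskgroup, and a failgroup can belong to only one diskgroup.
When mirroring is required (for example, normal redundancy), the data is stored twice. If the first copy is in failgroup A, the second copy is always placed in some failgroup other than A.
So with normal redundancy (data copy=2), the failure of any single failgroup is harmless.

Two or three failgroups is only the lower limit, not an upper limit.
The documentation only says: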
A normal redundancy disk group must contain at least two failure groups.
A high redundancy disk group must contain at least three failure groups.

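With redundancy=normal and exactly two failgroups, each failgroup holds a complete mirror copy of the data, which most people consider the most natural layout.
With more than two failgroups, since data copy=2, no single failgroup can hold all of the data, but this is still reasonable.
The benefit of failgroups is that losing all the data in one failgroup (normal redundancy) or two failgroups (high redundancy) causes no problem.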
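The claim that normal redundancy tolerates the loss of any one failgroup, and high redundancy any two, can be checked by brute force. The sketch below only assumes that the copies of an extent always go to distinct failgroups, as described above; it does not model how ASM actually picks partner disks, and survives_any_loss is a hypothetical name.

```python
from itertools import combinations


def survives_any_loss(failgroup_count: int, copies: int, lost: int) -> bool:
    """True if every possible extent placement keeps at least one copy
    after any `lost` failgroups fail."""
    groups = range(failgroup_count)
    for placement in combinations(groups, copies):   # copies sit in distinct failgroups
        for failed in combinations(groups, lost):
            if set(placement) <= set(failed):        # every copy was in a failed group
                return False
    return True


print(survives_any_loss(4, copies=2, lost=1))   # normal redundancy, one failgroup lost
print(survives_any_loss(4, copies=2, lost=2))   # normal redundancy, two failgroups lost
print(survives_any_loss(4, copies=3, lost=2))   # high redundancy, two failgroups lost
```

The result depends only on the number of copies relative to the number of lost failgroups, not on how many failgroups exist, which is why the documented counts are lower limits rather than upper limits.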

The x$kffxp view provides the mapping of every extent of every ASM file to the disks.
Since the AU is usually 1MB, a 10MB file is cut into 10 extents stored on different disks, which spreads the disk I/O.

NUMBER_KFFXP: corresponds to v$asm_file.FILE_NUMBER
XNUM_KFFXP: extent number of the ASM file. With normal redundancy each extent appears twice; with high redundancy, three times.
DISK_KFFXP: corresponds to v$asm_disk.DISK_NUMBER
LXN_KFFXP: 0->primary extent, 1->mirror extent, 2->2nd mirror copy (high redundancy and metadata)

First, my disk group uses HIGH redundancy:

select type from v$asm_diskgroup;

TYPE

HIGH     <- my DATA disk group uses high redundancy
EXTERN

Create a 4MB datafile:
create tablespace test datafile '+DATA' size 4M;

Find the file number of the datafile:
asmcmd
ls +data/orcl/datafile
TEST.268.800982215

Or through the view:
select name,file_number from v$asm_alias where name like 'TEST%';

NAME FILE_NUMBER

TEST.268.800982215 268

The file number of this data file is 268.

Now look at how this 4MB file is distributed across the disks:
select disk_kffxp disk#,
XNUM_KFFXP extent#,
case lxn_kffxp
when 0 then 'Primary Copy'
when 1 then 'Mirrored Copy'
when 2 then '2nd Mirrored Copy or metadata'
else 'Unknown' END TYPE
from x$kffxp
where
number_kffxp=268
and xnum_kffxp!=65534
order by 2;
Result:
DISK# EXTENT# TYPE

     1          0 Primary Copy
     2          0 Mirrored Copy
     0          0 2nd Mirrored Copy or metadata
     2          1 Primary Copy
     1          1 Mirrored Copy
     0          1 2nd Mirrored Copy or metadata
     0          2 Primary Copy
     1          2 Mirrored Copy
     2          2 2nd Mirrored Copy or metadata
     1          3 Primary Copy
     2          3 Mirrored Copy
     0          3 2nd Mirrored Copy or metadata
     2          4 Primary Copy
     1          4 Mirrored Copy
     0          4 2nd Mirrored Copy or metadata

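As shown above, every extent is stored three times, as expected.
We have 3 disks.
Because I did not specify failgroups when creating the diskgroup,
each disk is its own failgroup, so I currently have 3 failgroups.
This shows that the number of failgroups is not directly tied to normal/high redundancy; redundancy only sets a lower limit.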
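The same check can be done mechanically on the query result. The sketch below copies the (DISK#, EXTENT#) pairs from the output above and confirms that the three copies of each extent sit on three different disks, and therefore in three different failgroups since each disk is its own failgroup here.

```python
from collections import defaultdict

# (disk#, extent#) pairs copied from the query result above
rows = [
    (1, 0), (2, 0), (0, 0),
    (2, 1), (1, 1), (0, 1),
    (0, 2), (1, 2), (2, 2),
    (1, 3), (2, 3), (0, 3),
    (2, 4), (1, 4), (0, 4),
]

disks_by_extent = defaultdict(list)
for disk, extent in rows:
    disks_by_extent[extent].append(disk)

for extent, disks in sorted(disks_by_extent.items()):
    # three copies, each on a different disk
    print(extent, disks, len(set(disks)) == 3)
```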

SQL> select name,FAILGROUP from v$asm_disk;

NAME FAILGROUP

DISK1 DISK1
DISK2 DISK2
DISK3 DISK3
DISK4 DISK4

Next, look at another disk group that uses external redundancy:

SQL> select TYPE from v$asm_diskgroup;

TYPE

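EXTERN   <- external redundancy

Again, because I did not explicitly specify failgroups, each disk is its own failgroup, so the number of failgroups equals the number of disks, which is 2.
But since data copy=1, no second mirror copy is stored, so failgroups have no meaning here.

SQL> select name,FAILGROUP from v$asm_disk;   -- there are two disks here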
NAME FAILGROUP

DATA01 DATA01
DATA02 DATA02

SQL> create tablespace testa datafile '+DATA' size 4M;   -- create a 4MB tablespace; with the default AU=1MB it is split into 4 pieces
Tablespace created.

asm alias|grep testa
DATA 287 +DATA/xxx/DATAFILE/TESTA.287.729561149

select disk_kffxp disk#,
XNUM_KFFXP extent#,
case lxn_kffxp
when 0 then 'Primary Copy'
when 1 then 'Mirrored Copy'
when 2 then '2nd Mirrored Copy or metadata'
else 'Unknown' END TYPE
from x$kffxp
where
number_kffxp=287
and xnum_kffxp!=65534
order by 2;
DISK# EXTENT# TYPE

      1          0 Primary Copy
      0          1 Primary Copy
      1          2 Primary Copy
      0          3 Primary Copy
      1          4 Primary Copy

This time there are only primary copies.
