Some RAC concepts and theory (reposted from an article by Dai Mingming)

2010-08-06  Source: original to this site  Category: Database  Views: 199

1.1 Concurrency Control

In a clustered environment, critical data is usually kept on shared storage, such as a shared disk. Every node has the same access to that data, so some mechanism must control how the nodes access it. Oracle RAC uses a DLM (Distributed Lock Management) mechanism for concurrency control among multiple instances.

1.2 Amnesia

A cluster's configuration is not stored centrally; instead, each node keeps a local copy. While the cluster is running normally, a user can change the configuration on any node, and the change is automatically synchronized to the other nodes.

There is a special case: node A is shut down normally, the configuration is changed on node B, node B is then shut down, and node A is started. Node A never received the change, so the modified configuration is lost. This is called amnesia.

1.3 Split Brain

In a cluster, the nodes use a heartbeat mechanism to learn about each other's health, so that all nodes can work in coordination. Suppose only the heartbeat fails while every node keeps running. Each node then believes that the other nodes are down, that it is the only survivor in the cluster, and that it should take control of the whole cluster. Because the storage devices in a cluster are shared, this leads to data corruption. This situation is called "split brain".

The usual way to solve this problem is a voting method (quorum algorithm). It works as follows:

The nodes in the cluster use the heartbeat mechanism to report their health to one another. Assume that each report a node receives counts as one vote. In a three-node cluster running normally, each node holds three votes. If node A's heartbeat fails while node A itself keeps running, the cluster splits into two partitions: node A on its own, and the remaining two nodes. One of the partitions must be evicted so that the cluster can keep running in a healthy state.

In this three-node cluster, after A's heartbeat fails, B and C form a partition with 2 votes, while A has only 1 vote. According to the voting algorithm, B and C take control and form the cluster, and A is evicted.

With only two nodes, the voting algorithm no longer works, because each node has just 1 vote. A third device must therefore be introduced: the quorum device. The quorum device is usually a shared disk, which is also called the quorum disk. The quorum disk also represents one vote. When the heartbeat between the two nodes fails, both nodes compete for the quorum disk's vote at the same time, and the request that arrives first is served first. The node that reaches the quorum disk first therefore holds 2 votes, and the other node is evicted.
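The voting rule described above can be sketched in a few lines of Python. This is only an illustrative model, not Oracle's implementation: the function name `surviving_partition` and its parameters are made up for this example, each node is assumed to contribute one vote to its partition, and the quorum disk adds one vote to the partition of whichever node reached it first.

```python
def surviving_partition(partitions, quorum_disk_winner=None):
    """Return the partition that keeps control, or None if the vote is tied.

    partitions: list of sets of node names that can still hear each other.
    quorum_disk_winner: hypothetical name of the node that reached the
        quorum disk first; None when no quorum disk is used.
    """
    # One vote per node in the partition.
    votes = {frozenset(p): len(p) for p in partitions}

    # The quorum disk contributes one extra vote to the winner's partition.
    if quorum_disk_winner is not None:
        for partition in votes:
            if quorum_disk_winner in partition:
                votes[partition] += 1

    best = max(votes.values())
    winners = [p for p, v in votes.items() if v == best]
    if len(winners) != 1:
        return None  # tie: no partition can safely take control
    return set(winners[0])


# Three nodes, A loses its heartbeat: B and C outvote A.
print(surviving_partition([{"A"}, {"B", "C"}]))
# Two nodes, no quorum disk: the vote is tied.
print(surviving_partition([{"A"}, {"B"}]))
# Two nodes, B reaches the quorum disk first and wins.
print(surviving_partition([{"A"}, {"B"}], quorum_disk_winner="B"))
```

The tie case is the reason a two-node cluster needs the extra vote: without it, neither side can prove it is the majority.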

1.4 I/O Fencing

When a cluster runs into split brain, the voting method settles which partition gets control of the cluster. But that is not enough: we must also make sure that the evicted nodes can no longer operate on the shared data. This is the problem that I/O fencing solves.

I/O fencing can be implemented in two ways, in software or in hardware:

Software approach: for storage devices that support the SCSI Reserve/Release commands, fencing can be implemented with SG commands. A healthy node uses the SCSI Reserve command to "lock" the storage device. When a faulty node finds that the storage device is locked, it knows that it has been evicted from the cluster, that is, that it is in an abnormal state, and it should reboot itself to return to normal. This mechanism is called suicide. Sun and Veritas use this mechanism.

Hardware approach: STONITH (Shoot The Other Node In The Head). This approach operates the power switch directly. When a node fails and another node detects the failure, the detecting node sends a command through the serial port to the power switch of the failed node, and a brief power-off followed by power-on restarts the failed node. This approach requires hardware support.

2 RAC cluster

2.1 Clusterware

In a stand-alone environment, Oracle runs on top of the OS kernel. The OS kernel manages the hardware devices and provides the interface for accessing them. Oracle does not operate the hardware directly; it asks the OS kernel to carry out hardware requests on its behalf.

In a cluster environment, the storage devices are shared. An OS kernel is designed for a single machine and can only coordinate access among multiple processes on that machine. Relying on the OS kernel alone cannot guarantee coordination among multiple hosts, so an additional control mechanism is needed. In RAC this mechanism is Clusterware, which sits between Oracle and the OS kernel: it intercepts a request before the OS kernel sees it, negotiates with the Clusterware on the other nodes, and finally completes the upper layer's request.

Before Oracle 10g, RAC depended on clusterware from vendors such as Sun, HP, and Veritas. Starting with Oracle 10.1, Oracle introduced its own cluster product, Cluster Ready Services (CRS), and from then on RAC no longer depended on any vendor's cluster software. In Oracle 10.2 the product was renamed Oracle Clusterware.

So an entire RAC environment actually contains two clusters: one formed by the Clusterware software, and another formed by the database.

2.2 Clusterware components

Oracle Clusterware is a separate installation package. Once installed, Oracle Clusterware starts automatically on each node. Its runtime environment consists of two disk files (OCR and Voting Disk), a number of processes, and network elements.

2.2.1 Disk files

Clusterware needs two files while it runs: OCR and Voting Disk. Both files must be stored on shared storage. OCR is used to solve the amnesia problem, and Voting Disk is used to solve the split brain problem. Oracle recommends storing the two files on raw devices, one raw device per file; about 100 MB of space for each raw device is enough.

OCR

The amnesia problem arises because every node has its own copy of the configuration information, and changes to one copy are not synchronized to the others. Oracle's solution is to put the configuration file on shared storage; this file is the OCR disk.

OCR stores the cluster configuration information as key-value pairs. Before Oracle 10g, this file was called the Server Manageability Repository (SRVM). In Oracle 10g it was redesigned and renamed OCR. During Oracle Clusterware installation, the installer prompts the user for the OCR location, and the location given is recorded in /etc/oracle/ocr.loc (Linux) or /var/opt/oracle/ocr.loc (Solaris). In Oracle 9i RAC, the equivalent is the srvConfig.loc file. At startup, Oracle Clusterware reads the OCR content from the location recorded in this file.

1). OCR keys

The OCR content is organized as a tree with three main branches: SYSTEM, DATABASE, and CRS. Each branch has many sub-branches. The recorded information can be modified only by the root user.

2). OCR process

Oracle Clusterware stores the cluster configuration in the OCR, so the OCR content is very important, and every operation on the OCR must preserve its integrity. For this reason, while Oracle Clusterware is running, not every node is allowed to operate on the OCR disk.

Each node keeps a copy of the OCR content in memory, called the OCR cache. Each node also has an OCR process that reads and writes the OCR cache, but only one node's OCR process may read and write the OCR disk; this node is called the OCR master node. The master node's OCR process is responsible for updating the OCR cache on its own node and on the other nodes.
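The cache-and-master arrangement can be pictured with a small Python model. It is a simplified sketch under assumptions of my own: the class `OcrNode`, the helper `join`, and the use of a plain dictionary as the "OCR disk" are all hypothetical and say nothing about how Oracle actually implements the OCR process.

```python
class OcrNode:
    """One cluster node with its in-memory OCR cache (hypothetical model)."""

    def __init__(self, name, ocr_disk, is_master=False):
        self.name = name
        self.ocr_disk = ocr_disk      # shared dict standing in for the OCR disk
        self.is_master = is_master
        self.cache = dict(ocr_disk)   # local in-memory copy of the OCR content
        self.cluster = [self]         # all nodes; filled in by join()

    def read(self, key):
        # Reads are served from the local cache, never from the disk.
        return self.cache.get(key)

    def master_write(self, key, value):
        # Only the master node's OCR process may touch the OCR disk.
        if not self.is_master:
            raise PermissionError(f"{self.name} is not the OCR master")
        self.ocr_disk[key] = value
        for node in self.cluster:     # refresh the local and remote caches
            node.cache[key] = value


def join(nodes):
    """Make every node aware of every node in the cluster."""
    for node in nodes:
        node.cluster = nodes


disk = {"example.key": "old"}         # illustrative key, not a real OCR key
master = OcrNode("node1", disk, is_master=True)
other = OcrNode("node2", disk)
join([master, other])

master.master_write("example.key", "new")
print(other.read("example.key"))      # the change is visible in node2's cache
```

Funnelling every disk write through one node is what keeps the copies consistent: a write either reaches the disk and every cache, or it is rejected.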

All other processes that need the OCR content, such as OCSSD and EVM, are called client processes. They do not access the OCR cache directly; instead, they send a request to the OCR process and obtain the content through it. To modify the OCR content, the node's OCR process submits a request to the OCR process on the master node. The master OCR process performs the physical read and write and then synchronizes the OCR cache on all nodes.

Voting Disk

The Voting Disk mainly records the membership state of the nodes. When split brain occurs, it decides which partition gets control; the other partitions must be evicted from the cluster. The Clusterware installer also prompts for this location. After installation, the Voting Disk location can be checked with the following command:

$ crsctl query css votedisk

2.2.2 Clusterware background processes

Clusterware consists of a number of processes, the three most important being CRSD, CSSD, and EVMD. In the final stage of the Clusterware installation, the root.sh script must be run on each node. The script appends entries for these three processes to the end of the /etc/inittab file, so that Clusterware starts automatically on every subsequent system boot. If EVMD or CRSD fails, the system restarts that process automatically; if CSSD fails, the system reboots immediately.


1). OCSSD

OCSSD is the most critical Clusterware process; if it fails, the system reboots. This process provides the CSS (Cluster Synchronization Service) service. The CSS service monitors the cluster state in real time through several heartbeat mechanisms and provides the basic cluster services such as split brain protection.

The CSS service has two heartbeat mechanisms: the Network Heartbeat, which runs over the private network, and the Disk Heartbeat, which runs through the Voting Disk.

Each heartbeat has a maximum delay. For the Disk Heartbeat this delay is called IOT (I/O Timeout); for the Network Heartbeat it is called MC (Misscount). Both parameters are in seconds, and by default IOT is greater than MC. By default both values are determined automatically by Oracle, and adjusting them is not recommended. The values can be checked with the following commands:

$ crsctl get css disktimeout

$ crsctl get css misscount
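As a rough illustration of how two separate timeouts can be evaluated, the sketch below compares the age of the last heartbeat of each kind with its limit. The function `heartbeat_status` and the numbers in the example are assumptions chosen for this illustration; they are not Oracle defaults and do not describe CSS internals.

```python
def heartbeat_status(now, last_network_hb, last_disk_hb, misscount, disktimeout):
    """Report whether each heartbeat is still within its maximum delay.

    All arguments are in seconds. misscount limits the network heartbeat,
    disktimeout limits the disk heartbeat.
    """
    return {
        "network_ok": (now - last_network_hb) <= misscount,
        "disk_ok": (now - last_disk_hb) <= disktimeout,
    }


# Illustrative values only: the network heartbeat is 40 s old with a 30 s
# limit, the disk heartbeat is 40 s old with a 200 s limit.
print(heartbeat_status(now=100, last_network_hb=60, last_disk_hb=60,
                       misscount=30, disktimeout=200))
```

Because the disk limit is the larger of the two, the same silence can count as a network heartbeat failure while the disk heartbeat is still considered healthy.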

Note: besides Clusterware, this process is also needed in a single-node environment when ASM is used, where it supports communication between the ASM instance and the RDBMS instance. Installing RAC on a node that already uses ASM raises a problem: a RAC node needs only one OCSSD process, and it should be the one running from the $CRS_HOME directory. In that case, stop ASM first and remove the existing inittab entry with $ORACLE_HOME/bin/localconfig.sh delete. When ASM was installed earlier, the same script was used to start OCSSD: $ORACLE_HOME/bin/localconfig.sh add.

2). CRSD

CRSD is the main process that implements high availability (HA). The service it provides is called CRS (Cluster Ready Service).

Oracle Clusterware is a cluster-level component that provides high-availability services to application-layer resources (CRS resources). It must therefore monitor these resources and intervene when they run abnormally, by shutting them down, restarting them, or relocating services. The CRSD process provides these services.

Every component that needs high availability is registered in the OCR as a CRS resource when it is installed and configured. The CRSD process uses the OCR content to decide which processes to monitor, how to monitor them, and what to do when a problem occurs. In other words, CRSD is responsible for monitoring CRS resources and for starting, stopping, monitoring, and failing over these resources. By default, CRS automatically tries to restart a resource 5 times; if the resource still fails, CRS stops trying.
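The bounded-restart behaviour mentioned at the end of the paragraph above can be sketched as a simple loop. The names `restart_resource` and `start` are hypothetical, and the sketch ignores everything else CRSD does (monitoring intervals, failover to another node, and so on).

```python
def restart_resource(start, max_restarts=5):
    """Try to start a resource up to max_restarts times.

    start: a callable that returns True when the resource came up.
    Returns the number of the attempt that succeeded, or None when
    every attempt failed and the supervisor gives up.
    """
    for attempt in range(1, max_restarts + 1):
        if start():
            return attempt
    return None


# A hypothetical resource that only comes up on the third try.
calls = {"n": 0}

def flaky_start():
    calls["n"] += 1
    return calls["n"] >= 3

print(restart_resource(flaky_start))      # succeeds on attempt 3
print(restart_resource(lambda: False))    # gives up after 5 attempts
```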

CRS resources include GSD (Global Service Daemon), ONS (Oracle Notification Service), VIP, Database, Instance, and Service. These resources fall into two categories:

GSD, ONS, VIP, and Listener belong to the Nodeapps class.

Database, Instance, and Service belong to the Database-Related Resource class.

One way to read this: Nodeapps are resources of which each node needs only one, for example one Listener per node. Database-related resources are tied to a database and are not limited by node; for example, a node can have multiple instances, and each instance can have multiple services.

GSD, ONS, and VIP are created and registered in the OCR when VIPCA runs at the end of the Clusterware installation. Database, Listener, Instance, and Service are registered in the OCR, automatically or manually, when each of them is configured.

3). EVMD

EVMD is responsible for publishing the events generated by CRS. Events can be delivered to clients in two ways: ONS and callout scripts. Users can write their own callout scripts and place them in a specific directory; when an event occurs, EVMD scans that directory and invokes the user's scripts. The call itself is carried out by the racgevt process.

Besides publishing events, EVMD also acts as a bridge between the CRSD and CSSD processes: communication between the CRS and CSS services goes through EVMD.


4). RACGIMON

RACGIMON checks the health of the database and is responsible for starting, stopping, and failing over services. It holds a persistent connection to the database and periodically checks specific information in the SGA, which is updated regularly by the PMON process.

5). OPROCD

OPROCD is also called the Process Monitor Daemon. It appears on non-Linux platforms when no third-party cluster software is used. It checks the node for processor hangs: if the scheduling delay exceeds 1.5 seconds, it considers the CPU to be malfunctioning and reboots the node. In other words, this process provides the I/O fencing function, as the name of its service on Windows, OraFenceService, suggests. On Linux, the hangcheck-timer module provides the I/O fencing function instead.

2.3 VIP principles and characteristics

Oracle's TAF is built on VIP technology. The difference between an ordinary IP and a VIP is this: with an ordinary IP, a failure is detected through TCP-layer timeouts, whereas with a VIP the application layer gets an immediate response. A VIP is a floating IP: when a node has a problem, its VIP automatically moves to another node.

Suppose a two-node RAC in which each node has one VIP during normal operation: VIP1 and VIP2. When node 2 fails, for example through an abnormal shutdown, RAC does the following:

1). CRS detects the failure of node rac2 and triggers a Clusterware reconfiguration, which eventually evicts rac2 from the cluster; node 1 forms the new cluster.

2). RAC's failover mechanism moves node 2's VIP to node 1. The public NIC of node 1 then has three IP addresses: VIP1, VIP2, and public IP1.

3). Connection requests from users to VIP2 are routed to node 1 at the IP layer.

4). Because node 1 holds the VIP2 address, all packets pass through the routing layer, network layer, and transport layer.

5). However, node 1 listens only on VIP1 and public IP1, not on VIP2, so no application at the application layer receives the packets, and the error is detected immediately.

6). The client receives this error immediately and then re-issues its connection request to VIP1.
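The steps above can be mimicked with a toy model that shows why moving the VIP turns a long timeout into an immediate error. Everything here is an assumption made for illustration: the functions `connect` and `fail_over_vip`, the dictionary layout, and the three result strings are invented, and no real networking is involved.

```python
def connect(address, nodes):
    """Classify what a client sees when it connects to an address.

    nodes maps a node name to a dict with:
      "up":        whether the node is running
      "addresses": IP addresses currently bound to its public NIC
      "listening": addresses the node's listener accepts connections on
    """
    for node in nodes.values():
        if node["up"] and address in node["addresses"]:
            if address in node["listening"]:
                return "connected"
            return "refused"      # address is reachable but nobody listens
    return "timeout"              # nobody holds the address at all


def fail_over_vip(nodes, failed, survivor, vip):
    """Move the VIP of a failed node to a surviving node."""
    nodes[failed]["up"] = False
    nodes[failed]["addresses"].discard(vip)
    nodes[survivor]["addresses"].add(vip)   # the listener set is unchanged


nodes = {
    "rac1": {"up": True, "addresses": {"VIP1", "IP1"}, "listening": {"VIP1", "IP1"}},
    "rac2": {"up": True, "addresses": {"VIP2", "IP2"}, "listening": {"VIP2", "IP2"}},
}

fail_over_vip(nodes, failed="rac2", survivor="rac1", vip="VIP2")
print(connect("VIP2", nodes))   # immediate error instead of a TCP timeout
print(connect("VIP1", nodes))   # the retry against VIP1 succeeds
```

In this model the "refused" result corresponds to step 5 and the successful retry to step 6; without the failover call, the same request to VIP2 would fall through to "timeout".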

VIP features:

1). The VIP is created by the VIPCA script.

2). The VIP is registered in the OCR as a Nodeapps-type CRS resource, and CRS maintains its state.

3). The VIP is bound to the node's public NIC, so the public NIC has two addresses.

4). When a node fails, CRS moves the failed node's VIP to another node.

5). The Listener on each node listens on both the public IP and the VIP of the public NIC.

6). The client's tnsnames.ora is normally configured to point to the nodes' VIPs.

2.4 Clusterware log system

Oracle Clusterware can only be diagnosed through its logs and trace files, and its log system is fairly complicated.

$ORA_CRS_HOME/log/hostname/alert.log: this is the first file to check.

Clusterware daemon logs:

crsd.log: $ORA_CRS_HOME/log/hostname/crsd/crsd.log

ocssd.log: $ORA_CRS_HOME/log/hostname/cssd/ocssd.log

evmd.log: $ORA_CRS_HOME/log/hostname/evmd/evmd.log

Nodeapps logs:

$ORA_CRS_HOME/log/hostname/racg/

This directory holds the Nodeapps logs, including those of ONS and VIP, for example: ora.rac1.ons.log

Logs from command-line tools:

$ORA_CRS_HOME/log/hostname/client/

Clusterware provides a number of command-line tools, for example ocrcheck, ocrconfig, ocrdump, oifcfg, and clscfg. The logs produced by these tools are written to this directory.

$ORACLE_HOME/log/hostname/client/ and $ORACLE_HOME/log/hostname/racg/ also contain related logs.

Note: this article is compiled from Zhang Xiaoming's book "Dahua Oracle RAC" (大话Oracle RAC).

This article comes from a CSDN blog. When reposting, please cite the source: http://blog.csdn.net/tianlesoftware/archive/2010/02/27/5331067.aspx
