Re: split block (torn page) problem

  • From: <allan.robertson@xxxxxxx>
  • To: <Laimutis.Nedzinskas@xxxxxx>
  • Date: Mon, 12 Dec 2011 06:53:51 -0500

Laimis

The Oracle double checksum method relied on "smart" storage systems that knew 
which of the data blocks stored inside the storage system held Oracle 
database pages.

The idea here is that when a host write arrives at the storage destined for a 
region known to hold Oracle data, the array runs a special form of 
"checksum re-verification".  It identifies the incoming 8KB block write as an 
Oracle DB page and uses the Oracle checksum located specifically at bytes 24 
through 32 within that 8KB. This is over and above the regular SCSI data block 
CRC transmission checksumming, etc.

If corruption is detected in the 8KB "Oracle DB page" that has been received, 
the storage system is supposed to immediately return a SCSI IO error to the 
host, as opposed to overwriting the previously stored data with a corrupt block.
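
A minimal Python sketch of that "verify before commit" behaviour follows. It 
is only an illustration: the checksum function is a stand-in (an 8-byte BLAKE2b 
digest), not Oracle's algorithm, "bytes 24 through 32" is read as the half-open 
range [24, 32), and the names seal, handle_write and ScsiIoError are 
hypothetical.

```python
import hashlib

BLOCK_SIZE = 8192            # 8KB database page, as described above
CSUM_SLICE = slice(24, 32)   # assumption: "bytes 24 through 32" read as [24, 32)


class ScsiIoError(Exception):
    """Stand-in for the SCSI IO error returned to the host."""


def stand_in_checksum(block: bytes) -> bytes:
    # NOT Oracle's algorithm: an 8-byte BLAKE2b digest over the block with
    # the checksum field zeroed, used only to make the sketch runnable.
    scratch = bytearray(block)
    scratch[CSUM_SLICE] = bytes(8)
    return hashlib.blake2b(bytes(scratch), digest_size=8).digest()


def seal(block: bytes) -> bytes:
    """Host side: embed the checksum before issuing the write."""
    out = bytearray(block)
    out[CSUM_SLICE] = stand_in_checksum(block)
    return bytes(out)


def handle_write(store: dict, lba: int, block: bytes) -> None:
    """Array side: re-verify first, commit only if the block checks out."""
    if len(block) != BLOCK_SIZE:
        raise ScsiIoError("unexpected block size")
    if block[CSUM_SLICE] != stand_in_checksum(block):
        raise ScsiIoError("embedded checksum mismatch, write rejected")
    store[lba] = block  # reached only for a block that verified
```

Because the check runs before the store is touched, a damaged write raises 
the error and the previously stored copy at that address stays as it was.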

Now, however, Oracle has been working with a number of partners, e.g. EMC and 
Emulex, to drive a new end-to-end data integrity standard through the T10 
standards body.

With this new standard, each component that is T10 PI compliant (T10 PI was 
formerly called T10 DIF) observes an expanded standard in the SCSI IO block 
structure. This ensures that it not only checks the received data for 
correctness, but also passes along the standard data integrity checking 
information to the next physical device inside the SCSI request packet.  Each 
component, starting from the host's HBA port, through the SAN switches, the 
array front side, the array backend ports, the physical drives, etc., should 
enforce the check along the way.
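
As a rough illustration of what travels with the data, the sketch below 
appends an 8-byte protection tuple to a 512-byte sector: a 2-byte guard tag 
(CRC-16, polynomial 0x8BB7), a 2-byte application tag and a 4-byte reference 
tag, here set to the low 32 bits of the LBA as in Type 1 protection. The 
512-byte sector size and the function names (protect, verify, forward) are 
assumptions for the example, not taken from any vendor's implementation.

```python
import struct

SECTOR = 512  # assumption: 512-byte logical blocks


def crc16_t10dif(data: bytes) -> int:
    """Bitwise CRC-16, polynomial 0x8BB7, zero initial value, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x8BB7) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc


def protect(data: bytes, lba: int, app_tag: int = 0) -> bytes:
    """Append guard (2 bytes), application tag (2) and reference tag (4)."""
    ref_tag = lba & 0xFFFFFFFF
    return data + struct.pack(">HHI", crc16_t10dif(data), app_tag, ref_tag)


def verify(sector: bytes, lba: int) -> bool:
    """Check the guard tag against the data and the reference tag against the LBA."""
    data, pi = sector[:SECTOR], sector[SECTOR:]
    guard, _app_tag, ref_tag = struct.unpack(">HHI", pi)
    return guard == crc16_t10dif(data) and ref_tag == (lba & 0xFFFFFFFF)


def forward(hops, sector: bytes, lba: int) -> bytes:
    """Each hop re-checks the tuple, then passes data and tuple on unchanged."""
    for hop in hops:
        if not verify(sector, lba):
            raise IOError(f"{hop}: protection information check failed")
    return sector
```

For example, forward(["HBA", "switch", "array front end", "drive"], 
protect(data, lba), lba) returns the 520-byte sector untouched if nothing was 
damaged, and raises at the first hop that sees a mismatch. None of the hops 
needs to know whether the payload is Oracle data.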

Instead of having to worry about which block being written is "relevant 
Oracle data", inside the storage we are dealing with a standard SCSI write 
request that directs us specifically to take additional checking action on the 
data block received, per the T10 standard definition, and to report any 
errors back in the manner specified by the standard.  The devices 
just need to be T10 DIF compliant.  They do not have to worry about 
distinguishing between Oracle data and something else.

EMC's Yaron Dar, who wrote the techbook you quoted - "Oracle Databases on EMC 
Symmetrix Storage Systems" - presented on T10 PI at OpenWorld this year. A 
copy of the OOW slides can be found here, with the info on T10 PI from slide 33 
onwards:

https://oracleus.wingateweb.com/published/oracleus2011/sessions/33580/S33580_1542630.pdf

Hope that this helps.

Allan
Principal Solutions Engineer,
Enterprise Applications,
Strategic Solutions Engineering
EMC Solutions Group (ESG)


--
//www.freelists.org/webpage/oracle-l

