Questions on LUNs, containers and DB2_PARALLEL_IO

Former Member

In my research on storage configuration (DB2 v9.1 on AIX 5.3) I am getting conflicting advice. Therefore, let me throw out a couple of questions for comment.

1) When using a virtualized SAN storage system (e.g. Hitachi 9990V), is one large LUN (spread over many disks) for sapdata as good as several LUNs (spread over the same number of disks)? Why or why not?

2) Is using one container per tablespace with DB2_PARALLEL_IO properly set as good as using several containers per tablespace?

My goal is maximum performance, but I would like the simplicity of fewer LUNs and containers to manage.
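
For concreteness, the two container layouts I am weighing would look roughly like this (the SID, paths and sizes are just placeholders):

    # Option A: one big container per tablespace, DB2_PARALLEL_IO set
    db2set DB2_PARALLEL_IO="*:12"    # assume ~12 spindles behind each container
    db2 "CREATE TABLESPACE SAPDATA_A
         MANAGED BY DATABASE
         USING (FILE '/db2/C11/sapdata1/SAPDATA_A.container000' 100 G)"

    # Option B: several containers per tablespace (DB2 stripes extents across them)
    db2 "CREATE TABLESPACE SAPDATA_B
         MANAGED BY DATABASE
         USING (FILE '/db2/C11/sapdata1/SAPDATA_B.container000' 25 G,
                FILE '/db2/C11/sapdata2/SAPDATA_B.container001' 25 G,
                FILE '/db2/C11/sapdata3/SAPDATA_B.container002' 25 G,
                FILE '/db2/C11/sapdata4/SAPDATA_B.container003' 25 G)"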

Accepted Solutions (1)

Former Member

Hi Chris,

This is a good question, and there is probably no definitive answer to it. Still, let me describe my perspective:

Everything depends on the OS and its filesystem. From the database's perspective it should not make a difference which of the two configurations you use: the database works with the blocks of the data files regardless of which file a block resides in, and as you found out it can handle a few big files just as well as many small files. The I/O itself is done by the I/O servers (prefetchers), the agents and the page cleaners. I normally configure as many I/O servers as there are disks (but no more than the number of containers in the biggest tablespace, or the number of parallel I/Os per tablespace), and as many page cleaners as there are CPUs. The parallel I/O parameter should then be set according to the decision you make about the number and size of the files. In practice we configure no more than 32 I/O servers.
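
Just as a sketch (the database name C11 and the numbers are only examples, for a host with 16 disks and 8 CPUs), this would translate into something like:

    # one prefetcher (I/O server) per physical disk, capped at about 32
    db2 "UPDATE DB CFG FOR C11 USING NUM_IOSERVERS 16"

    # one page cleaner per CPU
    db2 "UPDATE DB CFG FOR C11 USING NUM_IOCLEANERS 8"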

Now, having said that the problem has little to do with the database itself, let me describe how you could decide which solution is best for you. You have to find out which configuration suits your OS best. I have no experience with AIX 5.3, so I cannot tell you what I would suggest, but I have heard that JFS on AIX 5.2 had some problems with direct I/O and concurrent I/O. If you can use direct and concurrent I/O, you may use a few big files and set DB2_PARALLEL_IO accordingly; if not, you should configure more, smaller files. I tested on Sun Solaris with direct I/O and on HP-UX: Solaris supports concurrent reads and writes, while HP-UX supports only concurrent reads (at least without additional products from Veritas or other vendors). I have no data on other filesystems yet. If you want to find out for yourself, you can use a small test program written by Jonathan Lewis. He is an Oracle guru, but you can adapt it easily to DB2; you will find it in his discussion of whether to use raw devices or filesystems: http://www.jlcomp.demon.co.uk/raworfs_i.html
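
If you want to try direct/concurrent I/O on AIX JFS2 yourself, two possible starting points (mount point and tablespace name are only examples) are:

    # mount the sapdata filesystem with concurrent I/O (JFS2)
    mount -o cio /db2/C11/sapdata1

    # or let DB2 9 bypass the filesystem cache per tablespace
    db2 "ALTER TABLESPACE SAPDATA_A NO FILE SYSTEM CACHING"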

For the size of the filesystems, we limit them to 1 TB. The reason is that not long ago the theoretical limit for many filesystems was 2 TB, and we don't want to test whether all of those limits are really gone. On this point I would recommend that you decide for yourself whether you want fewer filesystems or whether you follow the slightly more conservative track.

I hope this helps you, at least a bit.

Best regards

Ralph

Answers (1)

Former Member

Chris,

(1) If you are using one LUN rather than multiple LUNs, you may see differences in I/O latency. Depending on your storage subsystem, the number of FC adapters and the AIX device driver involved, this may be a measurable effect. I would personally expect multiple LUNs to give lower latency for a given tablespace.

(2) In my experience, changing DB2_PARALLEL_IO will not lead to dramatic performance changes, as the large caches in your storage subsystem make any calculation of spindles obsolete. If you are running AIX 5.3, JFS2 and the latest maintenance level, you should also not have to worry too much about DB2 container sizes. SAP's older recommendations on "maximum sizes" were based on the contention that occurs with multiple writers to a single file (i-node contention). With JFS2 providing CIO, and SAP software on DB2 9 using it, this problem hardly occurs any more.
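
Just to make the effect of the registry variable concrete, here is roughly how it feeds into prefetching (tablespace name, extent size and disk counts are invented):

    # With DB2_PARALLEL_IO=*:6, a tablespace with 4 containers and EXTENTSIZE 32
    # gets an AUTOMATIC prefetch size of roughly
    #   4 containers * 6 disks/container * 32 pages = 768 pages,
    # i.e. enough parallel prefetch requests to keep all 24 assumed spindles busy.
    db2 "ALTER TABLESPACE SAPDATA_B PREFETCHSIZE AUTOMATIC"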

Still, if you go for a file system, nobody will question that direct access to LUNs (via DEVICE containers) consumes fewer CPU cycles; that is the maximum-performance option. Whether you want to take it is a matter of manageability in your specific environment. Almost all customers are fine with FILE-based containers.
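
For reference, a raw DEVICE container (the logical volume name is just an example) would be defined like this:

    # DMS tablespace on a raw logical volume, bypassing the filesystem entirely
    db2 "CREATE TABLESPACE SAPDATA_RAW
         MANAGED BY DATABASE
         USING (DEVICE '/dev/rsapdatalv1' 50 G)"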

So, to summarize: don't focus only on maximum throughput; low latency is equally important.

Greetings,

Martin

Former Member

Thanks for your reply. But to help me understand better, I would be grateful for more input regarding the following questions.

I have two 6+2 RAID arrays available across which I can stripe LUNs. I think this means I have a total of 12 disks (spindles) for data plus 4 parity disks.

Given that I am limited to this array configuration, would having multiple LUNs striped across the same RAID array (the same disks) minimize I/O latency, or do the LUNs need to be on a different set of disks to decrease latency?

Where does the latency come from? The OS, the database, the storage array?

edmund_hfele2
Explorer

Hi Chris,

I'd like to follow up on Martin's first point (1):

From the point of view of the storage system, there shouldn't be a difference between targeting all the I/Os at one single big LUN in the disk array and having multiple smaller LUNs in the same array: at the end of the day, all the disks in the RAID array have to serve the same number of I/O requests. If the RAID array reaches saturation, the only option is to distribute the workload across more disks or faster ones.

Of course there could also be a difference in the paths used to access one LUN or multiple LUNs through the SAN (e.g. spread across multiple FC HBAs). Multiple LUNs also appear as multiple devices (hdisks) to the AIX operating system, and each disk device has its own queue (controlled by the queue_depth attribute of the device). If there is a bottleneck in this layer, multiple disk devices would be advantageous; alternatively, increasing queue_depth for the single big LUN could help as well.
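
A minimal example of checking and raising that queue on AIX (the hdisk number and value depend on your system):

    # show the current queue depth of the disk device
    lsattr -El hdisk4 -a queue_depth

    # raise it; -P records the change in the ODM so it takes effect
    # the next time the device is configured (e.g. after a reboot)
    chdev -l hdisk4 -a queue_depth=32 -P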

Best regards

Edmund