Friday, November 16, 2012

VLDBs: dbcc checktable, checkalloc, checkcatalog

I like the idea of a full dbcc checkdb run against a space efficient storage level point-in-time copy, or a mounted independent full point-in-time copy at least once within the retnetion period for the oldest full backup and all transaction logs from there to current. But not all of the VLDB systems I am concerned about have that capability. Running a full dbcc checkdb on these systems is concerning because of the total execution time (it would often overlap a high data change ETL), and the potential effects of the private database snapshot maintained by dbcc checkdb.

So I'll be Scripting dbcc checktable, checkalloc, checkcatalog per table for a VLDB with several thousand tables.  A few questions as I determine the scripting design.

1. It looks like, other than stress on the disk subsytem and/or CPU, the private database snapshot maintained by dbcc checktable is a good reason not to run these concurrently for different tables.  Is it true that this is best only attempted in a single thread of activity due to the database snapshot?

2. Do dbcc checktable, checkalloc, checkcatalog each disfavor database blocks they read?

3. If dbcc checktable, checkalloc, checkcatalog  are each run in succession for a given table, will various relevent pages be read from disk 3 times even if not forced by buffer pressure?

4. If a table or index page is already in the buffer pool cache, will dbcc checkdb read it from disk anyway? (some database systems will use cached blocks from cache, and bypass cache for misses/physical reads for integrity check purposes.)

5. Other than the answers to questions 2, 3, and 4 above, are there other considerations when deciding whether to perform operations table by table, or potentially mix-n-match in order to hit an execution time between a low and high watermark?  For example, assume I want only 4 hours per day of consistency check activity.  Should I run select tables for which to run the set of checktable, checkalloc, and checkcatalog based on the sum of their most recent executions being near 4 hours... or should I consider mx-n-match of checktable, checkalloc, and checkcatalog if it results in a more even distribution of execution time?

Background information on per-table dbcc checkdb ~ dbcc checkdb, checkalloc, checkcatalog...  
http://www.sqlskills.com/BLOGS/PAUL/post/CHECKDB-From-Every-Angle-Consistency-Checking-Options-for-a-VLDB.aspx
Not much information on dbcc checkdb disfavoring pagess read, but here's a mention...
http://www.sqlpassion.at/blog/PermaLink,guid,e9deffeb-55f9-4b2d-815d-a9efd523ca41.aspx

Thursday, November 15, 2012

Not 'break-fix' by default

I'm a systems engineer.

I'm hard to work with sometimes, and I know it.  I'm working on it.   Never wanted to be hard to work with, always wanted to be good at what I do.

Its difficult for me to make progress in a world that is often operating in 'break-fix' mentality.

I know the way I visualize and model systems is somewhat uncommon.  But there are reasons I profile and model systems in particular ways.  When I learn a strategy that better accomplishes my goals, I'll adapt.

Asking me the question, 'what problem are you trying to solve?' is not always helpful.  Sometimes I am trying to prevent future problems (system best practices, code development and testing strategies, evaluation and intervention based on exposure before an incident). Sometimes I am trying to increase transparency, so that future problems are evident (how can systems be profiled, how can desired outcomes be modeled, what is the motivation of a currently accepted best proactive).  Sometimes, I already know that the problem I am wrestling with is so far in the weeds that if I describe it, I'll get ushered out of whatever forum I've turned to for information (I have to collect and understand corner cases.  Its part of who I am... but also, in my sphere, I'm the guy that gets called when everyone else runs out of ideas.)

So, I'll ask questions.  Maybe I won't give the context to which you're accustomed.  There's a reason for that.  I'm not a developer.  I'm not a DBA (not primarily, anyway).  I'm a systems engineer.  I'm not just concerned about the absence of errors in a sunny day scenario... I need to understand how things break and what to do next.  I'm not just concerned about how fast things get done... I'm concerned about performance reliability and variability... so I can spot issues before there's a complaint.

Sometimes I hear folks talking about the need to get developers and DBAs together.  I agree that'll be extremely helpful.  If server/virtualization/network/storage engineers/admins and systems engineers/architects can also be brought in... I think we might all be able to move forward.

Maybe...