I had initially written this post with a great deal of information, but it came across as snarky and condescending, which was not my original intention. Thus I’ve rewritten it in the hopes that it’s succinct and distilled down to the minimal amount of information needed to make an informed decision. This post is specifically geared towards B-Tree indexes, as In-Memory indexes are only rebuilt upon start-up (or when adding a new one) and Columnstore indexes have their own set of rules, for which Niko Neugebauer has a fantastic write-up.
One of the things I end up having to explain to DBAs and Developers alike is that index fragmentation really doesn’t mean a whole lot. In fact, it’s somewhat asinine to rebuild or reorg indexes on a nightly basis, even when that’s based on something such as fragmentation level – whether we’re considering fragmentation at the filesystem layer or pages that have a fair amount of unused space on them. Yes, this means that implementing things such as the built-in index maintenance tasks in SQL Server 2014+, Adaptive Index Defrag, or Ola’s maintenance scripts purely off a fragmentation threshold is pretty much worthless at best and downright disastrous at worst.
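For reference, the number all of these routines key off is avg_fragmentation_in_percent from sys.dm_db_index_physical_stats. A minimal sketch of that check (run in the database you care about; the filters are just an example):

```sql
-- The fragmentation number the blanket maintenance routines key off.
-- LIMITED mode keeps the scan itself relatively cheap.
SELECT  OBJECT_NAME(ips.object_id)  AS table_name,
        i.name                      AS index_name,
        ips.avg_fragmentation_in_percent,
        ips.page_count
FROM    sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN    sys.indexes AS i
        ON  i.object_id = ips.object_id
        AND i.index_id  = ips.index_id
WHERE   ips.index_id > 0            -- skip heaps
ORDER BY ips.avg_fragmentation_in_percent DESC;
```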
Index fragmentation at a filesystem level really only hurts two things. The first is read-ahead, which functions best when all pages are contiguous on disk. That’s all well and good if you’re actually using read-ahead, which, when it comes to query performance, really should only be needed a handful of times unless memory thrashing is happening. Note that there are many other places in the engine where the scatter-gather API is used. The second place contiguity can help is when LazyWriter [1,2] is invoked, since it can do scattered reads or gathered writes; hopefully that isn’t happening too often on your systems.
In addition to the internal workings, the other upsides include less disk and buffer pool (memory) usage. This is all well and good, but does it really make sense to churn IO and CPU for gains that are sometimes minimal, if any?
The downside to all this maintenance can be negligible if you don’t have large databases, you have beefy servers with robust IO, and you have plenty of idle time during the day or night in which to execute these server-killing maintenance routines.
This doesn’t just affect the server itself while the maintenance is executing; it can have disastrous downstream effects. Since the on-disk structures will entirely change, every extent touched will need to be marked so that the next differential backup (if being used) captures the changes. It also means that log usage will soar, along with everything that consumes the log downstream: Log Shipping, Replication, AlwaysOn Availability Groups, Change Data Capture, backups (all types), recovery (AG failover, FCI failover, database restore), database snapshots, etc., which as you can see is non-trivial.
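If you want to see what a blanket rebuild does to the differential bitmap and to the log, a couple of DMVs make it visible. A minimal sketch, assuming SQL Server 2016 SP2 / 2017 or later for the modified extent column:

```sql
-- Pages flagged for the next differential backup; run before and after a rebuild
-- to see how much the bitmap grows. modified_extent_page_count requires
-- SQL Server 2016 SP2 / 2017+ (assumption about your version).
SELECT  file_id,
        modified_extent_page_count
FROM    sys.dm_db_file_space_usage;

-- Log usage for the current database, which the rebuild will drive up.
SELECT  used_log_space_in_bytes,
        used_log_space_in_percent
FROM    sys.dm_db_log_space_usage;
```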
In the worst cases, which do seem to happen quite often, it can bring the entire server to its knees, causing long blocking waits, filling up the log, and causing remote AG replicas to fall behind, among a long list of other items. This has real and painful consequences.
TL;DR: Stop rebuilding indexes based on fragmentation level. If anything, keep up with your statistics updates and rebuild an index when it makes sense, such as after deleting many rows, changing the schema of a table, or before taking a final backup to archive a database (not a comprehensive list). Stop killing your servers and causing production-down scenarios because of a fragmentation value.
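If it helps, here’s a rough sketch of what “keep up with your statistics and rebuild only with a reason” looks like in practice (the table and index names are hypothetical):

```sql
-- Keep statistics current without touching the index structure.
UPDATE STATISTICS dbo.Orders WITH RESAMPLE;   -- hypothetical table, keeps the existing sample rate
EXEC sys.sp_updatestats;                      -- whole database, only stats with modifications

-- Rebuild only when there's an actual reason, e.g. after deleting a large number of rows.
ALTER INDEX PK_Orders ON dbo.Orders REBUILD;  -- hypothetical index
```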
I agree with your premise about the limited value of *most* index maintenance but this sentence is quite incorrect for many warehouse and analytics workflows:
“That’s all well and good if you’re actually using read-ahead, which, when it comes to query performance, really should only be needed a handful of times unless memory thrashing is happening.”
I’ll do a blog post sooner or later on readahead. But comparing the perfmon counters for buffer manager “pages read” and “readahead pages” can show the frequency of readahead. And comparing a given query in isolation with readahead disabled via trace flag(s) for changes in page io latch wait occurrence can also give an idea of readahead value.
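For reference, the same counters can also be pulled from inside SQL Server; a minimal sketch via sys.dm_os_performance_counters (counter names assumed from the standard Buffer Manager object, and the values are cumulative, so sample twice and diff for a rate):

```sql
-- Cumulative Buffer Manager counters; sample twice and diff to get a per-second rate.
SELECT  counter_name, cntr_value
FROM    sys.dm_os_performance_counters
WHERE   object_name LIKE '%Buffer Manager%'
  AND   counter_name IN ('Page reads/sec', 'Readahead pages/sec');
```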
Yes, in that *specific* case read-ahead can be extremely advantageous. However, after a large load in a data warehouse you’d create or rebuild your indexes, depending on how you load the data and the frequency. That would be included in the loading process and is a niche scenario where this post isn’t applicable. There are various other niche workloads where it may help; I’m not talking about those here because they are niche. This is for the 87% (made-up statistic) of systems that use SQL Server which are OLTP or HTAP.
I’ve a scenario where my database receives random inserts (data clustered by customer key on all tables so that joins between them become very cheap). In this case reorganize makes for a perfect worst-case scenario for inserts, where each insert isn’t just a page split, it is actually an extent allocation.
Everyone agrees that if random insert page splits placed 8 rows in page 1 and 1 row in page 2 it would be bad (so this is not done; splits are typically 50-50). However, extent allocation does exactly that behaviour at the page level. The result is two extents: one with 7.5 pages’ worth of data, and one with 0.5 pages of data in it. With high probability a page in the original extent is the next to split, and this again results in ANOTHER extent allocation.
End result of the reorganisation “optimisation”: every write results in an extent allocation, not just a page split.
These extents are only 1/16th full, so your database “allocated” size rapidly blows out to several times the size of the underlying data.
And the only solution to these basically empty extents is to reorganize to release this space, as it will (statistically) never be used.
If you know this is the case, why not change the fillfactor so you’re not at 100% full and causing everything to be an allocation and page split? Seems like an easy solution.
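As a rough sketch of what I mean (the index and table names are hypothetical), a one-time rebuild with a lower fill factor leaves free space on each page so those random inserts land in existing pages instead of splitting:

```sql
-- Leave ~10% free space per page so random inserts by customer key have room
-- to land without a page split / fresh extent allocation. Names are hypothetical.
ALTER INDEX IX_Orders_CustomerKey ON dbo.Orders
REBUILD WITH (FILLFACTOR = 90);
```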
I’ve a server where most used databases are on SSD disks.
I think your advice is even more suited for SSD disks, where fragmentation has lower effects on performance.
Am I right?
David,
Yes, SSDs generally have great random access performance and won’t suffer nearly the same fate. There is still the space trade-off, so again, make a good decision. If the table is fragmented all over the place and rebuilding it will get you back 1 TB of space, that might be worth it. My major point is to use logic and have a good reason why it is being done. I’ve asked people why they do it, why it needs to be done every day or weekend, and I get blank stares and, “Well, that’s what everyone’s scripts do and say to do, so that’s what we do,” which isn’t a very good answer.
Completely agreed! We haven’t been doing maintenance for years now on our multi-TB instances with AGs. The overwhelming cost of index maintenance is just not worth it, and there are much bigger fish to fry to improve performance.
This is a near-constant observation I’ve had over the last year, so it’s accurate. At least twice a week I am pulled into a production-down situation that involves a replica with either a large redo queue or a large send queue, and it’s easy to tell they blanket rebuild all their indexes (a quick way to check those queues is sketched below).
You nailed it Alex, I’m glad you thought about it and realized it doesn’t make sense long before this post – and you’ve noticed no issues!
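For anyone wanting to check whether their replicas are suffering the same way, a minimal sketch against the standard availability group DMVs:

```sql
-- Send and redo queue sizes per database per replica (both reported in KB).
SELECT  DB_NAME(drs.database_id)    AS database_name,
        ar.replica_server_name,
        drs.log_send_queue_size,    -- KB of log not yet sent to the secondary
        drs.redo_queue_size         -- KB of log not yet redone on the secondary
FROM    sys.dm_hadr_database_replica_states AS drs
JOIN    sys.availability_replicas AS ar
        ON ar.replica_id = drs.replica_id;
```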
Hi Alex
It is really good to have not touched index rebuild maintenance for years.
I am curious to know how you were able to manage this?