I’ve seen plenty of technical mistakes when implementing SharePoint, particularly in larger environments where the risks of failure are higher. Here’s a countdown of my top ten “favorite” SharePoint mistakes:
SQL Server performance is the lifeblood of SharePoint, and yet frequently folks don’t size SQL Servers correctly. If you want to know if your SQL Server has enough RAM, there’s a simple counter to watch: SQL Server Buffer Manager: Page Life Expectancy.
This is how long the buffer manager expects it can keep a cached page in memory. You want this number to be 300 (seconds) or better. If it’s not, your performance is suffering because you’re forcing the SQL Server to go to the disks too often.
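As a rough illustration of that rule of thumb, here is a minimal Python sketch that checks a batch of Page Life Expectancy samples against the 300-second floor. It assumes you have already collected the samples (for example from Performance Monitor); the function name and the summary fields are my own invention, not part of any SQL Server tooling.

```python
from statistics import mean

PLE_THRESHOLD_SECONDS = 300  # the floor suggested in the text above


def summarize_ple(samples, threshold=PLE_THRESHOLD_SECONDS):
    """Summarize Page Life Expectancy samples (in seconds) against a threshold."""
    if not samples:
        raise ValueError("need at least one sample")
    below = [s for s in samples if s < threshold]
    return {
        "min": min(samples),
        "avg": mean(samples),
        "fraction_below": len(below) / len(samples),
        "healthy": not below,  # healthy only if no sample dipped under the floor
    }


if __name__ == "__main__":
    print(summarize_ple([450, 500, 320]))
    print(summarize_ple([450, 120]))
```

Treating any single dip below the threshold as unhealthy is a deliberately strict choice for the sketch; you may prefer to alert on the average or on a sustained run of low samples.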
All too often storage capacity is the name of the SAN game. However, performance is more important. You want to make sure that the SAN can respond to both read and write requests within 20ms, and ideally within 10ms. Getting there takes smaller, faster disks and more of them.
It’s also a matter of using RAID 10 (striped mirrors) instead of RAID 5 (striping with parity). If you believe the “snake oil” that the configuration of disks in a SAN doesn’t matter because your vendor is “special,” you might need to look for a new line of work. The physics of disks applies whether your vendor wants it to or not.
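To put a rough number on the RAID 10 versus RAID 5 point, here is a small Python sketch using the common write-penalty rule of thumb (two back-end I/Os per write for RAID 10, four for RAID 5). The disk count, per-disk IOPS, and write mix are made-up inputs for illustration, and real arrays with caching will behave differently.

```python
# Common rule-of-thumb back-end I/Os per front-end write (an approximation).
WRITE_PENALTY = {"raid10": 2, "raid5": 4}


def usable_iops(disks, iops_per_disk, write_fraction, raid_level):
    """Estimate front-end IOPS for an array given its write mix and RAID level."""
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    raw = disks * iops_per_disk
    penalty = WRITE_PENALTY[raid_level]
    # Each read costs one back-end I/O; each write costs `penalty` of them.
    return raw / ((1.0 - write_fraction) + write_fraction * penalty)


if __name__ == "__main__":
    for level in ("raid10", "raid5"):
        print(level, round(usable_iops(8, 100, 0.5, level)))
```

With the same eight hypothetical disks and a 50 percent write mix, the RAID 5 estimate comes out well below the RAID 10 one, which is the gap the text is warning about.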
The load balancer is the traffic cop for your environment, and a bad load balancer configuration can drag performance down. You want to configure your load balancer for session affinity (sticky sessions, or whatever your vendor calls it) so that sessions stay on the server they started on.
That's because SharePoint caches a ton of information locally on each server, so keeping a session on the same server performs better over time. Keep sessions on the same server for long periods: think 20 minutes, not 20 seconds.
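To make the idea concrete, here is a toy Python model of session affinity with a 20-minute idle timeout. It is only a sketch of the behavior you would configure on a real load balancer; the class name, the round-robin fallback, and the server names are assumptions for illustration.

```python
import itertools


class StickyBalancer:
    """Toy model: pin each client to one server until it idles past the timeout."""

    def __init__(self, servers, timeout_seconds=20 * 60):
        if not servers:
            raise ValueError("need at least one server")
        self._rotation = itertools.cycle(servers)  # round-robin for new sessions
        self._timeout = timeout_seconds
        self._sessions = {}  # client_id -> (server, last_seen)

    def route(self, client_id, now):
        """Return the server for this request; `now` is a timestamp in seconds."""
        entry = self._sessions.get(client_id)
        if entry is not None and now - entry[1] <= self._timeout:
            server = entry[0]  # still sticky: reuse the same server
        else:
            server = next(self._rotation)  # new or expired: pick the next server
        self._sessions[client_id] = (server, now)
        return server


if __name__ == "__main__":
    lb = StickyBalancer(["wfe1", "wfe2", "wfe3"])
    print(lb.route("alice", 0), lb.route("alice", 600))
```

With a 20-second timeout instead, the same client would be reassigned after almost any pause, and each new server would have to warm its cache for that user again.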
Whether your plan missed the disk capacity for search indexes or skipped over the performance of those disks, the result is the same: search suffers. Search query performance relies on query servers, which need disk space equal to about 30 percent of the size of the content you’re crawling.
Thirty percent is a generally safe number. Make sure you plan for how much storage you need on the SharePoint servers – including performance.
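The arithmetic is simple enough to sketch in a few lines of Python. The 30 percent ratio comes from the text above; the function name and the 2,000 GB corpus are just illustrative assumptions.

```python
INDEX_RATIO = 0.30  # the "generally safe" ratio from the text above


def query_server_index_gb(content_gb, ratio=INDEX_RATIO):
    """Estimate index storage needed on query servers for a crawled corpus."""
    if content_gb < 0:
        raise ValueError("content_gb must not be negative")
    return content_gb * ratio


if __name__ == "__main__":
    # A hypothetical 2,000 GB of crawled content needs roughly 600 GB of index space.
    print(round(query_server_index_gb(2000)))
```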
There’s some argument about segmenting user traffic from back-end traffic on SharePoint servers; however, everyone agrees that network performance between the SharePoint servers and SQL Server is critical. It should be low-latency and high-capacity.
Generally this means only switches between the SharePoint servers and the SQL Servers. Putting a firewall between SharePoint servers and SQL Server is silly.
Make sure your latency between servers is less than 10ms. For the record, my recommendation is to aggregate all network interfaces rather than segment front- and back-end traffic.
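One crude way to sanity-check that 10ms figure is to time a TCP connection from a SharePoint server to the SQL Server port. The Python sketch below does that with the standard library; the hostname is a placeholder, 1433 is SQL Server's default port, and connect time is only an approximation of network latency, so treat the result as a smoke test rather than a measurement.

```python
import socket
import time


def tcp_connect_ms(host, port, timeout=2.0):
    """Time one TCP connection attempt to host:port, in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # we only care how long the handshake took
    return (time.perf_counter() - start) * 1000.0


def median_latency_ms(host, port, attempts=5):
    """Return the median connect time over several attempts."""
    samples = sorted(tcp_connect_ms(host, port) for _ in range(attempts))
    return samples[len(samples) // 2]


if __name__ == "__main__":
    # "sql01.example.local" is a hypothetical server name; substitute your own.
    latency = median_latency_ms("sql01.example.local", 1433)
    print(f"median connect time: {latency:.1f} ms (target: under 10 ms)")
```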
Sure, most large implementations have QA environments, but all too often their configuration is allowed to drift from production. QA should match production in the types of components it contains, and should be scaled down in the number of servers and resources only for cost reasons.
Make sure that your QA environment has a load balancer and all the firewalls that your production environment has and that the rules are the same. You’ve been warned.
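A drift check along these lines can be very simple. The Python sketch below compares two hypothetical inventories (component type mapped to count) and reports any component type that production has and QA lacks; the inventory format and component names are assumptions for illustration, not something SharePoint produces for you.

```python
def missing_component_types(production, qa):
    """Return component types present in production but absent from QA."""
    return sorted(set(production) - set(qa))


if __name__ == "__main__":
    # Hypothetical inventories: counts may differ, but the types should match.
    production = {"load_balancer": 2, "firewall": 2, "web_front_end": 6, "sql_server": 2}
    qa = {"firewall": 1, "web_front_end": 2, "sql_server": 1}
    print(missing_component_types(production, qa))
```

The counts are deliberately ignored: QA is allowed to have fewer of each component, just not zero.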
Environments shouldn’t be able to talk to each other. QA shouldn’t be able to see into development, and production shouldn’t be able to peer into QA.
If you do allow this, you should expect you’ll create an unexpected cross-environment dependency. You’ll take down the development environment, and production will crash. Not good.
One neat trick that sometimes shows up is putting a reverse proxy in front of a SharePoint farm. It sounds good on the surface, until you realize that your SharePoint server won’t see the client IP address.
What’s the problem? Well, try debugging your production server just once when you can’t figure out which traffic is having the problem, and you won’t have to ask again.
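If you are stuck with a reverse proxy, one common mitigation is to have it pass the original address along in an `X-Forwarded-For` header and log that instead. The Python sketch below shows the parsing, assuming the proxy is configured to add the header and that `headers` is a plain dictionary with that exact key; neither assumption comes from SharePoint itself, and the header can be spoofed by clients, so only trust it from your own proxy.

```python
def original_client_ip(headers, peer_ip):
    """Best-effort client IP: first X-Forwarded-For entry, else the TCP peer."""
    forwarded = headers.get("X-Forwarded-For", "")
    # The header is a comma-separated chain; the leftmost entry is the client.
    first = forwarded.split(",")[0].strip()
    return first or peer_ip


if __name__ == "__main__":
    # 203.0.113.7 is a documentation-range address used as a stand-in client.
    print(original_client_ip({"X-Forwarded-For": "203.0.113.7, 10.0.0.5"}, "10.0.0.5"))
```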
SharePoint will warn you it’s having trouble. From ULS logs and event logs to the health score that’s returned with every HTTP request, SharePoint isn’t shy about telling you it needs help.
Of course, you have to be listening. Load balancers watch servers to see if they’re in trouble, and so does System Center Operations Manager, but you have to set these things up and respond to trouble tickets when they come in.
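As one example of listening, the health score travels in the `X-SharePoint-HealthScore` response header as a number from 0 (healthy) to 10 (struggling). The Python sketch below reads it from a dictionary of response headers and flags high values; the alert threshold of 5 and the function names are my own assumptions, so tune them to your environment.

```python
HEALTH_HEADER = "X-SharePoint-HealthScore"


def health_score(headers):
    """Return the health score (0 = healthy, 10 = worst), or None if absent."""
    raw = headers.get(HEALTH_HEADER)
    if raw is None:
        return None
    return int(raw)


def needs_attention(headers, threshold=5):
    """True when the server reports a score at or above the assumed threshold."""
    score = health_score(headers)
    return score is not None and score >= threshold


if __name__ == "__main__":
    print(needs_attention({HEALTH_HEADER: "7"}))
    print(needs_attention({HEALTH_HEADER: "1"}))
```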
Someone sends out an email that the new intranet site, My Sites, and collaboration platform are available. Suddenly everyone in the organization comes flooding in, and in the process, they put the entire farm underwater.
The servers encounter more load in an hour than they’ll typically encounter in weeks of operation, and a great environment is tarnished by one big email. Rather than doing one big-bang email to everyone, stage your communication over the course of a day or two to even out the load a bit.
It’s much better to be twiddling your thumbs because the servers aren’t busy than trying to scramble to keep the environment functional due to overwhelming demand.
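Staging the announcement can be as simple as splitting the recipient list into waves and sending one wave at a time. Here is a minimal Python sketch of the split; the function name and the choice of four waves are assumptions, and the actual sending and scheduling are left to whatever mail tooling you use.

```python
def stage_recipients(recipients, waves):
    """Split recipients into `waves` batches whose sizes differ by at most one."""
    if waves < 1:
        raise ValueError("waves must be at least 1")
    # Striding through the list deals recipients out like cards, one per wave.
    return [recipients[i::waves] for i in range(waves)]


if __name__ == "__main__":
    people = [f"user{n}@example.com" for n in range(10)]  # hypothetical addresses
    for number, batch in enumerate(stage_recipients(people, 4), start=1):
        print(f"wave {number}: {len(batch)} recipients")
```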
That’s my top 10 list. What’s yours?
Robert Bogue is a Microsoft MVP for SharePoint, an internationally renowned speaker, and author of 22 books including the SharePoint Shepherd’s Guide for End Users. You can find out more about Robert’s work to encourage business value out of SharePoint at SharePoint Shepherd or more about his technical solutions at Thor Projects.