WSUS: The Redheaded Stepchild of Configuration Manager

So, like a lot of people, I drank the Kool-Aid for WSUS and Config Manager: install the feature, let SCCM configure it, and forget the WSUS console exists. As long as you do some occasional maintenance, it just works. Then the cumulative patches came along, and every month this year has had 5-6 days devoted to “fixing” the WSUS/SUP servers. I know I am not alone in fighting high CPU spikes while patching. I added more memory and CPU to the servers; it helped, but the next month the issue returned. I opened a case with Microsoft and got a very intensive lesson on how to do maintenance the right way. If you need that lesson too, check out The complete guide to Microsoft WSUS and Configuration Manager SUP maintenance and then follow it. But even after all that, the issue started to recur while my patching team was dealing with the stragglers from the last patching round.

So I opened another case with the wonderful folks at Premier Support and we started looking. This time around I was just getting CPU spikes that would clear up after an hour or six. As we checked and rechecked everything, we saw that as few as 50 or so connections to the WSUS site would push CPU utilization up to 80-90%. Before all these issues, I would see an average CPU utilization of 30-40% on these servers. There were spikes during heavy patching periods, but they came with large numbers of connections to the WSUS site. Using this as justification to finally clean some obsolete products out of the catalog (yes, Server 2003 was still in there), I unchecked a few products and synced. After running the cleanup, reindex, and decline process, there was still no improvement. Looking at the calendar and seeing the next Patch Tuesday coming quickly, I thought: if this is going to be another crappy patch cycle anyway, let's try doing just security patches and kick everyone else out of the pool. The Updates classification has the largest number of updates in my environment (this may not be the case in yours), so I unchecked that classification and synced. Wow, performance went right back to normal. To be sure, I triggered a couple of thousand update scans. I was able to get several hundred active connections, and the CPU never spiked over 60% while averaging ~30% utilization. To double check that this was truly the cause, I added the Updates classification back and synced. The sync took about 2 hours to finish, and CPU utilization started spiking again, this time to 90-100%. Time to dig in and find the root cause.
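If you want to generate a similar load, one common way to start a scan on a ConfigMgr client is to call the client's `SMS_Client.TriggerSchedule` WMI method with the Software Updates Scan Cycle schedule ID. This is a minimal sketch and not necessarily how I triggered mine. `Invoke-UpdateScan` is a hypothetical name, and the sketch assumes CIM/WinRM access and admin rights on each client.

```powershell
function Invoke-UpdateScan {
    # Hypothetical helper: asks each ConfigMgr client to run a software update scan
    # by invoking SMS_Client.TriggerSchedule with the scan cycle schedule ID.
    # Assumes CIM/WinRM access and admin rights on every client.
    param(
        [Parameter(Mandatory)][string[]]$ComputerName,
        # Schedule ID for the "Software Updates Scan Cycle" client action.
        [string]$ScheduleId = '{00000000-0000-0000-0000-000000000113}'
    )
    foreach ($computer in $ComputerName) {
        $call = @{
            ComputerName = $computer
            Namespace    = 'root\ccm'
            ClassName    = 'SMS_Client'
            MethodName   = 'TriggerSchedule'
            Arguments    = @{ sScheduleID = $ScheduleId }
            ErrorAction  = 'Stop'
        }
        try {
            Invoke-CimMethod @call | Out-Null
            "$computer : scan triggered"
        }
        catch {
            Write-Warning "$computer : $($_.Exception.Message)"
        }
    }
}

# Example: Invoke-UpdateScan -ComputerName (Get-Content -Path .\clients.txt)
```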

So I started searching through the updates in WSUS and comparing them to what was being deployed via Config Manager. WSUS still had lots of Server 2003 updates even though I had just removed that product, so why were they still approved? I even found some XP and 2000 updates approved in WSUS, and those products have been gone a long time. But the updates were approved, and the WSUS server was diligently checking them to see if they applied and updating their status as well. So, working on the assumption that all those old products had bloated the catalog to the point that performance was suffering, I started looking for a way to clean up. While I am going to talk about my script and hope you use it, full credit goes to the Decline-SupersededUpdates.ps1 script linked in The complete guide to Microsoft WSUS and Configuration Manager SUP maintenance, and to Automatically Declining Itanium Updates in WSUS, as the basis for doing all this cleanup via PowerShell.

Now back to the investigation: I still wanted to figure out why all these updates had been approved. After lots of checking and comparing between various sites, I found that the top-level WSUS server for SCCM had the default automatic approval rule enabled in WSUS. That explains the why; now for the cleanup. To help identify the updates I wanted to decline, I used this PowerShell.
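The original script is not reproduced here. The functions below are a minimal sketch of the same idea. They load the WSUS administration API, which is the approach Decline-SupersededUpdates.ps1 uses, connect with `AdminProxy.GetUpdateServer`, and keep only updates whose `IsDeclined` property is false. `Get-WsusConnection` and `Select-UndeclinedUpdate` are hypothetical names. Port 8530 is the default on Server 2012 and later, so treat it as an assumption for your server.

```powershell
function Get-WsusConnection {
    # Loads the WSUS administration API and returns an IUpdateServer object.
    # Needs the WSUS role or WSUS management tools; run in Windows PowerShell.
    param(
        [string]$Server = $env:COMPUTERNAME,
        [int]$Port = 8530,   # assumption: default on Server 2012+, older installs often use 80
        [switch]$UseSsl
    )
    [void][Reflection.Assembly]::LoadWithPartialName('Microsoft.UpdateServices.Administration')
    [Microsoft.UpdateServices.Administration.AdminProxy]::GetUpdateServer($Server, $UseSsl.IsPresent, $Port)
}

function Select-UndeclinedUpdate {
    # Passes through only the updates whose IsDeclined property is false.
    param([Parameter(ValueFromPipeline)]$Update)
    process {
        if (-not $Update.IsDeclined) { $Update }
    }
}

# Example usage against a WSUS server named wsus01 (placeholder):
# $wsus       = Get-WsusConnection -Server 'wsus01' -Port 8530
# $allupdates = @($wsus.GetUpdates() | Select-UndeclinedUpdate)
# $allupdates | Select-Object Title, UpdateClassificationTitle, CreationDate | Out-GridView
```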

This grabs all updates that are not declined and sends them to an Out-GridView window. I like this because when WSUS is overworked the console can time out frequently, and I find it easier to search through all the updates this way. A few things to remember: the code assumes you are running it on the WSUS server you are checking, but it can also be run remotely from any system that has the WSUS management tools installed. You will need to adjust the variables to match your environment.

If you get timeouts with this, your WSUS server needs some love. You can retry, but if you get timeouts 2 or 3 times, stop and go read The complete guide to Microsoft WSUS and Configuration Manager SUP maintenance. Follow those steps, then come back and try again.

Once you get the GridView window, start searching for updates that can be declined. For example, search for XP and see what you get. On one of my servers I found lots and lots of XP updates. What I learned is that even when you stop syncing a product, the updates already in the catalog stay until you decline them. Why does that matter, you ask? The clients never have those updates sent to them, but the WSUS server still has to process them in its queries whenever a client requests a scan. In my case, a server with plenty of CPU and memory, using a full SQL install, could only handle ~50 scan requests before getting overworked. After I declined all the old, unwanted updates, performance returned to normal.
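A per-product tally can show which retired products are still in the catalog without scrolling through the grid. This is a minimal sketch. It assumes `$allupdates` holds WSUS `IUpdate` objects, each of which exposes a `ProductTitles` collection. `Get-ProductCount` is a hypothetical helper.

```powershell
function Get-ProductCount {
    # Counts updates per product so retired products (XP, Server 2003, ...) stand out.
    # Expects WSUS IUpdate objects, or anything with a ProductTitles collection.
    param([Parameter(ValueFromPipeline)]$Update)
    begin { $counts = @{} }
    process {
        foreach ($product in $Update.ProductTitles) {
            $counts[$product] = 1 + [int]$counts[$product]
        }
    }
    end {
        $counts.GetEnumerator() |
            Sort-Object -Property Value -Descending |
            ForEach-Object { [pscustomobject]@{ Product = $_.Key; Updates = $_.Value } }
    }
}

# Example: $allupdates | Get-ProductCount | Out-GridView
# Or go straight to one product: $allupdates | Where-Object Title -like '*Windows XP*' | Out-GridView
```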

Using the $allupdates variable from the PowerShell above, I created several rules to identify and decline updates. This is what could be declined in my environment. YOU MUST EVALUATE WHAT CAN BE DECLINED IN YOUR ENVIRONMENT. I am posting these as examples of what I did and how I cleaned up my environment. If you copy what I did and find that you need some of the updates, all is not lost: just approve the update again and it will be available again.
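The rules themselves are not reproduced here. The helpers below sketch the pattern. They match update titles against wildcard patterns and then call each update's `Decline()` method, with `-WhatIf` support so you can review the list first. The function names and the example patterns are hypothetical. Replace them with what is safe to decline in your environment.

```powershell
function Select-UpdateToDecline {
    # Returns updates whose Title matches any of the wildcard patterns.
    # The default patterns are examples only, not a recommendation.
    param(
        [Parameter(ValueFromPipeline)]$Update,
        [string[]]$TitlePattern = @('*Windows XP*', '*Server 2003*', '*Windows 2000*', '*Itanium*')
    )
    process {
        foreach ($pattern in $TitlePattern) {
            if ($Update.Title -like $pattern) { $Update; break }
        }
    }
}

function Invoke-DeclineUpdate {
    # Declines each piped WSUS update through its Decline() method.
    # SupportsShouldProcess gives you -WhatIf and -Confirm for free.
    [CmdletBinding(SupportsShouldProcess)]
    param([Parameter(ValueFromPipeline)]$Update)
    process {
        if ($PSCmdlet.ShouldProcess($Update.Title, 'Decline')) {
            $Update.Decline()
        }
    }
}

# Review first, then run for real:
# $allupdates | Select-UpdateToDecline | Invoke-DeclineUpdate -WhatIf
# $allupdates | Select-UpdateToDecline | Invoke-DeclineUpdate
```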

With all that said, I wish I could give you a definitive number of undeclined updates that will cause you issues, but I can't, because every environment is different. What I can say is that we now have to monitor the WSUS catalog and make sure our maintenance processes decline unused and unwanted updates.
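One way to monitor the catalog is to track undeclined updates per classification over time. You can also check for enabled automatic approval rules like the one that caused my mess. This is a sketch that assumes `$allupdates` from earlier and a server object returned by `AdminProxy.GetUpdateServer`. Both function names are hypothetical.

```powershell
function Get-ClassificationSummary {
    # Groups undeclined updates by classification so growth is easy to spot.
    param([Parameter(ValueFromPipeline)]$Update)
    begin { $list = [System.Collections.Generic.List[object]]::new() }
    process {
        if (-not $Update.IsDeclined) { $list.Add($Update) }
    }
    end {
        $list | Group-Object -Property UpdateClassificationTitle |
            Sort-Object -Property Count -Descending |
            Select-Object @{ Name = 'Classification'; Expression = { $_.Name } },
                          @{ Name = 'Updates'; Expression = { $_.Count } }
    }
}

function Get-EnabledApprovalRule {
    # Lists automatic approval rules that are switched on (e.g. the default rule).
    param([Parameter(Mandatory)]$UpdateServer)
    $UpdateServer.GetInstallApprovalRules() |
        Where-Object { $_.Enabled } |
        Select-Object -Property Name, Enabled
}

# Example: $allupdates | Get-ClassificationSummary
# Example: Get-EnabledApprovalRule -UpdateServer $wsus
```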

Here is the full cleanup script I used to get back to normal
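The script itself is not shown here. Declining is only half the job, because the built-in WSUS cleanup tasks still need to run afterwards. This is a minimal sketch using `Get-WsusServer` and `Invoke-WsusServerCleanup` from the UpdateServices module. It assumes the WSUS role or management tools are installed. The function name and the default port 8530 are assumptions. I left out `-DeclineSupersededUpdates` and `-DeclineExpiredUpdates` because Configuration Manager handles supersedence through its own SUP settings.

```powershell
function Invoke-WsusCatalogCleanup {
    # Runs the built-in WSUS cleanup tasks, typically after declining updates.
    # Uses the UpdateServices module that ships with the WSUS role/management tools.
    param(
        [string]$Server = $env:COMPUTERNAME,
        [int]$Port = 8530,   # assumption: default on Server 2012 and later
        [switch]$UseSsl
    )
    $wsus = Get-WsusServer -Name $Server -PortNumber $Port -UseSsl:$UseSsl
    $cleanup = @{
        UpdateServer                = $wsus
        CleanupObsoleteUpdates      = $true
        CleanupUnneededContentFiles = $true
        CompressUpdates             = $true
        CleanupObsoleteComputers    = $true
    }
    Invoke-WsusServerCleanup @cleanup
}

# Example: Invoke-WsusCatalogCleanup -Server 'wsus01'   # placeholder server name
```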

 

Config Migration Tip – Use PowerShell to export and import Security Roles

I have been doing a lot of migration prep work and wanted to share a big time saver for moving security roles: you can use PowerShell to export and import them. If you have lots of custom roles, this saves a huge amount of time.

To export all of the custom roles
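This is a minimal sketch. It assumes the ConfigurationManager module is loaded and your site drive is the current location. It filters `Get-CMSecurityRole` on `IsBuiltIn` and exports each custom role with `Export-CMSecurityRole`. The helper names are hypothetical. Role names are sanitized because they can contain characters that are not valid in file names.

```powershell
function ConvertTo-SafeFileName {
    # Replaces characters that are not valid in file names (such as '/' or ':') with '_'.
    param([Parameter(Mandatory)][string]$Name)
    foreach ($c in [System.IO.Path]::GetInvalidFileNameChars()) {
        $Name = $Name.Replace([string]$c, '_')
    }
    $Name
}

function Export-CustomSecurityRole {
    # Exports every non-built-in security role to <Folder>\<RoleName>.xml.
    # Run from the ConfigMgr site drive (e.g. Set-Location 'ABC:'; ABC is a placeholder).
    param([Parameter(Mandatory)][string]$Folder)
    Get-CMSecurityRole | Where-Object { -not $_.IsBuiltIn } | ForEach-Object {
        $file = Join-Path $Folder ((ConvertTo-SafeFileName -Name $_.RoleName) + '.xml')
        Export-CMSecurityRole -Name $_.RoleName -Path $file
    }
}

# Example: Export-CustomSecurityRole -Folder 'C:\RoleExport'   # placeholder folder
```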

After you collect all the XML files for the roles and are ready to import them, use this
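This is a matching import sketch under the same assumptions. It calls `Import-CMSecurityRole` for every XML file in a folder. `Import-SecurityRoleFolder` is a hypothetical name. Passing `-Overwrite $false` is a deliberate choice so that existing roles are not replaced.

```powershell
function Import-SecurityRoleFolder {
    # Imports every XML file in a folder as a security role.
    # -Overwrite $false is deliberate so existing roles are not replaced.
    # Assumes the ConfigurationManager module is loaded and the site drive is current.
    param([Parameter(Mandatory)][string]$Folder)
    Get-ChildItem -Path $Folder -Filter '*.xml' -File | ForEach-Object {
        Import-CMSecurityRole -XmlFileName $_.FullName -Overwrite $false
    }
}

# Example: Import-SecurityRoleFolder -Folder 'C:\RoleExport'   # placeholder folder
```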

 

Restoring SMS Registry from the SCCM Site Backup

Well, I had an interesting morning. For the past couple of weeks I have had to repair several site servers where key CCM and SMS registry keys had been deleted. At first it appeared that a client repair had gone bad and killed the keys, but this morning it was tracked down to someone running a client repair script incorrectly: they were targeting a remote client, but the script was removing local registry keys. Today it happened on a primary site server, and we were looking at a site recovery to fix it. What follows may not be supported, but it worked for me. If you are already looking at a site recovery, the worst case is that this does not work and you do the recovery anyway.

On this system the script attempted to delete the HKLM\Software\Microsoft\SMS key and all of its subkeys. Most were still present because the SCCM services and components had them open and the delete failed, but a lot were missing! So we went looking for possible backups. I tried to load the backup copy of the Software hive from Windows\System32\config\RegBack, but that was unsuccessful. Next I turned to the system backups, but the recovery plan for this server was to rebuild and then restore the application drives, so the OS drive was not backed up. The site recovery was looking more and more like the solution. As I checked the backup from the site maintenance process, the \SiteServer\SMSbkSiteRegSMS.dat file reminded me that the backup includes the HKLM\Software\Microsoft\SMS key. I took a peek at the DAT file in Notepad, and sure enough it had the registry info. After loading the DAT file as a custom hive in regedit, I exported both the custom hive and the SMS key. (Always back up the registry before you change it. Got to remember to explain this to the script author 🙂 ) In the .reg file for the custom hive, I updated the paths so that all of the keys were under HKLM\Software\Microsoft\SMS. After making sure all of the SMS services were stopped, I imported the custom hive .reg file into the registry. After some checking to make sure things like server names and site codes were correct, the SMS services were restarted. After celebrating the lack of red in the server logs, the site was declared functional and I snuck off for a nap.
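If you would rather script it, here is a rough sketch of the same steps using `reg.exe`. It is no more supported than the manual process. The hive name `SMSRestore`, the output folder, and the function names are hypothetical. It must run elevated. Review the rewritten file before you import it with the SMS services stopped.

```powershell
function Convert-RegExportRoot {
    # Rewrites key headers in a .reg export so keys under $From are written under $To.
    # Only lines that open a key ("[...]") change; value lines pass through untouched.
    param(
        [Parameter(Mandatory)][AllowEmptyString()][string[]]$Lines,
        [Parameter(Mandatory)][string]$From,
        [Parameter(Mandatory)][string]$To
    )
    # Match "[<From>" followed by "\" or "]" so similarly named keys are not touched.
    $pattern = '^\[' + [regex]::Escape($From) + '(?=[\\\]])'
    foreach ($line in $Lines) {
        if ($line -match $pattern) { '[' + $To + $line.Substring($From.Length + 1) }
        else { $line }
    }
}

function Export-SmsBackupHive {
    # Hypothetical helper: backs up the live SMS key, loads SMSbkSiteRegSMS.dat as a
    # temporary hive, exports it, and rewrites the export to target the SMS key.
    param(
        [Parameter(Mandatory)][string]$DatPath,
        [string]$OutFolder = 'C:\Temp'
    )
    $live  = Join-Path $OutFolder 'SMS-live-backup.reg'
    $raw   = Join-Path $OutFolder 'SMSRestore-raw.reg'
    $fixed = Join-Path $OutFolder 'SMSRestore-fixed.reg'

    reg.exe export 'HKLM\SOFTWARE\Microsoft\SMS' $live /y   # always back up first
    reg.exe load 'HKLM\SMSRestore' $DatPath
    if ($LASTEXITCODE -ne 0) { throw "Could not load $DatPath as a hive" }
    try {
        reg.exe export 'HKLM\SMSRestore' $raw /y
    }
    finally {
        [GC]::Collect()   # release handles so the hive can be unloaded
        reg.exe unload 'HKLM\SMSRestore'
    }
    $lines = Get-Content -Path $raw
    Convert-RegExportRoot -Lines $lines -From 'HKEY_LOCAL_MACHINE\SMSRestore' `
        -To 'HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\SMS' |
        Set-Content -Path $fixed -Encoding Unicode
    $fixed   # stop the SMS services, then: reg.exe import <this file>
}
```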

Quickly get system stats for Software Update Point\Management Point

Like most of us out here in SCCM land, Patch Tuesday generally means a heavy load on the WSUS app pool that you need to keep from turning into real issues. Here is a quick PowerShell command to get the CPU utilization and the current connections to the websites. I use it to monitor both management points and software update points. Because it uses performance counters, it is easy to add other counters that are relevant to your own environment.
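The exact command is not shown here, so this is a minimal sketch. It assumes the standard Processor and Web Service counter sets are present; the Web Service set exists when IIS is installed. `Get-SupMpStat` is a hypothetical name. Add more counter paths to `-Counter` as needed.

```powershell
function Get-SupMpStat {
    # Takes one sample of total CPU and current connections across all IIS sites.
    # Add any counters that matter in your environment to -Counter.
    param(
        [string[]]$Counter = @(
            '\Processor(_Total)\% Processor Time',
            '\Web Service(_Total)\Current Connections'
        ),
        [int]$SampleInterval = 2
    )
    (Get-Counter -Counter $Counter -SampleInterval $SampleInterval -MaxSamples 1).CounterSamples |
        Select-Object -Property Path, @{ Name = 'Value'; Expression = { [math]::Round($_.CookedValue, 1) } }
}

# Example: Get-SupMpStat
```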

Get-Counter supports remote systems through the -ComputerName parameter, so to check multiple systems use:
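This is a sketch of a multi-system version. It also splits each sample path into separate computer and counter columns so the output is easier to read. The function names are hypothetical. The sketch assumes remote performance counter access to each server.

```powershell
function ConvertFrom-CounterSample {
    # Splits a sample path such as \\sup01\processor(_total)\% processor time
    # into Computer and Counter columns, and rounds the value.
    param([Parameter(ValueFromPipeline)]$Sample)
    process {
        $parts = $Sample.Path.TrimStart('\') -split '\\', 2
        [pscustomobject]@{
            Computer = $parts[0]
            Counter  = '\' + $parts[1]
            Value    = [math]::Round($Sample.CookedValue, 1)
        }
    }
}

function Get-SupMpStatRemote {
    # Samples the same counters on several servers in one call.
    param(
        [Parameter(Mandatory)][string[]]$ComputerName,
        [string[]]$Counter = @(
            '\Processor(_Total)\% Processor Time',
            '\Web Service(_Total)\Current Connections'
        )
    )
    (Get-Counter -ComputerName $ComputerName -Counter $Counter -MaxSamples 1).CounterSamples |
        ConvertFrom-CounterSample |
        Sort-Object -Property Computer, Counter
}

# Example: Get-SupMpStatRemote -ComputerName 'sup01', 'sup02', 'mp01' | Format-Table
```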

 

Software Updates not applying after a Site Reset

For reasons that are a long story I have not had time to post about, one of our SCCM environments had to be recovered with a Site Reset. The reset was successful and everything appeared to be functioning normally. The next day the patching team started to report a few clients not applying patches. That is not unusual, since there are always some clients with issues, but by the end of the day they were reporting that it was all clients. The bulk of the clients were reporting ‘Assignment ({xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx}) already in progress state (AssignmentStateDetecting). No need to evaluate UpdatesDeploymentAgent’ in UpdateDeployment.log. Gabriel Alicea has a great post that solved the issue: https://www.linkedin.com/pulse/total-actionable-updates-0-gabriel-alicea-mcts.

The short version is that the Site Reset changed the WSUS catalog version on the primary server to 1, while the software update point and the database had a different number. Stopping the services on the primary, updating the registry values, restarting the services, and then running a sync allowed the clients to correctly evaluate the scans and apply patches.

 

Prepping SCCM Boot Disks to use with WDS or 3rd Party PXE

I have been busy after taking an extended vacation and then catching up at work, but a couple of folks on Twitter have been sharing about using SCCM boot disks with WDS. This is something I do, and I learned it from Johan Arwidmark. If you work with Configuration Manager you may have heard of him before. 🙂 Here is his current post on the subject: http://deploymentresearch.com/Research/Post/612/Making-a-ConfigMgr-boot-image-work-with-standalone-WDS-or-3rd-party-PXE-server. It links to Zeng Yinghua (Sandy)’s post on doing this with iPXE rather than WDS.

What I have to add is a quick PowerShell script to automate prepping the boot disk for use outside of an integrated SCCM PXE DP.

Step 1 – Create the Boot disk as a ISO https://docs.microsoft.com/en-us/sccm/osd/deploy-use/create-bootable-media

Step 2 – Extract the ISO contents.

Because I do this for a couple of boot disks in several environments, I give the ISO and the directory it is extracted to a name that makes them easy to keep track of, for example Lab_x64, Lab_x86, Prod_x64, etc.
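If you want to script the extraction too, one option on Windows 8/Server 2012 or later is to mount the ISO and copy its contents into a folder named after the ISO. This is a minimal sketch. The function names and the `D:\SCCM_export` root are assumptions. Files copied off an ISO can keep the read-only attribute, which gets in the way of editing boot.wim, so the sketch clears it.

```powershell
function Get-ExtractFolder {
    # Lab_x64.iso -> <Root>\Lab_x64, so the folder name matches the ISO name.
    param(
        [Parameter(Mandatory)][string]$IsoPath,
        [Parameter(Mandatory)][string]$Root
    )
    Join-Path $Root ([System.IO.Path]::GetFileNameWithoutExtension($IsoPath))
}

function Expand-BootIso {
    # Mounts the ISO, copies its contents to the matching folder, then dismounts it.
    # -IsoPath must be a full path for Mount-DiskImage.
    param(
        [Parameter(Mandatory)][string]$IsoPath,
        [string]$Root = 'D:\SCCM_export'   # assumption: matches the example paths in this post
    )
    $dest  = Get-ExtractFolder -IsoPath $IsoPath -Root $Root
    $image = Mount-DiskImage -ImagePath $IsoPath -PassThru
    try {
        $drive = ($image | Get-Volume).DriveLetter
        New-Item -ItemType Directory -Path $dest -Force | Out-Null
        Copy-Item -Path "$($drive):\*" -Destination $dest -Recurse -Force
        # Files copied off an ISO can stay read-only, which blocks editing boot.wim.
        Get-ChildItem -Path $dest -Recurse -File | ForEach-Object { $_.IsReadOnly = $false }
    }
    finally {
        Dismount-DiskImage -ImagePath $IsoPath | Out-Null
    }
    $dest
}

# Example: Expand-BootIso -IsoPath 'D:\ISO\Lab_x64.iso'   # placeholder path
```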

The script below uses the name of the folder you extract the files to as the name of the WIM file as well. That is not a big deal if you only do one disk at a time, but it helps when processing several at once.

Step 3 – Prep the boot.wim file for use via PXE.

This is the script that I use
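My script is not reproduced here. The following is a hedged sketch of the steps this post describes, using the DISM PowerShell cmdlets (`Mount-WindowsImage` / `Dismount-WindowsImage`) rather than dism.exe. The parameter names and the mount folder are assumptions. The destination for TSConfig.ini is especially uncertain, so adjust it to your pre-execution setup.

```powershell
function Get-StagedWimName {
    # D:\SCCM_export\Lab_x64 -> Lab_x64.wim
    param([Parameter(Mandatory)][string]$SourceFolder)
    (Split-Path -Path $SourceFolder -Leaf) + '.wim'
}

function Update-BootWim {
    # For each extracted boot media folder: mount sources\boot.wim, copy SMS\Data
    # into it, optionally add pre-execution files, commit, then stage a renamed copy.
    param(
        [Parameter(Mandatory)][string[]]$Sources,   # e.g. D:\SCCM_export\Lab_x64
        [Parameter(Mandatory)][string]$StagingDir,
        [string]$MountDir = 'D:\Mount',             # assumption: any empty folder works
        [string]$ToolsSource,                       # optional pre-execution files
        [string]$TsConfigPath                       # optional TSConfig.ini
    )
    New-Item -ItemType Directory -Path $MountDir, $StagingDir -Force | Out-Null
    foreach ($src in $Sources) {
        $wim = Join-Path $src 'sources\boot.wim'
        Mount-WindowsImage -ImagePath $wim -Index 1 -Path $MountDir | Out-Null
        try {
            $data = Join-Path $MountDir 'SMS\Data'
            New-Item -ItemType Directory -Path $data -Force | Out-Null   # creates it if missing
            Copy-Item -Path (Join-Path $src 'SMS\Data\*') -Destination $data -Recurse -Force
            if ($ToolsSource) {
                $tools = Join-Path $MountDir 'Tools'
                New-Item -ItemType Directory -Path $tools -Force | Out-Null
                Copy-Item -Path (Join-Path $ToolsSource '*') -Destination $tools -Recurse -Force
            }
            if ($TsConfigPath) {
                # Assumed destination; put TSConfig.ini wherever your pre-execution hook expects it.
                Copy-Item -Path $TsConfigPath -Destination (Join-Path $MountDir 'Windows\System32') -Force
            }
            Dismount-WindowsImage -Path $MountDir -Save | Out-Null
        }
        catch {
            Dismount-WindowsImage -Path $MountDir -Discard | Out-Null
            throw
        }
        Copy-Item -Path $wim -Destination (Join-Path $StagingDir (Get-StagedWimName -SourceFolder $src)) -Force
    }
}

# Example: Update-BootWim -Sources (Get-ChildItem 'D:\SCCM_export' -Directory).FullName -StagingDir 'D:\Staging'
```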

This goes through each of the directories in the $sources variable and mounts boot.wim with DISM. If there is no \SMS\Data folder in the image, it creates one. Next it copies the contents of the SMS\Data folder from the directory extracted from the ISO into the mounted WIM. After that the script does an optional step: in my environment, when we use WDS we need to run a pre-execution step, and the files for it are staged in the Tools directory, so the script creates that directory and copies the files. Next it copies the TSConfig.ini for the pre-execution step, which is also optional. The script then unmounts the WIM file and commits the changes.

After all of the boot.wim files have been updated the script will copy each to a staging directory and name the files based on the name of the source folder. So Lab_x64.iso was extracted to d:\SCCM_export\Lab_x64 and the boot.wim from that directory is named Lab_x64.wim.

Step 4 – Copy to your WDS server and add to the menu.

Thanks Johan and Sandy for posting and reminding me about this.

 

Management Point Troubleshooting

Just a quick note for when you are troubleshooting a management point that responds to policy requests with a 500 error.

If http://servername/SMS_MP/.SMS_AUT?MPCERT is ok

and http://servername/SMS_MP/.SMS_AUT?MPLIST gives a 500 error.

Check your database; it could be down. Not how I wanted to end a Monday, but at least in this case it was a SAN network issue impacting the database server. Once that was resolved, the SCCM site came right back up.
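Here is a sketch of a quick way to check both URLs from PowerShell. The function names are hypothetical. The sketch assumes plain HTTP on port 80, as in the URLs above. It only encodes the pattern described in this post: MPCERT is fine while MPLIST returns 500.

```powershell
function Get-MpUrl {
    # Builds the MP test URL, e.g. http://mp01/SMS_MP/.SMS_AUT?MPLIST
    param([Parameter(Mandatory)][string]$Server, [Parameter(Mandatory)][string]$Query)
    "http://$Server/SMS_MP/.SMS_AUT?$Query"
}

function Get-HttpStatus {
    # Returns the HTTP status code for a URL, or $null if no response came back.
    param([Parameter(Mandatory)][string]$Uri)
    try {
        [int](Invoke-WebRequest -Uri $Uri -UseBasicParsing -ErrorAction Stop).StatusCode
    }
    catch {
        if ($_.Exception.Response) { [int]$_.Exception.Response.StatusCode } else { $null }
    }
}

function Get-MpDiagnosis {
    # MPCERT OK but MPLIST 500 usually means the MP cannot reach the site database.
    param($CertStatus, $ListStatus)
    if ($CertStatus -eq 200 -and $ListStatus -eq 500) { 'MPCERT OK, MPLIST 500: check the site database' }
    elseif ($CertStatus -eq 200 -and $ListStatus -eq 200) { 'Both endpoints OK' }
    else { "Unexpected result: MPCERT=$CertStatus MPLIST=$ListStatus" }
}

function Test-ManagementPoint {
    param([Parameter(Mandatory)][string]$Server)
    $cert = Get-HttpStatus -Uri (Get-MpUrl -Server $Server -Query 'MPCERT')
    $list = Get-HttpStatus -Uri (Get-MpUrl -Server $Server -Query 'MPLIST')
    $diag = Get-MpDiagnosis -CertStatus $cert -ListStatus $list
    [pscustomobject]@{ Server = $Server; MPCERT = $cert; MPLIST = $list; Diagnosis = $diag }
}

# Example: Test-ManagementPoint -Server 'mp01'   # placeholder server name
```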

 

MVP Days – New Orleans

Come join me at MVP Days New Orleans on May 12, 2017. I have submitted to speak and will hopefully be presenting. After attending the Orlando MVP Day in 2016, I decided to attend every one of these that I can work into my schedule, so regardless of the outcome of my session submission, I will be there. The biggest reason is the time set aside at the end of the day for interaction with all of the speakers. This unstructured session lets you ask questions and network with everyone, and it is hands down worth your time. So if you are able, register and take part in a great community event.

Distribution Points not reporting Usage data

I am still scratching my head about what happened on this issue, but let me explain. An issue came up in our SCCM RAP. (If you are a Premier customer, I highly recommend the RAP as a service. Get it, use it, and use it often.) About 30 distribution points were not reporting usage stats for our 2012 R2 Configuration Manager site. A quick check showed that the distribution points were alive and well, but the scheduled task that reports the usage statistics was gone. If you need a good primer on how to check out a distribution point, see this post from Scott’s IT Blog. To resolve this, I simply exported the task from a working distribution point and imported it on the systems where it was missing. To be honest, though, it did not get fixed everywhere. A couple of months went by, we reran the RAP, and looked at the results: now over 300 distribution points were not reporting statistics because of a missing scheduled task. This makes me beg and plead to speed up our upgrade project to Current Branch. But until then I have to keep everything going, so a little PowerShell to the rescue.

First, I chose to export the existing scheduled task from a working server and save it as c:\temp\Content Usage.xml.
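This is a sketch of that export, assuming the task is named Content Usage. The task path is an assumption, so check where the task lives in Task Scheduler on a healthy DP. `Save-ScheduledTaskXml` is a hypothetical name.

```powershell
function Save-ScheduledTaskXml {
    # Exports a scheduled task definition (XML) from a working distribution point.
    param(
        [string]$ComputerName = $env:COMPUTERNAME,
        [string]$TaskName = 'Content Usage',
        [string]$TaskPath = '\',   # assumption: check the task's folder on a healthy DP
        [string]$Path = 'C:\temp\Content Usage.xml'
    )
    $session = New-CimSession -ComputerName $ComputerName
    try {
        Export-ScheduledTask -TaskName $TaskName -TaskPath $TaskPath -CimSession $session |
            Set-Content -Path $Path -Encoding Unicode
    }
    finally {
        Remove-CimSession -CimSession $session
    }
}

# Example: Save-ScheduledTaskXml -ComputerName 'dp-good01'   # placeholder server name
```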

The RAP website is great for reporting the issue, but not so much for getting the details in a form that is easily usable in a script. So here is a SQL query to identify distribution points not reporting usage data.

This gives you a list of server names that you can save in a file. Now for the PowerShell to recreate the scheduled task and run it.
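My script is not reproduced here, so this is a minimal sketch. It assumes the server list file holds one name per line, that you have CIM/WinRM access to each DP, and that the task name and path match the export. `Restore-ContentUsageTask` and `Get-ServerList` are hypothetical names. `Register-ScheduledTask -Force` replaces any existing task with the same name.

```powershell
function Get-ServerList {
    # Reads server names from a text file, ignoring blank lines and surrounding spaces.
    param([Parameter(Mandatory)][string]$Path)
    Get-Content -Path $Path | ForEach-Object { $_.Trim() } | Where-Object { $_ }
}

function Restore-ContentUsageTask {
    # Registers the exported task on each DP and starts it once.
    param(
        [Parameter(Mandatory)][string[]]$ComputerName,
        [string]$XmlPath = 'C:\temp\Content Usage.xml',
        [string]$TaskName = 'Content Usage',
        [string]$TaskPath = '\'   # assumption: match the path on a working DP
    )
    $xml = Get-Content -Path $XmlPath -Raw
    $session = $null
    foreach ($computer in $ComputerName) {
        try {
            $session = New-CimSession -ComputerName $computer -ErrorAction Stop
            Register-ScheduledTask -Xml $xml -TaskName $TaskName -TaskPath $TaskPath -CimSession $session -Force | Out-Null
            Start-ScheduledTask -TaskName $TaskName -TaskPath $TaskPath -CimSession $session
            "$computer : task recreated and started"
        }
        catch {
            Write-Warning "$computer : $($_.Exception.Message)"
        }
        finally {
            if ($session) { Remove-CimSession -CimSession $session; $session = $null }
        }
    }
}

# Example: Restore-ContentUsageTask -ComputerName (Get-ServerList -Path 'C:\temp\dp-list.txt')
```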

Give it a little time and rerun the SQL query to verify that the systems are reporting usage data and are being removed from the report.