To know what sites haven't backed up within a given date range (a day and a half since last backup), and those that have never backed up (never reported a backup).
Groovy stuff, not so easy to determine though, or so it would seem, unless you fall back entirely onto the Status Message sub-system.
When a SMS2003 or ConfigMgr backup completes, a Status Message is produced by the SMS_SITE_BACKUP component and processed by the Site server. As you probably know, Status Messages replicate up the hierarchy, so the Central Site server will know all Status Messages being "sent up".
I've had this code running for a few days, it seems to work a charm.
There is one thing that can get in the way though, customised Status Filter Rules. If you have them, you understand them (unless you've inherited in which case GIYF), and know that they can block Status Messages from being processed or delivered to the Parent Site server (drop all informational for example, as the last rule in the rule set, yes some folks do this!).
The T-SQL to bring back the Status Messages, and to render a list of machines that do not fit the logic (no backup reported, ever) is here:
SELECT vsite.ServerName AS Servername, vsite.SiteCode, MAX(stat.Time) AS LastTime
FROM vStatusMessages AS stat LEFT OUTER JOIN
StatusMessageInsStrs AS ins ON ins.RecordID = stat.RecordID LEFT OUTER JOIN
StatusMessageAttributes AS att1 ON att1.RecordID = stat.RecordID LEFT OUTER JOIN
v_Site AS vsite ON vsite.ServerName = stat.MachineName
WHERE (vsite.Type = 2) AND (stat.Component = 'SMS_SITE_BACKUP') AND (stat.MessageID = 5035)
GROUP BY vsite.ServerName, vsite.SiteCode
HAVING (MAX(stat.Time) < DATEADD(day, - 1.5 , GETDATE()))
ORDER BY LastTime
select 'Sites that have never reported a backup (Investigate any server that is listed)'
select servername from v_site where type = 2 and servername not in
(SELECT vsite.ServerName AS Servername
FROM vStatusMessages AS stat LEFT OUTER JOIN
StatusMessageInsStrs AS ins ON ins.RecordID = stat.RecordID LEFT OUTER JOIN
StatusMessageAttributes AS att1 ON att1.RecordID = stat.RecordID LEFT OUTER JOIN
v_Site AS vsite ON vsite.ServerName = stat.MachineName
WHERE (vsite.Type = 2) AND (stat.Component = 'SMS_SITE_BACKUP') AND (stat.MessageID = 5035)
GROUP BY vsite.ServerName, vsite.SiteCode
HAVING (MAX(stat.Time) >= DATEADD(day, - 1.5 , GETDATE())))
The Query itself is entirely harmless, in that it doesn't UPDATE or DELETE, but you should be aware it can put a load on your DB instance while running. Nothing to big to worry about, unless you are seriously underspec'd hardware-wise.
And finally, be aware that as long as Status Filter rules are not inhibiting the delivery of the Status Message from down-level sites up the hierarchy, and that Sites in your hierarchy are communicating with their Parent's correctly, then you should be OK to rely on this report. I would suggest having some more reports configured to check for any sites that haven't reported in within the last 24hrs (indicative of some kind of failure taking place) and also to look at using dedicated Monitoring tools such as OpMgr to monitor Site servers and Site systems health using the Management Pack.
0 comments:
Post a Comment