Recently I came across a situation where an RDS server started having random issues: services breaking, random reboots out of the blue and users getting disconnected. At first, of course, we suspected a network issue, like the usual DHCP and DNS problems, or bandwidth and network outages. The logs, however, started showing otherwise. Running w32tm /query /configuration, w32tm /query /status, w32tm /query /source and w32tm /query /peers, we found that the configured time server was a machine that no longer existed and had been turned off a long time ago.
Time sync issues can, in the worst case, break trust relationships, but mostly they cause log-in and authentication problems, not only for users but for services and apps too. Scheduled tasks will break as well, especially if they are not local.
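The four queries can also be collected in one go, which is handy when you want to compare several machines. This is only a minimal sketch in Python, assuming it runs on the affected Windows machine with w32tm on the PATH. The names W32TM_QUERIES and collect_time_diagnostics are made up for illustration, and the runner parameter is there only so the function can be tried out without a Windows box.

```python
import subprocess

# The four queries named in the post, as argument lists for subprocess.
W32TM_QUERIES = {
    "configuration": ["w32tm", "/query", "/configuration"],
    "status": ["w32tm", "/query", "/status"],
    "source": ["w32tm", "/query", "/source"],
    "peers": ["w32tm", "/query", "/peers"],
}


def collect_time_diagnostics(runner=subprocess.run):
    """Run each w32tm query and return a dict of {query name: output text}.

    `runner` defaults to subprocess.run; passing a stand-in makes it
    possible to exercise the function on a machine without w32tm.
    """
    results = {}
    for name, argv in W32TM_QUERIES.items():
        completed = runner(argv, capture_output=True, text=True)
        results[name] = completed.stdout.strip()
    return results
```

On the server itself, collect_time_diagnostics()["source"] would then hold whatever w32tm /query /source printed, ready to be logged or compared with the other servers.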
Often time server details are propagated by a GPO or the default domain GPO. Normally, good practice is that the main domain controller (the PDC emulator, which netdom query fsmo will show you) gets its time from a main continental time server like pool.ntp.org (redundancy: 0.pool.ntp.org, 1.pool.ntp.org), and then propagates it to the secondary DCs and other servers. That way, even if the main domain controller drifts away from the European time server, the infrastructure stays in sync internally and issues do not come up. The most important problems occur when certain sections of the infrastructure are off by more than 5 minutes, which is the default Kerberos tolerance for clock skew. It is rare, but I did my research and it can cause some serious issues, especially on hybrid infrastructures running Linux and Microsoft-based systems, on premises and in the cloud at the same time.
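To make the 5 minute limit concrete, here is a small sketch in Python that compares two timestamps against that tolerance. The function names and the two example timestamps are made up for illustration. In real life you would take the reference time from the DC and the other one from the member server.

```python
from datetime import datetime, timedelta, timezone

# 5 minutes is the limit mentioned in the post; adjust if your policy differs.
MAX_SKEW = timedelta(minutes=5)


def clock_skew(reference: datetime, local: datetime) -> timedelta:
    """Return the absolute difference between two timezone-aware timestamps."""
    return abs(local - reference)


def exceeds_tolerance(reference: datetime, local: datetime,
                      tolerance: timedelta = MAX_SKEW) -> bool:
    """True when the two clocks are further apart than the tolerance."""
    return clock_skew(reference, local) > tolerance


# Made-up example values: the member server is 6.5 minutes ahead of the DC.
dc_time = datetime(2024, 1, 15, 12, 0, 0, tzinfo=timezone.utc)
member_time = datetime(2024, 1, 15, 12, 6, 30, tzinfo=timezone.utc)

print(clock_skew(dc_time, member_time))         # 0:06:30
print(exceeds_tolerance(dc_time, member_time))  # True
```

Using timezone-aware timestamps matters here, because comparing a local time with a UTC time would show a fake skew of several hours.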
Our problem arose from the server migration, during which the server name and address were changed. I think it was not a service migration, though, but a hard VM-to-VM copy. The default domain GPO, however, stayed as it was, still set up for the old, non-existent time server.
At this point we did something risky: we edited the default GPO, then ran gpupdate /force on the 3 other servers. Would I have done the same if I had 80 servers? Not sure!
Our idea worked. I think it would also work for a big infrastructure, except that something might break along the way as the change propagates and would need fixing. That is not a big issue as long as backups and snapshots are present. Still, it should be done during the general yearly downtime or over a three-day long weekend.
Better Practice
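Instead of editing the Default Domain Policy, create a separate GPO and give it link order N°1, which leaves the Default Domain Policy at N°2. The GPO with the lower link order number takes precedence, so this is a more sustainable and secure practice.
First, make sure that the PDC's time server has been set correctly: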
w32tm /config /manualpeerlist:"fr.pool.ntp.org" /syncfromflags:manual /reliable:yes /update
w32tm /resync /force
Then, in the Group Policy Management console, right-click the domain name → Create a new GPO
Example: "Time Configuration – All Computers"
Edit the GPO → Computer Configuration → Policies → Administrative Templates → System → Windows Time Service → Time Providers
Configure:
- Enable Windows NTP Client
- Type: NT5DS
- NtpServer: leave empty (clients follow DCs)
Then link the GPO at the domain level (top-level link), so that it applies to all computers and servers in the domain.
Finally, verify the result on a client with w32tm /query /source or w32tm /query /status. (Please note that in some cases DCs are stubborn: while the new time configuration propagates down to everything else, including all servers and PCs, the DCs may still not be updated. In that case you simply need to create another GPO applied to the Domain Controllers OU.)
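If you have more than a handful of machines, reading each w32tm /query /source output by eye gets tedious. Below is a rough sketch in Python of how such a check could be automated. It assumes that a machine without a real time source reports "Local CMOS Clock" or "Free-running System Clock", so please compare that with your own output. The function name and the host names in the examples are hypothetical.

```python
# Strings that w32tm /query /source is assumed to print when the machine
# is not syncing from a real time source (assumption, verify on your side).
LOCAL_CLOCK_MARKERS = ("Local CMOS Clock", "Free-running System Clock")


def source_looks_healthy(source_output: str, stale_names=()) -> bool:
    """Return False for an empty source, a local clock or a known dead server.

    `stale_names` is a list of old server names that should no longer
    show up as a time source (for example the decommissioned machine).
    """
    text = source_output.strip().lower()
    if not text:
        return False
    if any(marker.lower() in text for marker in LOCAL_CLOCK_MARKERS):
        return False
    return not any(name.lower() in text for name in stale_names)


# Hypothetical host names, for illustration only.
print(source_looks_healthy("dc01.example.local"))                       # True
print(source_looks_healthy("Local CMOS Clock"))                         # False
print(source_looks_healthy("oldsrv.example.local", stale_names=["oldsrv"]))  # False
```

The stale_names list is the interesting part in our case: it would have caught every server still pointing at the old machine after the migration.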