I think I've almost solved this myself:

    UINT64 Rtc(void)
    {
        UINT64 softwareTimer = SwRTC;
        UINT32 lowOrderBits = softwareTimer;   // just take low-order 32 bits
        UINT64 coreTimer = ReadCoreTimer();
        if (lowOrderBits > coreTimer)          // if CT has rolled over since SwRTC was updated
            softwareTimer += 0x100000000;      // then increment high-order 32 bits of software count
        return (softwareTimer & 0xFFFFFFFF00000000ull) + coreTimer;
    }

This first reads the 64-bit software timer, then the 32-bit hardware timer.
The hardware timer (updated every 25 ns) should always be greater than or equal to the low-order 32 bits of the software timer (updated only every 1 ms).
If it's not, the hardware timer must have rolled over since the software timer was last updated.
So, in that case I increment the high-order word of my local copy of the software timer.
Then I just combine the high-order 32 bits from the software timer with the low-order 32 bits from the hardware timer.
One nice side effect is there's no need to disable interrupts.
The only problem I can see is this: what if compiler optimization reorders the code so that the hardware timer is read first? Then an interrupt could update the software timer after the hardware timer is read but before I read the software timer, so the software value would be newer than the hardware value and the rollover check could give the wrong answer.
At first I thought I could fix that by disabling interrupts while reading both timers, but what if the compiler reorders the code so that DisableInterrupts() comes too late?