Sorry for the delayed response...I've got some difficult family things to work on IRL that are taking priority...
On 11/12/2015 05:23 PM, Timur Tabi wrote:
On 11/12/2015 06:06 PM, Al Stone wrote:
If it is a NAK, that's fine, but I also want to be sure I understand what the objections are. Based on my understanding of the discussion so far over the multiple versions, I think the primary objection is that the use of pretimeout makes this driver too complex, and indeed complex enough that there is some concern that it could destabilize a running system. Do I have that right?
I don't have a problem with the concept of pre-timeout per se. My primary objection is this code:
+static irqreturn_t sbsa_gwdt_interrupt(int irq, void *dev_id)
+{
+	struct sbsa_gwdt *gwdt = (struct sbsa_gwdt *)dev_id;
+	struct watchdog_device *wdd = &gwdt->wdd;
+
+	/* We don't use pretimeout, trigger WS1 now */
+	if (!wdd->pretimeout)
+		sbsa_gwdt_set_wcv(wdd, 0);
This driver depends on an interrupt handler in order to properly program the hardware. Unlike some other devices, the SBSA watchdog does not need assistance to reset on a timeout -- it is a "fire and forget" device. What happens if there is a hard lockup, and interrupts no longer work?
Aha. I see now. That helps clarify a lot. Thanks.
Fu does this because he wants to support a pre-timeout value that's independent of the timeout value. The SBSA watchdog is normally programmed so that the real timeout equals twice the pre-timeout. I would prefer that the driver adhere to this limitation. That would eliminate the need to pre-program the hardware in the interrupt handler.
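To make the "timeout equals twice the pre-timeout" arrangement concrete, here is a minimal userspace sketch of the arithmetic. It is not code from the patch: the struct, the function name gwdt_plan, the 64-bit field width, and the requirement that the timeout be an even number of seconds are all assumptions made for illustration. The idea is that a single offset value is programmed once, the first-stage signal fires after one offset period, and the second-stage signal fires after two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type; none of these names come from the patch. */
struct gwdt_stages {
	uint64_t wor;      /* offset value to program, in clock ticks (width assumed) */
	uint32_t ws0_secs; /* seconds until the first-stage signal (the interrupt) */
	uint32_t ws1_secs; /* seconds until the second-stage signal (the reset) */
};

/*
 * Split one user-visible timeout into two equal hardware stages.
 * Assumes the hardware reloads the same offset after the first stage,
 * so the reset always lands at twice the pre-timeout.
 */
struct gwdt_stages gwdt_plan(uint32_t timeout_secs, uint32_t clk_hz)
{
	struct gwdt_stages s;

	assert(clk_hz > 0);
	assert(timeout_secs % 2 == 0); /* illustrative restriction only */

	s.ws0_secs = timeout_secs / 2;
	s.ws1_secs = timeout_secs;
	s.wor = (uint64_t)s.ws0_secs * clk_hz;
	return s;
}
```

With a 20 second timeout and an assumed 100 MHz clock, this yields an offset of 1,000,000,000 ticks, a first-stage signal at 10 seconds, and a reset at 20 seconds. Nothing has to be reprogrammed in between, which is the point of the restriction.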
The "normally programmed" limitation described is interesting; forgive my ignorance, but where is that specified? I couldn't find anything that specific in the SBSA, or the ARM ARM, but I could have missed it. That being said, keeping them independent at least seems like a good idea; if I think about kdump/kexec or some other recovery mechanism wanting to perhaps copy part of RAM or flush a filesystem/database, or maybe do some other magic to recover enough to be able to reset the timer, that may be a really long interval on a large server. I could easily see that being very different from a watchdog timer that's meant to just make sure the platform is still making progress. Conversely, I could see that recovery interval being very small or zero on a guest OS, for example, with the watchdog interval still being different.
And finally, a simpler, single stage timeout watchdog driver would be a reasonable thing to accept, yes? I can see where that would make sense.
I would be okay with merging such a driver, and then enhancing it later to add pre-timeout support.
The issue for me in that case is that the SBSA requires a two stage timeout, so a single stage driver has no real value for me.
There are plenty of existing watchdog devices that have a two-stage timeout but the driver treats it as a single stage. The PowerPC watchdog driver is like that. The hardware is programmed for the second stage to cause a hardware reset, and the interrupt handler is typically a no-op or just a printk().
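A rough userspace model of that approach follows, as a sketch only. Every name in it (wdt_model, wdt_tick, irqs_alive, and so on) is hypothetical and it does not model any real driver or register layout. It assumes a two-stage device where each stage lasts the same number of ticks, and it shows the property being argued for: the first-stage handler only reports, so the second-stage reset still happens when interrupts are dead.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

enum wdt_event { WDT_NONE, WDT_STAGE1_IRQ, WDT_STAGE2_RESET };

/* Hypothetical model of a two-stage watchdog used as a single-stage one. */
struct wdt_model {
	uint32_t stage_ticks; /* length of each hardware stage, in ticks */
	uint32_t elapsed;     /* ticks since the last ping */
	int irqs_alive;       /* 0 models a hard lockup with interrupts dead */
	int warnings;         /* how many times the stage-1 handler ran */
};

/* Userspace "keepalive": restart the countdown. */
void wdt_ping(struct wdt_model *w)
{
	w->elapsed = 0;
}

/* Stage-1 handler: report only, never touch the (modelled) hardware. */
void wdt_stage1_handler(struct wdt_model *w)
{
	w->warnings++;
	fprintf(stderr, "watchdog: first stage expired, reset pending\n");
}

/* Advance the model by one tick and say what the hardware did. */
enum wdt_event wdt_tick(struct wdt_model *w)
{
	assert(w->stage_ticks > 0);

	w->elapsed++;
	if (w->elapsed >= 2 * w->stage_ticks)
		return WDT_STAGE2_RESET; /* fires with or without the handler */
	if (w->elapsed == w->stage_ticks) {
		if (w->irqs_alive)
			wdt_stage1_handler(w);
		return WDT_STAGE1_IRQ;
	}
	return WDT_NONE;
}
```

With stage_ticks set to 3 and irqs_alive set to 0, six ticks without a ping still end in WDT_STAGE2_RESET and the handler never runs. That is the "fire and forget" behaviour: the reset path has no dependency on software servicing the first stage.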
Hrm. Thanks for the pointer. I _think_ I see a way to do that with arm64, and perhaps combine this driver's functionality with what Timur did originally, but still have it reasonably straightforward. I need to do the experiments, though, and see if it actually works first.