Sasha, could you consider including this cherry-picked patchset in v4.14.
Kernel v4.14 might suffer from the following unbalanced enablement for the board Hikey 960:
Nov 5 12:02:54 hikey kernel: [ 22.148194] Unbalanced enable for IRQ 44 Nov 5 12:02:54 hikey kernel: [ 22.152193] ------------[ cut here ]------------ Nov 5 12:02:54 hikey kernel: [ 22.156872] WARNING: CPU: 2 PID: 509 at /home/inaddy/work/sources/linux/stable/stable-linux-4.14.y/kernel/irq/manage.c:525 __enable_irq+0x78/0x80 Nov 5 12:02:54 hikey kernel: [ 22.249606] CPU: 2 PID: 509 Comm: kworker/2:2 Not tainted 4.14.79 #1 Nov 5 12:02:54 hikey kernel: [ 22.255975] Hardware name: HiKey Development Board (DT) Nov 5 12:02:54 hikey kernel: [ 22.261248] Workqueue: events_freezable thermal_zone_device_check Nov 5 12:02:54 hikey kernel: [ 22.267368] task: ffff8000616e0e00 task.stack: ffff00000b5f0000 Nov 5 12:02:54 hikey kernel: [ 22.273312] PC is at __enable_irq+0x78/0x80 Nov 5 12:02:54 hikey kernel: [ 22.277516] LR is at __enable_irq+0x78/0x80 Nov 5 12:02:54 hikey kernel: [ 22.281718] pc : [<ffff00000813e010>] lr : [<ffff00000813e010>] pstate: 000001c5 Nov 5 12:02:54 hikey kernel: [ 22.289129] sp : ffff00000b5f3c80 Nov 5 12:02:54 hikey kernel: [ 22.292457] x29: ffff00000b5f3c80 x28: 0000000000000000 Nov 5 12:02:54 hikey kernel: [ 22.297804] x27: ffff80005c139e38 x26: ffff000008a71870 Nov 5 12:02:54 hikey kernel: [ 22.303148] x25: 0000000000000000 x24: 0000000000000002 Nov 5 12:02:54 hikey kernel: [ 22.308492] x23: ffff00000b5f3d9c x22: ffff80005d565e88 Nov 5 12:02:54 hikey kernel: [ 22.313836] x21: 000000000000f980 x20: 000000000000002c Nov 5 12:02:54 hikey kernel: [ 22.319181] x19: ffff800061726000 x18: 0000000000000010 Nov 5 12:02:54 hikey kernel: [ 22.324524] x17: 0000000000000000 x16: 0000000000000000 Nov 5 12:02:54 hikey kernel: [ 22.329868] x15: ffffffffffffffff x14: ffff000009269c08 Nov 5 12:02:54 hikey kernel: [ 22.335213] x13: ffff00008940678f x12: ffff000009406797 Nov 5 12:02:54 hikey kernel: [ 22.340558] x11: ffff000009290000 x10: ffff00000b5f3980 Nov 5 12:02:54 hikey kernel: [ 22.345902] x9 : 00000000ffffffd0 x8 : ffff00000862c298 Nov 5 12:02:54 hikey kernel: [ 22.351246] x7 : 6c62616e65206465 x6 : 00000000000001b2 Nov 5 12:02:54 hikey kernel: [ 22.356589] x5 : 0000000000000000 x4 : 0000000000000000 Nov 5 12:02:54 hikey kernel: [ 22.361931] x3 : 0000000000000000 x2 : ffff800063e824c8 Nov 5 12:02:54 hikey kernel: [ 22.367275] x1 : 000080005af95000 x0 : 000000000000001c Nov 5 12:02:54 hikey kernel: [ 22.372618] Call trace: Nov 5 12:02:54 hikey kernel: [ 22.375088] Exception stack(0xffff00000b5f3b40 to 0xffff00000b5f3c80) Nov 5 12:02:54 hikey kernel: [ 22.381560] 3b40: 000000000000001c 000080005af95000 ffff800063e824c8 0000000000000000 Nov 5 12:02:54 hikey kernel: [ 22.389417] 3b60: 0000000000000000 0000000000000000 00000000000001b2 6c62616e65206465 Nov 5 12:02:54 hikey kernel: [ 22.397276] 3b80: ffff00000862c298 00000000ffffffd0 ffff00000b5f3980 ffff000009290000 Nov 5 12:02:54 hikey kernel: [ 22.405136] 3ba0: ffff000009406797 ffff00008940678f ffff000009269c08 ffffffffffffffff Nov 5 12:02:54 hikey kernel: [ 22.412994] 3bc0: 0000000000000000 0000000000000000 0000000000000010 ffff800061726000 Nov 5 12:02:54 hikey kernel: [ 22.420852] 3be0: 000000000000002c 000000000000f980 ffff80005d565e88 ffff00000b5f3d9c Nov 5 12:02:54 hikey kernel: [ 22.428710] 3c00: 0000000000000002 0000000000000000 ffff000008a71870 ffff80005c139e38 Nov 5 12:02:54 hikey kernel: [ 22.436569] 3c20: 0000000000000000 ffff00000b5f3c80 ffff00000813e010 ffff00000b5f3c80 Nov 5 12:02:54 hikey kernel: [ 22.444426] 3c40: ffff00000813e010 00000000000001c5 0000000000000000 0000000000000000 Nov 5 12:02:54 hikey kernel: [ 22.452286] 3c60: ffffffffffffffff ffff800061800618 ffff00000b5f3c80 ffff00000813e010 Nov 5 12:02:54 hikey kernel: [ 22.460144] [<ffff00000813e010>] __enable_irq+0x78/0x80 Nov 5 12:02:54 hikey kernel: [ 22.465394] [<ffff00000813e058>] enable_irq+0x40/0x78 Nov 5 12:02:54 hikey kernel: [ 22.470493] [<ffff000000e228a8>] hisi_thermal_get_temp+0x1b0/0x1d8 [hisi_thermal] Nov 5 12:02:54 hikey kernel: [ 22.478008] [<ffff0000087121a8>] of_thermal_get_temp+0x38/0x50 Nov 5 12:02:54 hikey kernel: [ 22.483869] [<ffff000008711790>] thermal_zone_get_temp+0x58/0x80 Nov 5 12:02:54 hikey kernel: [ 22.489903] [<ffff00000870e7bc>] thermal_zone_device_update.part.4+0x2c/0x1a8 Nov 5 12:02:54 hikey kernel: [ 22.497066] [<ffff00000870e9c8>] thermal_zone_device_check+0x40/0x50 Nov 5 12:02:54 hikey kernel: [ 22.503457] [<ffff0000080f1674>] process_one_work+0x19c/0x3d0 Nov 5 12:02:54 hikey kernel: [ 22.509236] [<ffff0000080f18f4>] worker_thread+0x4c/0x428 Nov 5 12:02:54 hikey kernel: [ 22.514664] [<ffff0000080f84fc>] kthread+0x134/0x138 Nov 5 12:02:54 hikey kernel: [ 22.519659] [<ffff000008085154>] ret_from_fork+0x10/0x1c Nov 5 12:02:54 hikey kernel: [ 22.524988] ---[ end trace 328d4bb2d9b066a0 ]---
This issue was solved when "hisi_thermal_alarm_irq" function was removed so only "hisi_thermal_alarm_irq_thread" would exist. This has fixed the issue for the unbalanced enablement since there is no more:
disable_irq_nosync(irq); data->irq_enabled = false;
logic being done in parallel to the threaded handler AND the thermal_zone_device_update() call only happens now if the temperature is already above the threshold.
From: Daniel Lezcano daniel.lezcano@linaro.org
commit ff4ec2997df8fe7cc40513dbe5f86d9f88fb6be7 upstream.
By essence, the tsensor does not really support multiple sensor at the same time. It allows to set a sensor and use it to get the temperature, another sensor could be switched but with a delay of 3-5ms. It is difficult to read simultaneously several sensors without a big delay.
Today, just one sensor is used, it is not necessary to deal with multiple sensors in the code. Remove them and if it is needed in the future add them on top of a code which will be clean up in the meantime.
Link: https://bugs.linaro.org/show_bug.cgi?id=4053 (PATCH 1/5) Signed-off-by: Daniel Lezcano daniel.lezcano@linaro.org Reviewed-by: Leo Yan leo.yan@linaro.org Acked-by: Wangtao (Kevin, Kirin) kevin.wangtao@hisilicon.com Signed-off-by: Eduardo Valentin edubezval@gmail.com (cherry picked from commit ff4ec2997df8fe7cc40513dbe5f86d9f88fb6be7) Signed-off-by: Rafael David Tinoco rafael.tinoco@linaro.org --- drivers/thermal/hisi_thermal.c | 75 +++++++++------------------------- 1 file changed, 19 insertions(+), 56 deletions(-)
diff --git a/drivers/thermal/hisi_thermal.c b/drivers/thermal/hisi_thermal.c index 6d8906d65476..aecbfe9fc8f2 100644 --- a/drivers/thermal/hisi_thermal.c +++ b/drivers/thermal/hisi_thermal.c @@ -40,6 +40,7 @@ #define HISI_TEMP_STEP (784)
#define HISI_MAX_SENSORS 4 +#define HISI_DEFAULT_SENSOR 2
struct hisi_thermal_sensor { struct hisi_thermal_data *thermal; @@ -54,9 +55,8 @@ struct hisi_thermal_data { struct mutex thermal_lock; /* protects register data */ struct platform_device *pdev; struct clk *clk; - struct hisi_thermal_sensor sensors[HISI_MAX_SENSORS]; - - int irq, irq_bind_sensor; + struct hisi_thermal_sensor sensors; + int irq; bool irq_enabled;
void __iomem *regs; @@ -133,7 +133,7 @@ static void hisi_thermal_enable_bind_irq_sensor
mutex_lock(&data->thermal_lock);
- sensor = &data->sensors[data->irq_bind_sensor]; + sensor = &data->sensors;
/* setting the hdak time */ writel(0x0, data->regs + TEMP0_CFG); @@ -181,31 +181,8 @@ static int hisi_thermal_get_temp(void *_sensor, int *temp) struct hisi_thermal_sensor *sensor = _sensor; struct hisi_thermal_data *data = sensor->thermal;
- int sensor_id = -1, i; - long max_temp = 0; - *temp = hisi_thermal_get_sensor_temp(data, sensor);
- sensor->sensor_temp = *temp; - - for (i = 0; i < HISI_MAX_SENSORS; i++) { - if (!data->sensors[i].tzd) - continue; - - if (data->sensors[i].sensor_temp >= max_temp) { - max_temp = data->sensors[i].sensor_temp; - sensor_id = i; - } - } - - /* If no sensor has been enabled, then skip to enable irq */ - if (sensor_id == -1) - return 0; - - mutex_lock(&data->thermal_lock); - data->irq_bind_sensor = sensor_id; - mutex_unlock(&data->thermal_lock); - dev_dbg(&data->pdev->dev, "id=%d, irq=%d, temp=%d, thres=%d\n", sensor->id, data->irq_enabled, *temp, sensor->thres_temp); /* @@ -218,7 +195,7 @@ static int hisi_thermal_get_temp(void *_sensor, int *temp) return 0; }
- if (max_temp < sensor->thres_temp) { + if (*temp < sensor->thres_temp) { data->irq_enabled = true; hisi_thermal_enable_bind_irq_sensor(data); enable_irq(data->irq); @@ -245,22 +222,16 @@ static irqreturn_t hisi_thermal_alarm_irq_thread(int irq, void *dev) { struct hisi_thermal_data *data = dev; struct hisi_thermal_sensor *sensor; - int i;
mutex_lock(&data->thermal_lock); - sensor = &data->sensors[data->irq_bind_sensor]; + sensor = &data->sensors;
dev_crit(&data->pdev->dev, "THERMAL ALARM: T > %d\n", sensor->thres_temp); mutex_unlock(&data->thermal_lock);
- for (i = 0; i < HISI_MAX_SENSORS; i++) { - if (!data->sensors[i].tzd) - continue; - - thermal_zone_device_update(data->sensors[i].tzd, - THERMAL_EVENT_UNSPECIFIED); - } + thermal_zone_device_update(data->sensors.tzd, + THERMAL_EVENT_UNSPECIFIED);
return IRQ_HANDLED; } @@ -317,7 +288,6 @@ static int hisi_thermal_probe(struct platform_device *pdev) { struct hisi_thermal_data *data; struct resource *res; - int i; int ret;
data = devm_kzalloc(&pdev->dev, sizeof(*data), GFP_KERNEL); @@ -359,14 +329,13 @@ static int hisi_thermal_probe(struct platform_device *pdev) hisi_thermal_enable_bind_irq_sensor(data); data->irq_enabled = true;
- for (i = 0; i < HISI_MAX_SENSORS; ++i) { - ret = hisi_thermal_register_sensor(pdev, data, - &data->sensors[i], i); - if (ret) - dev_err(&pdev->dev, - "failed to register thermal sensor: %d\n", ret); - else - hisi_thermal_toggle_sensor(&data->sensors[i], true); + ret = hisi_thermal_register_sensor(pdev, data, + &data->sensors, + HISI_DEFAULT_SENSOR); + if (ret) { + dev_err(&pdev->dev, "failed to register thermal sensor: %d\n", + ret); + return ret; }
ret = devm_request_threaded_irq(&pdev->dev, data->irq, @@ -378,6 +347,8 @@ static int hisi_thermal_probe(struct platform_device *pdev) return ret; }
+ hisi_thermal_toggle_sensor(&data->sensors, true); + enable_irq(data->irq);
return 0; @@ -386,17 +357,9 @@ static int hisi_thermal_probe(struct platform_device *pdev) static int hisi_thermal_remove(struct platform_device *pdev) { struct hisi_thermal_data *data = platform_get_drvdata(pdev); - int i; - - for (i = 0; i < HISI_MAX_SENSORS; i++) { - struct hisi_thermal_sensor *sensor = &data->sensors[i]; - - if (!sensor->tzd) - continue; - - hisi_thermal_toggle_sensor(sensor, false); - } + struct hisi_thermal_sensor *sensor = &data->sensors;
+ hisi_thermal_toggle_sensor(sensor, false); hisi_thermal_disable_sensor(data); clk_disable_unprepare(data->clk);
From: Daniel Lezcano daniel.lezcano@linaro.org
commit 2d4fa7b4c6f8080ced2e8237c9f46fb1fc110d64 upstream.
The threaded interrupt inspect the sensors structure to look in the temp threshold field, but this field is read-only in all the code, except in the probe function before the threaded interrupt is set. In other words there is not race window in the threaded interrupt when reading the field value.
Link: https://bugs.linaro.org/show_bug.cgi?id=4053 (PATCH 2/5) Signed-off-by: Daniel Lezcano daniel.lezcano@linaro.org Reviewed-by: Leo Yan leo.yan@linaro.org Signed-off-by: Eduardo Valentin edubezval@gmail.com (cherry picked from commit 2d4fa7b4c6f8080ced2e8237c9f46fb1fc110d64) Signed-off-by: Rafael David Tinoco rafael.tinoco@linaro.org --- drivers/thermal/hisi_thermal.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/drivers/thermal/hisi_thermal.c b/drivers/thermal/hisi_thermal.c index aecbfe9fc8f2..0c2966c285b1 100644 --- a/drivers/thermal/hisi_thermal.c +++ b/drivers/thermal/hisi_thermal.c @@ -221,14 +221,10 @@ static irqreturn_t hisi_thermal_alarm_irq(int irq, void *dev) static irqreturn_t hisi_thermal_alarm_irq_thread(int irq, void *dev) { struct hisi_thermal_data *data = dev; - struct hisi_thermal_sensor *sensor; - - mutex_lock(&data->thermal_lock); - sensor = &data->sensors; + struct hisi_thermal_sensor *sensor = &data->sensors;
dev_crit(&data->pdev->dev, "THERMAL ALARM: T > %d\n", sensor->thres_temp); - mutex_unlock(&data->thermal_lock);
thermal_zone_device_update(data->sensors.tzd, THERMAL_EVENT_UNSPECIFIED);
From: Daniel Lezcano daniel.lezcano@linaro.org
commit 1e11b014271ceccb5ea04ae58f4829ac8209a86d upstream.
Hopefully, the function name can help to clarify the semantic of the operations when writing in the register.
Link: https://bugs.linaro.org/show_bug.cgi?id=4053 (PATCH 3/5) Signed-off-by: Daniel Lezcano daniel.lezcano@linaro.org Signed-off-by: Eduardo Valentin edubezval@gmail.com (cherry picked from commit 1e11b014271ceccb5ea04ae58f4829ac8209a86d) Signed-off-by: Rafael David Tinoco rafael.tinoco@linaro.org --- drivers/thermal/hisi_thermal.c | 92 ++++++++++++++++++++++++++-------- 1 file changed, 70 insertions(+), 22 deletions(-)
diff --git a/drivers/thermal/hisi_thermal.c b/drivers/thermal/hisi_thermal.c index 0c2966c285b1..a803043f37ae 100644 --- a/drivers/thermal/hisi_thermal.c +++ b/drivers/thermal/hisi_thermal.c @@ -26,6 +26,7 @@
#include "thermal_core.h"
+#define TEMP0_LAG (0x0) #define TEMP0_TH (0x4) #define TEMP0_RST_TH (0x8) #define TEMP0_CFG (0xC) @@ -96,6 +97,56 @@ static inline long hisi_thermal_round_temp(int temp) hisi_thermal_temp_to_step(temp)); }
+static inline void hisi_thermal_set_lag(void __iomem *addr, int value) +{ + writel(value, addr + TEMP0_LAG); +} + +static inline void hisi_thermal_alarm_clear(void __iomem *addr, int value) +{ + writel(value, addr + TEMP0_INT_CLR); +} + +static inline void hisi_thermal_alarm_enable(void __iomem *addr, int value) +{ + writel(value, addr + TEMP0_INT_EN); +} + +static inline void hisi_thermal_alarm_set(void __iomem *addr, int temp) +{ + writel(hisi_thermal_temp_to_step(temp) | 0x0FFFFFF00, addr + TEMP0_TH); +} + +static inline void hisi_thermal_reset_set(void __iomem *addr, int temp) +{ + writel(hisi_thermal_temp_to_step(temp), addr + TEMP0_RST_TH); +} + +static inline void hisi_thermal_reset_enable(void __iomem *addr, int value) +{ + writel(value, addr + TEMP0_RST_MSK); +} + +static inline void hisi_thermal_enable(void __iomem *addr, int value) +{ + writel(value, addr + TEMP0_EN); +} + +static inline void hisi_thermal_sensor_select(void __iomem *addr, int sensor) +{ + writel((sensor << 12), addr + TEMP0_CFG); +} + +static inline int hisi_thermal_get_temperature(void __iomem *addr) +{ + return hisi_thermal_step_to_temp(readl(addr + TEMP0_VALUE)); +} + +static inline void hisi_thermal_hdak_set(void __iomem *addr, int value) +{ + writel(value, addr + TEMP0_CFG); +} + static long hisi_thermal_get_sensor_temp(struct hisi_thermal_data *data, struct hisi_thermal_sensor *sensor) { @@ -104,22 +155,21 @@ static long hisi_thermal_get_sensor_temp(struct hisi_thermal_data *data, mutex_lock(&data->thermal_lock);
/* disable interrupt */ - writel(0x0, data->regs + TEMP0_INT_EN); - writel(0x1, data->regs + TEMP0_INT_CLR); + hisi_thermal_alarm_enable(data->regs, 0); + hisi_thermal_alarm_clear(data->regs, 1);
/* disable module firstly */ - writel(0x0, data->regs + TEMP0_EN); + hisi_thermal_enable(data->regs, 0);
/* select sensor id */ - writel((sensor->id << 12), data->regs + TEMP0_CFG); + hisi_thermal_sensor_select(data->regs, sensor->id);
/* enable module */ - writel(0x1, data->regs + TEMP0_EN); + hisi_thermal_enable(data->regs, 1);
usleep_range(3000, 5000);
- val = readl(data->regs + TEMP0_VALUE); - val = hisi_thermal_step_to_temp(val); + val = hisi_thermal_get_temperature(data->regs);
mutex_unlock(&data->thermal_lock);
@@ -136,28 +186,26 @@ static void hisi_thermal_enable_bind_irq_sensor sensor = &data->sensors;
/* setting the hdak time */ - writel(0x0, data->regs + TEMP0_CFG); + hisi_thermal_hdak_set(data->regs, 0);
/* disable module firstly */ - writel(0x0, data->regs + TEMP0_RST_MSK); - writel(0x0, data->regs + TEMP0_EN); + hisi_thermal_reset_enable(data->regs, 0); + hisi_thermal_enable(data->regs, 0);
/* select sensor id */ - writel((sensor->id << 12), data->regs + TEMP0_CFG); + hisi_thermal_sensor_select(data->regs, sensor->id);
/* enable for interrupt */ - writel(hisi_thermal_temp_to_step(sensor->thres_temp) | 0x0FFFFFF00, - data->regs + TEMP0_TH); + hisi_thermal_alarm_set(data->regs, sensor->thres_temp);
- writel(hisi_thermal_temp_to_step(HISI_TEMP_RESET), - data->regs + TEMP0_RST_TH); + hisi_thermal_reset_set(data->regs, HISI_TEMP_RESET);
/* enable module */ - writel(0x1, data->regs + TEMP0_RST_MSK); - writel(0x1, data->regs + TEMP0_EN); + hisi_thermal_reset_enable(data->regs, 1); + hisi_thermal_enable(data->regs, 1);
- writel(0x0, data->regs + TEMP0_INT_CLR); - writel(0x1, data->regs + TEMP0_INT_EN); + hisi_thermal_alarm_clear(data->regs, 0); + hisi_thermal_alarm_enable(data->regs, 1);
usleep_range(3000, 5000);
@@ -169,9 +217,9 @@ static void hisi_thermal_disable_sensor(struct hisi_thermal_data *data) mutex_lock(&data->thermal_lock);
/* disable sensor module */ - writel(0x0, data->regs + TEMP0_INT_EN); - writel(0x0, data->regs + TEMP0_RST_MSK); - writel(0x0, data->regs + TEMP0_EN); + hisi_thermal_enable(data->regs, 0); + hisi_thermal_alarm_enable(data->regs, 0); + hisi_thermal_reset_enable(data->regs, 0);
mutex_unlock(&data->thermal_lock); }
From: Daniel Lezcano daniel.lezcano@linaro.org
patch b424315a287c70eeb5f920f84c92492bd2f5658e upstream.
The TEMP0_CFG configuration register contains different field to set up the temperature controller. However in the code, nothing prevents a setup to overwrite the previous one: eg. writing the hdak value overwrites the sensor selection, the sensor selection overwrites the hdak value.
In order to prevent such thing, use a regmap-like mechanism by reading the value before, set the corresponding bits and write the result.
Link: https://bugs.linaro.org/show_bug.cgi?id=4053 (PATCH 4/5) Signed-off-by: Daniel Lezcano daniel.lezcano@linaro.org Signed-off-by: Eduardo Valentin edubezval@gmail.com (cherry picked from commit b424315a287c70eeb5f920f84c92492bd2f5658e) Signed-off-by: Rafael David Tinoco rafael.tinoco@linaro.org --- drivers/thermal/hisi_thermal.c | 34 +++++++++++++++++++++++++++++----- 1 file changed, 29 insertions(+), 5 deletions(-)
diff --git a/drivers/thermal/hisi_thermal.c b/drivers/thermal/hisi_thermal.c index a803043f37ae..31b52ebf426f 100644 --- a/drivers/thermal/hisi_thermal.c +++ b/drivers/thermal/hisi_thermal.c @@ -30,6 +30,8 @@ #define TEMP0_TH (0x4) #define TEMP0_RST_TH (0x8) #define TEMP0_CFG (0xC) +#define TEMP0_CFG_SS_MSK (0xF000) +#define TEMP0_CFG_HDAK_MSK (0x30) #define TEMP0_EN (0x10) #define TEMP0_INT_EN (0x14) #define TEMP0_INT_CLR (0x18) @@ -132,19 +134,41 @@ static inline void hisi_thermal_enable(void __iomem *addr, int value) writel(value, addr + TEMP0_EN); }
-static inline void hisi_thermal_sensor_select(void __iomem *addr, int sensor) +static inline int hisi_thermal_get_temperature(void __iomem *addr) { - writel((sensor << 12), addr + TEMP0_CFG); + return hisi_thermal_step_to_temp(readl(addr + TEMP0_VALUE)); }
-static inline int hisi_thermal_get_temperature(void __iomem *addr) +/* + * Temperature configuration register - Sensor selection + * + * Bits [19:12] + * + * 0x0: local sensor (default) + * 0x1: remote sensor 1 (ACPU cluster 1) + * 0x2: remote sensor 2 (ACPU cluster 0) + * 0x3: remote sensor 3 (G3D) + */ +static inline void hisi_thermal_sensor_select(void __iomem *addr, int sensor) { - return hisi_thermal_step_to_temp(readl(addr + TEMP0_VALUE)); + writel((readl(addr + TEMP0_CFG) & ~TEMP0_CFG_SS_MSK) | + (sensor << 12), addr + TEMP0_CFG); }
+/* + * Temperature configuration register - Hdak conversion polling interval + * + * Bits [5:4] + * + * 0x0 : 0.768 ms + * 0x1 : 6.144 ms + * 0x2 : 49.152 ms + * 0x3 : 393.216 ms + */ static inline void hisi_thermal_hdak_set(void __iomem *addr, int value) { - writel(value, addr + TEMP0_CFG); + writel((readl(addr + TEMP0_CFG) & ~TEMP0_CFG_HDAK_MSK) | + (value << 4), addr + TEMP0_CFG); }
static long hisi_thermal_get_sensor_temp(struct hisi_thermal_data *data,
From: Daniel Lezcano daniel.lezcano@linaro.org
commit 10d7e9a9181f4637640f388d334c6740c1b5d0e8 upstream.
The sensor is all setup, bind, resetted, acked, etc... every single second.
That was the way to workaround a problem with the interrupt bouncing again and again.
With the following changes, we fix all in one:
- Do the setup, one time, at probe time
- Add the IRQF_ONESHOT, ack the interrupt in the threaded handler
- Remove the interrupt handler
- Set the correct value for the LAG register
- Remove all the irq_enabled stuff in the code as the interruption handling is fixed
- Remove the 3ms delay
- Reorder the initialization routine to be in the right order
It ends up to a nicer code and more efficient, the 3-5ms delay is removed from the get_temp() path.
Link: https://bugs.linaro.org/show_bug.cgi?id=4053 (PATCH 5/5) Signed-off-by: Daniel Lezcano daniel.lezcano@linaro.org Reviewed-by: Leo Yan leo.yan@linaro.org Tested-by: Leo Yan leo.yan@linaro.org Signed-off-by: Eduardo Valentin edubezval@gmail.com (cherry picked from commit 10d7e9a9181f4637640f388d334c6740c1b5d0e8) Signed-off-by: Rafael David Tinoco rafael.tinoco@linaro.org --- drivers/thermal/hisi_thermal.c | 203 +++++++++++++++------------------ 1 file changed, 93 insertions(+), 110 deletions(-)
diff --git a/drivers/thermal/hisi_thermal.c b/drivers/thermal/hisi_thermal.c index 31b52ebf426f..7c5d4647cb5e 100644 --- a/drivers/thermal/hisi_thermal.c +++ b/drivers/thermal/hisi_thermal.c @@ -41,6 +41,7 @@ #define HISI_TEMP_BASE (-60000) #define HISI_TEMP_RESET (100000) #define HISI_TEMP_STEP (784) +#define HISI_TEMP_LAG (3500)
#define HISI_MAX_SENSORS 4 #define HISI_DEFAULT_SENSOR 2 @@ -60,8 +61,6 @@ struct hisi_thermal_data { struct clk *clk; struct hisi_thermal_sensor sensors; int irq; - bool irq_enabled; - void __iomem *regs; };
@@ -99,9 +98,40 @@ static inline long hisi_thermal_round_temp(int temp) hisi_thermal_temp_to_step(temp)); }
+/* + * The lag register contains 5 bits encoding the temperature in steps. + * + * Each time the temperature crosses the threshold boundary, an + * interrupt is raised. It could be when the temperature is going + * above the threshold or below. However, if the temperature is + * fluctuating around this value due to the load, we can receive + * several interrupts which may not desired. + * + * We can setup a temperature representing the delta between the + * threshold and the current temperature when the temperature is + * decreasing. + * + * For instance: the lag register is 5°C, the threshold is 65°C, when + * the temperature reaches 65°C an interrupt is raised and when the + * temperature decrease to 65°C - 5°C another interrupt is raised. + * + * A very short lag can lead to an interrupt storm, a long lag + * increase the latency to react to the temperature changes. In our + * case, that is not really a problem as we are polling the + * temperature. + * + * [0:4] : lag register + * + * The temperature is coded in steps, cf. HISI_TEMP_STEP. + * + * Min : 0x00 : 0.0 °C + * Max : 0x1F : 24.3 °C + * + * The 'value' parameter is in milliCelsius. + */ static inline void hisi_thermal_set_lag(void __iomem *addr, int value) { - writel(value, addr + TEMP0_LAG); + writel((value / HISI_TEMP_STEP) & 0x1F, addr + TEMP0_LAG); }
static inline void hisi_thermal_alarm_clear(void __iomem *addr, int value) @@ -171,71 +201,6 @@ static inline void hisi_thermal_hdak_set(void __iomem *addr, int value) (value << 4), addr + TEMP0_CFG); }
-static long hisi_thermal_get_sensor_temp(struct hisi_thermal_data *data, - struct hisi_thermal_sensor *sensor) -{ - long val; - - mutex_lock(&data->thermal_lock); - - /* disable interrupt */ - hisi_thermal_alarm_enable(data->regs, 0); - hisi_thermal_alarm_clear(data->regs, 1); - - /* disable module firstly */ - hisi_thermal_enable(data->regs, 0); - - /* select sensor id */ - hisi_thermal_sensor_select(data->regs, sensor->id); - - /* enable module */ - hisi_thermal_enable(data->regs, 1); - - usleep_range(3000, 5000); - - val = hisi_thermal_get_temperature(data->regs); - - mutex_unlock(&data->thermal_lock); - - return val; -} - -static void hisi_thermal_enable_bind_irq_sensor - (struct hisi_thermal_data *data) -{ - struct hisi_thermal_sensor *sensor; - - mutex_lock(&data->thermal_lock); - - sensor = &data->sensors; - - /* setting the hdak time */ - hisi_thermal_hdak_set(data->regs, 0); - - /* disable module firstly */ - hisi_thermal_reset_enable(data->regs, 0); - hisi_thermal_enable(data->regs, 0); - - /* select sensor id */ - hisi_thermal_sensor_select(data->regs, sensor->id); - - /* enable for interrupt */ - hisi_thermal_alarm_set(data->regs, sensor->thres_temp); - - hisi_thermal_reset_set(data->regs, HISI_TEMP_RESET); - - /* enable module */ - hisi_thermal_reset_enable(data->regs, 1); - hisi_thermal_enable(data->regs, 1); - - hisi_thermal_alarm_clear(data->regs, 0); - hisi_thermal_alarm_enable(data->regs, 1); - - usleep_range(3000, 5000); - - mutex_unlock(&data->thermal_lock); -} - static void hisi_thermal_disable_sensor(struct hisi_thermal_data *data) { mutex_lock(&data->thermal_lock); @@ -253,25 +218,10 @@ static int hisi_thermal_get_temp(void *_sensor, int *temp) struct hisi_thermal_sensor *sensor = _sensor; struct hisi_thermal_data *data = sensor->thermal;
- *temp = hisi_thermal_get_sensor_temp(data, sensor); - - dev_dbg(&data->pdev->dev, "id=%d, irq=%d, temp=%d, thres=%d\n", - sensor->id, data->irq_enabled, *temp, sensor->thres_temp); - /* - * Bind irq to sensor for two cases: - * Reenable alarm IRQ if temperature below threshold; - * if irq has been enabled, always set it; - */ - if (data->irq_enabled) { - hisi_thermal_enable_bind_irq_sensor(data); - return 0; - } + *temp = hisi_thermal_get_temperature(data->regs);
- if (*temp < sensor->thres_temp) { - data->irq_enabled = true; - hisi_thermal_enable_bind_irq_sensor(data); - enable_irq(data->irq); - } + dev_dbg(&data->pdev->dev, "id=%d, temp=%d, thres=%d\n", + sensor->id, *temp, sensor->thres_temp);
return 0; } @@ -280,26 +230,27 @@ static const struct thermal_zone_of_device_ops hisi_of_thermal_ops = { .get_temp = hisi_thermal_get_temp, };
-static irqreturn_t hisi_thermal_alarm_irq(int irq, void *dev) +static irqreturn_t hisi_thermal_alarm_irq_thread(int irq, void *dev) { struct hisi_thermal_data *data = dev; + struct hisi_thermal_sensor *sensor = &data->sensors; + int temp;
- disable_irq_nosync(irq); - data->irq_enabled = false; + hisi_thermal_alarm_clear(data->regs, 1);
- return IRQ_WAKE_THREAD; -} + temp = hisi_thermal_get_temperature(data->regs);
-static irqreturn_t hisi_thermal_alarm_irq_thread(int irq, void *dev) -{ - struct hisi_thermal_data *data = dev; - struct hisi_thermal_sensor *sensor = &data->sensors; + if (temp >= sensor->thres_temp) { + dev_crit(&data->pdev->dev, "THERMAL ALARM: %d > %d\n", + temp, sensor->thres_temp);
- dev_crit(&data->pdev->dev, "THERMAL ALARM: T > %d\n", - sensor->thres_temp); + thermal_zone_device_update(data->sensors.tzd, + THERMAL_EVENT_UNSPECIFIED);
- thermal_zone_device_update(data->sensors.tzd, - THERMAL_EVENT_UNSPECIFIED); + } else if (temp < sensor->thres_temp) { + dev_crit(&data->pdev->dev, "THERMAL ALARM stopped: %d < %d\n", + temp, sensor->thres_temp); + }
return IRQ_HANDLED; } @@ -352,6 +303,40 @@ static void hisi_thermal_toggle_sensor(struct hisi_thermal_sensor *sensor, on ? THERMAL_DEVICE_ENABLED : THERMAL_DEVICE_DISABLED); }
+static int hisi_thermal_setup(struct hisi_thermal_data *data) +{ + struct hisi_thermal_sensor *sensor; + + sensor = &data->sensors; + + /* disable module firstly */ + hisi_thermal_reset_enable(data->regs, 0); + hisi_thermal_enable(data->regs, 0); + + /* select sensor id */ + hisi_thermal_sensor_select(data->regs, sensor->id); + + /* setting the hdak time */ + hisi_thermal_hdak_set(data->regs, 0); + + /* setting lag value between current temp and the threshold */ + hisi_thermal_set_lag(data->regs, HISI_TEMP_LAG); + + /* enable for interrupt */ + hisi_thermal_alarm_set(data->regs, sensor->thres_temp); + + hisi_thermal_reset_set(data->regs, HISI_TEMP_RESET); + + /* enable module */ + hisi_thermal_reset_enable(data->regs, 1); + hisi_thermal_enable(data->regs, 1); + + hisi_thermal_alarm_clear(data->regs, 0); + hisi_thermal_alarm_enable(data->regs, 1); + + return 0; +} + static int hisi_thermal_probe(struct platform_device *pdev) { struct hisi_thermal_data *data; @@ -394,9 +379,6 @@ static int hisi_thermal_probe(struct platform_device *pdev) return ret; }
- hisi_thermal_enable_bind_irq_sensor(data); - data->irq_enabled = true; - ret = hisi_thermal_register_sensor(pdev, data, &data->sensors, HISI_DEFAULT_SENSOR); @@ -406,10 +388,15 @@ static int hisi_thermal_probe(struct platform_device *pdev) return ret; }
- ret = devm_request_threaded_irq(&pdev->dev, data->irq, - hisi_thermal_alarm_irq, + ret = hisi_thermal_setup(data); + if (ret) { + dev_err(&pdev->dev, "Failed to setup the sensor: %d\n", ret); + return ret; + } + + ret = devm_request_threaded_irq(&pdev->dev, data->irq, NULL, hisi_thermal_alarm_irq_thread, - 0, "hisi_thermal", data); + IRQF_ONESHOT, "hisi_thermal", data); if (ret < 0) { dev_err(&pdev->dev, "failed to request alarm irq: %d\n", ret); return ret; @@ -417,8 +404,6 @@ static int hisi_thermal_probe(struct platform_device *pdev)
hisi_thermal_toggle_sensor(&data->sensors, true);
- enable_irq(data->irq); - return 0; }
@@ -440,7 +425,6 @@ static int hisi_thermal_suspend(struct device *dev) struct hisi_thermal_data *data = dev_get_drvdata(dev);
hisi_thermal_disable_sensor(data); - data->irq_enabled = false;
clk_disable_unprepare(data->clk);
@@ -456,8 +440,7 @@ static int hisi_thermal_resume(struct device *dev) if (ret) return ret;
- data->irq_enabled = true; - hisi_thermal_enable_bind_irq_sensor(data); + hisi_thermal_setup(data);
return 0; }
On Mon, Dec 03, 2018 at 11:31:02AM -0200, Rafael David Tinoco wrote:
Sasha, could you consider including this cherry-picked patchset in v4.14.
Kernel v4.14 might suffer from the following unbalanced enablement for the board Hikey 960:
Nov 5 12:02:54 hikey kernel: [ 22.148194] Unbalanced enable for IRQ 44 Nov 5 12:02:54 hikey kernel: [ 22.152193] ------------[ cut here ]------------ Nov 5 12:02:54 hikey kernel: [ 22.156872] WARNING: CPU: 2 PID: 509 at /home/inaddy/work/sources/linux/stable/stable-linux-4.14.y/kernel/irq/manage.c:525 __enable_irq+0x78/0x80 Nov 5 12:02:54 hikey kernel: [ 22.249606] CPU: 2 PID: 509 Comm: kworker/2:2 Not tainted 4.14.79 #1 Nov 5 12:02:54 hikey kernel: [ 22.255975] Hardware name: HiKey Development Board (DT) Nov 5 12:02:54 hikey kernel: [ 22.261248] Workqueue: events_freezable thermal_zone_device_check Nov 5 12:02:54 hikey kernel: [ 22.267368] task: ffff8000616e0e00 task.stack: ffff00000b5f0000 Nov 5 12:02:54 hikey kernel: [ 22.273312] PC is at __enable_irq+0x78/0x80 Nov 5 12:02:54 hikey kernel: [ 22.277516] LR is at __enable_irq+0x78/0x80 Nov 5 12:02:54 hikey kernel: [ 22.281718] pc : [<ffff00000813e010>] lr : [<ffff00000813e010>] pstate: 000001c5 Nov 5 12:02:54 hikey kernel: [ 22.289129] sp : ffff00000b5f3c80 Nov 5 12:02:54 hikey kernel: [ 22.292457] x29: ffff00000b5f3c80 x28: 0000000000000000 Nov 5 12:02:54 hikey kernel: [ 22.297804] x27: ffff80005c139e38 x26: ffff000008a71870 Nov 5 12:02:54 hikey kernel: [ 22.303148] x25: 0000000000000000 x24: 0000000000000002 Nov 5 12:02:54 hikey kernel: [ 22.308492] x23: ffff00000b5f3d9c x22: ffff80005d565e88 Nov 5 12:02:54 hikey kernel: [ 22.313836] x21: 000000000000f980 x20: 000000000000002c Nov 5 12:02:54 hikey kernel: [ 22.319181] x19: ffff800061726000 x18: 0000000000000010 Nov 5 12:02:54 hikey kernel: [ 22.324524] x17: 0000000000000000 x16: 0000000000000000 Nov 5 12:02:54 hikey kernel: [ 22.329868] x15: ffffffffffffffff x14: ffff000009269c08 Nov 5 12:02:54 hikey kernel: [ 22.335213] x13: ffff00008940678f x12: ffff000009406797 Nov 5 12:02:54 hikey kernel: [ 22.340558] x11: ffff000009290000 x10: ffff00000b5f3980 Nov 5 12:02:54 hikey kernel: [ 22.345902] x9 : 00000000ffffffd0 x8 : ffff00000862c298 Nov 5 12:02:54 hikey kernel: [ 22.351246] x7 : 6c62616e65206465 x6 : 00000000000001b2 Nov 5 12:02:54 hikey kernel: [ 22.356589] x5 : 0000000000000000 x4 : 0000000000000000 Nov 5 12:02:54 hikey kernel: [ 22.361931] x3 : 0000000000000000 x2 : ffff800063e824c8 Nov 5 12:02:54 hikey kernel: [ 22.367275] x1 : 000080005af95000 x0 : 000000000000001c Nov 5 12:02:54 hikey kernel: [ 22.372618] Call trace: Nov 5 12:02:54 hikey kernel: [ 22.375088] Exception stack(0xffff00000b5f3b40 to 0xffff00000b5f3c80) Nov 5 12:02:54 hikey kernel: [ 22.381560] 3b40: 000000000000001c 000080005af95000 ffff800063e824c8 0000000000000000 Nov 5 12:02:54 hikey kernel: [ 22.389417] 3b60: 0000000000000000 0000000000000000 00000000000001b2 6c62616e65206465 Nov 5 12:02:54 hikey kernel: [ 22.397276] 3b80: ffff00000862c298 00000000ffffffd0 ffff00000b5f3980 ffff000009290000 Nov 5 12:02:54 hikey kernel: [ 22.405136] 3ba0: ffff000009406797 ffff00008940678f ffff000009269c08 ffffffffffffffff Nov 5 12:02:54 hikey kernel: [ 22.412994] 3bc0: 0000000000000000 0000000000000000 0000000000000010 ffff800061726000 Nov 5 12:02:54 hikey kernel: [ 22.420852] 3be0: 000000000000002c 000000000000f980 ffff80005d565e88 ffff00000b5f3d9c Nov 5 12:02:54 hikey kernel: [ 22.428710] 3c00: 0000000000000002 0000000000000000 ffff000008a71870 ffff80005c139e38 Nov 5 12:02:54 hikey kernel: [ 22.436569] 3c20: 0000000000000000 ffff00000b5f3c80 ffff00000813e010 ffff00000b5f3c80 Nov 5 12:02:54 hikey kernel: [ 22.444426] 3c40: ffff00000813e010 00000000000001c5 0000000000000000 0000000000000000 Nov 5 12:02:54 hikey kernel: [ 22.452286] 3c60: ffffffffffffffff ffff800061800618 ffff00000b5f3c80 ffff00000813e010 Nov 5 12:02:54 hikey kernel: [ 22.460144] [<ffff00000813e010>] __enable_irq+0x78/0x80 Nov 5 12:02:54 hikey kernel: [ 22.465394] [<ffff00000813e058>] enable_irq+0x40/0x78 Nov 5 12:02:54 hikey kernel: [ 22.470493] [<ffff000000e228a8>] hisi_thermal_get_temp+0x1b0/0x1d8 [hisi_thermal] Nov 5 12:02:54 hikey kernel: [ 22.478008] [<ffff0000087121a8>] of_thermal_get_temp+0x38/0x50 Nov 5 12:02:54 hikey kernel: [ 22.483869] [<ffff000008711790>] thermal_zone_get_temp+0x58/0x80 Nov 5 12:02:54 hikey kernel: [ 22.489903] [<ffff00000870e7bc>] thermal_zone_device_update.part.4+0x2c/0x1a8 Nov 5 12:02:54 hikey kernel: [ 22.497066] [<ffff00000870e9c8>] thermal_zone_device_check+0x40/0x50 Nov 5 12:02:54 hikey kernel: [ 22.503457] [<ffff0000080f1674>] process_one_work+0x19c/0x3d0 Nov 5 12:02:54 hikey kernel: [ 22.509236] [<ffff0000080f18f4>] worker_thread+0x4c/0x428 Nov 5 12:02:54 hikey kernel: [ 22.514664] [<ffff0000080f84fc>] kthread+0x134/0x138 Nov 5 12:02:54 hikey kernel: [ 22.519659] [<ffff000008085154>] ret_from_fork+0x10/0x1c Nov 5 12:02:54 hikey kernel: [ 22.524988] ---[ end trace 328d4bb2d9b066a0 ]---
This issue was solved when "hisi_thermal_alarm_irq" function was removed so only "hisi_thermal_alarm_irq_thread" would exist. This has fixed the issue for the unbalanced enablement since there is no more:
disable_irq_nosync(irq); data->irq_enabled = false;
logic being done in parallel to the threaded handler AND the thermal_zone_device_update() call only happens now if the temperature is already above the threshold.
So should we revert a patch instead of taking these new ones? Would that be easier and is this a "real" issue or just an annoying warning splat in the kernel log?
thanks,
greg k-h
On 03/12/2018 15:14, Greg KH wrote:
On Mon, Dec 03, 2018 at 11:31:02AM -0200, Rafael David Tinoco wrote:
Sasha, could you consider including this cherry-picked patchset in v4.14.
Kernel v4.14 might suffer from the following unbalanced enablement for the board Hikey 960:
Nov 5 12:02:54 hikey kernel: [ 22.148194] Unbalanced enable for IRQ 44 Nov 5 12:02:54 hikey kernel: [ 22.152193] ------------[ cut here ]------------ Nov 5 12:02:54 hikey kernel: [ 22.156872] WARNING: CPU: 2 PID: 509 at /home/inaddy/work/sources/linux/stable/stable-linux-4.14.y/kernel/irq/manage.c:525 __enable_irq+0x78/0x80 Nov 5 12:02:54 hikey kernel: [ 22.249606] CPU: 2 PID: 509 Comm: kworker/2:2 Not tainted 4.14.79 #1 Nov 5 12:02:54 hikey kernel: [ 22.255975] Hardware name: HiKey Development Board (DT) Nov 5 12:02:54 hikey kernel: [ 22.261248] Workqueue: events_freezable thermal_zone_device_check Nov 5 12:02:54 hikey kernel: [ 22.267368] task: ffff8000616e0e00 task.stack: ffff00000b5f0000 Nov 5 12:02:54 hikey kernel: [ 22.273312] PC is at __enable_irq+0x78/0x80 Nov 5 12:02:54 hikey kernel: [ 22.277516] LR is at __enable_irq+0x78/0x80 Nov 5 12:02:54 hikey kernel: [ 22.281718] pc : [<ffff00000813e010>] lr : [<ffff00000813e010>] pstate: 000001c5 Nov 5 12:02:54 hikey kernel: [ 22.289129] sp : ffff00000b5f3c80 Nov 5 12:02:54 hikey kernel: [ 22.292457] x29: ffff00000b5f3c80 x28: 0000000000000000 Nov 5 12:02:54 hikey kernel: [ 22.297804] x27: ffff80005c139e38 x26: ffff000008a71870 Nov 5 12:02:54 hikey kernel: [ 22.303148] x25: 0000000000000000 x24: 0000000000000002 Nov 5 12:02:54 hikey kernel: [ 22.308492] x23: ffff00000b5f3d9c x22: ffff80005d565e88 Nov 5 12:02:54 hikey kernel: [ 22.313836] x21: 000000000000f980 x20: 000000000000002c Nov 5 12:02:54 hikey kernel: [ 22.319181] x19: ffff800061726000 x18: 0000000000000010 Nov 5 12:02:54 hikey kernel: [ 22.324524] x17: 0000000000000000 x16: 0000000000000000 Nov 5 12:02:54 hikey kernel: [ 22.329868] x15: ffffffffffffffff x14: ffff000009269c08 Nov 5 12:02:54 hikey kernel: [ 22.335213] x13: ffff00008940678f x12: ffff000009406797 Nov 5 12:02:54 hikey kernel: [ 22.340558] x11: ffff000009290000 x10: ffff00000b5f3980 Nov 5 12:02:54 hikey kernel: [ 22.345902] x9 : 00000000ffffffd0 x8 : ffff00000862c298 Nov 5 12:02:54 hikey kernel: [ 22.351246] x7 : 6c62616e65206465 x6 : 00000000000001b2 Nov 5 12:02:54 hikey kernel: [ 22.356589] x5 : 0000000000000000 x4 : 0000000000000000 Nov 5 12:02:54 hikey kernel: [ 22.361931] x3 : 0000000000000000 x2 : ffff800063e824c8 Nov 5 12:02:54 hikey kernel: [ 22.367275] x1 : 000080005af95000 x0 : 000000000000001c Nov 5 12:02:54 hikey kernel: [ 22.372618] Call trace: Nov 5 12:02:54 hikey kernel: [ 22.375088] Exception stack(0xffff00000b5f3b40 to 0xffff00000b5f3c80) Nov 5 12:02:54 hikey kernel: [ 22.381560] 3b40: 000000000000001c 000080005af95000 ffff800063e824c8 0000000000000000 Nov 5 12:02:54 hikey kernel: [ 22.389417] 3b60: 0000000000000000 0000000000000000 00000000000001b2 6c62616e65206465 Nov 5 12:02:54 hikey kernel: [ 22.397276] 3b80: ffff00000862c298 00000000ffffffd0 ffff00000b5f3980 ffff000009290000 Nov 5 12:02:54 hikey kernel: [ 22.405136] 3ba0: ffff000009406797 ffff00008940678f ffff000009269c08 ffffffffffffffff Nov 5 12:02:54 hikey kernel: [ 22.412994] 3bc0: 0000000000000000 0000000000000000 0000000000000010 ffff800061726000 Nov 5 12:02:54 hikey kernel: [ 22.420852] 3be0: 000000000000002c 000000000000f980 ffff80005d565e88 ffff00000b5f3d9c Nov 5 12:02:54 hikey kernel: [ 22.428710] 3c00: 0000000000000002 0000000000000000 ffff000008a71870 ffff80005c139e38 Nov 5 12:02:54 hikey kernel: [ 22.436569] 3c20: 0000000000000000 ffff00000b5f3c80 ffff00000813e010 ffff00000b5f3c80 Nov 5 12:02:54 hikey kernel: [ 22.444426] 3c40: ffff00000813e010 00000000000001c5 0000000000000000 0000000000000000 Nov 5 12:02:54 hikey kernel: [ 22.452286] 3c60: ffffffffffffffff ffff800061800618 ffff00000b5f3c80 ffff00000813e010 Nov 5 12:02:54 hikey kernel: [ 22.460144] [<ffff00000813e010>] __enable_irq+0x78/0x80 Nov 5 12:02:54 hikey kernel: [ 22.465394] [<ffff00000813e058>] enable_irq+0x40/0x78 Nov 5 12:02:54 hikey kernel: [ 22.470493] [<ffff000000e228a8>] hisi_thermal_get_temp+0x1b0/0x1d8 [hisi_thermal] Nov 5 12:02:54 hikey kernel: [ 22.478008] [<ffff0000087121a8>] of_thermal_get_temp+0x38/0x50 Nov 5 12:02:54 hikey kernel: [ 22.483869] [<ffff000008711790>] thermal_zone_get_temp+0x58/0x80 Nov 5 12:02:54 hikey kernel: [ 22.489903] [<ffff00000870e7bc>] thermal_zone_device_update.part.4+0x2c/0x1a8 Nov 5 12:02:54 hikey kernel: [ 22.497066] [<ffff00000870e9c8>] thermal_zone_device_check+0x40/0x50 Nov 5 12:02:54 hikey kernel: [ 22.503457] [<ffff0000080f1674>] process_one_work+0x19c/0x3d0 Nov 5 12:02:54 hikey kernel: [ 22.509236] [<ffff0000080f18f4>] worker_thread+0x4c/0x428 Nov 5 12:02:54 hikey kernel: [ 22.514664] [<ffff0000080f84fc>] kthread+0x134/0x138 Nov 5 12:02:54 hikey kernel: [ 22.519659] [<ffff000008085154>] ret_from_fork+0x10/0x1c Nov 5 12:02:54 hikey kernel: [ 22.524988] ---[ end trace 328d4bb2d9b066a0 ]---
This issue was solved when "hisi_thermal_alarm_irq" function was removed so only "hisi_thermal_alarm_irq_thread" would exist. This has fixed the issue for the unbalanced enablement since there is no more:
disable_irq_nosync(irq); data->irq_enabled = false;
logic being done in parallel to the threaded handler AND the thermal_zone_device_update() call only happens now if the temperature is already above the threshold.
So should we revert a patch instead of taking these new ones? Would that be easier and is this a "real" issue or just an annoying warning splat in the kernel log?
Actually, this warning is introduced with the driver and all the plumbers around to fix an irq bouncing. There is no patch to revert without removing the driver.
What the log in the series does not tell is the driver was initially built around a supposed issue with the thermal sensor, that is multiple interrupts firing for the same event.
Those happen when a specific threshold is reached (down to up and up to down) and a hysteresis should have been specified in the register in order to prevent multiple interrupts firing when the limit is crossed.
All the code in the driver was built to overcome this issue because the root cause was not spotted: the hysteresis register was not correctly set and it was reset at every update. The driver 'workarounded' that by creating some kind of irq enabled flag, a sensor reset every time it was read and other hacks to make it work (it results in a worst temperature measurement accuracy also because the temperature buffer on the chip is flushed).
These 5 patches not only fixes the irq issue but also sanitize the driver. There is no longer a reset of the sensor, no irq bouncing and no 5ms delay when the sensor is read.
On Mon, Dec 03, 2018 at 03:42:41PM +0100, Daniel Lezcano wrote:
On 03/12/2018 15:14, Greg KH wrote:
On Mon, Dec 03, 2018 at 11:31:02AM -0200, Rafael David Tinoco wrote:
Sasha, could you consider including this cherry-picked patchset in v4.14.
Kernel v4.14 might suffer from the following unbalanced enablement for the board Hikey 960:
Nov 5 12:02:54 hikey kernel: [ 22.148194] Unbalanced enable for IRQ 44 Nov 5 12:02:54 hikey kernel: [ 22.152193] ------------[ cut here ]------------ Nov 5 12:02:54 hikey kernel: [ 22.156872] WARNING: CPU: 2 PID: 509 at /home/inaddy/work/sources/linux/stable/stable-linux-4.14.y/kernel/irq/manage.c:525 __enable_irq+0x78/0x80 Nov 5 12:02:54 hikey kernel: [ 22.249606] CPU: 2 PID: 509 Comm: kworker/2:2 Not tainted 4.14.79 #1 Nov 5 12:02:54 hikey kernel: [ 22.255975] Hardware name: HiKey Development Board (DT) Nov 5 12:02:54 hikey kernel: [ 22.261248] Workqueue: events_freezable thermal_zone_device_check Nov 5 12:02:54 hikey kernel: [ 22.267368] task: ffff8000616e0e00 task.stack: ffff00000b5f0000 Nov 5 12:02:54 hikey kernel: [ 22.273312] PC is at __enable_irq+0x78/0x80 Nov 5 12:02:54 hikey kernel: [ 22.277516] LR is at __enable_irq+0x78/0x80 Nov 5 12:02:54 hikey kernel: [ 22.281718] pc : [<ffff00000813e010>] lr : [<ffff00000813e010>] pstate: 000001c5 Nov 5 12:02:54 hikey kernel: [ 22.289129] sp : ffff00000b5f3c80 Nov 5 12:02:54 hikey kernel: [ 22.292457] x29: ffff00000b5f3c80 x28: 0000000000000000 Nov 5 12:02:54 hikey kernel: [ 22.297804] x27: ffff80005c139e38 x26: ffff000008a71870 Nov 5 12:02:54 hikey kernel: [ 22.303148] x25: 0000000000000000 x24: 0000000000000002 Nov 5 12:02:54 hikey kernel: [ 22.308492] x23: ffff00000b5f3d9c x22: ffff80005d565e88 Nov 5 12:02:54 hikey kernel: [ 22.313836] x21: 000000000000f980 x20: 000000000000002c Nov 5 12:02:54 hikey kernel: [ 22.319181] x19: ffff800061726000 x18: 0000000000000010 Nov 5 12:02:54 hikey kernel: [ 22.324524] x17: 0000000000000000 x16: 0000000000000000 Nov 5 12:02:54 hikey kernel: [ 22.329868] x15: ffffffffffffffff x14: ffff000009269c08 Nov 5 12:02:54 hikey kernel: [ 22.335213] x13: ffff00008940678f x12: ffff000009406797 Nov 5 12:02:54 hikey kernel: [ 22.340558] x11: ffff000009290000 x10: ffff00000b5f3980 Nov 5 12:02:54 hikey kernel: [ 22.345902] x9 : 00000000ffffffd0 x8 : ffff00000862c298 Nov 5 12:02:54 hikey kernel: [ 22.351246] x7 : 6c62616e65206465 x6 : 00000000000001b2 Nov 5 12:02:54 hikey kernel: [ 22.356589] x5 : 0000000000000000 x4 : 0000000000000000 Nov 5 12:02:54 hikey kernel: [ 22.361931] x3 : 0000000000000000 x2 : ffff800063e824c8 Nov 5 12:02:54 hikey kernel: [ 22.367275] x1 : 000080005af95000 x0 : 000000000000001c Nov 5 12:02:54 hikey kernel: [ 22.372618] Call trace: Nov 5 12:02:54 hikey kernel: [ 22.375088] Exception stack(0xffff00000b5f3b40 to 0xffff00000b5f3c80) Nov 5 12:02:54 hikey kernel: [ 22.381560] 3b40: 000000000000001c 000080005af95000 ffff800063e824c8 0000000000000000 Nov 5 12:02:54 hikey kernel: [ 22.389417] 3b60: 0000000000000000 0000000000000000 00000000000001b2 6c62616e65206465 Nov 5 12:02:54 hikey kernel: [ 22.397276] 3b80: ffff00000862c298 00000000ffffffd0 ffff00000b5f3980 ffff000009290000 Nov 5 12:02:54 hikey kernel: [ 22.405136] 3ba0: ffff000009406797 ffff00008940678f ffff000009269c08 ffffffffffffffff Nov 5 12:02:54 hikey kernel: [ 22.412994] 3bc0: 0000000000000000 0000000000000000 0000000000000010 ffff800061726000 Nov 5 12:02:54 hikey kernel: [ 22.420852] 3be0: 000000000000002c 000000000000f980 ffff80005d565e88 ffff00000b5f3d9c Nov 5 12:02:54 hikey kernel: [ 22.428710] 3c00: 0000000000000002 0000000000000000 ffff000008a71870 ffff80005c139e38 Nov 5 12:02:54 hikey kernel: [ 22.436569] 3c20: 0000000000000000 ffff00000b5f3c80 ffff00000813e010 ffff00000b5f3c80 Nov 5 12:02:54 hikey kernel: [ 22.444426] 3c40: ffff00000813e010 00000000000001c5 0000000000000000 0000000000000000 Nov 5 12:02:54 hikey kernel: [ 22.452286] 3c60: ffffffffffffffff ffff800061800618 ffff00000b5f3c80 ffff00000813e010 Nov 5 12:02:54 hikey kernel: [ 22.460144] [<ffff00000813e010>] __enable_irq+0x78/0x80 Nov 5 12:02:54 hikey kernel: [ 22.465394] [<ffff00000813e058>] enable_irq+0x40/0x78 Nov 5 12:02:54 hikey kernel: [ 22.470493] [<ffff000000e228a8>] hisi_thermal_get_temp+0x1b0/0x1d8 [hisi_thermal] Nov 5 12:02:54 hikey kernel: [ 22.478008] [<ffff0000087121a8>] of_thermal_get_temp+0x38/0x50 Nov 5 12:02:54 hikey kernel: [ 22.483869] [<ffff000008711790>] thermal_zone_get_temp+0x58/0x80 Nov 5 12:02:54 hikey kernel: [ 22.489903] [<ffff00000870e7bc>] thermal_zone_device_update.part.4+0x2c/0x1a8 Nov 5 12:02:54 hikey kernel: [ 22.497066] [<ffff00000870e9c8>] thermal_zone_device_check+0x40/0x50 Nov 5 12:02:54 hikey kernel: [ 22.503457] [<ffff0000080f1674>] process_one_work+0x19c/0x3d0 Nov 5 12:02:54 hikey kernel: [ 22.509236] [<ffff0000080f18f4>] worker_thread+0x4c/0x428 Nov 5 12:02:54 hikey kernel: [ 22.514664] [<ffff0000080f84fc>] kthread+0x134/0x138 Nov 5 12:02:54 hikey kernel: [ 22.519659] [<ffff000008085154>] ret_from_fork+0x10/0x1c Nov 5 12:02:54 hikey kernel: [ 22.524988] ---[ end trace 328d4bb2d9b066a0 ]---
This issue was solved when "hisi_thermal_alarm_irq" function was removed so only "hisi_thermal_alarm_irq_thread" would exist. This has fixed the issue for the unbalanced enablement since there is no more:
disable_irq_nosync(irq); data->irq_enabled = false;
logic being done in parallel to the threaded handler AND the thermal_zone_device_update() call only happens now if the temperature is already above the threshold.
So should we revert a patch instead of taking these new ones? Would that be easier and is this a "real" issue or just an annoying warning splat in the kernel log?
Actually, this warning is introduced with the driver and all the plumbers around to fix an irq bouncing. There is no patch to revert without removing the driver.
Greg,
Patch 5 in this series seems to explain the best what is happening here:
With the following changes, we fix all in one:
Do the setup, one time, at probe time
Add the IRQF_ONESHOT, ack the interrupt in the threaded handler
Remove the interrupt handler
Set the correct value for the LAG register
Remove all the irq_enabled stuff in the code as the interruption handling is fixed
Remove the 3ms delay
Reorder the initialization routine to be in the right order
We can't revert anything because the breakage was there since the driver was introduced.
-- Thanks, Sasha
On Mon, Dec 03, 2018 at 10:19:46AM -0500, Sasha Levin wrote:
On Mon, Dec 03, 2018 at 03:42:41PM +0100, Daniel Lezcano wrote:
On 03/12/2018 15:14, Greg KH wrote:
On Mon, Dec 03, 2018 at 11:31:02AM -0200, Rafael David Tinoco wrote:
Sasha, could you consider including this cherry-picked patchset in v4.14.
Kernel v4.14 might suffer from the following unbalanced enablement for the board Hikey 960:
Nov 5 12:02:54 hikey kernel: [ 22.148194] Unbalanced enable for IRQ 44 Nov 5 12:02:54 hikey kernel: [ 22.152193] ------------[ cut here ]------------ Nov 5 12:02:54 hikey kernel: [ 22.156872] WARNING: CPU: 2 PID: 509 at /home/inaddy/work/sources/linux/stable/stable-linux-4.14.y/kernel/irq/manage.c:525 __enable_irq+0x78/0x80 Nov 5 12:02:54 hikey kernel: [ 22.249606] CPU: 2 PID: 509 Comm: kworker/2:2 Not tainted 4.14.79 #1 Nov 5 12:02:54 hikey kernel: [ 22.255975] Hardware name: HiKey Development Board (DT) Nov 5 12:02:54 hikey kernel: [ 22.261248] Workqueue: events_freezable thermal_zone_device_check Nov 5 12:02:54 hikey kernel: [ 22.267368] task: ffff8000616e0e00 task.stack: ffff00000b5f0000 Nov 5 12:02:54 hikey kernel: [ 22.273312] PC is at __enable_irq+0x78/0x80 Nov 5 12:02:54 hikey kernel: [ 22.277516] LR is at __enable_irq+0x78/0x80 Nov 5 12:02:54 hikey kernel: [ 22.281718] pc : [<ffff00000813e010>] lr : [<ffff00000813e010>] pstate: 000001c5 Nov 5 12:02:54 hikey kernel: [ 22.289129] sp : ffff00000b5f3c80 Nov 5 12:02:54 hikey kernel: [ 22.292457] x29: ffff00000b5f3c80 x28: 0000000000000000 Nov 5 12:02:54 hikey kernel: [ 22.297804] x27: ffff80005c139e38 x26: ffff000008a71870 Nov 5 12:02:54 hikey kernel: [ 22.303148] x25: 0000000000000000 x24: 0000000000000002 Nov 5 12:02:54 hikey kernel: [ 22.308492] x23: ffff00000b5f3d9c x22: ffff80005d565e88 Nov 5 12:02:54 hikey kernel: [ 22.313836] x21: 000000000000f980 x20: 000000000000002c Nov 5 12:02:54 hikey kernel: [ 22.319181] x19: ffff800061726000 x18: 0000000000000010 Nov 5 12:02:54 hikey kernel: [ 22.324524] x17: 0000000000000000 x16: 0000000000000000 Nov 5 12:02:54 hikey kernel: [ 22.329868] x15: ffffffffffffffff x14: ffff000009269c08 Nov 5 12:02:54 hikey kernel: [ 22.335213] x13: ffff00008940678f x12: ffff000009406797 Nov 5 12:02:54 hikey kernel: [ 22.340558] x11: ffff000009290000 x10: ffff00000b5f3980 Nov 5 12:02:54 hikey kernel: [ 22.345902] x9 : 00000000ffffffd0 x8 : ffff00000862c298 Nov 5 12:02:54 hikey kernel: [ 22.351246] x7 : 6c62616e65206465 x6 : 00000000000001b2 Nov 5 12:02:54 hikey kernel: [ 22.356589] x5 : 0000000000000000 x4 : 0000000000000000 Nov 5 12:02:54 hikey kernel: [ 22.361931] x3 : 0000000000000000 x2 : ffff800063e824c8 Nov 5 12:02:54 hikey kernel: [ 22.367275] x1 : 000080005af95000 x0 : 000000000000001c Nov 5 12:02:54 hikey kernel: [ 22.372618] Call trace: Nov 5 12:02:54 hikey kernel: [ 22.375088] Exception stack(0xffff00000b5f3b40 to 0xffff00000b5f3c80) Nov 5 12:02:54 hikey kernel: [ 22.381560] 3b40: 000000000000001c 000080005af95000 ffff800063e824c8 0000000000000000 Nov 5 12:02:54 hikey kernel: [ 22.389417] 3b60: 0000000000000000 0000000000000000 00000000000001b2 6c62616e65206465 Nov 5 12:02:54 hikey kernel: [ 22.397276] 3b80: ffff00000862c298 00000000ffffffd0 ffff00000b5f3980 ffff000009290000 Nov 5 12:02:54 hikey kernel: [ 22.405136] 3ba0: ffff000009406797 ffff00008940678f ffff000009269c08 ffffffffffffffff Nov 5 12:02:54 hikey kernel: [ 22.412994] 3bc0: 0000000000000000 0000000000000000 0000000000000010 ffff800061726000 Nov 5 12:02:54 hikey kernel: [ 22.420852] 3be0: 000000000000002c 000000000000f980 ffff80005d565e88 ffff00000b5f3d9c Nov 5 12:02:54 hikey kernel: [ 22.428710] 3c00: 0000000000000002 0000000000000000 ffff000008a71870 ffff80005c139e38 Nov 5 12:02:54 hikey kernel: [ 22.436569] 3c20: 0000000000000000 ffff00000b5f3c80 ffff00000813e010 ffff00000b5f3c80 Nov 5 12:02:54 hikey kernel: [ 22.444426] 3c40: ffff00000813e010 00000000000001c5 0000000000000000 0000000000000000 Nov 5 12:02:54 hikey kernel: [ 22.452286] 3c60: ffffffffffffffff ffff800061800618 ffff00000b5f3c80 ffff00000813e010 Nov 5 12:02:54 hikey kernel: [ 22.460144] [<ffff00000813e010>] __enable_irq+0x78/0x80 Nov 5 12:02:54 hikey kernel: [ 22.465394] [<ffff00000813e058>] enable_irq+0x40/0x78 Nov 5 12:02:54 hikey kernel: [ 22.470493] [<ffff000000e228a8>] hisi_thermal_get_temp+0x1b0/0x1d8 [hisi_thermal] Nov 5 12:02:54 hikey kernel: [ 22.478008] [<ffff0000087121a8>] of_thermal_get_temp+0x38/0x50 Nov 5 12:02:54 hikey kernel: [ 22.483869] [<ffff000008711790>] thermal_zone_get_temp+0x58/0x80 Nov 5 12:02:54 hikey kernel: [ 22.489903] [<ffff00000870e7bc>] thermal_zone_device_update.part.4+0x2c/0x1a8 Nov 5 12:02:54 hikey kernel: [ 22.497066] [<ffff00000870e9c8>] thermal_zone_device_check+0x40/0x50 Nov 5 12:02:54 hikey kernel: [ 22.503457] [<ffff0000080f1674>] process_one_work+0x19c/0x3d0 Nov 5 12:02:54 hikey kernel: [ 22.509236] [<ffff0000080f18f4>] worker_thread+0x4c/0x428 Nov 5 12:02:54 hikey kernel: [ 22.514664] [<ffff0000080f84fc>] kthread+0x134/0x138 Nov 5 12:02:54 hikey kernel: [ 22.519659] [<ffff000008085154>] ret_from_fork+0x10/0x1c Nov 5 12:02:54 hikey kernel: [ 22.524988] ---[ end trace 328d4bb2d9b066a0 ]---
This issue was solved when "hisi_thermal_alarm_irq" function was removed so only "hisi_thermal_alarm_irq_thread" would exist. This has fixed the issue for the unbalanced enablement since there is no more:
disable_irq_nosync(irq); data->irq_enabled = false;
logic being done in parallel to the threaded handler AND the thermal_zone_device_update() call only happens now if the temperature is already above the threshold.
So should we revert a patch instead of taking these new ones? Would that be easier and is this a "real" issue or just an annoying warning splat in the kernel log?
Actually, this warning is introduced with the driver and all the plumbers around to fix an irq bouncing. There is no patch to revert without removing the driver.
Greg,
Patch 5 in this series seems to explain the best what is happening here:
With the following changes, we fix all in one:
Do the setup, one time, at probe time
Add the IRQF_ONESHOT, ack the interrupt in the threaded handler
Remove the interrupt handler
Set the correct value for the LAG register
Remove all the irq_enabled stuff in the code as the interruption handling is fixed
Remove the 3ms delay
Reorder the initialization routine to be in the right order
We can't revert anything because the breakage was there since the driver was introduced.
So the driver was broken in 4.14, why not just use 4.19 instead? This isn't a 4.14 regression, it's something that obviously no one has noticed for a year now, so why backport these big patches to 4.14 now?
thanks,
greg k-h
Greg,
Patch 5 in this series seems to explain the best what is happening here:
With the following changes, we fix all in one:
Do the setup, one time, at probe time
Add the IRQF_ONESHOT, ack the interrupt in the threaded handler
Remove the interrupt handler
Set the correct value for the LAG register
Remove all the irq_enabled stuff in the code as the interruption handling is fixed
Remove the 3ms delay
Reorder the initialization routine to be in the right order
We can't revert anything because the breakage was there since the driver was introduced.
So the driver was broken in 4.14, why not just use 4.19 instead? This isn't a 4.14 regression, it's something that obviously no one has noticed for a year now, so why backport these big patches to 4.14 now?
thanks,
greg k-h
This was caught during our functional tests. No direct complains, but, since it included a trace, and there was a fix for it, I thought it could be accepted for upstream v4.14 (it is included in other v4.14 kernels, like Android's).
thanks!
On Mon, Dec 03, 2018 at 04:24:48PM -0200, Rafael David Tinoco wrote:
Greg,
Patch 5 in this series seems to explain the best what is happening here:
With the following changes, we fix all in one:
Do the setup, one time, at probe time
Add the IRQF_ONESHOT, ack the interrupt in the threaded handler
Remove the interrupt handler
Set the correct value for the LAG register
Remove all the irq_enabled stuff in the code as the interruption handling is fixed
Remove the 3ms delay
Reorder the initialization routine to be in the right order
We can't revert anything because the breakage was there since the driver was introduced.
So the driver was broken in 4.14, why not just use 4.19 instead? This isn't a 4.14 regression, it's something that obviously no one has noticed for a year now, so why backport these big patches to 4.14 now?
thanks,
greg k-h
This was caught during our functional tests. No direct complains, but, since it included a trace, and there was a fix for it, I thought it could be accepted for upstream v4.14 (it is included in other v4.14 kernels, like Android's).
Ok, all now queued up, thanks.
greg k-h
On 03/12/2018 19:05, Greg KH wrote:
On Mon, Dec 03, 2018 at 10:19:46AM -0500, Sasha Levin wrote:
On Mon, Dec 03, 2018 at 03:42:41PM +0100, Daniel Lezcano wrote:
On 03/12/2018 15:14, Greg KH wrote:
On Mon, Dec 03, 2018 at 11:31:02AM -0200, Rafael David Tinoco wrote:
Sasha, could you consider including this cherry-picked patchset in v4.14.
Kernel v4.14 might suffer from the following unbalanced enablement for the board Hikey 960:
Nov 5 12:02:54 hikey kernel: [ 22.148194] Unbalanced enable for IRQ 44 Nov 5 12:02:54 hikey kernel: [ 22.152193] ------------[ cut here ]------------ Nov 5 12:02:54 hikey kernel: [ 22.156872] WARNING: CPU: 2 PID: 509 at /home/inaddy/work/sources/linux/stable/stable-linux-4.14.y/kernel/irq/manage.c:525 __enable_irq+0x78/0x80 Nov 5 12:02:54 hikey kernel: [ 22.249606] CPU: 2 PID: 509 Comm: kworker/2:2 Not tainted 4.14.79 #1 Nov 5 12:02:54 hikey kernel: [ 22.255975] Hardware name: HiKey Development Board (DT) Nov 5 12:02:54 hikey kernel: [ 22.261248] Workqueue: events_freezable thermal_zone_device_check Nov 5 12:02:54 hikey kernel: [ 22.267368] task: ffff8000616e0e00 task.stack: ffff00000b5f0000 Nov 5 12:02:54 hikey kernel: [ 22.273312] PC is at __enable_irq+0x78/0x80 Nov 5 12:02:54 hikey kernel: [ 22.277516] LR is at __enable_irq+0x78/0x80 Nov 5 12:02:54 hikey kernel: [ 22.281718] pc : [<ffff00000813e010>] lr : [<ffff00000813e010>] pstate: 000001c5 Nov 5 12:02:54 hikey kernel: [ 22.289129] sp : ffff00000b5f3c80 Nov 5 12:02:54 hikey kernel: [ 22.292457] x29: ffff00000b5f3c80 x28: 0000000000000000 Nov 5 12:02:54 hikey kernel: [ 22.297804] x27: ffff80005c139e38 x26: ffff000008a71870 Nov 5 12:02:54 hikey kernel: [ 22.303148] x25: 0000000000000000 x24: 0000000000000002 Nov 5 12:02:54 hikey kernel: [ 22.308492] x23: ffff00000b5f3d9c x22: ffff80005d565e88 Nov 5 12:02:54 hikey kernel: [ 22.313836] x21: 000000000000f980 x20: 000000000000002c Nov 5 12:02:54 hikey kernel: [ 22.319181] x19: ffff800061726000 x18: 0000000000000010 Nov 5 12:02:54 hikey kernel: [ 22.324524] x17: 0000000000000000 x16: 0000000000000000 Nov 5 12:02:54 hikey kernel: [ 22.329868] x15: ffffffffffffffff x14: ffff000009269c08 Nov 5 12:02:54 hikey kernel: [ 22.335213] x13: ffff00008940678f x12: ffff000009406797 Nov 5 12:02:54 hikey kernel: [ 22.340558] x11: ffff000009290000 x10: ffff00000b5f3980 Nov 5 12:02:54 hikey kernel: [ 22.345902] x9 : 00000000ffffffd0 x8 : ffff00000862c298 Nov 5 12:02:54 hikey kernel: [ 22.351246] x7 : 6c62616e65206465 x6 : 00000000000001b2 Nov 5 12:02:54 hikey kernel: [ 22.356589] x5 : 0000000000000000 x4 : 0000000000000000 Nov 5 12:02:54 hikey kernel: [ 22.361931] x3 : 0000000000000000 x2 : ffff800063e824c8 Nov 5 12:02:54 hikey kernel: [ 22.367275] x1 : 000080005af95000 x0 : 000000000000001c Nov 5 12:02:54 hikey kernel: [ 22.372618] Call trace: Nov 5 12:02:54 hikey kernel: [ 22.375088] Exception stack(0xffff00000b5f3b40 to 0xffff00000b5f3c80) Nov 5 12:02:54 hikey kernel: [ 22.381560] 3b40: 000000000000001c 000080005af95000 ffff800063e824c8 0000000000000000 Nov 5 12:02:54 hikey kernel: [ 22.389417] 3b60: 0000000000000000 0000000000000000 00000000000001b2 6c62616e65206465 Nov 5 12:02:54 hikey kernel: [ 22.397276] 3b80: ffff00000862c298 00000000ffffffd0 ffff00000b5f3980 ffff000009290000 Nov 5 12:02:54 hikey kernel: [ 22.405136] 3ba0: ffff000009406797 ffff00008940678f ffff000009269c08 ffffffffffffffff Nov 5 12:02:54 hikey kernel: [ 22.412994] 3bc0: 0000000000000000 0000000000000000 0000000000000010 ffff800061726000 Nov 5 12:02:54 hikey kernel: [ 22.420852] 3be0: 000000000000002c 000000000000f980 ffff80005d565e88 ffff00000b5f3d9c Nov 5 12:02:54 hikey kernel: [ 22.428710] 3c00: 0000000000000002 0000000000000000 ffff000008a71870 ffff80005c139e38 Nov 5 12:02:54 hikey kernel: [ 22.436569] 3c20: 0000000000000000 ffff00000b5f3c80 ffff00000813e010 ffff00000b5f3c80 Nov 5 12:02:54 hikey kernel: [ 22.444426] 3c40: ffff00000813e010 00000000000001c5 0000000000000000 0000000000000000 Nov 5 12:02:54 hikey kernel: [ 22.452286] 3c60: ffffffffffffffff ffff800061800618 ffff00000b5f3c80 ffff00000813e010 Nov 5 12:02:54 hikey kernel: [ 22.460144] [<ffff00000813e010>] __enable_irq+0x78/0x80 Nov 5 12:02:54 hikey kernel: [ 22.465394] [<ffff00000813e058>] enable_irq+0x40/0x78 Nov 5 12:02:54 hikey kernel: [ 22.470493] [<ffff000000e228a8>] hisi_thermal_get_temp+0x1b0/0x1d8 [hisi_thermal] Nov 5 12:02:54 hikey kernel: [ 22.478008] [<ffff0000087121a8>] of_thermal_get_temp+0x38/0x50 Nov 5 12:02:54 hikey kernel: [ 22.483869] [<ffff000008711790>] thermal_zone_get_temp+0x58/0x80 Nov 5 12:02:54 hikey kernel: [ 22.489903] [<ffff00000870e7bc>] thermal_zone_device_update.part.4+0x2c/0x1a8 Nov 5 12:02:54 hikey kernel: [ 22.497066] [<ffff00000870e9c8>] thermal_zone_device_check+0x40/0x50 Nov 5 12:02:54 hikey kernel: [ 22.503457] [<ffff0000080f1674>] process_one_work+0x19c/0x3d0 Nov 5 12:02:54 hikey kernel: [ 22.509236] [<ffff0000080f18f4>] worker_thread+0x4c/0x428 Nov 5 12:02:54 hikey kernel: [ 22.514664] [<ffff0000080f84fc>] kthread+0x134/0x138 Nov 5 12:02:54 hikey kernel: [ 22.519659] [<ffff000008085154>] ret_from_fork+0x10/0x1c Nov 5 12:02:54 hikey kernel: [ 22.524988] ---[ end trace 328d4bb2d9b066a0 ]---
This issue was solved when "hisi_thermal_alarm_irq" function was removed so only "hisi_thermal_alarm_irq_thread" would exist. This has fixed the issue for the unbalanced enablement since there is no more:
disable_irq_nosync(irq); data->irq_enabled = false;
logic being done in parallel to the threaded handler AND the thermal_zone_device_update() call only happens now if the temperature is already above the threshold.
So should we revert a patch instead of taking these new ones? Would that be easier and is this a "real" issue or just an annoying warning splat in the kernel log?
Actually, this warning is introduced with the driver and all the plumbers around to fix an irq bouncing. There is no patch to revert without removing the driver.
Greg,
Patch 5 in this series seems to explain the best what is happening here:
With the following changes, we fix all in one:
Do the setup, one time, at probe time
Add the IRQF_ONESHOT, ack the interrupt in the threaded handler
Remove the interrupt handler
Set the correct value for the LAG register
Remove all the irq_enabled stuff in the code as the interruption handling is fixed
Remove the 3ms delay
Reorder the initialization routine to be in the right order
We can't revert anything because the breakage was there since the driver was introduced.
So the driver was broken in 4.14, why not just use 4.19 instead? This isn't a 4.14 regression, it's something that obviously no one has noticed for a year now, so why backport these big patches to 4.14 now?
It was not broken before this series but wobbly with a high latency to read the temperature and an unusual interrupt handling scheme to work around an uncatched register misconfiguration.
I think it was noticed but without finding the root cause of the issue. This one is definitively solved with these patches and the resulting code is consistent with the other thermal drivers.
All these patches were applied to Android-4.14 if worth to mention.
linux-stable-mirror@lists.linaro.org