The arm defconfig builds failed on today's Linux next tag next-20240304.
Build log: --------- ERROR: modpost: "__aeabi_uldivmod" [drivers/gpu/drm/sun4i/sun4i-drm-hdmi.ko] undefined!
Steps to reproduce: # tuxmake --runtime podman --target-arch arm --toolchain clang-17 --kconfig defconfig LLVM=1 LLVM_IAS=1
Links: - https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20240304/tes... - https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20240304/tes... - https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20240304/tes...
-- Linaro LKFT https://lkft.linaro.org
On Mon, Mar 4, 2024, at 09:07, Naresh Kamboju wrote:
The arm defconfig builds failed on today's Linux next tag next-20240304.
Build log:
ERROR: modpost: "__aeabi_uldivmod" [drivers/gpu/drm/sun4i/sun4i-drm-hdmi.ko] undefined!
Apparently caused by the 64-bit division in 358e76fd613a ("drm/sun4i: hdmi: Consolidate atomic_check and mode_valid"):
+static enum drm_mode_status +sun4i_hdmi_connector_clock_valid(const struct drm_connector *connector, + const struct drm_display_mode *mode, + unsigned long long clock) { - struct sun4i_hdmi *hdmi = drm_encoder_to_sun4i_hdmi(encoder); - unsigned long rate = mode->clock * 1000; - unsigned long diff = rate / 200; /* +-0.5% allowed by HDMI spec */ + const struct sun4i_hdmi *hdmi = drm_connector_to_sun4i_hdmi(connector); + unsigned long diff = clock / 200; /* +-0.5% allowed by HDMI spec */ long rounded_rate;
This used to be a 32-bit division. If the rate is never more than 4.2GHz, clock could be turned back into 'unsigned long' to avoid the expensive div_u64().
Arnd
On Mon, 04 Mar 2024 12:11:36 +0100 "Arnd Bergmann" arnd@arndb.de wrote:
Hi,
On Mon, Mar 4, 2024, at 09:07, Naresh Kamboju wrote:
The arm defconfig builds failed on today's Linux next tag next-20240304.
Build log:
ERROR: modpost: "__aeabi_uldivmod" [drivers/gpu/drm/sun4i/sun4i-drm-hdmi.ko] undefined!
Apparently caused by the 64-bit division in 358e76fd613a ("drm/sun4i: hdmi: Consolidate atomic_check and mode_valid"):
+static enum drm_mode_status +sun4i_hdmi_connector_clock_valid(const struct drm_connector *connector,
const struct drm_display_mode *mode,
unsigned long long clock)
{
struct sun4i_hdmi *hdmi = drm_encoder_to_sun4i_hdmi(encoder);
unsigned long rate = mode->clock * 1000;
unsigned long diff = rate / 200; /* +-0.5% allowed by HDMI spec */
const struct sun4i_hdmi *hdmi = drm_connector_to_sun4i_hdmi(connector);
unsigned long diff = clock / 200; /* +-0.5% allowed by HDMI spec */
Wouldn't "div_u64(clock, 200)" solve this problem?
Cheers, Andre
long rounded_rate;
This used to be a 32-bit division. If the rate is never more than 4.2GHz, clock could be turned back into 'unsigned long' to avoid the expensive div_u64().
Arnd
On Mon, Mar 4, 2024, at 12:24, Andre Przywara wrote:
On Mon, 04 Mar 2024 12:11:36 +0100 "Arnd Bergmann" arnd@arndb.de wrote:
This used to be a 32-bit division. If the rate is never more than 4.2GHz, clock could be turned back into 'unsigned long' to avoid the expensive div_u64().
Wouldn't "div_u64(clock, 200)" solve this problem?
Yes, that's why I mentioned it as the worse of the two obvious solutions. ;-)
Arnd
On Mon, 04 Mar 2024 12:26:46 +0100 "Arnd Bergmann" arnd@arndb.de wrote:
On Mon, Mar 4, 2024, at 12:24, Andre Przywara wrote:
On Mon, 04 Mar 2024 12:11:36 +0100 "Arnd Bergmann" arnd@arndb.de wrote:
This used to be a 32-bit division. If the rate is never more than 4.2GHz, clock could be turned back into 'unsigned long' to avoid the expensive div_u64().
Wouldn't "div_u64(clock, 200)" solve this problem?
Yes, that's why I mentioned it as the worse of the two obvious solutions. ;-)
Argh, should have cleaned my glasses first ;-)
I guess I was put somehow put off by the word "expensive". While it's admittedly not trivial, I wonder if we care about the (hidden) complexity of that function? I mean it's neither core code nor something called frequently? I don't think we have any clock exceeding 3GHz at the moment, but it sounds fishy to rely on that.
Cheers, Andre
On Mon, Mar 4, 2024, at 12:45, Andre Przywara wrote:
On Mon, 04 Mar 2024 12:26:46 +0100 "Arnd Bergmann" arnd@arndb.de wrote:
On Mon, Mar 4, 2024, at 12:24, Andre Przywara wrote:
On Mon, 04 Mar 2024 12:11:36 +0100 "Arnd Bergmann" arnd@arndb.de wrote:
This used to be a 32-bit division. If the rate is never more than 4.2GHz, clock could be turned back into 'unsigned long' to avoid the expensive div_u64().
Wouldn't "div_u64(clock, 200)" solve this problem?
Yes, that's why I mentioned it as the worse of the two obvious solutions. ;-)
Argh, should have cleaned my glasses first ;-)
I guess I was put somehow put off by the word "expensive". While it's admittedly not trivial, I wonder if we care about the (hidden) complexity of that function? I mean it's neither core code nor something called frequently?
It's not critical if this is called infrequently, and as Maxime just replied, the 64-bit division is in fact required here. Since we are dividing by a constant value (200), there is a good chance that this will be get turned into fairly efficient multiply/shift code.
I don't think we have any clock exceeding 3GHz at the moment, but it sounds fishy to rely on that.
Right, it's just important to look at each case individually. The cost of 64-bit division is crazy if it gets called repeatedly, which is of course the entire reason we don't provide a __aeabi_uldivmod() function and require developers to think before adding div_u64().
Arnd
On Mon, 4 Mar 2024 at 13:35, Arnd Bergmann arnd@arndb.de wrote:
On Mon, Mar 4, 2024, at 12:45, Andre Przywara wrote:
On Mon, 04 Mar 2024 12:26:46 +0100 "Arnd Bergmann" arnd@arndb.de wrote:
On Mon, Mar 4, 2024, at 12:24, Andre Przywara wrote:
On Mon, 04 Mar 2024 12:11:36 +0100 "Arnd Bergmann" arnd@arndb.de wrote:
This used to be a 32-bit division. If the rate is never more than 4.2GHz, clock could be turned back into 'unsigned long' to avoid the expensive div_u64().
Wouldn't "div_u64(clock, 200)" solve this problem?
Yes, that's why I mentioned it as the worse of the two obvious solutions. ;-)
Argh, should have cleaned my glasses first ;-)
I guess I was put somehow put off by the word "expensive". While it's admittedly not trivial, I wonder if we care about the (hidden) complexity of that function? I mean it's neither core code nor something called frequently?
It's not critical if this is called infrequently, and as Maxime just replied, the 64-bit division is in fact required here. Since we are dividing by a constant value (200), there is a good chance that this will be get turned into fairly efficient multiply/shift code.
Clang does not implement that optimization for 64-bit division. That is how we ended up with this error in the first place.
Perhaps it is worthwhile to make div_u64() check its divisor, e.g.,
--- a/include/linux/math64.h +++ b/include/linux/math64.h @@ -127,6 +127,9 @@ static inline u64 div_u64(u64 dividend, u32 divisor) { u32 remainder; + + if (IS_ENABLED(CONFIG_CC_IS_GCC) && __builtin_constant_p(divisor)) + return dividend / divisor; return div_u64_rem(dividend, divisor, &remainder); } #endif
On Mon, Mar 4, 2024, at 14:01, Ard Biesheuvel wrote:
On Mon, 4 Mar 2024 at 13:35, Arnd Bergmann arnd@arndb.de wrote:
On Mon, Mar 4, 2024, at 12:45, Andre Przywara wrote: It's not critical if this is called infrequently, and as Maxime just replied, the 64-bit division is in fact required here. Since we are dividing by a constant value (200), there is a good chance that this will be get turned into fairly efficient multiply/shift code.
Clang does not implement that optimization for 64-bit division. That is how we ended up with this error in the first place.
I meant it will use the optimization after the patch to convert the plain '/' to div_u64().
Perhaps it is worthwhile to make div_u64() check its divisor, e.g.,
--- a/include/linux/math64.h +++ b/include/linux/math64.h @@ -127,6 +127,9 @@ static inline u64 div_u64(u64 dividend, u32 divisor) { u32 remainder;
if (IS_ENABLED(CONFIG_CC_IS_GCC) && __builtin_constant_p(divisor))
return dividend / divisor; return div_u64_rem(dividend, divisor, &remainder);
}
I think the div_u64()->do_div()->__div64_const32()->__arch_xprod_64() optimization in asm-generic/div64.h already produces what we want on both compilers. Is there something missing there?
Arnd
On Mon, 4 Mar 2024 at 14:49, Arnd Bergmann arnd@arndb.de wrote:
On Mon, Mar 4, 2024, at 14:01, Ard Biesheuvel wrote:
On Mon, 4 Mar 2024 at 13:35, Arnd Bergmann arnd@arndb.de wrote:
On Mon, Mar 4, 2024, at 12:45, Andre Przywara wrote: It's not critical if this is called infrequently, and as Maxime just replied, the 64-bit division is in fact required here. Since we are dividing by a constant value (200), there is a good chance that this will be get turned into fairly efficient multiply/shift code.
Clang does not implement that optimization for 64-bit division. That is how we ended up with this error in the first place.
I meant it will use the optimization after the patch to convert the plain '/' to div_u64().
Ah ok.
I did not realize we implement the same optimization in our code as the one that GCC will apply when encountering a compile-time constant divisor.
Perhaps it is worthwhile to make div_u64() check its divisor, e.g.,
--- a/include/linux/math64.h +++ b/include/linux/math64.h @@ -127,6 +127,9 @@ static inline u64 div_u64(u64 dividend, u32 divisor) { u32 remainder;
if (IS_ENABLED(CONFIG_CC_IS_GCC) && __builtin_constant_p(divisor))
return dividend / divisor; return div_u64_rem(dividend, divisor, &remainder);
}
I think the div_u64()->do_div()->__div64_const32()->__arch_xprod_64() optimization in asm-generic/div64.h already produces what we want on both compilers. Is there something missing there?
No, you are right. I thought we were relying on GCC for the optimization here.
Hi,
On Mon, Mar 04, 2024 at 12:11:36PM +0100, Arnd Bergmann wrote:
On Mon, Mar 4, 2024, at 09:07, Naresh Kamboju wrote:
The arm defconfig builds failed on today's Linux next tag next-20240304.
Build log:
ERROR: modpost: "__aeabi_uldivmod" [drivers/gpu/drm/sun4i/sun4i-drm-hdmi.ko] undefined!
Apparently caused by the 64-bit division in 358e76fd613a ("drm/sun4i: hdmi: Consolidate atomic_check and mode_valid"):
+static enum drm_mode_status +sun4i_hdmi_connector_clock_valid(const struct drm_connector *connector,
const struct drm_display_mode *mode,
unsigned long long clock)
{
struct sun4i_hdmi *hdmi = drm_encoder_to_sun4i_hdmi(encoder);
unsigned long rate = mode->clock * 1000;
unsigned long diff = rate / 200; /* +-0.5% allowed by HDMI spec */
const struct sun4i_hdmi *hdmi = drm_connector_to_sun4i_hdmi(connector);
unsigned long diff = clock / 200; /* +-0.5% allowed by HDMI spec */ long rounded_rate;
This used to be a 32-bit division. If the rate is never more than 4.2GHz, clock could be turned back into 'unsigned long' to avoid the expensive div_u64().
I sent a fix for it this morning: https://lore.kernel.org/r/20240304091225.366325-1-mripard@kernel.org
The framework will pass an unsigned long long because HDMI character rates can go up to 5.9GHz.
Maxime
From: Maxime Ripard
Sent: 04 March 2024 11:46
On Mon, Mar 04, 2024 at 12:11:36PM +0100, Arnd Bergmann wrote:
On Mon, Mar 4, 2024, at 09:07, Naresh Kamboju wrote:
The arm defconfig builds failed on today's Linux next tag next-20240304.
Build log:
ERROR: modpost: "__aeabi_uldivmod" [drivers/gpu/drm/sun4i/sun4i-drm-hdmi.ko] undefined!
Apparently caused by the 64-bit division in 358e76fd613a ("drm/sun4i: hdmi: Consolidate atomic_check and mode_valid"):
+static enum drm_mode_status +sun4i_hdmi_connector_clock_valid(const struct drm_connector *connector,
const struct drm_display_mode *mode,
unsigned long long clock)
{
struct sun4i_hdmi *hdmi = drm_encoder_to_sun4i_hdmi(encoder);
unsigned long rate = mode->clock * 1000;
unsigned long diff = rate / 200; /* +-0.5% allowed by HDMI spec */
const struct sun4i_hdmi *hdmi = drm_connector_to_sun4i_hdmi(connector);
unsigned long diff = clock / 200; /* +-0.5% allowed by HDMI spec */ long rounded_rate;
This used to be a 32-bit division. If the rate is never more than 4.2GHz, clock could be turned back into 'unsigned long' to avoid the expensive div_u64().
I sent a fix for it this morning: https://lore.kernel.org/r/20240304091225.366325-1-mripard@kernel.org
The framework will pass an unsigned long long because HDMI character rates can go up to 5.9GHz.
You could do: /* The max clock is 5.9GHz, split the divide */ u32 diff = (u32)(clock / 8) / (200/8);
The code should really use u32 and u64. Otherwise the sizes are different on 32bit.
David
- Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On Sat, Mar 9, 2024 at 3:34 PM David Laight David.Laight@aculab.com wrote:
From: Maxime Ripard
Sent: 04 March 2024 11:46
On Mon, Mar 04, 2024 at 12:11:36PM +0100, Arnd Bergmann wrote:
On Mon, Mar 4, 2024, at 09:07, Naresh Kamboju wrote:
The arm defconfig builds failed on today's Linux next tag next-20240304.
Build log:
ERROR: modpost: "__aeabi_uldivmod" [drivers/gpu/drm/sun4i/sun4i-drm-hdmi.ko] undefined!
Apparently caused by the 64-bit division in 358e76fd613a ("drm/sun4i: hdmi: Consolidate atomic_check and mode_valid"):
+static enum drm_mode_status +sun4i_hdmi_connector_clock_valid(const struct drm_connector *connector,
const struct drm_display_mode *mode,
unsigned long long clock)
{
struct sun4i_hdmi *hdmi = drm_encoder_to_sun4i_hdmi(encoder);
unsigned long rate = mode->clock * 1000;
unsigned long diff = rate / 200; /* +-0.5% allowed by HDMI spec */
const struct sun4i_hdmi *hdmi = drm_connector_to_sun4i_hdmi(connector);
unsigned long diff = clock / 200; /* +-0.5% allowed by HDMI spec */ long rounded_rate;
This used to be a 32-bit division. If the rate is never more than 4.2GHz, clock could be turned back into 'unsigned long' to avoid the expensive div_u64().
I sent a fix for it this morning: https://lore.kernel.org/r/20240304091225.366325-1-mripard@kernel.org
The framework will pass an unsigned long long because HDMI character rates can go up to 5.9GHz.
You could do: /* The max clock is 5.9GHz, split the divide */ u32 diff = (u32)(clock / 8) / (200/8);
+1, as the issue is still present in current next, as per the recent nagging from the build bots.
The code should really use u32 and u64. Otherwise the sizes are different on 32bit.
The code is already using a variety of types (long, unsigned long long, unsigned long) :-(
max_t(unsigned long, rounded_rate, clock) - min_t(unsigned long, rounded_rate, clock) < diff)
At least u64 should make it very clear clock does not fit in 32-bit.
Gr{oetje,eeting}s,
Geert
On Thu, Mar 14, 2024 at 10:27 AM Geert Uytterhoeven geert@linux-m68k.org wrote:
On Sat, Mar 9, 2024 at 3:34 PM David Laight David.Laight@aculab.com wrote:
From: Maxime Ripard
Sent: 04 March 2024 11:46
On Mon, Mar 04, 2024 at 12:11:36PM +0100, Arnd Bergmann wrote:
On Mon, Mar 4, 2024, at 09:07, Naresh Kamboju wrote:
The arm defconfig builds failed on today's Linux next tag next-20240304.
Build log:
ERROR: modpost: "__aeabi_uldivmod" [drivers/gpu/drm/sun4i/sun4i-drm-hdmi.ko] undefined!
Apparently caused by the 64-bit division in 358e76fd613a ("drm/sun4i: hdmi: Consolidate atomic_check and mode_valid"):
+static enum drm_mode_status +sun4i_hdmi_connector_clock_valid(const struct drm_connector *connector,
const struct drm_display_mode *mode,
unsigned long long clock)
{
struct sun4i_hdmi *hdmi = drm_encoder_to_sun4i_hdmi(encoder);
unsigned long rate = mode->clock * 1000;
unsigned long diff = rate / 200; /* +-0.5% allowed by HDMI spec */
const struct sun4i_hdmi *hdmi = drm_connector_to_sun4i_hdmi(connector);
unsigned long diff = clock / 200; /* +-0.5% allowed by HDMI spec */ long rounded_rate;
This used to be a 32-bit division. If the rate is never more than 4.2GHz, clock could be turned back into 'unsigned long' to avoid the expensive div_u64().
I sent a fix for it this morning: https://lore.kernel.org/r/20240304091225.366325-1-mripard@kernel.org
The framework will pass an unsigned long long because HDMI character rates can go up to 5.9GHz.
You could do: /* The max clock is 5.9GHz, split the divide */ u32 diff = (u32)(clock / 8) / (200/8);
+1, as the issue is still present in current next, as per the recent nagging from the build bots.
Oops, s/next/upstream/. Well, less 32-bit build testing for the next few days :-(
Gr{oetje,eeting}s,
Geert
On Thu, Mar 14, 2024 at 10:27:23AM +0100, Geert Uytterhoeven wrote:
On Sat, Mar 9, 2024 at 3:34 PM David Laight David.Laight@aculab.com wrote:
From: Maxime Ripard
Sent: 04 March 2024 11:46
On Mon, Mar 04, 2024 at 12:11:36PM +0100, Arnd Bergmann wrote:
On Mon, Mar 4, 2024, at 09:07, Naresh Kamboju wrote:
The arm defconfig builds failed on today's Linux next tag next-20240304.
Build log:
ERROR: modpost: "__aeabi_uldivmod" [drivers/gpu/drm/sun4i/sun4i-drm-hdmi.ko] undefined!
Apparently caused by the 64-bit division in 358e76fd613a ("drm/sun4i: hdmi: Consolidate atomic_check and mode_valid"):
+static enum drm_mode_status +sun4i_hdmi_connector_clock_valid(const struct drm_connector *connector,
const struct drm_display_mode *mode,
unsigned long long clock)
{
struct sun4i_hdmi *hdmi = drm_encoder_to_sun4i_hdmi(encoder);
unsigned long rate = mode->clock * 1000;
unsigned long diff = rate / 200; /* +-0.5% allowed by HDMI spec */
const struct sun4i_hdmi *hdmi = drm_connector_to_sun4i_hdmi(connector);
unsigned long diff = clock / 200; /* +-0.5% allowed by HDMI spec */ long rounded_rate;
This used to be a 32-bit division. If the rate is never more than 4.2GHz, clock could be turned back into 'unsigned long' to avoid the expensive div_u64().
I sent a fix for it this morning: https://lore.kernel.org/r/20240304091225.366325-1-mripard@kernel.org
The framework will pass an unsigned long long because HDMI character rates can go up to 5.9GHz.
You could do: /* The max clock is 5.9GHz, split the divide */ u32 diff = (u32)(clock / 8) / (200/8);
+1, as the issue is still present in current next, as per the recent nagging from the build bots.
A patch to fix it has been merged today and will show up tomorrow in next.
Maxime