Hi,
The main details on this series are in patch #2's commit log. It's long, so I won't repeat it here for v2. As before, I've tried to trim the CC list.
v2:
- _keep_ ksize(), but remove instrumentation (makes patch series smaller)
- reorganized skbuff logic to avoid yet more copy/paste code
- added a WARN to a separate skbuff ksize usage
- add new refactorings: bpf, openvswitch, devres, mempool, kasan
- dropped "independent" patches: iwlwifi, x86/microcode/AMD (sent separately)
v1: https://lore.kernel.org/lkml/20220922031013.2150682-1-keescook@chromium.org
Notes:
Originally, when I was going to entirely remove ksize(), there were a handful of refactorings that just needed to switch from ksize() to __ksize(). In the end, it was cleaner to leave ksize() as a real function, just without the KASAN instrumentation. I wonder, however, if it should be converted into a static inline now?
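If it were converted, my guess is it would be little more than a header wrapper around __ksize(); a rough sketch (hypothetical, not what the series does):

	/* Hypothetical only: ksize() as a static inline once it no longer
	 * needs KASAN instrumentation. NULL/ZERO_SIZE_PTR handling mirrors
	 * what the out-of-line version does today.
	 */
	static inline size_t ksize(const void *objp)
	{
		if (!objp || objp == ZERO_SIZE_PTR)
			return 0;
		return __ksize(objp);
	}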
I dropped Jakub's Ack because I refactored that code a bunch more.
The 2 patches that didn't need to call kmalloc_size_roundup() don't need to be part of this series. (One is already in -next, actually.)
I'd like to land at least the first two patches in the coming v6.1 merge window so that the per-subsystem patches can be sent to their various subsystems directly. Vlastimil, what do you think?
Thanks!
-Kees
Kees Cook (16):
  slab: Remove __malloc attribute from realloc functions
  slab: Introduce kmalloc_size_roundup()
  skbuff: Proactively round up to kmalloc bucket size
  skbuff: Phase out ksize() fallback for frag_size
  net: ipa: Proactively round up to kmalloc bucket size
  igb: Proactively round up to kmalloc bucket size
  btrfs: send: Proactively round up to kmalloc bucket size
  dma-buf: Proactively round up to kmalloc bucket size
  coredump: Proactively round up to kmalloc bucket size
  openvswitch: Use kmalloc_size_roundup() to match ksize() usage
  bpf: Use kmalloc_size_roundup() to match ksize() usage
  devres: Use kmalloc_size_roundup() to match ksize() usage
  mempool: Use kmalloc_size_roundup() to match ksize() usage
  kasan: Remove ksize()-related tests
  mm: Make ksize() a reporting-only function
  slab: Restore __alloc_size attribute to __kmalloc_track_caller
 drivers/base/devres.c                     |  3 +
 drivers/dma-buf/dma-resv.c                |  9 ++-
 drivers/net/ethernet/intel/igb/igb_main.c |  5 +-
 drivers/net/ipa/gsi_trans.c               |  7 +-
 fs/btrfs/send.c                           | 11 +--
 fs/coredump.c                             |  7 +-
 include/linux/compiler_types.h            | 13 ++--
 include/linux/skbuff.h                    |  5 +-
 include/linux/slab.h                      | 46 +++++++++++--
 kernel/bpf/verifier.c                     | 49 +++++++++-----
 lib/test_kasan.c                          | 42 ------------
 mm/kasan/shadow.c                         |  4 +-
 mm/mempool.c                              |  2 +-
 mm/slab.c                                 |  9 ++-
 mm/slab_common.c                          | 62 ++++++++++-------
 net/core/skbuff.c                         | 82 ++++++++++++-----------
 net/openvswitch/flow_netlink.c            |  2 +-
 17 files changed, 192 insertions(+), 166 deletions(-)
The __malloc attribute should not be applied to "realloc" functions, as the returned pointer may alias the storage of the prior pointer. Instead of splitting __malloc from __alloc_size, which would be a huge amount of churn, just create __realloc_size for the few cases where it is needed.
Additionally removes the conditional test for __alloc_size__, which is always defined now.
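To make the aliasing hazard concrete, a small illustration (mine, not part of the patch; the function name is made up):

	/* Illustration only: __attribute__((malloc)) promises the returned
	 * pointer does not alias any other live pointer. That holds for
	 * kmalloc(), but krealloc() may hand back the very pointer it was
	 * given.
	 */
	static void malloc_attr_example(void)
	{
		char *p = kmalloc(8, GFP_KERNEL);
		char *q;

		if (!p)
			return;
		q = krealloc(p, 8, GFP_KERNEL);	/* may return p itself */
		if (!q) {
			kfree(p);
			return;
		}
		/* If krealloc() carried __malloc, the compiler could assume
		 * p and q never alias and mis-optimize code that mixes the
		 * two names.
		 */
		kfree(q);
	}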
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Marco Elver <elver@google.com>
Cc: linux-mm@kvack.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 include/linux/compiler_types.h | 13 +++++--------
 include/linux/slab.h           | 12 ++++++------
 mm/slab_common.c               |  4 ++--
 3 files changed, 13 insertions(+), 16 deletions(-)
diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
index 4f2a819fd60a..f141a6f6b9f6 100644
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -271,15 +271,12 @@ struct ftrace_likely_data {
 
 /*
  * Any place that could be marked with the "alloc_size" attribute is also
- * a place to be marked with the "malloc" attribute. Do this as part of the
- * __alloc_size macro to avoid redundant attributes and to avoid missing a
- * __malloc marking.
+ * a place to be marked with the "malloc" attribute, except those that may
+ * be performing a _reallocation_, as that may alias the existing pointer.
+ * For these, use __realloc_size().
  */
-#ifdef __alloc_size__
-# define __alloc_size(x, ...)	__alloc_size__(x, ## __VA_ARGS__) __malloc
-#else
-# define __alloc_size(x, ...)	__malloc
-#endif
+#define __alloc_size(x, ...)	__alloc_size__(x, ## __VA_ARGS__) __malloc
+#define __realloc_size(x, ...)	__alloc_size__(x, ## __VA_ARGS__)
 
 #ifndef asm_volatile_goto
 #define asm_volatile_goto(x...) asm goto(x)
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 0fefdf528e0d..41bd036e7551 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -184,7 +184,7 @@ int kmem_cache_shrink(struct kmem_cache *s);
 /*
  * Common kmalloc functions provided by all allocators
  */
-void * __must_check krealloc(const void *objp, size_t new_size, gfp_t flags) __alloc_size(2);
+void * __must_check krealloc(const void *objp, size_t new_size, gfp_t flags) __realloc_size(2);
 void kfree(const void *objp);
 void kfree_sensitive(const void *objp);
 size_t __ksize(const void *objp);
@@ -647,10 +647,10 @@ static inline __alloc_size(1, 2) void *kmalloc_array(size_t n, size_t size, gfp_
  * @new_size: new size of a single member of the array
  * @flags: the type of memory to allocate (see kmalloc)
  */
-static inline __alloc_size(2, 3) void * __must_check krealloc_array(void *p,
-								     size_t new_n,
-								     size_t new_size,
-								     gfp_t flags)
+static inline __realloc_size(2, 3) void * __must_check krealloc_array(void *p,
+								       size_t new_n,
+								       size_t new_size,
+								       gfp_t flags)
 {
 	size_t bytes;
@@ -774,7 +774,7 @@ static inline __alloc_size(1, 2) void *kvcalloc(size_t n, size_t size, gfp_t fla
 }
 
 extern void *kvrealloc(const void *p, size_t oldsize, size_t newsize, gfp_t flags)
-		      __alloc_size(3);
+		      __realloc_size(3);
 extern void kvfree(const void *addr);
 extern void kvfree_sensitive(const void *addr, size_t len);
 
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 17996649cfe3..457671ace7eb 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1134,8 +1134,8 @@ module_init(slab_proc_init);
 
 #endif /* CONFIG_SLAB || CONFIG_SLUB_DEBUG */
 
-static __always_inline void *__do_krealloc(const void *p, size_t new_size,
-					   gfp_t flags)
+static __always_inline __realloc_size(2) void *
+__do_krealloc(const void *p, size_t new_size, gfp_t flags)
 {
 	void *ret;
 	size_t ks;
Hi Kees,
On Fri, Sep 23, 2022 at 10:35 PM Kees Cook keescook@chromium.org wrote:
The __malloc attribute should not be applied to "realloc" functions, as the returned pointer may alias the storage of the prior pointer. Instead of splitting __malloc from __alloc_size, which would be a huge amount of churn, just create __realloc_size for the few cases where it is needed.
Additionally removes the conditional test for __alloc_size__, which is always defined now.
Cc: Christoph Lameter cl@linux.com Cc: Pekka Enberg penberg@kernel.org Cc: David Rientjes rientjes@google.com Cc: Joonsoo Kim iamjoonsoo.kim@lge.com Cc: Andrew Morton akpm@linux-foundation.org Cc: Vlastimil Babka vbabka@suse.cz Cc: Roman Gushchin roman.gushchin@linux.dev Cc: Hyeonggon Yoo 42.hyeyoo@gmail.com Cc: Marco Elver elver@google.com Cc: linux-mm@kvack.org Signed-off-by: Kees Cook keescook@chromium.org
Thanks for your patch, which is now commit 63caa04ec60583b1 ("slab: Remove __malloc attribute from realloc functions") in next-20220927.
Noreply@ellerman.id.au reported all gcc8-based builds to fail (e.g. [1], more at [2]):
In file included from <command-line>:
./include/linux/percpu.h: In function ‘__alloc_reserved_percpu’:
././include/linux/compiler_types.h:279:30: error: expected declaration specifiers before ‘__alloc_size__’
 #define __alloc_size(x, ...) __alloc_size__(x, ## __VA_ARGS__) __malloc
                              ^~~~~~~~~~~~~~
./include/linux/percpu.h:120:74: note: in expansion of macro ‘__alloc_size’
[...]
It's building fine with e.g. gcc-9 (which is my usual m68k cross-compiler). Reverting this commit on next-20220927 fixes the issue.
[1] http://kisskb.ellerman.id.au/kisskb/buildresult/14803908/ [2] http://kisskb.ellerman.id.au/kisskb/head/1bd8b75fe6adeaa89d02968bdd811ffe708...
[...]
-- Gr{oetje,eeting}s,
Geert
-- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
On 9/28/22 09:26, Geert Uytterhoeven wrote:
Hi Kees,
On Fri, Sep 23, 2022 at 10:35 PM Kees Cook keescook@chromium.org wrote:
The __malloc attribute should not be applied to "realloc" functions, as the returned pointer may alias the storage of the prior pointer. Instead of splitting __malloc from __alloc_size, which would be a huge amount of churn, just create __realloc_size for the few cases where it is needed.
Additionally removes the conditional test for __alloc_size__, which is always defined now.
Cc: Christoph Lameter cl@linux.com Cc: Pekka Enberg penberg@kernel.org Cc: David Rientjes rientjes@google.com Cc: Joonsoo Kim iamjoonsoo.kim@lge.com Cc: Andrew Morton akpm@linux-foundation.org Cc: Vlastimil Babka vbabka@suse.cz Cc: Roman Gushchin roman.gushchin@linux.dev Cc: Hyeonggon Yoo 42.hyeyoo@gmail.com Cc: Marco Elver elver@google.com Cc: linux-mm@kvack.org Signed-off-by: Kees Cook keescook@chromium.org
Thanks for your patch, which is now commit 63caa04ec60583b1 ("slab: Remove __malloc attribute from realloc functions") in next-20220927.
Noreply@ellerman.id.au reported all gcc8-based builds to fail (e.g. [1], more at [2]):
In file included from <command-line>:
./include/linux/percpu.h: In function ‘__alloc_reserved_percpu’:
././include/linux/compiler_types.h:279:30: error: expected declaration specifiers before ‘__alloc_size__’
 #define __alloc_size(x, ...) __alloc_size__(x, ## __VA_ARGS__) __malloc
                              ^~~~~~~~~~~~~~
./include/linux/percpu.h:120:74: note: in expansion of macro ‘__alloc_size’
[...]
It's building fine with e.g. gcc-9 (which is my usual m68k cross-compiler). Reverting this commit on next-20220927 fixes the issue.
So IIUC it was wrong to remove the #ifdefs?
[...]
On Wed, Sep 28, 2022 at 09:26:15AM +0200, Geert Uytterhoeven wrote:
Hi Kees,
On Fri, Sep 23, 2022 at 10:35 PM Kees Cook keescook@chromium.org wrote:
The __malloc attribute should not be applied to "realloc" functions, as the returned pointer may alias the storage of the prior pointer. Instead of splitting __malloc from __alloc_size, which would be a huge amount of churn, just create __realloc_size for the few cases where it is needed.
Additionally removes the conditional test for __alloc_size__, which is always defined now.
Cc: Christoph Lameter cl@linux.com Cc: Pekka Enberg penberg@kernel.org Cc: David Rientjes rientjes@google.com Cc: Joonsoo Kim iamjoonsoo.kim@lge.com Cc: Andrew Morton akpm@linux-foundation.org Cc: Vlastimil Babka vbabka@suse.cz Cc: Roman Gushchin roman.gushchin@linux.dev Cc: Hyeonggon Yoo 42.hyeyoo@gmail.com Cc: Marco Elver elver@google.com Cc: linux-mm@kvack.org Signed-off-by: Kees Cook keescook@chromium.org
Thanks for your patch, which is now commit 63caa04ec60583b1 ("slab: Remove __malloc attribute from realloc functions") in next-20220927.
Noreply@ellerman.id.au reported all gcc8-based builds to fail (e.g. [1], more at [2]):
In file included from <command-line>:
./include/linux/percpu.h: In function ‘__alloc_reserved_percpu’:
././include/linux/compiler_types.h:279:30: error: expected declaration specifiers before ‘__alloc_size__’
 #define __alloc_size(x, ...) __alloc_size__(x, ## __VA_ARGS__) __malloc
                              ^~~~~~~~~~~~~~
./include/linux/percpu.h:120:74: note: in expansion of macro ‘__alloc_size’
[...]
It's building fine with e.g. gcc-9 (which is my usual m68k cross-compiler). Reverting this commit on next-20220927 fixes the issue.
[1] http://kisskb.ellerman.id.au/kisskb/buildresult/14803908/ [2] http://kisskb.ellerman.id.au/kisskb/head/1bd8b75fe6adeaa89d02968bdd811ffe708...
Eek! Thanks for letting me know. I'm confused about this -- __alloc_size__ wasn't optional in compiler_attributes.h -- but obviously I broke something! I'll go figure this out.
-Kees
On 9/28/22 19:13, Kees Cook wrote:
On Wed, Sep 28, 2022 at 09:26:15AM +0200, Geert Uytterhoeven wrote:
Hi Kees,
On Fri, Sep 23, 2022 at 10:35 PM Kees Cook keescook@chromium.org wrote:
The __malloc attribute should not be applied to "realloc" functions, as the returned pointer may alias the storage of the prior pointer. Instead of splitting __malloc from __alloc_size, which would be a huge amount of churn, just create __realloc_size for the few cases where it is needed.
Additionally removes the conditional test for __alloc_size__, which is always defined now.
Cc: Christoph Lameter cl@linux.com Cc: Pekka Enberg penberg@kernel.org Cc: David Rientjes rientjes@google.com Cc: Joonsoo Kim iamjoonsoo.kim@lge.com Cc: Andrew Morton akpm@linux-foundation.org Cc: Vlastimil Babka vbabka@suse.cz Cc: Roman Gushchin roman.gushchin@linux.dev Cc: Hyeonggon Yoo 42.hyeyoo@gmail.com Cc: Marco Elver elver@google.com Cc: linux-mm@kvack.org Signed-off-by: Kees Cook keescook@chromium.org
Thanks for your patch, which is now commit 63caa04ec60583b1 ("slab: Remove __malloc attribute from realloc functions") in next-20220927.
Noreply@ellerman.id.au reported all gcc8-based builds to fail (e.g. [1], more at [2]):
In file included from <command-line>:
./include/linux/percpu.h: In function ‘__alloc_reserved_percpu’:
././include/linux/compiler_types.h:279:30: error: expected declaration specifiers before ‘__alloc_size__’
 #define __alloc_size(x, ...) __alloc_size__(x, ## __VA_ARGS__) __malloc
                              ^~~~~~~~~~~~~~
./include/linux/percpu.h:120:74: note: in expansion of macro ‘__alloc_size’
[...]
It's building fine with e.g. gcc-9 (which is my usual m68k cross-compiler). Reverting this commit on next-20220927 fixes the issue.
[1] http://kisskb.ellerman.id.au/kisskb/buildresult/14803908/ [2] http://kisskb.ellerman.id.au/kisskb/head/1bd8b75fe6adeaa89d02968bdd811ffe708...
Eek! Thanks for letting me know. I'm confused about this -- __alloc_size__ wasn't optional in compiler_attributes.h -- but obviously I broke something! I'll go figure this out.
Even in latest next I can see at the end of include/linux/compiler-gcc.h
/*
 * Prior to 9.1, -Wno-alloc-size-larger-than (and therefore the "alloc_size"
 * attribute) do not work, and must be disabled.
 */
#if GCC_VERSION < 90100
#undef __alloc_size__
#endif
Kees Cook keescook@chromium.org writes:
On Wed, Sep 28, 2022 at 09:26:15AM +0200, Geert Uytterhoeven wrote:
On Fri, Sep 23, 2022 at 10:35 PM Kees Cook keescook@chromium.org wrote:
The __malloc attribute should not be applied to "realloc" functions, as the returned pointer may alias the storage of the prior pointer. Instead of splitting __malloc from __alloc_size, which would be a huge amount of churn, just create __realloc_size for the few cases where it is needed.
Additionally removes the conditional test for __alloc_size__, which is always defined now.
Cc: Christoph Lameter cl@linux.com Cc: Pekka Enberg penberg@kernel.org Cc: David Rientjes rientjes@google.com Cc: Joonsoo Kim iamjoonsoo.kim@lge.com Cc: Andrew Morton akpm@linux-foundation.org Cc: Vlastimil Babka vbabka@suse.cz Cc: Roman Gushchin roman.gushchin@linux.dev Cc: Hyeonggon Yoo 42.hyeyoo@gmail.com Cc: Marco Elver elver@google.com Cc: linux-mm@kvack.org Signed-off-by: Kees Cook keescook@chromium.org
Thanks for your patch, which is now commit 63caa04ec60583b1 ("slab: Remove __malloc attribute from realloc functions") in next-20220927.
Noreply@ellerman.id.au reported all gcc8-based builds to fail (e.g. [1], more at [2]):
In file included from <command-line>:
./include/linux/percpu.h: In function ‘__alloc_reserved_percpu’:
././include/linux/compiler_types.h:279:30: error: expected declaration specifiers before ‘__alloc_size__’
 #define __alloc_size(x, ...) __alloc_size__(x, ## __VA_ARGS__) __malloc
                              ^~~~~~~~~~~~~~
./include/linux/percpu.h:120:74: note: in expansion of macro ‘__alloc_size’
[...]
It's building fine with e.g. gcc-9 (which is my usual m68k cross-compiler). Reverting this commit on next-20220927 fixes the issue.
[1] http://kisskb.ellerman.id.au/kisskb/buildresult/14803908/ [2] http://kisskb.ellerman.id.au/kisskb/head/1bd8b75fe6adeaa89d02968bdd811ffe708...
Eek! Thanks for letting me know. I'm confused about this -- __alloc_size__ wasn't optional in compiler_attributes.h -- but obviously I broke something! I'll go figure this out.
This fixes it for me.
cheers
diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
index f141a6f6b9f6..0717534f8364 100644
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -275,8 +275,13 @@ struct ftrace_likely_data {
  * be performing a _reallocation_, as that may alias the existing pointer.
  * For these, use __realloc_size().
  */
-#define __alloc_size(x, ...)	__alloc_size__(x, ## __VA_ARGS__) __malloc
-#define __realloc_size(x, ...)	__alloc_size__(x, ## __VA_ARGS__)
+#ifdef __alloc_size__
+# define __alloc_size(x, ...)	__alloc_size__(x, ## __VA_ARGS__) __malloc
+# define __realloc_size(x, ...)	__alloc_size__(x, ## __VA_ARGS__)
+#else
+# define __alloc_size(x, ...)	__malloc
+# define __realloc_size(x, ...)
+#endif
 
 #ifndef asm_volatile_goto
 #define asm_volatile_goto(x...) asm goto(x)
Hi Michael,
On Thu, Sep 29, 2022 at 10:36 AM Michael Ellerman mpe@ellerman.id.au wrote:
Kees Cook keescook@chromium.org writes:
On Wed, Sep 28, 2022 at 09:26:15AM +0200, Geert Uytterhoeven wrote:
On Fri, Sep 23, 2022 at 10:35 PM Kees Cook keescook@chromium.org wrote:
The __malloc attribute should not be applied to "realloc" functions, as the returned pointer may alias the storage of the prior pointer. Instead of splitting __malloc from __alloc_size, which would be a huge amount of churn, just create __realloc_size for the few cases where it is needed.
Additionally removes the conditional test for __alloc_size__, which is always defined now.
Cc: Christoph Lameter cl@linux.com Cc: Pekka Enberg penberg@kernel.org Cc: David Rientjes rientjes@google.com Cc: Joonsoo Kim iamjoonsoo.kim@lge.com Cc: Andrew Morton akpm@linux-foundation.org Cc: Vlastimil Babka vbabka@suse.cz Cc: Roman Gushchin roman.gushchin@linux.dev Cc: Hyeonggon Yoo 42.hyeyoo@gmail.com Cc: Marco Elver elver@google.com Cc: linux-mm@kvack.org Signed-off-by: Kees Cook keescook@chromium.org
Thanks for your patch, which is now commit 63caa04ec60583b1 ("slab: Remove __malloc attribute from realloc functions") in next-20220927.
Noreply@ellerman.id.au reported all gcc8-based builds to fail (e.g. [1], more at [2]):
In file included from <command-line>:
./include/linux/percpu.h: In function ‘__alloc_reserved_percpu’:
././include/linux/compiler_types.h:279:30: error: expected declaration specifiers before ‘__alloc_size__’
 #define __alloc_size(x, ...) __alloc_size__(x, ## __VA_ARGS__) __malloc
                              ^~~~~~~~~~~~~~
./include/linux/percpu.h:120:74: note: in expansion of macro ‘__alloc_size’
[...]
It's building fine with e.g. gcc-9 (which is my usual m68k cross-compiler). Reverting this commit on next-20220927 fixes the issue.
[1] http://kisskb.ellerman.id.au/kisskb/buildresult/14803908/ [2] http://kisskb.ellerman.id.au/kisskb/head/1bd8b75fe6adeaa89d02968bdd811ffe708...
Eek! Thanks for letting me know. I'm confused about this -- __alloc_size__ wasn't optional in compiler_attributes.h -- but obviously I broke something! I'll go figure this out.
This fixes it for me.
Kees submitted a similar patch 20 minutes before: https://lore.kernel.org/all/20220929081642.1932200-1-keescook@chromium.org
[...]
Gr{oetje,eeting}s,
Geert
-- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
On Fri, Sep 23, 2022 at 01:28:07PM -0700, Kees Cook wrote:
The __malloc attribute should not be applied to "realloc" functions, as the returned pointer may alias the storage of the prior pointer. Instead of splitting __malloc from __alloc_size, which would be a huge amount of churn, just create __realloc_size for the few cases where it is needed.
Additionally removes the conditional test for __alloc_size__, which is always defined now.
Cc: Christoph Lameter cl@linux.com Cc: Pekka Enberg penberg@kernel.org Cc: David Rientjes rientjes@google.com Cc: Joonsoo Kim iamjoonsoo.kim@lge.com Cc: Andrew Morton akpm@linux-foundation.org Cc: Vlastimil Babka vbabka@suse.cz Cc: Roman Gushchin roman.gushchin@linux.dev Cc: Hyeonggon Yoo 42.hyeyoo@gmail.com Cc: Marco Elver elver@google.com Cc: linux-mm@kvack.org Signed-off-by: Kees Cook keescook@chromium.org
[...]
This is now squashed with the later fix (so the undefined __alloc_size__ issue is resolved). For the latest version of this patch:
Looks good to me, Acked-by: Hyeonggon Yoo 42.hyeyoo@gmail.com
In the effort to help the compiler reason about buffer sizes, the __alloc_size attribute was added to allocators. This improves the scope of the compiler's ability to apply CONFIG_UBSAN_BOUNDS and (in the near future) CONFIG_FORTIFY_SOURCE. For most allocations, this works well, as the vast majority of callers are not expecting to use more memory than what they asked for.
There is, however, one common exception to this: anticipatory resizing of kmalloc allocations. These cases all use ksize() to determine the actual bucket size of a given allocation (e.g. 128 when 126 was asked for). This comes in two styles in the kernel:
1) An allocation has been determined to be too small, and needs to be resized. Instead of the caller choosing its own next best size, it wants to minimize the number of calls to krealloc(), so it just uses ksize() plus some additional bytes, forcing the realloc into the next bucket size, from which it can learn how large it is now. For example:
	data = krealloc(data, ksize(data) + 1, gfp);
	data_len = ksize(data);
2) The minimum size of an allocation is calculated, but since it may grow in the future, just use all the space available in the chosen bucket immediately, to avoid needing to reallocate later. A good example of this is skbuff's allocators:
	data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc);
	...
	/* kmalloc(size) might give us more room than requested.
	 * Put skb_shared_info exactly at the end of allocated zone,
	 * to allow max possible filling before reallocation.
	 */
	osize = ksize(data);
	size = SKB_WITH_OVERHEAD(osize);
In both cases, the "how much was actually allocated?" question is answered _after_ the allocation, where the compiler hinting is not in an easy place to make the association any more. This mismatch between the compiler's view of the buffer length and the code's intention about how much it is going to actually use has already caused problems[1]. It is possible to fix this by reordering the use of the "actual size" information.
We can serve the needs of users of ksize() and still have accurate buffer length hinting for the compiler by doing the bucket size calculation _before_ the allocation. Code can instead ask "how large an allocation would I get for a given size?".
Introduce kmalloc_size_roundup(), to serve this function so we can start replacing the "anticipatory resizing" uses of ksize().
[1] https://github.com/ClangBuiltLinux/linux/issues/1599 https://github.com/KSPP/linux/issues/183
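To sketch the intended change in calling pattern (an illustrative fragment in the style of the examples above, not lifted from a later patch):

	/* Old pattern: allocate, then discover the bucket size after the fact. */
	data = kmalloc(size, gfp);
	alloc_size = ksize(data);

	/* New pattern: round up first, so the __alloc_size() hint on kmalloc()
	 * covers everything the caller will actually use.
	 */
	alloc_size = kmalloc_size_roundup(size);
	data = kmalloc(alloc_size, gfp);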
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 include/linux/slab.h | 31 +++++++++++++++++++++++++++++++
 mm/slab.c            |  9 ++++++---
 mm/slab_common.c     | 20 ++++++++++++++++++++
 3 files changed, 57 insertions(+), 3 deletions(-)
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 41bd036e7551..727640173568 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -188,7 +188,21 @@ void * __must_check krealloc(const void *objp, size_t new_size, gfp_t flags) __r
 void kfree(const void *objp);
 void kfree_sensitive(const void *objp);
 size_t __ksize(const void *objp);
+
+/**
+ * ksize - Report actual allocation size of associated object
+ *
+ * @objp: Pointer returned from a prior kmalloc()-family allocation.
+ *
+ * This should not be used for writing beyond the originally requested
+ * allocation size. Either use krealloc() or round up the allocation size
+ * with kmalloc_size_roundup() prior to allocation. If this is used to
+ * access beyond the originally requested allocation size, UBSAN_BOUNDS
+ * and/or FORTIFY_SOURCE may trip, since they only know about the
+ * originally allocated size via the __alloc_size attribute.
+ */
 size_t ksize(const void *objp);
+
 #ifdef CONFIG_PRINTK
 bool kmem_valid_obj(void *object);
 void kmem_dump_obj(void *object);
@@ -779,6 +793,23 @@ extern void kvfree(const void *addr);
 extern void kvfree_sensitive(const void *addr, size_t len);
 
 unsigned int kmem_cache_size(struct kmem_cache *s);
+
+/**
+ * kmalloc_size_roundup - Report allocation bucket size for the given size
+ *
+ * @size: Number of bytes to round up from.
+ *
+ * This returns the number of bytes that would be available in a kmalloc()
+ * allocation of @size bytes. For example, a 126 byte request would be
+ * rounded up to the next sized kmalloc bucket, 128 bytes. (This is strictly
+ * for the general-purpose kmalloc()-based allocations, and is not for the
+ * pre-sized kmem_cache_alloc()-based allocations.)
+ *
+ * Use this to kmalloc() the full bucket size ahead of time instead of using
+ * ksize() to query the size after an allocation.
+ */
+size_t kmalloc_size_roundup(size_t size);
+
 void __init kmem_cache_init_late(void);
 
 #if defined(CONFIG_SMP) && defined(CONFIG_SLAB)
diff --git a/mm/slab.c b/mm/slab.c
index 10e96137b44f..2da862bf6226 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -4192,11 +4192,14 @@ void __check_heap_object(const void *ptr, unsigned long n,
 #endif /* CONFIG_HARDENED_USERCOPY */
 
 /**
- * __ksize -- Uninstrumented ksize.
+ * __ksize -- Report full size of underlying allocation
  * @objp: pointer to the object
  *
- * Unlike ksize(), __ksize() is uninstrumented, and does not provide the same
- * safety checks as ksize() with KASAN instrumentation enabled.
+ * This should only be used internally to query the true size of allocations.
+ * It is not meant to be a way to discover the usable size of an allocation
+ * after the fact. Instead, use kmalloc_size_roundup(). Using memory beyond
+ * the originally requested allocation size may trigger KASAN, UBSAN_BOUNDS,
+ * and/or FORTIFY_SOURCE.
  *
  * Return: size of the actual memory used by @objp in bytes
  */
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 457671ace7eb..d7420cf649f8 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -721,6 +721,26 @@ struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags)
 	return kmalloc_caches[kmalloc_type(flags)][index];
 }
 
+size_t kmalloc_size_roundup(size_t size)
+{
+	struct kmem_cache *c;
+
+	/* Short-circuit the 0 size case. */
+	if (unlikely(size == 0))
+		return 0;
+	/* Short-circuit saturated "too-large" case. */
+	if (unlikely(size == SIZE_MAX))
+		return SIZE_MAX;
+	/* Above the smaller buckets, size is a multiple of page size. */
+	if (size > KMALLOC_MAX_CACHE_SIZE)
+		return PAGE_SIZE << get_order(size);
+
+	/* The flags don't matter since size_index is common to all. */
+	c = kmalloc_slab(size, GFP_KERNEL);
+	return c ? c->object_size : 0;
+}
+EXPORT_SYMBOL(kmalloc_size_roundup);
+
 #ifdef CONFIG_ZONE_DMA
 #define KMALLOC_DMA_NAME(sz)	.name[KMALLOC_DMA] = "dma-kmalloc-" #sz,
 #else
On 9/23/22 22:28, Kees Cook wrote:
[...]
OK, added patch 1+2 to slab.git for-next branch. Had to adjust this one a bit, see below.
[...]
We need a SLOB version too as it's not yet removed... I added this:
diff --git a/mm/slob.c b/mm/slob.c
index 2bd4f476c340..5dbdf6ad8bcc 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -574,6 +574,20 @@ void kfree(const void *block)
 }
 EXPORT_SYMBOL(kfree);
 
+size_t kmalloc_size_roundup(size_t size)
+{
+	/* Short-circuit the 0 size case. */
+	if (unlikely(size == 0))
+		return 0;
+	/* Short-circuit saturated "too-large" case. */
+	if (unlikely(size == SIZE_MAX))
+		return SIZE_MAX;
+
+	return ALIGN(size, ARCH_KMALLOC_MINALIGN);
+}
+
+EXPORT_SYMBOL(kmalloc_size_roundup);
+
 /* can't use ksize for kmem_cache_alloc memory, only kmalloc */
 size_t __ksize(const void *block)
 {
On Mon, Sep 26, 2022 at 03:15:22PM +0200, Vlastimil Babka wrote:
On 9/23/22 22:28, Kees Cook wrote:
[...]
OK, added patch 1+2 to slab.git for-next branch. Had to adjust this one a bit, see below.
[...]
We need a SLOB version too as it's not yet removed... I added this:
diff --git a/mm/slob.c b/mm/slob.c index 2bd4f476c340..5dbdf6ad8bcc 100644 --- a/mm/slob.c +++ b/mm/slob.c @@ -574,6 +574,20 @@ void kfree(const void *block) } EXPORT_SYMBOL(kfree); +size_t kmalloc_size_roundup(size_t size) +{
/* Short-circuit the 0 size case. */
if (unlikely(size == 0))
return 0;
/* Short-circuit saturated "too-large" case. */
if (unlikely(size == SIZE_MAX))
return SIZE_MAX;
return ALIGN(size, ARCH_KMALLOC_MINALIGN);
+}
+EXPORT_SYMBOL(kmalloc_size_roundup);
Ah, perfect! Thanks for catching that. :)
FWIW:
Reviewed-by: Kees Cook keescook@chromium.org
On Fri, Sep 23, 2022 at 01:28:08PM -0700, Kees Cook wrote:
[...]
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 41bd036e7551..727640173568 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -188,7 +188,21 @@ void * __must_check krealloc(const void *objp, size_t new_size, gfp_t flags) __r
 void kfree(const void *objp);
 void kfree_sensitive(const void *objp);
 size_t __ksize(const void *objp);
+
+/**
+ * ksize - Report actual allocation size of associated object
+ *
+ * @objp: Pointer returned from a prior kmalloc()-family allocation.
+ *
+ * This should not be used for writing beyond the originally requested
+ * allocation size. Either use krealloc() or round up the allocation size
+ * with kmalloc_size_roundup() prior to allocation. If this is used to
+ * access beyond the originally requested allocation size, UBSAN_BOUNDS
+ * and/or FORTIFY_SOURCE may trip, since they only know about the
+ * originally allocated size via the __alloc_size attribute.
+ */
 size_t ksize(const void *objp);
With this, we now have two conflicting kernel-doc comments for ksize(): one in mm/slab_common.c and one in include/linux/slab.h.
#ifdef CONFIG_PRINTK bool kmem_valid_obj(void *object); void kmem_dump_obj(void *object); @@ -779,6 +793,23 @@ extern void kvfree(const void *addr); extern void kvfree_sensitive(const void *addr, size_t len); unsigned int kmem_cache_size(struct kmem_cache *s);
+/**
+ * kmalloc_size_roundup - Report allocation bucket size for the given size
+ * @size: Number of bytes to round up from.
+ *
+ * This returns the number of bytes that would be available in a kmalloc()
+ * allocation of @size bytes. For example, a 126 byte request would be
+ * rounded up to the next sized kmalloc bucket, 128 bytes. (This is strictly
+ * for the general-purpose kmalloc()-based allocations, and is not for the
+ * pre-sized kmem_cache_alloc()-based allocations.)
+ *
+ * Use this to kmalloc() the full bucket size ahead of time instead of using
+ * ksize() to query the size after an allocation.
+ */
+size_t kmalloc_size_roundup(size_t size);
void __init kmem_cache_init_late(void); #if defined(CONFIG_SMP) && defined(CONFIG_SLAB) diff --git a/mm/slab.c b/mm/slab.c index 10e96137b44f..2da862bf6226 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -4192,11 +4192,14 @@ void __check_heap_object(const void *ptr, unsigned long n, #endif /* CONFIG_HARDENED_USERCOPY */ /**
- * __ksize -- Uninstrumented ksize.
+ * __ksize -- Report full size of underlying allocation
  * @objp: pointer to the object
  *
- * Unlike ksize(), __ksize() is uninstrumented, and does not provide the same
- * safety checks as ksize() with KASAN instrumentation enabled.
+ * This should only be used internally to query the true size of allocations.
+ * It is not meant to be a way to discover the usable size of an allocation
+ * after the fact. Instead, use kmalloc_size_roundup(). Using memory beyond
+ * the originally requested allocation size may trigger KASAN, UBSAN_BOUNDS,
+ * and/or FORTIFY_SOURCE.
  *
  * Return: size of the actual memory used by @objp in bytes
  */
diff --git a/mm/slab_common.c b/mm/slab_common.c index 457671ace7eb..d7420cf649f8 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -721,6 +721,26 @@ struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags) return kmalloc_caches[kmalloc_type(flags)][index]; } +size_t kmalloc_size_roundup(size_t size) +{
+	struct kmem_cache *c;
+
+	/* Short-circuit the 0 size case. */
+	if (unlikely(size == 0))
+		return 0;
+	/* Short-circuit saturated "too-large" case. */
+	if (unlikely(size == SIZE_MAX))
+		return SIZE_MAX;
+	/* Above the smaller buckets, size is a multiple of page size. */
+	if (size > KMALLOC_MAX_CACHE_SIZE)
+		return PAGE_SIZE << get_order(size);
+
+	/* The flags don't matter since size_index is common to all. */
+	c = kmalloc_slab(size, GFP_KERNEL);
+	return c ? c->object_size : 0;
+}
+EXPORT_SYMBOL(kmalloc_size_roundup);
#ifdef CONFIG_ZONE_DMA #define KMALLOC_DMA_NAME(sz) .name[KMALLOC_DMA] = "dma-kmalloc-" #sz,
#else
2.34.1
Otherwise looks good!
Instead of discovering the kmalloc bucket size _after_ allocation, round up proactively so the allocation is explicitly made for the full size, allowing the compiler to correctly reason about the resulting size of the buffer through the existing __alloc_size() hint.
This will allow for kernels built with CONFIG_UBSAN_BOUNDS or the coming dynamic bounds checking under CONFIG_FORTIFY_SOURCE to gain back the __alloc_size() hints that were temporarily reverted in commit 93dd04ab0b2b ("slab: remove __alloc_size attribute from __kmalloc_track_caller")
Additionally, this tries to normalize the size variables from int to u32. Most of these interfaces use "int", but notably __alloc_skb uses unsigned int.

Also fix some reverse Christmas tree ordering and comments while touching nearby code.
Cc: "David S. Miller" davem@davemloft.net Cc: Eric Dumazet edumazet@google.com Cc: Jakub Kicinski kuba@kernel.org Cc: Paolo Abeni pabeni@redhat.com Cc: netdev@vger.kernel.org Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Nick Desaulniers ndesaulniers@google.com Cc: David Rientjes rientjes@google.com Cc: Vlastimil Babka vbabka@suse.cz Signed-off-by: Kees Cook keescook@chromium.org --- include/linux/skbuff.h | 5 +--- net/core/skbuff.c | 64 +++++++++++++++++++++--------------------- 2 files changed, 33 insertions(+), 36 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index ca8afa382bf2..5a16177f38b5 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1234,7 +1234,7 @@ void kfree_skb_partial(struct sk_buff *skb, bool head_stolen); bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from, bool *fragstolen, int *delta_truesize);
-struct sk_buff *__alloc_skb(unsigned int size, gfp_t priority, int flags, +struct sk_buff *__alloc_skb(unsigned int bytes, gfp_t priority, int flags, int node); struct sk_buff *__build_skb(void *data, unsigned int frag_size); struct sk_buff *build_skb(void *data, unsigned int frag_size); @@ -1870,9 +1870,6 @@ static inline int skb_unclone(struct sk_buff *skb, gfp_t pri)
/* This variant of skb_unclone() makes sure skb->truesize * and skb_end_offset() are not changed, whenever a new skb->head is needed. - * - * Indeed there is no guarantee that ksize(kmalloc(X)) == ksize(kmalloc(X)) - * when various debugging features are in place. */ int __skb_unclone_keeptruesize(struct sk_buff *skb, gfp_t pri); static inline int skb_unclone_keeptruesize(struct sk_buff *skb, gfp_t pri) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 974bbbbe7138..0b30fbdbd0d0 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -343,19 +343,23 @@ EXPORT_SYMBOL(napi_build_skb); * the caller if emergency pfmemalloc reserves are being used. If it is and * the socket is later found to be SOCK_MEMALLOC then PFMEMALLOC reserves * may be used. Otherwise, the packet data may be discarded until enough - * memory is free + * memory is free. */ -static void *kmalloc_reserve(size_t size, gfp_t flags, int node, +static void *kmalloc_reserve(u32 *size, gfp_t flags, int node, bool *pfmemalloc) { void *obj; bool ret_pfmemalloc = false;
+ /* kmalloc(size) might give us more room than requested, so + * allocate the true bucket size up front. + */ + *size = kmalloc_size_roundup(*size); /* * Try a regular allocation, when that fails and we're not entitled * to the reserves, fail. */ - obj = kmalloc_node_track_caller(size, + obj = kmalloc_node_track_caller(*size, flags | __GFP_NOMEMALLOC | __GFP_NOWARN, node); if (obj || !(gfp_pfmemalloc_allowed(flags))) @@ -363,7 +367,7 @@ static void *kmalloc_reserve(size_t size, gfp_t flags, int node,
/* Try again but now we are using pfmemalloc reserves */ ret_pfmemalloc = true; - obj = kmalloc_node_track_caller(size, flags, node); + obj = kmalloc_node_track_caller(*size, flags, node);
out: if (pfmemalloc) @@ -380,7 +384,7 @@ static void *kmalloc_reserve(size_t size, gfp_t flags, int node,
/** * __alloc_skb - allocate a network buffer - * @size: size to allocate + * @bytes: minimum bytes to allocate * @gfp_mask: allocation mask * @flags: If SKB_ALLOC_FCLONE is set, allocate from fclone cache * instead of head cache and allocate a cloned (child) skb. @@ -395,12 +399,12 @@ static void *kmalloc_reserve(size_t size, gfp_t flags, int node, * Buffers may only be allocated from interrupts using a @gfp_mask of * %GFP_ATOMIC. */ -struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask, +struct sk_buff *__alloc_skb(unsigned int bytes, gfp_t gfp_mask, int flags, int node) { struct kmem_cache *cache; struct sk_buff *skb; - unsigned int osize; + u32 size = bytes; bool pfmemalloc; u8 *data;
@@ -427,15 +431,13 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask, */ size = SKB_DATA_ALIGN(size); size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); - data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc); - if (unlikely(!data)) - goto nodata; - /* kmalloc(size) might give us more room than requested. - * Put skb_shared_info exactly at the end of allocated zone, + /* Put skb_shared_info exactly at the end of allocated zone, * to allow max possible filling before reallocation. */ - osize = ksize(data); - size = SKB_WITH_OVERHEAD(osize); + data = kmalloc_reserve(&size, gfp_mask, node, &pfmemalloc); + if (unlikely(!data)) + goto nodata; + size = SKB_WITH_OVERHEAD(size); prefetchw(data + size);
/* @@ -444,7 +446,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask, * the tail pointer in struct sk_buff! */ memset(skb, 0, offsetof(struct sk_buff, tail)); - __build_skb_around(skb, data, osize); + __build_skb_around(skb, data, size); skb->pfmemalloc = pfmemalloc;
if (flags & SKB_ALLOC_FCLONE) { @@ -1708,7 +1710,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail, gfp_t gfp_mask) { int i, osize = skb_end_offset(skb); - int size = osize + nhead + ntail; + u32 size = osize + nhead + ntail; long off; u8 *data;
@@ -1722,11 +1724,11 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
if (skb_pfmemalloc(skb)) gfp_mask |= __GFP_MEMALLOC; - data = kmalloc_reserve(size + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)), - gfp_mask, NUMA_NO_NODE, NULL); + size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); + data = kmalloc_reserve(&size, gfp_mask, NUMA_NO_NODE, NULL); if (!data) goto nodata; - size = SKB_WITH_OVERHEAD(ksize(data)); + size = SKB_WITH_OVERHEAD(size);
/* Copy only real data... and, alas, header. This should be * optimized for the cases when header is void. @@ -6060,22 +6062,21 @@ EXPORT_SYMBOL(alloc_skb_with_frags); static int pskb_carve_inside_header(struct sk_buff *skb, const u32 off, const int headlen, gfp_t gfp_mask) { - int i; - int size = skb_end_offset(skb); + u32 size = skb_end_offset(skb); int new_hlen = headlen - off; u8 *data; + int i;
size = SKB_DATA_ALIGN(size);
if (skb_pfmemalloc(skb)) gfp_mask |= __GFP_MEMALLOC; - data = kmalloc_reserve(size + - SKB_DATA_ALIGN(sizeof(struct skb_shared_info)), - gfp_mask, NUMA_NO_NODE, NULL); + size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); + data = kmalloc_reserve(&size, gfp_mask, NUMA_NO_NODE, NULL); if (!data) return -ENOMEM;
- size = SKB_WITH_OVERHEAD(ksize(data)); + size = SKB_WITH_OVERHEAD(size);
/* Copy real data, and all frags */ skb_copy_from_linear_data_offset(skb, off, data, new_hlen); @@ -6179,23 +6180,22 @@ static int pskb_carve_frag_list(struct sk_buff *skb, static int pskb_carve_inside_nonlinear(struct sk_buff *skb, const u32 off, int pos, gfp_t gfp_mask) { - int i, k = 0; - int size = skb_end_offset(skb); - u8 *data; const int nfrags = skb_shinfo(skb)->nr_frags; struct skb_shared_info *shinfo; + u32 size = skb_end_offset(skb); + int i, k = 0; + u8 *data;
size = SKB_DATA_ALIGN(size);
if (skb_pfmemalloc(skb)) gfp_mask |= __GFP_MEMALLOC; - data = kmalloc_reserve(size + - SKB_DATA_ALIGN(sizeof(struct skb_shared_info)), - gfp_mask, NUMA_NO_NODE, NULL); + size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); + data = kmalloc_reserve(&size, gfp_mask, NUMA_NO_NODE, NULL); if (!data) return -ENOMEM;
- size = SKB_WITH_OVERHEAD(ksize(data)); + size = SKB_WITH_OVERHEAD(size);
memcpy((struct skb_shared_info *)(data + size), skb_shinfo(skb), offsetof(struct skb_shared_info, frags[0]));
On Fri, Sep 23, 2022 at 01:28:09PM -0700, Kees Cook wrote:
Instead of discovering the kmalloc bucket size _after_ allocation, round up proactively so the allocation is explicitly made for the full size, allowing the compiler to correctly reason about the resulting size of the buffer through the existing __alloc_size() hint.
This will allow for kernels built with CONFIG_UBSAN_BOUNDS or the coming dynamic bounds checking under CONFIG_FORTIFY_SOURCE to gain back the __alloc_size() hints that were temporarily reverted in commit 93dd04ab0b2b ("slab: remove __alloc_size attribute from __kmalloc_track_caller")
Additionally, this tries to normalize the size variables from int to u32. Most of these interfaces use "int", but notably __alloc_skb uses unsigned int.

Also fix some reverse Christmas tree ordering and comments while touching nearby code.
Something in this patch is breaking things -- I've refactored it again to avoid overwriting the incoming size argument, and instead add a dedicated outgoing size variable. Here's what will be v3 ...
--- net/core/skbuff.c | 41 ++++++++++++++++++++++------------------- 1 file changed, 22 insertions(+), 19 deletions(-)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 974bbbbe7138..9b5a9fb69d9d 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -346,11 +346,12 @@ EXPORT_SYMBOL(napi_build_skb); * memory is free */ static void *kmalloc_reserve(size_t size, gfp_t flags, int node, - bool *pfmemalloc) + bool *pfmemalloc, size_t *alloc_size) { void *obj; bool ret_pfmemalloc = false;
+ size = kmalloc_size_roundup(size); /* * Try a regular allocation, when that fails and we're not entitled * to the reserves, fail. @@ -369,6 +370,7 @@ static void *kmalloc_reserve(size_t size, gfp_t flags, int node, if (pfmemalloc) *pfmemalloc = ret_pfmemalloc;
+ *alloc_size = size; return obj; }
@@ -400,7 +402,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask, { struct kmem_cache *cache; struct sk_buff *skb; - unsigned int osize; + size_t alloc_size; bool pfmemalloc; u8 *data;
@@ -427,15 +429,15 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask, */ size = SKB_DATA_ALIGN(size); size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); - data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc); - if (unlikely(!data)) - goto nodata; - /* kmalloc(size) might give us more room than requested. + /* kmalloc(size) might give us more room than requested, so + * allocate the true bucket size up front. * Put skb_shared_info exactly at the end of allocated zone, * to allow max possible filling before reallocation. */ - osize = ksize(data); - size = SKB_WITH_OVERHEAD(osize); + data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc, &alloc_size); + if (unlikely(!data)) + goto nodata; + size = SKB_WITH_OVERHEAD(alloc_size); prefetchw(data + size);
/* @@ -444,7 +446,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask, * the tail pointer in struct sk_buff! */ memset(skb, 0, offsetof(struct sk_buff, tail)); - __build_skb_around(skb, data, osize); + __build_skb_around(skb, data, alloc_size); skb->pfmemalloc = pfmemalloc;
if (flags & SKB_ALLOC_FCLONE) { @@ -1709,6 +1711,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail, { int i, osize = skb_end_offset(skb); int size = osize + nhead + ntail; + size_t alloc_size; long off; u8 *data;
@@ -1723,10 +1726,10 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail, if (skb_pfmemalloc(skb)) gfp_mask |= __GFP_MEMALLOC; data = kmalloc_reserve(size + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)), - gfp_mask, NUMA_NO_NODE, NULL); + gfp_mask, NUMA_NO_NODE, NULL, &alloc_size); if (!data) goto nodata; - size = SKB_WITH_OVERHEAD(ksize(data)); + size = SKB_WITH_OVERHEAD(alloc_size);
/* Copy only real data... and, alas, header. This should be * optimized for the cases when header is void. @@ -6063,19 +6066,19 @@ static int pskb_carve_inside_header(struct sk_buff *skb, const u32 off, int i; int size = skb_end_offset(skb); int new_hlen = headlen - off; + size_t alloc_size; u8 *data;
size = SKB_DATA_ALIGN(size);
if (skb_pfmemalloc(skb)) gfp_mask |= __GFP_MEMALLOC; - data = kmalloc_reserve(size + - SKB_DATA_ALIGN(sizeof(struct skb_shared_info)), - gfp_mask, NUMA_NO_NODE, NULL); + data = kmalloc_reserve(size + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)), + gfp_mask, NUMA_NO_NODE, NULL, &alloc_size); if (!data) return -ENOMEM;
- size = SKB_WITH_OVERHEAD(ksize(data)); + size = SKB_WITH_OVERHEAD(alloc_size);
/* Copy real data, and all frags */ skb_copy_from_linear_data_offset(skb, off, data, new_hlen); @@ -6184,18 +6187,18 @@ static int pskb_carve_inside_nonlinear(struct sk_buff *skb, const u32 off, u8 *data; const int nfrags = skb_shinfo(skb)->nr_frags; struct skb_shared_info *shinfo; + size_t alloc_size;
size = SKB_DATA_ALIGN(size);
if (skb_pfmemalloc(skb)) gfp_mask |= __GFP_MEMALLOC; - data = kmalloc_reserve(size + - SKB_DATA_ALIGN(sizeof(struct skb_shared_info)), - gfp_mask, NUMA_NO_NODE, NULL); + data = kmalloc_reserve(size + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)), + gfp_mask, NUMA_NO_NODE, NULL, &alloc_size); if (!data) return -ENOMEM;
- size = SKB_WITH_OVERHEAD(ksize(data)); + size = SKB_WITH_OVERHEAD(alloc_size);
memcpy((struct skb_shared_info *)(data + size), skb_shinfo(skb), offsetof(struct skb_shared_info, frags[0]));
All callers of APIs that allowed a 0-sized frag_size appear to be passing actual size information already, so this use of ksize() can be removed. However, just in case something still depends on this behavior, issue a WARN and fall back to ksize() as before, which means we'll also potentially get KASAN warnings.
Cc: "David S. Miller" davem@davemloft.net Cc: Eric Dumazet edumazet@google.com Cc: Jakub Kicinski kuba@kernel.org Cc: Paolo Abeni pabeni@redhat.com Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook keescook@chromium.org --- net/core/skbuff.c | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 0b30fbdbd0d0..84ca89c781cd 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -195,7 +195,11 @@ static void __build_skb_around(struct sk_buff *skb, void *data, unsigned int frag_size) { struct skb_shared_info *shinfo; - unsigned int size = frag_size ? : ksize(data); + unsigned int size = frag_size; + + /* All callers should be setting frag size now? */ + if (WARN_ON_ONCE(size == 0)) + size = ksize(data);
size -= SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
@@ -220,12 +224,10 @@ static void __build_skb_around(struct sk_buff *skb, void *data, /** * __build_skb - build a network buffer * @data: data buffer provided by caller - * @frag_size: size of data, or 0 if head was kmalloced + * @frag_size: size of data * * Allocate a new &sk_buff. Caller provides space holding head and - * skb_shared_info. @data must have been allocated by kmalloc() only if - * @frag_size is 0, otherwise data should come from the page allocator - * or vmalloc() + * skb_shared_info. * The return is the new skb buffer. * On a failure the return is %NULL, and @data is not freed. * Notes : @@ -272,7 +274,7 @@ EXPORT_SYMBOL(build_skb); * build_skb_around - build a network buffer around provided skb * @skb: sk_buff provide by caller, must be memset cleared * @data: data buffer provided by caller - * @frag_size: size of data, or 0 if head was kmalloced + * @frag_size: size of data */ struct sk_buff *build_skb_around(struct sk_buff *skb, void *data, unsigned int frag_size) @@ -294,7 +296,7 @@ EXPORT_SYMBOL(build_skb_around); /** * __napi_build_skb - build a network buffer * @data: data buffer provided by caller - * @frag_size: size of data, or 0 if head was kmalloced + * @frag_size: size of data * * Version of __build_skb() that uses NAPI percpu caches to obtain * skbuff_head instead of inplace allocation. @@ -318,7 +320,7 @@ static struct sk_buff *__napi_build_skb(void *data, unsigned int frag_size) /** * napi_build_skb - build a network buffer * @data: data buffer provided by caller - * @frag_size: size of data, or 0 if head was kmalloced + * @frag_size: size of data * * Version of __napi_build_skb() that takes care of skb->head_frag * and skb->pfmemalloc when the data is a page or page fragment.
On Fri, 2022-09-23 at 13:28 -0700, Kees Cook wrote:
All callers of APIs that allowed a 0-sized frag_size appear to be passing actual size information already
AFAICS, not yet:
drivers/net/ethernet/qlogic/qed/qed_ll2.c: skb = build_skb(buffer->data, 0); // -> __build_skb(..., 0) // -> __build_skb_around()
drivers/net/ethernet/broadcom/bnx2.c: skb = build_skb(data, 0);
I guess some more drivers have calls leading to
__build_skb_around(..., 0)
there are several call paths to check...
, so this use of ksize() can be removed. However, just in case something still depends on this behavior, issue a WARN and fall back to ksize() as before, which means we'll also potentially get KASAN warnings.
Cc: "David S. Miller" davem@davemloft.net Cc: Eric Dumazet edumazet@google.com Cc: Jakub Kicinski kuba@kernel.org Cc: Paolo Abeni pabeni@redhat.com Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook keescook@chromium.org
net/core/skbuff.c | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 0b30fbdbd0d0..84ca89c781cd 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -195,7 +195,11 @@ static void __build_skb_around(struct sk_buff *skb, void *data, unsigned int frag_size) { struct skb_shared_info *shinfo;
-	unsigned int size = frag_size ? : ksize(data);
+	unsigned int size = frag_size;
+
+	/* All callers should be setting frag size now? */
+	if (WARN_ON_ONCE(size == 0))
+		size = ksize(data);
At some point in the future, I guess we could even drop this check, right?
Thanks!
Paolo
On Sun, Sep 25, 2022 at 09:17:40AM +0200, Paolo Abeni wrote:
On Fri, 2022-09-23 at 13:28 -0700, Kees Cook wrote:
All callers of APIs that allowed a 0-sized frag_size appear to be passing actual size information already
AFAICS, not yet:
drivers/net/ethernet/qlogic/qed/qed_ll2.c: skb = build_skb(buffer->data, 0); // -> __build_skb(..., 0) // -> __build_skb_around()
drivers/net/ethernet/broadcom/bnx2.c: skb = build_skb(data, 0);
I guess some more drivers have calls leading to
__build_skb_around(..., 0)
there are several call paths to check...
Ah-ha! Thank you. I will try to hunt these down -- I think we can't remove the "secret resizing" effect of ksize() without fixing these.
[...] diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 0b30fbdbd0d0..84ca89c781cd 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -195,7 +195,11 @@ static void __build_skb_around(struct sk_buff *skb, void *data, unsigned int frag_size) { struct skb_shared_info *shinfo;
-	unsigned int size = frag_size ? : ksize(data);
+	unsigned int size = frag_size;
+
+	/* All callers should be setting frag size now? */
+	if (WARN_ON_ONCE(size == 0))
+		size = ksize(data);
At some point in the future, I guess we could even drop this check, right?
Alternatively, we might be able to ask the slab if "data" came from kmalloc or a kmem_cache, and if the former, do:
data = krealloc(kmalloc_size_roundup(ksize(data), ...)
But that seems ugly...
Instead of discovering the kmalloc bucket size _after_ allocation, round up proactively so the allocation is explicitly made for the full size, allowing the compiler to correctly reason about the resulting size of the buffer through the existing __alloc_size() hint.
Cc: "David S. Miller" davem@davemloft.net Cc: Eric Dumazet edumazet@google.com Cc: Jakub Kicinski kuba@kernel.org Cc: Paolo Abeni pabeni@redhat.com Cc: netdev@vger.kernel.org Reviewed-by: Alex Elder elder@linaro.org Link: https://lore.kernel.org/lkml/4d75a9fd-1b94-7208-9de8-5a0102223e68@ieee.org Signed-off-by: Kees Cook keescook@chromium.org --- drivers/net/ipa/gsi_trans.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ipa/gsi_trans.c b/drivers/net/ipa/gsi_trans.c index 18e7e8c405be..eeec149b5d89 100644 --- a/drivers/net/ipa/gsi_trans.c +++ b/drivers/net/ipa/gsi_trans.c @@ -88,6 +88,7 @@ struct gsi_tre { int gsi_trans_pool_init(struct gsi_trans_pool *pool, size_t size, u32 count, u32 max_alloc) { + size_t alloc_size; void *virt;
if (!size) @@ -104,13 +105,15 @@ int gsi_trans_pool_init(struct gsi_trans_pool *pool, size_t size, u32 count, * If there aren't enough entries starting at the free index, * we just allocate free entries from the beginning of the pool. */ - virt = kcalloc(count + max_alloc - 1, size, GFP_KERNEL); + alloc_size = size_mul(count + max_alloc - 1, size); + alloc_size = kmalloc_size_roundup(alloc_size); + virt = kzalloc(alloc_size, GFP_KERNEL); if (!virt) return -ENOMEM;
pool->base = virt; /* If the allocator gave us any extra memory, use it */ - pool->count = ksize(pool->base) / size; + pool->count = alloc_size / size; pool->free = 0; pool->max_alloc = max_alloc; pool->size = size;
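As a worked example (the numbers are made up for illustration, not taken from the IPA driver): with size = 24 bytes and count + max_alloc - 1 = 20 entries, the request is 480 bytes; kmalloc_size_roundup(480) returns the 512-byte bucket, so pool->count becomes 512 / 24 = 21 entries. That is the same "use whatever extra room the bucket gives us" result the old ksize()-based division produced, but the size is now known before the allocation is made.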
In preparation for removing the "silently change allocation size" users of ksize(), explicitly round up all q_vector allocations so that allocations can be correctly compared to ksize().
Additionally fix potential use-after-free in the case of new allocation failure: only free memory if the replacement allocation succeeds.
Cc: Jesse Brandeburg jesse.brandeburg@intel.com Cc: Tony Nguyen anthony.l.nguyen@intel.com Cc: "David S. Miller" davem@davemloft.net Cc: Eric Dumazet edumazet@google.com Cc: Jakub Kicinski kuba@kernel.org Cc: Paolo Abeni pabeni@redhat.com Cc: intel-wired-lan@lists.osuosl.org Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook keescook@chromium.org --- drivers/net/ethernet/intel/igb/igb_main.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index 2796e81d2726..eb51e531c096 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -1195,15 +1195,16 @@ static int igb_alloc_q_vector(struct igb_adapter *adapter, return -ENOMEM;
ring_count = txr_count + rxr_count; - size = struct_size(q_vector, ring, ring_count); + size = kmalloc_size_roundup(struct_size(q_vector, ring, ring_count));
/* allocate q_vector and rings */ q_vector = adapter->q_vector[v_idx]; if (!q_vector) { q_vector = kzalloc(size, GFP_KERNEL); } else if (size > ksize(q_vector)) { - kfree_rcu(q_vector, rcu); q_vector = kzalloc(size, GFP_KERNEL); + if (q_vector) + kfree_rcu(q_vector, rcu); } else { memset(q_vector, 0, size); }
-----Original Message-----
From: Kees Cook keescook@chromium.org
Sent: Friday, September 23, 2022 4:28 PM
Subject: [PATCH v2 06/16] igb: Proactively round up to kmalloc bucket size
In preparation for removing the "silently change allocation size" users of ksize(), explicitly round up all q_vector allocations so that allocations can be correctly compared to ksize().
Additionally fix potential use-after-free in the case of new allocation failure: only free memory if the replacement allocation succeeds.
Cc: Jesse Brandeburg jesse.brandeburg@intel.com Cc: Tony Nguyen anthony.l.nguyen@intel.com Cc: "David S. Miller" davem@davemloft.net Cc: Eric Dumazet edumazet@google.com Cc: Jakub Kicinski kuba@kernel.org Cc: Paolo Abeni pabeni@redhat.com Cc: intel-wired-lan@lists.osuosl.org Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook keescook@chromium.org
drivers/net/ethernet/intel/igb/igb_main.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index 2796e81d2726..eb51e531c096 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -1195,15 +1195,16 @@ static int igb_alloc_q_vector(struct igb_adapter *adapter, return -ENOMEM;
ring_count = txr_count + rxr_count;
- size = struct_size(q_vector, ring, ring_count);
- size = kmalloc_size_roundup(struct_size(q_vector, ring, ring_count));
This looks good to me...
/* allocate q_vector and rings */ q_vector = adapter->q_vector[v_idx]; if (!q_vector) { q_vector = kzalloc(size, GFP_KERNEL); } else if (size > ksize(q_vector)) {
-		kfree_rcu(q_vector, rcu);
 		q_vector = kzalloc(size, GFP_KERNEL);
+		if (q_vector)
+			kfree_rcu(q_vector, rcu);
Even though this is in the ksize part, this seems like an unrelated change? Should this be in a different patch?
Also, the kfree_rcu will free q_vector after the RCU grace period?
Is that what you want to do?
How does RCU distinguish between the original q_vector and the newly kzalloc'ed one?
Thanks,
Mike
} else { memset(q_vector, 0, size); } -- 2.34.1
Instead of discovering the kmalloc bucket size _after_ allocation, round up proactively so the allocation is explicitly made for the full size, allowing the compiler to correctly reason about the resulting size of the buffer through the existing __alloc_size() hint.
Cc: Chris Mason clm@fb.com Cc: Josef Bacik josef@toxicpanda.com Cc: linux-btrfs@vger.kernel.org Acked-by: David Sterba dsterba@suse.com Link: https://lore.kernel.org/lkml/20220922133014.GI32411@suse.cz Signed-off-by: Kees Cook keescook@chromium.org --- fs/btrfs/send.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index e7671afcee4f..d40d65598e8f 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -435,6 +435,11 @@ static int fs_path_ensure_buf(struct fs_path *p, int len) path_len = p->end - p->start; old_buf_len = p->buf_len;
+ /* + * Allocate to the next largest kmalloc bucket size, to let + * the fast path happen most of the time. + */ + len = kmalloc_size_roundup(len); /* * First time the inline_buf does not suffice */ @@ -448,11 +453,7 @@ static int fs_path_ensure_buf(struct fs_path *p, int len) if (!tmp_buf) return -ENOMEM; p->buf = tmp_buf; - /* - * The real size of the buffer is bigger, this will let the fast path - * happen most of the time - */ - p->buf_len = ksize(p->buf); + p->buf_len = len;
if (p->reversed) { tmp_buf = p->buf + old_buf_len - path_len - 1;
Instead of discovering the kmalloc bucket size _after_ allocation, round up proactively so the allocation is explicitly made for the full size, allowing the compiler to correctly reason about the resulting size of the buffer through the existing __alloc_size() hint.
Cc: Sumit Semwal sumit.semwal@linaro.org Cc: "Christian König" christian.koenig@amd.com Cc: linux-media@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Cc: linaro-mm-sig@lists.linaro.org Signed-off-by: Kees Cook keescook@chromium.org --- drivers/dma-buf/dma-resv.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c index 205acb2c744d..5b0a4b8830ff 100644 --- a/drivers/dma-buf/dma-resv.c +++ b/drivers/dma-buf/dma-resv.c @@ -98,12 +98,17 @@ static void dma_resv_list_set(struct dma_resv_list *list, static struct dma_resv_list *dma_resv_list_alloc(unsigned int max_fences) { struct dma_resv_list *list; + size_t size;
- list = kmalloc(struct_size(list, table, max_fences), GFP_KERNEL); + /* Round up to the next kmalloc bucket size. */ + size = kmalloc_size_roundup(struct_size(list, table, max_fences)); + + list = kmalloc(size, GFP_KERNEL); if (!list) return NULL;
- list->max_fences = (ksize(list) - offsetof(typeof(*list), table)) / + /* Given the resulting bucket size, recalculated max_fences. */ + list->max_fences = (size - offsetof(typeof(*list), table)) / sizeof(*list->table);
return list;
Am 23.09.22 um 22:28 schrieb Kees Cook:
Instead of discovering the kmalloc bucket size _after_ allocation, round up proactively so the allocation is explicitly made for the full size, allowing the compiler to correctly reason about the resulting size of the buffer through the existing __alloc_size() hint.
Cc: Sumit Semwal sumit.semwal@linaro.org Cc: "Christian König" christian.koenig@amd.com Cc: linux-media@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Cc: linaro-mm-sig@lists.linaro.org Signed-off-by: Kees Cook keescook@chromium.org
Reviewed-by: Christian König christian.koenig@amd.com
drivers/dma-buf/dma-resv.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c index 205acb2c744d..5b0a4b8830ff 100644 --- a/drivers/dma-buf/dma-resv.c +++ b/drivers/dma-buf/dma-resv.c @@ -98,12 +98,17 @@ static void dma_resv_list_set(struct dma_resv_list *list, static struct dma_resv_list *dma_resv_list_alloc(unsigned int max_fences) { struct dma_resv_list *list;
+	size_t size;
 
-	list = kmalloc(struct_size(list, table, max_fences), GFP_KERNEL);
+	/* Round up to the next kmalloc bucket size. */
+	size = kmalloc_size_roundup(struct_size(list, table, max_fences));
+
+	list = kmalloc(size, GFP_KERNEL);
 	if (!list)
 		return NULL;
 
-	list->max_fences = (ksize(list) - offsetof(typeof(*list), table)) /
+	/* Given the resulting bucket size, recalculated max_fences. */
+	list->max_fences = (size - offsetof(typeof(*list), table)) /
 		sizeof(*list->table);
return list;
Instead of discovering the kmalloc bucket size _after_ allocation, round up proactively so the allocation is explicitly made for the full size, allowing the compiler to correctly reason about the resulting size of the buffer through the existing __alloc_size() hint.
Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Kees Cook keescook@chromium.org --- fs/coredump.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/fs/coredump.c b/fs/coredump.c index 9f4aae202109..0894b2c35d98 100644 --- a/fs/coredump.c +++ b/fs/coredump.c @@ -68,7 +68,10 @@ struct core_name {
static int expand_corename(struct core_name *cn, int size) { - char *corename = krealloc(cn->corename, size, GFP_KERNEL); + char *corename; + + size = kmalloc_size_roundup(size); + corename = krealloc(cn->corename, size, GFP_KERNEL);
if (!corename) return -ENOMEM; @@ -76,7 +79,7 @@ static int expand_corename(struct core_name *cn, int size) if (size > core_name_size) /* racy but harmless */ core_name_size = size;
- cn->size = ksize(corename); + cn->size = size; cn->corename = corename; return 0; }
Round up allocations with kmalloc_size_roundup() so that openvswitch's use of ksize() is always accurate and no special handling of the memory is needed by KASAN, UBSAN_BOUNDS, nor FORTIFY_SOURCE.
Cc: Pravin B Shelar pshelar@ovn.org Cc: "David S. Miller" davem@davemloft.net Cc: Eric Dumazet edumazet@google.com Cc: Jakub Kicinski kuba@kernel.org Cc: Paolo Abeni pabeni@redhat.com Cc: netdev@vger.kernel.org Cc: dev@openvswitch.org Signed-off-by: Kees Cook keescook@chromium.org --- net/openvswitch/flow_netlink.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c index 4c09cf8a0ab2..6621873abde2 100644 --- a/net/openvswitch/flow_netlink.c +++ b/net/openvswitch/flow_netlink.c @@ -2309,7 +2309,7 @@ static struct sw_flow_actions *nla_alloc_flow_actions(int size)
WARN_ON_ONCE(size > MAX_ACTIONS_BUFSIZE);
- sfa = kmalloc(sizeof(*sfa) + size, GFP_KERNEL); + sfa = kmalloc(kmalloc_size_roundup(sizeof(*sfa) + size), GFP_KERNEL); if (!sfa) return ERR_PTR(-ENOMEM);
Round up allocations with kmalloc_size_roundup() so that the verifier's use of ksize() is always accurate and no special handling of the memory is needed by KASAN, UBSAN_BOUNDS, nor FORTIFY_SOURCE. Pass the new size information back up to callers so they can use the space immediately, and so array resizing happens less frequently as well. Explicitly zero any trailing bytes in new allocations.
Additionally fix a memory allocation leak: if krealloc() fails, "arr" wasn't freed, but NULL was returned; the caller of realloc_array() would then write NULL to the lvalue, losing the reference to the original memory.
Cc: Alexei Starovoitov ast@kernel.org Cc: Daniel Borkmann daniel@iogearbox.net Cc: John Fastabend john.fastabend@gmail.com Cc: Andrii Nakryiko andrii@kernel.org Cc: Martin KaFai Lau martin.lau@linux.dev Cc: Song Liu song@kernel.org Cc: Yonghong Song yhs@fb.com Cc: KP Singh kpsingh@kernel.org Cc: Stanislav Fomichev sdf@google.com Cc: Hao Luo haoluo@google.com Cc: Jiri Olsa jolsa@kernel.org Cc: bpf@vger.kernel.org Signed-off-by: Kees Cook keescook@chromium.org --- kernel/bpf/verifier.c | 49 +++++++++++++++++++++++++++---------------- 1 file changed, 31 insertions(+), 18 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 096fdac70165..80531f8f0d36 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -978,42 +978,53 @@ static void print_insn_state(struct bpf_verifier_env *env, */ static void *copy_array(void *dst, const void *src, size_t n, size_t size, gfp_t flags) { - size_t bytes; + size_t src_bytes, dst_bytes;
if (ZERO_OR_NULL_PTR(src)) goto out;
- if (unlikely(check_mul_overflow(n, size, &bytes))) + if (unlikely(check_mul_overflow(n, size, &src_bytes))) return NULL;
- if (ksize(dst) < bytes) { + dst_bytes = kmalloc_size_roundup(src_bytes); + if (ksize(dst) < dst_bytes) { kfree(dst); - dst = kmalloc_track_caller(bytes, flags); + dst = kmalloc_track_caller(dst_bytes, flags); if (!dst) return NULL; }
- memcpy(dst, src, bytes); + memcpy(dst, src, src_bytes); + memset(dst + src_bytes, 0, dst_bytes - src_bytes); out: return dst ? dst : ZERO_SIZE_PTR; }
-/* resize an array from old_n items to new_n items. the array is reallocated if it's too - * small to hold new_n items. new items are zeroed out if the array grows. +/* Resize an array from old_n items to *new_n items. The array is reallocated if it's too + * small to hold *new_n items. New items are zeroed out if the array grows. Allocation + * is rounded up to next kmalloc bucket size to reduce frequency of resizing. *new_n + * contains the new total number of items that will fit. * - * Contrary to krealloc_array, does not free arr if new_n is zero. + * Contrary to krealloc, does not free arr if new_n is zero. */ -static void *realloc_array(void *arr, size_t old_n, size_t new_n, size_t size) +static void *realloc_array(void *arr, size_t old_n, size_t *new_n, size_t size) { - if (!new_n || old_n == new_n) + void *old_arr = arr; + size_t alloc_size; + + if (!new_n || !*new_n || old_n == *new_n) goto out;
- arr = krealloc_array(arr, new_n, size, GFP_KERNEL); - if (!arr) + alloc_size = kmalloc_size_roundup(size_mul(*new_n, size)); + arr = krealloc(old_arr, alloc_size, GFP_KERNEL); + if (!arr) { + kfree(old_arr); return NULL; + }
- if (new_n > old_n) - memset(arr + old_n * size, 0, (new_n - old_n) * size); + *new_n = alloc_size / size; + if (*new_n > old_n) + memset(arr + old_n * size, 0, (*new_n - old_n) * size);
out: return arr ? arr : ZERO_SIZE_PTR; @@ -1045,7 +1056,7 @@ static int copy_stack_state(struct bpf_func_state *dst, const struct bpf_func_st
static int resize_reference_state(struct bpf_func_state *state, size_t n) { - state->refs = realloc_array(state->refs, state->acquired_refs, n, + state->refs = realloc_array(state->refs, state->acquired_refs, &n, sizeof(struct bpf_reference_state)); if (!state->refs) return -ENOMEM; @@ -1061,11 +1072,11 @@ static int grow_stack_state(struct bpf_func_state *state, int size) if (old_n >= n) return 0;
- state->stack = realloc_array(state->stack, old_n, n, sizeof(struct bpf_stack_state)); + state->stack = realloc_array(state->stack, old_n, &n, sizeof(struct bpf_stack_state)); if (!state->stack) return -ENOMEM;
- state->allocated_stack = size; + state->allocated_stack = n * BPF_REG_SIZE; return 0; }
@@ -2472,9 +2483,11 @@ static int push_jmp_history(struct bpf_verifier_env *env, { u32 cnt = cur->jmp_history_cnt; struct bpf_idx_pair *p; + size_t size;
cnt++; - p = krealloc(cur->jmp_history, cnt * sizeof(*p), GFP_USER); + size = kmalloc_size_roundup(size_mul(cnt, sizeof(*p))); + p = krealloc(cur->jmp_history, size, GFP_USER); if (!p) return -ENOMEM; p[cnt - 1].idx = env->insn_idx;
Round up allocations with kmalloc_size_roundup() so that devres's use of ksize() is always accurate and no special handling of the memory is needed by KASAN, UBSAN_BOUNDS, nor FORTIFY_SOURCE.
Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: "Rafael J. Wysocki" rafael@kernel.org Signed-off-by: Kees Cook keescook@chromium.org --- drivers/base/devres.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/base/devres.c b/drivers/base/devres.c index 864d0b3f566e..7db20ce7ea8a 100644 --- a/drivers/base/devres.c +++ b/drivers/base/devres.c @@ -101,6 +101,9 @@ static bool check_dr_size(size_t size, size_t *tot_size) size, tot_size))) return false;
+ /* Actually allocate the full kmalloc bucket size. */ + *tot_size = kmalloc_size_roundup(*tot_size); + return true; }
Round up allocations with kmalloc_size_roundup() so that mempool's use of ksize() is always accurate and no special handling of the memory is needed by KASAN, UBSAN_BOUNDS, nor FORTIFY_SOURCE.
Cc: Andrew Morton akpm@linux-foundation.org Cc: linux-mm@kvack.org Signed-off-by: Kees Cook keescook@chromium.org --- mm/mempool.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/mempool.c b/mm/mempool.c index 96488b13a1ef..0f3107b28e6b 100644 --- a/mm/mempool.c +++ b/mm/mempool.c @@ -526,7 +526,7 @@ EXPORT_SYMBOL(mempool_free_slab); */ void *mempool_kmalloc(gfp_t gfp_mask, void *pool_data) { - size_t size = (size_t)pool_data; + size_t size = kmalloc_size_roundup((size_t)pool_data); return kmalloc(size, gfp_mask); } EXPORT_SYMBOL(mempool_kmalloc);
On 9/23/22 22:28, Kees Cook wrote:
Round up allocations with kmalloc_size_roundup() so that mempool's use of ksize() is always accurate and no special handling of the memory is needed by KASAN, UBSAN_BOUNDS, nor FORTIFY_SOURCE.
Cc: Andrew Morton akpm@linux-foundation.org Cc: linux-mm@kvack.org Signed-off-by: Kees Cook keescook@chromium.org
mm/mempool.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/mempool.c b/mm/mempool.c index 96488b13a1ef..0f3107b28e6b 100644 --- a/mm/mempool.c +++ b/mm/mempool.c @@ -526,7 +526,7 @@ EXPORT_SYMBOL(mempool_free_slab); */ void *mempool_kmalloc(gfp_t gfp_mask, void *pool_data) {
- size_t size = (size_t)pool_data;
- size_t size = kmalloc_size_roundup((size_t)pool_data);
Hm it is kinda wasteful to call into kmalloc_size_roundup for every allocation that has the same input. We could do it just once in mempool_init_node() for adjusting pool->pool_data ?
But looking more closely, I wonder why poison_element() and kasan_unpoison_element() in mm/mempool.c even have to use ksize()/__ksize() and not just operate on the requested size (again, pool->pool_data). If no kmalloc mempool's users use ksize() to write beyond requested size, then we don't have to unpoison/poison that area either?
return kmalloc(size, gfp_mask); } EXPORT_SYMBOL(mempool_kmalloc);
On Mon, Sep 26, 2022 at 03:50:43PM +0200, Vlastimil Babka wrote:
On 9/23/22 22:28, Kees Cook wrote:
Round up allocations with kmalloc_size_roundup() so that mempool's use of ksize() is always accurate and no special handling of the memory is needed by KASAN, UBSAN_BOUNDS, nor FORTIFY_SOURCE.
Cc: Andrew Morton akpm@linux-foundation.org Cc: linux-mm@kvack.org Signed-off-by: Kees Cook keescook@chromium.org
mm/mempool.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/mempool.c b/mm/mempool.c index 96488b13a1ef..0f3107b28e6b 100644 --- a/mm/mempool.c +++ b/mm/mempool.c @@ -526,7 +526,7 @@ EXPORT_SYMBOL(mempool_free_slab); */ void *mempool_kmalloc(gfp_t gfp_mask, void *pool_data) {
- size_t size = (size_t)pool_data;
- size_t size = kmalloc_size_roundup((size_t)pool_data);
Hm it is kinda wasteful to call into kmalloc_size_roundup for every allocation that has the same input. We could do it just once in mempool_init_node() for adjusting pool->pool_data ?
But looking more closely, I wonder why poison_element() and kasan_unpoison_element() in mm/mempool.c even have to use ksize()/__ksize() and not just operate on the requested size (again, pool->pool_data). If no kmalloc mempool's users use ksize() to write beyond requested size, then we don't have to unpoison/poison that area either?
Yeah, I think that's a fair point. I will adjust this.
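For what it's worth, one possible shape of that adjustment, rounding the kmalloc pool size up once at pool creation time, might be something like the sketch below (untested, and whether it belongs in mempool_init_node() or in the mempool_create_kmalloc_pool() wrappers is an open question):

	/* Hypothetical: round up once at init so mempool_kmalloc() and the
	 * poison/unpoison helpers can use pool->pool_data directly, without
	 * needing ksize() on each element.
	 */
	if (alloc_fn == mempool_kmalloc)
		pool_data = (void *)(unsigned long)kmalloc_size_roundup((size_t)pool_data);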
In preparation for no longer unpoisoning in ksize(), remove the behavioral self-tests for ksize().
Cc: Andrey Ryabinin ryabinin.a.a@gmail.com Cc: Alexander Potapenko glider@google.com Cc: Andrey Konovalov andreyknvl@gmail.com Cc: Dmitry Vyukov dvyukov@google.com Cc: Vincenzo Frascino vincenzo.frascino@arm.com Cc: Andrew Morton akpm@linux-foundation.org Cc: kasan-dev@googlegroups.com Cc: linux-mm@kvack.org Signed-off-by: Kees Cook keescook@chromium.org --- lib/test_kasan.c | 42 ------------------------------------------ mm/kasan/shadow.c | 4 +--- 2 files changed, 1 insertion(+), 45 deletions(-)
diff --git a/lib/test_kasan.c b/lib/test_kasan.c index 58c1b01ccfe2..bdd0ced8f8d7 100644 --- a/lib/test_kasan.c +++ b/lib/test_kasan.c @@ -753,46 +753,6 @@ static void kasan_global_oob_left(struct kunit *test) KUNIT_EXPECT_KASAN_FAIL(test, *(volatile char *)p); }
-/* Check that ksize() makes the whole object accessible. */ -static void ksize_unpoisons_memory(struct kunit *test) -{ - char *ptr; - size_t size = 123, real_size; - - ptr = kmalloc(size, GFP_KERNEL); - KUNIT_ASSERT_NOT_ERR_OR_NULL(test, ptr); - real_size = ksize(ptr); - - OPTIMIZER_HIDE_VAR(ptr); - - /* This access shouldn't trigger a KASAN report. */ - ptr[size] = 'x'; - - /* This one must. */ - KUNIT_EXPECT_KASAN_FAIL(test, ((volatile char *)ptr)[real_size]); - - kfree(ptr); -} - -/* - * Check that a use-after-free is detected by ksize() and via normal accesses - * after it. - */ -static void ksize_uaf(struct kunit *test) -{ - char *ptr; - int size = 128 - KASAN_GRANULE_SIZE; - - ptr = kmalloc(size, GFP_KERNEL); - KUNIT_ASSERT_NOT_ERR_OR_NULL(test, ptr); - kfree(ptr); - - OPTIMIZER_HIDE_VAR(ptr); - KUNIT_EXPECT_KASAN_FAIL(test, ksize(ptr)); - KUNIT_EXPECT_KASAN_FAIL(test, ((volatile char *)ptr)[0]); - KUNIT_EXPECT_KASAN_FAIL(test, ((volatile char *)ptr)[size]); -} - static void kasan_stack_oob(struct kunit *test) { char stack_array[10]; @@ -1392,8 +1352,6 @@ static struct kunit_case kasan_kunit_test_cases[] = { KUNIT_CASE(kasan_stack_oob), KUNIT_CASE(kasan_alloca_oob_left), KUNIT_CASE(kasan_alloca_oob_right), - KUNIT_CASE(ksize_unpoisons_memory), - KUNIT_CASE(ksize_uaf), KUNIT_CASE(kmem_cache_double_free), KUNIT_CASE(kmem_cache_invalid_free), KUNIT_CASE(kmem_cache_double_destroy), diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c index 0e3648b603a6..0895c73e9b69 100644 --- a/mm/kasan/shadow.c +++ b/mm/kasan/shadow.c @@ -124,9 +124,7 @@ void kasan_unpoison(const void *addr, size_t size, bool init) addr = kasan_reset_tag(addr);
/* - * Skip KFENCE memory if called explicitly outside of sl*b. Also note - * that calls to ksize(), where size is not a multiple of machine-word - * size, would otherwise poison the invalid portion of the word. + * Skip KFENCE memory if called explicitly outside of sl*b. */ if (is_kfence_address(addr)) return;
On Fri, 23 Sept 2022 at 22:28, Kees Cook keescook@chromium.org wrote:
In preparation for no longer unpoisoning in ksize(), remove the behavioral self-tests for ksize().
Cc: Andrey Ryabinin ryabinin.a.a@gmail.com Cc: Alexander Potapenko glider@google.com Cc: Andrey Konovalov andreyknvl@gmail.com Cc: Dmitry Vyukov dvyukov@google.com Cc: Vincenzo Frascino vincenzo.frascino@arm.com Cc: Andrew Morton akpm@linux-foundation.org Cc: kasan-dev@googlegroups.com Cc: linux-mm@kvack.org Signed-off-by: Kees Cook keescook@chromium.org
lib/test_kasan.c | 42 ------------------------------------------ mm/kasan/shadow.c | 4 +--- 2 files changed, 1 insertion(+), 45 deletions(-)
diff --git a/lib/test_kasan.c b/lib/test_kasan.c index 58c1b01ccfe2..bdd0ced8f8d7 100644 --- a/lib/test_kasan.c +++ b/lib/test_kasan.c @@ -753,46 +753,6 @@ static void kasan_global_oob_left(struct kunit *test) KUNIT_EXPECT_KASAN_FAIL(test, *(volatile char *)p); }
-/* Check that ksize() makes the whole object accessible. */
-static void ksize_unpoisons_memory(struct kunit *test)
-{
-	char *ptr;
-	size_t size = 123, real_size;
-
-	ptr = kmalloc(size, GFP_KERNEL);
-	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, ptr);
-	real_size = ksize(ptr);
-
-	OPTIMIZER_HIDE_VAR(ptr);
-
-	/* This access shouldn't trigger a KASAN report. */
-	ptr[size] = 'x';
I would rather keep the tests and update to the new behavior. We had bugs in ksize, we need test coverage. I assume ptr[size] access must now produce an error even after ksize.
-	/* This one must. */
-	KUNIT_EXPECT_KASAN_FAIL(test, ((volatile char *)ptr)[real_size]);
-
-	kfree(ptr);
-}
-/*
- * Check that a use-after-free is detected by ksize() and via normal accesses
- * after it.
- */
-static void ksize_uaf(struct kunit *test)
-{
-	char *ptr;
-	int size = 128 - KASAN_GRANULE_SIZE;
-
-	ptr = kmalloc(size, GFP_KERNEL);
-	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, ptr);
-	kfree(ptr);
-
-	OPTIMIZER_HIDE_VAR(ptr);
-	KUNIT_EXPECT_KASAN_FAIL(test, ksize(ptr));
This is still a bug that should be detected, right? Calling ksize on a freed pointer is a bug.
-	KUNIT_EXPECT_KASAN_FAIL(test, ((volatile char *)ptr)[0]);
-	KUNIT_EXPECT_KASAN_FAIL(test, ((volatile char *)ptr)[size]);
-}
static void kasan_stack_oob(struct kunit *test) { char stack_array[10]; @@ -1392,8 +1352,6 @@ static struct kunit_case kasan_kunit_test_cases[] = { KUNIT_CASE(kasan_stack_oob), KUNIT_CASE(kasan_alloca_oob_left), KUNIT_CASE(kasan_alloca_oob_right),
-	KUNIT_CASE(ksize_unpoisons_memory),
-	KUNIT_CASE(ksize_uaf),
 	KUNIT_CASE(kmem_cache_double_free),
 	KUNIT_CASE(kmem_cache_invalid_free),
 	KUNIT_CASE(kmem_cache_double_destroy),
diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c index 0e3648b603a6..0895c73e9b69 100644 --- a/mm/kasan/shadow.c +++ b/mm/kasan/shadow.c @@ -124,9 +124,7 @@ void kasan_unpoison(const void *addr, size_t size, bool init) addr = kasan_reset_tag(addr);
 	/*
-	 * Skip KFENCE memory if called explicitly outside of sl*b. Also note
-	 * that calls to ksize(), where size is not a multiple of machine-word
-	 * size, would otherwise poison the invalid portion of the word.
+	 * Skip KFENCE memory if called explicitly outside of sl*b.
 	 */
 	if (is_kfence_address(addr))
 		return;
-- 2.34.1
On Sat, Sep 24, 2022 at 10:15:18AM +0200, Dmitry Vyukov wrote:
On Fri, 23 Sept 2022 at 22:28, Kees Cook keescook@chromium.org wrote:
In preparation for no longer unpoisoning in ksize(), remove the behavioral self-tests for ksize().
[...]
-/* Check that ksize() makes the whole object accessible. */
-static void ksize_unpoisons_memory(struct kunit *test)
-{
-	char *ptr;
-	size_t size = 123, real_size;
-
-	ptr = kmalloc(size, GFP_KERNEL);
-	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, ptr);
-	real_size = ksize(ptr);
-
-	OPTIMIZER_HIDE_VAR(ptr);
-
-	/* This access shouldn't trigger a KASAN report. */
-	ptr[size] = 'x';
I would rather keep the tests and update to the new behavior. We had bugs in ksize, we need test coverage. I assume ptr[size] access must now produce an error even after ksize.
Good point on all these! I'll respin.
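For reference, the respin could end up looking something like this sketch (the function name and the granule-aligned size choice are mine, not from the actual v3, and it would also need a matching KUNIT_CASE() entry); the point is just that accesses beyond the requested size should now fail even after calling ksize():

/* Check that ksize() no longer unpoisons the whole kmalloc bucket. */
static void ksize_keeps_poison(struct kunit *test)
{
	char *ptr;
	/* Granule-aligned, so ptr[size] lands in a poisoned granule. */
	size_t size = 128 - KASAN_GRANULE_SIZE, real_size;

	ptr = kmalloc(size, GFP_KERNEL);
	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, ptr);
	real_size = ksize(ptr);
	OPTIMIZER_HIDE_VAR(ptr);

	/* Accesses beyond the requested size should now be reported... */
	KUNIT_EXPECT_KASAN_FAIL(test, ((volatile char *)ptr)[size]);
	/* ...as should accesses beyond the reported bucket size. */
	KUNIT_EXPECT_KASAN_FAIL(test, ((volatile char *)ptr)[real_size]);

	kfree(ptr);
}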
With all "silently resizing" callers of ksize() refactored, remove the logic in ksize() that would allow it to be used to effectively change the size of an allocation (bypassing __alloc_size hints, etc). Users wanting this feature need to either use kmalloc_size_roundup() before an allocation, or call krealloc() directly.
For kfree_sensitive(), move the unpoisoning logic inline. Replace the open-coded ksize() in __do_krealloc with ksize() now that it doesn't perform unpoisoning.
Cc: Christoph Lameter cl@linux.com Cc: Pekka Enberg penberg@kernel.org Cc: David Rientjes rientjes@google.com Cc: Joonsoo Kim iamjoonsoo.kim@lge.com Cc: Andrew Morton akpm@linux-foundation.org Cc: Vlastimil Babka vbabka@suse.cz Cc: Roman Gushchin roman.gushchin@linux.dev Cc: Hyeonggon Yoo 42.hyeyoo@gmail.com Cc: Andrey Ryabinin ryabinin.a.a@gmail.com Cc: Alexander Potapenko glider@google.com Cc: Andrey Konovalov andreyknvl@gmail.com Cc: Dmitry Vyukov dvyukov@google.com Cc: Vincenzo Frascino vincenzo.frascino@arm.com Cc: linux-mm@kvack.org Cc: kasan-dev@googlegroups.com Signed-off-by: Kees Cook keescook@chromium.org --- mm/slab_common.c | 38 ++++++++++++++------------------------ 1 file changed, 14 insertions(+), 24 deletions(-)
diff --git a/mm/slab_common.c b/mm/slab_common.c index d7420cf649f8..60b77bcdc2e3 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -1160,13 +1160,8 @@ __do_krealloc(const void *p, size_t new_size, gfp_t flags) void *ret; size_t ks;
- /* Don't use instrumented ksize to allow precise KASAN poisoning. */ - if (likely(!ZERO_OR_NULL_PTR(p))) { - if (!kasan_check_byte(p)) - return NULL; - ks = kfence_ksize(p) ?: __ksize(p); - } else - ks = 0; + /* How large is the allocation actually? */ + ks = ksize(p);
/* If the object still fits, repoison it precisely. */ if (ks >= new_size) { @@ -1232,8 +1227,10 @@ void kfree_sensitive(const void *p) void *mem = (void *)p;
ks = ksize(mem); - if (ks) + if (ks) { + kasan_unpoison_range(mem, ks); memzero_explicit(mem, ks); + } kfree(mem); } EXPORT_SYMBOL(kfree_sensitive); @@ -1242,10 +1239,11 @@ EXPORT_SYMBOL(kfree_sensitive); * ksize - get the actual amount of memory allocated for a given object * @objp: Pointer to the object * - * kmalloc may internally round up allocations and return more memory + * kmalloc() may internally round up allocations and return more memory * than requested. ksize() can be used to determine the actual amount of - * memory allocated. The caller may use this additional memory, even though - * a smaller amount of memory was initially specified with the kmalloc call. + * allocated memory. The caller may NOT use this additional memory, unless + * it calls krealloc(). To avoid an alloc/realloc cycle, callers can use + * kmalloc_size_roundup() to find the size of the associated kmalloc bucket. * The caller must guarantee that objp points to a valid object previously * allocated with either kmalloc() or kmem_cache_alloc(). The object * must not be freed during the duration of the call. @@ -1254,13 +1252,11 @@ EXPORT_SYMBOL(kfree_sensitive); */ size_t ksize(const void *objp) { - size_t size; - /* - * We need to first check that the pointer to the object is valid, and - * only then unpoison the memory. The report printed from ksize() is - * more useful, then when it's printed later when the behaviour could - * be undefined due to a potential use-after-free or double-free. + * We need to first check that the pointer to the object is valid. + * The KASAN report printed from ksize() is more useful, then when + * it's printed later when the behaviour could be undefined due to + * a potential use-after-free or double-free. * * We use kasan_check_byte(), which is supported for the hardware * tag-based KASAN mode, unlike kasan_check_read/write(). @@ -1274,13 +1270,7 @@ size_t ksize(const void *objp) if (unlikely(ZERO_OR_NULL_PTR(objp)) || !kasan_check_byte(objp)) return 0;
- size = kfence_ksize(objp) ?: __ksize(objp); - /* - * We assume that ksize callers could use whole allocated area, - * so we need to unpoison this area. - */ - kasan_unpoison_range(objp, size); - return size; + return kfence_ksize(objp) ?: __ksize(objp); } EXPORT_SYMBOL(ksize);
With skbuff's post-allocation use of ksize() rearranged to use kmalloc_size_roundup() prior to allocation, the compiler can correctly reason about the size of these allocations. The prior mismatch had caused buffer overflow mitigations to erroneously fire under CONFIG_UBSAN_BOUNDS, requiring a partial revert of the __alloc_size attributes. Restore the attribute that had been removed in commit 93dd04ab0b2b ("slab: remove __alloc_size attribute from __kmalloc_track_caller").
Cc: Christoph Lameter cl@linux.com Cc: Pekka Enberg penberg@kernel.org Cc: David Rientjes rientjes@google.com Cc: Joonsoo Kim iamjoonsoo.kim@lge.com Cc: Andrew Morton akpm@linux-foundation.org Cc: Vlastimil Babka vbabka@suse.cz Cc: Roman Gushchin roman.gushchin@linux.dev Cc: Hyeonggon Yoo 42.hyeyoo@gmail.com Cc: linux-mm@kvack.org Signed-off-by: Kees Cook keescook@chromium.org --- include/linux/slab.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/linux/slab.h b/include/linux/slab.h index 727640173568..297b85ed2c29 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -693,7 +693,8 @@ static inline __alloc_size(1, 2) void *kcalloc(size_t n, size_t size, gfp_t flag * allocator where we care about the real place the memory allocation * request comes from. */ -extern void *__kmalloc_track_caller(size_t size, gfp_t flags, unsigned long caller); +extern void *__kmalloc_track_caller(size_t size, gfp_t flags, unsigned long caller) + __alloc_size(1); #define kmalloc_track_caller(size, flags) \ __kmalloc_track_caller(size, flags, _RET_IP_)