June 2021 - Linux-stable-mirror

FAILED: patch "[PATCH] mm/slub: actually fix freelist pointer vs redzoning" failed to apply to 5.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From e41a49fadbc80b60b48d3c095d9e2ee7ef7c9a8e Mon Sep 17 00:00:00 2001 From: Kees Cook <keescook(a)chromium.org> Date: Tue, 15 Jun 2021 18:23:26 -0700 Subject: [PATCH] mm/slub: actually fix freelist pointer vs redzoning It turns out that SLUB redzoning ("slub_debug=Z") checks from s->object_size rather than from s->inuse (which is normally bumped to make room for the freelist pointer), so a cache created with an object size less than 24 would have the freelist pointer written beyond s->object_size, causing the redzone to be corrupted by the freelist pointer. This was very visible with "slub_debug=ZF": BUG test (Tainted: G B ): Right Redzone overwritten ----------------------------------------------------------------------------- INFO: 0xffff957ead1c05de-0xffff957ead1c05df @offset=1502. First byte 0x1a instead of 0xbb INFO: Slab 0xffffef3950b47000 objects=170 used=170 fp=0x0000000000000000 flags=0x8000000000000200 INFO: Object 0xffff957ead1c05d8 @offset=1496 fp=0xffff957ead1c0620 Redzone (____ptrval____): bb bb bb bb bb bb bb bb ........ Object (____ptrval____): 00 00 00 00 00 f6 f4 a5 ........ Redzone (____ptrval____): 40 1d e8 1a aa @.... Padding (____ptrval____): 00 00 00 00 00 00 00 00 ........ Adjust the offset to stay within s->object_size. (Note that no caches of in this size range are known to exist in the kernel currently.) Link: https://lkml.kernel.org/r/20210608183955.280836-4-keescook@chromium.org Link: https://lore.kernel.org/linux-mm/20200807160627.GA1420741@elver.google.com/ Link: https://lore.kernel.org/lkml/0f7dd7b2-7496-5e2d-9488-2ec9f8e90441@suse.cz/F…: 89b83f282d8b (slub: avoid redzone when choosing freepointer location) Link: https://lore.kernel.org/lkml/CANpmjNOwZ5VpKQn+SYWovTkFB4VsT-RPwyENBmaK0dLcp… Signed-off-by: Kees Cook <keescook(a)chromium.org> Reported-by: Marco Elver <elver(a)google.com> Reported-by: "Lin, Zhenpeng" <zplin(a)psu.edu> Tested-by: Marco Elver <elver(a)google.com> Acked-by: Vlastimil Babka <vbabka(a)suse.cz> Cc: Christoph Lameter <cl(a)linux.com> Cc: David Rientjes <rientjes(a)google.com> Cc: Joonsoo Kim <iamjoonsoo.kim(a)lge.com> Cc: Pekka Enberg <penberg(a)kernel.org> Cc: Roman Gushchin <guro(a)fb.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org> diff --git a/mm/slub.c b/mm/slub.c index f58cfd456548..fe30df460fad 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -3689,7 +3689,6 @@ static int calculate_sizes(struct kmem_cache *s, int forced_order) { slab_flags_t flags = s->flags; unsigned int size = s->object_size; - unsigned int freepointer_area; unsigned int order; /* @@ -3698,13 +3697,6 @@ static int calculate_sizes(struct kmem_cache *s, int forced_order) * the possible location of the free pointer. */ size = ALIGN(size, sizeof(void *)); - /* - * This is the area of the object where a freepointer can be - * safely written. If redzoning adds more to the inuse size, we - * can't use that portion for writing the freepointer, so - * s->offset must be limited within this for the general case. - */ - freepointer_area = size; #ifdef CONFIG_SLUB_DEBUG /* @@ -3730,7 +3722,7 @@ static int calculate_sizes(struct kmem_cache *s, int forced_order) /* * With that we have determined the number of bytes in actual use - * by the object. This is the potential offset to the free pointer. + * by the object and redzoning. */ s->inuse = size; @@ -3753,13 +3745,13 @@ static int calculate_sizes(struct kmem_cache *s, int forced_order) */ s->offset = size; size += sizeof(void *); - } else if (freepointer_area > sizeof(void *)) { + } else { /* * Store freelist pointer near middle of object to keep * it away from the edges of the object to avoid small * sized over/underflows from neighboring allocations. */ - s->offset = ALIGN(freepointer_area / 2, sizeof(void *)); + s->offset = ALIGN_DOWN(s->object_size / 2, sizeof(void *)); } #ifdef CONFIG_SLUB_DEBUG

4 years, 6 months

1
0
0 0

FAILED: patch "[PATCH] mm/slub: actually fix freelist pointer vs redzoning" failed to apply to 4.19-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.19-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From e41a49fadbc80b60b48d3c095d9e2ee7ef7c9a8e Mon Sep 17 00:00:00 2001 From: Kees Cook <keescook(a)chromium.org> Date: Tue, 15 Jun 2021 18:23:26 -0700 Subject: [PATCH] mm/slub: actually fix freelist pointer vs redzoning It turns out that SLUB redzoning ("slub_debug=Z") checks from s->object_size rather than from s->inuse (which is normally bumped to make room for the freelist pointer), so a cache created with an object size less than 24 would have the freelist pointer written beyond s->object_size, causing the redzone to be corrupted by the freelist pointer. This was very visible with "slub_debug=ZF": BUG test (Tainted: G B ): Right Redzone overwritten ----------------------------------------------------------------------------- INFO: 0xffff957ead1c05de-0xffff957ead1c05df @offset=1502. First byte 0x1a instead of 0xbb INFO: Slab 0xffffef3950b47000 objects=170 used=170 fp=0x0000000000000000 flags=0x8000000000000200 INFO: Object 0xffff957ead1c05d8 @offset=1496 fp=0xffff957ead1c0620 Redzone (____ptrval____): bb bb bb bb bb bb bb bb ........ Object (____ptrval____): 00 00 00 00 00 f6 f4 a5 ........ Redzone (____ptrval____): 40 1d e8 1a aa @.... Padding (____ptrval____): 00 00 00 00 00 00 00 00 ........ Adjust the offset to stay within s->object_size. (Note that no caches of in this size range are known to exist in the kernel currently.) Link: https://lkml.kernel.org/r/20210608183955.280836-4-keescook@chromium.org Link: https://lore.kernel.org/linux-mm/20200807160627.GA1420741@elver.google.com/ Link: https://lore.kernel.org/lkml/0f7dd7b2-7496-5e2d-9488-2ec9f8e90441@suse.cz/F…: 89b83f282d8b (slub: avoid redzone when choosing freepointer location) Link: https://lore.kernel.org/lkml/CANpmjNOwZ5VpKQn+SYWovTkFB4VsT-RPwyENBmaK0dLcp… Signed-off-by: Kees Cook <keescook(a)chromium.org> Reported-by: Marco Elver <elver(a)google.com> Reported-by: "Lin, Zhenpeng" <zplin(a)psu.edu> Tested-by: Marco Elver <elver(a)google.com> Acked-by: Vlastimil Babka <vbabka(a)suse.cz> Cc: Christoph Lameter <cl(a)linux.com> Cc: David Rientjes <rientjes(a)google.com> Cc: Joonsoo Kim <iamjoonsoo.kim(a)lge.com> Cc: Pekka Enberg <penberg(a)kernel.org> Cc: Roman Gushchin <guro(a)fb.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org> diff --git a/mm/slub.c b/mm/slub.c index f58cfd456548..fe30df460fad 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -3689,7 +3689,6 @@ static int calculate_sizes(struct kmem_cache *s, int forced_order) { slab_flags_t flags = s->flags; unsigned int size = s->object_size; - unsigned int freepointer_area; unsigned int order; /* @@ -3698,13 +3697,6 @@ static int calculate_sizes(struct kmem_cache *s, int forced_order) * the possible location of the free pointer. */ size = ALIGN(size, sizeof(void *)); - /* - * This is the area of the object where a freepointer can be - * safely written. If redzoning adds more to the inuse size, we - * can't use that portion for writing the freepointer, so - * s->offset must be limited within this for the general case. - */ - freepointer_area = size; #ifdef CONFIG_SLUB_DEBUG /* @@ -3730,7 +3722,7 @@ static int calculate_sizes(struct kmem_cache *s, int forced_order) /* * With that we have determined the number of bytes in actual use - * by the object. This is the potential offset to the free pointer. + * by the object and redzoning. */ s->inuse = size; @@ -3753,13 +3745,13 @@ static int calculate_sizes(struct kmem_cache *s, int forced_order) */ s->offset = size; size += sizeof(void *); - } else if (freepointer_area > sizeof(void *)) { + } else { /* * Store freelist pointer near middle of object to keep * it away from the edges of the object to avoid small * sized over/underflows from neighboring allocations. */ - s->offset = ALIGN(freepointer_area / 2, sizeof(void *)); + s->offset = ALIGN_DOWN(s->object_size / 2, sizeof(void *)); } #ifdef CONFIG_SLUB_DEBUG

4 years, 6 months

1
0
0 0

FAILED: patch "[PATCH] mm/slub: fix redzoning for small allocations" failed to apply to 4.19-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.19-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 74c1d3e081533825f2611e46edea1fcdc0701985 Mon Sep 17 00:00:00 2001 From: Kees Cook <keescook(a)chromium.org> Date: Tue, 15 Jun 2021 18:23:22 -0700 Subject: [PATCH] mm/slub: fix redzoning for small allocations The redzone area for SLUB exists between s->object_size and s->inuse (which is at least the word-aligned object_size). If a cache were created with an object_size smaller than sizeof(void *), the in-object stored freelist pointer would overwrite the redzone (e.g. with boot param "slub_debug=ZF"): BUG test (Tainted: G B ): Right Redzone overwritten ----------------------------------------------------------------------------- INFO: 0xffff957ead1c05de-0xffff957ead1c05df @offset=1502. First byte 0x1a instead of 0xbb INFO: Slab 0xffffef3950b47000 objects=170 used=170 fp=0x0000000000000000 flags=0x8000000000000200 INFO: Object 0xffff957ead1c05d8 @offset=1496 fp=0xffff957ead1c0620 Redzone (____ptrval____): bb bb bb bb bb bb bb bb ........ Object (____ptrval____): f6 f4 a5 40 1d e8 ...@.. Redzone (____ptrval____): 1a aa .. Padding (____ptrval____): 00 00 00 00 00 00 00 00 ........ Store the freelist pointer out of line when object_size is smaller than sizeof(void *) and redzoning is enabled. Additionally remove the "smaller than sizeof(void *)" check under CONFIG_DEBUG_VM in kmem_cache_sanity_check() as it is now redundant: SLAB and SLOB both handle small sizes. (Note that no caches within this size range are known to exist in the kernel currently.) Link: https://lkml.kernel.org/r/20210608183955.280836-3-keescook@chromium.org Fixes: 81819f0fc828 ("SLUB core") Signed-off-by: Kees Cook <keescook(a)chromium.org> Acked-by: Vlastimil Babka <vbabka(a)suse.cz> Cc: Christoph Lameter <cl(a)linux.com> Cc: David Rientjes <rientjes(a)google.com> Cc: Joonsoo Kim <iamjoonsoo.kim(a)lge.com> Cc: "Lin, Zhenpeng" <zplin(a)psu.edu> Cc: Marco Elver <elver(a)google.com> Cc: Pekka Enberg <penberg(a)kernel.org> Cc: Roman Gushchin <guro(a)fb.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org> diff --git a/mm/slab_common.c b/mm/slab_common.c index a4a571428c51..7cab77655f11 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -97,8 +97,7 @@ EXPORT_SYMBOL(kmem_cache_size); #ifdef CONFIG_DEBUG_VM static int kmem_cache_sanity_check(const char *name, unsigned int size) { - if (!name || in_interrupt() || size < sizeof(void *) || - size > KMALLOC_MAX_SIZE) { + if (!name || in_interrupt() || size > KMALLOC_MAX_SIZE) { pr_err("kmem_cache_create(%s) integrity check failed\n", name); return -EINVAL; } diff --git a/mm/slub.c b/mm/slub.c index f91d9fe7d0d8..f58cfd456548 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -3734,15 +3734,17 @@ static int calculate_sizes(struct kmem_cache *s, int forced_order) */ s->inuse = size; - if (((flags & (SLAB_TYPESAFE_BY_RCU | SLAB_POISON)) || - s->ctor)) { + if ((flags & (SLAB_TYPESAFE_BY_RCU | SLAB_POISON)) || + ((flags & SLAB_RED_ZONE) && s->object_size < sizeof(void *)) || + s->ctor) { /* * Relocate free pointer after the object if it is not * permitted to overwrite the first word of the object on * kmem_cache_free. * * This is the case if we do RCU, have a constructor or - * destructor or are poisoning the objects. + * destructor, are poisoning the objects, or are + * redzoning an object smaller than sizeof(void *). * * The assumption that s->offset >= s->inuse means free * pointer is outside of the object is used in the

4 years, 6 months

1
0
0 0

FAILED: patch "[PATCH] mm/slub: fix redzoning for small allocations" failed to apply to 4.14-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.14-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 74c1d3e081533825f2611e46edea1fcdc0701985 Mon Sep 17 00:00:00 2001 From: Kees Cook <keescook(a)chromium.org> Date: Tue, 15 Jun 2021 18:23:22 -0700 Subject: [PATCH] mm/slub: fix redzoning for small allocations The redzone area for SLUB exists between s->object_size and s->inuse (which is at least the word-aligned object_size). If a cache were created with an object_size smaller than sizeof(void *), the in-object stored freelist pointer would overwrite the redzone (e.g. with boot param "slub_debug=ZF"): BUG test (Tainted: G B ): Right Redzone overwritten ----------------------------------------------------------------------------- INFO: 0xffff957ead1c05de-0xffff957ead1c05df @offset=1502. First byte 0x1a instead of 0xbb INFO: Slab 0xffffef3950b47000 objects=170 used=170 fp=0x0000000000000000 flags=0x8000000000000200 INFO: Object 0xffff957ead1c05d8 @offset=1496 fp=0xffff957ead1c0620 Redzone (____ptrval____): bb bb bb bb bb bb bb bb ........ Object (____ptrval____): f6 f4 a5 40 1d e8 ...@.. Redzone (____ptrval____): 1a aa .. Padding (____ptrval____): 00 00 00 00 00 00 00 00 ........ Store the freelist pointer out of line when object_size is smaller than sizeof(void *) and redzoning is enabled. Additionally remove the "smaller than sizeof(void *)" check under CONFIG_DEBUG_VM in kmem_cache_sanity_check() as it is now redundant: SLAB and SLOB both handle small sizes. (Note that no caches within this size range are known to exist in the kernel currently.) Link: https://lkml.kernel.org/r/20210608183955.280836-3-keescook@chromium.org Fixes: 81819f0fc828 ("SLUB core") Signed-off-by: Kees Cook <keescook(a)chromium.org> Acked-by: Vlastimil Babka <vbabka(a)suse.cz> Cc: Christoph Lameter <cl(a)linux.com> Cc: David Rientjes <rientjes(a)google.com> Cc: Joonsoo Kim <iamjoonsoo.kim(a)lge.com> Cc: "Lin, Zhenpeng" <zplin(a)psu.edu> Cc: Marco Elver <elver(a)google.com> Cc: Pekka Enberg <penberg(a)kernel.org> Cc: Roman Gushchin <guro(a)fb.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org> diff --git a/mm/slab_common.c b/mm/slab_common.c index a4a571428c51..7cab77655f11 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -97,8 +97,7 @@ EXPORT_SYMBOL(kmem_cache_size); #ifdef CONFIG_DEBUG_VM static int kmem_cache_sanity_check(const char *name, unsigned int size) { - if (!name || in_interrupt() || size < sizeof(void *) || - size > KMALLOC_MAX_SIZE) { + if (!name || in_interrupt() || size > KMALLOC_MAX_SIZE) { pr_err("kmem_cache_create(%s) integrity check failed\n", name); return -EINVAL; } diff --git a/mm/slub.c b/mm/slub.c index f91d9fe7d0d8..f58cfd456548 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -3734,15 +3734,17 @@ static int calculate_sizes(struct kmem_cache *s, int forced_order) */ s->inuse = size; - if (((flags & (SLAB_TYPESAFE_BY_RCU | SLAB_POISON)) || - s->ctor)) { + if ((flags & (SLAB_TYPESAFE_BY_RCU | SLAB_POISON)) || + ((flags & SLAB_RED_ZONE) && s->object_size < sizeof(void *)) || + s->ctor) { /* * Relocate free pointer after the object if it is not * permitted to overwrite the first word of the object on * kmem_cache_free. * * This is the case if we do RCU, have a constructor or - * destructor or are poisoning the objects. + * destructor, are poisoning the objects, or are + * redzoning an object smaller than sizeof(void *). * * The assumption that s->offset >= s->inuse means free * pointer is outside of the object is used in the

4 years, 6 months

1
0
0 0

FAILED: patch "[PATCH] mm/slub: fix redzoning for small allocations" failed to apply to 4.9-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.9-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 74c1d3e081533825f2611e46edea1fcdc0701985 Mon Sep 17 00:00:00 2001 From: Kees Cook <keescook(a)chromium.org> Date: Tue, 15 Jun 2021 18:23:22 -0700 Subject: [PATCH] mm/slub: fix redzoning for small allocations The redzone area for SLUB exists between s->object_size and s->inuse (which is at least the word-aligned object_size). If a cache were created with an object_size smaller than sizeof(void *), the in-object stored freelist pointer would overwrite the redzone (e.g. with boot param "slub_debug=ZF"): BUG test (Tainted: G B ): Right Redzone overwritten ----------------------------------------------------------------------------- INFO: 0xffff957ead1c05de-0xffff957ead1c05df @offset=1502. First byte 0x1a instead of 0xbb INFO: Slab 0xffffef3950b47000 objects=170 used=170 fp=0x0000000000000000 flags=0x8000000000000200 INFO: Object 0xffff957ead1c05d8 @offset=1496 fp=0xffff957ead1c0620 Redzone (____ptrval____): bb bb bb bb bb bb bb bb ........ Object (____ptrval____): f6 f4 a5 40 1d e8 ...@.. Redzone (____ptrval____): 1a aa .. Padding (____ptrval____): 00 00 00 00 00 00 00 00 ........ Store the freelist pointer out of line when object_size is smaller than sizeof(void *) and redzoning is enabled. Additionally remove the "smaller than sizeof(void *)" check under CONFIG_DEBUG_VM in kmem_cache_sanity_check() as it is now redundant: SLAB and SLOB both handle small sizes. (Note that no caches within this size range are known to exist in the kernel currently.) Link: https://lkml.kernel.org/r/20210608183955.280836-3-keescook@chromium.org Fixes: 81819f0fc828 ("SLUB core") Signed-off-by: Kees Cook <keescook(a)chromium.org> Acked-by: Vlastimil Babka <vbabka(a)suse.cz> Cc: Christoph Lameter <cl(a)linux.com> Cc: David Rientjes <rientjes(a)google.com> Cc: Joonsoo Kim <iamjoonsoo.kim(a)lge.com> Cc: "Lin, Zhenpeng" <zplin(a)psu.edu> Cc: Marco Elver <elver(a)google.com> Cc: Pekka Enberg <penberg(a)kernel.org> Cc: Roman Gushchin <guro(a)fb.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org> diff --git a/mm/slab_common.c b/mm/slab_common.c index a4a571428c51..7cab77655f11 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -97,8 +97,7 @@ EXPORT_SYMBOL(kmem_cache_size); #ifdef CONFIG_DEBUG_VM static int kmem_cache_sanity_check(const char *name, unsigned int size) { - if (!name || in_interrupt() || size < sizeof(void *) || - size > KMALLOC_MAX_SIZE) { + if (!name || in_interrupt() || size > KMALLOC_MAX_SIZE) { pr_err("kmem_cache_create(%s) integrity check failed\n", name); return -EINVAL; } diff --git a/mm/slub.c b/mm/slub.c index f91d9fe7d0d8..f58cfd456548 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -3734,15 +3734,17 @@ static int calculate_sizes(struct kmem_cache *s, int forced_order) */ s->inuse = size; - if (((flags & (SLAB_TYPESAFE_BY_RCU | SLAB_POISON)) || - s->ctor)) { + if ((flags & (SLAB_TYPESAFE_BY_RCU | SLAB_POISON)) || + ((flags & SLAB_RED_ZONE) && s->object_size < sizeof(void *)) || + s->ctor) { /* * Relocate free pointer after the object if it is not * permitted to overwrite the first word of the object on * kmem_cache_free. * * This is the case if we do RCU, have a constructor or - * destructor or are poisoning the objects. + * destructor, are poisoning the objects, or are + * redzoning an object smaller than sizeof(void *). * * The assumption that s->offset >= s->inuse means free * pointer is outside of the object is used in the

4 years, 6 months

1
0
0 0

FAILED: patch "[PATCH] mm/slub: fix redzoning for small allocations" failed to apply to 4.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 74c1d3e081533825f2611e46edea1fcdc0701985 Mon Sep 17 00:00:00 2001 From: Kees Cook <keescook(a)chromium.org> Date: Tue, 15 Jun 2021 18:23:22 -0700 Subject: [PATCH] mm/slub: fix redzoning for small allocations The redzone area for SLUB exists between s->object_size and s->inuse (which is at least the word-aligned object_size). If a cache were created with an object_size smaller than sizeof(void *), the in-object stored freelist pointer would overwrite the redzone (e.g. with boot param "slub_debug=ZF"): BUG test (Tainted: G B ): Right Redzone overwritten ----------------------------------------------------------------------------- INFO: 0xffff957ead1c05de-0xffff957ead1c05df @offset=1502. First byte 0x1a instead of 0xbb INFO: Slab 0xffffef3950b47000 objects=170 used=170 fp=0x0000000000000000 flags=0x8000000000000200 INFO: Object 0xffff957ead1c05d8 @offset=1496 fp=0xffff957ead1c0620 Redzone (____ptrval____): bb bb bb bb bb bb bb bb ........ Object (____ptrval____): f6 f4 a5 40 1d e8 ...@.. Redzone (____ptrval____): 1a aa .. Padding (____ptrval____): 00 00 00 00 00 00 00 00 ........ Store the freelist pointer out of line when object_size is smaller than sizeof(void *) and redzoning is enabled. Additionally remove the "smaller than sizeof(void *)" check under CONFIG_DEBUG_VM in kmem_cache_sanity_check() as it is now redundant: SLAB and SLOB both handle small sizes. (Note that no caches within this size range are known to exist in the kernel currently.) Link: https://lkml.kernel.org/r/20210608183955.280836-3-keescook@chromium.org Fixes: 81819f0fc828 ("SLUB core") Signed-off-by: Kees Cook <keescook(a)chromium.org> Acked-by: Vlastimil Babka <vbabka(a)suse.cz> Cc: Christoph Lameter <cl(a)linux.com> Cc: David Rientjes <rientjes(a)google.com> Cc: Joonsoo Kim <iamjoonsoo.kim(a)lge.com> Cc: "Lin, Zhenpeng" <zplin(a)psu.edu> Cc: Marco Elver <elver(a)google.com> Cc: Pekka Enberg <penberg(a)kernel.org> Cc: Roman Gushchin <guro(a)fb.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org> diff --git a/mm/slab_common.c b/mm/slab_common.c index a4a571428c51..7cab77655f11 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -97,8 +97,7 @@ EXPORT_SYMBOL(kmem_cache_size); #ifdef CONFIG_DEBUG_VM static int kmem_cache_sanity_check(const char *name, unsigned int size) { - if (!name || in_interrupt() || size < sizeof(void *) || - size > KMALLOC_MAX_SIZE) { + if (!name || in_interrupt() || size > KMALLOC_MAX_SIZE) { pr_err("kmem_cache_create(%s) integrity check failed\n", name); return -EINVAL; } diff --git a/mm/slub.c b/mm/slub.c index f91d9fe7d0d8..f58cfd456548 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -3734,15 +3734,17 @@ static int calculate_sizes(struct kmem_cache *s, int forced_order) */ s->inuse = size; - if (((flags & (SLAB_TYPESAFE_BY_RCU | SLAB_POISON)) || - s->ctor)) { + if ((flags & (SLAB_TYPESAFE_BY_RCU | SLAB_POISON)) || + ((flags & SLAB_RED_ZONE) && s->object_size < sizeof(void *)) || + s->ctor) { /* * Relocate free pointer after the object if it is not * permitted to overwrite the first word of the object on * kmem_cache_free. * * This is the case if we do RCU, have a constructor or - * destructor or are poisoning the objects. + * destructor, are poisoning the objects, or are + * redzoning an object smaller than sizeof(void *). * * The assumption that s->offset >= s->inuse means free * pointer is outside of the object is used in the

4 years, 6 months

1
0
0 0

FAILED: patch "[PATCH] mm/hugetlb: expand restore_reserve_on_error functionality" failed to apply to 5.10-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.10-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 846be08578edb81f02bc8534577e6c367ef34f41 Mon Sep 17 00:00:00 2001 From: Mike Kravetz <mike.kravetz(a)oracle.com> Date: Tue, 15 Jun 2021 18:23:29 -0700 Subject: [PATCH] mm/hugetlb: expand restore_reserve_on_error functionality The routine restore_reserve_on_error is called to restore reservation information when an error occurs after page allocation. The routine alloc_huge_page modifies the mapping reserve map and potentially the reserve count during allocation. If code calling alloc_huge_page encounters an error after allocation and needs to free the page, the reservation information needs to be adjusted. Currently, restore_reserve_on_error only takes action on pages for which the reserve count was adjusted(HPageRestoreReserve flag). There is nothing wrong with these adjustments. However, alloc_huge_page ALWAYS modifies the reserve map during allocation even if the reserve count is not adjusted. This can cause issues as observed during development of this patch [1]. One specific series of operations causing an issue is: - Create a shared hugetlb mapping Reservations for all pages created by default - Fault in a page in the mapping Reservation exists so reservation count is decremented - Punch a hole in the file/mapping at index previously faulted Reservation and any associated pages will be removed - Allocate a page to fill the hole No reservation entry, so reserve count unmodified Reservation entry added to map by alloc_huge_page - Error after allocation and before instantiating the page Reservation entry remains in map - Allocate a page to fill the hole Reservation entry exists, so decrement reservation count This will cause a reservation count underflow as the reservation count was decremented twice for the same index. A user would observe a very large number for HugePages_Rsvd in /proc/meminfo. This would also likely cause subsequent allocations of hugetlb pages to fail as it would 'appear' that all pages are reserved. This sequence of operations is unlikely to happen, however they were easily reproduced and observed using hacked up code as described in [1]. Address the issue by having the routine restore_reserve_on_error take action on pages where HPageRestoreReserve is not set. In this case, we need to remove any reserve map entry created by alloc_huge_page. A new helper routine vma_del_reservation assists with this operation. There are three callers of alloc_huge_page which do not currently call restore_reserve_on error before freeing a page on error paths. Add those missing calls. [1] https://lore.kernel.org/linux-mm/20210528005029.88088-1-almasrymina@google.… Link: https://lkml.kernel.org/r/20210607204510.22617-1-mike.kravetz@oracle.com Fixes: 96b96a96ddee ("mm/hugetlb: fix huge page reservation leak in private mapping error paths" Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com> Reviewed-by: Mina Almasry <almasrymina(a)google.com> Cc: Axel Rasmussen <axelrasmussen(a)google.com> Cc: Peter Xu <peterx(a)redhat.com> Cc: Muchun Song <songmuchun(a)bytedance.com> Cc: Michal Hocko <mhocko(a)suse.com> Cc: Naoya Horiguchi <naoya.horiguchi(a)nec.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 55efd3dd04f6..30dee68458c7 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -735,6 +735,7 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset, __SetPageUptodate(page); error = huge_add_to_page_cache(page, mapping, index); if (unlikely(error)) { + restore_reserve_on_error(h, &pseudo_vma, addr, page); put_page(page); mutex_unlock(&hugetlb_fault_mutex_table[hash]); goto out; diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 790ae618548d..6504346a1947 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -610,6 +610,8 @@ struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma, unsigned long address); int huge_add_to_page_cache(struct page *page, struct address_space *mapping, pgoff_t idx); +void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma, + unsigned long address, struct page *page); /* arch callback */ int __init __alloc_bootmem_huge_page(struct hstate *h); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 85f42ec1a927..e0a5f9cbbece 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -2121,12 +2121,18 @@ static void return_unused_surplus_pages(struct hstate *h, * be restored when a newly allocated huge page must be freed. It is * to be called after calling vma_needs_reservation to determine if a * reservation exists. + * + * vma_del_reservation is used in error paths where an entry in the reserve + * map was created during huge page allocation and must be removed. It is to + * be called after calling vma_needs_reservation to determine if a reservation + * exists. */ enum vma_resv_mode { VMA_NEEDS_RESV, VMA_COMMIT_RESV, VMA_END_RESV, VMA_ADD_RESV, + VMA_DEL_RESV, }; static long __vma_reservation_common(struct hstate *h, struct vm_area_struct *vma, unsigned long addr, @@ -2170,11 +2176,21 @@ static long __vma_reservation_common(struct hstate *h, ret = region_del(resv, idx, idx + 1); } break; + case VMA_DEL_RESV: + if (vma->vm_flags & VM_MAYSHARE) { + region_abort(resv, idx, idx + 1, 1); + ret = region_del(resv, idx, idx + 1); + } else { + ret = region_add(resv, idx, idx + 1, 1, NULL, NULL); + /* region_add calls of range 1 should never fail. */ + VM_BUG_ON(ret < 0); + } + break; default: BUG(); } - if (vma->vm_flags & VM_MAYSHARE) + if (vma->vm_flags & VM_MAYSHARE || mode == VMA_DEL_RESV) return ret; /* * We know private mapping must have HPAGE_RESV_OWNER set. @@ -2222,25 +2238,39 @@ static long vma_add_reservation(struct hstate *h, return __vma_reservation_common(h, vma, addr, VMA_ADD_RESV); } +static long vma_del_reservation(struct hstate *h, + struct vm_area_struct *vma, unsigned long addr) +{ + return __vma_reservation_common(h, vma, addr, VMA_DEL_RESV); +} + /* - * This routine is called to restore a reservation on error paths. In the - * specific error paths, a huge page was allocated (via alloc_huge_page) - * and is about to be freed. If a reservation for the page existed, - * alloc_huge_page would have consumed the reservation and set - * HPageRestoreReserve in the newly allocated page. When the page is freed - * via free_huge_page, the global reservation count will be incremented if - * HPageRestoreReserve is set. However, free_huge_page can not adjust the - * reserve map. Adjust the reserve map here to be consistent with global - * reserve count adjustments to be made by free_huge_page. + * This routine is called to restore reservation information on error paths. + * It should ONLY be called for pages allocated via alloc_huge_page(), and + * the hugetlb mutex should remain held when calling this routine. + * + * It handles two specific cases: + * 1) A reservation was in place and the page consumed the reservation. + * HPageRestoreReserve is set in the page. + * 2) No reservation was in place for the page, so HPageRestoreReserve is + * not set. However, alloc_huge_page always updates the reserve map. + * + * In case 1, free_huge_page later in the error path will increment the + * global reserve count. But, free_huge_page does not have enough context + * to adjust the reservation map. This case deals primarily with private + * mappings. Adjust the reserve map here to be consistent with global + * reserve count adjustments to be made by free_huge_page. Make sure the + * reserve map indicates there is a reservation present. + * + * In case 2, simply undo reserve map modifications done by alloc_huge_page. */ -static void restore_reserve_on_error(struct hstate *h, - struct vm_area_struct *vma, unsigned long address, - struct page *page) +void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma, + unsigned long address, struct page *page) { - if (unlikely(HPageRestoreReserve(page))) { - long rc = vma_needs_reservation(h, vma, address); + long rc = vma_needs_reservation(h, vma, address); - if (unlikely(rc < 0)) { + if (HPageRestoreReserve(page)) { + if (unlikely(rc < 0)) /* * Rare out of memory condition in reserve map * manipulation. Clear HPageRestoreReserve so that @@ -2253,16 +2283,57 @@ static void restore_reserve_on_error(struct hstate *h, * accounting of reserve counts. */ ClearHPageRestoreReserve(page); - } else if (rc) { - rc = vma_add_reservation(h, vma, address); - if (unlikely(rc < 0)) + else if (rc) + (void)vma_add_reservation(h, vma, address); + else + vma_end_reservation(h, vma, address); + } else { + if (!rc) { + /* + * This indicates there is an entry in the reserve map + * added by alloc_huge_page. We know it was added + * before the alloc_huge_page call, otherwise + * HPageRestoreReserve would be set on the page. + * Remove the entry so that a subsequent allocation + * does not consume a reservation. + */ + rc = vma_del_reservation(h, vma, address); + if (rc < 0) /* - * See above comment about rare out of - * memory condition. + * VERY rare out of memory condition. Since + * we can not delete the entry, set + * HPageRestoreReserve so that the reserve + * count will be incremented when the page + * is freed. This reserve will be consumed + * on a subsequent allocation. */ - ClearHPageRestoreReserve(page); + SetHPageRestoreReserve(page); + } else if (rc < 0) { + /* + * Rare out of memory condition from + * vma_needs_reservation call. Memory allocation is + * only attempted if a new entry is needed. Therefore, + * this implies there is not an entry in the + * reserve map. + * + * For shared mappings, no entry in the map indicates + * no reservation. We are done. + */ + if (!(vma->vm_flags & VM_MAYSHARE)) + /* + * For private mappings, no entry indicates + * a reservation is present. Since we can + * not add an entry, set SetHPageRestoreReserve + * on the page so reserve count will be + * incremented when freed. This reserve will + * be consumed on a subsequent allocation. + */ + SetHPageRestoreReserve(page); } else - vma_end_reservation(h, vma, address); + /* + * No reservation present, do nothing + */ + vma_end_reservation(h, vma, address); } } @@ -4037,6 +4108,8 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING); entry = huge_ptep_get(src_pte); if (!pte_same(src_pte_old, entry)) { + restore_reserve_on_error(h, vma, addr, + new); put_page(new); /* dst_entry won't change as in child */ goto again; @@ -5006,6 +5079,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, if (vm_shared || is_continue) unlock_page(page); out_release_nounlock: + restore_reserve_on_error(h, dst_vma, dst_addr, page); put_page(page); goto out; }

4 years, 6 months

1
0
0 0

FAILED: patch "[PATCH] mm/hugetlb: expand restore_reserve_on_error functionality" failed to apply to 5.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 846be08578edb81f02bc8534577e6c367ef34f41 Mon Sep 17 00:00:00 2001 From: Mike Kravetz <mike.kravetz(a)oracle.com> Date: Tue, 15 Jun 2021 18:23:29 -0700 Subject: [PATCH] mm/hugetlb: expand restore_reserve_on_error functionality The routine restore_reserve_on_error is called to restore reservation information when an error occurs after page allocation. The routine alloc_huge_page modifies the mapping reserve map and potentially the reserve count during allocation. If code calling alloc_huge_page encounters an error after allocation and needs to free the page, the reservation information needs to be adjusted. Currently, restore_reserve_on_error only takes action on pages for which the reserve count was adjusted(HPageRestoreReserve flag). There is nothing wrong with these adjustments. However, alloc_huge_page ALWAYS modifies the reserve map during allocation even if the reserve count is not adjusted. This can cause issues as observed during development of this patch [1]. One specific series of operations causing an issue is: - Create a shared hugetlb mapping Reservations for all pages created by default - Fault in a page in the mapping Reservation exists so reservation count is decremented - Punch a hole in the file/mapping at index previously faulted Reservation and any associated pages will be removed - Allocate a page to fill the hole No reservation entry, so reserve count unmodified Reservation entry added to map by alloc_huge_page - Error after allocation and before instantiating the page Reservation entry remains in map - Allocate a page to fill the hole Reservation entry exists, so decrement reservation count This will cause a reservation count underflow as the reservation count was decremented twice for the same index. A user would observe a very large number for HugePages_Rsvd in /proc/meminfo. This would also likely cause subsequent allocations of hugetlb pages to fail as it would 'appear' that all pages are reserved. This sequence of operations is unlikely to happen, however they were easily reproduced and observed using hacked up code as described in [1]. Address the issue by having the routine restore_reserve_on_error take action on pages where HPageRestoreReserve is not set. In this case, we need to remove any reserve map entry created by alloc_huge_page. A new helper routine vma_del_reservation assists with this operation. There are three callers of alloc_huge_page which do not currently call restore_reserve_on error before freeing a page on error paths. Add those missing calls. [1] https://lore.kernel.org/linux-mm/20210528005029.88088-1-almasrymina@google.… Link: https://lkml.kernel.org/r/20210607204510.22617-1-mike.kravetz@oracle.com Fixes: 96b96a96ddee ("mm/hugetlb: fix huge page reservation leak in private mapping error paths" Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com> Reviewed-by: Mina Almasry <almasrymina(a)google.com> Cc: Axel Rasmussen <axelrasmussen(a)google.com> Cc: Peter Xu <peterx(a)redhat.com> Cc: Muchun Song <songmuchun(a)bytedance.com> Cc: Michal Hocko <mhocko(a)suse.com> Cc: Naoya Horiguchi <naoya.horiguchi(a)nec.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 55efd3dd04f6..30dee68458c7 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -735,6 +735,7 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset, __SetPageUptodate(page); error = huge_add_to_page_cache(page, mapping, index); if (unlikely(error)) { + restore_reserve_on_error(h, &pseudo_vma, addr, page); put_page(page); mutex_unlock(&hugetlb_fault_mutex_table[hash]); goto out; diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 790ae618548d..6504346a1947 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -610,6 +610,8 @@ struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma, unsigned long address); int huge_add_to_page_cache(struct page *page, struct address_space *mapping, pgoff_t idx); +void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma, + unsigned long address, struct page *page); /* arch callback */ int __init __alloc_bootmem_huge_page(struct hstate *h); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 85f42ec1a927..e0a5f9cbbece 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -2121,12 +2121,18 @@ static void return_unused_surplus_pages(struct hstate *h, * be restored when a newly allocated huge page must be freed. It is * to be called after calling vma_needs_reservation to determine if a * reservation exists. + * + * vma_del_reservation is used in error paths where an entry in the reserve + * map was created during huge page allocation and must be removed. It is to + * be called after calling vma_needs_reservation to determine if a reservation + * exists. */ enum vma_resv_mode { VMA_NEEDS_RESV, VMA_COMMIT_RESV, VMA_END_RESV, VMA_ADD_RESV, + VMA_DEL_RESV, }; static long __vma_reservation_common(struct hstate *h, struct vm_area_struct *vma, unsigned long addr, @@ -2170,11 +2176,21 @@ static long __vma_reservation_common(struct hstate *h, ret = region_del(resv, idx, idx + 1); } break; + case VMA_DEL_RESV: + if (vma->vm_flags & VM_MAYSHARE) { + region_abort(resv, idx, idx + 1, 1); + ret = region_del(resv, idx, idx + 1); + } else { + ret = region_add(resv, idx, idx + 1, 1, NULL, NULL); + /* region_add calls of range 1 should never fail. */ + VM_BUG_ON(ret < 0); + } + break; default: BUG(); } - if (vma->vm_flags & VM_MAYSHARE) + if (vma->vm_flags & VM_MAYSHARE || mode == VMA_DEL_RESV) return ret; /* * We know private mapping must have HPAGE_RESV_OWNER set. @@ -2222,25 +2238,39 @@ static long vma_add_reservation(struct hstate *h, return __vma_reservation_common(h, vma, addr, VMA_ADD_RESV); } +static long vma_del_reservation(struct hstate *h, + struct vm_area_struct *vma, unsigned long addr) +{ + return __vma_reservation_common(h, vma, addr, VMA_DEL_RESV); +} + /* - * This routine is called to restore a reservation on error paths. In the - * specific error paths, a huge page was allocated (via alloc_huge_page) - * and is about to be freed. If a reservation for the page existed, - * alloc_huge_page would have consumed the reservation and set - * HPageRestoreReserve in the newly allocated page. When the page is freed - * via free_huge_page, the global reservation count will be incremented if - * HPageRestoreReserve is set. However, free_huge_page can not adjust the - * reserve map. Adjust the reserve map here to be consistent with global - * reserve count adjustments to be made by free_huge_page. + * This routine is called to restore reservation information on error paths. + * It should ONLY be called for pages allocated via alloc_huge_page(), and + * the hugetlb mutex should remain held when calling this routine. + * + * It handles two specific cases: + * 1) A reservation was in place and the page consumed the reservation. + * HPageRestoreReserve is set in the page. + * 2) No reservation was in place for the page, so HPageRestoreReserve is + * not set. However, alloc_huge_page always updates the reserve map. + * + * In case 1, free_huge_page later in the error path will increment the + * global reserve count. But, free_huge_page does not have enough context + * to adjust the reservation map. This case deals primarily with private + * mappings. Adjust the reserve map here to be consistent with global + * reserve count adjustments to be made by free_huge_page. Make sure the + * reserve map indicates there is a reservation present. + * + * In case 2, simply undo reserve map modifications done by alloc_huge_page. */ -static void restore_reserve_on_error(struct hstate *h, - struct vm_area_struct *vma, unsigned long address, - struct page *page) +void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma, + unsigned long address, struct page *page) { - if (unlikely(HPageRestoreReserve(page))) { - long rc = vma_needs_reservation(h, vma, address); + long rc = vma_needs_reservation(h, vma, address); - if (unlikely(rc < 0)) { + if (HPageRestoreReserve(page)) { + if (unlikely(rc < 0)) /* * Rare out of memory condition in reserve map * manipulation. Clear HPageRestoreReserve so that @@ -2253,16 +2283,57 @@ static void restore_reserve_on_error(struct hstate *h, * accounting of reserve counts. */ ClearHPageRestoreReserve(page); - } else if (rc) { - rc = vma_add_reservation(h, vma, address); - if (unlikely(rc < 0)) + else if (rc) + (void)vma_add_reservation(h, vma, address); + else + vma_end_reservation(h, vma, address); + } else { + if (!rc) { + /* + * This indicates there is an entry in the reserve map + * added by alloc_huge_page. We know it was added + * before the alloc_huge_page call, otherwise + * HPageRestoreReserve would be set on the page. + * Remove the entry so that a subsequent allocation + * does not consume a reservation. + */ + rc = vma_del_reservation(h, vma, address); + if (rc < 0) /* - * See above comment about rare out of - * memory condition. + * VERY rare out of memory condition. Since + * we can not delete the entry, set + * HPageRestoreReserve so that the reserve + * count will be incremented when the page + * is freed. This reserve will be consumed + * on a subsequent allocation. */ - ClearHPageRestoreReserve(page); + SetHPageRestoreReserve(page); + } else if (rc < 0) { + /* + * Rare out of memory condition from + * vma_needs_reservation call. Memory allocation is + * only attempted if a new entry is needed. Therefore, + * this implies there is not an entry in the + * reserve map. + * + * For shared mappings, no entry in the map indicates + * no reservation. We are done. + */ + if (!(vma->vm_flags & VM_MAYSHARE)) + /* + * For private mappings, no entry indicates + * a reservation is present. Since we can + * not add an entry, set SetHPageRestoreReserve + * on the page so reserve count will be + * incremented when freed. This reserve will + * be consumed on a subsequent allocation. + */ + SetHPageRestoreReserve(page); } else - vma_end_reservation(h, vma, address); + /* + * No reservation present, do nothing + */ + vma_end_reservation(h, vma, address); } } @@ -4037,6 +4108,8 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING); entry = huge_ptep_get(src_pte); if (!pte_same(src_pte_old, entry)) { + restore_reserve_on_error(h, vma, addr, + new); put_page(new); /* dst_entry won't change as in child */ goto again; @@ -5006,6 +5079,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, if (vm_shared || is_continue) unlock_page(page); out_release_nounlock: + restore_reserve_on_error(h, dst_vma, dst_addr, page); put_page(page); goto out; }

4 years, 6 months

1
0
0 0

FAILED: patch "[PATCH] mm/hugetlb: expand restore_reserve_on_error functionality" failed to apply to 4.19-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.19-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 846be08578edb81f02bc8534577e6c367ef34f41 Mon Sep 17 00:00:00 2001 From: Mike Kravetz <mike.kravetz(a)oracle.com> Date: Tue, 15 Jun 2021 18:23:29 -0700 Subject: [PATCH] mm/hugetlb: expand restore_reserve_on_error functionality The routine restore_reserve_on_error is called to restore reservation information when an error occurs after page allocation. The routine alloc_huge_page modifies the mapping reserve map and potentially the reserve count during allocation. If code calling alloc_huge_page encounters an error after allocation and needs to free the page, the reservation information needs to be adjusted. Currently, restore_reserve_on_error only takes action on pages for which the reserve count was adjusted(HPageRestoreReserve flag). There is nothing wrong with these adjustments. However, alloc_huge_page ALWAYS modifies the reserve map during allocation even if the reserve count is not adjusted. This can cause issues as observed during development of this patch [1]. One specific series of operations causing an issue is: - Create a shared hugetlb mapping Reservations for all pages created by default - Fault in a page in the mapping Reservation exists so reservation count is decremented - Punch a hole in the file/mapping at index previously faulted Reservation and any associated pages will be removed - Allocate a page to fill the hole No reservation entry, so reserve count unmodified Reservation entry added to map by alloc_huge_page - Error after allocation and before instantiating the page Reservation entry remains in map - Allocate a page to fill the hole Reservation entry exists, so decrement reservation count This will cause a reservation count underflow as the reservation count was decremented twice for the same index. A user would observe a very large number for HugePages_Rsvd in /proc/meminfo. This would also likely cause subsequent allocations of hugetlb pages to fail as it would 'appear' that all pages are reserved. This sequence of operations is unlikely to happen, however they were easily reproduced and observed using hacked up code as described in [1]. Address the issue by having the routine restore_reserve_on_error take action on pages where HPageRestoreReserve is not set. In this case, we need to remove any reserve map entry created by alloc_huge_page. A new helper routine vma_del_reservation assists with this operation. There are three callers of alloc_huge_page which do not currently call restore_reserve_on error before freeing a page on error paths. Add those missing calls. [1] https://lore.kernel.org/linux-mm/20210528005029.88088-1-almasrymina@google.… Link: https://lkml.kernel.org/r/20210607204510.22617-1-mike.kravetz@oracle.com Fixes: 96b96a96ddee ("mm/hugetlb: fix huge page reservation leak in private mapping error paths" Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com> Reviewed-by: Mina Almasry <almasrymina(a)google.com> Cc: Axel Rasmussen <axelrasmussen(a)google.com> Cc: Peter Xu <peterx(a)redhat.com> Cc: Muchun Song <songmuchun(a)bytedance.com> Cc: Michal Hocko <mhocko(a)suse.com> Cc: Naoya Horiguchi <naoya.horiguchi(a)nec.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 55efd3dd04f6..30dee68458c7 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -735,6 +735,7 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset, __SetPageUptodate(page); error = huge_add_to_page_cache(page, mapping, index); if (unlikely(error)) { + restore_reserve_on_error(h, &pseudo_vma, addr, page); put_page(page); mutex_unlock(&hugetlb_fault_mutex_table[hash]); goto out; diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 790ae618548d..6504346a1947 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -610,6 +610,8 @@ struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma, unsigned long address); int huge_add_to_page_cache(struct page *page, struct address_space *mapping, pgoff_t idx); +void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma, + unsigned long address, struct page *page); /* arch callback */ int __init __alloc_bootmem_huge_page(struct hstate *h); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 85f42ec1a927..e0a5f9cbbece 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -2121,12 +2121,18 @@ static void return_unused_surplus_pages(struct hstate *h, * be restored when a newly allocated huge page must be freed. It is * to be called after calling vma_needs_reservation to determine if a * reservation exists. + * + * vma_del_reservation is used in error paths where an entry in the reserve + * map was created during huge page allocation and must be removed. It is to + * be called after calling vma_needs_reservation to determine if a reservation + * exists. */ enum vma_resv_mode { VMA_NEEDS_RESV, VMA_COMMIT_RESV, VMA_END_RESV, VMA_ADD_RESV, + VMA_DEL_RESV, }; static long __vma_reservation_common(struct hstate *h, struct vm_area_struct *vma, unsigned long addr, @@ -2170,11 +2176,21 @@ static long __vma_reservation_common(struct hstate *h, ret = region_del(resv, idx, idx + 1); } break; + case VMA_DEL_RESV: + if (vma->vm_flags & VM_MAYSHARE) { + region_abort(resv, idx, idx + 1, 1); + ret = region_del(resv, idx, idx + 1); + } else { + ret = region_add(resv, idx, idx + 1, 1, NULL, NULL); + /* region_add calls of range 1 should never fail. */ + VM_BUG_ON(ret < 0); + } + break; default: BUG(); } - if (vma->vm_flags & VM_MAYSHARE) + if (vma->vm_flags & VM_MAYSHARE || mode == VMA_DEL_RESV) return ret; /* * We know private mapping must have HPAGE_RESV_OWNER set. @@ -2222,25 +2238,39 @@ static long vma_add_reservation(struct hstate *h, return __vma_reservation_common(h, vma, addr, VMA_ADD_RESV); } +static long vma_del_reservation(struct hstate *h, + struct vm_area_struct *vma, unsigned long addr) +{ + return __vma_reservation_common(h, vma, addr, VMA_DEL_RESV); +} + /* - * This routine is called to restore a reservation on error paths. In the - * specific error paths, a huge page was allocated (via alloc_huge_page) - * and is about to be freed. If a reservation for the page existed, - * alloc_huge_page would have consumed the reservation and set - * HPageRestoreReserve in the newly allocated page. When the page is freed - * via free_huge_page, the global reservation count will be incremented if - * HPageRestoreReserve is set. However, free_huge_page can not adjust the - * reserve map. Adjust the reserve map here to be consistent with global - * reserve count adjustments to be made by free_huge_page. + * This routine is called to restore reservation information on error paths. + * It should ONLY be called for pages allocated via alloc_huge_page(), and + * the hugetlb mutex should remain held when calling this routine. + * + * It handles two specific cases: + * 1) A reservation was in place and the page consumed the reservation. + * HPageRestoreReserve is set in the page. + * 2) No reservation was in place for the page, so HPageRestoreReserve is + * not set. However, alloc_huge_page always updates the reserve map. + * + * In case 1, free_huge_page later in the error path will increment the + * global reserve count. But, free_huge_page does not have enough context + * to adjust the reservation map. This case deals primarily with private + * mappings. Adjust the reserve map here to be consistent with global + * reserve count adjustments to be made by free_huge_page. Make sure the + * reserve map indicates there is a reservation present. + * + * In case 2, simply undo reserve map modifications done by alloc_huge_page. */ -static void restore_reserve_on_error(struct hstate *h, - struct vm_area_struct *vma, unsigned long address, - struct page *page) +void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma, + unsigned long address, struct page *page) { - if (unlikely(HPageRestoreReserve(page))) { - long rc = vma_needs_reservation(h, vma, address); + long rc = vma_needs_reservation(h, vma, address); - if (unlikely(rc < 0)) { + if (HPageRestoreReserve(page)) { + if (unlikely(rc < 0)) /* * Rare out of memory condition in reserve map * manipulation. Clear HPageRestoreReserve so that @@ -2253,16 +2283,57 @@ static void restore_reserve_on_error(struct hstate *h, * accounting of reserve counts. */ ClearHPageRestoreReserve(page); - } else if (rc) { - rc = vma_add_reservation(h, vma, address); - if (unlikely(rc < 0)) + else if (rc) + (void)vma_add_reservation(h, vma, address); + else + vma_end_reservation(h, vma, address); + } else { + if (!rc) { + /* + * This indicates there is an entry in the reserve map + * added by alloc_huge_page. We know it was added + * before the alloc_huge_page call, otherwise + * HPageRestoreReserve would be set on the page. + * Remove the entry so that a subsequent allocation + * does not consume a reservation. + */ + rc = vma_del_reservation(h, vma, address); + if (rc < 0) /* - * See above comment about rare out of - * memory condition. + * VERY rare out of memory condition. Since + * we can not delete the entry, set + * HPageRestoreReserve so that the reserve + * count will be incremented when the page + * is freed. This reserve will be consumed + * on a subsequent allocation. */ - ClearHPageRestoreReserve(page); + SetHPageRestoreReserve(page); + } else if (rc < 0) { + /* + * Rare out of memory condition from + * vma_needs_reservation call. Memory allocation is + * only attempted if a new entry is needed. Therefore, + * this implies there is not an entry in the + * reserve map. + * + * For shared mappings, no entry in the map indicates + * no reservation. We are done. + */ + if (!(vma->vm_flags & VM_MAYSHARE)) + /* + * For private mappings, no entry indicates + * a reservation is present. Since we can + * not add an entry, set SetHPageRestoreReserve + * on the page so reserve count will be + * incremented when freed. This reserve will + * be consumed on a subsequent allocation. + */ + SetHPageRestoreReserve(page); } else - vma_end_reservation(h, vma, address); + /* + * No reservation present, do nothing + */ + vma_end_reservation(h, vma, address); } } @@ -4037,6 +4108,8 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING); entry = huge_ptep_get(src_pte); if (!pte_same(src_pte_old, entry)) { + restore_reserve_on_error(h, vma, addr, + new); put_page(new); /* dst_entry won't change as in child */ goto again; @@ -5006,6 +5079,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, if (vm_shared || is_continue) unlock_page(page); out_release_nounlock: + restore_reserve_on_error(h, dst_vma, dst_addr, page); put_page(page); goto out; }

4 years, 6 months

1
0
0 0

FAILED: patch "[PATCH] mm/hugetlb: expand restore_reserve_on_error functionality" failed to apply to 4.14-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.14-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 846be08578edb81f02bc8534577e6c367ef34f41 Mon Sep 17 00:00:00 2001 From: Mike Kravetz <mike.kravetz(a)oracle.com> Date: Tue, 15 Jun 2021 18:23:29 -0700 Subject: [PATCH] mm/hugetlb: expand restore_reserve_on_error functionality The routine restore_reserve_on_error is called to restore reservation information when an error occurs after page allocation. The routine alloc_huge_page modifies the mapping reserve map and potentially the reserve count during allocation. If code calling alloc_huge_page encounters an error after allocation and needs to free the page, the reservation information needs to be adjusted. Currently, restore_reserve_on_error only takes action on pages for which the reserve count was adjusted(HPageRestoreReserve flag). There is nothing wrong with these adjustments. However, alloc_huge_page ALWAYS modifies the reserve map during allocation even if the reserve count is not adjusted. This can cause issues as observed during development of this patch [1]. One specific series of operations causing an issue is: - Create a shared hugetlb mapping Reservations for all pages created by default - Fault in a page in the mapping Reservation exists so reservation count is decremented - Punch a hole in the file/mapping at index previously faulted Reservation and any associated pages will be removed - Allocate a page to fill the hole No reservation entry, so reserve count unmodified Reservation entry added to map by alloc_huge_page - Error after allocation and before instantiating the page Reservation entry remains in map - Allocate a page to fill the hole Reservation entry exists, so decrement reservation count This will cause a reservation count underflow as the reservation count was decremented twice for the same index. A user would observe a very large number for HugePages_Rsvd in /proc/meminfo. This would also likely cause subsequent allocations of hugetlb pages to fail as it would 'appear' that all pages are reserved. This sequence of operations is unlikely to happen, however they were easily reproduced and observed using hacked up code as described in [1]. Address the issue by having the routine restore_reserve_on_error take action on pages where HPageRestoreReserve is not set. In this case, we need to remove any reserve map entry created by alloc_huge_page. A new helper routine vma_del_reservation assists with this operation. There are three callers of alloc_huge_page which do not currently call restore_reserve_on error before freeing a page on error paths. Add those missing calls. [1] https://lore.kernel.org/linux-mm/20210528005029.88088-1-almasrymina@google.… Link: https://lkml.kernel.org/r/20210607204510.22617-1-mike.kravetz@oracle.com Fixes: 96b96a96ddee ("mm/hugetlb: fix huge page reservation leak in private mapping error paths" Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com> Reviewed-by: Mina Almasry <almasrymina(a)google.com> Cc: Axel Rasmussen <axelrasmussen(a)google.com> Cc: Peter Xu <peterx(a)redhat.com> Cc: Muchun Song <songmuchun(a)bytedance.com> Cc: Michal Hocko <mhocko(a)suse.com> Cc: Naoya Horiguchi <naoya.horiguchi(a)nec.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 55efd3dd04f6..30dee68458c7 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -735,6 +735,7 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset, __SetPageUptodate(page); error = huge_add_to_page_cache(page, mapping, index); if (unlikely(error)) { + restore_reserve_on_error(h, &pseudo_vma, addr, page); put_page(page); mutex_unlock(&hugetlb_fault_mutex_table[hash]); goto out; diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 790ae618548d..6504346a1947 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -610,6 +610,8 @@ struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma, unsigned long address); int huge_add_to_page_cache(struct page *page, struct address_space *mapping, pgoff_t idx); +void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma, + unsigned long address, struct page *page); /* arch callback */ int __init __alloc_bootmem_huge_page(struct hstate *h); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 85f42ec1a927..e0a5f9cbbece 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -2121,12 +2121,18 @@ static void return_unused_surplus_pages(struct hstate *h, * be restored when a newly allocated huge page must be freed. It is * to be called after calling vma_needs_reservation to determine if a * reservation exists. + * + * vma_del_reservation is used in error paths where an entry in the reserve + * map was created during huge page allocation and must be removed. It is to + * be called after calling vma_needs_reservation to determine if a reservation + * exists. */ enum vma_resv_mode { VMA_NEEDS_RESV, VMA_COMMIT_RESV, VMA_END_RESV, VMA_ADD_RESV, + VMA_DEL_RESV, }; static long __vma_reservation_common(struct hstate *h, struct vm_area_struct *vma, unsigned long addr, @@ -2170,11 +2176,21 @@ static long __vma_reservation_common(struct hstate *h, ret = region_del(resv, idx, idx + 1); } break; + case VMA_DEL_RESV: + if (vma->vm_flags & VM_MAYSHARE) { + region_abort(resv, idx, idx + 1, 1); + ret = region_del(resv, idx, idx + 1); + } else { + ret = region_add(resv, idx, idx + 1, 1, NULL, NULL); + /* region_add calls of range 1 should never fail. */ + VM_BUG_ON(ret < 0); + } + break; default: BUG(); } - if (vma->vm_flags & VM_MAYSHARE) + if (vma->vm_flags & VM_MAYSHARE || mode == VMA_DEL_RESV) return ret; /* * We know private mapping must have HPAGE_RESV_OWNER set. @@ -2222,25 +2238,39 @@ static long vma_add_reservation(struct hstate *h, return __vma_reservation_common(h, vma, addr, VMA_ADD_RESV); } +static long vma_del_reservation(struct hstate *h, + struct vm_area_struct *vma, unsigned long addr) +{ + return __vma_reservation_common(h, vma, addr, VMA_DEL_RESV); +} + /* - * This routine is called to restore a reservation on error paths. In the - * specific error paths, a huge page was allocated (via alloc_huge_page) - * and is about to be freed. If a reservation for the page existed, - * alloc_huge_page would have consumed the reservation and set - * HPageRestoreReserve in the newly allocated page. When the page is freed - * via free_huge_page, the global reservation count will be incremented if - * HPageRestoreReserve is set. However, free_huge_page can not adjust the - * reserve map. Adjust the reserve map here to be consistent with global - * reserve count adjustments to be made by free_huge_page. + * This routine is called to restore reservation information on error paths. + * It should ONLY be called for pages allocated via alloc_huge_page(), and + * the hugetlb mutex should remain held when calling this routine. + * + * It handles two specific cases: + * 1) A reservation was in place and the page consumed the reservation. + * HPageRestoreReserve is set in the page. + * 2) No reservation was in place for the page, so HPageRestoreReserve is + * not set. However, alloc_huge_page always updates the reserve map. + * + * In case 1, free_huge_page later in the error path will increment the + * global reserve count. But, free_huge_page does not have enough context + * to adjust the reservation map. This case deals primarily with private + * mappings. Adjust the reserve map here to be consistent with global + * reserve count adjustments to be made by free_huge_page. Make sure the + * reserve map indicates there is a reservation present. + * + * In case 2, simply undo reserve map modifications done by alloc_huge_page. */ -static void restore_reserve_on_error(struct hstate *h, - struct vm_area_struct *vma, unsigned long address, - struct page *page) +void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma, + unsigned long address, struct page *page) { - if (unlikely(HPageRestoreReserve(page))) { - long rc = vma_needs_reservation(h, vma, address); + long rc = vma_needs_reservation(h, vma, address); - if (unlikely(rc < 0)) { + if (HPageRestoreReserve(page)) { + if (unlikely(rc < 0)) /* * Rare out of memory condition in reserve map * manipulation. Clear HPageRestoreReserve so that @@ -2253,16 +2283,57 @@ static void restore_reserve_on_error(struct hstate *h, * accounting of reserve counts. */ ClearHPageRestoreReserve(page); - } else if (rc) { - rc = vma_add_reservation(h, vma, address); - if (unlikely(rc < 0)) + else if (rc) + (void)vma_add_reservation(h, vma, address); + else + vma_end_reservation(h, vma, address); + } else { + if (!rc) { + /* + * This indicates there is an entry in the reserve map + * added by alloc_huge_page. We know it was added + * before the alloc_huge_page call, otherwise + * HPageRestoreReserve would be set on the page. + * Remove the entry so that a subsequent allocation + * does not consume a reservation. + */ + rc = vma_del_reservation(h, vma, address); + if (rc < 0) /* - * See above comment about rare out of - * memory condition. + * VERY rare out of memory condition. Since + * we can not delete the entry, set + * HPageRestoreReserve so that the reserve + * count will be incremented when the page + * is freed. This reserve will be consumed + * on a subsequent allocation. */ - ClearHPageRestoreReserve(page); + SetHPageRestoreReserve(page); + } else if (rc < 0) { + /* + * Rare out of memory condition from + * vma_needs_reservation call. Memory allocation is + * only attempted if a new entry is needed. Therefore, + * this implies there is not an entry in the + * reserve map. + * + * For shared mappings, no entry in the map indicates + * no reservation. We are done. + */ + if (!(vma->vm_flags & VM_MAYSHARE)) + /* + * For private mappings, no entry indicates + * a reservation is present. Since we can + * not add an entry, set SetHPageRestoreReserve + * on the page so reserve count will be + * incremented when freed. This reserve will + * be consumed on a subsequent allocation. + */ + SetHPageRestoreReserve(page); } else - vma_end_reservation(h, vma, address); + /* + * No reservation present, do nothing + */ + vma_end_reservation(h, vma, address); } } @@ -4037,6 +4108,8 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING); entry = huge_ptep_get(src_pte); if (!pte_same(src_pte_old, entry)) { + restore_reserve_on_error(h, vma, addr, + new); put_page(new); /* dst_entry won't change as in child */ goto again; @@ -5006,6 +5079,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, if (vm_shared || is_continue) unlock_page(page); out_release_nounlock: + restore_reserve_on_error(h, dst_vma, dst_addr, page); put_page(page); goto out; }

4 years, 6 months

1
0
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror June 2021