Hi,
This series fixes up the cases where callers of ksize() use it to opportunistically grow their buffer sizes, which can run afoul of the __alloc_size hinting that CONFIG_UBSAN_BOUNDS and CONFIG_FORTIFY_SOURCE use to perform dynamic buffer bounds checking. Quoting the first patch:
In the effort to help the compiler reason about buffer sizes, the __alloc_size attribute was added to allocators. This improves the scope of the compiler's ability to apply CONFIG_UBSAN_BOUNDS and (in the near future) CONFIG_FORTIFY_SOURCE. For most allocations, this works well, as the vast majority of callers are not expecting to use more memory than what they asked for.
There is, however, one common exception to this: anticipatory resizing of kmalloc allocations. These cases all use ksize() to determine the actual bucket size of a given allocation (e.g. 128 when 126 was asked for). This comes in two styles in the kernel:
1) An allocation has been determined to be too small, and needs to be resized. Instead of the caller choosing its own next best size, it wants to minimize the number of calls to krealloc(), so it just uses ksize() plus some additional bytes, forcing the realloc into the next bucket size, from which it can learn how large it is now. For example:
        data = krealloc(data, ksize(data) + 1, gfp);
        data_len = ksize(data);
2) The minimum size of an allocation is calculated, but since it may grow in the future, just use all the space available in the chosen bucket immediately, to avoid needing to reallocate later. A good example of this is skbuff's allocators:
        data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc);
        ...
        /* kmalloc(size) might give us more room than requested.
         * Put skb_shared_info exactly at the end of allocated zone,
         * to allow max possible filling before reallocation.
         */
        osize = ksize(data);
        size = SKB_WITH_OVERHEAD(osize);
In both cases, the "how large is the allocation?" question is answered _after_ the allocation, where the compiler hinting is not in an easy place to make the association any more. This mismatch between the compiler's view of the buffer length and the code's intention about how much it is going to actually use has already caused problems[1]. It is possible to fix this by reordering the use of the "actual size" information.
We can serve the needs of users of ksize() and still have accurate buffer length hinting for the compiler by doing the bucket size calculation _before_ the allocation. Code can instead ask "how large an allocation would I get for a given size?".
Introduce kmalloc_size_roundup(), to serve this function so we can start replacing the "anticipatory resizing" uses of ksize().
[1] https://github.com/ClangBuiltLinux/linux/issues/1599
    https://github.com/KSPP/linux/issues/183

-------
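To make the transformation concrete, here is a rough before/after sketch of the style-1 pattern above (illustrative only; not taken verbatim from any single patch in the series):

        /* Before: the true size is discovered only after allocation. */
        data = krealloc(data, ksize(data) + 1, gfp);
        data_len = ksize(data);

        /* After: round up first, so the __alloc_size hint stays accurate. */
        data_len = kmalloc_size_roundup(data_len + 1);
        data = krealloc(data, data_len, gfp);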
And after adding kmalloc_size_roundup(), put it to use with the various ksize() callers, restore the previously removed __alloc_size hint, and fix the use of __malloc annotations.
I tried to trim the CC list on this series since it got rather long. I kept all the suggested mailing lists, though. :)
Thanks!
-Kees
Kees Cook (12):
  slab: Introduce kmalloc_size_roundup()
  skbuff: Proactively round up to kmalloc bucket size
  net: ipa: Proactively round up to kmalloc bucket size
  btrfs: send: Proactively round up to kmalloc bucket size
  dma-buf: Proactively round up to kmalloc bucket size
  coredump: Proactively round up to kmalloc bucket size
  igb: Proactively round up to kmalloc bucket size
  openvswitch: Proactively round up to kmalloc bucket size
  x86/microcode/AMD: Track patch allocation size explicitly
  iwlwifi: Track scan_cmd allocation size explicitly
  slab: Remove __malloc attribute from realloc functions
  slab: Restore __alloc_size attribute to __kmalloc_track_caller
 arch/x86/include/asm/microcode.h              |  1 +
 arch/x86/kernel/cpu/microcode/amd.c           |  3 +-
 drivers/dma-buf/dma-resv.c                    |  9 +++-
 drivers/net/ethernet/intel/igb/igb_main.c     |  1 +
 drivers/net/ipa/gsi_trans.c                   |  7 ++-
 drivers/net/wireless/intel/iwlwifi/dvm/dev.h  |  1 +
 drivers/net/wireless/intel/iwlwifi/dvm/scan.c | 10 +++-
 drivers/net/wireless/intel/iwlwifi/mvm/mvm.h  |  3 +-
 drivers/net/wireless/intel/iwlwifi/mvm/ops.c  |  3 +-
 drivers/net/wireless/intel/iwlwifi/mvm/scan.c |  6 +--
 fs/btrfs/send.c                               | 11 +++--
 fs/coredump.c                                 |  7 ++-
 include/linux/compiler_types.h                | 13 ++----
 include/linux/slab.h                          | 46 ++++++++++++++++---
 mm/slab_common.c                              | 17 +++++++
 net/core/skbuff.c                             | 34 +++++++-------
 net/openvswitch/flow_netlink.c                |  4 +-
 17 files changed, 125 insertions(+), 51 deletions(-)
In the effort to help the compiler reason about buffer sizes, the __alloc_size attribute was added to allocators. This improves the scope of the compiler's ability to apply CONFIG_UBSAN_BOUNDS and (in the near future) CONFIG_FORTIFY_SOURCE. For most allocations, this works well, as the vast majority of callers are not expecting to use more memory than what they asked for.
There is, however, one common exception to this: anticipatory resizing of kmalloc allocations. These cases all use ksize() to determine the actual bucket size of a given allocation (e.g. 128 when 126 was asked for). This comes in two styles in the kernel:
1) An allocation has been determined to be too small, and needs to be resized. Instead of the caller choosing its own next best size, it wants to minimize the number of calls to krealloc(), so it just uses ksize() plus some additional bytes, forcing the realloc into the next bucket size, from which it can learn how large it is now. For example:
        data = krealloc(data, ksize(data) + 1, gfp);
        data_len = ksize(data);
2) The minimum size of an allocation is calculated, but since it may grow in the future, just use all the space available in the chosen bucket immediately, to avoid needing to reallocate later. A good example of this is skbuff's allocators:
        data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc);
        ...
        /* kmalloc(size) might give us more room than requested.
         * Put skb_shared_info exactly at the end of allocated zone,
         * to allow max possible filling before reallocation.
         */
        osize = ksize(data);
        size = SKB_WITH_OVERHEAD(osize);
In both cases, the "how large is the allocation?" question is answered _after_ the allocation, where the compiler hinting is not in an easy place to make the association any more. This mismatch between the compiler's view of the buffer length and the code's intention about how much it is going to actually use has already caused problems[1]. It is possible to fix this by reordering the use of the "actual size" information.
We can serve the needs of users of ksize() and still have accurate buffer length hinting for the compiler by doing the bucket size calculation _before_ the allocation. Code can instead ask "how large an allocation would I get for a given size?".
Introduce kmalloc_size_roundup(), to serve this function so we can start replacing the "anticipatory resizing" uses of ksize().
[1] https://github.com/ClangBuiltLinux/linux/issues/1599 https://github.com/KSPP/linux/issues/183
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 include/linux/slab.h | 31 +++++++++++++++++++++++++++++++
 mm/slab_common.c     | 17 +++++++++++++++++
 2 files changed, 48 insertions(+)
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 0fefdf528e0d..4fc41e4ed4a2 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -188,7 +188,21 @@ void * __must_check krealloc(const void *objp, size_t new_size, gfp_t flags) __a
 void kfree(const void *objp);
 void kfree_sensitive(const void *objp);
 size_t __ksize(const void *objp);
+
+/**
+ * ksize - Report actual allocation size of associated object
+ *
+ * @objp: Pointer returned from a prior kmalloc()-family allocation.
+ *
+ * This should not be used for writing beyond the originally requested
+ * allocation size. Either use krealloc() or round up the allocation size
+ * with kmalloc_size_roundup() prior to allocation. If this is used to
+ * access beyond the originally requested allocation size, UBSAN_BOUNDS
+ * and/or FORTIFY_SOURCE may trip, since they only know about the
+ * originally allocated size via the __alloc_size attribute.
+ */
 size_t ksize(const void *objp);
+
 #ifdef CONFIG_PRINTK
 bool kmem_valid_obj(void *object);
 void kmem_dump_obj(void *object);
@@ -779,6 +793,23 @@ extern void kvfree(const void *addr);
 extern void kvfree_sensitive(const void *addr, size_t len);

 unsigned int kmem_cache_size(struct kmem_cache *s);
+
+/**
+ * kmalloc_size_roundup - Report allocation bucket size for the given size
+ *
+ * @size: Number of bytes to round up from.
+ *
+ * This returns the number of bytes that would be available in a kmalloc()
+ * allocation of @size bytes. For example, a 126 byte request would be
+ * rounded up to the next sized kmalloc bucket, 128 bytes. (This is strictly
+ * for the general-purpose kmalloc()-based allocations, and is not for the
+ * pre-sized kmem_cache_alloc()-based allocations.)
+ *
+ * Use this to kmalloc() the full bucket size ahead of time instead of using
+ * ksize() to query the size after an allocation.
+ */
+unsigned int kmalloc_size_roundup(size_t size);
+
 void __init kmem_cache_init_late(void);

 #if defined(CONFIG_SMP) && defined(CONFIG_SLAB)
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 17996649cfe3..132d91a0f8c7 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -721,6 +721,23 @@ struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags)
 	return kmalloc_caches[kmalloc_type(flags)][index];
 }

+unsigned int kmalloc_size_roundup(size_t size)
+{
+	struct kmem_cache *c;
+
+	/* Short-circuit the 0 size case. */
+	if (size == 0)
+		return 0;
+	/* Above the smaller buckets, size is a multiple of page size. */
+	if (size > KMALLOC_MAX_CACHE_SIZE)
+		return PAGE_SIZE << get_order(size);
+
+	/* The flags don't matter since size_index is common to all. */
+	c = kmalloc_slab(size, GFP_KERNEL);
+	return c ? c->object_size : 0;
+}
+EXPORT_SYMBOL(kmalloc_size_roundup);
+
 #ifdef CONFIG_ZONE_DMA
 #define KMALLOC_DMA_NAME(sz) .name[KMALLOC_DMA] = "dma-kmalloc-" #sz,
 #else
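As a quick sanity sketch of the helper's expected behavior per the implementation above (return values assume the default kmalloc cache layout; this snippet is illustrative, not part of the patch):

        size_t size;

        size = kmalloc_size_roundup(126);	/* -> 128: next bucket up */
        size = kmalloc_size_roundup(128);	/* -> 128: already a bucket size */
        size = kmalloc_size_roundup(0);		/* -> 0: short-circuited */
        /* Beyond KMALLOC_MAX_CACHE_SIZE: a power-of-two number of pages. */
        size = kmalloc_size_roundup(KMALLOC_MAX_CACHE_SIZE + 1);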
On Wed, Sep 21, 2022 at 08:10:02PM -0700, Kees Cook wrote:
Cc-ing Feng Tang who may welcome this series ;)
When users call ksize(), slab assumes they may access the object beyond the originally requested allocation size.
(i.e. KASAN unpoisons the whole object.) Maybe don't let KASAN unpoison it, to catch such users?
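For context, ksize() currently looks roughly like this (paraphrased from mm/slab_common.c; exact details vary by tree), and it is the kasan_unpoison_range() call that would need to go in order to catch such users:

        size_t ksize(const void *objp)
        {
        	size_t size;

        	if (unlikely(ZERO_OR_NULL_PTR(objp)) || !kasan_check_byte(objp))
        		return 0;

        	size = __ksize(objp);
        	/* Unpoison the whole object; callers may use the full bucket. */
        	kasan_unpoison_range(objp, size);
        	return size;
        }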
This looks okay.
Thanks!
Thanks Hyeonggon for looping me in.
On Thu, Sep 22, 2022 at 07:12:21PM +0800, Hyeonggon Yoo wrote:
Indeed! This will help our work of extending the slub redzone check, as we also ran into trouble with ksize() users when extending redzone support to cover the extra space allocated beyond the requested size [1], and had to disable the redzone sanity check for all ksize() users [2].
[1]. https://lore.kernel.org/lkml/20220719134503.GA56558@shbuild999.sh.intel.com/
[2]. https://lore.kernel.org/lkml/20220913065423.520159-5-feng.tang@intel.com/
Thanks, Feng
On Fri, Sep 23, 2022 at 09:17:25AM +0800, Feng Tang wrote:
Thanks for the feedback! I'll send my v2 series -- I'm hoping at least this patch can land in v6.1 so the various other patches would be clear to land via their separate trees, etc.
Instead of discovering the kmalloc bucket size _after_ allocation, round up proactively so the allocation is explicitly made for the full size, allowing the compiler to correctly reason about the resulting size of the buffer through the existing __alloc_size() hint.
This will allow for kernels built with CONFIG_UBSAN_BOUNDS or the coming dynamic bounds checking under CONFIG_FORTIFY_SOURCE to gain back the __alloc_size() hints that were temporarily reverted in commit 93dd04ab0b2b ("slab: remove __alloc_size attribute from __kmalloc_track_caller").
Cc: "David S. Miller" davem@davemloft.net Cc: Eric Dumazet edumazet@google.com Cc: Jakub Kicinski kuba@kernel.org Cc: Paolo Abeni pabeni@redhat.com Cc: netdev@vger.kernel.org Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Nick Desaulniers ndesaulniers@google.com Cc: David Rientjes rientjes@google.com Cc: Vlastimil Babka vbabka@suse.cz Signed-off-by: Kees Cook keescook@chromium.org --- net/core/skbuff.c | 34 ++++++++++++++++++---------------- 1 file changed, 18 insertions(+), 16 deletions(-)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 974bbbbe7138..4fe4c7544c1d 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -427,14 +427,15 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
 	 */
 	size = SKB_DATA_ALIGN(size);
 	size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
-	data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc);
-	if (unlikely(!data))
-		goto nodata;
-	/* kmalloc(size) might give us more room than requested.
+	/* kmalloc(size) might give us more room than requested, so
+	 * allocate the true bucket size up front.
 	 * Put skb_shared_info exactly at the end of allocated zone,
 	 * to allow max possible filling before reallocation.
 	 */
-	osize = ksize(data);
+	osize = kmalloc_size_roundup(size);
+	data = kmalloc_reserve(osize, gfp_mask, node, &pfmemalloc);
+	if (unlikely(!data))
+		goto nodata;
 	size = SKB_WITH_OVERHEAD(osize);
 	prefetchw(data + size);

@@ -1709,6 +1710,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
 {
 	int i, osize = skb_end_offset(skb);
 	int size = osize + nhead + ntail;
+	int alloc_size;
 	long off;
 	u8 *data;

@@ -1722,11 +1724,11 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,

 	if (skb_pfmemalloc(skb))
 		gfp_mask |= __GFP_MEMALLOC;
-	data = kmalloc_reserve(size + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)),
-			       gfp_mask, NUMA_NO_NODE, NULL);
+	alloc_size = kmalloc_size_roundup(size + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)));
+	data = kmalloc_reserve(alloc_size, gfp_mask, NUMA_NO_NODE, NULL);
 	if (!data)
 		goto nodata;
-	size = SKB_WITH_OVERHEAD(ksize(data));
+	size = SKB_WITH_OVERHEAD(alloc_size);

 	/* Copy only real data... and, alas, header. This should be
 	 * optimized for the cases when header is void.
 	 */
@@ -6063,19 +6065,19 @@ static int pskb_carve_inside_header(struct sk_buff *skb, const u32 off,
 	int i;
 	int size = skb_end_offset(skb);
 	int new_hlen = headlen - off;
+	int alloc_size;
 	u8 *data;

 	size = SKB_DATA_ALIGN(size);

 	if (skb_pfmemalloc(skb))
 		gfp_mask |= __GFP_MEMALLOC;
-	data = kmalloc_reserve(size +
-			       SKB_DATA_ALIGN(sizeof(struct skb_shared_info)),
-			       gfp_mask, NUMA_NO_NODE, NULL);
+	alloc_size = kmalloc_size_roundup(size + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)));
+	data = kmalloc_reserve(alloc_size, gfp_mask, NUMA_NO_NODE, NULL);
 	if (!data)
 		return -ENOMEM;

-	size = SKB_WITH_OVERHEAD(ksize(data));
+	size = SKB_WITH_OVERHEAD(alloc_size);

 	/* Copy real data, and all frags */
 	skb_copy_from_linear_data_offset(skb, off, data, new_hlen);
@@ -6184,18 +6186,18 @@ static int pskb_carve_inside_nonlinear(struct sk_buff *skb, const u32 off,
 	u8 *data;
 	const int nfrags = skb_shinfo(skb)->nr_frags;
 	struct skb_shared_info *shinfo;
+	int alloc_size;

 	size = SKB_DATA_ALIGN(size);

 	if (skb_pfmemalloc(skb))
 		gfp_mask |= __GFP_MEMALLOC;
-	data = kmalloc_reserve(size +
-			       SKB_DATA_ALIGN(sizeof(struct skb_shared_info)),
-			       gfp_mask, NUMA_NO_NODE, NULL);
+	alloc_size = kmalloc_size_roundup(size + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)));
+	data = kmalloc_reserve(alloc_size, gfp_mask, NUMA_NO_NODE, NULL);
 	if (!data)
 		return -ENOMEM;

-	size = SKB_WITH_OVERHEAD(ksize(data));
+	size = SKB_WITH_OVERHEAD(alloc_size);

 	memcpy((struct skb_shared_info *)(data + size), skb_shinfo(skb),
 	       offsetof(struct skb_shared_info, frags[0]));
On Wed, 21 Sep 2022 20:10:03 -0700 Kees Cook wrote:
I'd rename osize here to alloc_size for consistency but one could argue either way :)
Acked-by: Jakub Kicinski <kuba@kernel.org>
Instead of discovering the kmalloc bucket size _after_ allocation, round up proactively so the allocation is explicitly made for the full size, allowing the compiler to correctly reason about the resulting size of the buffer through the existing __alloc_size() hint.
Cc: Alex Elder <elder@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 drivers/net/ipa/gsi_trans.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ipa/gsi_trans.c b/drivers/net/ipa/gsi_trans.c
index 18e7e8c405be..cec968854dcf 100644
--- a/drivers/net/ipa/gsi_trans.c
+++ b/drivers/net/ipa/gsi_trans.c
@@ -89,6 +89,7 @@ int gsi_trans_pool_init(struct gsi_trans_pool *pool, size_t size, u32 count,
 			u32 max_alloc)
 {
 	void *virt;
+	size_t allocate;

 	if (!size)
 		return -EINVAL;
@@ -104,13 +105,15 @@ int gsi_trans_pool_init(struct gsi_trans_pool *pool, size_t size, u32 count,
 	 * If there aren't enough entries starting at the free index,
 	 * we just allocate free entries from the beginning of the pool.
 	 */
-	virt = kcalloc(count + max_alloc - 1, size, GFP_KERNEL);
+	allocate = size_mul(count + max_alloc - 1, size);
+	allocate = kmalloc_size_roundup(allocate);
+	virt = kzalloc(allocate, GFP_KERNEL);
 	if (!virt)
 		return -ENOMEM;

 	pool->base = virt;
 	/* If the allocator gave us any extra memory, use it */
-	pool->count = ksize(pool->base) / size;
+	pool->count = allocate / size;
 	pool->free = 0;
 	pool->max_alloc = max_alloc;
 	pool->size = size;
On 9/21/22 10:10 PM, Kees Cook wrote:
I don't care about this but the reverse Christmas tree convention would put the "allocate" variable definition above "virt".
Whether you fix that or not, this patch looks good to me.
Reviewed-by: Alex Elder <elder@linaro.org>
Instead of discovering the kmalloc bucket size _after_ allocation, round up proactively so the allocation is explicitly made for the full size, allowing the compiler to correctly reason about the resulting size of the buffer through the existing __alloc_size() hint.
Cc: linux-btrfs@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 fs/btrfs/send.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index e7671afcee4f..d40d65598e8f 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -435,6 +435,11 @@ static int fs_path_ensure_buf(struct fs_path *p, int len)
 	path_len = p->end - p->start;
 	old_buf_len = p->buf_len;

+	/*
+	 * Allocate to the next largest kmalloc bucket size, to let
+	 * the fast path happen most of the time.
+	 */
+	len = kmalloc_size_roundup(len);
 	/*
 	 * First time the inline_buf does not suffice
 	 */
@@ -448,11 +453,7 @@ static int fs_path_ensure_buf(struct fs_path *p, int len)
 		if (!tmp_buf)
 			return -ENOMEM;
 		p->buf = tmp_buf;
-		/*
-		 * The real size of the buffer is bigger, this will let the fast path
-		 * happen most of the time
-		 */
-		p->buf_len = ksize(p->buf);
+		p->buf_len = len;

 	if (p->reversed) {
 		tmp_buf = p->buf + old_buf_len - path_len - 1;
On Wed, Sep 21, 2022 at 08:10:05PM -0700, Kees Cook wrote:
Acked-by: David Sterba <dsterba@suse.com>
Instead of discovering the kmalloc bucket size _after_ allocation, round up proactively so the allocation is explicitly made for the full size, allowing the compiler to correctly reason about the resulting size of the buffer through the existing __alloc_size() hint.
Cc: linux-media@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 drivers/dma-buf/dma-resv.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index 205acb2c744d..a20f6db99b8f 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -98,12 +98,17 @@ static void dma_resv_list_set(struct dma_resv_list *list,
 static struct dma_resv_list *dma_resv_list_alloc(unsigned int max_fences)
 {
 	struct dma_resv_list *list;
+	size_t size = struct_size(list, table, max_fences);

-	list = kmalloc(struct_size(list, table, max_fences), GFP_KERNEL);
+	/* Round up to the next kmalloc bucket size. */
+	size = kmalloc_size_roundup(size);
+
+	list = kmalloc(size, GFP_KERNEL);
 	if (!list)
 		return NULL;

-	list->max_fences = (ksize(list) - offsetof(typeof(*list), table)) /
+	/* Given the resulting bucket size, recalculate max_fences. */
+	list->max_fences = (size - offsetof(typeof(*list), table)) /
 		sizeof(*list->table);

 	return list;
Instead of discovering the kmalloc bucket size _after_ allocation, round up proactively so the allocation is explicitly made for the full size, allowing the compiler to correctly reason about the resulting size of the buffer through the existing __alloc_size() hint.
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 fs/coredump.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/fs/coredump.c b/fs/coredump.c
index 9f4aae202109..0894b2c35d98 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -68,7 +68,10 @@ struct core_name {

 static int expand_corename(struct core_name *cn, int size)
 {
-	char *corename = krealloc(cn->corename, size, GFP_KERNEL);
+	char *corename;
+
+	size = kmalloc_size_roundup(size);
+	corename = krealloc(cn->corename, size, GFP_KERNEL);

 	if (!corename)
 		return -ENOMEM;
@@ -76,7 +79,7 @@ static int expand_corename(struct core_name *cn, int size)
 	if (size > core_name_size) /* racy but harmless */
 		core_name_size = size;

-	cn->size = ksize(corename);
+	cn->size = size;
 	cn->corename = corename;
 	return 0;
 }
Instead of having a mismatch between the requested allocation size and the actual kmalloc bucket size, which is examined later via ksize(), round up proactively so the allocation is explicitly made for the full size, allowing the compiler to correctly reason about the resulting size of the buffer through the existing __alloc_size() hint.
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: intel-wired-lan@lists.osuosl.org
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 1 +
 1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 2796e81d2726..4d70ee5b0f79 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -1196,6 +1196,7 @@ static int igb_alloc_q_vector(struct igb_adapter *adapter,

 	ring_count = txr_count + rxr_count;
 	size = struct_size(q_vector, ring, ring_count);
+	size = kmalloc_size_roundup(size);

 	/* allocate q_vector and rings */
 	q_vector = adapter->q_vector[v_idx];
Instead of having a mismatch between the requested allocation size and the actual kmalloc bucket size, which is examined later via ksize(), round up proactively so the allocation is explicitly made for the full size, allowing the compiler to correctly reason about the resulting size of the buffer through the existing __alloc_size() hint.
Cc: Pravin B Shelar <pshelar@ovn.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: netdev@vger.kernel.org
Cc: dev@openvswitch.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 net/openvswitch/flow_netlink.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index 4c09cf8a0ab2..11b2e2c94c7e 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -2306,10 +2306,12 @@ int ovs_nla_put_mask(const struct sw_flow *flow, struct sk_buff *skb)
 static struct sw_flow_actions *nla_alloc_flow_actions(int size)
 {
 	struct sw_flow_actions *sfa;
+	int alloc_size;

 	WARN_ON_ONCE(size > MAX_ACTIONS_BUFSIZE);

-	sfa = kmalloc(sizeof(*sfa) + size, GFP_KERNEL);
+	alloc_size = kmalloc_size_roundup(sizeof(*sfa) + size);
+	sfa = kmalloc(alloc_size, GFP_KERNEL);
 	if (!sfa)
 		return ERR_PTR(-ENOMEM);
In preparation for reducing the use of ksize(), record the actual allocation size for later memcpy(). This avoids copying extra (uninitialized!) bytes into the patch buffer when the requested allocation size isn't exactly the size of a kmalloc bucket. Additionally fixes potential future issues where runtime bounds checking will notice that the buffer was allocated to a smaller value than returned by ksize().
Suggested-by: Daniel Micay <danielmicay@gmail.com>
Link: https://lore.kernel.org/lkml/CA+DvKQ+bp7Y7gmaVhacjv9uF6Ar-o4tet872h4Q8RPYPJj...
Fixes: 757885e94a22 ("x86, microcode, amd: Early microcode patch loading support for AMD")
Cc: Borislav Petkov <bp@alien8.de>
Cc: x86@kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/x86/include/asm/microcode.h    | 1 +
 arch/x86/kernel/cpu/microcode/amd.c | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/microcode.h b/arch/x86/include/asm/microcode.h
index 0c3d3440fe27..aa675783412f 100644
--- a/arch/x86/include/asm/microcode.h
+++ b/arch/x86/include/asm/microcode.h
@@ -9,6 +9,7 @@ struct ucode_patch {
 	struct list_head plist;
 	void *data;		/* Intel uses only this one */
+	unsigned int size;
 	u32 patch_id;
 	u16 equiv_cpu;
 };
diff --git a/arch/x86/kernel/cpu/microcode/amd.c b/arch/x86/kernel/cpu/microcode/amd.c
index 8b2fcdfa6d31..615bc6efa1dd 100644
--- a/arch/x86/kernel/cpu/microcode/amd.c
+++ b/arch/x86/kernel/cpu/microcode/amd.c
@@ -788,6 +788,7 @@ static int verify_and_add_patch(u8 family, u8 *fw, unsigned int leftover,
 		kfree(patch);
 		return -EINVAL;
 	}
+	patch->size = *patch_size;

 	mc_hdr = (struct microcode_header_amd *)(fw + SECTION_HDR_SIZE);
 	proc_id = mc_hdr->processor_rev_id;
@@ -869,7 +870,7 @@ load_microcode_amd(bool save, u8 family, const u8 *data, size_t size)
 		return ret;

 	memset(amd_ucode_patch, 0, PATCH_MAX_SIZE);
-	memcpy(amd_ucode_patch, p->data, min_t(u32, ksize(p->data), PATCH_MAX_SIZE));
+	memcpy(amd_ucode_patch, p->data, min_t(u32, p->size, PATCH_MAX_SIZE));

 	return ret;
 }
In preparation for reducing the use of ksize(), explicitly track the size of scan_cmd allocations. This also allows for noticing if the scan size changes unexpectedly. Note that using ksize() was already incorrect here, in the sense that ksize() would not match the actual allocation size, which would trigger future run-time allocation bounds checking. (In other words, memset() may know how large scan_cmd was allocated for, but ksize() will return the upper bounds of the actually allocated memory, causing a run-time warning about an overflow.)
Cc: Gregory Greenman <gregory.greenman@intel.com>
Cc: Kalle Valo <kvalo@kernel.org>
Cc: Johannes Berg <johannes.berg@intel.com>
Cc: linux-wireless@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 drivers/net/wireless/intel/iwlwifi/dvm/dev.h  |  1 +
 drivers/net/wireless/intel/iwlwifi/dvm/scan.c | 10 ++++++++--
 drivers/net/wireless/intel/iwlwifi/mvm/mvm.h  |  3 ++-
 drivers/net/wireless/intel/iwlwifi/mvm/ops.c  |  3 ++-
 drivers/net/wireless/intel/iwlwifi/mvm/scan.c |  6 +++---
 5 files changed, 16 insertions(+), 7 deletions(-)
diff --git a/drivers/net/wireless/intel/iwlwifi/dvm/dev.h b/drivers/net/wireless/intel/iwlwifi/dvm/dev.h
index bbd574091201..1a9eadace188 100644
--- a/drivers/net/wireless/intel/iwlwifi/dvm/dev.h
+++ b/drivers/net/wireless/intel/iwlwifi/dvm/dev.h
@@ -696,6 +696,7 @@ struct iwl_priv {
 	/* Scan related variables */
 	unsigned long scan_start;
 	unsigned long scan_start_tsf;
+	size_t scan_cmd_size;
 	void *scan_cmd;
 	enum nl80211_band scan_band;
 	struct cfg80211_scan_request *scan_request;
diff --git a/drivers/net/wireless/intel/iwlwifi/dvm/scan.c b/drivers/net/wireless/intel/iwlwifi/dvm/scan.c
index 2d38227dfdd2..a7e85c5c8c72 100644
--- a/drivers/net/wireless/intel/iwlwifi/dvm/scan.c
+++ b/drivers/net/wireless/intel/iwlwifi/dvm/scan.c
@@ -626,7 +626,7 @@ static int iwlagn_request_scan(struct iwl_priv *priv, struct ieee80211_vif *vif)
 	u8 active_chains;
 	u8 scan_tx_antennas = priv->nvm_data->valid_tx_ant;
 	int ret;
-	int scan_cmd_size = sizeof(struct iwl_scan_cmd) +
+	size_t scan_cmd_size = sizeof(struct iwl_scan_cmd) +
 		MAX_SCAN_CHANNEL * sizeof(struct iwl_scan_channel) +
 		priv->fw->ucode_capa.max_probe_length;
 	const u8 *ssid = NULL;
@@ -649,9 +649,15 @@ static int iwlagn_request_scan(struct iwl_priv *priv, struct ieee80211_vif *vif)
 				 "fail to allocate memory for scan\n");
 			return -ENOMEM;
 		}
+		priv->scan_cmd_size = scan_cmd_size;
+	}
+	if (priv->scan_cmd_size < scan_cmd_size) {
+		IWL_DEBUG_SCAN(priv,
+			       "memory needed for scan grew unexpectedly\n");
+		return -ENOMEM;
 	}
 	scan = priv->scan_cmd;
-	memset(scan, 0, scan_cmd_size);
+	memset(scan, 0, priv->scan_cmd_size);

 	scan->quiet_plcp_th = IWL_PLCP_QUIET_THRESH;
 	scan->quiet_time = IWL_ACTIVE_QUIET_TIME;
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h b/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h
index bf35e130c876..214b8a525cc6 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h
@@ -860,6 +860,7 @@ struct iwl_mvm {

 	/* Scan status, cmd (pre-allocated) and auxiliary station */
 	unsigned int scan_status;
+	size_t scan_cmd_size;
 	void *scan_cmd;
 	struct iwl_mcast_filter_cmd *mcast_filter_cmd;
 	/* For CDB this is low band scan type, for non-CDB - type. */
@@ -1705,7 +1706,7 @@ int iwl_mvm_update_quotas(struct iwl_mvm *mvm, bool force_upload,
 int iwl_mvm_reg_scan_start(struct iwl_mvm *mvm, struct ieee80211_vif *vif,
 			   struct cfg80211_scan_request *req,
 			   struct ieee80211_scan_ies *ies);
-int iwl_mvm_scan_size(struct iwl_mvm *mvm);
+size_t iwl_mvm_scan_size(struct iwl_mvm *mvm);
 int iwl_mvm_scan_stop(struct iwl_mvm *mvm, int type, bool notify);
 int iwl_mvm_max_scan_ie_len(struct iwl_mvm *mvm);
 void iwl_mvm_report_scan_aborted(struct iwl_mvm *mvm);
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/ops.c b/drivers/net/wireless/intel/iwlwifi/mvm/ops.c
index db43c8a83a31..b9cbb18b0dcb 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/ops.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/ops.c
@@ -1065,7 +1065,7 @@ iwl_op_mode_mvm_start(struct iwl_trans *trans, const struct iwl_cfg *cfg,
 	static const u8 no_reclaim_cmds[] = {
 		TX_CMD,
 	};
-	int scan_size;
+	size_t scan_size;
 	u32 min_backoff;
 	struct iwl_mvm_csme_conn_info *csme_conn_info __maybe_unused;

@@ -1299,6 +1299,7 @@ iwl_op_mode_mvm_start(struct iwl_trans *trans, const struct iwl_cfg *cfg,
 	mvm->scan_cmd = kmalloc(scan_size, GFP_KERNEL);
 	if (!mvm->scan_cmd)
 		goto out_free;
+	mvm->scan_cmd_size = scan_size;

 	/* invalidate ids to prevent accidental removal of sta_id 0 */
 	mvm->aux_sta.sta_id = IWL_MVM_INVALID_STA;
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c
index 582a95ffc7ab..acd8803dbcdd 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c
@@ -2626,7 +2626,7 @@ static int iwl_mvm_build_scan_cmd(struct iwl_mvm *mvm,
 	u8 scan_ver;

 	lockdep_assert_held(&mvm->mutex);
-	memset(mvm->scan_cmd, 0, ksize(mvm->scan_cmd));
+	memset(mvm->scan_cmd, 0, mvm->scan_cmd_size);

 	if (!fw_has_capa(&mvm->fw->ucode_capa, IWL_UCODE_TLV_CAPA_UMAC_SCAN)) {
 		hcmd->id = SCAN_OFFLOAD_REQUEST_CMD;
@@ -3091,7 +3091,7 @@ static int iwl_mvm_scan_stop_wait(struct iwl_mvm *mvm, int type)
 				     1 * HZ);
 }

-static int iwl_scan_req_umac_get_size(u8 scan_ver)
+static size_t iwl_scan_req_umac_get_size(u8 scan_ver)
 {
 	switch (scan_ver) {
 	case 12:
@@ -3104,7 +3104,7 @@ static int iwl_scan_req_umac_get_size(u8 scan_ver)
 	return 0;
 }

-int iwl_mvm_scan_size(struct iwl_mvm *mvm)
+size_t iwl_mvm_scan_size(struct iwl_mvm *mvm)
 {
 	int base_size, tail_size;
 	u8 scan_ver = iwl_fw_lookup_cmd_ver(mvm->fw, SCAN_REQ_UMAC,
Kees Cook <keescook@chromium.org> writes:
Via which tree is this iwlwifi patch going? Normally via wireless-next or something else?
On Thu, Sep 22, 2022 at 07:18:51AM +0300, Kalle Valo wrote:
This doesn't depend on the kmalloc_size_roundup() helper at all, so I would be happy for it to go via wireless-next if the patch seems reasonable.
The __malloc attribute should not be applied to "realloc" functions, as the returned pointer may alias the storage of the prior pointer. Instead of splitting __malloc from __alloc_size, which would be a huge amount of churn, just create __realloc_size for the few cases where it is needed.
Additionally removes the conditional test for __alloc_size__, which is always defined now.
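As a hedged illustration of the aliasing problem (hypothetical caller, not taken from this patch): __malloc promises the compiler that the returned pointer aliases nothing else, which is false for realloc-style APIs:

        char *old = buf;

        buf = krealloc(buf, new_size, GFP_KERNEL);
        /*
         * krealloc() may resize in place and return the old pointer. If it
         * were marked __malloc, the compiler could assume buf != old and,
         * e.g., keep using values previously loaded through 'old'.
         */
        if (buf == old)
        	pr_debug("resized in place\n");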
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Marco Elver <elver@google.com>
Cc: linux-mm@kvack.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 include/linux/compiler_types.h | 13 +++++--------
 include/linux/slab.h           | 12 ++++++------
 2 files changed, 11 insertions(+), 14 deletions(-)
diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
index 4f2a819fd60a..f141a6f6b9f6 100644
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -271,15 +271,12 @@ struct ftrace_likely_data {

 /*
  * Any place that could be marked with the "alloc_size" attribute is also
- * a place to be marked with the "malloc" attribute. Do this as part of the
- * __alloc_size macro to avoid redundant attributes and to avoid missing a
- * __malloc marking.
+ * a place to be marked with the "malloc" attribute, except those that may
+ * be performing a _reallocation_, as that may alias the existing pointer.
+ * For these, use __realloc_size().
  */
-#ifdef __alloc_size__
-# define __alloc_size(x, ...)	__alloc_size__(x, ## __VA_ARGS__) __malloc
-#else
-# define __alloc_size(x, ...)	__malloc
-#endif
+#define __alloc_size(x, ...)	__alloc_size__(x, ## __VA_ARGS__) __malloc
+#define __realloc_size(x, ...)	__alloc_size__(x, ## __VA_ARGS__)

 #ifndef asm_volatile_goto
 #define asm_volatile_goto(x...) asm goto(x)
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 4fc41e4ed4a2..ac3832b50dbb 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -184,7 +184,7 @@ int kmem_cache_shrink(struct kmem_cache *s);
 /*
  * Common kmalloc functions provided by all allocators
  */
-void * __must_check krealloc(const void *objp, size_t new_size, gfp_t flags) __alloc_size(2);
+void * __must_check krealloc(const void *objp, size_t new_size, gfp_t flags) __realloc_size(2);
 void kfree(const void *objp);
 void kfree_sensitive(const void *objp);
 size_t __ksize(const void *objp);
@@ -661,10 +661,10 @@ static inline __alloc_size(1, 2) void *kmalloc_array(size_t n, size_t size, gfp_
  * @new_size: new size of a single member of the array
  * @flags: the type of memory to allocate (see kmalloc)
  */
-static inline __alloc_size(2, 3) void * __must_check krealloc_array(void *p,
-								     size_t new_n,
-								     size_t new_size,
-								     gfp_t flags)
+static inline __realloc_size(2, 3) void * __must_check krealloc_array(void *p,
+								       size_t new_n,
+								       size_t new_size,
+								       gfp_t flags)
 {
 	size_t bytes;

@@ -788,7 +788,7 @@ static inline __alloc_size(1, 2) void *kvcalloc(size_t n, size_t size, gfp_t fla
 }

 extern void *kvrealloc(const void *p, size_t oldsize, size_t newsize, gfp_t flags)
-		      __alloc_size(3);
+		      __realloc_size(3);
 extern void kvfree(const void *addr);
 extern void kvfree_sensitive(const void *addr, size_t len);
On Thu, Sep 22, 2022 at 5:10 AM Kees Cook keescook@chromium.org wrote:
These look unconditional now, so we could move them to `compiler_attributes.h` in a later patch (or an independent series).
Cheers, Miguel
On Thu, Sep 22, 2022 at 11:23:46AM +0200, Miguel Ojeda wrote:
I wasn't sure if this "composite macro" was sane there, especially since it would be using __malloc before it was defined, etc. Would you prefer I move it?
On Thu, Sep 22, 2022 at 5:56 PM Kees Cook keescook@chromium.org wrote:
Hmm... On one hand, they end up being attributes, so it could make sense to have them there (after all, the big advantage of that header is that there is no `#ifdef` nest like in others, and that it is only for attributes).
On the other hand, you are right that the file so far is intended to be as simple as possible (`__always_inline` having an extra `inline` and `fallthrough` would be closest outliers), so if we do it, I would prefer to do so in an independent series that carries its own rationale.
So I would leave the patch as it is here.
Cheers, Miguel
With skbuff's post-allocation use of ksize() rearranged to use kmalloc_size_roundup() prior to allocation, the compiler can correctly reason about the size of these allocations. The prior mismatch had caused buffer overflow mitigations to erroneously fire under CONFIG_UBSAN_BOUNDS, requiring a partial revert of the __alloc_size attributes. Restore the attribute that had been removed in commit 93dd04ab0b2b ("slab: remove __alloc_size attribute from __kmalloc_track_caller").
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-mm@kvack.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 include/linux/slab.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/linux/slab.h b/include/linux/slab.h
index ac3832b50dbb..dd50ed7207c9 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -693,7 +693,8 @@ static inline __alloc_size(1, 2) void *kcalloc(size_t n, size_t size, gfp_t flag
  * allocator where we care about the real place the memory allocation
  * request comes from.
  */
-extern void *__kmalloc_track_caller(size_t size, gfp_t flags, unsigned long caller);
+extern void *__kmalloc_track_caller(size_t size, gfp_t flags, unsigned long caller)
+				    __alloc_size(1);
 #define kmalloc_track_caller(size, flags) \
 	__kmalloc_track_caller(size, flags, _RET_IP_)
On 22.09.22 05:10, Kees Cook wrote:
Good cleanup, but one question: what other use cases do we have for ksize() apart from opportunistically growing buffers?
Offhand, I can't see any.
So if this patch set is cleaning up that use case, it should probably also take care to remove ksize(), or at least restrict it so that it can't be used this way in the future.
Regards, Christian.
On Thu, Sep 22, 2022 at 09:10:56AM +0200, Christian König wrote:
The remaining cases all seem to be using it as a "do we need to resize yet?" check, where they don't actually track the allocation size themselves and want to just depend on the slab cache to answer it. This is most clearly seen in the igb code: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/driv...
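For reference, the igb pattern in question looks roughly like this (paraphrased and simplified from igb_alloc_q_vector()):

        q_vector = adapter->q_vector[v_idx];
        if (!q_vector) {
        	q_vector = kzalloc(size, GFP_KERNEL);
        } else if (size > ksize(q_vector)) {
        	/* Too small: throw it away and allocate a bigger one. */
        	kfree_rcu(q_vector, rcu);
        	q_vector = kzalloc(size, GFP_KERNEL);
        } else {
        	memset(q_vector, 0, size);
        }

That is, the slab bucket size is the only record of how large the previous allocation was.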
My "solution" there kind of side-steps it, and leaves ksize() as-is: https://lore.kernel.org/linux-hardening/20220922031013.2150682-8-keescook@ch...
The more correct solution would be to add per-v_idx size tracking, similar to the other changes I sent: https://lore.kernel.org/linux-hardening/20220922031013.2150682-11-keescook@c...
I wonder if perhaps I should just migrate some of this code to using something like struct membuf.
Yeah, my goal would be to eliminate ksize(), and it seems possible if other cases are satisfied with tracking their allocation sizes directly.
-Kees
On 9/22/22 17:55, Kees Cook wrote:
I think we could leave ksize() to determine the size without a need for external tracking, but from now on forbid callers from using that hint to overflow the allocation size they actually requested? Once we remove the kasan/kfence hooks in ksize() that make the current kinds of usage possible, we should be able to catch any offenders of the new semantics that would appear?
On Thu, Sep 22, 2022 at 11:05:47PM +0200, Vlastimil Babka wrote:
That's correct. I spent the morning working my way through the rest of the ksize() users I didn't clean up yesterday, and in several places I just swapped in __ksize(). But that wouldn't even be needed if we just removed the kasan unpoisoning from ksize(), etc.
I am tempted to leave it as __ksize(), though, just to reinforce that it's not supposed to be used "normally". What do you think?
On 9/22/22 23:49, Kees Cook wrote:
Sounds good. Note in linux-next there's now a series in slab.git planned for 6.1 that moves __ksize() declaration to mm/slab.h to make it more private. But we don't want random users outside mm and related kasan/kfence subsystems to include mm/slab.h, so we'll have to expose it again instead of ksize().