On Thu, Sep 11, 2025 at 03:35:17PM +0100, Jun Eeo wrote:
Hi -
With this patch applied, a small number of machines in our fleet boot up but fail to register a SCSI device:
[ 6.290992] scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI devices might not be configured
It usually goes away on the next reboot. I don't have a reliable reproducer other than repeatedly rebooting some servers on 6.1.132.
I added a few printks around the cases where scsi_alloc_sdev() fails (there are three allocation sites), pulled in f7d77dfc91 ("mm/percpu.c: print error message too if atomic alloc failed"), and isolated the problem to a failed percpu allocation:
[ 5.431189] percpu: allocation failed, size=4 align=4 atomic=1, atomic alloc failed, no space left
[ 5.440383] sbitmap_init_node: init_alloc_hint failed.
[ 5.440383] scsi_realloc_sdev_budget_map: sbitmap_init_node failed with -12
Which kind of makes sense, as __alloc_percpu_gfp says:
If @gfp doesn't contain %GFP_KERNEL, the allocation doesn't block and can be called from any context but is a lot more likely to fail.
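To illustrate the point of that comment, here is a toy user-space model of the behaviour. It is not kernel code and does not use the real percpu allocator: toy_pool, toy_alloc and TOY_GROW_STEP are made-up names, and the only thing it assumes is that a caller that may not block cannot grow the backing pool and so fails as soon as the already-populated space runs out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only, NOT kernel code: a pool whose backing space can be
 * grown, but only on behalf of a caller that is allowed to block. */
struct toy_pool {
	size_t populated;	/* bytes backed by memory right now */
	size_t used;		/* bytes already handed out */
};

#define TOY_GROW_STEP 4096	/* hypothetical growth granularity */

/*
 * can_block stands in for "the gfp mask contains GFP_KERNEL".
 * Returns true on success, false if the request cannot be satisfied.
 */
static bool toy_alloc(struct toy_pool *p, size_t size, bool can_block)
{
	while (p->populated - p->used < size) {
		/* An atomic caller cannot populate more space: give up. */
		if (!can_block)
			return false;
		/* A blocking caller may grow the pool and retry. */
		p->populated += TOY_GROW_STEP;
	}
	p->used += size;
	return true;
}
```

With a pool that starts out full (populated == used), toy_alloc(&pool, 4, false) fails while toy_alloc(&pool, 4, true) succeeds, which mirrors the size=4 atomic=1 failure in the log above in spirit only.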
Reverting this patch in our environment made the initial SCSI scan work reliably, and we no longer see the SCSI drive disappearing.
Is this also a problem for you in newer kernels, like 6.16.y?
thanks,
greg k-h