On Mon, 2025-07-28 at 16:43 +0200, Neil Armstrong wrote:
On 25/07/2025 16:16, André Draszik wrote:
Commit 3c7ac40d7322 ("scsi: ufs: core: Delegate the interrupt service routine to a threaded IRQ handler") introduced a massive performance drop for various workloads on UFSHC versions < 4 due to the extra latency added by moving all of the IRQ handling into a threaded handler. See below for a summary.
To resolve this performance drop, move IRQ handling back into hardirq context, but apply a time limit which, once expired, will cause the remainder of the work to be deferred to the threaded handler.
The above commit tries to avoid unduly delaying other subsystems' interrupts while UFS events are being handled. By limiting the amount of time spent in hardirq context, we can still ensure that.
The time limit itself was chosen because I have generally seen interrupt handling complete within 20 usecs, with occasional spikes of a couple hundred usecs.
This commit brings UFS performance roughly back to its original level, and should still avoid starving other subsystems because those spikes are deferred to the threaded handler.
fio results for 4k block size on Pixel 6, all values being the average of 5 runs each:

read / 1 job        original       after  this commit
  min IOPS          4,653.60    2,704.40     3,902.80
  max IOPS          6,151.80    4,847.60     6,103.40
  avg IOPS          5,488.82    4,226.61     5,314.89
  cpu % usr             1.85        1.72         1.97
  cpu % sys            32.46       28.88        33.29
  bw MB/s              21.46       16.50        20.76

read / 8 jobs       original       after  this commit
  min IOPS         18,207.80   11,323.00    17,911.80
  max IOPS         25,535.80   14,477.40    24,373.60
  avg IOPS         22,529.93   13,325.59    21,868.85
  cpu % usr             1.70        1.41         1.67
  cpu % sys            27.89       21.85        27.23
  bw MB/s              88.10       52.10        84.48

write / 1 job       original       after  this commit
  min IOPS          6,524.20    3,136.00     5,988.40
  max IOPS          7,303.60    5,144.40     7,232.40
  avg IOPS          7,169.80    4,608.29     7,014.66
  cpu % usr             2.29        2.34         2.23
  cpu % sys            41.91       39.34        42.48
  bw MB/s              28.02       18.00        27.42

write / 8 jobs      original       after  this commit
  min IOPS         12,685.40   13,783.00    12,622.40
  max IOPS         30,814.20   22,122.00    29,636.00
  avg IOPS         21,539.04   18,552.63    21,134.65
  cpu % usr             2.08        1.61         2.07
  cpu % sys            30.86       23.88        30.64
  bw MB/s              84.18       72.54        82.62
Thanks for this updated change. I'm running the exact same test on SM8650 to check the impact, and I'll report comparable numbers.
Btw, my complete command was (should probably have added that to the commit message in the first place):
    for rw in read write ; do
        echo "rw: ${rw}"
        for jobs in 1 8 ; do
            echo "jobs: ${jobs}"
            for it in $(seq 1 5) ; do
                fio --name=rand${rw} --rw=rand${rw} \
                    --ioengine=libaio --direct=1 \
                    --bs=4k --numjobs=${jobs} --size=32m \
                    --runtime=30 --time_based --end_fsync=1 \
                    --group_reporting --filename=/foo \
                | grep -E '(iops|sys=|READ:|WRITE:)'
                sleep 5
            done
        done
    done
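
For anyone reproducing the numbers: the "average of 5 runs" figures can be computed with something like the sketch below. It assumes the per-run values (e.g. the avg IOPS of each run) have already been pulled out of the fio output by hand or by script, one value per line; the `avg` helper is a hypothetical name for illustration and not part of fio.

```shell
# Hypothetical helper: print the arithmetic mean of the numbers on stdin
# (one value per line, no thousands separators), with two decimals.
avg() {
    awk '{ sum += $1; n++ } END { if (n > 0) printf "%.2f\n", sum / n }'
}

# Usage: feed it the per-run values of one metric.
printf '%s\n' 10 20 30 | avg    # prints 20.00
```

Empty input prints nothing rather than dividing by zero, so a failed run is easy to spot.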
Cheers,
Andre'