Hey Greg,
We are actually shipping kernel 5.15 as part of the Amazon Linux kernel releases, so in theory moving to 5.15 should be the way to go. However, the relevant teams usually take some time for workload-specific testing and benchmarking before a major upgrade such as moving from 5.10 to 5.15. We usually ask whoever reports a regression, bug, or kernel enhancement to run the latest kernel, as you said, but we sometimes backport fixes when the production migration to the latest kernel will take time for the reasons mentioned above. We thought this performance improvement would also benefit Linux 5.10 users, so we preferred to have these patches merged into stable 5.10 rather than just consuming them as downstream patches. We are currently working with the relevant team on a plan for a possible 5.15 migration as the long-term solution.
Thank you.
Hazem
On 09/02/2023, 10:37, "Greg KH" <gregkh@linuxfoundation.org> wrote:
On Tue, Feb 07, 2023 at 07:01:28PM +0000, Shaoying Xu wrote:
This patch series removes reader optimistic spinning in kernel 5.10 to improve MongoDB performance. The performance measurements (average overall throughput in ops/sec over 10 runs) use MongoDB 5.0.11 and the YCSB [1] microbenchmark with workloadA [2] on AWS EC2 m5.4xlarge/m6g.4xlarge (16 vCPUs, 64 GiB memory) instances with a 512GB EBS IO1 volume at 5000 IOPS. MongoDB and the YCSB load generator run on 2 separate instances, with recordcount=25000000 and operationcount=10000000, to see the impact of these changes:
Before - v5.10.165 kernel in OS Amazon Linux 2
After  - v5.10.165 kernel with reader spinning disabled in OS Amazon Linux 2

| Arch    | Instance Type | Before  | After   |
|---------+---------------+---------+---------|
| x86_64  | m5.4xlarge    | 37365.4 | 42373.9 |
|---------+---------------+---------+---------|
| aarch64 | m6g.4xlarge   | 33823.1 | 43113.7 |
|---------+---------------+---------+---------|
MongoDB throughput improves by around 13% on x86_64 and 27% on aarch64 after disabling reader optimistic spinning, and these patches apply to 5.10 with no conflicts, so we wonder if it's possible to backport them to stable 5.10?
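For anyone wanting to re-derive the quoted percentages from the Before/After throughput numbers, a minimal sketch follows. The helper name `improvement_pct` and the `results` dictionary are illustrative only and are not part of the patch series or the benchmark tooling; the figures are copied from the table in this message.

```python
# Throughput figures (ops/sec) copied from the Before/After table.
# The dictionary layout and function name are assumptions for illustration.
results = {
    "x86_64": (37365.4, 42373.9),   # m5.4xlarge
    "aarch64": (33823.1, 43113.7),  # m6g.4xlarge
}


def improvement_pct(before, after):
    """Relative throughput gain of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


gains = {arch: improvement_pct(b, a) for arch, (b, a) in results.items()}

for arch, pct in gains.items():
    print(f"{arch}: {pct:.1f}%")
```

This gives roughly 13.4% for x86_64 and 27.5% for aarch64, in line with the "around 13%" and "27%" stated in the message.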
This is, frankly, crazy. :)
It's great that you did this work and found this out, but really, shouldn't you have done less work and just moved to 5.15.y instead? You're going to have to do that anyway, so what's preventing that from happening now, with the HUGE justification that you get a big workload increase and power savings (i.e. real money)?
So now you just delay the inevitable and spend more work overall (i.e. the backport work now, and the 5.15.y move later)? This feels like a bad management decision somewhere. Who do I need to talk to to resolve this?
thanks,
greg k-h