A memory corruption was reported in [1] with bisection pointing to the
patch [2] enabling per-VMA locks for x86. Based on the reproducer
provided in [1] we suspect this is caused by the lack of VMA locking
while forking a child process.
Patch 1/2 in the series implements proper VMA locking during fork.
I tested the fix locally using the reproducer and was unable to reproduce
the memory corruption problem.
This fix can potentially regress some fork-heavy workloads. Kernel build
time did not show noticeable regression on a 56-core machine while a
stress test mapping 10000 VMAs and forking 5000 times in a tight loop
shows ~5% regression. If such fork time regression is unacceptable,
disabling CONFIG_PER_VMA_LOCK should restore its performance. Further
optimizations are possible if this regression proves to be problematic.
Patch 2/2 disabled per-VMA locks until the fix is tested and verified.
Both patches apply cleanly over Linus' ToT and stable 6.4.y branch.
Changes from v2 posted at [3]:
- Move VMA locking before flush_cache_dup_mm, per David Hildenbrand
[1] https://bugzilla.kernel.org/show_bug.cgi?id=217624
[2] https://lore.kernel.org/all/20230227173632.3292573-30-surenb@google.com
[3] https://lore.kernel.org/all/20230705063711.2670599-1-surenb@google.com/
Suren Baghdasaryan (2):
fork: lock VMAs of the parent process when forking
mm: disable CONFIG_PER_VMA_LOCK until its fixed
kernel/fork.c | 6 ++++++
mm/Kconfig | 3 ++-
2 files changed, 8 insertions(+), 1 deletion(-)
--
2.41.0.255.g8b1d071c50-goog
On 7/5/23 21:37, Ranjan kumar wrote:
> Hi Damien,
> Regarding delay:
> As Sathya already mentioned as this is our hardware specific behavior and
> we are confident that the increased retry count
> is sufficient from our hardware perspective for any new systems too. So, we
> want to go with this change .
Fine. Adding a comment above the macro definitions to explain something like
that would be nice.
> Apart from that, I will change the name as suggested .
>
> Thanks & Regards,
> Ranjan
Please avoid top posting.
> On Thu, 29 Jun 2023 at 05:24, Damien Le Moal <dlemoal(a)kernel.org> wrote:
>
>> On 6/28/23 16:05, Ranjan Kumar wrote:
>>> Doorbell and Host diagnostic registers could return 0 even
>>> after 3 retries and that leads to occasional resets of the
>>> controllers, hence increased the retry count to thirty.
>>
>> The magic value "3" for retry count was already that, magic. Why would
>> things
>> work better with 30 ? What is the reasoning ? Isn't a udelay needed to
>> avoid
>> that many retries ?
>>
>>>
>>> Fixes: b899202901a8 ("mpt3sas: Add separate function for aero doorbell
>> reads ")
>>> Cc: stable(a)vger.kernel.org
>>> Signed-off-by: Ranjan Kumar <ranjan.kumar(a)broadcom.com>
>>
>> [..]
>>
>>> diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h
>> b/drivers/scsi/mpt3sas/mpt3sas_base.h
>>> index 05364aa15ecd..3b8ec4fd2d21 100644
>>> --- a/drivers/scsi/mpt3sas/mpt3sas_base.h
>>> +++ b/drivers/scsi/mpt3sas/mpt3sas_base.h
>>> @@ -160,6 +160,8 @@
>>>
>>> #define IOC_OPERATIONAL_WAIT_COUNT 10
>>>
>>> +#define READL_RETRY_COUNT_OF_THIRTY 30
>>> +#define READL_RETRY_COUNT_OF_THREE 3
>>
>> Less than ideal naming I think. If the values need to be changed again, a
>> lot of
>> code will need to change. What about soemthing like:
>>
>> #define READL_RETRY_COUNT 30
>> #define READL_RETRY_SHORT_COUNT 3
>>
>>> /*
>>> * NVMe defines
>>> */
>>> @@ -994,7 +996,7 @@ typedef void (*NVME_BUILD_PRP)(struct
>> MPT3SAS_ADAPTER *ioc, u16 smid,
>>> typedef void (*PUT_SMID_IO_FP_HIP) (struct MPT3SAS_ADAPTER *ioc, u16
>> smid,
>>> u16 funcdep);
>>> typedef void (*PUT_SMID_DEFAULT) (struct MPT3SAS_ADAPTER *ioc, u16
>> smid);
>>> -typedef u32 (*BASE_READ_REG) (const volatile void __iomem *addr);
>>> +typedef u32 (*BASE_READ_REG) (const volatile void __iomem *addr, u8
>> retry_count);
>>> /*
>>> * To get high iops reply queue's msix index when high iops mode is
>> enabled
>>> * else get the msix index of general reply queues.
>>
>> --
>> Damien Le Moal
>> Western Digital Research
>>
>>
>
--
Damien Le Moal
Western Digital Research
Hi,
The following commit landed in 6.4 that helps some occasional NULL
pointer dereferences when setting up MST devices, particularly on
suspend/resume cycles.
54d217406afe ("drm: use mgr->dev in drm_dbg_kms in
drm_dp_add_payload_part2")
Can you please take this to 6.1.y and 6.3.y as well? The NULL pointer
de-reference has been reported on both.
Thanks,