On Mon, Nov 06, 2017 at 04:57:21PM -0800, Dan Williams wrote:
> Until there is a solution to the dma-to-dax vs truncate problem it is
> not safe to allow RDMA to create long standing memory registrations
> against filesystem-dax vmas.
Looks good:
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
> +long get_user_pages_longterm(unsigned long start, unsigned long nr_pages,
> + unsigned int gup_flags, struct page **pages,
> + struct vm_area_struct **vmas)
> +{
> + struct vm_area_struct **__vmas = vmas;
How about calling the vma argument vma_arg, and the one actually used vma,
to make things a little more readable?
> + struct vm_area_struct *vma_prev = NULL;
> + long rc, i;
> +
> + if (!pages)
> + return -EINVAL;
> +
> + if (!vmas && IS_ENABLED(CONFIG_FS_DAX)) {
> + __vmas = kzalloc(sizeof(struct vm_area_struct *) * nr_pages,
> + GFP_KERNEL);
> + if (!__vmas)
> + return -ENOMEM;
> + }
> +
> + rc = get_user_pages(start, nr_pages, gup_flags, pages, __vmas);
> +
> + /* skip scan for fs-dax vmas if they are compile time disabled */
> + if (!IS_ENABLED(CONFIG_FS_DAX))
> + goto out;
Instead of all this IS_ENABLED magic, I'd recommend just conditionally
compiling this function and defining it to get_user_pages in the header
if FS_DAX is disabled.
Else this looks fine to me.
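A minimal userspace sketch of the header arrangement suggested above. It assumes the declaration would live next to get_user_pages() (e.g. in include/linux/mm.h), and the get_user_pages() below is a hypothetical stub that pins nothing; the opaque struct declarations only stand in for the kernel types.

```c
#include <assert.h>
#include <stddef.h>

/* Opaque stand-ins for the kernel types; only pointers are used here. */
struct page;
struct vm_area_struct;

/*
 * Hypothetical stub for the kernel's get_user_pages(): it pins nothing and
 * simply reports that all nr_pages pages were "pinned".
 */
static long get_user_pages(unsigned long start, unsigned long nr_pages,
			   unsigned int gup_flags, struct page **pages,
			   struct vm_area_struct **vmas)
{
	(void)start;
	(void)gup_flags;
	(void)pages;
	(void)vmas;
	return (long)nr_pages;
}

/* What the header might then contain: */
#ifdef CONFIG_FS_DAX
/* Out-of-line version with the fs-dax vma scan, built only with FS_DAX. */
long get_user_pages_longterm(unsigned long start, unsigned long nr_pages,
			     unsigned int gup_flags, struct page **pages,
			     struct vm_area_struct **vmas);
#else
/* No fs-dax vmas can exist, so a long-term pin is an ordinary gup. */
static inline long get_user_pages_longterm(unsigned long start,
					   unsigned long nr_pages,
					   unsigned int gup_flags,
					   struct page **pages,
					   struct vm_area_struct **vmas)
{
	return get_user_pages(start, nr_pages, gup_flags, pages, vmas);
}
#endif
```

With CONFIG_FS_DAX undefined, callers compile down to a plain get_user_pages() call, and the IS_ENABLED() tests inside the function body are no longer needed because that body is only built when FS_DAX is enabled.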
Currently only get_user_pages_fast() can safely handle the writable gup
case due to its use of pud_access_permitted() to check whether the pud
entry is writable. In the gup slow path, pud_write() is used instead of
pud_access_permitted(), and to date pud_write() has been unimplemented:
it simply calls BUG().
kernel BUG at ./include/linux/hugetlb.h:244!
[..]
RIP: 0010:follow_devmap_pud+0x482/0x490
[..]
Call Trace:
follow_page_mask+0x28c/0x6e0
__get_user_pages+0xe4/0x6c0
get_user_pages_unlocked+0x130/0x1b0
get_user_pages_fast+0x89/0xb0
iov_iter_get_pages_alloc+0x114/0x4a0
nfs_direct_read_schedule_iovec+0xd2/0x350
? nfs_start_io_direct+0x63/0x70
nfs_file_direct_read+0x1e0/0x250
nfs_file_read+0x90/0xc0
Use pud_access_permitted() to implement pud_write(). A later cleanup can
remove {pte,pmd,pud}_write and replace them with
{pte,pmd,pud}_access_permitted() directly, so that we only have one set of
helpers for these kinds of checks. For now, implementing pud_write()
simplifies -stable backports.
Cc: <stable(a)vger.kernel.org>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Fixes: a00cc7d9dd93 ("mm, x86: add support for PUD-sized transparent hugepages")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
---
Sending this as RFC for opinion on whether this should just be a
pud_flags() & _PAGE_RW check, like pmd_write, or pud_access_permitted()
that also takes protection keys into account.
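To make the two RFC options concrete, here is a simplified userspace model. The flag bit positions match x86, but pud_t here is a bare flags word with no PFN bits, and pkey_allows() is a hypothetical placeholder that always permits access (the real check consults PKRU). The option-2 logic follows the shape of the x86 helper: the entry must be present and user-accessible, writable if a write is requested, and allowed by the protection key.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 page-table flag bits (same bit positions as arch/x86). */
#define _PAGE_PRESENT	(1ULL << 0)
#define _PAGE_RW	(1ULL << 1)
#define _PAGE_USER	(1ULL << 2)

/* Simplified model: the whole entry is treated as flags (no PFN bits). */
typedef struct { uint64_t pud; } pud_t;

static uint64_t pud_flags(pud_t pud)
{
	return pud.pud;
}

/* Hypothetical placeholder for the protection-key (PKRU) check: allows all. */
static bool pkey_allows(pud_t pud, bool write)
{
	(void)pud;
	(void)write;
	return true;
}

/* Option 1: a plain flags test, like pmd_write(). */
static int pud_write_flags(pud_t pud)
{
	return (pud_flags(pud) & _PAGE_RW) != 0;
}

/* Option 2: modelled on x86 pud_access_permitted(). */
static bool pud_access_permitted_model(pud_t pud, bool write)
{
	uint64_t need = _PAGE_PRESENT | _PAGE_USER;

	if (write)
		need |= _PAGE_RW;
	if ((pud_flags(pud) & need) != need)
		return false;
	return pkey_allows(pud, write);
}
```

The options diverge on an entry such as _PAGE_PRESENT | _PAGE_RW without _PAGE_USER: the flags-only test reports it writable, while the access_permitted-style test refuses a write through it. With real protection keys, option 2 can additionally refuse writes that PKRU disallows.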
include/linux/hugetlb.h | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index fbf5b31d47ee..6a142b240ef7 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -242,8 +242,7 @@ static inline int pgd_write(pgd_t pgd)
#ifndef pud_write
static inline int pud_write(pud_t pud)
{
- BUG();
- return 0;
+ return pud_access_permitted(pud, WRITE);
}
#endif
Aleksa Sarai <asarai(a)suse.de> writes:
> On 11/05/2017 01:56 PM, Aleksa Sarai wrote:
>> Previously, the only capability effectively required to operate on the
>> /proc/scsi interface was CAP_DAC_OVERRIDE (or for some other files,
>> having an fsuid of GLOBAL_ROOT_UID was enough). This means that
>> semi-privileged processes could interfere with core components of a
>> system (such as causing a DoS by removing the underlying SCSI device of
>> the host's / mount).
>
> An alternative to this patch would be to make the open(2) call fail, if you try
> to open it write-only or read-write. Not sure which would be preferred (should
> it be possible to pass /proc/scsi/scsi to a semi-privileged process to write
> to?).
Making open fail is very much the preferred solution.
A permission check at write time can be bypassed by finding a suid-root
application whose error output acts like a suid cat: open the file
unprivileged, then hand it to that application as its stderr.
The best current practice for adding this kind of permission check is to
add the check in open. For some older use cases where we made this
mistake, we had to maintain a check during write to avoid breaking
userspace. But as this check is new there is no reason to add a check
anywhere except in open.
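A minimal userspace model of the open-time check described above. proc_scsi_open_model() and its has_privilege parameter are hypothetical: has_privilege stands in for whatever capability test the real patch would use. In kernel code, the equivalent test is typically on FMODE_WRITE in file->f_mode rather than on the raw open flags.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>

/*
 * Hypothetical model of a /proc open handler. has_privilege stands in for
 * the capability check; it is an assumption, not the patch's actual API.
 */
static int proc_scsi_open_model(int open_flags, bool has_privilege)
{
	int accmode = open_flags & O_ACCMODE;

	/*
	 * Decide at open time, based on the opener: an unprivileged caller
	 * never gets a writable descriptor, so handing that descriptor to a
	 * suid-root program later cannot turn it into a privileged write.
	 */
	if ((accmode == O_WRONLY || accmode == O_RDWR) && !has_privilege)
		return -EPERM;
	return 0;
}
```

Read-only opens stay unrestricted. Only the credentials of the task calling open() matter, which is what closes the suid-cat hole.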
Eric
On Thu, Nov 09, 2017 at 04:59:58PM +0000, Ben Hutchings wrote:
> On Thu, 2017-11-09 at 08:03 -0800, Guenter Roeck wrote:
> > On Thu, Nov 09, 2017 at 12:40:36PM +0000, Ben Hutchings wrote:
> > > On Thu, 2017-11-09 at 13:21 +0100, Arnd Bergmann wrote:
> > > > On Thu, Nov 9, 2017 at 1:08 PM, Greg KH <greg(a)kroah.com> wrote:
> > > > > On Thu, Nov 09, 2017 at 12:55:30PM +0100, Arnd Bergmann wrote:
> > >
> > > [...]
> > > > > > I think if you upload the branch to the stable-rc git, that should produce
> > > > > > the automated build and boot results via email or via the
> > > > > > https://kernelci.org/job/ interface. Once there are some results
> > > > > > there, I'll go through the list once more to see what warnings
> > > > > > and failures remain.
> > > > >
> > > > > I don't know of a way to have others push to that tree/branch at the
> > > > > moment :(
> > > > >
> > > > > I'll go update that branch now...
> > > >
> > > > Thanks!
> > > >
> > > > With the arm-soc tree, we simply have a shared group-id on
> > > > gitolite.kernel.org and everyone in that group can push to it.
> > > >
> > > > If that is the only thing you need, it should be trivial to let Ben
> > > > and Sasha push to /pub/scm/linux/kernel/git/stable/*.git as well,
> > > > I'm sure helpdesk(a)kernel.org can arrange that. Of course if you are
> > > > worried about having multiple accounts with write access to all the
> > > > branches, then that wouldn't be enough.
> > >
> > > I think I'd rather send a pull request to Greg at the start of the
> > > review period.
> > >
> >
> > If you change the trees I am supposed to pull from for my builders,
> > please let me know.
>
> If you're happy to keep supporting quilt-in-git then there's no change.
> I check your builders page and try to fix up build failures before even
> making a release candidate.
>
Ah yes, kernelci won't pick that up. No problem keeping kerneltests going
as long as it adds value.
Guenter
Hi Simon,
On 08.11.17 18:17, Simon Guinot wrote:
> Hi Sven and Andreas,
>
> Please, can you try with this patch ?
Today my son and I repeated the failing scenario, and we were able
to show that it behaves stably once your patch is applied.
Thanks for taking care of this issue. If you need further testing let me
know.
Regards,
Andreas
> On Wed, Nov 08, 2017 at 05:58:35PM +0100, Simon Guinot wrote:
>> The mvneta controller provides an 8-bit register to update the pending
>> Tx descriptor counter. Hence, at most 255 Tx descriptors can be
>> added at once. In the current code, the mvneta_txq_pend_desc_add function
>> assumes the caller takes care of this limit, but this is not the case. In
>> some situations (xmit_more flag), more than 255 descriptors are added.
>> When this happens, the Tx descriptor counter register is updated with a
>> wrong value, which breaks the whole Tx queue management.
>>
>> This patch fixes the issue by allowing the mvneta_txq_pend_desc_add
>> function to process more than 255 Tx descriptors.
>>
>> Fixes: 2a90f7e1d5d0 ("net: mvneta: add xmit_more support")
>> Cc: stable(a)vger.kernel.org # 4.11+
>> Signed-off-by: Simon Guinot <simon.guinot(a)sequanux.org>
>> ---
>> drivers/net/ethernet/marvell/mvneta.c | 16 +++++++++-------
>> 1 file changed, 9 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
>> index 64a04975bcf8..027c08ce4e5d 100644
>> --- a/drivers/net/ethernet/marvell/mvneta.c
>> +++ b/drivers/net/ethernet/marvell/mvneta.c
>> @@ -816,11 +816,14 @@ static void mvneta_txq_pend_desc_add(struct mvneta_port *pp,
>> {
>> u32 val;
>>
>> - /* Only 255 descriptors can be added at once ; Assume caller
>> - * process TX desriptors in quanta less than 256
>> - */
>> - val = pend_desc + txq->pending;
>> - mvreg_write(pp, MVNETA_TXQ_UPDATE_REG(txq->id), val);
>> + pend_desc += txq->pending;
>> +
>> + /* Only 255 Tx descriptors can be added at once */
>> + while (pend_desc > 0) {
>> + val = min(pend_desc, 255);
>> + mvreg_write(pp, MVNETA_TXQ_UPDATE_REG(txq->id), val);
>> + pend_desc -= val;
>> + }
>> txq->pending = 0;
>> }
>>
>> @@ -2413,8 +2416,7 @@ static int mvneta_tx(struct sk_buff *skb, struct net_device *dev)
>> if (txq->count >= txq->tx_stop_threshold)
>> netif_tx_stop_queue(nq);
>>
>> - if (!skb->xmit_more || netif_xmit_stopped(nq) ||
>> - txq->pending + frags > MVNETA_TXQ_DEC_SENT_MASK)
>> + if (!skb->xmit_more || netif_xmit_stopped(nq))
>> mvneta_txq_pend_desc_add(pp, txq, frags);
>> else
>> txq->pending += frags;
>> --
>> 2.9.3
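A userspace sketch of the chunking loop added by the patch. The register is modelled by a hypothetical reg_log that records each value written in place of mvreg_write(), and pend_desc_add() mirrors the structure of the patched mvneta_txq_pend_desc_add(): fold in the deferred count, then write at most 255 per update.

```c
#include <assert.h>
#include <stddef.h>

#define TXQ_UPDATE_MAX 255	/* largest value the 8-bit field can hold */

/* Hypothetical register model: records every value "written". */
struct reg_log {
	int writes[16];
	size_t n;
};

static void reg_write(struct reg_log *rlog, int val)
{
	assert(rlog->n < sizeof(rlog->writes) / sizeof(rlog->writes[0]));
	rlog->writes[rlog->n++] = val;
}

/*
 * Same structure as the patched mvneta_txq_pend_desc_add(): add the
 * descriptors deferred by xmit_more, then update in chunks of <= 255.
 */
static void pend_desc_add(struct reg_log *rlog, int *pending, int pend_desc)
{
	pend_desc += *pending;

	while (pend_desc > 0) {
		int val = pend_desc < TXQ_UPDATE_MAX ? pend_desc : TXQ_UPDATE_MAX;

		reg_write(rlog, val);
		pend_desc -= val;
	}
	*pending = 0;
}
```

For example, with 300 descriptors deferred and 212 more added, the total of 512 is written as 255, 255 and 2. This is why the patch can drop the MVNETA_TXQ_DEC_SENT_MASK check from mvneta_tx().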
On Mon, 2017-10-16 at 18:11 +0200, gregkh(a)linuxfoundation.org wrote:
> 4.4-stable review patch. If anyone has any objections, please let me know.
>
> ------------------
>
> From: Paul E. McKenney <paulmck(a)linux.vnet.ibm.com>
>
> commit 28585a832602747cbfa88ad8934013177a3aae38 upstream.
>
> A number of architectures invoke rcu_irq_enter() on exception entry in
> order to allow RCU read-side critical sections in the exception handler
> when the exception is from an idle or nohz_full CPU. This works, at
> least unless the exception happens in an NMI handler. In that case,
> rcu_nmi_enter() would already have exited the extended quiescent state,
> which would mean that rcu_irq_enter() would (incorrectly) cause RCU
> to think that it is again in an extended quiescent state. This will
> in turn result in lockdep splats in response to later RCU read-side
> critical sections.
>
> This commit therefore causes rcu_irq_enter() and rcu_irq_exit() to
> take no action if there is an rcu_nmi_enter() in effect, thus avoiding
> the unscheduled return to RCU quiescent state. This in turn should
> make the kernel safe for on-demand RCU voyeurism.
>
> Link: http://lkml.kernel.org/r/20170922211022.GA18084@linux.vnet.ibm.com
>
> Cc: stable(a)vger.kernel.org
> Fixes: 0be964be0 ("module: Sanitize RCU usage and locking")
> Reported-by: Steven Rostedt <rostedt(a)goodmis.org>
> Signed-off-by: Paul E. McKenney <paulmck(a)linux.vnet.ibm.com>
> Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
> ---
> kernel/rcu/tree.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -759,6 +759,12 @@ void rcu_irq_exit(void)
>
> local_irq_save(flags);
> rdtp = this_cpu_ptr(&rcu_dynticks);
> +
> + /* Page faults can happen in NMI handlers, so check... */
> + if (READ_ONCE(rdtp->dynticks_nmi_nesting))
> + return;
Shouldn't there be a local_irq_restore() on this return path? Or does
this condition imply that IRQs were already disabled?
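This sketch assumes that, in this 4.4 code, IRQs may be enabled on entry because the function saves and disables them itself. Under that assumption, one way to fix the early return is to route it through a label that restores the saved state. This is a userspace model only: irqs_enabled, the *_model helpers and the plain counters are hypothetical stand-ins for the kernel's IRQ state, local_irq_save()/local_irq_restore() and the rcu_dynticks fields, and READ_ONCE() is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model: one flag stands in for the CPU's interrupt-enable state. */
static bool irqs_enabled = true;
static long dynticks_nmi_nesting;	/* nonzero while an NMI is being handled */
static long dynticks_nesting;

static unsigned long local_irq_save_model(void)
{
	unsigned long flags = irqs_enabled;

	irqs_enabled = false;
	return flags;
}

static void local_irq_restore_model(unsigned long flags)
{
	irqs_enabled = flags != 0;
}

static void rcu_irq_enter_model(void)
{
	unsigned long flags = local_irq_save_model();

	/* Page faults can happen in NMI handlers, so leave the counters alone... */
	if (dynticks_nmi_nesting)
		goto out;

	dynticks_nesting++;
out:
	/* ...but restore the saved IRQ state on every path. */
	local_irq_restore_model(flags);
}
```

The same pattern would apply to rcu_irq_exit(). Both paths leave the IRQ state as the caller had it, whether or not an NMI is in progress.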
> + RCU_LOCKDEP_WARN(!irqs_disabled(), "rcu_irq_exit() invoked with irqs enabled!!!");
I don't see why you added RCU_LOCKDEP_WARN() here. Prior to 4.5 it's
not an error to call this function with IRQs enabled. And after
calling local_irq_save(), it's redundant to assert that IRQs are
disabled.
> oldval = rdtp->dynticks_nesting;
> rdtp->dynticks_nesting--;
> WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
> @@ -887,6 +893,12 @@ void rcu_irq_enter(void)
>
> local_irq_save(flags);
> rdtp = this_cpu_ptr(&rcu_dynticks);
> +
> + /* Page faults can happen in NMI handlers, so check... */
> + if (READ_ONCE(rdtp->dynticks_nmi_nesting))
> + return;
> +
> + RCU_LOCKDEP_WARN(!irqs_disabled(), "rcu_irq_enter() invoked with irqs enabled!!!");
Same problems here.
Ben.
> oldval = rdtp->dynticks_nesting;
> rdtp->dynticks_nesting++;
> WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
--
Ben Hutchings
Software Developer, Codethink Ltd.