The checkpatch.pl in v5.10.y still triggers lots of false positives for REPEATED_WORD warnings, particularly for commit logs. Can we please backport these two fixes?
Aditya Srivastava (1): checkpatch: fix false positives in REPEATED_WORD warning
Dwaipayan Ray (1): checkpatch: add new exception to repeated word check
scripts/checkpatch.pl | 31 ++++++++++++++++++++++++++++--- 1 file changed, 28 insertions(+), 3 deletions(-)
base-commit: b50306f77190155d2c14a72be5d2e02254d17dbd
From: Dwaipayan Ray dwaipayanray1@gmail.com
commit 1db81a682a2f2a664489c4e94f3b945f70a43a13 upstream.
Recently, commit 4f6ad8aa1eac ("checkpatch: move repeated word test") moved the repeated word test to check for more file types. But after this, if checkpatch.pl is run on MAINTAINERS, it generates several new warnings of the type:
WARNING: Possible repeated word: 'git'
For example:
WARNING: Possible repeated word: 'git' +T: git git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml.git
So, the pattern "git git://..." is a false positive in this case.
There are several other combinations which may produce a wrong warning message, such as "@size size", ":Begin begin", etc.
Extend repeated word check to compare the characters before and after the word matches.
If there is a non whitespace character before the first word or a non whitespace character excluding punctuation characters after the second word, then the check is skipped and the warning is avoided.
Also add case insensitive word matching to the repeated word check.
Link: https://lore.kernel.org/linux-kernel-mentees/81b6a0bb2c7b9256361573f7a13201e... Link: https://lkml.kernel.org/r/20201017162732.152351-1-dwaipayanray1@gmail.com Signed-off-by: Dwaipayan Ray dwaipayanray1@gmail.com Suggested-by: Joe Perches joe@perches.com Suggested-by: Lukas Bulwahn lukas.bulwahn@gmail.com Acked-by: Joe Perches joe@perches.com Cc: Aditya Srivastava yashsri421@gmail.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Carlos Llamas cmllamas@google.com --- scripts/checkpatch.pl | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index 0ad235ee96f9..a83e5f0088bb 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -3050,19 +3050,30 @@ sub process {
# check for repeated words separated by a single space if ($rawline =~ /^+/ || $in_commit_log) { + pos($rawline) = 1 if (!$in_commit_log); while ($rawline =~ /\b($word_pattern) (?=($word_pattern))/g) {
my $first = $1; my $second = $2; - + my $start_pos = $-[1]; + my $end_pos = $+[2]; if ($first =~ /(?:struct|union|enum)/) { pos($rawline) += length($first) + length($second) + 1; next; }
- next if ($first ne $second); + next if (lc($first) ne lc($second)); next if ($first eq 'long');
+ # check for character before and after the word matches + my $start_char = ''; + my $end_char = ''; + $start_char = substr($rawline, $start_pos - 1, 1) if ($start_pos > ($in_commit_log ? 0 : 1)); + $end_char = substr($rawline, $end_pos, 1) if ($end_pos < length($rawline)); + + next if ($start_char =~ /^\S$/); + next if (index(" \t.,;?!", $end_char) == -1); + if (WARN("REPEATED_WORD", "Possible repeated word: '$first'\n" . $herecurr) && $fix) {
On Thu, 2023-12-14 at 18:15 +0000, Carlos Llamas wrote:
From: Dwaipayan Ray dwaipayanray1@gmail.com
commit 1db81a682a2f2a664489c4e94f3b945f70a43a13 upstream.
Recently, commit 4f6ad8aa1eac ("checkpatch: move repeated word test") moved the repeated word test to check for more file types. But after this, if checkpatch.pl is run on MAINTAINERS, it generates several new warnings of the type:
WARNING: Possible repeated word: 'git'
Why should this be backported? Who cares?
On Thu, Dec 14, 2023 at 10:31:02AM -0800, Joe Perches wrote:
On Thu, 2023-12-14 at 18:15 +0000, Carlos Llamas wrote:
From: Dwaipayan Ray dwaipayanray1@gmail.com
commit 1db81a682a2f2a664489c4e94f3b945f70a43a13 upstream.
Recently, commit 4f6ad8aa1eac ("checkpatch: move repeated word test") moved the repeated word test to check for more file types. But after this, if checkpatch.pl is run on MAINTAINERS, it generates several new warnings of the type:
WARNING: Possible repeated word: 'git'
Why should this be backported? Who cares?
Our CI for stable cares. However, it seems it would be best to run the version directly from master instead. I'll try that. Sorry for the noise.
From: Aditya Srivastava yashsri421@gmail.com
commit 8d0325cc74a31d517b5b4307c8d895c6e81076b7 upstream.
Presence of hexadecimal address or symbol results in false warning message by checkpatch.pl.
For example, running checkpatch on commit b8ad540dd4e4 ("mptcp: fix memory leak in mptcp_subflow_create_socket()") results in warning:
WARNING:REPEATED_WORD: Possible repeated word: 'ff' 00 00 00 00 00 00 00 00 00 2f 30 0a 81 88 ff ff ........./0.....
Similarly, the presence of list command output in commit results in an unnecessary warning.
For example, running checkpatch on commit 899e5ffbf246 ("perf record: Introduce --switch-output-event") gives:
WARNING:REPEATED_WORD: Possible repeated word: 'root' dr-xr-x---. 12 root root 4096 Apr 27 17:46 ..
Here, it reports 'ff' and 'root' to be repeated, but it is in fact part of some address or code, where it has to be repeated.
In these cases, the intent of the warning to find stylistic issues in commit messages is not met and the warning is just completely wrong in this case.
To avoid these warnings, add an additional regex check for the directory permission pattern and avoid checking the line for this class of warning. Similarly, to avoid hex pattern, check if the word consists of hex symbols and skip this warning if it is not among the common english words formed using hex letters.
A quick evaluation on v5.6..v5.8 showed that this fix reduces REPEATED_WORD warnings by the frequency of 1890.
A quick manual check found all cases are related to hex output or list command outputs in commit messages.
Link: https://lkml.kernel.org/r/20201024102253.13614-1-yashsri421@gmail.com Signed-off-by: Aditya Srivastava yashsri421@gmail.com Acked-by: Joe Perches joe@perches.com Cc: Dwaipayan Ray dwaipayanray1@gmail.com Cc: Lukas Bulwahn lukas.bulwahn@gmail.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Carlos Llamas cmllamas@google.com --- scripts/checkpatch.pl | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index a83e5f0088bb..c2704af497ba 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -853,6 +853,13 @@ our $declaration_macros = qr{(?x: (?:SKCIPHER_REQUEST|SHASH_DESC|AHASH_REQUEST)_ON_STACK\s*( )};
+our %allow_repeated_words = ( + add => '', + added => '', + bad => '', + be => '', +); + sub deparenthesize { my ($string) = @_; return "" if (!defined($string)); @@ -3049,7 +3056,9 @@ sub process { }
# check for repeated words separated by a single space - if ($rawline =~ /^+/ || $in_commit_log) { +# avoid false positive from list command eg, '-rw-r--r-- 1 root root' + if (($rawline =~ /^+/ || $in_commit_log) && + $rawline !~ /[bcCdDlMnpPs?-][rwxsStT-]{9}/) { pos($rawline) = 1 if (!$in_commit_log); while ($rawline =~ /\b($word_pattern) (?=($word_pattern))/g) {
@@ -3074,6 +3083,11 @@ sub process { next if ($start_char =~ /^\S$/); next if (index(" \t.,;?!", $end_char) == -1);
+ # avoid repeating hex occurrences like 'ff ff fe 09 ...' + if ($first =~ /\b[0-9a-f]{2,}\b/i) { + next if (!exists($allow_repeated_words{lc($first)})); + } + if (WARN("REPEATED_WORD", "Possible repeated word: '$first'\n" . $herecurr) && $fix) {
On Thu, Dec 14, 2023 at 06:15:02PM +0000, Carlos Llamas wrote:
The checkpatch.pl in v5.10.y still triggers lots of false positives for REPEATED_WORD warnings, particularly for commit logs. Can we please backport these two fixes?
Why is older versions of checkpatch being used? Why not always use the latest version, much like perf is handled?
No new code should be written against older kernels, so who is using this old tool?
thanks,
greg k-h
On Thu, Dec 14, 2023 at 07:23:28PM +0100, Greg KH wrote:
On Thu, Dec 14, 2023 at 06:15:02PM +0000, Carlos Llamas wrote:
The checkpatch.pl in v5.10.y still triggers lots of false positives for REPEATED_WORD warnings, particularly for commit logs. Can we please backport these two fixes?
Why is older versions of checkpatch being used? Why not always use the latest version, much like perf is handled?
No new code should be written against older kernels, so who is using this old tool?
This is a minor annoyance when working directly with the v5.10 stable tree and doing e.g ./scripts/checkpatch.pl -g HEAD. I suppose it makes sense to always prefer the top-of-tree scripts. However, this could be inconvenient for some scenarios were master needs to be pulled separately.
-- Carlos Llamas
On Thu, Dec 14, 2023 at 06:37:43PM +0000, Carlos Llamas wrote:
On Thu, Dec 14, 2023 at 07:23:28PM +0100, Greg KH wrote:
On Thu, Dec 14, 2023 at 06:15:02PM +0000, Carlos Llamas wrote:
The checkpatch.pl in v5.10.y still triggers lots of false positives for REPEATED_WORD warnings, particularly for commit logs. Can we please backport these two fixes?
Why is older versions of checkpatch being used? Why not always use the latest version, much like perf is handled?
No new code should be written against older kernels, so who is using this old tool?
This is a minor annoyance when working directly with the v5.10 stable tree and doing e.g ./scripts/checkpatch.pl -g HEAD. I suppose it makes sense to always prefer the top-of-tree scripts. However, this could be inconvenient for some scenarios were master needs to be pulled separately.
It makes more sense to use the newer version of the tool, especially as you are probably having it review backports of newer patches, which obviously, should follow the newer checkpatch settings, not the older ones :)
thanks,
greg k-h
On Fri, Dec 15, 2023 at 08:15:01AM +0100, Greg KH wrote:
On Thu, Dec 14, 2023 at 06:37:43PM +0000, Carlos Llamas wrote:
On Thu, Dec 14, 2023 at 07:23:28PM +0100, Greg KH wrote:
On Thu, Dec 14, 2023 at 06:15:02PM +0000, Carlos Llamas wrote:
The checkpatch.pl in v5.10.y still triggers lots of false positives for REPEATED_WORD warnings, particularly for commit logs. Can we please backport these two fixes?
Why is older versions of checkpatch being used? Why not always use the latest version, much like perf is handled?
No new code should be written against older kernels, so who is using this old tool?
This is a minor annoyance when working directly with the v5.10 stable tree and doing e.g ./scripts/checkpatch.pl -g HEAD. I suppose it makes sense to always prefer the top-of-tree scripts. However, this could be inconvenient for some scenarios were master needs to be pulled separately.
It makes more sense to use the newer version of the tool, especially as you are probably having it review backports of newer patches, which obviously, should follow the newer checkpatch settings, not the older ones :)
Yes, that is the use-case we have. We'll switch to the latest version so please ignore these patches then. Sorry for the noise.
-- Carlos Llamas
linux-stable-mirror@lists.linaro.org