❌ FAIL: Stable queue: queue-5.2

CKI Project

25 Aug 2019

2:37 p.m.

Hello,

We ran automated tests on a patchset that was proposed for merging into this kernel tree. The patches were applied to:

   Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
        Commit: f7d5b3dc4792 - Linux 5.2.10

The results of these automated tests are provided below.

Overall result: FAILED (see details below)
         Merge: FAILED

All kernel binaries, config files, and logs are available for download here:

https://artifacts.cki-project.org/pipelines/123306

When we attempted to merge the patchset, we received an error:

  error: patch failed: security/keys/trusted.c:1228
  error: security/keys/trusted.c: patch does not apply
  hint: Use 'git am --show-current-patch' to see the failed patch
  Applying: KEYS: trusted: allow module init if TPM is inactive or deactivated
  Patch failed at 0001 KEYS: trusted: allow module init if TPM is inactive or deactivated

We hope that these logs can help you find the problem quickly. For the full detail on our testing procedures, please scroll to the bottom of this message.

Please reply to this email if you have any questions about the tests that we ran or if you have any suggestions on how to make future tests more effective.

    ,-.   ,-.
   ( C ) ( K )  Continuous
    `-',-.`-'   Kernel
      ( I )     Integration
       `-'
______________________________________________________________________________

Merge testing
-------------

We cloned this repository and checked out the following commit:

  Repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
  Commit: f7d5b3dc4792 - Linux 5.2.10

We grabbed the cc88f4442e50 commit of the stable queue repository.

We then merged the patchset with `git am`:

keys-trusted-allow-module-init-if-tpm-is-inactive-or-deactivated.patch

Greg KH

25 Aug

2:41 p.m.

On Sun, Aug 25, 2019 at 10:37:26AM -0400, CKI Project wrote:

...

> We then merged the patchset with `git am`:
>
>   keys-trusted-allow-module-init-if-tpm-is-inactive-or-deactivated.patch

That file is not in the repo, I think your system is messed up :(

Nikolai Kondrashov

26 Aug

8:23 a.m.

On 8/25/19 5:41 PM, Greg KH wrote:

> That file is not in the repo, I think your system is messed up :(

Sorry for the trouble, Greg, but I think it's a race between the changes to the two repos.

The job which triggered this message was started right before the moment this commit was made:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/commit/?id=af2f46e26e770b3aa0bc304a13ecd24763f3b452

At that moment, the repo was still on this commit, about five hours old:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/commit/?id=cc88f4442e505e9f1f21c8c119debe89cbf63ab2

which still had the file. And when the job finished, and the message reached you, yes, the repo no longer contained it.

At the moment the job started, the latest commit to stable/linux.git was about 22 minutes old:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.2.y&id=f7d5b3dc4792a5fe0a4d6b8106a8f3eb20c3c24c

and the repo already contained the patches from the queue, including the one the job tried to merge:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.2.y&id=f820ecf609cc38676071ec6c6d3e96b26c73b747

IIRC, we agreed to not start testing both of the repos until the latest commits are at least 5 minutes old. In this situation the latest commit was 22 minutes old, so the system started testing.

We could increase the window to, say, 30 minutes (or something else), to avoid misfires like this, but then the response time would be increased accordingly.

It's your pick :)

Nick

Greg KH

8:33 a.m.

On Mon, Aug 26, 2019 at 11:23:58AM +0300, Nikolai Kondrashov wrote:

...

> Sorry for the trouble, Greg, but I think it's a race between the changes to the two repos.
>
> The job which triggered this message was started right before the moment this commit was made:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/commit/?id=af2f46e26e770b3aa0bc304a13ecd24763f3b452
> At that moment, the repo was still on this commit, about five hours old:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/commit/?id=cc88f4442e505e9f1f21c8c119debe89cbf63ab2
> which still had the file. And when the job finished, and the message reached you, yes, the repo no longer contained it.
>
> At the moment the job started, the latest commit to stable/linux.git was about 22 minutes old:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.2.y&id=f7d5b3dc4792a5fe0a4d6b8106a8f3eb20c3c24c
> and the repo already contained the patches from the queue, including the one the job tried to merge:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.2.y&id=f820ecf609cc38676071ec6c6d3e96b26c73b747

How in the world are you seeing such a messed up tree?

The 5.2.10 commit moved things around, in one single atomic move.

...

> IIRC, we agreed to not start testing both of the repos until the latest commits are at least 5 minutes old. In this situation the latest commit was 22 minutes old, so the system started testing.
>
> We could increase the window to, say, 30 minutes (or something else), to avoid misfires like this, but then the response time would be increased accordingly.
>
> It's your pick :)

Why is there any race at all?

Why do you not have a local mirror of the repo? When it updates, then run the tests. Every commit in the tree is "stand alone" and things should work at that point in time. Don't use a commit as a "time to go mirror something at a later point in time", as you are ending up with trees that are obviously not correct at all.

I think you need to rework your systems as no one else seems to have this "stale random tree state" issue.

Git does commits in an atomic fashion, how you all are messing that up shows you are doing _way_ more work than you probably need to :)
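
A minimal sketch of the mirror-driven approach suggested above, assuming a local mirror whose refs are snapshotted before and after each fetch. The function name `updated_refs` and the dictionary shape are hypothetical and not taken from any CKI code; the shortened hashes are the ones quoted in this thread.

```python
from typing import Dict, List, Tuple


def updated_refs(before: Dict[str, str], after: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (ref, new_commit) pairs for refs that moved or appeared after a fetch."""
    return sorted(
        (ref, sha) for ref, sha in after.items() if before.get(ref) != sha
    )


# Hypothetical snapshots of a local mirror's refs before and after `git fetch`.
before = {"refs/heads/linux-5.2.y": "aad39e30fb9e"}
after = {"refs/heads/linux-5.2.y": "f7d5b3dc4792"}

for ref, sha in updated_refs(before, after):
    # Each job is pinned to one immutable commit hash, not to "whatever is latest".
    print(f"test {ref} at {sha}")
```

Triggering on the ref change itself, and pinning the job to the new hash, removes the need to guess how old a commit must be before it is safe to test.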

thanks,

greg k-h

Nikolai Kondrashov

9:13 a.m.

On 8/26/19 11:33 AM, Greg KH wrote:

...

Sorry, I'm not the one who implemented and maintains the system, I'm just generally aware of how it works and am looking at the code right now, so I could be misunderstanding something. Please bear with me :)

However, I don't see how anything could be done, if we have two git repos, which are inconsistent with each other, when CI comes to test them.

I'll try to draw the timeline of what was happening to explain what I think is the problem. All times are in my timezone (UTC+03:00).

Time            stable/linux.git    stable/stable-queue.git Comments
                branch linux-5.2.y  branch master
                                    subdir queue-5.2
--------------- ------------------- ----------------------- -----------------
Aug  5 19:44:27 aad39e30fb9e6e72,                           Repos are
                "Linux 5.2.9",                              consistent
                *doesn't have*
                the patch that
                failed

Aug 25 11:53:25                     cc88f4442e505e9f,       Repos are
                                    "Linux 4.4.190",        consistent
                                    *has* the patch
                                    that failed

Aug 25 17:13:54 f7d5b3dc4792a5,                             Repos are
                "Linux 5.2.10",                             inconsistent,
                contains patches                            both contain
                from the queue                              the same patches
                above, including
                the failed one

Aug 25 17:36:18                                             Our CI job starts

Aug 25 17:36:19                     af2f46e26e770b3a        Repos are
                                    "Linux 5.2.10",         consistent
                                    "queue-5.2" dir is
                                    removed, doesn't
                                    have the failed
                                    patch

Aug 25 17:37:23                                             Our CI sends
                                                            failure report

I.e. I think the problem was that both linux-5.2.y branch of stable/linux.git, and the queue-5.2 subdir of master branch of stable/stable-queue.git contained the same patches for about 22 minutes on Aug 25, when our CI started.

We sample the latest commits from both repos at the same time (well, as close as Python and HTTP allow us), and we update our clones to those before testing.

We also don't start testing if the commits in either are less than 5 minutes old to avoid testing inconsistent repos, assuming that 5 minutes are enough to update them both to keep them in consistency. We can increase that time to what you think best fits your workflow, to avoid hitting these problems.
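
As a rough illustration of the age gate described above (not the actual trigger code), the check can be sketched as follows. The helper `old_enough` and the constant `MIN_AGE` are hypothetical names; the timestamps are the ones from the timeline, with the year 2019 assumed.

```python
from datetime import datetime, timedelta

MIN_AGE = timedelta(minutes=5)  # the window mentioned in this thread


def old_enough(commit_times, now, min_age=MIN_AGE):
    """True only if every sampled head commit is at least min_age old."""
    return all(now - t >= min_age for t in commit_times)


# Timestamps from the timeline (UTC+03:00).
linux_head = datetime(2019, 8, 25, 17, 13, 54)  # f7d5b3dc4792, "Linux 5.2.10"
queue_head = datetime(2019, 8, 25, 11, 53, 25)  # cc88f4442e50, "Linux 4.4.190"
job_start = datetime(2019, 8, 25, 17, 36, 18)

print(job_start - linux_head)  # age of the newest sampled commit
print(old_enough([linux_head, queue_head], job_start))
```

With these inputs the newest commit is a little over 22 minutes old, so the gate lets the job start even though the two repos disagree. A gate of this kind only sees commit timestamps, so it cannot tell how long either repo has actually been in its published state.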

Nick

Nikolai Kondrashov

9:40 a.m.

On 8/26/19 12:13 PM, Nikolai Kondrashov wrote:

...

OK, I keep forgetting about the fact that commit and push times are different, and I have no idea what was pushed when. I'll go check our code and logs a little closer.

Nick

Nikolai Kondrashov

11:12 a.m.

On 8/26/19 12:40 PM, Nikolai Kondrashov wrote:

...

OK, regardless whether the repo conflict was made public or not, we might have a problem in the way we check the age of the latest commits. We're using cgit's patch view for the corresponding branch, since the normal tools don't show the commit dates without cloning the repo. Since cgit normally caches most of what it shows, I suspect we might have hit a stale cache there.

I'll see what we can do. Either we'll keep a clone cached just for determining when to start the CI job, or find a way to get fresher data.

Nick

Nikolai Kondrashov

11:39 a.m.

On 8/26/19 2:12 PM, Nikolai Kondrashov wrote:

...

Ah, wrong. We're actually getting latest commit hashes with "git ls-remote" first, which I believe is not cached, and only *then* query cgit for their date. Since commit hashes are unique and commits never change, we shouldn't be getting any out-of-date data. The worst would be 404, and we weren't getting that.

Here's the code in question: https://gitlab.com/cki-project/pipeline-trigger/blob/e2e46e9580e260442805f6e...

So, this leads me to suspect the repos *were* inconsistent. Likely not as I described before, but still. They should've been inconsistent for more than 5 minutes for us to trip on this.
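
To make the first step concrete, here is a small sketch of picking a branch head out of `git ls-remote` output, which lists one "hash, tab, ref name" pair per line. This is not the pipeline-trigger code linked above; `head_from_ls_remote` is a hypothetical helper, and the sample line is built from the commit hash and branch name quoted earlier in the thread.

```python
from typing import Optional


def head_from_ls_remote(output: str, ref: str) -> Optional[str]:
    """Return the commit hash listed for ref in `git ls-remote` output, if any."""
    for line in output.splitlines():
        parts = line.split("\t")
        if len(parts) == 2 and parts[1] == ref:
            return parts[0]
    return None


# One line in the "<hash><TAB><ref>" shape that `git ls-remote <url>` prints.
sample = "f7d5b3dc4792a5fe0a4d6b8106a8f3eb20c3c24c\trefs/heads/linux-5.2.y\n"

print(head_from_ls_remote(sample, "refs/heads/linux-5.2.y"))
```

Because the hash identifies an immutable commit, any later lookup keyed on it (such as its date) cannot return data for a different commit, which is the point made above.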

Nick

Sasha Levin

1:33 p.m.

On Mon, Aug 26, 2019 at 02:39:31PM +0300, Nikolai Kondrashov wrote:

...

> So, this leads me to suspect the repos *were* inconsistent. Likely not as I described before, but still. They should've been inconsistent for more than 5 minutes for us to trip on this.

This is likely the case. I took my sweet time doing the release and looking at irc logs, I have gone way above 5 minutes. However, we'd really like to avoid having a magical number of minutes here to get it right.

To me the issue seems that you're mixing the information provided by two repos that may have inconsistency between them, even if merely due to sync within the CDN. You should use information provided only by one repo.

I myself run a (rather dumb) bot that just attempts to apply/build -stable tagged patches, and it seems to avoid the inconsistency issue by only working with the information provided by stable-queue:

- For each of the active stable/LTS kernels (let's say 5.2 in this "loop"), we do:
    - Grab the latest released version from stable-queue:
        - $ git tag | sort -V | grep 'v5.2' | tail -n1
          v5.2.10
    - Check it out in linux-stable:
        - $ git checkout v5.2.10
    - Bail if the above fails; this solves the "consistency" problem.
    - Apply the patches from the queue
    - Run your tests

This way, you guarantee that linux-stable is at the right position since you're just telling it where to go to, rather than getting information out of that repo which might conflict with something you've learned from stable-queue.
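
The tag-selection and checkout steps listed above can be sketched in Python roughly as follows. This is an illustration only, not Sasha's bot: `latest_release` and `checkout_or_bail` are hypothetical names, and the tag match is deliberately stricter than `grep 'v5.2'` (it accepts only `vX.Y.Z` tags in the given series).

```python
import re
import subprocess
from typing import Iterable, Optional


def latest_release(tags: Iterable[str], series: str) -> Optional[str]:
    """Pick the highest vX.Y.Z tag in a series such as "5.2"."""
    pattern = re.compile(r"^v" + re.escape(series) + r"\.(\d+)$")
    best = None
    for tag in tags:
        match = pattern.match(tag)
        # Compare the patch level numerically, like `sort -V` would.
        if match and (best is None or int(match.group(1)) > best[0]):
            best = (int(match.group(1)), tag)
    return best[1] if best else None


def checkout_or_bail(repo_dir: str, tag: str) -> bool:
    """Check out tag in repo_dir; False means the tag is missing, so skip this run."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "checkout", "--detach", tag], check=False
    )
    return result.returncode == 0


tags = ["v5.2.9", "v5.2.10", "v5.2", "v5.3.1", "v4.4.190"]
print(latest_release(tags, "5.2"))
```

The numeric comparison matters because a plain string sort would rank v5.2.9 above v5.2.10. If the checkout fails, the job simply does not run, which is what replaces the time-based window.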

--
Thanks,
Sasha

Nikolai Kondrashov

27 Aug

1:10 p.m.

On 8/26/19 4:33 PM, Sasha Levin wrote:

...

Thank you, Sasha. This makes sense. You using this approach in your bot gives us the guarantee it will work :) We'll change our trigger to this (I posted an internal ticket and everything), likely next week.

Nick

linux-stable-mirror@lists.linaro.org

participants (4)

CKI Project
Greg KH
Nikolai Kondrashov
Sasha Levin