Re: [Linaro-validation] [LNG] run BE in lava


Re: [Linaro-validation] More...

errore in lava-dispacher

Fathi Boudra

30 Sep 2013

3:54 p.m.

CC'ing LAVA guys

On 26 September 2013 15:46, Maxim Uvarov maxim.uvarov@linaro.org wrote:

...

If I understand this log correctly: https://validation.linaro.org/scheduler/job/74753/log_file

The Arndale board booted with the standard LE rootfs. Then the rootfs is fetched with wget and LAVA chroots into it. And of course any command run under that chroot fails.

So I think that, for now, it's impossible to run BE on LAVA boards unless we either set up a board with a BE environment or remove the chroot from the setup scripts.

Is the chroot really needed for this?

chroot /mnt/root ln -sf /bin/true /usr/sbin/flash-kernel

Thank you, Maxim.


Antonio Terceiro

2 Oct

3:45 p.m.

New subject: [LNG] run BE in lava

On Mon, Sep 30, 2013 at 06:54:47PM +0300, Fathi Boudra wrote:

...

This is an internal implementation detail of LAVA that, in an ideal world, shouldn't even show up in the logs. The job did not get to the point of actually running your code, because the _image deployment_ failed (hence "CriticalError: Deployment failed").

So LAVA needs to run some commands inside the image with chroot before it deploys the image to the device, but if you look at https://validation.linaro.org/scheduler/job/74753/log_file#L_11_26 you will see:

chroot: failed to run command `which': Exec format error

This means that the kernel on the host cannot execute the ARM binaries inside the image. I suspected this was because the server lacks either qemu-user-static or binfmt-support, but both are installed, so I'm not sure what went wrong there.
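For context on the "Exec format error" above: the kernel decides how to run a file from its ELF header. Byte 5 (EI_DATA) records the byte order, and the 16-bit e_machine field at offset 18 records the architecture (40 for ARM), itself stored in that byte order. binfmt_misc handlers, such as the ones qemu-user-static registers, are matched against these header bytes; when no handler accepts a binary, execve() fails with ENOEXEC, which chroot reports as "Exec format error". A minimal C sketch, not LAVA code, that classifies a header the same way; classify_elf and enum elf_kind are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum elf_kind { ELF_INVALID, ELF_ARM_LE, ELF_ARM_BE, ELF_OTHER };

/* Classify an ELF header by architecture and byte order.
 * hdr must point to at least len readable bytes. */
enum elf_kind classify_elf(const unsigned char *hdr, size_t len)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };
    unsigned machine;

    assert(hdr != NULL);
    if (len < 20 || memcmp(hdr, magic, sizeof magic) != 0)
        return ELF_INVALID;

    /* e_ident[EI_DATA] (offset 5): 1 = little endian, 2 = big endian.
     * e_machine (offsets 18-19) is stored in that same byte order. */
    if (hdr[5] == 1)
        machine = hdr[18] | (unsigned)hdr[19] << 8;
    else if (hdr[5] == 2)
        machine = (unsigned)hdr[18] << 8 | hdr[19];
    else
        return ELF_INVALID;

    if (machine != 40)          /* EM_ARM */
        return ELF_OTHER;
    return hdr[5] == 1 ? ELF_ARM_LE : ELF_ARM_BE;
}
```

A host whose kernel and binfmt handlers only accept little-endian ARM would, under this model, run headers classified as ELF_ARM_LE and reject ELF_ARM_BE ones.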

Matt/Dave, can you guys please investigate?

-- Antonio Terceiro Software Engineer - Linaro http://www.linaro.org

Maxim Uvarov

4:56 p.m.

On 10/02/2013 07:45 PM, Antonio Terceiro wrote:

...

This happens because the original system is little endian, and LAVA chroots into a big-endian rootfs. It's expected that you cannot run BE binaries on an LE system, so the only way forward is to avoid the chroot.
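Endianness here means the order in which the bytes of a multi-byte value are stored in memory. A minimal C sketch showing that the same four bytes decode to different 32-bit values under LE and BE interpretation; the helper names are hypothetical and come from neither LAVA nor the kernel:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* True if this host stores the least significant byte first. */
bool host_is_little_endian(void)
{
    const uint32_t one = 1;
    unsigned char first;
    memcpy(&first, &one, 1);
    return first == 1;
}

/* Interpret 4 bytes as laid out by a little-endian system. */
uint32_t load_le32(const unsigned char *p)
{
    assert(p != NULL);
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Interpret the same 4 bytes as laid out by a big-endian system. */
uint32_t load_be32(const unsigned char *p)
{
    assert(p != NULL);
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8 | (uint32_t)p[3];
}
```

For the bytes 12 34 56 78, load_le32 yields 0x78563412 and load_be32 yields 0x12345678, regardless of the host.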

Thank you, Maxim.

Michael Hudson-Doyle

8:30 p.m.

Maxim Uvarov maxim.uvarov@linaro.org writes:

...

Is there a qemu-arm-static equivalent that can run BE binaries?

Cheers, mwh

Victor Kamensky

3 Oct

7:03 a.m.

On 2 October 2013 13:30, Michael Hudson-Doyle michael.hudson@linaro.org wrote:

...

No, there is no such thing. In fact, at this point even basic BE ARM support in QEMU and KVM has only just become barely functional. We had no plans to make an x86 host run BE ARM at all.

Frankly, the requirement to have qemu-arm-static just to repackage final images is a bit weird IMHO. Even in the LE case: personally, I am a Fedora user and never got qemu-arm-static working on my Fedora machine. I ended up installing Ubuntu under VirtualBox just to run the Linaro image packaging tool! I saw a version of qemu-arm-static for Fedora that Fathi put up a while back, but it does not work on newer versions of Fedora... Note that the OE build system can construct final images without running any target code under QEMU. Granted, it is done with some amount of trickery and a set of target postinstall scripts, but I like it better than qemu-arm-static. IMHO it works better for all embedded CPU types.

I wonder how this works for LE aarch64... does a qemu-arm-static equivalent exist out there?

Thanks, Victor


Riku Voipio

9:14 a.m.

On 3 October 2013 10:03, Victor Kamensky victor.kamensky@linaro.org wrote:

...

There is qemu-armeb-static, but it is a whole lot untested. This chroot is also being run on an ARM host, so it is even less tested.

But looking at the chroot commands being executed:

root@master [rc=0]# chroot /mnt/root which dpkg-divert
chroot: failed to run command `which': Exec format error
root@master [rc=126]# chroot /mnt/root ln -sf /bin/true /usr/sbin/flash-kernel
chroot: failed to run command `ln': Exec format error

This is horse manure. LAVA should not be tampering with images like this. Furthermore, this is Ubuntu-specific tampering: there is no dpkg-divert or flash-kernel in the OE filesystem being submitted in the job.

The solution here needs to be removal of these chroot commands, not trying to make them work when they are not needed.

Riku

Fathi Boudra

10:23 a.m.

On 3 October 2013 10:03, Victor Kamensky victor.kamensky@linaro.org wrote:

...

Our aarch64 support doesn't use QEMU. We use OE and extract archives of cross-built binaries.

According to the previous comments, the problem is that we try to run Ubuntu-specific commands, most likely because the LAVA code path inspects the rootfs and seems to think it's Ubuntu.

In the BE case, LAVA's behavior should be similar to aarch64:
- LAVA detects OE
- we don't run native OS commands but only extract archives (QEMU isn't involved, nor is flash-kernel...)
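One way such a detection step could avoid the endianness problem entirely is to inspect files in the unpacked image from the host side only, so no target binary is ever executed. A minimal C sketch under that assumption; the dpkg-divert heuristic and the function names are hypothetical, not LAVA's actual detection logic:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* True if root/rel exists. Only host-side file checks are done,
 * so nothing inside the (possibly big-endian) image is executed. */
bool rootfs_has_file(const char *root, const char *rel)
{
    char path[4096];
    int n;

    assert(root != NULL && rel != NULL);
    n = snprintf(path, sizeof path, "%s/%s", root, rel);
    if (n < 0 || (size_t)n >= sizeof path)
        return false;           /* path too long: treat as absent */
    return access(path, F_OK) == 0;
}

/* Hypothetical heuristic: only Debian/Ubuntu-style images ship
 * dpkg-divert, so only those would get the dpkg-divert/flash-kernel
 * tweaks; an OE rootfs would just be extracted. */
bool rootfs_wants_debian_tweaks(const char *root)
{
    return rootfs_has_file(root, "usr/bin/dpkg-divert");
}
```

Note that access() follows symlinks, so an absolute symlink inside the image would be resolved against the host filesystem; a real implementation would need to handle that case.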

btw, the openSUSE guys have just released a QEMU aarch64 port:
http://news.opensuse.org/2013/10/01/suse-speeds-up-building-aarch64-software...
https://github.com/openSUSE/qemu/commits/aarch64-work

Cheers, Fathi

Antonio Terceiro

4 Oct

12:30 p.m.

On Thu, Oct 03, 2013 at 01:23:14PM +0300, Fathi Boudra wrote:

...

Actually, those commands are run regardless of which OS is in the image, but on little-endian images they are harmless. The fix is to make the second command (the one whose failure aborted the job) conditional on the first one, which is allowed to fail:

https://git.linaro.org/gitweb?p=lava/lava-dispatcher.git%3Ba=commitdiff%3Bh=...
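The commit itself is only linked here (the URL is truncated), so its code is not reproduced. As a hypothetical model of the behavior described, roughly "probe && tweak" in shell terms: the probe (which dpkg-divert) may fail without aborting deployment, and the tweak (the flash-kernel symlink) only runs when the probe succeeded. All names are illustrative; 126 is the status chroot returned in the log earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A deployment step: returns its exit status, 0 meaning success. */
typedef int (*step_fn)(void);

/* Run an optional tweak guarded by a probe.
 * - probe fails: skip the tweak, deployment continues (returns true);
 * - probe succeeds: run the tweak and report whether it succeeded. */
bool run_guarded_tweak(step_fn probe, step_fn tweak)
{
    assert(probe != NULL && tweak != NULL);
    if (probe() != 0)
        return true;
    return tweak() == 0;
}

/* Illustrative steps. 126 is chroot's "found but cannot be invoked"
 * status, as seen with "Exec format error" on a BE rootfs. */
int tweak_calls = 0;
int step_ok(void) { return 0; }
int step_exec_format_error(void) { return 126; }
int step_counted_ok(void) { tweak_calls++; return 0; }
```

In this model, on a BE image the probe fails with 126, so the ln step that used to abort the job is never reached.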

With this change in place, I was able to reproduce that job up to the point of booting the big-endian kernel, but the kernel does not boot:

https://staging.validation.linaro.org/scheduler/job/3369/log_file

(Note that this is the staging LAVA server; I will be looking into deploying this change to the production server later today.)

-- Antonio Terceiro Software Engineer - Linaro http://www.linaro.org

Maxim Uvarov

12:42 p.m.

On 10/04/2013 04:30 PM, Antonio Terceiro wrote:

...

The kernel fails in xhci.

It needed a fix like this:

diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index 77600ce..6d40a4d 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1572,12 +1572,12 @@ static inline struct usb_hcd *xhci_to_hcd(struct xhci_hcd *xhci)
 static inline unsigned int xhci_readl(const struct xhci_hcd *xhci,
 		__le32 __iomem *regs)
 {
-	return readl(regs);
+	return readl_relaxed(regs);
 }
 static inline void xhci_writel(struct xhci_hcd *xhci,
 		const unsigned int val, __le32 __iomem *regs)
 {
-	writel(val, regs);
+	writel_relaxed(val, regs);
 }
 
 /*
@@ -1593,8 +1593,8 @@ static inline u64 xhci_read_64(const struct xhci_hcd *xhci,
 		__le64 __iomem *regs)
 {
 	__u32 __iomem *ptr = (__u32 __iomem *) regs;
-	u64 val_lo = readl(ptr);
-	u64 val_hi = readl(ptr + 1);
+	u64 val_lo = readl_relaxed(ptr);
+	u64 val_hi = readl_relaxed(ptr + 1);
 	return val_lo + (val_hi << 32);
 }
 static inline void xhci_write_64(struct xhci_hcd *xhci,
@@ -1604,8 +1604,8 @@ static inline void xhci_write_64(struct xhci_hcd *xhci,
 	u32 val_lo = lower_32_bits(val);
 	u32 val_hi = upper_32_bits(val);
 
-	writel(val_lo, ptr);
-	writel(val_hi, ptr + 1);
+	writel_relaxed(val_lo, ptr);
+	writel_relaxed(val_hi, ptr + 1);
 }
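Independent of the relaxed versus non-relaxed question, xhci_read_64() and xhci_write_64() access one 64-bit register as two 32-bit words, low word first. A user-space C sketch of just that split; plain volatile accesses stand in for readl()/writel(), so there is no byte swapping and no barrier, and all names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the kernel's lower_32_bits()/upper_32_bits(). */
static uint32_t lo32(uint64_t v) { return (uint32_t)v; }
static uint32_t hi32(uint64_t v) { return (uint32_t)(v >> 32); }

/* Like xhci_write_64(): low word to ptr[0], then high word to ptr[1]. */
void write64_as_two_words(volatile uint32_t *ptr, uint64_t val)
{
    assert(ptr != NULL);
    ptr[0] = lo32(val);
    ptr[1] = hi32(val);
}

/* Like xhci_read_64(): read both halves and recombine them. */
uint64_t read64_as_two_words(const volatile uint32_t *ptr)
{
    assert(ptr != NULL);
    uint64_t val_lo = ptr[0];
    uint64_t val_hi = ptr[1];
    return val_lo + (val_hi << 32);   /* val_lo < 2^32, so + acts like | */
}
```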

Please let me know when it will be possible to run BE kernels in LAVA. I can test the patch.

Thank you, Maxim.

Victor Kamensky

3:40 p.m.

Hi Maxim,

readl and writel are stronger versions of readl_relaxed and writel_relaxed:

#define readl(c)		({ u32 __v = readl_relaxed(c); __iormb(); __v; })
#define writel(v,c)		({ __iowmb(); writel_relaxed(v,c); })

They just add __iormb() and __iowmb(). I think it is a very dangerous thing to drop those memory barriers. I don't think your change is correct, or at the very least it requires a much better explanation.
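Victor's point can be modeled in user space. readl() is readl_relaxed() followed by a read barrier, and on ARM readl_relaxed() itself converts the little-endian register value to CPU byte order (le32_to_cpu() applied to __raw_readl()), which is where the byte swap on a BE kernel comes from. The sketch is only a model: the names are hypothetical, and the C11 acquire fence merely marks where __iormb() sits. It orders memory between threads and is not equivalent to the barrier instruction the ARM kernel emits for device I/O.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Model of le32_to_cpu(): interpret the raw bytes as little endian,
 * whatever the host byte order (a no-op on LE, a byte swap on BE). */
uint32_t le32_to_cpu_model(uint32_t raw)
{
    unsigned char b[4];
    memcpy(b, &raw, sizeof b);
    return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
           (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}

/* Model of readl_relaxed(): raw load plus byte-order fix, no barrier. */
uint32_t readl_relaxed_model(const volatile uint32_t *reg)
{
    assert(reg != NULL);
    return le32_to_cpu_model(*reg);
}

/* Model of readl(): the relaxed read followed by a barrier, mirroring
 * ({ u32 __v = readl_relaxed(c); __iormb(); __v; }). */
uint32_t readl_model(const volatile uint32_t *reg)
{
    uint32_t v = readl_relaxed_model(reg);
    atomic_thread_fence(memory_order_acquire);  /* placeholder for __iormb() */
    return v;
}
```

Under this model both functions return the same value; switching readl() to readl_relaxed() keeps the byte swap and only drops the barrier, so such a patch changes ordering guarantees rather than endianness handling.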

I've run into the same crash while working on 3.12-rc3 BE issues. In fact, I saw this failure on both BE and LE, and on old versions of BE kernels, when I tried to use the GCC 4.8 version from the 13.09 release. When I fall back to 4.7 (i.e. 13.04), it works fine.

I would think it is either a compiler issue or a preexisting issue in the code uncovered by the compiler change. Personally, I think it is the former. Since I am chasing another problem, I did not have time to look more deeply into it. IMHO it definitely requires more digging. In the meantime, you can quickly check your current compiler version and try another one if yours matches the one described in this email.

Thanks, Victor

Maxim Uvarov

4:01 p.m.

On 10/04/2013 07:40 PM, Victor Kamensky wrote:

...

Ah, yes: __raw_writel accesses the register directly, and writel swaps bytes.

If it's a compiler issue, then it should be simple to compare the objdump disassembly output for that function.

Maxim.


Fathi Boudra

1:11 p.m.

On 4 October 2013 15:30, Antonio Terceiro antonio.terceiro@linaro.org wrote:

...

ah I thought it wasn't run at all with aarch64. In any case, it should behave like aarch64 :)

...

LNG kernel issue. The fix in LAVA will do the trick.

...

Thanks.

Antonio Terceiro

7 Oct

5:20 p.m.

On Fri, Oct 04, 2013 at 04:11:38PM +0300, Fathi Boudra wrote:

...

So that fix just landed on validation.linaro.org; testing of BE images should be good to go.

-- Antonio Terceiro Software Engineer - Linaro http://www.linaro.org

linaro-validation@lists.linaro.org

participants (6)

Antonio Terceiro
Fathi Boudra
Maxim Uvarov
Michael Hudson-Doyle
Riku Voipio
Victor Kamensky