[VIRT-198 # QEMU: SVE Emulation Support ]
Last of the patches merged to mainline. Epic is now closed.
[VIRT-282 # QEMU: Accelerate TCG with KVM ]
A hallway talk with Paolo led to a write-up, and Alex encouraged me to create
the epic. The epic description contains a link to the write-up, if anyone is
interested. I'd like to at least create a VM, measure some round-trip costs,
and properly gauge the level of difficulty. Beyond that... we'll see.
[Upstream]
Patch review:
- risc-v decodetree patches v2.
Produced some patches against decodetree itself in response.
I'm hopeful to see a much cleaner v3.
- per-cpu locks
[KVM Forum]
- Unsurprisingly, lots of people worked on speculation mitigation this year.
- Lots of focus on "ram", and the allocation and management thereof.
- Four talks on improving nested virtualization.
- The rest to follow in the trip report.
r~
== Progress ==
* FDPIC
- Posted binutils and uClibc-ng patches to fix cortex-M support, under
discussion
- GCC: handling feedback on v3 patches.
Experimented with Thumb-1 builds; they failed.
* GCC upstream validation:
- reported a few regressions
- dealing with some random results, again
* GCC:
- bug report on aarch64 about misaligned accesses. Waiting for more
details to reproduce the problem.
* misc (conf-calls, meetings, emails, ....)
- Working on fixes to our benchmarking harness to support new gcc-8 releases.
== Next ==
FDPIC:
- GCC: follow up on v3 patches
- uclibc-ng: look at how to test fdpic mode with openadk
- use qemu-system mode to run more tests
Benchmarking:
- fix harness until it supports gcc-8
== This Week ==
* GNU-405: Implement division using vrecpe / vrecps (4/10)
- Patch validated and posted upstream.
* SVE ACLE intrinsics (4/10)
- Going through documentation.
* GNU-235: Provide value-range info for erf family of functions (1/10)
- Working on patch.
* Misc (1/10)
- Meetings
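For GNU-405 above, the general idea of dividing with a reciprocal estimate plus Newton-Raphson refinement can be sketched in portable C. This is not the posted patch. The function names are hypothetical, and `recip_estimate` is a crude bit-trick stand-in for VRECPE (the real instruction is table-based with different accuracy); only `recip_step` mirrors what VRECPS computes, namely 2 - d * x.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for VRECPE: a crude estimate of 1/d obtained by
 * subtracting the float's bit pattern from a constant. Assumes d is a
 * positive, normal float of moderate magnitude. */
static float recip_estimate(float d)
{
    uint32_t bits;
    float x;

    memcpy(&bits, &d, sizeof bits);
    bits = 0x7F000000u - bits;      /* roughly negates the exponent */
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Models VRECPS: returns the Newton-Raphson correction factor 2 - d * x. */
static float recip_step(float d, float x)
{
    return 2.0f - d * x;
}

/* Approximate n / d using multiplies only (no divide instruction). */
static float approx_div(float n, float d)
{
    float x;
    int i;

    assert(d > 0.0f);
    x = recip_estimate(d);
    /* Each step roughly squares the relative error of x. Three steps are
     * used here because the stand-in estimate is much cruder than VRECPE. */
    for (i = 0; i < 3; i++) {
        x = x * recip_step(d, x);
    }
    return n * x;
}
```

The result is an approximation and is not guaranteed to be correctly rounded, which is why compilers normally only use this kind of sequence under relaxed floating-point options.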
== Next Week ==
- Continue GNU-235, SVE ACLE intrinsics
Upstream Work ([VIRT-109])
==========================
- started looking at {Qemu-arm} {RFC v4 00/71} per-CPU locks
Message-Id: <20181025151103.GA19931@flamenco>
- this is a precursor to Emilio's {RFC 00/48} Plugin support
Message-Id: <20181025172057.20414-1-cota(a)braap.org>
[VIRT-109] https://projects.linaro.org/browse/VIRT-109
Other Tasks
===========
- Attended KVM Forum 2018
- I am now Spectre'd out ;-)
- Some interesting discussions on upstream CI
- will save the rest of my notes for the conference report
- Delivered QEMU Keynote/Status report @ KVM Forum 2018
- Here are [the slides]
[the slides]
http://people.linaro.org/~alex.bennee/org/presentations/kvm18-qemu-keynote.…
Current Review Queue
====================
* {Qemu-arm} {RFC v4 00/71} per-CPU locks
Message-Id: <20181025151103.GA19931@flamenco>
* {RFC 00/48} Plugin support
Message-Id: <20181025172057.20414-1-cota(a)braap.org>
* {PATCH v2 0/3} Modern shell scripting (use $() instead of ``)
Message-Id: <20181018031723.23459-1-maozhongyi(a)cmss.chinamobile.com>
* {PATCH 0/7} Acceptance Tests: basic architecture support
Message-Id: <20181004151429.7232-1-crosa(a)redhat.com>
* {PATCH v7 00/19} Fixing record/replay and adding reverse debugging
Message-Id: <20181010133333.24538.53169.stgit@pasha-VirtualBox>
* {PATCH v2 0/3} Bootstrap Python venv and acceptance/functional tests
Message-Id: <20181009041826.19462-1-crosa(a)redhat.com>
--
Alex Bennée
[LLVM-203] Investigation into profiling and code-size optimizations
- Collected the remaining data I needed over the weekend.
- Wrote up report
- Rebased patches on tip of trunk
- Attached results and report to Jira issue.
- A one-line summary of the results is that if you are lucky you can
get close to peak performance at close to Os code size if your program
happens to spend most of its time in a few small places. If you are
unlucky then increased inlining and unrolling can still result in an
overall code size increase over -O3 but the effect will be limited.
[LLVM-158] Monitor and maintain buildbots
- Relatively quiet week, a couple of patches pinged for fixes/reverts.
== This Week ==
* TCWG-1234: Coremark regression (7/10)
- Fixed golang regressions with the patch.
- Posted patch upstream to change hoisting order and apply cost model.
* Public holiday (2/10)
* Misc (1/10)
- Meetings
== Next Week ==
- TCWG-319, SVE
[VIRT-214 # SVE System Registers ]
v4 posted and merged to target-arm.next; will be in master shortly.
That will complete basic SVE system mode support.
[UPSTREAM]
tcg-next patches collected and flushed.
Dug into apparently excessive overhead in aarch64 guest tlb flushing, noticed
by chance while doing something else. Two patches upstream, several more
written but need cleanup. Total overhead down from 25% to 9%.
r~
=== Work done during this past week ===
* GNU-296 / GCC PR85434 / CVE-2018-12886:
+ few more issues fixed and associated testing
+ now also running Thumb-1 bootstrap and testing
* Prepare patch to further update cpus and architectures in bfd
* Line management
=== Plan for week 43 ===
* GNU-296 / GCC PR85434 / CVE-2018-12886:
+ finish testing, and submit new stack protector patch for upstream review
* LLVM-432 (Support arithmetic on FileCheck regex variable):
+ extend testcase coverage (add tests for latest syntax change and
add more negative testing)
+ finish cleaning up the code
* Try to reproduce perf issue mentioned in week #30's weekly report on
latest perf
* Line management
Progress:
* VIRT-65 [QEMU upstream maintainership]
- code review
- use ID registers as master source for "should this CPU
have this feature" information (rth)
- clean up 32-bit Neon to use vector infrastructure (rth)
- don't let the kernel get loaded on top of our builtin
bootloader if it happens to ask for a zero text offset
- Xilinx Versal board patches
- target-arm pull requests
- some minor patches to fix new clang warnings
- preparation for KVM Forum/QEMU Summit etc next week
thanks
-- PMM
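As an illustration of the "ID registers as master source" review item above, a feature test can be written as a field extraction from an ID register value instead of a lookup in a separate feature flag. This is a minimal sketch with hypothetical struct and function names, not the code under review; the field positions are the architectural ones (ID_AA64PFR0_EL1.SVE at bits [35:32], ID_AA64ISAR0_EL1.AES at bits [7:4]).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the ID register values of one CPU model. */
struct cpu_id_regs {
    uint64_t id_aa64pfr0;
    uint64_t id_aa64isar0;
};

/* Extract `length` bits starting at bit `start` from a 64-bit value. */
static uint64_t field64(uint64_t value, unsigned start, unsigned length)
{
    assert(length > 0 && length < 64 && start + length <= 64);
    return (value >> start) & ((UINT64_C(1) << length) - 1);
}

/* ID_AA64PFR0_EL1.SVE: non-zero means SVE is implemented. */
static bool cpu_has_sve(const struct cpu_id_regs *id)
{
    return field64(id->id_aa64pfr0, 32, 4) != 0;
}

/* ID_AA64ISAR0_EL1.AES: 1 = AES instructions, 2 = AES plus PMULL. */
static bool cpu_has_pmull(const struct cpu_id_regs *id)
{
    return field64(id->id_aa64isar0, 4, 4) >= 2;
}
```

With this shape, a CPU model only has to describe its ID register values once, and every "does this CPU have feature X" question is answered from the same data the guest can read.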
== Progress ==
* FDPIC
- GCC: handled feedback on v3 patches.
Not much info on how to test other existing uclinux targets.
Noticed that GCC trunk build fails when targeting cortex-m23
(v8-m.baseline), problems in support libs (libgcc, newlib)
* GCC upstream validation:
- reported a few regressions
- dealing with some random results, again
* GCC:
- bug report on aarch64 about misaligned accesses. Waiting for more
details to reproduce the problem.
* misc (conf-calls, meetings, emails, ....)
== Next ==
FDPIC:
- GCC: discuss v3 patches where the way forward is not clear yet
- uclibc-ng: look at how to test fdpic mode with openadk
- use qemu-system mode to run more tests
[LLVM-203] (was TCWG-1424, we've moved issues to a new project)
Started writing up results to close out this investigation.
- Reran some sample profiling test cases with a higher sample rate.
- Investigated why some test cases exploded in code-size with LTO.
- Got some results for thin LTO (broadly similar to LTO).
- Discovered that I need to pass in extra linker options to enable LTO
to use the new pass manager, sample profiling and setting of
optimisation level.
-- Need to rerun these configurations over the weekend.
- Have most of the surrounding text of the report written, now need to
work on presentation of results.
[TCWG-1473] Fix big-endian linux kernel builds for AArch32
Now committed upstream
Holiday Friday
o LLVM
* Machine Outliner on ARM prototype:
- Still debugging Thumb1 issues in Spec2K6
- Investigating issues in PIC mode
* Bots babysitting
o Misc
* Various meetings and discussions.
[VIRT-214 # SVE System Registers ]
Posted v3 patch set.
[VIRT-281 # Extend gdbstub for SVE ]
Crashed gdb. Posted patch fixing buffer overrun.
It was suggested to me that qemu's gdbstub might not support enough
modern bits of the remote protocol for SVE. So I spent quite a bit
of time reading up on the protocol and beginning to review
[PATCH v2 00/15] gdbstub: support for the multiprocess extension
In the end I'm not convinced there's anything missing for SVE.
I think I'm going to have to examine upstream gdb more closely,
running gdbserver proper.
[Upstream]
Collecting patches for tcg-next.
Misc patch review.
[GCC]
Pinged my LSE patch set from 2 Oct.
r~
== Progress ==
* FDPIC
- GCC: sent v3 patches, got some feedback: will need another iteration
* GCC upstream validation:
- reported a few regressions
- dealing with some random results, again
* GCC:
- looking at bug report on aarch64 about misaligned accesses. Need
more details to reproduce the problem
* Newlib
- got a few small patches accepted
* misc (conf-calls, meetings, emails, ....)
== Next ==
FDPIC:
- GCC: handle v3 patches feedback
- uclibc-ng: look at how to test fdpic mode with openadk
=== Work done during this past week ===
* TCWG-1428 (Support arithmetic on FileCheck regex variable):
+ continued cleaning up code and painfully rebased it on recent trunk
* GNU-296 / GCC PR85434 / CVE-2018-12886:
+ fixed changes to routines for PIC access to use specified register
+ fixed 2 more issues in stack protector new instruction patterns
+ testing for arm and thumb2, now starting over due to one of the above issues
* GNU-580 / PR86968: in progress
+ investigated, tried 2 approaches; need to start looking into a 3rd approach
* Line management.
=== Plan for week 42 ===
* GNU-296 / GCC PR85434 / CVE-2018-12886:
+ finish testing, and submit new stack protector patch for upstream review
* TCWG-1428 (Support arithmetic on FileCheck regex variable):
+ extend testcase coverage (add tests for latest syntax change and
add more negative testing)
+ finish cleaning up the code
* GNU-580 / PR86968: in progress
+ attempt 3rd approach
* Try to reproduce perf issue mentioned in week #30's weekly report on
latest perf
* Line management:
+ continue progress on rotations
+ start preparing first AFDS
[TCWG-1473] Fix -fno-integrated-as and -mbig-endian (Linux Kernel
Build with clang)
- Needed some revision to handle linker emulation. Patch in upstream review
[TCWG-1474] Fix out of range branch (CBZ) when -fimplicit-it (or
-fno-integrated-as) and certain kinds of inline assembly
- Committed upstream.
[TCWG-1424] Code-size investigations with PGO
- Marking functions for size optimisation at the earliest possible
stage improves code-size for little loss in performance. The main
beneficiary is that loops are not unrolled in size optimised functions
and inline thresholds are lower.
- LTO with instrumented profiling still sees large increase in size.
Originally thought my changes weren't working with LTO but I think
that something else is happening.
-- Found out that the profiling information isn't being sent to the
LTO code-generator (although it should be present as IR annotations
from the objects).
-- There is an option to pass the sample profile through to the LTO
code-generator but not an instrumented profile file.
-- It seems like the LTO plugin doesn't use the new pass manager
unless a separate option is passed through to the code-generator.
-- It seems like Thin-LTO is where most of upstream development is
these days and there is a slightly different pass pipeline, and some
interaction with profiling. Worth some more experiments.
First draft made of incorporating YVR18 Jira discussion into
Confluence https://collaborate.linaro.org/display/TCWG/JIRA+Usage+and+Best+Practices
Progress:
* VIRT-65 [QEMU upstream maintainership]
- code review:
+ Xilinx Versal SoC support
- investigated problem with a "suppress this warning patch" which
gcc 8 didn't like. It turns out that _Pragma() in GCC is a bit
of a disaster area; fortunately we only need to suppress a
warning here for clang, so we can just avoid using _Pragma() with GCC.
(cf GCC bugs 85153, 69558, 82335, 66099, 55578, 69543.
clang is not flawless here either: cf clang bugs 31999, 15129, 35154.
The clang false-positive warning we're working around is bug 39113.)
* VIRT-164 [improve Cortex-M emulation]
- stack-limit emulation patches have now gone into master, so this
epic can be closed out. Some v8M work will continue under
VIRT-268 (notably FP emulation); bugfixing and similar
small work will go under the general VIRT-65 maintainership epic.
* VIRT-215 ["run microvisors", aka support AArch32 Hyp mode]
- working through some of the HCR bits we don't implement, to see
if any of them are the cause of the failures I see with AArch32
hypervisors. (Sadly they don't seem to be.) Sent out patches
implementing HCR.{FB,DC,VI,VF,PTW} and fixing some syndrome
reporting corner cases where AArch32 differs from AArch64.
thanks
-- PMM
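The approach described under VIRT-65 above (suppress the warning for clang only, and keep `_Pragma()` away from GCC entirely) can be sketched as follows. The macro names are hypothetical, and `-Wunused-variable` is only a placeholder; the actual warning being worked around in the patch is not named here.

```c
#include <assert.h>

/* Hypothetical macro names. Under clang they expand to diagnostic pragmas;
 * under any other compiler they expand to nothing, so GCC never has to
 * parse _Pragma() at all. */
#ifdef __clang__
#define CLANG_DIAG_PUSH_IGNORE_UNUSED_VAR \
    _Pragma("clang diagnostic push") \
    _Pragma("clang diagnostic ignored \"-Wunused-variable\"")
#define CLANG_DIAG_POP \
    _Pragma("clang diagnostic pop")
#else
#define CLANG_DIAG_PUSH_IGNORE_UNUSED_VAR
#define CLANG_DIAG_POP
#endif

static int answer(void)
{
    CLANG_DIAG_PUSH_IGNORE_UNUSED_VAR
    int unused_local = 0;   /* placeholder for code that clang warns about */
    CLANG_DIAG_POP
    return 42;
}

/* Small usage check: the macros must not change behaviour. */
static void check_answer(void)
{
    assert(answer() == 42);
}
```

The trade-off is that the suppression does nothing for GCC, which is acceptable here only because the warning in question is a clang false positive.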
[VIRT-249 # SVE System Mode ]
Posted v3 (and hopefully final) patch set for system mode.
[Upstream]
Fixed a problem with softfloat division; 3 versions + pull posted.
Posted v3 of a cleanup to 128-bit atomics.
[GCC]
V2 of the -matomic-ool patch set posted.
r~
o LLVM
* Machine Outliner on ARM prototype:
- Fixed some Thumb2 issues
- Implemented Thumb1 support
- Debugging Thumb1 issues in Spec2K6
* Bots babysitting
o Misc
* Various meetings and discussions.
=== Work done during this past week ===
* One day annual leave
* TCWG-1428 (Support arithmetic on FileCheck regex variable):
+ started cleaning up code and continued adding support for last syntax tweaks
* TCWG-1379 / GCC PR85434 / CVE-2018-12886: rework needed
+ After more testing, committed fix for register allocator, then
reverted it following a regression
+ fall back to workaround as register allocator lacks information to
decide whether it's safe to not reload an address
+ start testing workaround more extensively
* TCWG-1470 / PR87374: upstream review
+ add missing documentation, improve related code slightly, submit
for external review again
Misc:
+ bits of line management
+ Doughnut session on OSS strategy
+ discussion around JIRA use in TCWG
=== Plan for week 41 ===
* TCWG-1379 / GCC PR85434 / CVE-2018-12886:
+ finish testing, and submit new stack protector patch for upstream review
* TCWG-1428 (Support arithmetic on FileCheck regex variable):
+ finish change to support last syntax changes
+ extend testcase coverage (add tests for latest syntax change and
add more negative testing)
+ start cleaning up the code
* Try to reproduce perf issue mentioned in week #30's weekly report on
latest perf
* Line management
[TCWG-1473] Fix -fno-integrated-as and -mbig-endian (Linux Kernel
Build with clang)
- Patch in upstream review
[TCWG-1474] Fix out of range branch (CBZ) when -fimplicit-it (or
-fno-integrated-as) and certain kinds of inline assembly
- Patch in upstream review
[TCWG-1424] Code-size investigations with PGO
- Reworked the clang command line options and pass manager interface
so I could insert the pass prior to inlining.
- Benchmarks running over the weekend.
SVE Support ([VIRT-198])
========================
SVE Reviews
- reviewed and tested {PATCH v2 0/4} softfloat: Fix division
Message-Id: <20181003180711.19335-5-richard.henderson(a)linaro.org>
and v3
QEMU Tooling ([VIRT-252])
=========================
[VIRT-252] https://projects.linaro.org/browse/VIRT-252
[VIRT-280] https://projects.linaro.org/browse/VIRT-280
QEMU plugin support ([VIRT-280])
- following discussion with team started adding plugin hooks to
[tracepoint clean-up]
- I think I need to resurrect and expand the cputlb de-macro stuff
to cleanly add memory tracing
- written hotblocks and tlbstats tools
- posted {RFC PATCH 00/21} Trace updates and plugin RFC Message-Id:
<20181005154910.3099-1-alex.bennee(a)linaro.org>
[VIRT-280] https://projects.linaro.org/browse/VIRT-280
[tracepoint clean-up]
https://github.com/stsquad/qemu/tree/misc/dfilter-and-trace-tweaks-v2
Kernel Debug via gdbstub
- There was some discussion about improving debug experience with
kernel debugging
- for example while in KVM a single-step usually goes to the
exception table
- in TCG this isn't always the case (maybe time accounting is
better?)
- follow-up on x86 and kgdb experience :todo
- worth creating a STORY for this work? :todo
Upstream Work ([VIRT-109])
==========================
- started looking at {PATCH 0/7} Acceptance Tests: basic architecture
support Message-Id: <20181004151429.7232-1-crosa(a)redhat.com>
- get the upstream CI back on track :todo
[VIRT-109] https://projects.linaro.org/browse/VIRT-109
Other Tasks
===========
- Started drafting the QEMU Keynote/Status report for KVM Forum 2018
Completed Reviews [6/6]
=======================
{PATCH v3 0/3} softfloat tests based on berkeley's testfloat
Message-Id: <20180913213910.28189-1-cota(a)braap.org>
- CLOSING NOTE [2018-09-28 Fri 19:40]
I'm happy with this
{PATCH v2 0/4} softfloat: Fix division
Message-Id: <20181003180711.19335-5-richard.henderson(a)linaro.org>
- CLOSING NOTE [2018-10-04 Thu 10:37]
some failures on emilio's tests
{Qemu-devel} {RFC PATCH v2 0/3} acceptance tests: Test firmware checking debug console output
Message-Id: <20181003183036.6716-1-philmd(a)redhat.com>
- CLOSING NOTE [2018-10-04 Thu 15:07]
Still broken for multiarch
{PATCH v2 0/4} per-TLB lock
Message-Id: <20181003200454.18384-1-cota(a)braap.org>
- CLOSING NOTE [2018-10-04 Thu 15:07]
Barring a few compile fixes it looks pretty stable in the soak tests.
{PATCH} fpu/softfloat: Replace countLeadingZeros32/64 with clz32/64
Message-Id: <1538118095-7003-1-git-send-email-thuth(a)redhat.com>
- CLOSING NOTE [2018-10-04 Thu 15:09]
Simple clean-up
{PATCH v3 0/4} softfloat: Fix division
Message-Id: <20181004175700.20847-1-richard.henderson(a)linaro.org>
- CLOSING NOTE [2018-10-05 Fri 16:59]
Looks good
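For the countLeadingZeros32/64 to clz32/64 clean-up reviewed above, the kind of helper involved can be sketched like this. The names are hypothetical (suffixed `_sketch` to avoid implying they are QEMU's definitions), and the code assumes a GCC or clang compiler that provides `__builtin_clz` and `__builtin_clzll`.

```c
#include <assert.h>
#include <stdint.h>

/* Count leading zero bits of a 32-bit value. The builtin is undefined for
 * an input of 0, so that case is handled explicitly and returns 32. */
static int clz32_sketch(uint32_t v)
{
    return v ? __builtin_clz(v) : 32;
}

/* 64-bit variant; returns 64 for an input of 0. */
static int clz64_sketch(uint64_t v)
{
    return v ? __builtin_clzll(v) : 64;
}

/* Example user: index of the highest set bit, for non-zero input only. */
static int ilog2_sketch(uint32_t v)
{
    assert(v != 0);
    return 31 - clz32_sketch(v);
}
```

Wrapping the builtin in one place keeps the zero-input behaviour defined and lets the compiler emit a single instruction on targets that have one.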
Absences
========
- KVM Forum 2018 (24th-26th October 2018)
Current Review Queue
====================
* {PATCH 0/7} Acceptance Tests: basic architecture support
Message-Id: <20181004151429.7232-1-crosa(a)redhat.com>
* {Qemu-arm} {PATCH 00/13} target/arm: Implement v8M stack limit checks
Message-Id: <20181002163556.10279-1-peter.maydell(a)linaro.org>
* {Qemu-arm} {PATCH v2 00/15} gdbstub: support for the multiprocess extension
Message-Id: <20181001115704.701-1-luc.michel(a)greensocs.com>
* {Qemu-devel} {PATCH v2 0/9} target/arm: Rely on id regs instead of features
Message-Id: <20180927211322.16118-1-richard.henderson(a)linaro.org>
* {Qemu-devel} {PATCH v2 00/15} target/arm: sve system mode patches
Message-Id: <20180926192323.12659-1-richard.henderson(a)linaro.org>
--
Alex Bennée
Progress:
* VIRT-65 [QEMU upstream maintainership]
- code review:
+ bug fixes for aarch64 KVM debug support
+ various patches to avoid deprecated sysbus:init API
+ SVE system emulation support
- investigated a bug where system reset requested by a device
model was sometimes not firing -- seems to be a race condition
* VIRT-164 [improve Cortex-M emulation]
- picked up the half-finished patches for stack-limit emulation that
I'd written before going off on holiday, and completed them.
Sent the patchset out for review.
- sent patches for a couple of other minor bugs noticed in the process
thanks
-- PMM
== Progress ==
* FDPIC
- GCC: handling feedback on v2 patches
* GCC upstream validation:
- reported a few regressions
- dealing with some random results, again
- discussing collaboration with kernel-ci
* Linaro gcc-7 release
- backported fixes for bug #4007
* Newlib
- sent a few small patches to remove warnings when building for Arm and AArch64
* misc (conf-calls, meetings, emails, ....)
- (internal) Wrote report about GNU Cauldron 2018
== Next ==
FDPIC:
- GCC: send v3 patches addressing feedback
- uclibc-ng: look at how to test fdpic mode with openadk