Implement support to save/restore the ARM KVM VGIC state from the kernel. The basic appraoch is to transfer state from the in-kernel VGIC to the emulated arm-gic state representation and let the standard QEMU vmstate save/restore handle saving the arm-gic state. Restore works by reversing the process.
The first few patches adds missing features and fixes issues with the arm-gic implementation in qemu in preparation for the actual save/restore logic.
The patches depend on the device control patch series sent out earlier, which can also be found here: git://git.linaro.org/people/cdall/qemu-arm.git migration/device-ctrl
The whole patch series based on top of the above can be found here: git://git.linaro.org/people/cdall/qemu-arm.git migration/vgic
Christoffer Dall (5): hw: arm_gic: Fix gic_set_irq handling hw: arm_gic: Introduce GIC_SET_PRIORITY macro hw: arm_gic: Keep track of SGI sources hw: arm_gic: Support setting/getting binary point reg hw: arm_gic_kvm: Add KVM VGIC save/restore logic
hw/intc/arm_gic.c | 58 +++--- hw/intc/arm_gic_common.c | 4 + hw/intc/arm_gic_kvm.c | 418 ++++++++++++++++++++++++++++++++++++++++++- hw/intc/gic_internal.h | 13 ++ include/migration/vmstate.h | 6 + 5 files changed, 477 insertions(+), 22 deletions(-)
For some reason only edge-triggered or enabled level-triggered interrupts would set the pending state of a raised IRQ. This is not in compliance with the specs, which indicate that the pending state is separate from the enabled state, which only controls if a pending interrupt is actually forwarded to the CPU interface.
Therefore, simply always set the pending state on a rising edge, but only clear the pending state of falling edge if the interrupt is level triggered.
Signed-off-by: Christoffer Dall christoffer.dall@linaro.org --- hw/intc/arm_gic.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/hw/intc/arm_gic.c b/hw/intc/arm_gic.c index d431b7a..bff3f9e 100644 --- a/hw/intc/arm_gic.c +++ b/hw/intc/arm_gic.c @@ -128,11 +128,12 @@ static void gic_set_irq(void *opaque, int irq, int level)
if (level) { GIC_SET_LEVEL(irq, cm); - if (GIC_TEST_TRIGGER(irq) || GIC_TEST_ENABLED(irq, cm)) { - DPRINTF("Set %d pending mask %x\n", irq, target); - GIC_SET_PENDING(irq, target); - } + DPRINTF("Set %d pending mask %x\n", irq, target); + GIC_SET_PENDING(irq, target); } else { + if (!GIC_TEST_TRIGGER(irq)) { + gic_clear_pending(s, irq, target, 0); + } GIC_CLEAR_LEVEL(irq, cm); } gic_update(s);
On 23 August 2013 21:10, Christoffer Dall christoffer.dall@linaro.org wrote:
For some reason only edge-triggered or enabled level-triggered interrupts would set the pending state of a raised IRQ. This is not in compliance with the specs, which indicate that the pending state is separate from the enabled state, which only controls if a pending interrupt is actually forwarded to the CPU interface.
Therefore, simply always set the pending state on a rising edge, but only clear the pending state of falling edge if the interrupt is level triggered.
Signed-off-by: Christoffer Dall christoffer.dall@linaro.org
hw/intc/arm_gic.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/hw/intc/arm_gic.c b/hw/intc/arm_gic.c index d431b7a..bff3f9e 100644 --- a/hw/intc/arm_gic.c +++ b/hw/intc/arm_gic.c @@ -128,11 +128,12 @@ static void gic_set_irq(void *opaque, int irq, int level)
if (level) { GIC_SET_LEVEL(irq, cm);
if (GIC_TEST_TRIGGER(irq) || GIC_TEST_ENABLED(irq, cm)) {
DPRINTF("Set %d pending mask %x\n", irq, target);
GIC_SET_PENDING(irq, target);
}
DPRINTF("Set %d pending mask %x\n", irq, target);
} else {GIC_SET_PENDING(irq, target);
if (!GIC_TEST_TRIGGER(irq)) {
gic_clear_pending(s, irq, target, 0);
} gic_update(s);} GIC_CLEAR_LEVEL(irq, cm);
This doesn't compile:
hw/intc/arm_gic.c: In function ‘gic_set_irq’: hw/intc/arm_gic.c:135:13: error: implicit declaration of function ‘gic_clear_pending’ [-Werror=implicit-function-declaration] hw/intc/arm_gic.c:135:13: error: nested extern declaration of ‘gic_clear_pending’ [-Werror=nested-externs]
(you don't provide that function until patch 3).
thanks -- PMM
On Fri, Sep 06, 2013 at 02:59:08PM +0100, Peter Maydell wrote:
On 23 August 2013 21:10, Christoffer Dall christoffer.dall@linaro.org wrote:
For some reason only edge-triggered or enabled level-triggered interrupts would set the pending state of a raised IRQ. This is not in compliance with the specs, which indicate that the pending state is separate from the enabled state, which only controls if a pending interrupt is actually forwarded to the CPU interface.
Therefore, simply always set the pending state on a rising edge, but only clear the pending state of falling edge if the interrupt is level triggered.
Signed-off-by: Christoffer Dall christoffer.dall@linaro.org
hw/intc/arm_gic.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/hw/intc/arm_gic.c b/hw/intc/arm_gic.c index d431b7a..bff3f9e 100644 --- a/hw/intc/arm_gic.c +++ b/hw/intc/arm_gic.c @@ -128,11 +128,12 @@ static void gic_set_irq(void *opaque, int irq, int level)
if (level) { GIC_SET_LEVEL(irq, cm);
if (GIC_TEST_TRIGGER(irq) || GIC_TEST_ENABLED(irq, cm)) {
DPRINTF("Set %d pending mask %x\n", irq, target);
GIC_SET_PENDING(irq, target);
}
DPRINTF("Set %d pending mask %x\n", irq, target);
} else {GIC_SET_PENDING(irq, target);
if (!GIC_TEST_TRIGGER(irq)) {
gic_clear_pending(s, irq, target, 0);
} gic_update(s);} GIC_CLEAR_LEVEL(irq, cm);
This doesn't compile:
hw/intc/arm_gic.c: In function ‘gic_set_irq’: hw/intc/arm_gic.c:135:13: error: implicit declaration of function ‘gic_clear_pending’ [-Werror=implicit-function-declaration] hw/intc/arm_gic.c:135:13: error: nested extern declaration of ‘gic_clear_pending’ [-Werror=nested-externs]
(you don't provide that function until patch 3).
whoops, should be GIC_CLEAR_PENDING at this stage. Thanks for testing the bisectability.
-Christoffer
To make the code slightly cleaner to look at and make the save/restore code easier to understand, introduce this macro to set the priority of interrupts.
Signed-off-by: Christoffer Dall christoffer.dall@linaro.org --- hw/intc/arm_gic.c | 6 +----- hw/intc/gic_internal.h | 6 ++++++ 2 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/hw/intc/arm_gic.c b/hw/intc/arm_gic.c index bff3f9e..a7bb528 100644 --- a/hw/intc/arm_gic.c +++ b/hw/intc/arm_gic.c @@ -444,11 +444,7 @@ static void gic_dist_writeb(void *opaque, hwaddr offset, irq = (offset - 0x400) + GIC_BASE_IRQ; if (irq >= s->num_irq) goto bad_reg; - if (irq < GIC_INTERNAL) { - s->priority1[irq][cpu] = value; - } else { - s->priority2[irq - GIC_INTERNAL] = value; - } + GIC_SET_PRIORITY(irq, cpu, value); } else if (offset < 0xc00) { /* Interrupt CPU Target. RAZ/WI on uniprocessor GICs, with the * annoying exception of the 11MPCore's GIC. diff --git a/hw/intc/gic_internal.h b/hw/intc/gic_internal.h index 1426437..d835aa1 100644 --- a/hw/intc/gic_internal.h +++ b/hw/intc/gic_internal.h @@ -57,6 +57,12 @@ #define GIC_GET_PRIORITY(irq, cpu) (((irq) < GIC_INTERNAL) ? \ s->priority1[irq][cpu] : \ s->priority2[(irq) - GIC_INTERNAL]) +#define GIC_SET_PRIORITY(irq, cpu, val) do { \ + uint8_t *__x = ((irq) < GIC_INTERNAL) ? \ + &s->priority1[irq][cpu] : \ + &s->priority2[(irq) - GIC_INTERNAL]; \ + *__x = val; \ +} while (0) #define GIC_TARGET(irq) s->irq_target[irq]
typedef struct gic_irq_state {
On 23.08.2013, at 21:10, Christoffer Dall wrote:
To make the code slightly cleaner to look at and make the save/restore code easier to understand, introduce this macro to set the priority of interrupts.
Signed-off-by: Christoffer Dall christoffer.dall@linaro.org
hw/intc/arm_gic.c | 6 +----- hw/intc/gic_internal.h | 6 ++++++ 2 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/hw/intc/arm_gic.c b/hw/intc/arm_gic.c index bff3f9e..a7bb528 100644 --- a/hw/intc/arm_gic.c +++ b/hw/intc/arm_gic.c @@ -444,11 +444,7 @@ static void gic_dist_writeb(void *opaque, hwaddr offset, irq = (offset - 0x400) + GIC_BASE_IRQ; if (irq >= s->num_irq) goto bad_reg;
if (irq < GIC_INTERNAL) {
s->priority1[irq][cpu] = value;
} else {
s->priority2[irq - GIC_INTERNAL] = value;
}
} else if (offset < 0xc00) { /* Interrupt CPU Target. RAZ/WI on uniprocessor GICs, with the * annoying exception of the 11MPCore's GIC.GIC_SET_PRIORITY(irq, cpu, value);
diff --git a/hw/intc/gic_internal.h b/hw/intc/gic_internal.h index 1426437..d835aa1 100644 --- a/hw/intc/gic_internal.h +++ b/hw/intc/gic_internal.h @@ -57,6 +57,12 @@ #define GIC_GET_PRIORITY(irq, cpu) (((irq) < GIC_INTERNAL) ? \ s->priority1[irq][cpu] : \ s->priority2[(irq) - GIC_INTERNAL]) +#define GIC_SET_PRIORITY(irq, cpu, val) do { \
- uint8_t *__x = ((irq) < GIC_INTERNAL) ? \
&s->priority1[irq][cpu] : \
&s->priority2[(irq) - GIC_INTERNAL]; \
- *__x = val; \
+} while (0)
Why not make this a function?
Alex
#define GIC_TARGET(irq) s->irq_target[irq]
typedef struct gic_irq_state {
1.7.10.4
kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/cucslists/listinfo/kvmarm
On Sun, Aug 25, 2013 at 04:37:44PM +0100, Alexander Graf wrote:
On 23.08.2013, at 21:10, Christoffer Dall wrote:
To make the code slightly cleaner to look at and make the save/restore code easier to understand, introduce this macro to set the priority of interrupts.
Signed-off-by: Christoffer Dall christoffer.dall@linaro.org
hw/intc/arm_gic.c | 6 +----- hw/intc/gic_internal.h | 6 ++++++ 2 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/hw/intc/arm_gic.c b/hw/intc/arm_gic.c index bff3f9e..a7bb528 100644 --- a/hw/intc/arm_gic.c +++ b/hw/intc/arm_gic.c @@ -444,11 +444,7 @@ static void gic_dist_writeb(void *opaque, hwaddr offset, irq = (offset - 0x400) + GIC_BASE_IRQ; if (irq >= s->num_irq) goto bad_reg;
if (irq < GIC_INTERNAL) {
s->priority1[irq][cpu] = value;
} else {
s->priority2[irq - GIC_INTERNAL] = value;
}
} else if (offset < 0xc00) { /* Interrupt CPU Target. RAZ/WI on uniprocessor GICs, with the * annoying exception of the 11MPCore's GIC.GIC_SET_PRIORITY(irq, cpu, value);
diff --git a/hw/intc/gic_internal.h b/hw/intc/gic_internal.h index 1426437..d835aa1 100644 --- a/hw/intc/gic_internal.h +++ b/hw/intc/gic_internal.h @@ -57,6 +57,12 @@ #define GIC_GET_PRIORITY(irq, cpu) (((irq) < GIC_INTERNAL) ? \ s->priority1[irq][cpu] : \ s->priority2[(irq) - GIC_INTERNAL]) +#define GIC_SET_PRIORITY(irq, cpu, val) do { \
- uint8_t *__x = ((irq) < GIC_INTERNAL) ? \
&s->priority1[irq][cpu] : \
&s->priority2[(irq) - GIC_INTERNAL]; \
- *__x = val; \
+} while (0)
Why not make this a function?
Ah, you're no fun anymore. ok.
-Christoffer
Right now the arm gic emulation doesn't keep track of the source of an SGI (which apparently Linux guests don't use, or they're fine with assuming CPU 0 always).
Add the necessary matrix on the GICState structure and maintain the data when setting and clearing the pending state of an IRQ.
Note that we always choose to present the source as the lowest-numbered CPU in case multiple cores have signalled the same SGI number to a core on the system.
Signed-off-by: Christoffer Dall christoffer.dall@linaro.org --- hw/intc/arm_gic.c | 38 ++++++++++++++++++++++++++++++-------- hw/intc/arm_gic_common.c | 1 + hw/intc/gic_internal.h | 3 +++ 3 files changed, 34 insertions(+), 8 deletions(-)
diff --git a/hw/intc/arm_gic.c b/hw/intc/arm_gic.c index a7bb528..4da534f 100644 --- a/hw/intc/arm_gic.c +++ b/hw/intc/arm_gic.c @@ -97,6 +97,20 @@ void gic_set_pending_private(GICState *s, int cpu, int irq) gic_update(s); }
+static void gic_clear_pending(GICState *s, int irq, int cm, uint8_t src) +{ + unsigned cpu; + + GIC_CLEAR_PENDING(irq, cm); + if (irq < GIC_NR_SGIS) { + cpu = (unsigned)ffs(cm) - 1; + while (cpu < NCPU) { + s->sgi_source[irq][cpu] &= ~(1 << src); + cpu = (unsigned)ffs(cm) - 1; + } + } +} + /* Process a change in an external IRQ input. */ static void gic_set_irq(void *opaque, int irq, int level) { @@ -163,7 +177,8 @@ uint32_t gic_acknowledge_irq(GICState *s, int cpu) s->last_active[new_irq][cpu] = s->running_irq[cpu]; /* Clear pending flags for both level and edge triggered interrupts. Level triggered IRQs will be reasserted once they become inactive. */ - GIC_CLEAR_PENDING(new_irq, GIC_TEST_MODEL(new_irq) ? ALL_CPU_MASK : cm); + gic_clear_pending(s, new_irq, GIC_TEST_MODEL(new_irq) ? ALL_CPU_MASK : cm, + GIC_SGI_SRC(new_irq, cpu)); gic_set_running_irq(s, cpu, new_irq); DPRINTF("ACK %d\n", new_irq); return new_irq; @@ -428,12 +443,9 @@ static void gic_dist_writeb(void *opaque, hwaddr offset, irq = (offset - 0x280) * 8 + GIC_BASE_IRQ; if (irq >= s->num_irq) goto bad_reg; - for (i = 0; i < 8; i++) { - /* ??? This currently clears the pending bit for all CPUs, even - for per-CPU interrupts. It's unclear whether this is the - corect behavior. */ - if (value & (1 << i)) { - GIC_CLEAR_PENDING(irq + i, ALL_CPU_MASK); + for (i = 0; i < 8; i++, irq++) { + if (irq > GIC_NR_SGIS && value & (1 << i)) { + gic_clear_pending(s, irq, 1 << cpu, 0); } } } else if (offset < 0x400) { @@ -506,6 +518,7 @@ static void gic_dist_writel(void *opaque, hwaddr offset, int cpu; int irq; int mask; + unsigned target_cpu;
cpu = gic_get_current_cpu(s); irq = value & 0x3ff; @@ -525,6 +538,11 @@ static void gic_dist_writel(void *opaque, hwaddr offset, break; } GIC_SET_PENDING(irq, mask); + target_cpu = (unsigned)ffs(mask) - 1; + while (target_cpu < NCPU) { + s->sgi_source[irq][target_cpu] |= (1 << cpu); + target_cpu = (unsigned)ffs(mask) - 1; + } gic_update(s); return; } @@ -542,6 +560,8 @@ static const MemoryRegionOps gic_dist_ops = {
static uint32_t gic_cpu_read(GICState *s, int cpu, int offset) { + int value; + switch (offset) { case 0x00: /* Control */ return s->cpu_enabled[cpu]; @@ -551,7 +571,9 @@ static uint32_t gic_cpu_read(GICState *s, int cpu, int offset) /* ??? Not implemented. */ return 0; case 0x0c: /* Acknowledge */ - return gic_acknowledge_irq(s, cpu); + value = gic_acknowledge_irq(s, cpu); + value |= (GIC_SGI_SRC(value, cpu) & 0x7) << 10; + return value; case 0x14: /* Running Priority */ return s->running_priority[cpu]; case 0x18: /* Highest Pending Interrupt */ diff --git a/hw/intc/arm_gic_common.c b/hw/intc/arm_gic_common.c index 709b5c2..7a1c9e5 100644 --- a/hw/intc/arm_gic_common.c +++ b/hw/intc/arm_gic_common.c @@ -71,6 +71,7 @@ static const VMStateDescription vmstate_gic = { VMSTATE_UINT8_2DARRAY(priority1, GICState, GIC_INTERNAL, NCPU), VMSTATE_UINT8_ARRAY(priority2, GICState, GIC_MAXIRQ - GIC_INTERNAL), VMSTATE_UINT16_2DARRAY(last_active, GICState, GIC_MAXIRQ, NCPU), + VMSTATE_UINT8_2DARRAY(sgi_source, GICState, GIC_NR_SGIS, NCPU), VMSTATE_UINT16_ARRAY(priority_mask, GICState, NCPU), VMSTATE_UINT16_ARRAY(running_irq, GICState, NCPU), VMSTATE_UINT16_ARRAY(running_priority, GICState, NCPU), diff --git a/hw/intc/gic_internal.h b/hw/intc/gic_internal.h index d835aa1..6f04885 100644 --- a/hw/intc/gic_internal.h +++ b/hw/intc/gic_internal.h @@ -27,6 +27,7 @@ #define GIC_MAXIRQ 1020 /* First 32 are private to each CPU (SGIs and PPIs). */ #define GIC_INTERNAL 32 +#define GIC_NR_SGIS 16 /* Maximum number of possible CPU interfaces, determined by GIC architecture */ #define NCPU 8
@@ -64,6 +65,7 @@ *__x = val; \ } while (0) #define GIC_TARGET(irq) s->irq_target[irq] +#define GIC_SGI_SRC(irq, cpu) (((irq) < GIC_NR_SGIS) ? ffs(s->sgi_source[irq][cpu]) - 1 : 0)
typedef struct gic_irq_state { /* The enable bits are only banked for per-cpu interrupts. */ @@ -89,6 +91,7 @@ typedef struct GICState { uint8_t priority1[GIC_INTERNAL][NCPU]; uint8_t priority2[GIC_MAXIRQ - GIC_INTERNAL]; uint16_t last_active[GIC_MAXIRQ][NCPU]; + uint8_t sgi_source[GIC_NR_SGIS][NCPU];
uint16_t priority_mask[NCPU]; uint16_t running_irq[NCPU];
On 23 August 2013 21:10, Christoffer Dall christoffer.dall@linaro.org wrote:
Right now the arm gic emulation doesn't keep track of the source of an SGI (which apparently Linux guests don't use, or they're fine with assuming CPU 0 always).
Add the necessary matrix on the GICState structure and maintain the data when setting and clearing the pending state of an IRQ.
Note that we always choose to present the source as the lowest-numbered CPU in case multiple cores have signalled the same SGI number to a core on the system.
@@ -525,6 +538,11 @@ static void gic_dist_writel(void *opaque, hwaddr offset, break; } GIC_SET_PENDING(irq, mask);
target_cpu = (unsigned)ffs(mask) - 1;
while (target_cpu < NCPU) {
s->sgi_source[irq][target_cpu] |= (1 << cpu);
target_cpu = (unsigned)ffs(mask) - 1;
}
This is an infinite loop, because you don't do anything with mask inside the loop, so target_cpu is always the same each time round. gcc with optimization is smart enough to notice this:
=> 0x00005555556c1625 <+229>: jmp 0x5555556c1625 <gic_dist_writel+229>
:-)
Unsurprisingly, my test vexpress-a9 image hangs on startup.
--- a/hw/intc/arm_gic_common.c +++ b/hw/intc/arm_gic_common.c @@ -71,6 +71,7 @@ static const VMStateDescription vmstate_gic = { VMSTATE_UINT8_2DARRAY(priority1, GICState, GIC_INTERNAL, NCPU), VMSTATE_UINT8_ARRAY(priority2, GICState, GIC_MAXIRQ - GIC_INTERNAL), VMSTATE_UINT16_2DARRAY(last_active, GICState, GIC_MAXIRQ, NCPU),
VMSTATE_UINT8_2DARRAY(sgi_source, GICState, GIC_NR_SGIS, NCPU), VMSTATE_UINT16_ARRAY(priority_mask, GICState, NCPU), VMSTATE_UINT16_ARRAY(running_irq, GICState, NCPU), VMSTATE_UINT16_ARRAY(running_priority, GICState, NCPU),
You need to bump the version_id and minimum_version_id if you add a new field here.
-- PMM
On Fri, Sep 06, 2013 at 03:08:16PM +0100, Peter Maydell wrote:
On 23 August 2013 21:10, Christoffer Dall christoffer.dall@linaro.org wrote:
Right now the arm gic emulation doesn't keep track of the source of an SGI (which apparently Linux guests don't use, or they're fine with assuming CPU 0 always).
Add the necessary matrix on the GICState structure and maintain the data when setting and clearing the pending state of an IRQ.
Note that we always choose to present the source as the lowest-numbered CPU in case multiple cores have signalled the same SGI number to a core on the system.
@@ -525,6 +538,11 @@ static void gic_dist_writel(void *opaque, hwaddr offset, break; } GIC_SET_PENDING(irq, mask);
target_cpu = (unsigned)ffs(mask) - 1;
while (target_cpu < NCPU) {
s->sgi_source[irq][target_cpu] |= (1 << cpu);
target_cpu = (unsigned)ffs(mask) - 1;
}
This is an infinite loop, because you don't do anything with mask inside the loop, so target_cpu is always the same each time round. gcc with optimization is smart enough to notice this:
=> 0x00005555556c1625 <+229>: jmp 0x5555556c1625 <gic_dist_writel+229>
:-)
Unsurprisingly, my test vexpress-a9 image hangs on startup.
that's unfortunate, I think I meant to use find_next_bit and pass target_cpu, but ended up deciding to just clear the bit in the mask, but forgot to actually clear the bit. Whoops.
-Christoffer
--- a/hw/intc/arm_gic_common.c +++ b/hw/intc/arm_gic_common.c @@ -71,6 +71,7 @@ static const VMStateDescription vmstate_gic = { VMSTATE_UINT8_2DARRAY(priority1, GICState, GIC_INTERNAL, NCPU), VMSTATE_UINT8_ARRAY(priority2, GICState, GIC_MAXIRQ - GIC_INTERNAL), VMSTATE_UINT16_2DARRAY(last_active, GICState, GIC_MAXIRQ, NCPU),
VMSTATE_UINT8_2DARRAY(sgi_source, GICState, GIC_NR_SGIS, NCPU), VMSTATE_UINT16_ARRAY(priority_mask, GICState, NCPU), VMSTATE_UINT16_ARRAY(running_irq, GICState, NCPU), VMSTATE_UINT16_ARRAY(running_priority, GICState, NCPU),
You need to bump the version_id and minimum_version_id if you add a new field here.
Add a binary_point field to the gic emulation structure and support setting/getting this register now when we have it. We don't actually support interrupt grouping yet, oh well.
Signed-off-by: Christoffer Dall christoffer.dall@linaro.org --- hw/intc/arm_gic.c | 5 ++--- hw/intc/arm_gic_common.c | 1 + hw/intc/gic_internal.h | 3 +++ 3 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/hw/intc/arm_gic.c b/hw/intc/arm_gic.c index 4da534f..cb2004d 100644 --- a/hw/intc/arm_gic.c +++ b/hw/intc/arm_gic.c @@ -568,8 +568,7 @@ static uint32_t gic_cpu_read(GICState *s, int cpu, int offset) case 0x04: /* Priority mask */ return s->priority_mask[cpu]; case 0x08: /* Binary Point */ - /* ??? Not implemented. */ - return 0; + return s->binary_point[0][cpu]; case 0x0c: /* Acknowledge */ value = gic_acknowledge_irq(s, cpu); value |= (GIC_SGI_SRC(value, cpu) & 0x7) << 10; @@ -596,7 +595,7 @@ static void gic_cpu_write(GICState *s, int cpu, int offset, uint32_t value) s->priority_mask[cpu] = (value & 0xff); break; case 0x08: /* Binary Point */ - /* ??? Not implemented. */ + s->binary_point[0][cpu] = (value & 0x7); break; case 0x10: /* End Of Interrupt */ return gic_complete_irq(s, cpu, value & 0x3ff); diff --git a/hw/intc/arm_gic_common.c b/hw/intc/arm_gic_common.c index 7a1c9e5..a50ded2 100644 --- a/hw/intc/arm_gic_common.c +++ b/hw/intc/arm_gic_common.c @@ -76,6 +76,7 @@ static const VMStateDescription vmstate_gic = { VMSTATE_UINT16_ARRAY(running_irq, GICState, NCPU), VMSTATE_UINT16_ARRAY(running_priority, GICState, NCPU), VMSTATE_UINT16_ARRAY(current_pending, GICState, NCPU), + VMSTATE_UINT8_2DARRAY(binary_point, GICState, 2, NCPU), VMSTATE_END_OF_LIST() } }; diff --git a/hw/intc/gic_internal.h b/hw/intc/gic_internal.h index 6f04885..424a41f 100644 --- a/hw/intc/gic_internal.h +++ b/hw/intc/gic_internal.h @@ -98,6 +98,9 @@ typedef struct GICState { uint16_t running_priority[NCPU]; uint16_t current_pending[NCPU];
+ /* these registers are mainly used for save/restore of KVM state */ + uint8_t binary_point[2][NCPU]; /* [0]: group 0, [1]: group 1 */ + uint32_t num_cpu;
MemoryRegion iomem; /* Distributor */
"arm_gic:" is sufficient, "hw/arm_gic:" if you see the need.
Am 23.08.2013 22:10, schrieb Christoffer Dall:
Add a binary_point field to the gic emulation structure and support setting/getting this register now when we have it. We don't actually support interrupt grouping yet, oh well.
Signed-off-by: Christoffer Dall christoffer.dall@linaro.org
hw/intc/arm_gic.c | 5 ++--- hw/intc/arm_gic_common.c | 1 + hw/intc/gic_internal.h | 3 +++ 3 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/hw/intc/arm_gic.c b/hw/intc/arm_gic.c index 4da534f..cb2004d 100644 --- a/hw/intc/arm_gic.c +++ b/hw/intc/arm_gic.c @@ -568,8 +568,7 @@ static uint32_t gic_cpu_read(GICState *s, int cpu, int offset) case 0x04: /* Priority mask */ return s->priority_mask[cpu]; case 0x08: /* Binary Point */
/* ??? Not implemented. */
return 0;
case 0x0c: /* Acknowledge */ value = gic_acknowledge_irq(s, cpu); value |= (GIC_SGI_SRC(value, cpu) & 0x7) << 10;return s->binary_point[0][cpu];
@@ -596,7 +595,7 @@ static void gic_cpu_write(GICState *s, int cpu, int offset, uint32_t value) s->priority_mask[cpu] = (value & 0xff); break; case 0x08: /* Binary Point */
/* ??? Not implemented. */
case 0x10: /* End Of Interrupt */ return gic_complete_irq(s, cpu, value & 0x3ff);s->binary_point[0][cpu] = (value & 0x7); break;
diff --git a/hw/intc/arm_gic_common.c b/hw/intc/arm_gic_common.c index 7a1c9e5..a50ded2 100644 --- a/hw/intc/arm_gic_common.c +++ b/hw/intc/arm_gic_common.c @@ -76,6 +76,7 @@ static const VMStateDescription vmstate_gic = { VMSTATE_UINT16_ARRAY(running_irq, GICState, NCPU), VMSTATE_UINT16_ARRAY(running_priority, GICState, NCPU), VMSTATE_UINT16_ARRAY(current_pending, GICState, NCPU),
}VMSTATE_UINT8_2DARRAY(binary_point, GICState, 2, NCPU), VMSTATE_END_OF_LIST()
};
You can't just add VMState fields without bumping the version_id and marking the field as starting in that version. That breaks loadvm/migration format compatibility.
diff --git a/hw/intc/gic_internal.h b/hw/intc/gic_internal.h index 6f04885..424a41f 100644 --- a/hw/intc/gic_internal.h +++ b/hw/intc/gic_internal.h @@ -98,6 +98,9 @@ typedef struct GICState { uint16_t running_priority[NCPU]; uint16_t current_pending[NCPU];
- /* these registers are mainly used for save/restore of KVM state */
- uint8_t binary_point[2][NCPU]; /* [0]: group 0, [1]: group 1 */
binary_point[1] is never used, but changing the VMState field from length 1 to length 2 is probably more difficult than having a few extra bytes here.
Regards, Andreas
- uint32_t num_cpu;
MemoryRegion iomem; /* Distributor */
On 23 August 2013 21:10, Christoffer Dall christoffer.dall@linaro.org wrote:
Add a binary_point field to the gic emulation structure and support setting/getting this register now when we have it. We don't actually support interrupt grouping yet, oh well.
Signed-off-by: Christoffer Dall christoffer.dall@linaro.org
hw/intc/arm_gic.c | 5 ++--- hw/intc/arm_gic_common.c | 1 + hw/intc/gic_internal.h | 3 +++ 3 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/hw/intc/arm_gic.c b/hw/intc/arm_gic.c index 4da534f..cb2004d 100644 --- a/hw/intc/arm_gic.c +++ b/hw/intc/arm_gic.c @@ -568,8 +568,7 @@ static uint32_t gic_cpu_read(GICState *s, int cpu, int offset) case 0x04: /* Priority mask */ return s->priority_mask[cpu]; case 0x08: /* Binary Point */
/* ??? Not implemented. */
return 0;
case 0x0c: /* Acknowledge */ value = gic_acknowledge_irq(s, cpu); value |= (GIC_SGI_SRC(value, cpu) & 0x7) << 10;return s->binary_point[0][cpu];
@@ -596,7 +595,7 @@ static void gic_cpu_write(GICState *s, int cpu, int offset, uint32_t value) s->priority_mask[cpu] = (value & 0xff); break; case 0x08: /* Binary Point */
/* ??? Not implemented. */
case 0x10: /* End Of Interrupt */ return gic_complete_irq(s, cpu, value & 0x3ff);s->binary_point[0][cpu] = (value & 0x7); break;
diff --git a/hw/intc/arm_gic_common.c b/hw/intc/arm_gic_common.c index 7a1c9e5..a50ded2 100644 --- a/hw/intc/arm_gic_common.c +++ b/hw/intc/arm_gic_common.c @@ -76,6 +76,7 @@ static const VMStateDescription vmstate_gic = { VMSTATE_UINT16_ARRAY(running_irq, GICState, NCPU), VMSTATE_UINT16_ARRAY(running_priority, GICState, NCPU), VMSTATE_UINT16_ARRAY(current_pending, GICState, NCPU),
}VMSTATE_UINT8_2DARRAY(binary_point, GICState, 2, NCPU), VMSTATE_END_OF_LIST()
}; diff --git a/hw/intc/gic_internal.h b/hw/intc/gic_internal.h index 6f04885..424a41f 100644 --- a/hw/intc/gic_internal.h +++ b/hw/intc/gic_internal.h @@ -98,6 +98,9 @@ typedef struct GICState { uint16_t running_priority[NCPU]; uint16_t current_pending[NCPU];
- /* these registers are mainly used for save/restore of KVM state */
- uint8_t binary_point[2][NCPU]; /* [0]: group 0, [1]: group 1 */
This looks a bit fishy. I think fundamentally we need to be clear about what kind of GIC we (and KVM) are exposing to the guest. I think that that is supposed to be "a v2 GIC without the Security Extensions", yes? The other possible answer would be "a v2 GIC with the security extensions, but you're stuck in the NS world". The distinction is important, because it affects what state the guest can see. For v2-GIC-no-TZ, there are two registers' worth of state per CPU, accessible as GICC_BPR and GICC_ABPR. For v2-GIC-TZ, a guest stuck in the NS world only gets to see one register worth of state, ie the NS banked version of GICC_BPR (and GICC_ABPR is an alias of it). There is another register worth of state in the S banked GICC_BPR, but it's totally inaccessible to the guest.
The TCG QEMU GIC model is currently adopting the "GIC without Security Extensions" model, which implies that we should be implementing GIC_ABPR too. What model does KVM's in-kernel vGIC use? (ie what state does it put into binary_point[0] and [1] on save/load)?
thanks -- PMM
On Fri, Sep 06, 2013 at 03:41:04PM +0100, Peter Maydell wrote:
On 23 August 2013 21:10, Christoffer Dall christoffer.dall@linaro.org wrote:
Add a binary_point field to the gic emulation structure and support setting/getting this register now when we have it. We don't actually support interrupt grouping yet, oh well.
Signed-off-by: Christoffer Dall christoffer.dall@linaro.org
hw/intc/arm_gic.c | 5 ++--- hw/intc/arm_gic_common.c | 1 + hw/intc/gic_internal.h | 3 +++ 3 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/hw/intc/arm_gic.c b/hw/intc/arm_gic.c index 4da534f..cb2004d 100644 --- a/hw/intc/arm_gic.c +++ b/hw/intc/arm_gic.c @@ -568,8 +568,7 @@ static uint32_t gic_cpu_read(GICState *s, int cpu, int offset) case 0x04: /* Priority mask */ return s->priority_mask[cpu]; case 0x08: /* Binary Point */
/* ??? Not implemented. */
return 0;
case 0x0c: /* Acknowledge */ value = gic_acknowledge_irq(s, cpu); value |= (GIC_SGI_SRC(value, cpu) & 0x7) << 10;return s->binary_point[0][cpu];
@@ -596,7 +595,7 @@ static void gic_cpu_write(GICState *s, int cpu, int offset, uint32_t value) s->priority_mask[cpu] = (value & 0xff); break; case 0x08: /* Binary Point */
/* ??? Not implemented. */
case 0x10: /* End Of Interrupt */ return gic_complete_irq(s, cpu, value & 0x3ff);s->binary_point[0][cpu] = (value & 0x7); break;
diff --git a/hw/intc/arm_gic_common.c b/hw/intc/arm_gic_common.c index 7a1c9e5..a50ded2 100644 --- a/hw/intc/arm_gic_common.c +++ b/hw/intc/arm_gic_common.c @@ -76,6 +76,7 @@ static const VMStateDescription vmstate_gic = { VMSTATE_UINT16_ARRAY(running_irq, GICState, NCPU), VMSTATE_UINT16_ARRAY(running_priority, GICState, NCPU), VMSTATE_UINT16_ARRAY(current_pending, GICState, NCPU),
}VMSTATE_UINT8_2DARRAY(binary_point, GICState, 2, NCPU), VMSTATE_END_OF_LIST()
}; diff --git a/hw/intc/gic_internal.h b/hw/intc/gic_internal.h index 6f04885..424a41f 100644 --- a/hw/intc/gic_internal.h +++ b/hw/intc/gic_internal.h @@ -98,6 +98,9 @@ typedef struct GICState { uint16_t running_priority[NCPU]; uint16_t current_pending[NCPU];
- /* these registers are mainly used for save/restore of KVM state */
- uint8_t binary_point[2][NCPU]; /* [0]: group 0, [1]: group 1 */
This looks a bit fishy. I think fundamentally we need to be clear about what kind of GIC we (and KVM) are exposing to the guest. I think that that is supposed to be "a v2 GIC without the Security Extensions", yes?
yes, at least that's what we export through the GICD_TYPER.
The other possible answer would be "a v2 GIC with the security extensions, but you're stuck in the NS world". The distinction is important, because it affects what state the guest can see. For v2-GIC-no-TZ, there are two registers' worth of state per CPU, accessible as GICC_BPR and GICC_ABPR. For v2-GIC-TZ, a guest stuck in the NS world only gets to see one register worth of state, ie the NS banked version of GICC_BPR (and GICC_ABPR is an alias of it). There is another register worth of state in the S banked GICC_BPR, but it's totally inaccessible to the guest.
I'm a little uneasy about what the GICV_ABPR actually lets the guest see, but as far as I can tell it is consistent with the v2-GIC-no-TZ because group 1 interrupts can be configured to inject virtual FIQs to the guest (not that we implement distributor support for that yet) so the GICC_ABPR accesses from the non-secure (and only) state will control group 1 interrupts if the GICC_CTLR.CBPR==0 and will just be ignored if GICC_CTLR.CBPR==1.
(Reading this part of the spec is not a particularly pleasent experience, especially the wording of the GICC_ABPR is funky, and it actually says that reading the GICC_ABPR reads the NS copy of GICC_IAR, which I think is a typo?)
The TCG QEMU GIC model is currently adopting the "GIC without Security Extensions" model, which implies that we should be implementing GIC_ABPR too. What model does KVM's in-kernel vGIC use? (ie what state does it put into binary_point[0] and [1] on save/load)?
We put whatever the guest writes into the GICV_BPR and GICV_ABPR, but the in-kernel distributor does not care about priorities at all and considers all interrupts to be group 0 interrupts, which just happens to work for Linux.
So yes, we should implement the GIC_ABPR, but there's no need for it yet. But, if I don't add these fields the guest will (by reading the GICC_[A]BPR registers) be able to tell it was migrated, but it will not influence anything on the distributor level. Therefore, by adding these fields we support the kernel's limited model fully without adding a bunch of code that we can't really test in real life anyhow, and it doesn't prevent us from adding full grouping support later on.
-Christoffer
On 14 September 2013 02:52, Christoffer Dall christoffer.dall@linaro.org wrote:
On Fri, Sep 06, 2013 at 03:41:04PM +0100, Peter Maydell wrote:
The TCG QEMU GIC model is currently adopting the "GIC without Security Extensions" model, which implies that we should be implementing GIC_ABPR too. What model does KVM's in-kernel vGIC use? (ie what state does it put into binary_point[0] and [1] on save/load)?
We put whatever the guest writes into the GICV_BPR and GICV_ABPR, but the in-kernel distributor does not care about priorities at all and considers all interrupts to be group 0 interrupts, which just happens to work for Linux.
So yes, we should implement the GIC_ABPR, but there's no need for it yet. But, if I don't add these fields the guest will (by reading the GICC_[A]BPR registers) be able to tell it was migrated, but it will not influence anything on the distributor level. Therefore, by adding these fields we support the kernel's limited model fully without adding a bunch of code that we can't really test in real life anyhow, and it doesn't prevent us from adding full grouping support later on.
I agree we should have the fields for KVM's benefit. I think we should also add simple reads-as-written code in the TCG side so they're both accessible. State that the TCG implementation can't access is a bit weird. We should probably also call the fields bpr[NCPU] and abpr[NCPU] or something, so they're a little more self-documenting about what they represent.
-- PMM
On Sat, Sep 14, 2013 at 10:46:37AM +0100, Peter Maydell wrote:
On 14 September 2013 02:52, Christoffer Dall christoffer.dall@linaro.org wrote:
On Fri, Sep 06, 2013 at 03:41:04PM +0100, Peter Maydell wrote:
The TCG QEMU GIC model is currently adopting the "GIC without Security Extensions" model, which implies that we should be implementing GIC_ABPR too. What model does KVM's in-kernel vGIC use? (ie what state does it put into binary_point[0] and [1] on save/load)?
We put whatever the guest writes into the GICV_BPR and GICV_ABPR, but the in-kernel distributor does not care about priorities at all and considers all interrupts to be group 0 interrupts, which just happens to work for Linux.
So yes, we should implement the GIC_ABPR, but there's no need for it yet. But, if I don't add these fields the guest will (by reading the GICC_[A]BPR registers) be able to tell it was migrated, but it will not influence anything on the distributor level. Therefore, by adding these fields we support the kernel's limited model fully without adding a bunch of code that we can't really test in real life anyhow, and it doesn't prevent us from adding full grouping support later on.
I agree we should have the fields for KVM's benefit. I think we should also add simple reads-as-written code in the TCG side so they're both accessible. State that the TCG implementation can't access is a bit weird. We should probably also call the fields bpr[NCPU] and abpr[NCPU] or something, so they're a little more self-documenting about what they represent.
I already added the accessors for the TCG side?
I will rename to bpr and abpr.
-Christoffer
On Thu, Sep 19, 2013 at 08:48:35PM +0100, Christoffer Dall wrote:
On Sat, Sep 14, 2013 at 10:46:37AM +0100, Peter Maydell wrote:
On 14 September 2013 02:52, Christoffer Dall christoffer.dall@linaro.org wrote:
On Fri, Sep 06, 2013 at 03:41:04PM +0100, Peter Maydell wrote:
The TCG QEMU GIC model is currently adopting the "GIC without Security Extensions" model, which implies that we should be implementing GIC_ABPR too. What model does KVM's in-kernel vGIC use? (ie what state does it put into binary_point[0] and [1] on save/load)?
We put whatever the guest writes into the GICV_BPR and GICV_ABPR, but the in-kernel distributor does not care about priorities at all and considers all interrupts to be group 0 interrupts, which just happens to work for Linux.
So yes, we should implement the GIC_ABPR, but there's no need for it yet. But, if I don't add these fields the guest will (by reading the GICC_[A]BPR registers) be able to tell it was migrated, but it will not influence anything on the distributor level. Therefore, by adding these fields we support the kernel's limited model fully without adding a bunch of code that we can't really test in real life anyhow, and it doesn't prevent us from adding full grouping support later on.
I agree we should have the fields for KVM's benefit. I think we should also add simple reads-as-written code in the TCG side so they're both accessible. State that the TCG implementation can't access is a bit weird. We should probably also call the fields bpr[NCPU] and abpr[NCPU] or something, so they're a little more self-documenting about what they represent.
I already added the accessors for the TCG side?
oh, accessor for ABPR, got it.
I will rename to bpr and abpr.
-Christoffer
Save and restore the ARM KVM VGIC state from the kernel. We rely on QEMU to marshal the GICState data structure and therefore simply synchronize the kernel state with the QEMU emulated state in both directions.
We take some care on the restore path to check the VGIC has been configured with enough IRQs and CPU interfaces that we can properly restore the state, and for separate set/clear registers we first fully clear the registers and then set the required bits.
Signed-off-by: Christoffer Dall christoffer.dall@linaro.org --- hw/intc/arm_gic_common.c | 2 + hw/intc/arm_gic_kvm.c | 418 ++++++++++++++++++++++++++++++++++++++++++- hw/intc/gic_internal.h | 1 + include/migration/vmstate.h | 6 + 4 files changed, 425 insertions(+), 2 deletions(-)
diff --git a/hw/intc/arm_gic_common.c b/hw/intc/arm_gic_common.c index a50ded2..f39bdc1 100644 --- a/hw/intc/arm_gic_common.c +++ b/hw/intc/arm_gic_common.c @@ -77,6 +77,8 @@ static const VMStateDescription vmstate_gic = { VMSTATE_UINT16_ARRAY(running_priority, GICState, NCPU), VMSTATE_UINT16_ARRAY(current_pending, GICState, NCPU), VMSTATE_UINT8_2DARRAY(binary_point, GICState, 2, NCPU), + VMSTATE_UINT32_2DARRAY(active_prio, GICState, 4, NCPU), + VMSTATE_UINT32(num_irq, GICState), VMSTATE_END_OF_LIST() } }; diff --git a/hw/intc/arm_gic_kvm.c b/hw/intc/arm_gic_kvm.c index 9f0a852..9785281 100644 --- a/hw/intc/arm_gic_kvm.c +++ b/hw/intc/arm_gic_kvm.c @@ -23,6 +23,15 @@ #include "kvm_arm.h" #include "gic_internal.h"
+//#define DEBUG_GIC_KVM + +#ifdef DEBUG_GIC_KVM +#define DPRINTF(fmt, ...) \ +do { fprintf(stderr, "arm_gic: " fmt , ## __VA_ARGS__); } while (0) +#else +#define DPRINTF(fmt, ...) do {} while(0) +#endif + #define TYPE_KVM_ARM_GIC "kvm-arm-gic" #define KVM_ARM_GIC(obj) \ OBJECT_CHECK(GICState, (obj), TYPE_KVM_ARM_GIC) @@ -73,14 +82,419 @@ static void kvm_arm_gic_set_irq(void *opaque, int irq, int level) kvm_set_irq(kvm_state, kvm_irq, !!level); }
+static bool kvm_arm_gic_can_save_restore(GICState *s) +{ + KVMARMGICClass *kgc = KVM_ARM_GIC_GET_CLASS(s); + return kgc->dev_fd >= 0; +} + +static void kvm_arm_gic_reg_access(GICState *s, int group, int offset, + int cpu, uint32_t *val, bool write) +{ + KVMARMGICClass *kgc = KVM_ARM_GIC_GET_CLASS(s); + struct kvm_device_attr attr; + int type; + + cpu = cpu & 0xff; + + attr.flags = 0; + attr.group = group; + attr.attr = (((uint64_t)cpu << KVM_DEV_ARM_VGIC_CPUID_SHIFT) & + KVM_DEV_ARM_VGIC_CPUID_MASK) | + (((uint64_t)offset << KVM_DEV_ARM_VGIC_OFFSET_SHIFT) & + KVM_DEV_ARM_VGIC_OFFSET_MASK); + attr.addr = (uint64_t)(long)val; + + if (write) { + type = KVM_SET_DEVICE_ATTR; + } else { + type = KVM_GET_DEVICE_ATTR; + } + + if (kvm_device_ioctl(kgc->dev_fd, type, &attr) < 0) { + fprintf(stderr, "KVM_{SET/GET}_DEVICE_ATTR failed: %s\n", + strerror(errno)); + abort(); + } +} + +static void kvm_arm_gic_dist_reg_access(GICState *s, int offset, int cpu, + uint32_t *val, bool write) +{ + kvm_arm_gic_reg_access(s, KVM_DEV_ARM_VGIC_GRP_DIST_REGS, + offset, cpu, val, write); +} + +static void kvm_arm_gic_cpu_reg_access(GICState *s, int offset, int cpu, + uint32_t *val, bool write) +{ + kvm_arm_gic_reg_access(s, KVM_DEV_ARM_VGIC_GRP_CPU_REGS, + offset, cpu, val, write); +} + +#define for_each_irq_reg(_ctr, _max_irq, _field_width) \ + for (_ctr = 0; _ctr < ((_max_irq) / (32 / (_field_width))); _ctr++) + +/* + * Translate from the in-kernel field for an IRQ value to/from the qemu + * representation. + */ +typedef void (*vgic_translate_fn)(GICState *s, int irq, int cpu, + uint32_t *field, bool to_kernel); + +/* synthetic translate function used for clear/set registers to completely + * clear a setting using a clear-register before setting the remaing bits + * using a set-register */ +static void translate_clear(GICState *s, int irq, int cpu, + uint32_t *field, bool to_kernel) +{ + if (to_kernel) { + *field = ~0; + } else { + /* does not make sense: qemu model doesn't use set/clear regs */ + abort(); + } +} + +static void translate_enabled(GICState *s, int irq, int cpu, + uint32_t *field, bool to_kernel) +{ + int cm = (irq < GIC_INTERNAL) ? (1 << cpu) : ALL_CPU_MASK; + + if (to_kernel) { + *field = GIC_TEST_ENABLED(irq, cm); + } else { + if (*field & 1) { + GIC_SET_ENABLED(irq, cm); + } + } +} + +static void translate_pending(GICState *s, int irq, int cpu, + uint32_t *field, bool to_kernel) +{ + int cm = (irq < GIC_INTERNAL) ? (1 << cpu) : ALL_CPU_MASK; + + if (to_kernel) { + *field = GIC_TEST_PENDING(irq, cm); + } else { + if (*field & 1) { + GIC_SET_PENDING(irq, cm); + /* TODO: Capture is level-line is held high in the kernel */ + } + } +} + +static void translate_active(GICState *s, int irq, int cpu, + uint32_t *field, bool to_kernel) +{ + int cm = (irq < GIC_INTERNAL) ? (1 << cpu) : ALL_CPU_MASK; + + if (to_kernel) { + *field = GIC_TEST_ACTIVE(irq, cm); + } else { + if (*field & 1) { + GIC_SET_ACTIVE(irq, cm); + } + } +} + +static void translate_trigger(GICState *s, int irq, int cpu, + uint32_t *field, bool to_kernel) +{ + if (to_kernel) { + *field = (GIC_TEST_TRIGGER(irq)) ? 0x2 : 0x0; + } else { + if (*field & 0x2) { + GIC_SET_TRIGGER(irq); + } + } +} + +static void translate_priority(GICState *s, int irq, int cpu, + uint32_t *field, bool to_kernel) +{ + if (to_kernel) { + *field = GIC_GET_PRIORITY(irq, cpu) & 0xff; + } else { + GIC_SET_PRIORITY(irq, cpu, *field & 0xff); + } +} + +static void translate_targets(GICState *s, int irq, int cpu, + uint32_t *field, bool to_kernel) +{ + if (to_kernel) { + *field = s->irq_target[irq] & 0xff; + } else { + s->irq_target[irq] = *field & 0xff; + } +} + +static void translate_sgisource(GICState *s, int irq, int cpu, + uint32_t *field, bool to_kernel) +{ + if (to_kernel) { + *field = s->sgi_source[irq][cpu] & 0xff; + } else { + s->sgi_source[irq][cpu] = *field & 0xff; + } +} + +/* Read a register group from the kernel VGIC */ +static void kvm_arm_gic_dist_readr(GICState *s, uint32_t offset, int width, + int maxirq, vgic_translate_fn translate_fn) +{ + uint32_t reg; + int i; + int j; + int irq; + int cpu; + int regsz = 32 / width; /* irqs per kernel register */ + uint32_t field; + + for_each_irq_reg(i, maxirq, width) { + irq = i * regsz; + cpu = 0; + while ((cpu < s->num_cpu && irq < GIC_INTERNAL) || cpu == 0) { + kvm_arm_gic_dist_reg_access(s, offset, cpu, ®, false); + for (j = 0; j < regsz; j++) { + field = reg >> (j * width) & ((1 << width) - 1); + translate_fn(s, irq + j, cpu, &field, false); + } + + cpu++; + } + offset += 4; + } +} + +/* Write a register group to the kernel VGIC */ +static void kvm_arm_gic_dist_writer(GICState *s, uint32_t offset, int width, + int maxirq, vgic_translate_fn translate_fn) +{ + uint32_t reg; + int i; + int j; + int irq; + int cpu; + int regsz = 32 / width; /* irqs per kernel register */ + uint32_t field; + + for_each_irq_reg(i, maxirq, width) { + irq = i * regsz; + cpu = 0; + while ((cpu < s->num_cpu && irq < GIC_INTERNAL) || cpu == 0) { + reg = 0; + for (j = 0; j < regsz; j++) { + translate_fn(s, irq + j, cpu, &field, true); + reg |= (field & ((1 << width) - 1)) << (j * width); + } + kvm_arm_gic_dist_reg_access(s, offset, cpu, ®, true); + + cpu++; + } + offset += 4; + } +} + static void kvm_arm_gic_put(GICState *s) { - /* TODO: there isn't currently a kernel interface to set the GIC state */ + uint32_t reg; + int i; + int cpu; + int num_cpu; + int num_irq; + + if (!kvm_arm_gic_can_save_restore(s)) { + DPRINTF("Cannot put kernel gic state, no kernel interface"); + return; + } + + /* Note: We do the restore in a slightly different order than the save + * (where the order doesn't matter and is simply ordered according to the + * register offset values */ + + /***************************************************************** + * Distributor State + */ + + /* s->enabled -> GICD_CTLR */ + reg = s->enabled; + kvm_arm_gic_dist_reg_access(s, 0x0, 0, ®, true); + + /* Sanity checking on GICD_TYPER and s->num_irq, s->num_cpu */ + kvm_arm_gic_dist_reg_access(s, 0x4, 0, ®, false); + num_irq = ((reg & 0x1f) + 1) * 32; + num_cpu = ((reg & 0xe0) >> 5) + 1; + + if (num_irq < s->num_irq) { + fprintf(stderr, "Restoring %u IRQs, but kernel supports max %d\n", + s->num_irq, num_irq); + abort(); + } else if (num_cpu != s->num_cpu ) { + fprintf(stderr, "Restoring %u CPU interfaces, kernel only has %d\n", + s->num_cpu, num_cpu); + /* Did we not create the VCPUs in the kernel yet? */ + abort(); + } + + /* TODO: Consider checking compatibility with the IIDR ? */ + + /* irq_state[n].enabled -> GICD_ISENABLERn */ + kvm_arm_gic_dist_writer(s, 0x180, 1, s->num_irq, translate_clear); + kvm_arm_gic_dist_writer(s, 0x100, 1, s->num_irq, translate_enabled); + + /* s->irq_target[irq] -> GICD_ITARGETSRn + * (restore targets before pending to ensure the pending state is set on + * the appropriate CPU interfaces in the kernel) */ + kvm_arm_gic_dist_writer(s, 0x800, 8, s->num_irq, translate_targets); + + /* irq_state[n].pending + irq_state[n].level -> GICD_ISPENDRn */ + kvm_arm_gic_dist_writer(s, 0x280, 1, s->num_irq, translate_clear); + kvm_arm_gic_dist_writer(s, 0x200, 1, s->num_irq, translate_pending); + + /* irq_state[n].active -> GICD_ISACTIVERn */ + kvm_arm_gic_dist_writer(s, 0x380, 1, s->num_irq, translate_clear); + kvm_arm_gic_dist_writer(s, 0x300, 1, s->num_irq, translate_active); + + /* irq_state[n].trigger -> GICD_ICFRn */ + kvm_arm_gic_dist_writer(s, 0xc00, 2, s->num_irq, translate_trigger); + + /* s->priorityX[irq] -> ICD_IPRIORITYRn */ + kvm_arm_gic_dist_writer(s, 0x400, 8, s->num_irq, translate_priority); + + /* s->sgi_source -> ICD_CPENDSGIRn */ + kvm_arm_gic_dist_writer(s, 0xf10, 8, GIC_NR_SGIS, translate_clear); + kvm_arm_gic_dist_writer(s, 0xf20, 8, GIC_NR_SGIS, translate_sgisource); + + + /***************************************************************** + * CPU Interface(s) State + */ + + for (cpu = 0; cpu < s->num_cpu; cpu++) { + /* s->cpu_enabled[cpu] -> GICC_CTLR */ + reg = s->cpu_enabled[cpu]; + kvm_arm_gic_cpu_reg_access(s, 0x00, cpu, ®, true); + + /* s->priority_mask[cpu] -> GICC_PMR */ + reg = (s->priority_mask[cpu] & 0xff); + kvm_arm_gic_cpu_reg_access(s, 0x04, cpu, ®, true); + + /* s->binary_point[cpu][0] -> GICC_BPR */ + reg = (s->binary_point[0][cpu] & 0x7); + kvm_arm_gic_cpu_reg_access(s, 0x08, cpu, ®, true); + + /* s->binary_point[cpu][1] -> GICC_ABPR */ + reg = (s->binary_point[1][cpu] & 0x7); + kvm_arm_gic_cpu_reg_access(s, 0x1c, cpu, ®, true); + + /* s->active_prio[n][cpu] -> GICC_APRn */ + for (i = 0; i < 4; i++) { + reg = s->active_prio[i][cpu]; + kvm_arm_gic_cpu_reg_access(s, 0xd0 + i * 4, cpu, ®, true); + } + } }
static void kvm_arm_gic_get(GICState *s) { - /* TODO: there isn't currently a kernel interface to get the GIC state */ + uint32_t reg; + int i; + int cpu; + + if (!kvm_arm_gic_can_save_restore(s)) { + DPRINTF("Cannot get kernel gic state, no kernel interface"); + return; + } + + /***************************************************************** + * Distributor State + */ + + /* GICD_CTLR -> s->enabled */ + kvm_arm_gic_dist_reg_access(s, 0x0, 0, ®, false); + s->enabled = reg & 1; + + /* Sanity checking on GICD_TYPER -> s->num_irq, s->num_cpu */ + kvm_arm_gic_dist_reg_access(s, 0x4, 0, ®, false); + s->num_irq = ((reg & 0x1f) + 1) * 32; + s->num_cpu = ((reg & 0xe0) >> 5) + 1; + + if (s->num_irq > GIC_MAXIRQ) { + fprintf(stderr, "Too many IRQs reported from the kernel: %d\n", + s->num_irq); + abort(); + } + + /* GICD_IIDR -> ? */ + kvm_arm_gic_dist_reg_access(s, 0x8, 0, ®, false); + + /* Verify no GROUP 1 interrupts configured in the kernel */ + for_each_irq_reg(i, s->num_irq, 1) { + kvm_arm_gic_dist_reg_access(s, 0x80 + (i * 4), 0, ®, false); + if (reg != 0) { + fprintf(stderr, "Unsupported GICD_IGROUPRn value: %08x\n", + reg); + abort(); + } + } + + /* Clear all the IRQ settings */ + for (i = 0; i < s->num_irq; i++) { + memset(&s->irq_state[i], 0, sizeof(s->irq_state[0])); + } + + /* GICD_ISENABLERn -> irq_state[n].enabled */ + kvm_arm_gic_dist_readr(s, 0x100, 1, s->num_irq, translate_enabled); + + /* GICD_ISPENDRn -> irq_state[n].pending + irq_state[n].level */ + kvm_arm_gic_dist_readr(s, 0x200, 1, s->num_irq, translate_pending); + + /* GICD_ISACTIVERn -> irq_state[n].active */ + kvm_arm_gic_dist_readr(s, 0x300, 1, s->num_irq, translate_active); + + /* GICD_ICFRn -> irq_state[n].trigger */ + kvm_arm_gic_dist_readr(s, 0xc00, 2, s->num_irq, translate_trigger); + + /* GICD_IPRIORITYRn -> s->priorityX[irq] */ + kvm_arm_gic_dist_readr(s, 0x400, 8, s->num_irq, translate_priority); + + /* GICD_ITARGETSRn -> s->irq_target[irq] */ + kvm_arm_gic_dist_readr(s, 0x800, 8, s->num_irq, translate_targets); + + /* GICD_CPENDSGIRn -> s->sgi_source */ + kvm_arm_gic_dist_readr(s, 0xf10, 8, GIC_NR_SGIS, translate_sgisource); + + + /***************************************************************** + * CPU Interface(s) State + */ + + for (cpu = 0; cpu < s->num_cpu; cpu++) { + /* GICC_CTLR -> s->cpu_enabled[cpu] */ + kvm_arm_gic_cpu_reg_access(s, 0x00, cpu, ®, false); + s->cpu_enabled[cpu] = (reg & 1); + + /* GICC_PMR -> s->priority_mask[cpu] */ + kvm_arm_gic_cpu_reg_access(s, 0x04, cpu, ®, false); + s->priority_mask[cpu] = (reg & 0xff); + + /* GICC_BPR -> s->binary_point[cpu][0] */ + kvm_arm_gic_cpu_reg_access(s, 0x08, cpu, ®, false); + s->binary_point[0][cpu] = (reg & 0x7); + + /* GICC_ABPR -> s->binary_point[cpu][1] */ + kvm_arm_gic_cpu_reg_access(s, 0x1c, cpu, ®, false); + s->binary_point[1][cpu] = (reg & 0x7); + + /* GICC_APRn -> s->active_prio[n][cpu] */ + for (i = 0; i < 4; i++) { + kvm_arm_gic_cpu_reg_access(s, 0xd0 + i * 4, cpu, ®, false); + s->active_prio[i][cpu] = reg; + } + } }
static void kvm_arm_gic_reset(DeviceState *dev) diff --git a/hw/intc/gic_internal.h b/hw/intc/gic_internal.h index 424a41f..9771163 100644 --- a/hw/intc/gic_internal.h +++ b/hw/intc/gic_internal.h @@ -100,6 +100,7 @@ typedef struct GICState {
/* these registers are mainly used for save/restore of KVM state */ uint8_t binary_point[2][NCPU]; /* [0]: group 0, [1]: group 1 */ + uint32_t active_prio[4][NCPU]; /* implementation defined layout */
uint32_t num_cpu;
diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h index 1c31b5d..e5538c7 100644 --- a/include/migration/vmstate.h +++ b/include/migration/vmstate.h @@ -633,9 +633,15 @@ extern const VMStateInfo vmstate_info_bitmap; #define VMSTATE_UINT32_ARRAY_V(_f, _s, _n, _v) \ VMSTATE_ARRAY(_f, _s, _n, _v, vmstate_info_uint32, uint32_t)
+#define VMSTATE_UINT32_2DARRAY_V(_f, _s, _n1, _n2, _v) \ + VMSTATE_2DARRAY(_f, _s, _n1, _n2, _v, vmstate_info_uint32, uint32_t) + #define VMSTATE_UINT32_ARRAY(_f, _s, _n) \ VMSTATE_UINT32_ARRAY_V(_f, _s, _n, 0)
+#define VMSTATE_UINT32_2DARRAY(_f, _s, _n1, _n2) \ + VMSTATE_UINT32_2DARRAY_V(_f, _s, _n1, _n2, 0) + #define VMSTATE_UINT64_ARRAY_V(_f, _s, _n, _v) \ VMSTATE_ARRAY(_f, _s, _n, _v, vmstate_info_uint64, uint64_t)
On 23.08.2013, at 21:10, Christoffer Dall wrote:
Save and restore the ARM KVM VGIC state from the kernel. We rely on QEMU to marshal the GICState data structure and therefore simply synchronize the kernel state with the QEMU emulated state in both directions.
We take some care on the restore path to check the VGIC has been configured with enough IRQs and CPU interfaces that we can properly restore the state, and for separate set/clear registers we first fully clear the registers and then set the required bits.
Signed-off-by: Christoffer Dall christoffer.dall@linaro.org
hw/intc/arm_gic_common.c | 2 + hw/intc/arm_gic_kvm.c | 418 ++++++++++++++++++++++++++++++++++++++++++- hw/intc/gic_internal.h | 1 + include/migration/vmstate.h | 6 + 4 files changed, 425 insertions(+), 2 deletions(-)
diff --git a/hw/intc/arm_gic_common.c b/hw/intc/arm_gic_common.c index a50ded2..f39bdc1 100644 --- a/hw/intc/arm_gic_common.c +++ b/hw/intc/arm_gic_common.c @@ -77,6 +77,8 @@ static const VMStateDescription vmstate_gic = { VMSTATE_UINT16_ARRAY(running_priority, GICState, NCPU), VMSTATE_UINT16_ARRAY(current_pending, GICState, NCPU), VMSTATE_UINT8_2DARRAY(binary_point, GICState, 2, NCPU),
VMSTATE_UINT32_2DARRAY(active_prio, GICState, 4, NCPU),
}VMSTATE_UINT32(num_irq, GICState), VMSTATE_END_OF_LIST()
}; diff --git a/hw/intc/arm_gic_kvm.c b/hw/intc/arm_gic_kvm.c index 9f0a852..9785281 100644 --- a/hw/intc/arm_gic_kvm.c +++ b/hw/intc/arm_gic_kvm.c @@ -23,6 +23,15 @@ #include "kvm_arm.h" #include "gic_internal.h"
+//#define DEBUG_GIC_KVM
+#ifdef DEBUG_GIC_KVM +#define DPRINTF(fmt, ...) \ +do { fprintf(stderr, "arm_gic: " fmt , ## __VA_ARGS__); } while (0) +#else +#define DPRINTF(fmt, ...) do {} while(0)
You really want to make DPRINTF still be format checked by the compiler. Check out hw/intc/openpic.c for an example how to get there.
+#endif
#define TYPE_KVM_ARM_GIC "kvm-arm-gic" #define KVM_ARM_GIC(obj) \ OBJECT_CHECK(GICState, (obj), TYPE_KVM_ARM_GIC) @@ -73,14 +82,419 @@ static void kvm_arm_gic_set_irq(void *opaque, int irq, int level) kvm_set_irq(kvm_state, kvm_irq, !!level); }
+static bool kvm_arm_gic_can_save_restore(GICState *s) +{
- KVMARMGICClass *kgc = KVM_ARM_GIC_GET_CLASS(s);
- return kgc->dev_fd >= 0;
+}
+static void kvm_arm_gic_reg_access(GICState *s, int group, int offset,
int cpu, uint32_t *val, bool write)
+{
- KVMARMGICClass *kgc = KVM_ARM_GIC_GET_CLASS(s);
- struct kvm_device_attr attr;
- int type;
- cpu = cpu & 0xff;
- attr.flags = 0;
- attr.group = group;
- attr.attr = (((uint64_t)cpu << KVM_DEV_ARM_VGIC_CPUID_SHIFT) &
KVM_DEV_ARM_VGIC_CPUID_MASK) |
(((uint64_t)offset << KVM_DEV_ARM_VGIC_OFFSET_SHIFT) &
KVM_DEV_ARM_VGIC_OFFSET_MASK);
- attr.addr = (uint64_t)(long)val;
uintptr_t?
- if (write) {
type = KVM_SET_DEVICE_ATTR;
- } else {
type = KVM_GET_DEVICE_ATTR;
- }
- if (kvm_device_ioctl(kgc->dev_fd, type, &attr) < 0) {
fprintf(stderr, "KVM_{SET/GET}_DEVICE_ATTR failed: %s\n",
strerror(errno));
abort();
- }
+}
+static void kvm_arm_gic_dist_reg_access(GICState *s, int offset, int cpu,
uint32_t *val, bool write)
+{
- kvm_arm_gic_reg_access(s, KVM_DEV_ARM_VGIC_GRP_DIST_REGS,
offset, cpu, val, write);
+}
+static void kvm_arm_gic_cpu_reg_access(GICState *s, int offset, int cpu,
uint32_t *val, bool write)
+{
- kvm_arm_gic_reg_access(s, KVM_DEV_ARM_VGIC_GRP_CPU_REGS,
offset, cpu, val, write);
+}
+#define for_each_irq_reg(_ctr, _max_irq, _field_width) \
- for (_ctr = 0; _ctr < ((_max_irq) / (32 / (_field_width))); _ctr++)
+/*
- Translate from the in-kernel field for an IRQ value to/from the qemu
- representation.
- */
+typedef void (*vgic_translate_fn)(GICState *s, int irq, int cpu,
uint32_t *field, bool to_kernel);
+/* synthetic translate function used for clear/set registers to completely
- clear a setting using a clear-register before setting the remaing bits
- using a set-register */
+static void translate_clear(GICState *s, int irq, int cpu,
uint32_t *field, bool to_kernel)
+{
- if (to_kernel) {
*field = ~0;
- } else {
/* does not make sense: qemu model doesn't use set/clear regs */
abort();
- }
I don't understand the to_kernel bits. I thought we're in a device file that only works with KVM?
+}
+static void translate_enabled(GICState *s, int irq, int cpu,
uint32_t *field, bool to_kernel)
+{
- int cm = (irq < GIC_INTERNAL) ? (1 << cpu) : ALL_CPU_MASK;
- if (to_kernel) {
*field = GIC_TEST_ENABLED(irq, cm);
- } else {
if (*field & 1) {
GIC_SET_ENABLED(irq, cm);
}
- }
+}
+static void translate_pending(GICState *s, int irq, int cpu,
uint32_t *field, bool to_kernel)
+{
- int cm = (irq < GIC_INTERNAL) ? (1 << cpu) : ALL_CPU_MASK;
- if (to_kernel) {
*field = GIC_TEST_PENDING(irq, cm);
- } else {
if (*field & 1) {
GIC_SET_PENDING(irq, cm);
/* TODO: Capture is level-line is held high in the kernel */
}
- }
+}
+static void translate_active(GICState *s, int irq, int cpu,
uint32_t *field, bool to_kernel)
+{
- int cm = (irq < GIC_INTERNAL) ? (1 << cpu) : ALL_CPU_MASK;
- if (to_kernel) {
*field = GIC_TEST_ACTIVE(irq, cm);
- } else {
if (*field & 1) {
GIC_SET_ACTIVE(irq, cm);
}
- }
+}
+static void translate_trigger(GICState *s, int irq, int cpu,
uint32_t *field, bool to_kernel)
+{
- if (to_kernel) {
*field = (GIC_TEST_TRIGGER(irq)) ? 0x2 : 0x0;
- } else {
if (*field & 0x2) {
GIC_SET_TRIGGER(irq);
}
- }
+}
+static void translate_priority(GICState *s, int irq, int cpu,
uint32_t *field, bool to_kernel)
+{
- if (to_kernel) {
*field = GIC_GET_PRIORITY(irq, cpu) & 0xff;
- } else {
GIC_SET_PRIORITY(irq, cpu, *field & 0xff);
- }
+}
+static void translate_targets(GICState *s, int irq, int cpu,
uint32_t *field, bool to_kernel)
+{
- if (to_kernel) {
*field = s->irq_target[irq] & 0xff;
- } else {
s->irq_target[irq] = *field & 0xff;
- }
+}
+static void translate_sgisource(GICState *s, int irq, int cpu,
uint32_t *field, bool to_kernel)
+{
- if (to_kernel) {
*field = s->sgi_source[irq][cpu] & 0xff;
- } else {
s->sgi_source[irq][cpu] = *field & 0xff;
- }
+}
+/* Read a register group from the kernel VGIC */ +static void kvm_arm_gic_dist_readr(GICState *s, uint32_t offset, int width,
int maxirq, vgic_translate_fn translate_fn)
+{
- uint32_t reg;
- int i;
- int j;
- int irq;
- int cpu;
- int regsz = 32 / width; /* irqs per kernel register */
- uint32_t field;
- for_each_irq_reg(i, maxirq, width) {
irq = i * regsz;
cpu = 0;
while ((cpu < s->num_cpu && irq < GIC_INTERNAL) || cpu == 0) {
kvm_arm_gic_dist_reg_access(s, offset, cpu, ®, false);
for (j = 0; j < regsz; j++) {
field = reg >> (j * width) & ((1 << width) - 1);
translate_fn(s, irq + j, cpu, &field, false);
}
cpu++;
}
offset += 4;
- }
+}
+/* Write a register group to the kernel VGIC */ +static void kvm_arm_gic_dist_writer(GICState *s, uint32_t offset, int width,
int maxirq, vgic_translate_fn translate_fn)
+{
- uint32_t reg;
- int i;
- int j;
- int irq;
- int cpu;
- int regsz = 32 / width; /* irqs per kernel register */
- uint32_t field;
- for_each_irq_reg(i, maxirq, width) {
irq = i * regsz;
cpu = 0;
while ((cpu < s->num_cpu && irq < GIC_INTERNAL) || cpu == 0) {
reg = 0;
for (j = 0; j < regsz; j++) {
translate_fn(s, irq + j, cpu, &field, true);
reg |= (field & ((1 << width) - 1)) << (j * width);
}
kvm_arm_gic_dist_reg_access(s, offset, cpu, ®, true);
cpu++;
}
offset += 4;
- }
+}
static void kvm_arm_gic_put(GICState *s) {
- /* TODO: there isn't currently a kernel interface to set the GIC state */
- uint32_t reg;
- int i;
- int cpu;
- int num_cpu;
- int num_irq;
- if (!kvm_arm_gic_can_save_restore(s)) {
DPRINTF("Cannot put kernel gic state, no kernel interface");
return;
- }
- /* Note: We do the restore in a slightly different order than the save
* (where the order doesn't matter and is simply ordered according to the
* register offset values */
- /*****************************************************************
* Distributor State
*/
- /* s->enabled -> GICD_CTLR */
- reg = s->enabled;
- kvm_arm_gic_dist_reg_access(s, 0x0, 0, ®, true);
- /* Sanity checking on GICD_TYPER and s->num_irq, s->num_cpu */
- kvm_arm_gic_dist_reg_access(s, 0x4, 0, ®, false);
- num_irq = ((reg & 0x1f) + 1) * 32;
- num_cpu = ((reg & 0xe0) >> 5) + 1;
- if (num_irq < s->num_irq) {
fprintf(stderr, "Restoring %u IRQs, but kernel supports max %d\n",
s->num_irq, num_irq);
abort();
- } else if (num_cpu != s->num_cpu ) {
fprintf(stderr, "Restoring %u CPU interfaces, kernel only has %d\n",
s->num_cpu, num_cpu);
/* Did we not create the VCPUs in the kernel yet? */
abort();
- }
- /* TODO: Consider checking compatibility with the IIDR ? */
- /* irq_state[n].enabled -> GICD_ISENABLERn */
- kvm_arm_gic_dist_writer(s, 0x180, 1, s->num_irq, translate_clear);
- kvm_arm_gic_dist_writer(s, 0x100, 1, s->num_irq, translate_enabled);
Don't these magic numbers have #define'd equivalents in their GIC implementation header file? Then you don't need the comments above each of these.
I also find the function names quite long, but I guess as long as everything still fits that doesn't really matter.
Alex
Il 25/08/2013 17:47, Alexander Graf ha scritto:
On 23.08.2013, at 21:10, Christoffer Dall wrote:
Save and restore the ARM KVM VGIC state from the kernel. We rely on QEMU to marshal the GICState data structure and therefore simply synchronize the kernel state with the QEMU emulated state in both directions.
We take some care on the restore path to check the VGIC has been configured with enough IRQs and CPU interfaces that we can properly restore the state, and for separate set/clear registers we first fully clear the registers and then set the required bits.
Signed-off-by: Christoffer Dall christoffer.dall@linaro.org
hw/intc/arm_gic_common.c | 2 + hw/intc/arm_gic_kvm.c | 418 ++++++++++++++++++++++++++++++++++++++++++- hw/intc/gic_internal.h | 1 + include/migration/vmstate.h | 6 + 4 files changed, 425 insertions(+), 2 deletions(-)
diff --git a/hw/intc/arm_gic_common.c b/hw/intc/arm_gic_common.c index a50ded2..f39bdc1 100644 --- a/hw/intc/arm_gic_common.c +++ b/hw/intc/arm_gic_common.c @@ -77,6 +77,8 @@ static const VMStateDescription vmstate_gic = { VMSTATE_UINT16_ARRAY(running_priority, GICState, NCPU), VMSTATE_UINT16_ARRAY(current_pending, GICState, NCPU), VMSTATE_UINT8_2DARRAY(binary_point, GICState, 2, NCPU),
VMSTATE_UINT32_2DARRAY(active_prio, GICState, 4, NCPU),
}VMSTATE_UINT32(num_irq, GICState), VMSTATE_END_OF_LIST()
}; diff --git a/hw/intc/arm_gic_kvm.c b/hw/intc/arm_gic_kvm.c index 9f0a852..9785281 100644 --- a/hw/intc/arm_gic_kvm.c +++ b/hw/intc/arm_gic_kvm.c @@ -23,6 +23,15 @@ #include "kvm_arm.h" #include "gic_internal.h"
+//#define DEBUG_GIC_KVM
+#ifdef DEBUG_GIC_KVM +#define DPRINTF(fmt, ...) \ +do { fprintf(stderr, "arm_gic: " fmt , ## __VA_ARGS__); } while (0) +#else +#define DPRINTF(fmt, ...) do {} while(0)
You really want to make DPRINTF still be format checked by the compiler. Check out hw/intc/openpic.c for an example how to get there.
+#endif
#define TYPE_KVM_ARM_GIC "kvm-arm-gic" #define KVM_ARM_GIC(obj) \ OBJECT_CHECK(GICState, (obj), TYPE_KVM_ARM_GIC) @@ -73,14 +82,419 @@ static void kvm_arm_gic_set_irq(void *opaque, int irq, int level) kvm_set_irq(kvm_state, kvm_irq, !!level); }
+static bool kvm_arm_gic_can_save_restore(GICState *s) +{
- KVMARMGICClass *kgc = KVM_ARM_GIC_GET_CLASS(s);
- return kgc->dev_fd >= 0;
+}
+static void kvm_arm_gic_reg_access(GICState *s, int group, int offset,
int cpu, uint32_t *val, bool write)
+{
- KVMARMGICClass *kgc = KVM_ARM_GIC_GET_CLASS(s);
- struct kvm_device_attr attr;
- int type;
- cpu = cpu & 0xff;
- attr.flags = 0;
- attr.group = group;
- attr.attr = (((uint64_t)cpu << KVM_DEV_ARM_VGIC_CPUID_SHIFT) &
KVM_DEV_ARM_VGIC_CPUID_MASK) |
(((uint64_t)offset << KVM_DEV_ARM_VGIC_OFFSET_SHIFT) &
KVM_DEV_ARM_VGIC_OFFSET_MASK);
- attr.addr = (uint64_t)(long)val;
uintptr_t?
- if (write) {
type = KVM_SET_DEVICE_ATTR;
- } else {
type = KVM_GET_DEVICE_ATTR;
- }
- if (kvm_device_ioctl(kgc->dev_fd, type, &attr) < 0) {
fprintf(stderr, "KVM_{SET/GET}_DEVICE_ATTR failed: %s\n",
strerror(errno));
abort();
- }
+}
+static void kvm_arm_gic_dist_reg_access(GICState *s, int offset, int cpu,
uint32_t *val, bool write)
+{
- kvm_arm_gic_reg_access(s, KVM_DEV_ARM_VGIC_GRP_DIST_REGS,
offset, cpu, val, write);
+}
+static void kvm_arm_gic_cpu_reg_access(GICState *s, int offset, int cpu,
uint32_t *val, bool write)
+{
- kvm_arm_gic_reg_access(s, KVM_DEV_ARM_VGIC_GRP_CPU_REGS,
offset, cpu, val, write);
+}
+#define for_each_irq_reg(_ctr, _max_irq, _field_width) \
- for (_ctr = 0; _ctr < ((_max_irq) / (32 / (_field_width))); _ctr++)
+/*
- Translate from the in-kernel field for an IRQ value to/from the qemu
- representation.
- */
+typedef void (*vgic_translate_fn)(GICState *s, int irq, int cpu,
uint32_t *field, bool to_kernel);
+/* synthetic translate function used for clear/set registers to completely
- clear a setting using a clear-register before setting the remaing bits
- using a set-register */
+static void translate_clear(GICState *s, int irq, int cpu,
uint32_t *field, bool to_kernel)
+{
- if (to_kernel) {
*field = ~0;
- } else {
/* does not make sense: qemu model doesn't use set/clear regs */
abort();
- }
I don't understand the to_kernel bits. I thought we're in a device file that only works with KVM?
The opposite of "to_kernel" is "from_kernel".
IIUC this is for GIC registers that are modelled as a pair of MMIO registers, one with write-1-sets and one with write-1-clears semantics. The patch is first writing all ones to the w1c register, and then writing the actual value to the w1s register.
The way we solved this for x86 is that the set-registers, when called from userspace, also clear the zero bits. See the "host_initiated" parts in arch/x86/kvm/x86.c. However, this would be an ABI change for ARM, so Christoffer's solution is also fine.
One comment on the function names:
kvm_arm_gic_dist_readr kvm_arm_gic_dist_writer
Why not get_reg/set_reg (I was quite surprised to see readr instead of reader :) and it took me a while to understand the convention)? Or if the name is too long, s/readr/get/ and s/writer/set/ would be enough to match the ioctls.
Paolo
+}
+static void translate_enabled(GICState *s, int irq, int cpu,
uint32_t *field, bool to_kernel)
+{
- int cm = (irq < GIC_INTERNAL) ? (1 << cpu) : ALL_CPU_MASK;
- if (to_kernel) {
*field = GIC_TEST_ENABLED(irq, cm);
- } else {
if (*field & 1) {
GIC_SET_ENABLED(irq, cm);
}
- }
+}
+static void translate_pending(GICState *s, int irq, int cpu,
uint32_t *field, bool to_kernel)
+{
- int cm = (irq < GIC_INTERNAL) ? (1 << cpu) : ALL_CPU_MASK;
- if (to_kernel) {
*field = GIC_TEST_PENDING(irq, cm);
- } else {
if (*field & 1) {
GIC_SET_PENDING(irq, cm);
/* TODO: Capture is level-line is held high in the kernel */
}
- }
+}
+static void translate_active(GICState *s, int irq, int cpu,
uint32_t *field, bool to_kernel)
+{
- int cm = (irq < GIC_INTERNAL) ? (1 << cpu) : ALL_CPU_MASK;
- if (to_kernel) {
*field = GIC_TEST_ACTIVE(irq, cm);
- } else {
if (*field & 1) {
GIC_SET_ACTIVE(irq, cm);
}
- }
+}
+static void translate_trigger(GICState *s, int irq, int cpu,
uint32_t *field, bool to_kernel)
+{
- if (to_kernel) {
*field = (GIC_TEST_TRIGGER(irq)) ? 0x2 : 0x0;
- } else {
if (*field & 0x2) {
GIC_SET_TRIGGER(irq);
}
- }
+}
+static void translate_priority(GICState *s, int irq, int cpu,
uint32_t *field, bool to_kernel)
+{
- if (to_kernel) {
*field = GIC_GET_PRIORITY(irq, cpu) & 0xff;
- } else {
GIC_SET_PRIORITY(irq, cpu, *field & 0xff);
- }
+}
+static void translate_targets(GICState *s, int irq, int cpu,
uint32_t *field, bool to_kernel)
+{
- if (to_kernel) {
*field = s->irq_target[irq] & 0xff;
- } else {
s->irq_target[irq] = *field & 0xff;
- }
+}
+static void translate_sgisource(GICState *s, int irq, int cpu,
uint32_t *field, bool to_kernel)
+{
- if (to_kernel) {
*field = s->sgi_source[irq][cpu] & 0xff;
- } else {
s->sgi_source[irq][cpu] = *field & 0xff;
- }
+}
+/* Read a register group from the kernel VGIC */ +static void kvm_arm_gic_dist_readr(GICState *s, uint32_t offset, int width,
int maxirq, vgic_translate_fn translate_fn)
+{
- uint32_t reg;
- int i;
- int j;
- int irq;
- int cpu;
- int regsz = 32 / width; /* irqs per kernel register */
- uint32_t field;
- for_each_irq_reg(i, maxirq, width) {
irq = i * regsz;
cpu = 0;
while ((cpu < s->num_cpu && irq < GIC_INTERNAL) || cpu == 0) {
kvm_arm_gic_dist_reg_access(s, offset, cpu, ®, false);
for (j = 0; j < regsz; j++) {
field = reg >> (j * width) & ((1 << width) - 1);
translate_fn(s, irq + j, cpu, &field, false);
}
cpu++;
}
offset += 4;
- }
+}
+/* Write a register group to the kernel VGIC */ +static void kvm_arm_gic_dist_writer(GICState *s, uint32_t offset, int width,
int maxirq, vgic_translate_fn translate_fn)
+{
- uint32_t reg;
- int i;
- int j;
- int irq;
- int cpu;
- int regsz = 32 / width; /* irqs per kernel register */
- uint32_t field;
- for_each_irq_reg(i, maxirq, width) {
irq = i * regsz;
cpu = 0;
while ((cpu < s->num_cpu && irq < GIC_INTERNAL) || cpu == 0) {
reg = 0;
for (j = 0; j < regsz; j++) {
translate_fn(s, irq + j, cpu, &field, true);
reg |= (field & ((1 << width) - 1)) << (j * width);
}
kvm_arm_gic_dist_reg_access(s, offset, cpu, ®, true);
cpu++;
}
offset += 4;
- }
+}
static void kvm_arm_gic_put(GICState *s) {
- /* TODO: there isn't currently a kernel interface to set the GIC state */
- uint32_t reg;
- int i;
- int cpu;
- int num_cpu;
- int num_irq;
- if (!kvm_arm_gic_can_save_restore(s)) {
DPRINTF("Cannot put kernel gic state, no kernel interface");
return;
- }
- /* Note: We do the restore in a slightly different order than the save
* (where the order doesn't matter and is simply ordered according to the
* register offset values */
- /*****************************************************************
* Distributor State
*/
- /* s->enabled -> GICD_CTLR */
- reg = s->enabled;
- kvm_arm_gic_dist_reg_access(s, 0x0, 0, ®, true);
- /* Sanity checking on GICD_TYPER and s->num_irq, s->num_cpu */
- kvm_arm_gic_dist_reg_access(s, 0x4, 0, ®, false);
- num_irq = ((reg & 0x1f) + 1) * 32;
- num_cpu = ((reg & 0xe0) >> 5) + 1;
- if (num_irq < s->num_irq) {
fprintf(stderr, "Restoring %u IRQs, but kernel supports max %d\n",
s->num_irq, num_irq);
abort();
- } else if (num_cpu != s->num_cpu ) {
fprintf(stderr, "Restoring %u CPU interfaces, kernel only has %d\n",
s->num_cpu, num_cpu);
/* Did we not create the VCPUs in the kernel yet? */
abort();
- }
- /* TODO: Consider checking compatibility with the IIDR ? */
- /* irq_state[n].enabled -> GICD_ISENABLERn */
- kvm_arm_gic_dist_writer(s, 0x180, 1, s->num_irq, translate_clear);
- kvm_arm_gic_dist_writer(s, 0x100, 1, s->num_irq, translate_enabled);
Don't these magic numbers have #define'd equivalents in their GIC implementation header file? Then you don't need the comments above each of these.
I also find the function names quite long, but I guess as long as everything still fits that doesn't really matter.
Alex
On Fri, Sep 06, 2013 at 04:57:05PM +0200, Paolo Bonzini wrote:
Il 25/08/2013 17:47, Alexander Graf ha scritto:
On 23.08.2013, at 21:10, Christoffer Dall wrote:
Save and restore the ARM KVM VGIC state from the kernel. We rely on QEMU to marshal the GICState data structure and therefore simply synchronize the kernel state with the QEMU emulated state in both directions.
We take some care on the restore path to check the VGIC has been configured with enough IRQs and CPU interfaces that we can properly restore the state, and for separate set/clear registers we first fully clear the registers and then set the required bits.
Signed-off-by: Christoffer Dall christoffer.dall@linaro.org
hw/intc/arm_gic_common.c | 2 + hw/intc/arm_gic_kvm.c | 418 ++++++++++++++++++++++++++++++++++++++++++- hw/intc/gic_internal.h | 1 + include/migration/vmstate.h | 6 + 4 files changed, 425 insertions(+), 2 deletions(-)
diff --git a/hw/intc/arm_gic_common.c b/hw/intc/arm_gic_common.c index a50ded2..f39bdc1 100644 --- a/hw/intc/arm_gic_common.c +++ b/hw/intc/arm_gic_common.c @@ -77,6 +77,8 @@ static const VMStateDescription vmstate_gic = { VMSTATE_UINT16_ARRAY(running_priority, GICState, NCPU), VMSTATE_UINT16_ARRAY(current_pending, GICState, NCPU), VMSTATE_UINT8_2DARRAY(binary_point, GICState, 2, NCPU),
VMSTATE_UINT32_2DARRAY(active_prio, GICState, 4, NCPU),
}VMSTATE_UINT32(num_irq, GICState), VMSTATE_END_OF_LIST()
}; diff --git a/hw/intc/arm_gic_kvm.c b/hw/intc/arm_gic_kvm.c index 9f0a852..9785281 100644 --- a/hw/intc/arm_gic_kvm.c +++ b/hw/intc/arm_gic_kvm.c @@ -23,6 +23,15 @@ #include "kvm_arm.h" #include "gic_internal.h"
+//#define DEBUG_GIC_KVM
+#ifdef DEBUG_GIC_KVM +#define DPRINTF(fmt, ...) \ +do { fprintf(stderr, "arm_gic: " fmt , ## __VA_ARGS__); } while (0) +#else +#define DPRINTF(fmt, ...) do {} while(0)
You really want to make DPRINTF still be format checked by the compiler. Check out hw/intc/openpic.c for an example how to get there.
+#endif
#define TYPE_KVM_ARM_GIC "kvm-arm-gic" #define KVM_ARM_GIC(obj) \ OBJECT_CHECK(GICState, (obj), TYPE_KVM_ARM_GIC) @@ -73,14 +82,419 @@ static void kvm_arm_gic_set_irq(void *opaque, int irq, int level) kvm_set_irq(kvm_state, kvm_irq, !!level); }
+static bool kvm_arm_gic_can_save_restore(GICState *s) +{
- KVMARMGICClass *kgc = KVM_ARM_GIC_GET_CLASS(s);
- return kgc->dev_fd >= 0;
+}
+static void kvm_arm_gic_reg_access(GICState *s, int group, int offset,
int cpu, uint32_t *val, bool write)
+{
- KVMARMGICClass *kgc = KVM_ARM_GIC_GET_CLASS(s);
- struct kvm_device_attr attr;
- int type;
- cpu = cpu & 0xff;
- attr.flags = 0;
- attr.group = group;
- attr.attr = (((uint64_t)cpu << KVM_DEV_ARM_VGIC_CPUID_SHIFT) &
KVM_DEV_ARM_VGIC_CPUID_MASK) |
(((uint64_t)offset << KVM_DEV_ARM_VGIC_OFFSET_SHIFT) &
KVM_DEV_ARM_VGIC_OFFSET_MASK);
- attr.addr = (uint64_t)(long)val;
uintptr_t?
- if (write) {
type = KVM_SET_DEVICE_ATTR;
- } else {
type = KVM_GET_DEVICE_ATTR;
- }
- if (kvm_device_ioctl(kgc->dev_fd, type, &attr) < 0) {
fprintf(stderr, "KVM_{SET/GET}_DEVICE_ATTR failed: %s\n",
strerror(errno));
abort();
- }
+}
+static void kvm_arm_gic_dist_reg_access(GICState *s, int offset, int cpu,
uint32_t *val, bool write)
+{
- kvm_arm_gic_reg_access(s, KVM_DEV_ARM_VGIC_GRP_DIST_REGS,
offset, cpu, val, write);
+}
+static void kvm_arm_gic_cpu_reg_access(GICState *s, int offset, int cpu,
uint32_t *val, bool write)
+{
- kvm_arm_gic_reg_access(s, KVM_DEV_ARM_VGIC_GRP_CPU_REGS,
offset, cpu, val, write);
+}
+#define for_each_irq_reg(_ctr, _max_irq, _field_width) \
- for (_ctr = 0; _ctr < ((_max_irq) / (32 / (_field_width))); _ctr++)
+/*
- Translate from the in-kernel field for an IRQ value to/from the qemu
- representation.
- */
+typedef void (*vgic_translate_fn)(GICState *s, int irq, int cpu,
uint32_t *field, bool to_kernel);
+/* synthetic translate function used for clear/set registers to completely
- clear a setting using a clear-register before setting the remaing bits
- using a set-register */
+static void translate_clear(GICState *s, int irq, int cpu,
uint32_t *field, bool to_kernel)
+{
- if (to_kernel) {
*field = ~0;
- } else {
/* does not make sense: qemu model doesn't use set/clear regs */
abort();
- }
I don't understand the to_kernel bits. I thought we're in a device file that only works with KVM?
The opposite of "to_kernel" is "from_kernel".
IIUC this is for GIC registers that are modelled as a pair of MMIO registers, one with write-1-sets and one with write-1-clears semantics. The patch is first writing all ones to the w1c register, and then writing the actual value to the w1s register.
The way we solved this for x86 is that the set-registers, when called from userspace, also clear the zero bits. See the "host_initiated" parts in arch/x86/kvm/x86.c. However, this would be an ABI change for ARM, so Christoffer's solution is also fine.
I prefer sticking to the ARM ABI so we don't have to keep a documentation of exceptions to that around to the widest extend possible. See my comment on Alex's e-mail for a more thorough response to this as well.
Sorry for the long response time on this.
One comment on the function names:
kvm_arm_gic_dist_readr kvm_arm_gic_dist_writer
Why not get_reg/set_reg (I was quite surprised to see readr instead of reader :) and it took me a while to understand the convention)? Or if the name is too long, s/readr/get/ and s/writer/set/ would be enough to match the ioctls.
r for register, read register, write register...
I thought I'd seen this convention elsewhere, but I may be wrong. If you feel really strongly about it, I can rename.
-Christoffer
Il 20/09/2013 22:41, Christoffer Dall ha scritto:
kvm_arm_gic_dist_readr kvm_arm_gic_dist_writer
Why not get_reg/set_reg (I was quite surprised to see readr instead of reader :) and it took me a while to understand the convention)? Or if the name is too long, s/readr/get/ and s/writer/set/ would be enough to match the ioctls.
r for register, read register, write register...
I thought I'd seen this convention elsewhere, but I may be wrong. If you feel really strongly about it, I can rename.
Yeah, there is readb/w/l and writeb/w/l, but I only found readreg when grepping for readr. So I think get/set would be better.
Paolo
On Fri, Sep 20, 2013 at 11:09:47PM +0200, Paolo Bonzini wrote:
Il 20/09/2013 22:41, Christoffer Dall ha scritto:
kvm_arm_gic_dist_readr kvm_arm_gic_dist_writer
Why not get_reg/set_reg (I was quite surprised to see readr instead of reader :) and it took me a while to understand the convention)? Or if the name is too long, s/readr/get/ and s/writer/set/ would be enough to match the ioctls.
r for register, read register, write register...
I thought I'd seen this convention elsewhere, but I may be wrong. If you feel really strongly about it, I can rename.
Yeah, there is readb/w/l and writeb/w/l, but I only found readreg when grepping for readr. So I think get/set would be better.
ok, renamed to kvm_dist_[get/put] and shortened the function names in the same go, as requested by Alex.
-Christoffer
On Sun, Aug 25, 2013 at 04:47:59PM +0100, Alexander Graf wrote:
On 23.08.2013, at 21:10, Christoffer Dall wrote:
Save and restore the ARM KVM VGIC state from the kernel. We rely on QEMU to marshal the GICState data structure and therefore simply synchronize the kernel state with the QEMU emulated state in both directions.
We take some care on the restore path to check the VGIC has been configured with enough IRQs and CPU interfaces that we can properly restore the state, and for separate set/clear registers we first fully clear the registers and then set the required bits.
Signed-off-by: Christoffer Dall christoffer.dall@linaro.org
hw/intc/arm_gic_common.c | 2 + hw/intc/arm_gic_kvm.c | 418 ++++++++++++++++++++++++++++++++++++++++++- hw/intc/gic_internal.h | 1 + include/migration/vmstate.h | 6 + 4 files changed, 425 insertions(+), 2 deletions(-)
diff --git a/hw/intc/arm_gic_common.c b/hw/intc/arm_gic_common.c index a50ded2..f39bdc1 100644 --- a/hw/intc/arm_gic_common.c +++ b/hw/intc/arm_gic_common.c @@ -77,6 +77,8 @@ static const VMStateDescription vmstate_gic = { VMSTATE_UINT16_ARRAY(running_priority, GICState, NCPU), VMSTATE_UINT16_ARRAY(current_pending, GICState, NCPU), VMSTATE_UINT8_2DARRAY(binary_point, GICState, 2, NCPU),
VMSTATE_UINT32_2DARRAY(active_prio, GICState, 4, NCPU),
}VMSTATE_UINT32(num_irq, GICState), VMSTATE_END_OF_LIST()
}; diff --git a/hw/intc/arm_gic_kvm.c b/hw/intc/arm_gic_kvm.c index 9f0a852..9785281 100644 --- a/hw/intc/arm_gic_kvm.c +++ b/hw/intc/arm_gic_kvm.c @@ -23,6 +23,15 @@ #include "kvm_arm.h" #include "gic_internal.h"
+//#define DEBUG_GIC_KVM
+#ifdef DEBUG_GIC_KVM +#define DPRINTF(fmt, ...) \ +do { fprintf(stderr, "arm_gic: " fmt , ## __VA_ARGS__); } while (0) +#else +#define DPRINTF(fmt, ...) do {} while(0)
You really want to make DPRINTF still be format checked by the compiler. Check out hw/intc/openpic.c for an example how to get there.
good point, thanks for the pointer.
+#endif
#define TYPE_KVM_ARM_GIC "kvm-arm-gic" #define KVM_ARM_GIC(obj) \ OBJECT_CHECK(GICState, (obj), TYPE_KVM_ARM_GIC) @@ -73,14 +82,419 @@ static void kvm_arm_gic_set_irq(void *opaque, int irq, int level) kvm_set_irq(kvm_state, kvm_irq, !!level); }
+static bool kvm_arm_gic_can_save_restore(GICState *s) +{
- KVMARMGICClass *kgc = KVM_ARM_GIC_GET_CLASS(s);
- return kgc->dev_fd >= 0;
+}
+static void kvm_arm_gic_reg_access(GICState *s, int group, int offset,
int cpu, uint32_t *val, bool write)
+{
- KVMARMGICClass *kgc = KVM_ARM_GIC_GET_CLASS(s);
- struct kvm_device_attr attr;
- int type;
- cpu = cpu & 0xff;
- attr.flags = 0;
- attr.group = group;
- attr.attr = (((uint64_t)cpu << KVM_DEV_ARM_VGIC_CPUID_SHIFT) &
KVM_DEV_ARM_VGIC_CPUID_MASK) |
(((uint64_t)offset << KVM_DEV_ARM_VGIC_OFFSET_SHIFT) &
KVM_DEV_ARM_VGIC_OFFSET_MASK);
- attr.addr = (uint64_t)(long)val;
uintptr_t?
yup
- if (write) {
type = KVM_SET_DEVICE_ATTR;
- } else {
type = KVM_GET_DEVICE_ATTR;
- }
- if (kvm_device_ioctl(kgc->dev_fd, type, &attr) < 0) {
fprintf(stderr, "KVM_{SET/GET}_DEVICE_ATTR failed: %s\n",
strerror(errno));
abort();
- }
+}
+static void kvm_arm_gic_dist_reg_access(GICState *s, int offset, int cpu,
uint32_t *val, bool write)
+{
- kvm_arm_gic_reg_access(s, KVM_DEV_ARM_VGIC_GRP_DIST_REGS,
offset, cpu, val, write);
+}
+static void kvm_arm_gic_cpu_reg_access(GICState *s, int offset, int cpu,
uint32_t *val, bool write)
+{
- kvm_arm_gic_reg_access(s, KVM_DEV_ARM_VGIC_GRP_CPU_REGS,
offset, cpu, val, write);
+}
+#define for_each_irq_reg(_ctr, _max_irq, _field_width) \
- for (_ctr = 0; _ctr < ((_max_irq) / (32 / (_field_width))); _ctr++)
+/*
- Translate from the in-kernel field for an IRQ value to/from the qemu
- representation.
- */
+typedef void (*vgic_translate_fn)(GICState *s, int irq, int cpu,
uint32_t *field, bool to_kernel);
+/* synthetic translate function used for clear/set registers to completely
- clear a setting using a clear-register before setting the remaing bits
- using a set-register */
+static void translate_clear(GICState *s, int irq, int cpu,
uint32_t *field, bool to_kernel)
+{
- if (to_kernel) {
*field = ~0;
- } else {
/* does not make sense: qemu model doesn't use set/clear regs */
abort();
- }
I don't understand the to_kernel bits. I thought we're in a device file that only works with KVM?
yeah, but you need to retrieve the state from the kernel on a suspend "from_kernel" and you need to save the state "to_kernel" on a resume operation. These small functions translate between a qemu-representation and a KVM representation, which allows us to do saving of the in-kernel state by reusing the qemu model of the gic.
This specific little function is used for MMIO registers that have a separate register for clearing bits than setting bits, like Paolo describes. The performance of these operations are going to be vastly dominated by saving/restoring your memory and I therefore prefer keeping the interface coherent with the hardware spec, instead of rewriting things to save a few writes to the kernel. Does that answer your question?
+}
+static void translate_enabled(GICState *s, int irq, int cpu,
uint32_t *field, bool to_kernel)
+{
- int cm = (irq < GIC_INTERNAL) ? (1 << cpu) : ALL_CPU_MASK;
- if (to_kernel) {
*field = GIC_TEST_ENABLED(irq, cm);
- } else {
if (*field & 1) {
GIC_SET_ENABLED(irq, cm);
}
- }
+}
+static void translate_pending(GICState *s, int irq, int cpu,
uint32_t *field, bool to_kernel)
+{
- int cm = (irq < GIC_INTERNAL) ? (1 << cpu) : ALL_CPU_MASK;
- if (to_kernel) {
*field = GIC_TEST_PENDING(irq, cm);
- } else {
if (*field & 1) {
GIC_SET_PENDING(irq, cm);
/* TODO: Capture is level-line is held high in the kernel */
}
- }
+}
+static void translate_active(GICState *s, int irq, int cpu,
uint32_t *field, bool to_kernel)
+{
- int cm = (irq < GIC_INTERNAL) ? (1 << cpu) : ALL_CPU_MASK;
- if (to_kernel) {
*field = GIC_TEST_ACTIVE(irq, cm);
- } else {
if (*field & 1) {
GIC_SET_ACTIVE(irq, cm);
}
- }
+}
+static void translate_trigger(GICState *s, int irq, int cpu,
uint32_t *field, bool to_kernel)
+{
- if (to_kernel) {
*field = (GIC_TEST_TRIGGER(irq)) ? 0x2 : 0x0;
- } else {
if (*field & 0x2) {
GIC_SET_TRIGGER(irq);
}
- }
+}
+static void translate_priority(GICState *s, int irq, int cpu,
uint32_t *field, bool to_kernel)
+{
- if (to_kernel) {
*field = GIC_GET_PRIORITY(irq, cpu) & 0xff;
- } else {
GIC_SET_PRIORITY(irq, cpu, *field & 0xff);
- }
+}
+static void translate_targets(GICState *s, int irq, int cpu,
uint32_t *field, bool to_kernel)
+{
- if (to_kernel) {
*field = s->irq_target[irq] & 0xff;
- } else {
s->irq_target[irq] = *field & 0xff;
- }
+}
+static void translate_sgisource(GICState *s, int irq, int cpu,
uint32_t *field, bool to_kernel)
+{
- if (to_kernel) {
*field = s->sgi_source[irq][cpu] & 0xff;
- } else {
s->sgi_source[irq][cpu] = *field & 0xff;
- }
+}
+/* Read a register group from the kernel VGIC */ +static void kvm_arm_gic_dist_readr(GICState *s, uint32_t offset, int width,
int maxirq, vgic_translate_fn translate_fn)
+{
- uint32_t reg;
- int i;
- int j;
- int irq;
- int cpu;
- int regsz = 32 / width; /* irqs per kernel register */
- uint32_t field;
- for_each_irq_reg(i, maxirq, width) {
irq = i * regsz;
cpu = 0;
while ((cpu < s->num_cpu && irq < GIC_INTERNAL) || cpu == 0) {
kvm_arm_gic_dist_reg_access(s, offset, cpu, ®, false);
for (j = 0; j < regsz; j++) {
field = reg >> (j * width) & ((1 << width) - 1);
translate_fn(s, irq + j, cpu, &field, false);
}
cpu++;
}
offset += 4;
- }
+}
+/* Write a register group to the kernel VGIC */ +static void kvm_arm_gic_dist_writer(GICState *s, uint32_t offset, int width,
int maxirq, vgic_translate_fn translate_fn)
+{
- uint32_t reg;
- int i;
- int j;
- int irq;
- int cpu;
- int regsz = 32 / width; /* irqs per kernel register */
- uint32_t field;
- for_each_irq_reg(i, maxirq, width) {
irq = i * regsz;
cpu = 0;
while ((cpu < s->num_cpu && irq < GIC_INTERNAL) || cpu == 0) {
reg = 0;
for (j = 0; j < regsz; j++) {
translate_fn(s, irq + j, cpu, &field, true);
reg |= (field & ((1 << width) - 1)) << (j * width);
}
kvm_arm_gic_dist_reg_access(s, offset, cpu, ®, true);
cpu++;
}
offset += 4;
- }
+}
static void kvm_arm_gic_put(GICState *s) {
- /* TODO: there isn't currently a kernel interface to set the GIC state */
- uint32_t reg;
- int i;
- int cpu;
- int num_cpu;
- int num_irq;
- if (!kvm_arm_gic_can_save_restore(s)) {
DPRINTF("Cannot put kernel gic state, no kernel interface");
return;
- }
- /* Note: We do the restore in a slightly different order than the save
* (where the order doesn't matter and is simply ordered according to the
* register offset values */
- /*****************************************************************
* Distributor State
*/
- /* s->enabled -> GICD_CTLR */
- reg = s->enabled;
- kvm_arm_gic_dist_reg_access(s, 0x0, 0, ®, true);
- /* Sanity checking on GICD_TYPER and s->num_irq, s->num_cpu */
- kvm_arm_gic_dist_reg_access(s, 0x4, 0, ®, false);
- num_irq = ((reg & 0x1f) + 1) * 32;
- num_cpu = ((reg & 0xe0) >> 5) + 1;
- if (num_irq < s->num_irq) {
fprintf(stderr, "Restoring %u IRQs, but kernel supports max %d\n",
s->num_irq, num_irq);
abort();
- } else if (num_cpu != s->num_cpu ) {
fprintf(stderr, "Restoring %u CPU interfaces, kernel only has %d\n",
s->num_cpu, num_cpu);
/* Did we not create the VCPUs in the kernel yet? */
abort();
- }
- /* TODO: Consider checking compatibility with the IIDR ? */
- /* irq_state[n].enabled -> GICD_ISENABLERn */
- kvm_arm_gic_dist_writer(s, 0x180, 1, s->num_irq, translate_clear);
- kvm_arm_gic_dist_writer(s, 0x100, 1, s->num_irq, translate_enabled);
Don't these magic numbers have #define'd equivalents in their GIC implementation header file? Then you don't need the comments above each of these.
in the kernel yes, in qemu no. Unless I'm mistaken, in which case please point me in the right direction. I don't want to taint the git history with a rewrite of every mmio function in the tcg version or have defines for one file and not use them in the other file, which is also a bit weird. If people feel this is necessary, I can have a rewrite of the whole thing, but I prefer not to. Surprisingly.
I also find the function names quite long, but I guess as long as everything still fits that doesn't really matter.
I think the names are as descriptive as they should be, but we can drop some of the kvm_arm_gic prefixes if that's what you have in mind. So far, most calls and definitions fit on 80 chars width so I'm not too worried...
-Christoffer
On 23 August 2013 21:10, Christoffer Dall christoffer.dall@linaro.org wrote:
Save and restore the ARM KVM VGIC state from the kernel. We rely on QEMU to marshal the GICState data structure and therefore simply synchronize the kernel state with the QEMU emulated state in both directions.
We take some care on the restore path to check the VGIC has been configured with enough IRQs and CPU interfaces that we can properly restore the state, and for separate set/clear registers we first fully clear the registers and then set the required bits.
Signed-off-by: Christoffer Dall christoffer.dall@linaro.org
hw/intc/arm_gic_common.c | 2 + hw/intc/arm_gic_kvm.c | 418 ++++++++++++++++++++++++++++++++++++++++++- hw/intc/gic_internal.h | 1 + include/migration/vmstate.h | 6 + 4 files changed, 425 insertions(+), 2 deletions(-)
diff --git a/hw/intc/arm_gic_common.c b/hw/intc/arm_gic_common.c index a50ded2..f39bdc1 100644 --- a/hw/intc/arm_gic_common.c +++ b/hw/intc/arm_gic_common.c @@ -77,6 +77,8 @@ static const VMStateDescription vmstate_gic = { VMSTATE_UINT16_ARRAY(running_priority, GICState, NCPU), VMSTATE_UINT16_ARRAY(current_pending, GICState, NCPU), VMSTATE_UINT8_2DARRAY(binary_point, GICState, 2, NCPU),
VMSTATE_UINT32_2DARRAY(active_prio, GICState, 4, NCPU),
VMSTATE_UINT32(num_irq, GICState),
You don't need to save and restore num_irq, it is constant for the lifetime of the device (set by a property on the device which is fixed by the board model). Migration only works between two identical command lines; if the command lines are identical at each end then num_irq should be too.
VMSTATE_END_OF_LIST() }
}; diff --git a/hw/intc/arm_gic_kvm.c b/hw/intc/arm_gic_kvm.c index 9f0a852..9785281 100644 --- a/hw/intc/arm_gic_kvm.c +++ b/hw/intc/arm_gic_kvm.c @@ -23,6 +23,15 @@ #include "kvm_arm.h" #include "gic_internal.h"
+//#define DEBUG_GIC_KVM
+#ifdef DEBUG_GIC_KVM +#define DPRINTF(fmt, ...) \ +do { fprintf(stderr, "arm_gic: " fmt , ## __VA_ARGS__); } while (0) +#else +#define DPRINTF(fmt, ...) do {} while(0) +#endif
#define TYPE_KVM_ARM_GIC "kvm-arm-gic" #define KVM_ARM_GIC(obj) \ OBJECT_CHECK(GICState, (obj), TYPE_KVM_ARM_GIC) @@ -73,14 +82,419 @@ static void kvm_arm_gic_set_irq(void *opaque, int irq, int level) kvm_set_irq(kvm_state, kvm_irq, !!level); }
+static bool kvm_arm_gic_can_save_restore(GICState *s) +{
- KVMARMGICClass *kgc = KVM_ARM_GIC_GET_CLASS(s);
- return kgc->dev_fd >= 0;
+}
+static void kvm_arm_gic_reg_access(GICState *s, int group, int offset,
int cpu, uint32_t *val, bool write)
+{
- KVMARMGICClass *kgc = KVM_ARM_GIC_GET_CLASS(s);
- struct kvm_device_attr attr;
- int type;
- cpu = cpu & 0xff;
- attr.flags = 0;
- attr.group = group;
- attr.attr = (((uint64_t)cpu << KVM_DEV_ARM_VGIC_CPUID_SHIFT) &
KVM_DEV_ARM_VGIC_CPUID_MASK) |
(((uint64_t)offset << KVM_DEV_ARM_VGIC_OFFSET_SHIFT) &
KVM_DEV_ARM_VGIC_OFFSET_MASK);
- attr.addr = (uint64_t)(long)val;
(uintptr_t) should suffice.
- if (write) {
type = KVM_SET_DEVICE_ATTR;
- } else {
type = KVM_GET_DEVICE_ATTR;
- }
- if (kvm_device_ioctl(kgc->dev_fd, type, &attr) < 0) {
fprintf(stderr, "KVM_{SET/GET}_DEVICE_ATTR failed: %s\n",
strerror(errno));
Your kvm_device_ioctl returns -errno rather than setting errno, doesn't it?
abort();
- }
+}
+/* Read a register group from the kernel VGIC */ +static void kvm_arm_gic_dist_readr(GICState *s, uint32_t offset, int width,
int maxirq, vgic_translate_fn translate_fn)
+{
- uint32_t reg;
- int i;
- int j;
- int irq;
- int cpu;
- int regsz = 32 / width; /* irqs per kernel register */
- uint32_t field;
- for_each_irq_reg(i, maxirq, width) {
irq = i * regsz;
cpu = 0;
while ((cpu < s->num_cpu && irq < GIC_INTERNAL) || cpu == 0) {
kvm_arm_gic_dist_reg_access(s, offset, cpu, ®, false);
for (j = 0; j < regsz; j++) {
field = reg >> (j * width) & ((1 << width) - 1);
field = extract32(reg, j * width, width);
translate_fn(s, irq + j, cpu, &field, false);
}
cpu++;
}
offset += 4;
- }
+}
+/* Write a register group to the kernel VGIC */ +static void kvm_arm_gic_dist_writer(GICState *s, uint32_t offset, int width,
int maxirq, vgic_translate_fn translate_fn)
+{
- uint32_t reg;
- int i;
- int j;
- int irq;
- int cpu;
- int regsz = 32 / width; /* irqs per kernel register */
- uint32_t field;
- for_each_irq_reg(i, maxirq, width) {
irq = i * regsz;
cpu = 0;
while ((cpu < s->num_cpu && irq < GIC_INTERNAL) || cpu == 0) {
reg = 0;
for (j = 0; j < regsz; j++) {
translate_fn(s, irq + j, cpu, &field, true);
reg |= (field & ((1 << width) - 1)) << (j * width);
reg = deposit32(reg, j * width, width, field);
}
kvm_arm_gic_dist_reg_access(s, offset, cpu, ®, true);
cpu++;
}
offset += 4;
- }
+}
static void kvm_arm_gic_reset(DeviceState *dev) diff --git a/hw/intc/gic_internal.h b/hw/intc/gic_internal.h index 424a41f..9771163 100644 --- a/hw/intc/gic_internal.h +++ b/hw/intc/gic_internal.h @@ -100,6 +100,7 @@ typedef struct GICState {
/* these registers are mainly used for save/restore of KVM state */ uint8_t binary_point[2][NCPU]; /* [0]: group 0, [1]: group 1 */
- uint32_t active_prio[4][NCPU]; /* implementation defined layout */
You can't make this impdef in QEMU's state, that would mean we couldn't do migration between implementations which use different layouts. We need to pick a standard ("whatever the ARM v2 GIC implementation is" seems the obvious choice) and make the kernel convert if it's on an implementation which doesn't follow that version.
uint32_t num_cpu;
diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h index 1c31b5d..e5538c7 100644 --- a/include/migration/vmstate.h +++ b/include/migration/vmstate.h @@ -633,9 +633,15 @@ extern const VMStateInfo vmstate_info_bitmap; #define VMSTATE_UINT32_ARRAY_V(_f, _s, _n, _v) \ VMSTATE_ARRAY(_f, _s, _n, _v, vmstate_info_uint32, uint32_t)
+#define VMSTATE_UINT32_2DARRAY_V(_f, _s, _n1, _n2, _v) \
- VMSTATE_2DARRAY(_f, _s, _n1, _n2, _v, vmstate_info_uint32, uint32_t)
#define VMSTATE_UINT32_ARRAY(_f, _s, _n) \ VMSTATE_UINT32_ARRAY_V(_f, _s, _n, 0)
+#define VMSTATE_UINT32_2DARRAY(_f, _s, _n1, _n2) \
- VMSTATE_UINT32_2DARRAY_V(_f, _s, _n1, _n2, 0)
#define VMSTATE_UINT64_ARRAY_V(_f, _s, _n, _v) \ VMSTATE_ARRAY(_f, _s, _n, _v, vmstate_info_uint64, uint64_t)
Can you put the vmstate improvements in their own patch, please?
thanks -- PMM
On Fri, Sep 06, 2013 at 04:13:32PM +0100, Peter Maydell wrote:
On 23 August 2013 21:10, Christoffer Dall christoffer.dall@linaro.org wrote:
Save and restore the ARM KVM VGIC state from the kernel. We rely on QEMU to marshal the GICState data structure and therefore simply synchronize the kernel state with the QEMU emulated state in both directions.
We take some care on the restore path to check the VGIC has been configured with enough IRQs and CPU interfaces that we can properly restore the state, and for separate set/clear registers we first fully clear the registers and then set the required bits.
Signed-off-by: Christoffer Dall christoffer.dall@linaro.org
hw/intc/arm_gic_common.c | 2 + hw/intc/arm_gic_kvm.c | 418 ++++++++++++++++++++++++++++++++++++++++++- hw/intc/gic_internal.h | 1 + include/migration/vmstate.h | 6 + 4 files changed, 425 insertions(+), 2 deletions(-)
diff --git a/hw/intc/arm_gic_common.c b/hw/intc/arm_gic_common.c index a50ded2..f39bdc1 100644 --- a/hw/intc/arm_gic_common.c +++ b/hw/intc/arm_gic_common.c @@ -77,6 +77,8 @@ static const VMStateDescription vmstate_gic = { VMSTATE_UINT16_ARRAY(running_priority, GICState, NCPU), VMSTATE_UINT16_ARRAY(current_pending, GICState, NCPU), VMSTATE_UINT8_2DARRAY(binary_point, GICState, 2, NCPU),
VMSTATE_UINT32_2DARRAY(active_prio, GICState, 4, NCPU),
VMSTATE_UINT32(num_irq, GICState),
You don't need to save and restore num_irq, it is constant for the lifetime of the device (set by a property on the device which is fixed by the board model). Migration only works between two identical command lines; if the command lines are identical at each end then num_irq should be too.
ok
VMSTATE_END_OF_LIST() }
}; diff --git a/hw/intc/arm_gic_kvm.c b/hw/intc/arm_gic_kvm.c index 9f0a852..9785281 100644 --- a/hw/intc/arm_gic_kvm.c +++ b/hw/intc/arm_gic_kvm.c @@ -23,6 +23,15 @@ #include "kvm_arm.h" #include "gic_internal.h"
+//#define DEBUG_GIC_KVM
+#ifdef DEBUG_GIC_KVM +#define DPRINTF(fmt, ...) \ +do { fprintf(stderr, "arm_gic: " fmt , ## __VA_ARGS__); } while (0) +#else +#define DPRINTF(fmt, ...) do {} while(0) +#endif
#define TYPE_KVM_ARM_GIC "kvm-arm-gic" #define KVM_ARM_GIC(obj) \ OBJECT_CHECK(GICState, (obj), TYPE_KVM_ARM_GIC) @@ -73,14 +82,419 @@ static void kvm_arm_gic_set_irq(void *opaque, int irq, int level) kvm_set_irq(kvm_state, kvm_irq, !!level); }
+static bool kvm_arm_gic_can_save_restore(GICState *s) +{
- KVMARMGICClass *kgc = KVM_ARM_GIC_GET_CLASS(s);
- return kgc->dev_fd >= 0;
+}
+static void kvm_arm_gic_reg_access(GICState *s, int group, int offset,
int cpu, uint32_t *val, bool write)
+{
- KVMARMGICClass *kgc = KVM_ARM_GIC_GET_CLASS(s);
- struct kvm_device_attr attr;
- int type;
- cpu = cpu & 0xff;
- attr.flags = 0;
- attr.group = group;
- attr.attr = (((uint64_t)cpu << KVM_DEV_ARM_VGIC_CPUID_SHIFT) &
KVM_DEV_ARM_VGIC_CPUID_MASK) |
(((uint64_t)offset << KVM_DEV_ARM_VGIC_OFFSET_SHIFT) &
KVM_DEV_ARM_VGIC_OFFSET_MASK);
- attr.addr = (uint64_t)(long)val;
(uintptr_t) should suffice.
yes
- if (write) {
type = KVM_SET_DEVICE_ATTR;
- } else {
type = KVM_GET_DEVICE_ATTR;
- }
- if (kvm_device_ioctl(kgc->dev_fd, type, &attr) < 0) {
fprintf(stderr, "KVM_{SET/GET}_DEVICE_ATTR failed: %s\n",
strerror(errno));
Your kvm_device_ioctl returns -errno rather than setting errno, doesn't it?
yes, it does
abort();
- }
+}
+/* Read a register group from the kernel VGIC */ +static void kvm_arm_gic_dist_readr(GICState *s, uint32_t offset, int width,
int maxirq, vgic_translate_fn translate_fn)
+{
- uint32_t reg;
- int i;
- int j;
- int irq;
- int cpu;
- int regsz = 32 / width; /* irqs per kernel register */
- uint32_t field;
- for_each_irq_reg(i, maxirq, width) {
irq = i * regsz;
cpu = 0;
while ((cpu < s->num_cpu && irq < GIC_INTERNAL) || cpu == 0) {
kvm_arm_gic_dist_reg_access(s, offset, cpu, ®, false);
for (j = 0; j < regsz; j++) {
field = reg >> (j * width) & ((1 << width) - 1);
field = extract32(reg, j * width, width);
ok
translate_fn(s, irq + j, cpu, &field, false);
}
cpu++;
}
offset += 4;
- }
+}
+/* Write a register group to the kernel VGIC */ +static void kvm_arm_gic_dist_writer(GICState *s, uint32_t offset, int width,
int maxirq, vgic_translate_fn translate_fn)
+{
- uint32_t reg;
- int i;
- int j;
- int irq;
- int cpu;
- int regsz = 32 / width; /* irqs per kernel register */
- uint32_t field;
- for_each_irq_reg(i, maxirq, width) {
irq = i * regsz;
cpu = 0;
while ((cpu < s->num_cpu && irq < GIC_INTERNAL) || cpu == 0) {
reg = 0;
for (j = 0; j < regsz; j++) {
translate_fn(s, irq + j, cpu, &field, true);
reg |= (field & ((1 << width) - 1)) << (j * width);
reg = deposit32(reg, j * width, width, field);
ok
}
kvm_arm_gic_dist_reg_access(s, offset, cpu, ®, true);
cpu++;
}
offset += 4;
- }
+}
static void kvm_arm_gic_reset(DeviceState *dev) diff --git a/hw/intc/gic_internal.h b/hw/intc/gic_internal.h index 424a41f..9771163 100644 --- a/hw/intc/gic_internal.h +++ b/hw/intc/gic_internal.h @@ -100,6 +100,7 @@ typedef struct GICState {
/* these registers are mainly used for save/restore of KVM state */ uint8_t binary_point[2][NCPU]; /* [0]: group 0, [1]: group 1 */
- uint32_t active_prio[4][NCPU]; /* implementation defined layout */
You can't make this impdef in QEMU's state, that would mean we couldn't do migration between implementations which use different layouts. We need to pick a standard ("whatever the ARM v2 GIC implementation is" seems the obvious choice) and make the kernel convert if it's on an implementation which doesn't follow that version.
Implementation defined as in implementation defined in the architecture. I didn't think it would make sense to choose a format for an a15 implementation, for example, and then translate to that format for other cores using the ARM gic. Wouldn't migration only be support for same qemu model to same qemu model, in which case the format of this register would always be the same, and the kernel must return a format corresponding to the target cpu. Am I missing something here?
uint32_t num_cpu;
diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h index 1c31b5d..e5538c7 100644 --- a/include/migration/vmstate.h +++ b/include/migration/vmstate.h @@ -633,9 +633,15 @@ extern const VMStateInfo vmstate_info_bitmap; #define VMSTATE_UINT32_ARRAY_V(_f, _s, _n, _v) \ VMSTATE_ARRAY(_f, _s, _n, _v, vmstate_info_uint32, uint32_t)
+#define VMSTATE_UINT32_2DARRAY_V(_f, _s, _n1, _n2, _v) \
- VMSTATE_2DARRAY(_f, _s, _n1, _n2, _v, vmstate_info_uint32, uint32_t)
#define VMSTATE_UINT32_ARRAY(_f, _s, _n) \ VMSTATE_UINT32_ARRAY_V(_f, _s, _n, 0)
+#define VMSTATE_UINT32_2DARRAY(_f, _s, _n1, _n2) \
- VMSTATE_UINT32_2DARRAY_V(_f, _s, _n1, _n2, 0)
#define VMSTATE_UINT64_ARRAY_V(_f, _s, _n, _v) \ VMSTATE_ARRAY(_f, _s, _n, _v, vmstate_info_uint64, uint64_t)
Can you put the vmstate improvements in their own patch, please?
yes,
-Christoffer
On 21 September 2013 04:50, Christoffer Dall christoffer.dall@linaro.org wrote:
On Fri, Sep 06, 2013 at 04:13:32PM +0100, Peter Maydell wrote:
/* these registers are mainly used for save/restore of KVM state */ uint8_t binary_point[2][NCPU]; /* [0]: group 0, [1]: group 1 */
- uint32_t active_prio[4][NCPU]; /* implementation defined layout */
You can't make this impdef in QEMU's state, that would mean we couldn't do migration between implementations which use different layouts. We need to pick a standard ("whatever the ARM v2 GIC implementation is" seems the obvious choice) and make the kernel convert if it's on an implementation which doesn't follow that version.
Implementation defined as in implementation defined in the architecture. I didn't think it would make sense to choose a format for an a15 implementation, for example, and then translate to that format for other cores using the ARM gic. Wouldn't migration only be support for same qemu model to same qemu model, in which case the format of this register would always be the same, and the kernel must return a format corresponding to the target cpu. Am I missing something here?
I know it's architecturally impdef, but there are a couple of issues here: *) moving to the 'create me an irqchip' API is separating out the "which GIC do I have?" and "which CPU do I have?" questions somewhat, so it seems a shame to tangle them up again
*) for getting TCG<->KVM and KVM-with-non-host-CPU cases right we need to do translation anyway, or at least think about it. So we need to at minimum specifically document what the format is for the CPUs we care about. At that point we might as well have a standard format. IIRC the GIC spec defines a "this is the sensible format" anyway.
In practice, for the v7 and v8 CPUs we support, what format do they use?
-- PMM
On Sat, Sep 21, 2013 at 06:22:23AM +0900, Peter Maydell wrote:
On 21 September 2013 04:50, Christoffer Dall christoffer.dall@linaro.org wrote:
On Fri, Sep 06, 2013 at 04:13:32PM +0100, Peter Maydell wrote:
/* these registers are mainly used for save/restore of KVM state */ uint8_t binary_point[2][NCPU]; /* [0]: group 0, [1]: group 1 */
- uint32_t active_prio[4][NCPU]; /* implementation defined layout */
You can't make this impdef in QEMU's state, that would mean we couldn't do migration between implementations which use different layouts. We need to pick a standard ("whatever the ARM v2 GIC implementation is" seems the obvious choice) and make the kernel convert if it's on an implementation which doesn't follow that version.
Implementation defined as in implementation defined in the architecture. I didn't think it would make sense to choose a format for an a15 implementation, for example, and then translate to that format for other cores using the ARM gic. Wouldn't migration only be support for same qemu model to same qemu model, in which case the format of this register would always be the same, and the kernel must return a format corresponding to the target cpu. Am I missing something here?
I know it's architecturally impdef, but there are a couple of issues here: *) moving to the 'create me an irqchip' API is separating out the "which GIC do I have?" and "which CPU do I have?" questions somewhat, so it seems a shame to tangle them up again
Hmm, they are inherently coupled in hardware at least for v7 though, right?
*) for getting TCG<->KVM and KVM-with-non-host-CPU cases right we need to do translation anyway, or at least think about it.
Why? Wouldn't we always only support the case where QEMU emulates the same model as KVM in the first case, and the kernel should behave the same and export the same state if you ask for a specific target no matter what the underlying hardware is, no?
So we need to at minimum specifically document what the format is for the CPUs we care about. At that point we might as well have a standard format. IIRC the GIC spec defines a "this is the sensible format" anyway.
In practice, for the v7 and v8 CPUs we support, what format do they use?
It's not particularly clear as far as I can read it, but I'll have to investigate in more details. Also, I'm not quite clear on how TCG GIC reads/writes would get translated to the proper format depending on the core in that case, and we would have to have code in the arm_gic_kvm file to detect the emulated target and do translation. I'm failing to see the benefits?
-Christoffer
On 21 September 2013 06:46, Christoffer Dall christoffer.dall@linaro.org wrote:
On Sat, Sep 21, 2013 at 06:22:23AM +0900, Peter Maydell wrote:
*) for getting TCG<->KVM and KVM-with-non-host-CPU cases right we need to do translation anyway, or at least think about it.
Why? Wouldn't we always only support the case where QEMU emulates the same model as KVM in the first case, and the kernel should behave the same and export the same state if you ask for a specific target no matter what the underlying hardware is, no?
If the kernel has to be able to do translation of the state, why not make its life easier by having it only need to do one thing (host h/w format -> whatever standard format we pick) rather than lots and lots of things (host CPU X h/w format -> format for supported guest CPU A, host CPU X h/w format -> format for supported guest CPU B, host CPU Y h/w format -> format for guest CPU A, host CPU Y h/w format -> format for guest CPU B, etc etc etc)
? That's basically a cross product over every CPU we support.
-- PMM
On Sat, Sep 21, 2013 at 06:38:19PM +0900, Peter Maydell wrote:
On 21 September 2013 06:46, Christoffer Dall christoffer.dall@linaro.org wrote:
On Sat, Sep 21, 2013 at 06:22:23AM +0900, Peter Maydell wrote:
*) for getting TCG<->KVM and KVM-with-non-host-CPU cases right we need to do translation anyway, or at least think about it.
Why? Wouldn't we always only support the case where QEMU emulates the same model as KVM in the first case, and the kernel should behave the same and export the same state if you ask for a specific target no matter what the underlying hardware is, no?
If the kernel has to be able to do translation of the state, why not make its life easier by having it only need to do one thing (host h/w format -> whatever standard format we pick) rather than lots and lots of things (host CPU X h/w format -> format for supported guest CPU A, host CPU X h/w format -> format for supported guest CPU B, host CPU Y h/w format -> format for guest CPU A, host CPU Y h/w format -> format for guest CPU B, etc etc etc)
? That's basically a cross product over every CPU we support.
Good point.
So after reading the GIC specs again, the way I understand it is that the APR regs have a bit set, if that group priority (a.k.a. preemption level) has an active interrupt. Further, multiple set bits would would only happen if software acknowledges an interrupt and before EOIing it, the GIC gets preempted by an interrupt with a higher group priority (lower number). Correct?
Further, and again, I don't think the spec is particularly clear on this point, but I think it suggests that if bit[0] is set, then there's an interrupt from interrupt priority group 0 (the group with the highest priority) in the active state, if bit[1] is set, one from group 1 is active, and so on.
That would be a perfectly fine format for the APR in the GICstate structure, and the only remaining questions would be:
(1) How many preemption levels should be supported, which would be most easily solved by just defining GICC_APR0-GICC_APR3 for all cpu interfaces. (2) How does the arm_gic_kvm.c code detect the underlying host CPU that the kernel read the register from when it returned the value of the register to do the proper translation? I don't even want to think about how this will work on Big.Little...
-Christoffer
On 23 September 2013 11:14, Christoffer Dall christoffer.dall@linaro.org wrote:
On Sat, Sep 21, 2013 at 06:38:19PM +0900, Peter Maydell wrote: (2) How does the arm_gic_kvm.c code detect the underlying host CPU that the kernel read the register from when it returned the value of the register to do the proper translation? I don't even want to think about how this will work on Big.Little...
That's why the kernel does the translation, not userspace :-)
-- PMM
On Mon, Sep 23, 2013 at 09:02:44PM +0900, Peter Maydell wrote:
On 23 September 2013 11:14, Christoffer Dall christoffer.dall@linaro.org wrote:
On Sat, Sep 21, 2013 at 06:38:19PM +0900, Peter Maydell wrote: (2) How does the arm_gic_kvm.c code detect the underlying host CPU that the kernel read the register from when it returned the value of the register to do the proper translation? I don't even want to think about how this will work on Big.Little...
That's why the kernel does the translation, not userspace :-)
Ah yeah, I misread your previous mail as returning just the hardware state from the kernel. OK, I'll add this implementation definition of the register group to the kernel interface and to qemu's structure.
-Christoffer
On 23.08.2013, at 21:10, Christoffer Dall wrote:
Implement support to save/restore the ARM KVM VGIC state from the kernel. The basic appraoch is to transfer state from the in-kernel VGIC to the emulated arm-gic state representation and let the standard QEMU vmstate save/restore handle saving the arm-gic state. Restore works by reversing the process.
The first few patches adds missing features and fixes issues with the arm-gic implementation in qemu in preparation for the actual save/restore logic.
The patches depend on the device control patch series sent out earlier, which can also be found here: git://git.linaro.org/people/cdall/qemu-arm.git migration/device-ctrl
The whole patch series based on top of the above can be found here: git://git.linaro.org/people/cdall/qemu-arm.git migration/vgic
Reviewed-by: Alexander Graf agraf@suse.de
Alex
linaro-kernel@lists.linaro.org