op 02-11-13 22:36, Colin Cross schreef:
On Wed, Oct 30, 2013 at 5:17 AM, Maarten Lankhorst maarten.lankhorst@canonical.com wrote:
op 24-10-13 14:13, Maarten Lankhorst schreef:
So I actually tried to implement it now. I killed all the deprecated members and assumed a linear timeline. This means that syncpoints can only be added at the end, not in between. In particular it means sw_sync might be slightly broken.
I only tested it with a simple program I wrote called ufence.c, it's in drivers/staging/android/ufence.c in the following tree:
http://cgit.freedesktop.org/~mlankhorst/linux
the "rfc: convert android to fence api" has all the changes from my dma-fence proposal to what android would need, it also converts the userspace fence api to use the dma-fence api.
sync_pt is implemented as fence too. This meant not having to convert all of android right away, though I did make some changes. I killed the deprecated members and made all the fence calls forward to the sync_timeline_ops. dup and compare are no longer used.
I haven't given this a spin on a full android kernel, only with the components that are in mainline kernel under staging and my dumb test program.
~Maarten
PS: The nomenclature is very confusing. I want to rename dma-fence to syncpoint, but I want some feedback from the android devs first. :)
Come on, any feedback? I want to move the discussion forward.
~Maarten
I experimented with it a little on a device that uses sync and came across a few bugs:
- sync_timeline_signal needs to call __fence_signal on all signaled
points on the timeline, not just the first 2. fence_add_callback doesn't always initialize cb.node 3. sync_fence_wait should take ms 4. sync_print_pt status printing was incorrect 5. there is a deadlock: sync_print_obj takes obj->child_list_lock sync_print_pt fence_is_signaled fence_signal takes fence->lock == obj->child_list_lock 6. freeing a timeline before all the fences holding points on that timeline have timed out results in a crash
With the attached patch to fix these issues, our libsync and sync_test give the same results as with our sync code. I haven't tested against the full Android framework yet.
The compare op and timeline ordering is critical to the efficiency of sync points on Android. The compare op is used when merging fences to drop all but the latest point on the same timeline. This is necessary for example when the same buffer is submitted to the display on multiple frames, like when there is a live wallpaper in the background updating at 60 fps and a static screen of widgets on top of it. The static widget buffer is submitted on every frame, returning a new fence each time. The compositor merges the new fence with the fence for the previous buffer, and because they are on the same timeline it merges down to a single point. I experimented with disabling the merge optimization on a Nexus 10, and found that leaving the screen on running a live wallpaper eventually resulted in 100k outstanding sync points.
Well, here I did the same for dma-fence, can you take a look?
---
diff --git a/drivers/staging/android/sync.c b/drivers/staging/android/sync.c index 2c7fd3f2ab23..d1d89f1f8553 100644 --- a/drivers/staging/android/sync.c +++ b/drivers/staging/android/sync.c @@ -232,39 +232,62 @@ void sync_fence_install(struct sync_fence *fence, int fd) } EXPORT_SYMBOL(sync_fence_install);
+static void sync_fence_add_pt(struct sync_fence *fence, int *i, struct fence *pt) { + fence->cbs[*i].sync_pt = pt; + fence->cbs[*i].fence = fence; + + if (!fence_add_callback(pt, &fence->cbs[*i].cb, fence_check_cb_func)) { + fence_get(pt); + (*i)++; + } +} + struct sync_fence *sync_fence_merge(const char *name, struct sync_fence *a, struct sync_fence *b) { int num_fences = a->num_fences + b->num_fences; struct sync_fence *fence; - int i; + int i, i_a, i_b;
fence = sync_fence_alloc(offsetof(struct sync_fence, cbs[num_fences]), name); if (fence == NULL) return NULL;
- fence->num_fences = num_fences; atomic_set(&fence->status, num_fences);
- for (i = 0; i < a->num_fences; ++i) { - struct fence *pt = a->cbs[i].sync_pt; - - fence_get(pt); - fence->cbs[i].sync_pt = pt; - fence->cbs[i].fence = fence; - if (fence_add_callback(pt, &fence->cbs[i].cb, fence_check_cb_func)) - atomic_dec(&fence->status); + /* + * Assume sync_fence a and b are both ordered and have no + * duplicates with the same context. + * + * If a sync_fence can only be created with sync_fence_merge + * and sync_fence_create, this is a reasonable assumption. + */ + for (i = i_a = i_b = 0; i_a < a->num_fences || i_b < b->num_fences; ) { + struct fence *pt_a = i_a < a->num_fences ? a->cbs[i_a].sync_pt : NULL; + struct fence *pt_b = i_b < b->num_fences ? b->cbs[i_b].sync_pt : NULL; + + if (!pt_b || pt_a->context < pt_b->context) { + sync_fence_add_pt(fence, &i, pt_a); + + i_a++; + } else if (!pt_a || pt_a->context > pt_b->context) { + sync_fence_add_pt(fence, &i, pt_b); + + i_b++; + } else { + if (pt_a->seqno - pt_b->seqno <= INT_MAX) + sync_fence_add_pt(fence, &i, pt_a); + else + sync_fence_add_pt(fence, &i, pt_b); + + i_a++; + i_b++; + } }
- for (i = 0; i < b->num_fences; ++i) { - struct fence *pt = b->cbs[i].sync_pt; - - fence_get(pt); - fence->cbs[a->num_fences + i].sync_pt = pt; - fence->cbs[a->num_fences + i].fence = fence; - if (fence_add_callback(pt, &fence->cbs[a->num_fences + i].cb, fence_check_cb_func)) - atomic_dec(&fence->status); - } + if (num_fences > i) + atomic_sub(num_fences - i, &fence->status); + fence->num_fences = i;
sync_fence_debug_add(fence); return fence;