I just hit something interesting in Django and thought I'd share. I was playing with some data from a queryset. I was somewhat aware it's not your ordinary Python list. However, I was surprised to see this issue:
I had a query that was returning TestResult objects. When I iterated over the list like:
for r in results: print r.measurement
I got the objects I expected. However, at the time I was coding I needed to know the loop index, so I did something like:
for i in xrange(results.count()): print results[i].measurement
This example acts like it works, but upon looking at the data I realized I got incorrect results. It seems like results[0] and results[1] are always the same, but I haven't dug enough to prove that's always the case.
Anyway, I had a feeling using the xrange approach was wrong to begin with, but it turns out to be actually wrong without telling you.
-andy
On Thu, 23 Feb 2012 14:49:16 -0600, Andy Doan <andy.doan@linaro.org> wrote:
I just hit something interesting in Django and thought I'd share. I was playing with some data from a queryset. I was somewhat aware it's not your ordinary Python list. However, I was surprised to see this issue:
I had a query that was returning TestResult objects.
From what happens next I guess the query didn't have an ORDER BY?
When I iterated over the list like:
for r in results: print r.measurement
I got the objects I expected. However, at the time I was coding I needed to know the loop index, so I did something like:
for i in xrange(results.count()): print results[i].measurement
This example acts like it works, but upon looking at the data I realized I got incorrect results. It seems like results[0] and results[1] are always the same, but I haven't dug enough to prove that's always the case.
Anyway, I had a feeling using the xrange approach was wrong to begin with, but it turns out to be actually wrong without telling you.
If there's a bug here, it's that Django lets you write this. "results[i]" appears to translate to this sort of query:
select * from ... where ... limit 1 offset $i
So the for loop you wrote is executing this results.count() times.
The thing is, if you execute the query multiple times, there is no guarantee that the ordering as considered by the limit & offset will be the same, so you won't necessarily get each object exactly once in the for loop.
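To make the query pattern concrete, here is a minimal sketch using the standard-library sqlite3 module rather than Django. The test_result table, its columns, and the sample values are hypothetical, and the ORDER BY is added deliberately so that "row i" means the same thing in every query.

```python
import sqlite3

# Hypothetical table and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_result (id INTEGER PRIMARY KEY, measurement REAL)")
conn.executemany(
    "INSERT INTO test_result (measurement) VALUES (?)",
    [(12.5,), (13.1,), (11.8,)],
)

count = conn.execute("SELECT COUNT(*) FROM test_result").fetchone()[0]

# One query per index, as in the xrange loop. The ORDER BY is what makes
# "row number i" refer to the same row from one query to the next.
by_index = []
for i in range(count):
    row = conn.execute(
        "SELECT measurement FROM test_result ORDER BY id LIMIT 1 OFFSET ?",
        (i,),
    ).fetchone()
    by_index.append(row[0])

# A single query fetches the same rows in one round trip.
single_pass = [
    row[0]
    for row in conn.execute("SELECT measurement FROM test_result ORDER BY id")
]

print(by_index == single_pass)
```

Without the ORDER BY, the database is free to pick a different row order for each of the per-index queries, which is one way to end up seeing the same object twice.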
Even if you did ORDER BY the query, it's still horribly inefficient. You probably wanted to write:
for i, ob in enumerate(results): ...
instead :-)
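As a rough illustration of the enumerate() suggestion, assuming a generator in place of the real queryset (fetch_measurements and its values are made up for this example):

```python
def fetch_measurements():
    """Hypothetical stand-in for a queryset: yields rows from a single pass."""
    for value in (12.5, 13.1, 11.8):
        yield value


# enumerate() supplies the loop index while the data is read only once.
pairs = []
for i, measurement in enumerate(fetch_measurements()):
    pairs.append((i, measurement))
    print(i, measurement)
```

The index comes from enumerate() itself, so nothing ever needs to be looked up by position.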
I think it's arguably a bug that Django lets you issue offset/limit queries on unordered result sets. I can't imagine when it would be the right thing to do.
Cheers, mwh
On Thu, Feb 23, 2012 at 10:26 PM, Michael Hudson-Doyle <michael.hudson@canonical.com> wrote:
On Thu, 23 Feb 2012 14:49:16 -0600, Andy Doan <andy.doan@linaro.org> wrote:
I just hit something interesting in Django and thought I'd share. I was playing with some data from a queryset. I was somewhat aware it's not your ordinary Python list. However, I was surprised to see this issue:
You reasoned based on a broken assumption.
What you must know is that nearly all of Django's QuerySet methods are _lazy_: you only get a copy of the QuerySet object with the extra expression/filter applied. The only exceptions are the small subset of operations that actually "evaluate" the QuerySet and return data. These include .count(), .get(), iteration via __iter__(), __getitem__() with an integer index, and a few others.
If you initially started with results = Model.objects.all() then, as Michael has already explained, each time results[i] is evaluated Django does a SQL query (which in this case can return an arbitrary element, as the ordering is not specified).
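The cost difference can be sketched with a small counting class. LazyResults is a hypothetical stand-in written for this example, not Django's QuerySet, and it only models the behaviour described in this thread: every index access and every iteration counts as one simulated query.

```python
class LazyResults:
    """Hypothetical stand-in for an unevaluated queryset (not Django code)."""

    def __init__(self, rows):
        self._rows = list(rows)
        self.queries = 0  # number of simulated database round trips

    def count(self):
        self.queries += 1
        return len(self._rows)

    def __getitem__(self, index):
        self.queries += 1
        return self._rows[index]

    def __iter__(self):
        self.queries += 1
        return iter(self._rows)


# Index loop: one count() plus one fetch per index.
indexed = LazyResults([12.5, 13.1, 11.8])
for i in range(indexed.count()):
    indexed[i]
print(indexed.queries)

# Materialise once; rows is then a plain list that is safe to index.
iterated = LazyResults([12.5, 13.1, 11.8])
rows = list(iterated)
print(iterated.queries)
```

If indexes are really needed, copying the results into a list first means every later rows[i] reads from memory instead of going back to the database.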
I had a query that was returning TestResult objects.
From what happens next I guess the query didn't have an ORDER BY?
When I iterated over the list like:
for r in results: print r.measurement
I got the objects I expected. However, at the time I was coding I needed to know the loop index, so I did something like:
for i in xrange(results.count()): print results[i].measurement
This example acts like it works, but upon looking at the data I realized I got incorrect results. It seems like results[0] and results[1] are always the same, but I haven't dug enough to prove that's always the case.
Anyway, I had a feeling using the xrange approach was wrong to begin with, but it turns out to be actually wrong without telling you.
If there's a bug here, it's that Django lets you write this. "results[i]" appears to translate to this sort of query:
select * from ... where ... limit 1 offset $i
So the for loop you wrote is executing this results.count() times.
The thing is, if you execute the query multiple times, there is no guarantee that the ordering as considered by the limit & offset will be the same, so you won't necessarily get each object exactly once in the for loop.
Even if you did ORDER BY the query, it's still horribly inefficient. You probably wanted to write:
for i, ob in enumerate(results): ...
instead :-)
I think it's arguably a bug that Django lets you issue offset/limit queries on unordered result sets. I can't imagine when it would be the right thing to do.
It's probably used in the implementation of .get()
Best regards ZK
linaro-validation@lists.linaro.org