Is Django prefetch_related supposed to work with GenericRelation?

#1
**UPDATE 2022:** The original ticket #24272, which I opened 8 years ago about this issue, has been closed in favor of #33651, which, once implemented, will give us a new syntax for this type of prefetch.

**============== END OF UPDATE ==============**

What's this all about?

Django has a `GenericRelation` class, which adds a **“reverse” generic relationship** to enable an additional **API**.

It turns out we can use this `reverse-generic-relation` for `filtering` or `ordering`, but we can't use it inside `prefetch_related`.

I was wondering whether this is a bug, whether it's not supposed to work, or whether it's something that could be implemented in the future.

Let me show you with some examples what I mean.

Let's say we have two main models: `Movies` and `Books`.

- `Movies` have a `Director`
- `Books` have an `Author`

And we want to assign tags to our `Movies` and `Books`, but instead of using `MovieTag` and `BookTag` models, we want to use a single `TaggedItem` class with a `GFK` to `Movie` or `Book`.

Here is the model structure:

    from django.db import models
    from django.contrib.contenttypes.fields import GenericForeignKey, GenericRelation
    from django.contrib.contenttypes.models import ContentType


    class TaggedItem(models.Model):
        tag = models.SlugField()
        content_type = models.ForeignKey(ContentType)
        object_id = models.PositiveIntegerField()
        content_object = GenericForeignKey('content_type', 'object_id')

        def __unicode__(self):
            return self.tag


    class Director(models.Model):
        name = models.CharField(max_length=100)

        def __unicode__(self):
            return self.name


    class Movie(models.Model):
        name = models.CharField(max_length=100)
        director = models.ForeignKey(Director)
        tags = GenericRelation(TaggedItem, related_query_name='movies')

        def __unicode__(self):
            return self.name


    class Author(models.Model):
        name = models.CharField(max_length=100)

        def __unicode__(self):
            return self.name


    class Book(models.Model):
        name = models.CharField(max_length=100)
        author = models.ForeignKey(Author)
        tags = GenericRelation(TaggedItem, related_query_name='books')

        def __unicode__(self):
            return self.name

And some initial data:

    >>> from tags.models import Book, Movie, Author, Director, TaggedItem
    >>> a = Author.objects.create(name='E L James')
    >>> b1 = Book.objects.create(name='Fifty Shades of Grey', author=a)
    >>> b2 = Book.objects.create(name='Fifty Shades Darker', author=a)
    >>> b3 = Book.objects.create(name='Fifty Shades Freed', author=a)
    >>> d = Director.objects.create(name='James Gunn')
    >>> m1 = Movie.objects.create(name='Guardians of the Galaxy', director=d)
    >>> t1 = TaggedItem.objects.create(content_object=b1, tag='roman')
    >>> t2 = TaggedItem.objects.create(content_object=b2, tag='roman')
    >>> t3 = TaggedItem.objects.create(content_object=b3, tag='roman')
    >>> t4 = TaggedItem.objects.create(content_object=m1, tag='action movie')

So, as the docs show, we can do things like this:

    >>> b1.tags.all()
    [<TaggedItem: roman>]
    >>> m1.tags.all()
    [<TaggedItem: action movie>]
    >>> TaggedItem.objects.filter(books__author__name='E L James')
    [<TaggedItem: roman>, <TaggedItem: roman>, <TaggedItem: roman>]
    >>> TaggedItem.objects.filter(movies__director__name='James Gunn')
    [<TaggedItem: action movie>]
    >>> Book.objects.all().prefetch_related('tags')
    [<Book: Fifty Shades of Grey>, <Book: Fifty Shades Darker>, <Book: Fifty Shades Freed>]
    >>> Book.objects.filter(tags__tag='roman')
    [<Book: Fifty Shades of Grey>, <Book: Fifty Shades Darker>, <Book: Fifty Shades Freed>]

But if we try to `prefetch` some related data of `TaggedItem` via this reverse generic relation, we get an **AttributeError**.

    >>> TaggedItem.objects.all().prefetch_related('books')
    Traceback (most recent call last):
    ...
    AttributeError: 'Book' object has no attribute 'object_id'


Some of you may ask why I don't just use `content_object` instead of `books` here. The reason is that this only works when we want to:

1) `prefetch` only one level deep from `querysets` containing different types of `content_object`.

```
>>> TaggedItem.objects.all().prefetch_related('content_object')
[<TaggedItem: roman>, <TaggedItem: roman>, <TaggedItem: roman>, <TaggedItem: action movie>]
```


2) `prefetch` many levels but from `querysets` containing only one type of `content_object`.

```
>>> TaggedItem.objects.filter(books__author__name='E L James').prefetch_related('content_object__author')
[<TaggedItem: roman>, <TaggedItem: roman>, <TaggedItem: roman>]
```

But if we want both 1) and 2) (to `prefetch` many levels from a `queryset` containing different types of `content_objects`), we can't use `content_object`.

    >>> TaggedItem.objects.all().prefetch_related('content_object__author')
    Traceback (most recent call last):
    ...
    AttributeError: 'Movie' object has no attribute 'author_id'


`Django` thinks that all `content_objects` are `Books`, and thus they have an `Author`.
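
A plain-Python sketch (no Django involved) of why a mixed list breaks this kind of lookup. It assumes, as a simplification, that the prefetch step reads the same foreign key attribute from every object it was handed; `Book`, `Movie`, and `collect_fk_ids` here are hypothetical stand-ins, not Django internals.

```python
from dataclasses import dataclass


@dataclass
class Book:
    name: str
    author_id: int


@dataclass
class Movie:
    name: str
    director_id: int


def collect_fk_ids(objects, attname):
    """Read the same foreign key attribute from every object in the list."""
    return {getattr(obj, attname) for obj in objects}


content_objects = [
    Book("Fifty Shades of Grey", 1),
    Movie("Guardians of the Galaxy", 1),
]

# A homogeneous list works: every Book has author_id.
book_author_ids = collect_fk_ids(content_objects[:1], "author_id")

# A mixed list fails as soon as a Movie is asked for author_id.
try:
    collect_fk_ids(content_objects, "author_id")
    error = None
except AttributeError as exc:
    error = str(exc)
```

The message captured in `error` names the same missing attribute as the traceback above, which is consistent with the lookup being applied to every object regardless of its type.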

Now imagine the situation where we want to `prefetch` not only the `books` with their `author`, but also the `movies` with their `director`. Here are a few attempts.

The silly way:

    >>> TaggedItem.objects.all().prefetch_related(
    ...     'content_object__author',
    ...     'content_object__director',
    ... )
    Traceback (most recent call last):
    ...
    AttributeError: 'Movie' object has no attribute 'author_id'

Maybe with a custom `Prefetch` object?

    >>> from django.db.models import Prefetch
    >>> TaggedItem.objects.all().prefetch_related(
    ...     Prefetch('content_object', queryset=Book.objects.all().select_related('author')),
    ...     Prefetch('content_object', queryset=Movie.objects.all().select_related('director')),
    ... )
    Traceback (most recent call last):
    ...
    ValueError: Custom queryset can't be used for this lookup.


Some solutions to this problem have been posted elsewhere, but they involve a lot of massaging of the data, which I want to avoid.
I really like the API that comes with reverse generic relations, and it would be very nice to be able to do prefetches like this:

    >>> TaggedItem.objects.all().prefetch_related(
    ...     'books__author',
    ...     'movies__director',
    ... )
    Traceback (most recent call last):
    ...
    AttributeError: 'Book' object has no attribute 'object_id'

Or like this:

    >>> TaggedItem.objects.all().prefetch_related(
    ...     Prefetch('books', queryset=Book.objects.all().select_related('author')),
    ...     Prefetch('movies', queryset=Movie.objects.all().select_related('director')),
    ... )
    Traceback (most recent call last):
    ...
    AttributeError: 'Book' object has no attribute 'object_id'

But as you can see, we always get that **AttributeError**.

I'm using Django `1.7.3` and Python `2.7.6`, and I'm curious why Django is throwing that error. Why is Django looking for an `object_id` on the `Book` model?

**Why do I think this may be a bug?**

Usually, when we ask `prefetch_related` to resolve something it can't, we see:

    >>> TaggedItem.objects.all().prefetch_related('some_field')
    Traceback (most recent call last):
    ...
    AttributeError: Cannot find 'some_field' on TaggedItem object, 'some_field' is an invalid parameter to prefetch_related()

But here it is different. Django actually tries to resolve the relation... and fails. Is this a bug that should be reported? I have never reported anything to Django, which is why I'm asking here first. I'm unable to trace the error and decide for myself whether this is a bug or a feature that could be implemented.

#2
If you want to retrieve `Book` instances and prefetch the related tags use `Book.objects.prefetch_related('tags')`. No need to use the reverse relation here.

You can also have a look at the related tests in the Django source code.

Also, the Django documentation states that `prefetch_related()` is supposed to work with `GenericForeignKey` and `GenericRelation`:

> `prefetch_related`, on the other hand, does a separate lookup for each relationship, and does the ‘joining’ in Python. This allows it to prefetch many-to-many and many-to-one objects, which cannot be done using select_related, in addition to the foreign key and one-to-one relationships that are supported by select_related. It also supports prefetching of `GenericRelation` and `GenericForeignKey`.

**UPDATE:** To prefetch the `content_object` for a `TaggedItem`, you can use `TaggedItem.objects.all().prefetch_related('content_object')`. If you want to limit the result to tagged `Book` objects only, you could additionally filter by `ContentType` (I'm not sure whether `prefetch_related` works with the `related_query_name`). If you also want to get the `Author` together with the book, you need to use `select_related()`, not `prefetch_related()`, because this is a `ForeignKey` relationship. You can combine the two in a custom `prefetch_related()` query:

    from django.contrib.contenttypes.models import ContentType
    from django.db.models import Prefetch

    book_ct = ContentType.objects.get_for_model(Book)
    TaggedItem.objects.filter(content_type=book_ct).prefetch_related(
        Prefetch(
            'content_object',
            queryset=Book.objects.all().select_related('author')
        )
    )

#3
`prefetch_related_objects` to the rescue.

Starting with Django 1.10, we can use `prefetch_related_objects` to divide and conquer our problem. *(Note: it is also present in earlier versions, but it was not part of the public API.)*

`prefetch_related` is an operation in which Django fetches related data **after** the queryset has been evaluated (it runs a second query once the main one has completed). For this to work, it expects the items in the queryset to be homogeneous (of the same type). The main reason the reverse generic relation does not work right now is that we have objects of different content types, and the code is not yet smart enough to separate the flow for each content type.
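
To make the "second query, then join in Python" idea concrete, here is a minimal plain-Python sketch. It does not use Django; `AUTHORS`, `fetch_authors`, and `prefetch_authors` are hypothetical stand-ins for a table, a query, and the prefetch step, and the real implementation is considerably more general.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical in-memory "table" standing in for the Author model.
AUTHORS = {1: "E L James"}

query_log = []  # records each simulated query


@dataclass
class Book:
    name: str
    author_id: int
    author: Optional[str] = None  # filled in by the prefetch step


def fetch_authors(ids):
    """Simulate a single `WHERE id IN (...)` query."""
    query_log.append(("authors", sorted(ids)))
    return {pk: AUTHORS[pk] for pk in ids}


def prefetch_authors(books):
    """Attach authors to already-loaded books using one extra query."""
    # This step only works if every item in the list has author_id.
    ids = {book.author_id for book in books}
    authors = fetch_authors(ids)
    for book in books:
        book.author = authors[book.author_id]  # the "join" happens in Python


books = [Book("Fifty Shades of Grey", 1), Book("Fifty Shades Darker", 1)]
prefetch_authors(books)
```

However many books are in the list, the sketch issues one simulated query for their authors, and it relies on every item exposing the same attribute.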

Now using `prefetch_related_objects` we do fetches only on a **subset** of our queryset where all the items will be homogeneous. Here is an example:

    from django.db.models import prefetch_related_objects
    from django.core.paginator import Paginator
    from django.contrib.contenttypes.models import ContentType
    from tags.models import TaggedItem, Book, Movie


    tagged_items = TaggedItem.objects.all()
    paginator = Paginator(tagged_items, 25)
    page = paginator.get_page(1)

    # prefetch books with their author
    # do this only for items where
    # tagged_item.content_object is a Book
    book_ct = ContentType.objects.get_for_model(Book)
    tags_with_books = [item for item in page.object_list if item.content_type_id == book_ct.id]
    prefetch_related_objects(tags_with_books, "content_object__author")

    # prefetch movies with their director
    # do this only for items where
    # tagged_item.content_object is a Movie
    movie_ct = ContentType.objects.get_for_model(Movie)
    tags_with_movies = [item for item in page.object_list if item.content_type_id == movie_ct.id]
    prefetch_related_objects(tags_with_movies, "content_object__director")

    # This will make 5 data queries in total
    # (not counting the paginator's COUNT query or uncached ContentType lookups)
    # 1 for page items
    # 1 for books
    # 1 for book authors
    # 1 for movies
    # 1 for movie directors
    # Iterating over the items won't make any other queries
    for item in page.object_list:
        # do something with item.content_object
        # and item.content_object.author/director
        print(
            item,
            item.content_object,
            getattr(item.content_object, 'author', None),
            getattr(item.content_object, 'director', None)
        )
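
With more than two content types, the two hand-written list comprehensions could be replaced by a single grouping step. The following is a plain-Python sketch of that partitioning only: `BOOK_CT`, `MOVIE_CT`, `LOOKUPS`, and `group_by_content_type` are hypothetical names, the items are `SimpleNamespace` stand-ins for `TaggedItem` rows, and no Django call is made.

```python
from collections import defaultdict
from types import SimpleNamespace

# Hypothetical content type ids; with Django they would come from
# ContentType.objects.get_for_model(...).
BOOK_CT, MOVIE_CT = 10, 11

# Lookup to prefetch for each content type, as in the example above.
LOOKUPS = {
    BOOK_CT: "content_object__author",
    MOVIE_CT: "content_object__director",
}


def group_by_content_type(items):
    """Split a mixed list into homogeneous lists keyed by content_type_id."""
    groups = defaultdict(list)
    for item in items:
        groups[item.content_type_id].append(item)
    return dict(groups)


# Stand-ins for the TaggedItem rows on one page.
page_items = [
    SimpleNamespace(tag="roman", content_type_id=BOOK_CT),
    SimpleNamespace(tag="roman", content_type_id=BOOK_CT),
    SimpleNamespace(tag="action movie", content_type_id=MOVIE_CT),
]

groups = group_by_content_type(page_items)

# With Django, each (lookup, group) pair would become one
# prefetch_related_objects(group, lookup) call.
calls = [(LOOKUPS[ct_id], len(group)) for ct_id, group in groups.items()]
```

Each group contains items of one content type only, which is the homogeneity that the prefetch step expects.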




#4
Building on Bernhard's answer, whose final code snippet actually raises the error below:
```
ValueError: Custom queryset can't be used for this lookup.
```

I've overridden `GenericForeignKey` to allow this behavior. I don't yet know how bulletproof this implementation is, but it seems to do what I need, so I'm posting it here in the hope that it helps others. Look out for the `START CHANGES` and `END CHANGES` markers to see my changes to the original Django code.

```
from collections import defaultdict

from django.contrib.contenttypes.fields import GenericForeignKey as BaseGenericForeignKey


class CustomGenericForeignKey(BaseGenericForeignKey):
    def get_prefetch_queryset(self, instances, queryset=None):
        """
        Enable passing queryset to get_prefetch_queryset when using GenericForeignKeys but only works when a single
        content type is being queried
        """
        # START CHANGES
        # if queryset is not None:
        #     raise ValueError("Custom queryset can't be used for this lookup.")
        # END CHANGES

        # For efficiency, group the instances by content type and then do one
        # query per model
        fk_dict = defaultdict(set)
        # We need one instance for each group in order to get the right db:
        instance_dict = {}
        ct_attname = self.model._meta.get_field(self.ct_field).get_attname()
        for instance in instances:
            # We avoid looking for values if either ct_id or fkey value is None
            ct_id = getattr(instance, ct_attname)
            if ct_id is not None:
                fk_val = getattr(instance, self.fk_field)
                if fk_val is not None:
                    fk_dict[ct_id].add(fk_val)
                    instance_dict[ct_id] = instance

        ret_val = []
        for ct_id, fkeys in fk_dict.items():
            instance = instance_dict[ct_id]
            # START CHANGES
            if queryset is not None:
                assert len(fk_dict) == 1  # only a single content type is allowed, else undefined behavior
                ret_val.extend(queryset.filter(pk__in=fkeys))
            else:
                ct = self.get_content_type(id=ct_id, using=instance._state.db)
                ret_val.extend(ct.get_all_objects_for_this_type(pk__in=fkeys))
            # END CHANGES

        # For doing the join in Python, we have to match both the FK val and the
        # content type, so we use a callable that returns a (fk, class) pair.
        def gfk_key(obj):
            ct_id = getattr(obj, ct_attname)
            if ct_id is None:
                return None
            else:
                model = self.get_content_type(id=ct_id,
                                              using=obj._state.db).model_class()
                return (model._meta.pk.get_prep_value(getattr(obj, self.fk_field)),
                        model)

        return (
            ret_val,
            lambda obj: (obj.pk, obj.__class__),
            gfk_key,
            True,
            self.name,
            True,
        )
```
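
The comment about matching "both the FK val and the content type" is easier to see with a small example. This is a plain-Python sketch, not Django code: `Book`, `Movie`, and `Tag` are hypothetical dataclasses, and `Tag.model` stands in for the content type.

```python
from dataclasses import dataclass


@dataclass
class Book:
    pk: int
    name: str


@dataclass
class Movie:
    pk: int
    name: str


@dataclass
class Tag:
    tag: str
    model: type  # stands in for the content type
    object_id: int


# Book and Movie share pk 1, so the pk alone cannot identify the target.
fetched = [Book(1, "Fifty Shades of Grey"), Movie(1, "Guardians of the Galaxy")]
tags = [Tag("roman", Book, 1), Tag("action movie", Movie, 1)]

# Index fetched objects by (pk, class), mirroring
# `lambda obj: (obj.pk, obj.__class__)` in the method above.
by_key = {(obj.pk, obj.__class__): obj for obj in fetched}

# Resolve each tag with the matching (object_id, model) pair, like `gfk_key`.
resolved = {t.tag: by_key[(t.object_id, t.model)].name for t in tags}
```

Keying on the pair keeps the book and the movie apart even though they share a primary key value.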