DjangoCon 2019 – The Ins and Outs of Model Inheritance by Blythe J Dunham

BLYTHE DUNHAM: Sorry about the technical difficulties.
I am Blythe Dunham and just venturing out on my own to freelance as Snow Giraffe. I
have been doing Django at Rover.com, the largest network of trusted dog sitters and walkers
and now we support cats and hire humans. Prior to that, I worked on Ruby on Rails for almost
a decade. Since I have been in the tech industry for over 20 years, you can probably say I
spent a lot of time hanging out with models. The jokes get better. So as you can tell from
the name Snow Giraffe, I really love snow sports and giraffes, so I thought I would include
them in today’s adventure. We will talk about composition, inheritance, and then avoiding
inheritance altogether. So, who has seen this before? OK. I think I put it in the abstract.
In 1994, a book called Design Patterns came out where they advised folks doing
object-oriented design to prefer composition over inheritance because it is more flexible. What
does this mean? Composition is a mechanism to combine objects or data into more complex
ones. You can think of it as a has-a relationship. For example, a giraffe has a blue tongue.
Inheritance is a way of deriving a subclass from a parent or base class to create a hierarchy
of shared attributes and methods. You can think of inheritance as an is-a relationship,
so a giraffe is a wonderful animal. It is more obvious to build relationships between
objects than it is to find commonalities and organize them into a hierarchy. When we think
about how objects map to the database via the ORM, we find composition is really intuitive
and has a natural mapping. However, inheritance isn’t supported by relational databases, so
we have several different approaches to choose from. Since each has its own ins
and outs, it is important to choose wisely or, better yet, rethink the problem using composition.
Let’s look at composition in Django. We have a giraffe class, it has a name field. We have
a tongue class and it has a 1-1 field back to giraffe. Now if giraffes had multiple tongues
(that would be scary), I would use a foreign key here instead for the one-to-many
relationship. If we look at the object-oriented UML diagram, we represent our objects
with a 1-1 relationship. This looks super similar to the entity-relationship diagram that
represents our database schema. The objects are represented as tables and the foreign
keys are used to show the association. So, unfortunately, model inheritance is a little
bit more awkward and doesn’t have that natural mapping. Up first is abstract models. The
point of abstract models is to reuse the parent class’s fields and field-related functionality.
The parent is abstract and not backed by a table in the database; therefore each derived
class has all of the fields from the parent and itself on its table. In Django, we have
an Animal parent class that subclasses Model, and we have a name field, we have a
method for speak that returns gibberish, and we have overridden the Meta class definition
to set abstract to true. Giraffe subclasses Animal, we add a field for the number of spots
that the giraffe has, and we override speak to return hum because that’s what giraffes
do; you just can’t hear it. It is infrasonic. The ERD diagram looks like this. We have the
id integer and name field from the parent Animal and the spots count from the Giraffe
class. If we were to subclass Animal again with Zebra, its table would have the same id
and name fields plus anything added, like a stripe count. When we query for giraffes, this
queries against the giraffe table. We can’t query with Animal.objects.all() because that
animal table doesn’t exist. When
we call speak on the giraffe it returns hum because we have overridden that method. So,
use-cases. Abstract models work best when there are a lot of duplicated fields. If there
are only a few fields it is better to be explicit and just define them on each model. Great
examples include any sort of base or core model functionality that all or many of your
models inherit. Two Scoops of Django walks you through the time-stamped model,
which is also implemented in django-extensions. What it does is add an added
and an updated field. The advantages of abstract models are that you can easily reuse the parent
class’s fields and field-related logic. However, the parent class can’t be used in isolation.
If you have any related models, you can’t have an animal id. You will need to have a
zebra id and a giraffe id. OK. This is my favorite slide in the whole deck. The photographer
granted me permission to use it so I could warn you about using multi-table inheritance.
Don’t get eaten by the lion. Now, multi-table inheritance is defined like this in Django.
We have a BigCat parent class with a name field and we have a subclass, Lion, that adds
giraffes hunted and a method called speak. Now notice that I haven’t overridden the Meta
class definition. This is vanilla out-of-the-box inheritance in Django. So, first, this is called
concrete inheritance because the parent class is concrete. We have a big cat table in the
database with an id and a name field. The lion table has a pointer, a big cat pointer,
which is a foreign key back to big cat, and then it adds any fields of its own like giraffes
hunted. Notice we don’t have an id field, which is usually the default in Django, so the primary
key of the lion table is this big cat pointer id. If we subclass big cat with cheetah and
cheetah has none of its own fields we still have that big cat pointer id. Notice here
that you could implement this explicitly with 1-1 relationships if you wanted. What happens
when we query? Let’s try to get all of the Lions. This executes a query on the lion table
joined to the big cat table. This allows you to access the big cat instance without
executing an additional query. You can also call any of the fields or methods on the parent
directly on lion. You can say lion.name. The problem starts when we try to get all of the
species of cats, whether lion or cheetah. Give me all the big cats and we will run a
query on the big cat table. Then I want to access the speak method on the child but I
don’t know if this is a cheetah or a lion because that foreign key is on the cheetah
and lion tables. I run a query on the cheetah table and it is not a cheetah. So we get an
exception. Then we can try again with lion. This time it works, it returns roar, but we
have executed another query on the lion table so what this means is for each record you
have, you will execute up to N queries where N is the number of subclasses. If you add
another subclass, then your performance might be degraded. But wait, you say, I love to
eager load and optimize everything. OK. That’s great. Good for you. And second of all, you
are still going to have to do a prefetch query or a select_related, which causes an evil left
join per subclass. A good use case for multi-table inheritance is a travel cart. We sell trips
with start and end dates and T-shirts with sizing information. So the cart or order has
a many-to-many relationship with product, meaning I have a join table here. If I just need
that name and pricing information and I don’t have to follow the pointer to the trip and
clothing classes, then it is not really a performance problem. In addition, if I only
have a few products in a cart at a time, you can follow that foreign key, and it shouldn’t
be too terrible. In conclusion, the advantage of multi-table inheritance is that all
the common parent attributes can be queried easily together. However, when you start accessing
those subclasses, it could lead to inefficient queries that hurt performance. A lot of this
is lack of understanding of what is happening under the covers so sometimes it is better
to be explicit because if your coworker adds a subclass two years down the road you might
find yourself having performance problems. Last but not least, we have proxy models.
The purpose of proxy models is to override the behavior and functionality of the parent
class. So we have exactly one table to rule them all. For lack of a better word, everyone
in Middle-earth is a person. This is our parent class. They have a name and a person
type that I will talk about in a minute. And there is a method called characteristic that
returns Middle-earth dweller. Hobbit subclasses Person and sets proxy to true on the Meta
class definition and defines a method characteristic that returns hairy feet. There is one precious
table and when we access this via Hobbit we will get back a Hobbit instance and calling
characteristic will return hairy feet. If we access it through Person, we will use the
same row in the database and same record, but when we call characteristic we will get
back Middle-earth dweller. We basically just change the behavior of the subclass. So one
cool thing you can do with this is add a custom manager. This Elf has an ElfManager. The
ElfManager filters on the person type E. We override the queryset. If we query
Person.objects.all() we will get back instances of Person; however, if we query with Elf we
get back only Legolas because Frodo is a hobbit. We are just changing the order of
the columns that we select. It is against proxy_person, which is our one and only table. The
advantage of proxy models is that it is easy to modify the subclass’s behavior, but the
disadvantage is that fields must be designed for everyone on the table. Use-cases are like
ordered model when you change the model to sort on an added field, or an active model
where you filter out deactivated models. You can create a custom user model but it might
be better to think of that as a 1-1 relationship between user and user profile. One thing you
can do with proxy models is downcasting and single table inheritance. Downcasting is
a way to cast instances to the subclass. Normally when we query with Person, it returns a
Person instance, and when you call characteristic it returns Middle-earth dweller. With
downcasting, it will return a Hobbit and an Elf instance, and the way this works is we have
a type field, and we set it with the class name. So we do one query, we get that class
name, and instantiate the class. When we call Frodo and get their characteristic, we have
never wanted hairy feet so much. Downcasting is not out of the box. There is an article
called "Django STI on the cheap". There is downcasting for multi-table inheritance but you have
to be careful because you will still incur that extra query or select depending on the
implementation. The advantage is performance, performance, performance. One table means
one query. The disadvantage is that, since all the fields of each subclass have to be represented
on that one table, it can lead to clutter and bloat. Some call this denormalization
of all data on one table for performance. The use-cases for single table and multiple
table inheritance are really similar. The shopping cart scenario would work. The way
that you can choose between them is ask yourself do most of the subclasses share fields and
functionality? If so, single table inheritance might be appropriate. If they are vastly different,
then multi-table inheritance is preferable. Now that we have gone through all this, I
am going to tell you sometimes the best type of model inheritance is not to use inheritance
at all. This guy. It is not good to play with generic explosives all the time. We have something
called generic foreign keys in Django to implement polymorphism, which is the ability of an object
to take on many forms like you saw with inheritance. Your homework is to look up how to define
this in Django. But a generic foreign key fakes a real key with two fields. The first
is the content type id, and that holds the name and the app label for all of your concrete
models across all of your apps. The other field is an object id which is just an integer.
So we put the ID of the related model here. You could put 0 or nonsense data. It is just
an integer field. For that reason, we have a very weak relationship to anything you want
it to be. It could be a blog, it could be a giraffe, it could be a location. It doesn’t
matter. The advantages are you can use any model. You don’t have to do another migration.
The use-cases for generic foreign keys are things like tags and comments where the related
object can be anything like a blog post, a giraffe or a location and the object, the
comment, is not usually accessed outside of the blog post. Your typical use
case is I have a blog post, give me all of my comments for this particular post. If you
are asking the question like give me all the comments in the world and I don’t care what
the related object is and I need to look into the object and do another query to see what it
is, you might consider using single or multi-table inheritance. The disadvantages are that
code can become hard to maintain. If you are using dynamic type checking then two years
down the road you might not remember what this object is that you are passing around.
Another disadvantage is in order to access the objects from the scenario where we have all
the comments, you can’t use select_related so you are going to have to write custom SQL
if you want to optimize the performance. The other major disadvantage is there is no referential
integrity. That is like a seatbelt. You have nothing to prevent you from putting dirty
data in the database. If you delete a record, the object id field is just an integer, so
it might not cascade through and you could just end up with a little bit of unclean data.
So the second alternative to model inheritance is unstructured data. What we are doing here
is taking a bunch of fields and we are serializing and just shoving them into the database. This
can avoid the clutter that you see with single table inheritance because each subclass uses
that one database field to jam in whatever it wants. And similarly, it avoids the need
for related objects as in multi-table inheritance. The disadvantages are it is tough to query
against unstructured fields. Postgres will let you do it but for the most part you want
to just put data in the database and not index or query against it. The other disadvantage
is that you lose data integrity because it is not enforced by the database. All the validation
has to be on the application level. Then this can lead to dirty data since we are just putting
in exactly what each subclass wants and you make a change so you have another blob and
some of your data might be a little bit dirty. OK. We have gotten through the two alternatives
but maybe we could rethink this a little bit. This lady is going into Jackson Hole and probably
could have rethought her approach. This is the easy entrance. She made it. I went in
after her so it’s all good. My point is just because objects share attributes it doesn’t
mean we should represent them together in a hierarchy. A human and a beetle both have
legs but they are not inherently similar or most of them aren’t. The good news is most
is-a relationships can be expressed as a has-a relationship. A user is a seller, or a user
has a seller profile. This is that 1-1 relationship I was talking about with user:
you can create a profile instead of subclassing. Another example is a manager is an employee
or an employee has a managerial job. The last thing is sometimes it is good to be explicit.
Sometimes you can use multiple foreign keys instead of inheritance like proxy models,
or maybe if you only have one field to repeat you can add it to multiple models. With multiple
table inheritance you won’t get the bells and whistles that Django provides but it is
more explicit to implement it with 1-1 fields, so everybody in your organization knows what’s
going on and you will recognize the fact that when you add a subclass your performance will
be degraded. Your future self will thank you. And I thank you too. I appreciate it, and a big
shout-out to all the organizers and volunteers who make DjangoCon so special. Feel free to
hit me up on all of the normal means. I put the slides and the code that I used at BlytheDunham/dml.
Thank you very much.
>> Thank you, Blythe. Will you be around for the sprints?
>> I won’t be around for the sprints...
>> Catch her today. Last chance. Thank you, again. After two very long talks, we have
a whole bunch of really short ones, our lightning talks, or alternatively you can go
get lunch. That’s OK. Has anyone seen Kojo? Because he is up now. Oh, for the Latinx
Djangonauts in the room, there is a group photo at 1:40 at the pool. It is being announced on
Twitter as well but I am a voice.
