Guest post by Mike Fanning
One of the more subtly difficult decisions to make when data modeling is selecting the appropriate cardinality for a relationship. Choosing the wrong cardinalities can lead to some awkward problems down the road, so it is worthwhile to use extra care when creating them. Here's a quick recap of two cardinalities I'd like to discuss in this post: a one-to-many (hereafter referred to as O2M) is normally used when a there is some kind of parent-child relationship and one entity is subordinate to another. A many-to-many (M2M) allows an entity on either side of the relationship to map to multiple entities on the opposite side, and is traditionally implemented with a junction or join table in relational databases.
I'm not sure if it's because of the perceived increase in complexity, difficulty with ORM tools or something else altogether, but developers seem to instinctively reach for O2M mappings. Maybe computer science curricula spend too much time on trees and not enough on graphs or something.
Anyways, here are a few things I like to think about when modeling a relationship (with some hypothetical scenarios in italics):
- Consider the relationship from the other side. You might have several employees working in one department, but could a single employee work in multiple departments?
- Is there information that is specific to the combination of the two entities? If an employee transferred from one department to another, where would you track the date of transfer?
- Are you going to create multiple relationships between the same two entities? Will there be one relationship for the manager of the department and another for the other employees?
Hopefully this line of reasoning will give you a deeper understanding of the connections in your application. Sometimes a M2M will clearly lead to too much ambiguity. For example, associating an invoice line item with multiple invoices won't make much sense - how would the amount for the line item factor into the total for each invoice? Other times it may become clear that a O2M will paint you into a corner.
Working under the basic assumption that schema changes to production systems, along with the associated data transformations, are more difficult than code changes, I would give the following advice: build your first class entities, then create M2M relationships between them. When in doubt, keep the data model flexible by weakly associating major entities and enforcing business rules higher up the stack. You can always make the service or business tier more restrictive than the data access tier, but going the other direction is going to be painful and full of exception handling code (or worse yet, full of un-handled HTTP 500 errors and/or modal dialogs full of gibberish).
A WORD OF WARNING
This may sound like a bunch of abstract nonsense, so to illustrate this point we’ll look at how a model can degenerate and become unwieldy over time. For this example, consider Sales Orders and Customers. When modeling a Sales Order, the temptation is to make a many-to-one relationship to Customers, meaning that any given order will have only a single customer.
Seems pretty normal, right? In most cases you’re only dealing with a single entity that could be considered a customer, so this model makes perfect sense.
WHY IS THAT TABLE CALLED PARTY?
Rather than making a table for customers, another for suppliers, a third one for distributors, one for employees, etc., I’ve combined them all into a single table. Any organization or person could be a party. This has benefits and drawbacks, but it’s too big of a topic to cover here. You can equate them with customers for this example.
What happens when you’re supposed to bill one customer and ship the order to another? What if some third party places the order on behalf of somebody else?
You could add more fields and relationships to your model, like so:
This requires schema changes though, and it can lead to a bloated model with all the attendant issues. It can also be intimidating to look at database tables with dozens of columns, and over time it becomes difficult to remember and document the meaning of them.
A more flexible approach using a join table is shown below:
Here the OrderRole table creates the M2M between Order and Party, while the RoleType table enumerates the different types of relationship (bill-to customer, ship-to customer, etc). There are more tables now, so some of the conceptual difficulty has simply shifted from the additional columns into additional tables, but it has much more room to grow.
Another benefit is that you already have a place to put data that is specific to an Order-Party combination. If you want to track when a Party was associated with an Order, you add a single column to the OrderRole table, whereas in the previous example you would need to add four new columns, one for each many-to-one relationship.
I’ll wrap things up with a real world anecdote. Several years ago a I received a request for a new feature that would allow users to “watch” POs that were of particular interest to them. These POs would then appear on a dashboard so that the watchers could monitor their progress (e.g. estimated delivery date, completion date). Because we had this type of schema in place, along with supporting services, we were able to implement the feature by simply adding a new row to the RoleType table and making the necessary user interface changes. Interestingly enough, if we had been using the schema from the first or second diagram, we would have needed a many-to-many relationship for the watcher feature anyway.
You can find MySQL Workbench models for the diagrams in this post here: https://github.com/mikegfanning/bloggin/tree/master/post2