Common Mistakes in SQL Database Design
Hey, we all make mistakes. All of us. Even the brightest database administrators and schema designers. But we should also be learning from all those mistakes. Some lessons we learn from our own mistakes, some from watching others, and some just out of experience with dealing with common practices.
Every mistake below, I've made numerous times over the years, some of out of ignorance, but most from using time constraints and cutting corners. But most of the time, I ended up paying for the mistake and had to fix it later.
It wasn't worth it. It should have just been designed correctly in the first place.
So with that being said, below I've compiled a list of common mistakes I've seen and the better/correct alternative approach.
Naming tables with the singular form of the entity
Although this is the recommended standard in some circles, this is just bad practice and creates too many complications down the road. The main reason is because in the software development world, we manage data in both entity (singular/row) and dataset (collection/table) form, and it's extremely important to keep the operations on these two different types separate, so it's standard practice to name row entities with the singular form of the word and collection entities in the plural form of the word.
If the database table is named in the singular, what happens is we must retrieve data from a table in one form of the word (such as Person), store it in a collection with the plural name (Persons or People), then identify items in the collection back in the singular form (Person). If we end up referring to a single row as Person but the database calls the table Person, this is just a headache best avoided. In fact, some software Object Relational Mapping (ORM) systems such as Entity Framework have complex language logic built into them that attempts to distinguish between rows and tables based on whether the name is plural or singular. When tables are named in the singular form, this creates a domino effect of problems within the software that requires the developer to go out of their way to explicitly rename things so that the ORM can understand what is what.
Storing contact (phone, address, email) information in the same table as its parent entity
Examples: Persons.Address, Employee.Address, Company.WorkEmail, Supplier.MobilePhone. No matter how tempting it may be, do not create all these columns. Each of these contact types can potentially have multiple values, no matter how guaranteed you are there won't be. There may be multiples is what's more important. If you create these columns in the beginning and then later need to extract them out and normalize them to other tables, the amount of work required to modify the the database and any connecting applications is painful.
- Better: Create a separate table per contact type, per entity for each of these types, such as PersonAddresses, CompanyPhones, PersonEmails, CustomerAddresses, CustomerPhones, etc. Then create contact type tables to identify the purposes of each of the rows in these tables, such AddressTypes, EmailTypes, PhoneTypes, etc.
Assuming every table should allow DELETE
In modern enterprise databases, it's extremely important to keep a lot of data forever and never allow it to be truly deleted from the database. Customers, Companies, Persons, Employees, Orders should never be deleted. Not only does deleting rows that have foreign key dependencies create complications that reverberate throughout your database, this may actually be illegal or a violation of company policy.
- Better: add a Deleted column (bit or datetime) that allows Soft Deletes, which is a software pattern of setting Deleted to a value, but leaving the row in the table. This allows all important data to remain in the database, foreign key constraints to not be violated, and data to be undeleted later. Yes, this requires a little extra work for applications to ensure they are only retrieved non-deleted, active rows, but it's worth it.
Naming the primary key column ID
This practice is outdated and allows too much ambiguity down the road in the application and when building complex queries, when you have a dozen tables joined, all with columns named ID.
- Better: name table key something related to the table name that can be reused as the foreign key column in other tables, such as PersonID, CompanyID, OrderID, EmployeeID, OrderItemID, SupplierID, CompanyLocationAddressID.
Creating Currency, Datetime, and Measurement columns with ambiguous units of measure (UoM)
This is just a disaster waiting to happen. Do not allow any ambiguity on the time zone or unit of measure when storing values of these types.
Better: add a UoM column for each measurement column. For example: (Price / PriceUoM), (Weight / WeightUoM), (OrderTotal / OrderTotalCurrency)
Best: Postfix the UoM onto the column name. Examples: PriceUSD, PaidEUR, LengthIN, MassG, WeightLB, DistanceKM, HeightCM
Date, Time, and DateTime columns that are not UTC
This goes along with the UoM point above, but it's bad practice to ever store date, time, or datetime values in any time zone other than UTC, for a multitude of reasons:
- Which time zone are you using: the client application, application server, cloud zone, API host, database server?
- Do you even know what time zone the system you're using is even set to? You'd be surprised at how many browsers, servers, and database hosts out there have never had their time zone set, to anything at all, and the original installer, container, or imaging system simply used the default (whatever that is, who knows). Using a central time server is the best option, but if this isn't available, always try to use UTC.
- Computer, Phone, Server, and database time zone settings get changed sometimes. Surprise! One day the server was set to UTC+6, then after a new admin started or software update, it's suddenly at UTC+0 or UTC-7:30.
Better: name your DateTime colums with a UTC postfix to signal to developers to only store UTC values in the database. Example: ModifedDateTimeUTC, OrderSubmittedDateTimeUTC, PaymentReceivedUTC.
Planning on adding new columns for every new future object property
This is more of a modern problem with relational databases and refers to the needs of software to quickly expand over time or provide dynamic properties for your table entities. In traditional normalized databases, every time a table entity (such as person) needed a new property, a column would be added to the schema. While this is fine if the number of columns is known and limited, this doesn't allow application administrators, business intelligence specialists, report creators, and superusers to create new properties whenever they want, an increasing requirement in modern software design.
Better: Normalized approach - Create an attributes or properties table to store the names of each additional properties, then an entity attributes table (PersonAttributes, CompanyProperties, CustomerProperties), etc to hold the normalized, scalable values. An example PersonAttributes table would have a structure like:
PersonAttributeID (int, primary key)
AttributeTypeID (int, => Attributes.AttributeTypeID)
PersonID (int, => Persons.PersonID)
AttributeValue (nvarchar)
Best: Denormalized approach - Create a single column in the table, such as Persons.Properties or Companies.Attributes to hold serialized (JSON, XML, etc) data that can then be used by the application(s) to store and retrieve data with a near-infinite size, structure, hierarchy, and number of properties.
Example Customers.Properties column value:
{
"OrderHabit": "1-2 orders per month",
"FormerEmployee": true,
"EmployedByCompetitor": null,
"SocialMedia": [
{
"Instagram": {
"Username": "-------",
"Followers": 712,
"FollowedByEmployeeIDs": [
5712,
9812,
332,
879
]
}
},
{
"Quora": {
"Username": "--------",
"Followers": 81,
"FollowedByEmployeeIDs": [
3245,
4156,
334,
1534,
1122,
1156,
2347,
234
]
}
}
]
}