A few years ago I inherited a codebase where the users table had no unique constraint on email. The model had validates :email, uniqueness: true, so everything seemed fine—until I found 847 duplicate email addresses in production.
Someone had written a rake task. It bypassed ActiveRecord. The model validation never ran, and the database didn’t care. Nearly a thousand users couldn’t log in because the system kept finding the wrong account.
This is the kind of bug that makes you question your career choices.
The conventional approach
Rails tutorials teach you to think about validation first. You write validates :email, presence: true, uniqueness: true in your model, run your tests, and move on. Maybe you add database constraints later if you remember.
This creates a false sense of security. The model validation is a suggestion. It only runs when you use ActiveRecord the normal way. Console sessions, rake tasks, raw SQL, bulk imports, that legacy microservice someone built in 2019—none of these care about your model validations.
When the database is treated as dumb storage, your data integrity depends on every piece of code going through the front door. That’s not a realistic assumption.
Why the database must lead
The database is the only layer that sees every write. It doesn’t matter if you’re using ActiveRecord, raw SQL, a different ORM, or some other app entirely. If you insert a row, the database constraints are checked.
This is why constraints like NOT NULL, unique indexes, and foreign keys exist. They’re not convenience features—they’re guarantees. A unique index on email means there will never be duplicate emails, period. No amount of buggy application code can create them.
When you put constraints in the database first, you’re making a statement about what your data must look like. Not what it should look like if everything goes right, but what it must look like regardless of how it got there.
Foreign keys are another obvious example. Without them, you can end up with posts pointing to users that don’t exist. With them, the database refuses to let that happen. You don’t have to remember to add dependent: :destroy or handle orphaned records—the problem can’t occur.
The cost of bad data
I want to be clear about what’s at stake here, because “data integrity” sounds abstract until you’ve lived with the alternative.
When your data can be in impossible states, every piece of code has to account for that. You write user&.email instead of user.email because you’ve learned the hard way that sometimes user_id points to nothing. You add rescue blocks around code that shouldn’t be able to fail. You check for nil in places where nil shouldn’t exist. The codebase accumulates defensive scar tissue.
Queries get weird. You can’t just SELECT * FROM orders WHERE user_id = ? because some of those orders belong to deleted users and you’ll get crashes downstream. So you add INNER JOIN users ON users.id = orders.user_id everywhere, or you add WHERE user_id IN (SELECT id FROM users), and now every query is slower and more complicated than it needs to be.
Debugging becomes archaeology. A bug report comes in: “this user can’t see their orders.” You check the orders table. The orders exist. You check the user. The user exists. You spend an hour before realizing the user_id on the orders is pointing to a different user who was deleted and whose ID got recycled. Or the email is duplicated across two accounts and they’re logged into the wrong one. Or the status field contains “shiped” instead of “shipped” because someone fat-fingered a console command three years ago.
Migrations become terrifying. You want to add null: false to a column that should never have been nullable. But it’s been nullable for two years, so now there’s garbage data in there. You have to write a one-off script to fix the existing rows before you can add the constraint—or worse, someone puts the data fix in the migration itself, and now your migrations are time bombs that behave differently depending on when they run. Sometimes you can’t even figure out what the correct value should be—the information is just gone.
New features get harder. You want to add a “subscription tier” to users. Simple enough, except you discover that 3% of your users have invalid subscription records, so now you have to handle that edge case in the new feature. And the next feature. And the one after that. The bad data is a tax on every piece of work you do.
The worst part is that none of this is visible until it’s too late. The app works fine when the data is clean. Tests pass because test data is clean. It’s only production—with years of accumulated writes from buggy code, rake tasks, console sessions, and that one time someone “fixed” something with raw SQL—where the impossible states live.
Every constraint you skip is a bet that nothing will ever write bad data through any path, forever. That’s a bad bet.
Why you still need model validations
So why not just skip model validations entirely? Let the database constraints handle everything?
Because catching errors at the database level is miserable.
When a unique index violation happens, you get ActiveRecord::RecordNotUnique. When a NOT NULL constraint fails, you get ActiveRecord::NotNullViolation. These are exceptions, not validation errors. Your controller code has to rescue them. Your form doesn’t show a nice error message—the user sees a 500 page or a generic error.
Model validations exist for user experience. They let you catch problems before the database query happens, return friendly error messages, and populate errors so forms can highlight which fields have issues.
class User < ApplicationRecord
validates :email, presence: true, uniqueness: true
end
user = User.new(email: nil)
user.valid? # => false
user.errors[:email] # => ["can't be blank"]
This is much better than rescuing exceptions and trying to parse error messages to figure out what went wrong.
The rule: mirror, don’t diverge
Here’s the principle: put constraints in the database first, then add model validations that mirror those constraints.
If the database says NOT NULL, the model says presence: true. If there’s a unique index, the model says uniqueness: true. If there’s a foreign key, the model has the association. They should match.
# migration
class CreateUsers < ActiveRecord::Migration[7.1]
def change
create_table :users do |t|
t.string :email, null: false
t.string :username, null: false
t.timestamps
end
add_index :users, :email, unique: true
add_index :users, :username, unique: true
end
end
# model
class User < ApplicationRecord
validates :email, presence: true, uniqueness: true
validates :username, presence: true, uniqueness: true
end
The model should never be more permissive than the database. If the model allows null emails but the database doesn’t, you’ll get exceptions in production. If the model allows duplicates but the database doesn’t, same thing.
You can make the model more restrictive—maybe you validate email format at the model level even though the database just has a NOT NULL. That’s fine. The database guarantees minimum integrity; the model can add business rules on top.
The problem is divergence. When the model says one thing and the database says another, you’re setting up future bugs. Someone will wonder why their save failed with a database exception when valid? returned true. Or worse, they’ll work around it in some creative way that makes the data model even more confusing.
Beyond the basics
The simple cases—NOT NULL and unique indexes—are obvious. But databases can enforce more than that, and the same principle applies: constrain it at the database level, then mirror it in the model.
Check constraints for allowed values
Status columns are a classic example. You have an order that can be pending, processing, shipped, or cancelled. Without a constraint, someone will inevitably insert shiped or Pending or an empty string.
# migration
class CreateOrders < ActiveRecord::Migration[7.1]
def change
create_table :orders do |t|
t.references :user, null: false, foreign_key: true
t.string :status, null: false, default: "pending"
t.decimal :total, precision: 10, scale: 2, null: false
t.timestamps
end
add_check_constraint :orders, "status IN ('pending', 'processing', 'shipped', 'cancelled')", name: "orders_status_check"
add_check_constraint :orders, "total >= 0", name: "orders_total_non_negative"
end
end
# model
class Order < ApplicationRecord
STATUSES = %w[pending processing shipped cancelled].freeze
belongs_to :user
validates :status, presence: true, inclusion: { in: STATUSES }
validates :total, presence: true, numericality: { greater_than_or_equal_to: 0 }
end
The check constraint guarantees the data. The model validation gives you errors[:status] with a message like “is not included in the list” instead of a database exception.
Note the total >= 0 constraint. You’d be surprised how often negative totals show up when there’s no constraint. Some bug in the discount calculation, a race condition in a refund flow, whatever. The database won’t let it happen.
Composite unique indexes
Sometimes uniqueness depends on scope. A user can have one subscription per plan, but could subscribe to multiple different plans. Or slugs need to be unique within an account, not globally.
# migration
class CreateSubscriptions < ActiveRecord::Migration[7.1]
def change
create_table :subscriptions do |t|
t.references :user, null: false, foreign_key: true
t.references :plan, null: false, foreign_key: true
t.datetime :expires_at
t.timestamps
end
add_index :subscriptions, [:user_id, :plan_id], unique: true
end
end
# model
class Subscription < ApplicationRecord
belongs_to :user
belongs_to :plan
validates :user_id, uniqueness: { scope: :plan_id, message: "already has this subscription" }
end
The composite index enforces that the combination is unique. The model validation mirrors it with scope: so you get a sensible error message.
Foreign keys with cascading behavior
Foreign keys do more than prevent orphaned records. They can define what happens when the parent is deleted.
# migration
class CreateComments < ActiveRecord::Migration[7.1]
def change
create_table :comments do |t|
t.references :post, null: false, foreign_key: { on_delete: :cascade }
t.references :user, null: false, foreign_key: { on_delete: :nullify }
t.text :body, null: false
t.timestamps
end
end
end
# model
class Comment < ApplicationRecord
belongs_to :post
belongs_to :user, optional: true
validates :body, presence: true
end
When a post is deleted, its comments are automatically deleted—no need for dependent: :destroy callbacks that might not run during bulk deletes. When a user is deleted, their comments stick around but user_id becomes null—preserving the content while removing the association.
The model reflects this: belongs_to :user, optional: true because the database allows null there after a cascade.
Partial indexes for conditional uniqueness
Sometimes you only want uniqueness to apply in certain conditions. Active records should be unique, but you don’t care about archived ones.
# migration
class AddSlugToProjects < ActiveRecord::Migration[7.1]
def change
add_column :projects, :slug, :string, null: false
add_column :projects, :archived, :boolean, null: false, default: false
add_index :projects, [:account_id, :slug], unique: true, where: "archived = false", name: "index_projects_unique_slug_when_active"
end
end
# model
class Project < ApplicationRecord
belongs_to :account
validates :slug, presence: true
validates :slug, uniqueness: { scope: :account_id }, unless: :archived?
end
The partial index only enforces uniqueness for non-archived projects. The model validation uses unless: :archived? to match. Archived projects can have duplicate slugs—maybe you want to reuse a slug after archiving the old project.
What about validations the database can’t express?
Not everything fits in a constraint. Email format validation, conditional requirements, cross-model validations—these can’t be expressed in most databases.
These belong in the model, or in form objects and command objects if the logic is complex enough. The key is being clear about what each layer does:
Database constraints handle data integrity. The shape of valid data, regardless of how it’s written.
Model validations mirror those constraints for better error handling, plus add business logic that only applies to normal application flow.
Form and command objects handle context-specific validation. Maybe a user can be created without a phone number through one flow but not another. That’s not data integrity—that’s workflow logic.
Trying to cram everything into check constraints gets ugly fast. A check constraint for email format is possible in Postgres, but now you’re maintaining regex in SQL and hoping it matches what your application expects. Usually not worth it.
Closing
The mental model is simple: database constraints are the contract, model validations are the friendly error messages.
Write the migration first. Add the NOT NULLs, the unique indexes, the foreign keys. Then write the model validations that mirror them. When in doubt, the database constraint wins—if you’re not sure whether something should be nullable, make it NOT NULL in the database and see what breaks.
Your data will thank you. Or at least, you won’t spend a weekend figuring out why there are 847 duplicate email addresses in production.