Invest in your seeds file | Josh Brody

Invest in your seeds file

Every Rails project I inherit has the same seeds file: three admin users, two enum reference tables, and a commented-out block that “worked at some point.” The file hasn’t been run successfully in months. Nobody knows what state it expects or what state it produces.

Meanwhile, the development database is a graveyard of test records, half-finished features, and that one user account everyone shares because nobody remembers the password to the others. When something breaks, you spend twenty minutes trying to reproduce it with the right combination of data. When a new developer joins, they spend a day getting their database into a usable state.

This is backwards. Your seeds file should be a first-class citizen—maintained, tested, and capable of giving you a fully functional development environment in one command.

The original purpose of seeds

The seeds file was designed for data that every environment needs to function. Roles, permission levels, subscription plans, countries, currencies—reference data that the application assumes exists. This is legitimate and important. Your app shouldn’t boot without an admin role to assign.

But somewhere along the way, seeds became synonymous with only this reference data. Development environments got nothing else, and developers were left to create their own test records by hand or import sanitized production dumps.

I split these concerns. Reference data goes in seeds that run everywhere. Development data goes in seeds that only run in development. A custom rake task handles the routing:

# lib/tasks/seed.rake

namespace :db do
  namespace :seed do
    desc "Seed production reference data"
    task production: :environment do
      load Rails.root.join("db/seeds/production.rb")
    end

    desc "Seed test reference data"
    task test: :environment do
      load Rails.root.join("db/seeds/production.rb")
      load Rails.root.join("db/seeds/test.rb")
    end

    desc "Seed development with full sample data"
    task development: :environment do
      load Rails.root.join("db/seeds/production.rb")
      load Rails.root.join("db/seeds/development.rb")
    end
  end
end

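# override the default db:seed so it runs the task matching the current environment
# clear also drops the task's description, so restate it to keep db:seed in `rails -T`
Rake::Task["db:seed"].clear
desc "Seed the database for the current Rails environment"
task "db:seed" => :environment do
  Rake::Task["db:seed:#{Rails.env}"].invoke
end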

Production seeds create the reference data. Test seeds load production seeds plus any test-specific setup. Development seeds load production seeds plus all the realistic sample data you need to actually work.

The override at the bottom intercepts rails db:seed and routes it to the task matching Rails.env. Run rails db:seed in development and you get the full development data set. CI runs it and gets test seeds. Production gets only reference data. Because rails db:setup and rails db:reset both invoke db:seed, they follow the same routing. You can still call the specific tasks directly if you want to be explicit, but the default does the right thing.
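One gap in the override above: Rake raises an error when asked for a task that doesn't exist, so an environment without its own seed task (staging, say) would fail at db:seed:staging. Here is a minimal sketch of one way to handle that, assuming any unlisted environment should get reference data only. The SEED_FILES table and the seed_files_for helper are hypothetical names, not part of Rails or Rake:

```ruby
require "pathname"

# Hypothetical lookup table: which seed files each environment loads, in order.
SEED_FILES = {
  "production"  => %w[production.rb],
  "test"        => %w[production.rb test.rb],
  "development" => %w[production.rb development.rb]
}.freeze

# Returns the seed file paths for an environment. Unlisted environments
# (staging, for example) fall back to the production list.
def seed_files_for(env, root: Pathname.new("db/seeds"))
  names = SEED_FILES.fetch(env.to_s) { SEED_FILES.fetch("production") }
  names.map { |name| root.join(name) }
end

puts seed_files_for("staging").map(&:to_s).inspect
```

Inside the db:seed task, this could stand in for the string interpolation: seed_files_for(Rails.env, root: Rails.root.join("db/seeds")).each { |file| load file }. Whether an unknown environment should fall back quietly or fail loudly is a judgment call; the fallback is only one option.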

The workflow that changes everything

Here’s what I want from my development environment:

rails db:reset

That’s it. After that command, I have a database with enough realistic data to work on any feature in the app. Users with different roles. Orders in various states. Subscriptions that are active, expired, and canceled. Edge cases I’ve hit before and will hit again.

When my database gets weird—and it will—I don’t debug it. I don’t try to fix the broken record. I nuke it and start fresh. Thirty seconds later I’m back to work with clean data.

This only works if your development seeds are actually good. Most aren’t.

What makes a seeds file good

A good seeds file creates a realistic environment. Not one user with one order, but a small ecosystem that exercises real code paths. Multiple users with different permissions. Records in various states. Enough data that your index pages have something to paginate.

It runs fast. If seeding takes five minutes, you won’t run it. If it takes ten seconds, you’ll run it whenever something feels off. I aim for under thirty seconds on a fresh database.

It’s idempotent where it matters. Running it twice shouldn’t create duplicate reference data. Users and test records can be duplicated—you’re about to nuke them anyway—but your roles, statuses, and categories should use find_or_create_by.

It handles dependencies correctly. If orders need users, and line items need orders, and orders need products, the seeds file creates them in the right order with real associations. No dangling foreign keys, no half-created records.

It works on a fresh database. This sounds obvious, but I’ve seen seeds files that assumed certain records already existed from a production dump. If it doesn’t work right after rails db:create db:schema:load, on an empty database with nothing but the schema, it’s not a seeds file; it’s a prayer.

Structure for sanity

I organize seeds into sections that build on each other. The production seeds handle reference data:

# db/seeds/production.rb

puts "seeding roles and permissions..."
load Rails.root.join("db/seeds/roles.rb")

puts "seeding plans and features..."
load Rails.root.join("db/seeds/plans.rb")

The development seeds add everything else:

# db/seeds/development.rb

puts "seeding demo users..."
load Rails.root.join("db/seeds/users.rb")

puts "seeding sample data..."
load Rails.root.join("db/seeds/sample_data.rb")

Each file handles one concern. The roles file creates admin, member, and guest roles. The plans file creates your subscription tiers. The users file creates a handful of users with different roles and subscription states. The sample data file creates the bulk of the realistic data.

This structure means you can run parts independently when debugging, and the seed files read like a table of contents.
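Since each file is loaded by name, the puts and load pairs can be folded into a small helper that also reports how long each file took, which makes it easier to see which file is eating the time budget. This is a sketch in plain Ruby; seed_step is a hypothetical name, and the block here returns a placeholder where a real seeds file would call load:

```ruby
# Hypothetical helper: announce a seed step, run it, and report how long it took.
def seed_step(label, out: $stdout)
  out.puts "seeding #{label}..."
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  out.puts format("  done in %.2fs", elapsed)
  result
end

# In a Rails seeds file the block would be:
#   load Rails.root.join("db/seeds/roles.rb")
seed_step("roles and permissions") { :loaded }
```

The monotonic clock is used instead of Time.now so the measurement isn't thrown off if the system clock changes mid-run.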

Reference data: find_or_create_by

For data that should exist exactly once—roles, statuses, categories—use find_or_create_by:

# db/seeds/roles.rb

Role.find_or_create_by!(name: "admin") do |role|
  role.permissions = ["manage_users", "manage_billing", "view_reports"]
end

Role.find_or_create_by!(name: "member") do |role|
  role.permissions = ["view_reports"]
end

Role.find_or_create_by!(name: "guest") do |role|
  role.permissions = []
end

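The block only runs on create, so existing records keep their current values. This makes seeds idempotent for reference data: you can run them on a database that already has these records without duplicating anything. The flip side is that editing a permissions list in the seeds file won’t change a role that already exists; you have to update that row separately or reset the database.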

Some people use upsert or upsert_all here. That works too, especially for large amounts of reference data. I prefer find_or_create_by! because it validates and runs callbacks, which catches mistakes early.

Users you’ll actually use

Create users you can log in as:

# db/seeds/users.rb

admin = User.find_or_create_by!(email: "admin@example.com") do |user|
  user.name = "Admin User"
  user.password = "password"
  user.role = Role.find_by!(name: "admin")
end

member = User.find_or_create_by!(email: "member@example.com") do |user|
  user.name = "Regular Member"
  user.password = "password"
  user.role = Role.find_by!(name: "member")
end

expired = User.find_or_create_by!(email: "expired@example.com") do |user|
  user.name = "Expired Subscription"
  user.password = "password"
  user.role = Role.find_by!(name: "member")
  user.subscription_ends_at = 1.month.ago
end

I use password as the password for all seed users. No, it doesn’t matter—this is development data that gets nuked regularly. If someone compromises your development database, you have bigger problems.

The important thing is knowing what users exist and what states they’re in. I keep a list in the seeds file or in a doc, covering every seed user, including ones not shown above:

# Available test users:
# - admin@example.com / password - full admin access
# - member@example.com / password - regular member
# - expired@example.com / password - expired subscription
# - unconfirmed@example.com / password - hasn't confirmed email

When a new developer joins, they know immediately how to log in and what they can test.
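One way to keep that list from drifting away from the code is to generate it from the same data that drives user creation. Below is a minimal sketch in plain Ruby. SEED_USERS, SEED_PASSWORD, and describe_seed_users are hypothetical names, and the entries simply mirror the comment above:

```ruby
# Hypothetical shared password and single source of truth for seed users.
SEED_PASSWORD = "password"

SEED_USERS = [
  { email: "admin@example.com",       note: "full admin access" },
  { email: "member@example.com",      note: "regular member" },
  { email: "expired@example.com",     note: "expired subscription" },
  { email: "unconfirmed@example.com", note: "hasn't confirmed email" }
].freeze

# Builds the "available test users" listing from SEED_USERS.
def describe_seed_users(users = SEED_USERS, password: SEED_PASSWORD)
  users.map { |user| "- #{user[:email]} / #{password} - #{user[:note]}" }.join("\n")
end

puts "Available test users:"
puts describe_seed_users
```

The users file could iterate the same array when calling find_or_create_by!, and printing the listing at the end of a seed run puts the logins in front of whoever just ran it.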

Realistic sample data

This is where most seeds files fall short. They create the minimum viable data and call it done. But you don’t work with minimum viable data—you work with the mess that accumulates over months of real usage.

Create enough data to exercise real code paths:

# db/seeds/sample_data.rb

# assumes users and products were seeded before this file runs
member = User.find_by!(email: "member@example.com")

# orders in various states
5.times do
  order = member.orders.create!(
    status: "pending",
    created_at: rand(30).days.ago
  )
  rand(1..5).times do
    order.line_items.create!(
      product: Product.all.sample,
      quantity: rand(1..3)
    )
  end
end

3.times do
  order = member.orders.create!(
    status: "shipped",
    created_at: rand(60..90).days.ago,
    shipped_at: rand(30..60).days.ago
  )
  rand(1..5).times do
    order.line_items.create!(
      product: Product.all.sample,
      quantity: rand(1..3)
    )
  end
end

# a canceled order
canceled = member.orders.create!(
  status: "canceled",
  created_at: 2.months.ago,
  canceled_at: 2.months.ago + 1.day
)
canceled.line_items.create!(
  product: Product.first,
  quantity: 1
)

Now when you’re working on the orders index page, you have orders to look at. When you’re working on the cancellation flow, there’s a canceled order to inspect. When you’re debugging a date filter, there are orders with realistic timestamps.
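The rand and sample calls above produce different data on every run, which can make a data-dependent bug harder to reproduce on someone else's machine. If you want the same shape of data each time, one option is to draw from a Random instance with a fixed seed. This is a minimal sketch in plain Ruby; the seed value 42, PRODUCT_NAMES, and sample_order_plan are hypothetical stand-ins for the Active Record calls:

```ruby
# Hypothetical stand-in for Product records.
PRODUCT_NAMES = %w[widget gadget gizmo doohickey].freeze

# Plans the line items for one order using a caller-supplied random generator,
# so the same seed always yields the same plan.
def sample_order_plan(rng)
  Array.new(rng.rand(1..5)) do
    { product: PRODUCT_NAMES.sample(random: rng), quantity: rng.rand(1..3) }
  end
end

first_run  = sample_order_plan(Random.new(42))
second_run = sample_order_plan(Random.new(42))
puts first_run == second_run # prints true
```

In a seeds file the same generator would be passed to every rand and sample call. Timestamps built with days.ago still shift with the current date, so this pins the shape of the data, not the exact dates.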

Edge cases you’ve hit before

Every time you hit a weird edge case in production, add it to your seeds. That user who somehow has two active subscriptions? Create one. The order with a negative line item from a bug you fixed six months ago? Seed it.

# edge case: user with overlapping subscriptions (bug from JIRA-1234)
edge_case_user = User.find_or_create_by!(email: "overlap@example.com") do |user|
  user.name = "Overlapping Subscriptions"
  user.password = "password"
  user.role = Role.find_by!(name: "member")
end

edge_case_user.subscriptions.create!(
  plan: Plan.find_by!(name: "basic"),
  starts_at: 2.months.ago,
  ends_at: 1.month.from_now
)
edge_case_user.subscriptions.create!(
  plan: Plan.find_by!(name: "premium"),
  starts_at: 1.month.ago,
  ends_at: 2.months.from_now
)

Your seeds file becomes a living document of the weird states your data can be in. When you’re working on a feature that touches subscriptions, you can test against the overlapping case without setting it up manually.

Speed matters

If seeding is slow, you won’t do it. A few techniques help.

Batch inserts work well for bulk data. If you’re creating a thousand products for realistic pagination testing, use insert_all:

products = 1000.times.map do |i|
  {
    name: "Product #{i}",
    price_cents: rand(100..10000),
    created_at: Time.current,
    updated_at: Time.current
  }
end
Product.insert_all(products)

This skips validations and callbacks, so only use it for bulk data where that’s acceptable.

You can also disable callbacks temporarily. If your User model sends welcome emails on create, you don’t want that during seeding:

User.skip_callback(:create, :after, :send_welcome_email)
begin
  # ... create users ...
ensure
  # restore the callback even if user creation raises
  User.set_callback(:create, :after, :send_welcome_email)
end

Whatever you do, please don’t:

after_create :send_welcome_email, unless: -> { Rails.env.development? && ENV["SEEDING"] }

Finally, avoid N+1 patterns. Load what you need upfront:

products = Product.all.to_a
users = User.where(role: Role.find_by!(name: "member")).to_a

users.each do |user|
  # use products.sample instead of Product.all.sample
end

Factories vs seeds

I use both, and they serve different purposes.

Factories (FactoryBot, Fabrication) are for tests. They create the minimum data needed to exercise a specific code path. They’re designed to be composed and overridden per-test.

Seeds are for development. They create a realistic environment that you work in day-to-day. They’re designed to be run once and give you everything you need.

You can use factories in your seeds:

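# db/seeds/sample_data.rb
# assumes the factory gem is in the Gemfile's development group, not just test,
# and that this support file is safe to load outside the test suite
require Rails.root.join("spec/support/factory_bot")

10.times { FactoryBot.create(:order, :shipped) }
5.times { FactoryBot.create(:order, :canceled) }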

This reuses your factory definitions so you’re not duplicating the logic for creating valid orders. But you’re still curating which states exist and how many, which is the seeds file’s job.

The payoff

Once your seeds are good, the workflow becomes:

  1. Something’s weird with my data? rails db:reset. Fixed.
  2. New developer joins? rails db:setup. They’re ready.
  3. Major migration coming? Test it on seed data first.
  4. Feature branch with data model changes? Rebase, rails db:reset, keep working.

You stop thinking about your development database as a precious artifact that must be preserved. It’s disposable. It’s reproducible. When it causes friction, you throw it away and make a new one.

This sounds like a small thing, but it compounds. Every time you’d have spent ten minutes debugging bad data or thirty minutes helping a new developer get their database into shape, you spend thirty seconds instead. Those minutes add up to hours over a project’s lifetime.

The investment is maintaining your seeds file as you build features. When you add a new model, add seed data for it. When you hit a weird edge case, add it. When the seeds break, fix them immediately—broken seeds are broken infrastructure.

Your seeds file is documentation of what your app’s data looks like. It’s an onboarding tool. It’s a reset button. Invest in it.
