Terraform Best Practices for Multiple Environments

What is Terraform?

Terraform is a vendor-neutral infrastructure as code (IaC) tool created by HashiCorp that enables developers to define, provision, and manage cloud and on-premises infrastructure using declarative configuration files. It uses HashiCorp Configuration Language (HCL) to automate the lifecycle of resources like servers, networks, and databases across providers (e.g., AWS, Azure, GCP). Terraform began as an open-source project; since 2023, HashiCorp has released new versions under the source-available Business Source License.

How to Install Terraform?

HashiCorp distributes Terraform as an executable CLI that you can install on supported operating systems, including Microsoft Windows, macOS, and several Linux distributions. You can also compile the Terraform CLI from source if a pre-compiled binary is not available for your system.

Homebrew is a free and open-source package management system for macOS. If you have Homebrew installed, use it to install Terraform from your command line.

First, install the HashiCorp tap, which is HashiCorp's official repository of all its Homebrew packages.

$ brew tap hashicorp/tap

Now, install Terraform from hashicorp/tap/terraform.

$ brew install hashicorp/tap/terraform

Verify the Installation

Verify that the installation worked by opening a new terminal session and listing Terraform's available subcommands.

$ terraform -help
Usage: terraform [global options] <subcommand> [args]

The available commands for execution are listed below.
The primary workflow commands are given first, followed by
less common or more advanced commands.

Main commands:
##...

Add -help to any Terraform command to learn more about what it does and available options.

$ terraform plan -help

Key Features and Benefits

Infrastructure as Code (IaC): Instead of manual, click-through configuration, infrastructure is defined in version-controlled text files, promoting collaboration and repeatability.

Declarative Approach: Users define the desired end state (e.g., "I need two servers"), and Terraform automatically determines the actions needed to achieve that state, unlike imperative scripting which requires defining specific, manual steps.

Cloud-Agnostic: A single tool and workflow can be used to manage multiple providers, reducing vendor lock-in.

State Management: Terraform maintains a terraform.tfstate file, which tracks the actual, deployed infrastructure, allowing it to detect "drift" and safely update or destroy resources.

Lifecycle Management: Automates the creation, modification, and deletion of infrastructure components.

How Terraform Works

Write: Define infrastructure in .tf configuration files using HCL.
Plan: Run terraform plan to compare the desired configuration with the current state and see a preview of changes.
Apply: Run terraform apply to execute the planned actions and make the actual infrastructure match the configuration.

Key Components

Providers: Plugins (e.g., AWS, Azure, Google Cloud) that communicate with cloud provider APIs.
Modules: Reusable, shared configuration templates that simplify complex, repetitive deployments.

Terraform is widely used in DevOps pipelines to manage complex, multi-cloud, and hybrid environments efficiently and securely.

Terraform Project and Components

Within a single directory (a module), the following file naming conventions are recommended by HashiCorp for clarity:

main.tf: Contains the primary resource and data source definitions, as well as module calls.
variables.tf: Declares input variables with descriptions.
outputs.tf: Declares output values from the resources created.
providers.tf: Configures the required providers and their versions.
locals.tf: Contains local values for cleaner, more readable configurations.
backend.tf: Configures the remote backend for storing the state file securely and enabling collaboration.
terraform.tfvars: An optional file that assigns values to variables for the current environment.

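Terraform automatically loads all .tf files in the working directory and processes them as a single configuration. Variable files are different: only terraform.tfvars, terraform.tfvars.json, and files ending in .auto.tfvars or .auto.tfvars.json are loaded automatically. Any other .tfvars file must be passed explicitly with -var-file.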
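
As a rough illustration of how these files divide responsibilities, the sketch below shows a minimal providers.tf, variables.tf, locals.tf, and outputs.tf for one module. The variable names, tag values, and version constraints are assumptions chosen for the example, not values from a real project.

```hcl
# providers.tf: pin Terraform and provider versions (constraints are illustrative)
terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = var.region
}

# variables.tf: declare inputs with types and descriptions
variable "region" {
  type        = string
  description = "AWS region to deploy into"
  default     = "us-east-1"
}

variable "environment" {
  type        = string
  description = "Environment name, for example dev, staging, or prod"
}

# locals.tf: derive shared values once
locals {
  common_tags = {
    Environment = var.environment
    ManagedBy   = "terraform"
  }
}

# outputs.tf: expose values to callers
output "common_tags" {
  description = "Tags intended for every resource in this configuration"
  value       = local.common_tags
}
```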

Common Project Structures

For multi-environment projects, teams typically use one of two main approaches:

1. Environment-based with consistent components

This approach uses a single set of configuration files for all environments but separates the variable values using specific tfvars files.

terraform/
├── main.tf
├── variables.tf
├── outputs.tf
├── providers.tf
├── backend.tf
├── environments/
│   ├── dev.tfvars
│   ├── staging.tfvars
│   └── prod.tfvars
└── modules/
    ├── network/
    │   ├── main.tf
    │   ├── variables.tf
    │   └── outputs.tf
    └── compute/
        ├── main.tf
        ├── variables.tf
        └── outputs.tf

Usage: You run Terraform commands from the root directory, specifying the environment's variable file (e.g., terraform plan -var-file="environments/dev.tfvars").
Pros: Keeps code DRY (Don't Repeat Yourself) as the main configuration is shared.
Cons: Less flexibility for per-environment customization (e.g., a resource existing in prod but not dev).
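
A minimal sketch of this first approach might look like the following. The variable names and instance sizes are hypothetical; the point is that the declarations are shared while only the values differ per environment.

```hcl
# variables.tf (shared by every environment)
variable "environment" {
  type        = string
  description = "Target environment"

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "environment must be one of: dev, staging, prod."
  }
}

variable "instance_type" {
  type        = string
  description = "EC2 instance type for the compute module"
}

variable "instance_count" {
  type        = number
  description = "Number of instances to run"
  default     = 1
}

# environments/dev.tfvars (illustrative values)
environment    = "dev"
instance_type  = "t3.micro"
instance_count = 1

# environments/prod.tfvars (illustrative values)
environment    = "prod"
instance_type  = "m5.large"
instance_count = 3
```

Because backend.tf is also shared in this layout, each environment still needs its own state. One common option, assuming an S3 backend, is partial backend configuration at init time, for example terraform init -reconfigure -backend-config="key=dev/terraform.tfstate".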

2. Environment-based with flexible components per environment

This structure provides complete isolation by using separate directories for each environment, with its own full set of .tf files and dedicated state file.

terraform/
├── modules/
│   ├── network/
│   └── compute/
├── dev/
│   ├── main.tf
│   ├── variables.tf
│   ├── terraform.tfvars
│   └── backend.tf
├── staging/
│   ├── main.tf
│   ├── variables.tf
│   ├── terraform.tfvars
│   └── backend.tf
└── prod/
    ├── main.tf
    ├── variables.tf
    ├── terraform.tfvars
    └── backend.tf

Usage: You navigate into the specific environment's directory to run commands (e.g., cd terraform/dev; terraform apply).
Pros: Limits the "blast radius" of changes to a single environment and allows for vastly different configurations or even different providers.
Cons: Involves code duplication, requiring more effort to manage consistency across environments.
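
To keep the duplication manageable, each environment directory can stay thin and delegate to the shared modules. The sketch below assumes hypothetical module inputs (cidr_block, subnet_ids, and so on) and a hypothetical private_subnet_ids output from the network module.

```hcl
# dev/main.tf
module "network" {
  source = "../modules/network"

  environment = "dev"
  cidr_block  = "10.10.0.0/16"
}

module "compute" {
  source = "../modules/compute"

  environment    = "dev"
  subnet_ids     = module.network.private_subnet_ids
  instance_type  = "t3.micro"
  instance_count = 1
}

# dev/backend.tf: each environment points at its own state key
terraform {
  backend "s3" {
    bucket = "org-terraform-state"
    key    = "dev/terraform.tfstate"
    region = "us-east-1"
  }
}
```

With this shape, prod/main.tf differs mostly in the values it passes, and can pin a different module version or add resources that dev does not need.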

Important Directories and Files to Ignore

When using version control (like Git), you must exclude certain files to prevent sensitive data exposure or conflicts:

.terraform/: A hidden directory used by Terraform to store cached plugins, modules, and other metadata. This is automatically managed by Terraform.
*.tfstate: The local state file. In real-world scenarios, remote backends should be used, but this file should still be ignored locally.
*.tfstate.backup: Backup state files created during operations.
*.tfvars or *.tfvars.json: Ignore any variable files that contain sensitive values. Non-secret, per-environment files (such as environments/dev.tfvars in the first structure above) can be committed; secrets should come from environment variables or a secure vault.
.terraform.lock.hcl: The exception to this list. The dependency lock file should be committed, not ignored, to ensure consistent provider versions across runs.

The rest of this guide refines these practices into a production-grade reference architecture for enterprise teams, including cross-account access using AssumeRole on Amazon Web Services.

Enterprise Terraform Operating Model (Large Teams)

1. Remote State Is Non-Negotiable

Use a centralized backend to ensure:

Single source of truth
State locking (prevents concurrent applies)
Auditability
Disaster recovery via versioning

Recommended Backend (AWS Example)

S3 Bucket → State storage (versioned + encrypted)
DynamoDB Table → State locking (Terraform 1.10 and later can instead use S3-native lock files via use_lockfile)
KMS Key → Encryption
Bucket Policy → Restricted to CI roles only

Backend Configuration

terraform {
  backend "s3" {
    bucket         = "org-terraform-state"
    key            = "network/prod.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}

2. Multi-Account Strategy (Real Enterprise Pattern)

Large organizations rarely deploy everything from one AWS account, because separate accounts give hard isolation for permissions, billing, and blast radius.

Recommended Layout

AWS Organization
│
├── Shared Services Account
│     ├── Terraform State Bucket
│     ├── CI/CD Runners
│     └── Logging / Security Tools
│
├── Network Account
│     └── VPC / Transit Gateway
│
├── Dev Account
├── Staging Account
└── Prod Account

Terraform runs from the Shared Services account and assumes roles into the target accounts.

3. Cross-Account Access Using AssumeRole (Critical)

Instead of storing credentials, Terraform uses STS AssumeRole.

Provider Configuration

provider "aws" {
  region = "us-east-1"

  assume_role {
    role_arn = "arn:aws:iam::123456789012:role/TerraformExecutionRole"
  }
}
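
When one configuration has to reach several accounts, provider aliases are a common way to do it. The following is a sketch under assumptions: the account IDs, session names, and the modules/network path are placeholders, not values from a real organization.

```hcl
# One aliased provider per target account (account IDs are placeholders)
provider "aws" {
  alias  = "dev"
  region = "us-east-1"

  assume_role {
    role_arn     = "arn:aws:iam::111111111111:role/TerraformExecutionRole"
    session_name = "terraform-dev"
  }
}

provider "aws" {
  alias  = "prod"
  region = "us-east-1"

  assume_role {
    role_arn     = "arn:aws:iam::222222222222:role/TerraformExecutionRole"
    session_name = "terraform-prod"
  }
}

# Pass the chosen provider into a module explicitly
module "dev_network" {
  source = "./modules/network"

  providers = {
    aws = aws.dev
  }
}
```

Setting session_name makes the resulting CloudTrail entries easier to attribute to a specific pipeline or environment.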

Target Account IAM Role

Allow only the CI system to assume:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::SHARED_ACCOUNT_ID:role/ci-runner"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Benefits:

No static credentials
Full audit trail (CloudTrail)
Easy revocation
Works across 100+ accounts
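
The trust policy above can itself be managed in Terraform. This is a minimal sketch, assuming a hypothetical shared_account_id variable; the permissions attached to the role are deliberately omitted and would need to be scoped to what the pipeline actually deploys.

```hcl
variable "shared_account_id" {
  type        = string
  description = "Account ID of the Shared Services account that hosts the CI runner"
}

# Trust policy: only the CI runner role may assume this role
data "aws_iam_policy_document" "trust" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::${var.shared_account_id}:role/ci-runner"]
    }
  }
}

resource "aws_iam_role" "terraform_execution" {
  name                 = "TerraformExecutionRole"
  assume_role_policy   = data.aws_iam_policy_document.trust.json
  max_session_duration = 3600 # seconds; illustrative value
}
```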

4. State Separation Strategy (Avoid Monolithic State)

Never keep everything in one state file.

Split by Responsibility

Stack      Owns
network    VPC, subnets, routing
security   IAM, guardrails
platform   EKS, databases
apps       Application resources

This reduces:

Apply time
Blast radius
Merge conflicts
Risk during failure
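
Once state is split, stacks still need to share a few values, such as the VPC ID. One way to do this is the terraform_remote_state data source, sketched below. The output names are hypothetical, and aws_vpc.main and aws_subnet.private are assumed to be defined elsewhere in the network stack.

```hcl
# network stack, outputs.tf: publish only what other stacks need
output "vpc_id" {
  value = aws_vpc.main.id
}

output "private_subnet_ids" {
  value = aws_subnet.private[*].id
}

# platform stack, data.tf: read the network stack's outputs from its state
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "org-terraform-state"
    key    = "network/prod.tfstate"
    region = "us-east-1"
  }
}

locals {
  vpc_id             = data.terraform_remote_state.network.outputs.vpc_id
  private_subnet_ids = data.terraform_remote_state.network.outputs.private_subnet_ids
}
```

Reading remote state requires read access to the whole state object, so some teams prefer publishing shared values through a parameter store instead.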

5. Mandatory Modular Design

Reusable modules must be versioned like software.

modules/
 ├── vpc/
 ├── eks/
 ├── rds/
 └── observability/

Each module:

Has inputs/outputs only
Contains no environment logic
Is version tagged (Git release)
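
Consuming a tagged module release might look like the sketch below. The Git URL is a placeholder and the input names are assumptions; the relevant part is the ?ref= query, which pins the caller to one release.

```hcl
module "vpc" {
  # "//vpc" selects the vpc/ subdirectory; "?ref=" pins the Git tag
  source = "git::https://example.com/org/terraform-modules.git//vpc?ref=v1.2.0"

  # Hypothetical inputs; environment-specific values live in the caller
  name       = "prod-core"
  cidr_block = "10.0.0.0/16"
}
```

Upgrading then becomes an explicit, reviewable change to the ref rather than an accidental side effect of someone editing the module.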

6. CI/CD-Only Execution Model

Humans must never run terraform apply locally.

Pipeline should:

terraform fmt        → formatting gate
terraform validate   → syntax gate
security scan        → checkov / terrascan
terraform plan       → PR visibility
manual approval      → required
terraform apply      → controlled execution

This enforces:

Change traceability
Peer review culture
Compliance alignment

7. Secrets Must Come From Secret Managers

Never allow this:

password = "Hardcoded123"

Instead:

data "aws_secretsmanager_secret_version" "db" {
  secret_id = "prod/db/password"
}

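Note that values read through a data source are still written to the Terraform state, so the state must be encrypted and access to it tightly restricted. Terraform 1.10 and later also offer ephemeral resources, whose values are not persisted to state.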
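
A sketch of how the looked-up secret might be consumed is shown below. The database settings are illustrative assumptions, and the secret is assumed to hold the plain password string rather than a JSON document.

```hcl
data "aws_secretsmanager_secret_version" "db" {
  secret_id = "prod/db/password"
}

resource "aws_db_instance" "main" {
  identifier        = "prod-db"
  engine            = "postgres"
  instance_class    = "db.t3.medium"
  allocated_storage = 20
  username          = "app"

  # The value comes from Secrets Manager, not from the repository.
  # It is still recorded in state, so protect the state backend.
  password = data.aws_secretsmanager_secret_version.db.secret_string
}
```

For RDS specifically, setting manage_master_user_password = true instead of password lets AWS generate and store the credential in Secrets Manager, which avoids passing the secret through Terraform.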

8. Policy & Governance Layer

Large teams enforce guardrails using policy-as-code tools such as HashiCorp Sentinel or Open Policy Agent (OPA).

Example Rules:

Deny public S3 buckets
Enforce tagging
Restrict instance sizes
Block unapproved regions

This converts Terraform into a governed platform, not just IaC.
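
Some of these rules can also be approximated inside the configuration itself, which complements rather than replaces a Sentinel or OPA layer. The sketch below uses variable validation, provider default tags, and an S3 public access block; the approved regions, instance types, tag values, and bucket name are all assumptions.

```hcl
# Block unapproved regions
variable "region" {
  type = string

  validation {
    condition     = contains(["us-east-1", "eu-west-1"], var.region)
    error_message = "region must be one of the approved regions: us-east-1, eu-west-1."
  }
}

# Restrict instance sizes
variable "instance_type" {
  type = string

  validation {
    condition     = contains(["t3.micro", "t3.small", "t3.medium"], var.instance_type)
    error_message = "instance_type must be t3.micro, t3.small, or t3.medium."
  }
}

# Enforce tagging on every taggable resource
provider "aws" {
  region = var.region

  default_tags {
    tags = {
      Owner     = "platform-team"
      ManagedBy = "terraform"
    }
  }
}

# Deny public access on a bucket
resource "aws_s3_bucket" "data" {
  bucket = "example-org-data"
}

resource "aws_s3_bucket_public_access_block" "data" {
  bucket                  = aws_s3_bucket.data.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
```

Checks like these only cover code that opts into them, which is why a central policy engine evaluating every plan remains the stronger control.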

9. Observability & Drift Detection

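Add scheduled jobs that run a plan against live infrastructure. With -detailed-exitcode, exit code 0 means no changes, 1 means an error, and 2 means the plan found differences (possible drift):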

terraform plan -detailed-exitcode

Detects:

Manual infra changes
Security drift
Cost leaks

10. Reference Architecture (What “Good” Looks Like)

                Developers
                    │
              Pull Request
                    │
                CI Pipeline
                    │
       ┌───────────────────────────┐
       │ Terraform Runner (Shared) │
       └───────────────────────────┘
                    │ AssumeRole
   ┌───────────────┼────────────────┐
   │               │                │
Network Account  Dev Account     Prod Account
   │               │                │
   └──── Remote State (S3 + Locking)┘

What Interviewers Want to Hear

If you explain Terraform for large teams, they expect these keywords:

Remote state with locking
Cross-account AssumeRole model
CI-driven applies (no local execution)
State isolation by stack
Versioned reusable modules
Policy enforcement
Secret externalization
Drift detection strategy

The following sections provide an enterprise-ready reference you can adapt for a real implementation.

1. Reference Architecture Diagram + Repo Structure

Enterprise Terraform Architecture (Multi-Account Model)

                         Developers
                             │
                      Pull Request (Review)
                             │
                     CI/CD Pipeline Runner
                             │
                   (Assumes Deployment Role)
                             │
        ┌────────────────────────────────────────┐
        │        Shared Services Account         │
        │  - Remote State (S3)                   │
        │  - Lock Table (DynamoDB)               │
        │  - CI Runner / Audit Logs              │
        └────────────────────────────────────────┘
                   │            │             │
        AssumeRole │            │             │
                   ▼            ▼             ▼
            Network Account   Dev Account   Prod Account
            (VPC / TGW)       (Apps)        (Live Infra)

Key Idea:
Terraform runs centrally and assumes roles into each environment — no credentials stored anywhere.

Used heavily in enterprises running on Amazon Web Services.

Recommended Repository Layout

terraform-platform/
│
├── modules/                     # Reusable building blocks
│   ├── vpc/
│   ├── eks/
│   ├── rds/
│   └── iam/
│
├── live/                        # Environment-specific configs
│   ├── dev/
│   │   ├── network/
│   │   └── app/
│   │
│   ├── staging/
│   └── prod/
│
├── global/
│   └── backend-bootstrap/       # Creates S3 + locking table
│
└── policies/                    # Security guardrails

Module Design Principles

Each module must:

Be stateless
Accept only variables
Never reference environments
Be version tagged (v1.2.0)
Be reusable across accounts

Remote State Bootstrap (Run Once)

You create the backend first, then everything else consumes it.

This prevents circular dependency problems — a detail many candidates miss.
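
A bootstrap configuration for global/backend-bootstrap could be sketched as follows. It reuses the bucket and table names from the backend example earlier; the KMS choice (the AWS-managed key) is an assumption, and a customer-managed key may be preferable.

```hcl
resource "aws_s3_bucket" "state" {
  bucket = "org-terraform-state"
}

# Versioning allows recovery of earlier state files
resource "aws_s3_bucket_versioning" "state" {
  bucket = aws_s3_bucket.state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Encrypt state at rest (uses the AWS-managed KMS key unless one is specified)
resource "aws_s3_bucket_server_side_encryption_configuration" "state" {
  bucket = aws_s3_bucket.state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

resource "aws_s3_bucket_public_access_block" "state" {
  bucket                  = aws_s3_bucket.state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Lock table; the S3 backend expects a string hash key named LockID
resource "aws_dynamodb_table" "locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

The bootstrap stack itself starts with local state, since the bucket does not exist yet. After the first apply, a backend block can be added and the state moved with terraform init -migrate-state.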

2. CI/CD Pipeline Example (Production Safe)

Terraform should always run via automation like GitHub Actions, not laptops.

Deployment Workflow

Developer Change → PR → Plan → Approval → Apply → Audit Log

Example Pipeline (terraform.yml)

name: Terraform Deploy

on:
  pull_request:
    branches: [ main ]

jobs:
  terraform-plan:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3

      - name: Terraform Init
        run: terraform init

      - name: Terraform Format Check
        run: terraform fmt -check

      - name: Terraform Validate
        run: terraform validate

      - name: Security Scan
        run: |
          pip install checkov
          checkov -d .

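      - name: Terraform Plan
        run: terraform plan -out=tfplan

      # Hand the saved plan to the apply job, which runs on a separate runner
      - name: Upload Plan
        uses: actions/upload-artifact@v4
        with:
          name: tfplan
          path: tfplan

Apply Stage (After Manual Approval)

  terraform-apply:
    needs: terraform-plan
    runs-on: ubuntu-latest
    environment: production

    steps:
      - uses: actions/checkout@v4

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3

      - name: Terraform Init
        run: terraform init

      # Fetch the exact plan that was reviewed
      - name: Download Plan
        uses: actions/download-artifact@v4
        with:
          name: tfplan

      # A saved plan applies without an interactive prompt
      - name: Terraform Apply
        run: terraform apply tfplan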

Why This Matters

This ensures:

No uncontrolled changes
Full audit trail
RBAC via CI permissions
Zero credential leakage
Repeatable deployments

3. Migration Roadmap (Manual Infra → Terraform at Scale)

This is the real-world transformation plan companies expect senior engineers to know.

Phase 1 — Discovery (Do NOT Write Code Yet)

Inventory everything:

VPCs
Databases
IAM roles
Clusters
DNS
Secrets

Use read-only access to map dependencies.

Goal: Avoid breaking hidden integrations.

Phase 2 — Establish Terraform Foundation

Create only:

Remote backend
CI pipeline
IAM deployment roles

No resources managed yet.

This creates the “landing zone”.

Phase 3 — Import Existing Infrastructure

Bring resources under Terraform control safely:

terraform import aws_vpc.main vpc-xxxx
terraform import aws_db_instance.prod db-xxxx

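The terraform import command requires a resource block for each address (for example, resource "aws_vpc" "main") to already exist in the configuration, even if it is initially minimal. After importing, fill in the configuration until terraform plan reports no changes.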

Golden Rule: Import first, modify later.
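
On Terraform 1.5 or later, the same import can be declared in configuration with an import block, which makes it reviewable in a pull request and runnable through the pipeline. The sketch below keeps the placeholder ID from the command above; the cidr_block and tag values are assumptions and must match the real VPC.

```hcl
# Declarative import: processed during plan and apply
import {
  to = aws_vpc.main
  id = "vpc-xxxx"
}

# The configuration should describe the resource as it exists today
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16" # assumption: replace with the real CIDR

  tags = {
    Name = "main"
  }
}
```

If writing the matching resource by hand is tedious, terraform plan -generate-config-out=generated.tf can draft it from the import block for later cleanup. Once the import has been applied, the import block can be removed.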

Phase 4 — Modularization

Refactor imported configs into modules:

Before:

5000-line main.tf ❌

After:

modules/network
modules/data
modules/compute
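
Moving an imported resource into a module changes its address, which Terraform would otherwise treat as a destroy and recreate. On Terraform 1.1 or later, a moved block records the rename instead. This sketch assumes the VPC from the import phase now lives inside a hypothetical modules/network module under the same resource name.

```hcl
module "network" {
  source = "./modules/network"
}

# Tell Terraform the existing VPC now lives inside the module
moved {
  from = aws_vpc.main
  to   = module.network.aws_vpc.main
}
```

A plan run after this change should report the resource as moved, with nothing to add or destroy; any other result is a signal to stop before applying.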

Phase 5 — Introduce Guardrails

Add:

Policy checks (block public exposure)
Drift detection jobs
Cost visibility tags
Change approval workflow

Now Terraform becomes a governance system, not just IaC.

Phase 6 — Gradual Ownership Transition

Teams move from:

ClickOps → Controlled IaC → Self-Service Platform

Application teams consume modules instead of writing infra.

Common Migration Failure (Interview Trick Question)

Many teams try:

“Rewrite everything in Terraform.”

That causes outages.

Correct strategy:

Adopt → Import → Stabilize → Improve

How to Explain (30-Second Answer)

“For large-scale Terraform adoption we centralize remote state with locking, run all applies through CI using cross-account roles, split infrastructure into isolated stacks to limit blast radius, and migrate existing environments via import before modularizing into reusable, versioned components.”

Need help with system design using Terraform on AWS / GCP / Azure / DO, etc.?

https://kubeify.com/schedule-meeting