Terraform Best Practices for Multiple Environments


What is Terraform?

Terraform is a vendor-neutral infrastructure as code (IaC) tool created by HashiCorp that enables developers to define, provision, and manage cloud and on-premises infrastructure using declarative configuration files. It was originally released as open source; since version 1.6 it has been distributed under the Business Source License. It uses HashiCorp Configuration Language (HCL) to automate the lifecycle of resources like servers, networks, and databases across providers (e.g., AWS, Azure, GCP).


How to Install Terraform?

HashiCorp distributes Terraform as an executable CLI that you can install on supported operating systems, including Microsoft Windows, macOS, and several Linux distributions. You can also compile the Terraform CLI from source if a pre-compiled binary is not available for your system.

Homebrew is a free and open-source package management system for macOS. If you have Homebrew installed, use it to install Terraform from your command line.

First, install the HashiCorp tap, which is HashiCorp's official repository of its Homebrew packages.

$ brew tap hashicorp/tap

Now, install Terraform from hashicorp/tap/terraform.

$ brew install hashicorp/tap/terraform

Verify the Installation

Verify that the installation worked by opening a new terminal session and listing Terraform's available subcommands.

$ terraform -help
Usage: terraform [global options] <subcommand> [args]

The available commands for execution are listed below.
The primary workflow commands are given first, followed by
less common or more advanced commands.

Main commands:
##...

Add -help to any Terraform command to learn more about what it does and available options.

$ terraform plan -help


Key Features and Benefits
  • Infrastructure as Code (IaC): Instead of manual configuration, infrastructure is defined in version-controlled text files, promoting collaboration and repeatability.
  • Declarative Approach: Users define the desired end state (e.g., "I need two servers"), and Terraform automatically determines the actions needed to achieve that state, unlike imperative scripting which requires defining specific, manual steps.
  • Cloud-Agnostic: A single tool and workflow can be used to manage multiple providers, reducing vendor lock-in.
  • State Management: Terraform maintains a terraform.tfstate file, which tracks the actual, deployed infrastructure, allowing it to detect "drift" and safely update or destroy resources.
  • Lifecycle Management: Automates the creation, modification, and deletion of infrastructure components.

How Terraform Works
  1. Write: Define infrastructure in .tf configuration files using HCL.
  2. Plan: Run terraform plan to compare the desired configuration with the current state and see a preview of changes.
  3. Apply: Run terraform apply to execute the planned actions and make the actual infrastructure match the configuration.

Key Components
  • Providers: Plugins (e.g., AWS, Azure, Google Cloud) that communicate with cloud provider APIs.
  • Modules: Reusable, shared configuration templates that simplify complex, repetitive deployments.

Terraform is widely used in DevOps pipelines to manage complex, multi-cloud, and hybrid environments efficiently and securely.


Terraform Project Structure and Components
    Within a single directory (a module), the following file naming conventions are recommended by HashiCorp for clarity:
    • main.tf: Contains the primary resource and data source definitions, as well as module calls.
    • variables.tf: Declares input variables with descriptions.
    • outputs.tf: Declares output values from the resources created.
    • providers.tf: Configures the required providers and their versions.
    • locals.tf: Contains local values for cleaner, more readable configurations.
    • backend.tf: Configures the remote backend for storing the state file securely and enabling collaboration.
    • terraform.tfvars: An optional file that assigns values to variables for the current environment.
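    Terraform automatically loads all .tf files in the working directory and processes them as a single configuration. Of the variable files, only terraform.tfvars, terraform.tfvars.json, and files ending in .auto.tfvars or .auto.tfvars.json are loaded automatically; any other .tfvars file must be passed explicitly with -var-file.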
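
As a rough illustration of how these files divide responsibilities, the sketch below shows minimal contents for several of them. The version constraints, the aws_region variable, and the tag values are assumptions chosen for the example, not requirements.

```hcl
# providers.tf: pin Terraform and provider versions
terraform {
  required_version = ">= 1.5.0" # assumed minimum, adjust to your tooling

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # assumed provider series
    }
  }
}

provider "aws" {
  region = var.aws_region
}

# variables.tf: declare inputs with descriptions
variable "aws_region" {
  description = "AWS region to deploy into."
  type        = string
  default     = "us-east-1"
}

# locals.tf: values reused across resources
locals {
  common_tags = {
    ManagedBy = "terraform"
  }
}

# outputs.tf: expose values to callers and other stacks
output "region" {
  description = "Region this configuration targets."
  value       = var.aws_region
}
```

Terraform treats all of these as one configuration, so the split is purely for human readers.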
    Common Project Structures
    For multi-environment projects, teams typically use one of two main approaches:
    1. Environment-based with consistent components
    This approach uses a single set of configuration files for all environments but separates the variable values using specific tfvars files.
    terraform/
    ├── main.tf
    ├── variables.tf
    ├── outputs.tf
    ├── providers.tf
    ├── backend.tf
    ├── environments/
    │   ├── dev.tfvars
    │   ├── staging.tfvars
    │   └── prod.tfvars
    └── modules/
        ├── network/
        │   ├── main.tf
        │   ├── variables.tf
        │   └── outputs.tf
        └── compute/
            ├── main.tf
            ├── variables.tf
            └── outputs.tf
    
    • Usage: You run Terraform commands from the root directory, specifying the environment's variable file (e.g., terraform plan -var-file="environments/dev.tfvars").
    • Pros: Keeps code DRY (Don't Repeat Yourself) as the main configuration is shared.
    • Cons: Less flexibility for per-environment customization (e.g., a resource existing in prod but not dev).
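
A minimal sketch of this layout follows. The variable names, the compute module's instance_type input, and the enable_bastion toggle are assumptions made for illustration. The count expression is one common way to soften the limitation noted in the Cons bullet, since it lets a component exist only in selected environments.

```hcl
# variables.tf (shared by every environment)
variable "environment" {
  description = "Environment name, for example dev, staging, or prod."
  type        = string
}

variable "instance_type" {
  description = "Instance size passed to the compute module."
  type        = string
}

variable "enable_bastion" {
  description = "Hypothetical toggle for a component only some environments need."
  type        = bool
  default     = false
}

# main.tf
module "compute" {
  source        = "./modules/compute"
  instance_type = var.instance_type # assumes the module declares this input
}

module "bastion" {
  source = "./modules/compute"
  count  = var.enable_bastion ? 1 : 0 # created only where the toggle is true

  instance_type = "t3.micro"
}

# environments/dev.tfvars would then contain:
#   environment   = "dev"
#   instance_type = "t3.micro"
#
# environments/prod.tfvars would then contain:
#   environment    = "prod"
#   instance_type  = "m5.large"
#   enable_bastion = true
```

Because backend.tf is also shared in this layout, each environment still needs its own state location, for example by passing a different key through terraform init -backend-config.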
    2. Environment-based with flexible components per environment
    This structure provides complete isolation by using a separate directory for each environment, each with its own full set of .tf files and a dedicated state file.
    terraform/
    ├── modules/
    │   ├── network/
    │   └── compute/
    ├── dev/
    │   ├── main.tf
    │   ├── variables.tf
    │   ├── terraform.tfvars
    │   └── backend.tf
    ├── staging/
    │   ├── main.tf
    │   ├── variables.tf
    │   ├── terraform.tfvars
    │   └── backend.tf
    └── prod/
        ├── main.tf
        ├── variables.tf
        ├── terraform.tfvars
        └── backend.tf
    
    • Usage: You navigate into the specific environment's directory to run commands (e.g., cd terraform/dev; terraform apply).
    • Pros: Limits the "blast radius" of changes to a single environment and allows for vastly different configurations or even different providers.
    • Cons: Involves code duplication, requiring more effort to manage consistency across environments.
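
One way to keep the duplication small is to make each environment directory a thin wrapper around the shared modules. The sketch below shows what dev/main.tf might look like; the module inputs (cidr_block, subnet_ids, instance_type) and the private_subnet_ids output are hypothetical names that the modules would have to declare.

```hcl
# dev/main.tf: environment-specific wiring only; logic lives in ../modules
module "network" {
  source     = "../modules/network"
  cidr_block = "10.10.0.0/16" # hypothetical input, dev address range
}

module "compute" {
  source        = "../modules/compute"
  subnet_ids    = module.network.private_subnet_ids # hypothetical module output
  instance_type = "t3.micro"                        # small size for dev
}
```

The prod directory would call the same modules with different values, so a change to a module is written once and rolled out one environment at a time.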
    Important Directories and Files to Ignore
    When using version control (like Git), you must exclude certain files to prevent sensitive data exposure or conflicts:
    • .terraform/: A hidden directory used by Terraform to store cached plugins, modules, and other metadata. This is automatically managed by Terraform.
    • *.tfstate: The local state file. In real-world scenarios, remote backends should be used, but this file should still be ignored locally.
    • *.tfstate.backup: Backup state files created during operations.
    • *.tfvars or *.tfvars.json: Ignore these when they contain sensitive values; use environment variables or a secure vault to manage secrets. Variable files that hold only non-secret settings (such as environments/dev.tfvars above) can be committed.
    • .terraform.lock.hcl (exception, do not ignore): The dependency lock file should be committed to ensure consistent provider versions across runs.

    The sections below describe a production-grade reference architecture for running Terraform in enterprise environments, including cross-account access using AssumeRole on Amazon Web Services.


    Enterprise Terraform Operating Model (Large Teams)

    1. Remote State Is Non-Negotiable

    Use a centralized backend to ensure:

    • Single source of truth

    • State locking (prevents concurrent applies)

    • Auditability

    • Disaster recovery via versioning

    Recommended Backend (AWS Example)

    • S3 Bucket → State storage (versioned + encrypted)

    • DynamoDB Table → State locking

    • KMS Key → Encryption

    • Bucket Policy → Restricted to CI roles only

    Backend Configuration

    terraform {
      backend "s3" {
        bucket         = "org-terraform-state"
        key            = "network/prod.tfstate"
        region         = "us-east-1"
        dynamodb_table = "terraform-locks"
        encrypt        = true
      }
    }
    

    2. Multi-Account Strategy (Real Enterprise Pattern)

    Large organizations rarely deploy everything from a single AWS account.

    Recommended Layout

    AWS Organization
    │
    ├── Shared Services Account
    │     ├── Terraform State Bucket
    │     ├── CI/CD Runners
    │     └── Logging / Security Tools
    │
    ├── Network Account
    │     └── VPC / Transit Gateway
    │
    ├── Dev Account
    ├── Staging Account
    └── Prod Account
    

    Terraform runs from the Shared Services account and assumes roles into the target accounts.


    3. Cross-Account Access Using AssumeRole (Critical)

    Instead of storing credentials, Terraform uses STS AssumeRole.

    Provider Configuration

    provider "aws" {
      region = "us-east-1"
    
      assume_role {
        role_arn = "arn:aws:iam::123456789012:role/TerraformExecutionRole"
      }
    }
    

    Target Account IAM Role

    Allow only the CI system to assume:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": "arn:aws:iam::SHARED_ACCOUNT_ID:role/ci-runner"
          },
          "Action": "sts:AssumeRole"
        }
      ]
    }
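
The same trust policy can also be managed in Terraform itself. The following is a sketch that assumes a shared_account_id variable (hypothetical) and reuses the TerraformExecutionRole and ci-runner names from the examples above; the permissions attached to the role are omitted.

```hcl
variable "shared_account_id" {
  description = "Account ID of the Shared Services account (hypothetical input)."
  type        = string
}

# Trust policy: only the CI runner role in the shared account may assume this role
data "aws_iam_policy_document" "trust" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::${var.shared_account_id}:role/ci-runner"]
    }
  }
}

resource "aws_iam_role" "terraform_execution" {
  name               = "TerraformExecutionRole"
  assume_role_policy = data.aws_iam_policy_document.trust.json
}
```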
    

    Benefits:

    • No static credentials

    • Full audit trail (CloudTrail)

    • Easy revocation

    • Works across 100+ accounts
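
When one configuration needs to reach several accounts, provider aliases are a common approach. The sketch below assumes two placeholder account IDs and a local network module; none of these values come from a real setup.

```hcl
# One aliased provider per target account, each assuming that account's role
provider "aws" {
  alias  = "dev"
  region = "us-east-1"

  assume_role {
    role_arn     = "arn:aws:iam::111111111111:role/TerraformExecutionRole" # placeholder account ID
    session_name = "terraform-ci" # appears in CloudTrail for auditing
  }
}

provider "aws" {
  alias  = "prod"
  region = "us-east-1"

  assume_role {
    role_arn     = "arn:aws:iam::222222222222:role/TerraformExecutionRole" # placeholder account ID
    session_name = "terraform-ci"
  }
}

# Pass the aliased provider to a module to choose the account it deploys into
module "network_prod" {
  source = "./modules/network" # hypothetical module path

  providers = {
    aws = aws.prod
  }
}
```

With separate state per environment, it is often simpler to keep a single provider block per stack and vary only the role ARN through a variable.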


    4. State Separation Strategy (Avoid Monolithic State)

    Never keep everything in one state file.

    Split by Responsibility

    Stack      Owns
    network    VPC, subnets, routing
    security   IAM, guardrails
    platform   EKS, databases
    apps       Application resources

    This reduces:

    • Apply time

    • Blast radius

    • Merge conflicts

    • Risk during failure
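
Separate stacks still need to share values such as subnet IDs. One option is the terraform_remote_state data source, sketched below for an apps stack reading from the network stack. The bucket and key reuse the backend example above, and private_subnet_ids is a hypothetical output that the network stack would have to declare.

```hcl
# In the apps stack: read the outputs published by the network stack
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "org-terraform-state"
    key    = "network/prod.tfstate"
    region = "us-east-1"
  }
}

locals {
  # Available only if the network stack declares an output with this name
  private_subnet_ids = data.terraform_remote_state.network.outputs.private_subnet_ids
}
```

This gives the apps stack read access to the whole network state file, so some teams prefer publishing shared values through a parameter store instead.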


    5. Mandatory Modular Design

    Reusable modules must be versioned like software.

    modules/
     ├── vpc/
     ├── eks/
     ├── rds/
     └── observability/
    

    Each module:

    • Has inputs/outputs only

    • Contains no environment logic

    • Is version tagged (Git release)
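
Consuming a tagged module might look like the sketch below. The repository URL and the cidr_block input are placeholders; the ?ref= suffix is what pins the call to a Git tag.

```hcl
module "vpc" {
  # Placeholder repository; "//vpc" selects the subdirectory, "ref" pins the release tag
  source = "git::https://example.com/org/terraform-modules.git//vpc?ref=v1.2.0"

  cidr_block = "10.0.0.0/16" # hypothetical module input
}
```

Upgrading then becomes an explicit, reviewable change to the ref value in each environment.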


    6. CI/CD-Only Execution Model

    Humans must never run terraform apply locally.

    Pipeline should:

    terraform fmt        → formatting gate
    terraform validate   → syntax gate
    security scan        → checkov / terrascan
    terraform plan       → PR visibility
    manual approval      → required
    terraform apply      → controlled execution
    

    This enforces:

    • Change traceability

    • Peer review culture

    • Compliance alignment


    7. Secrets Must Come From Secret Managers

    Never allow this:

    password = "Hardcoded123"
    

    Instead:

    data "aws_secretsmanager_secret_version" "db" {
      secret_id = "prod/db/password"
    }
    

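    This keeps secrets out of source code. Values read through a data source are still recorded in the state file, however, so the state backend must be encrypted and tightly access-controlled.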
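
To show where the looked-up value is used, here is a sketch that feeds it into a database resource. The identifier, engine, sizing, and username are illustrative assumptions. Note that the password still ends up in the Terraform state, which is why the state backend needs encryption and restricted access.

```hcl
data "aws_secretsmanager_secret_version" "db" {
  secret_id = "prod/db/password"
}

resource "aws_db_instance" "main" {
  identifier        = "prod-db"     # illustrative values below
  engine            = "postgres"
  instance_class    = "db.t3.micro"
  allocated_storage = 20
  username          = "app"

  # Read at plan/apply time; never written in the .tf files
  password = data.aws_secretsmanager_secret_version.db.secret_string
}
```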


    8. Policy & Governance Layer

    Large teams enforce guardrails using policy-as-code tools such as HashiCorp Sentinel or Open Policy Agent (OPA).

    Example Rules:

    • Deny public S3 buckets

    • Enforce tagging

    • Restrict instance sizes

    • Block unapproved regions

    This converts Terraform into a governed platform, not just IaC.
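
Sentinel and OPA policies are written in their own languages, but some of the rules listed above can also be approximated inside modules with plain HCL. The sketch below is a complementary check, not a replacement for a policy engine, and the allowed sizes, regions, and tag values are assumptions.

```hcl
# Restrict instance sizes at the module boundary
variable "instance_type" {
  type = string

  validation {
    condition     = contains(["t3.micro", "t3.small", "t3.medium"], var.instance_type)
    error_message = "instance_type must be one of t3.micro, t3.small, or t3.medium."
  }
}

# Block unapproved regions
variable "aws_region" {
  type = string

  validation {
    condition     = contains(["us-east-1", "eu-west-1"], var.aws_region)
    error_message = "aws_region must be an approved region (us-east-1 or eu-west-1)."
  }
}

# Enforce baseline tags on every taggable resource this provider creates
provider "aws" {
  region = var.aws_region

  default_tags {
    tags = {
      ManagedBy = "terraform"
      Owner     = "platform-team" # hypothetical tag value
    }
  }
}
```

Validation blocks fail at plan time with the given message, which gives module consumers early feedback before a central policy check runs.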


    9. Observability & Drift Detection

    Add scheduled jobs:

    terraform plan -detailed-exitcode
    

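    With -detailed-exitcode, an exit code of 0 means no changes, 1 means an error, and 2 means the plan contains changes. A non-empty plan when no code has changed can indicate: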

    • Manual infra changes

    • Security drift

    • Cost leaks


    10. Reference Architecture (What “Good” Looks Like)

                    Developers
                        │
                  Pull Request
                        │
                    CI Pipeline
                        │
            ┌───────────────────────────┐
            │ Terraform Runner (Shared) │
            └───────────────────────────┘
                        │ AssumeRole
       ┌───────────────┼────────────────┐
       │               │                │
    Network Account  Dev Account     Prod Account
       │               │                │
       └──── Remote State (S3 + Locking)┘
    


    What Interviewers Want to Hear

    If you explain Terraform for large teams, they expect these keywords:

    • Remote state with locking
    • Cross-account AssumeRole model
    • CI-driven applies (no local execution)
    • State isolation by stack
    • Versioned reusable modules
    • Policy enforcement
    • Secret externalization
    • Drift detection strategy



    The following is an enterprise-ready reference that you can adapt for a real implementation.


    1. Reference Architecture Diagram + Repo Structure

    Enterprise Terraform Architecture (Multi-Account Model)

                             Developers
                                 │
                          Pull Request (Review)
                                 │
                         CI/CD Pipeline Runner
                                 │
                       (Assumes Deployment Role)
                                 │
            ┌────────────────────────────────────────┐
            │        Shared Services Account         │
            │  - Remote State (S3)                   │
            │  - Lock Table (DynamoDB)               │
            │  - CI Runner / Audit Logs              │
            └────────────────────────────────────────┘
                       │            │             │
            AssumeRole │            │             │
                       ▼            ▼             ▼
                Network Account   Dev Account   Prod Account
                (VPC / TGW)       (Apps)        (Live Infra)
    

    Key Idea:
    Terraform runs centrally and assumes roles into each environment, so no long-lived credentials need to be stored.

    This pattern is widely used by enterprises running on Amazon Web Services.


    Recommended Repository Layout

    terraform-platform/
    │
    ├── modules/                     # Reusable building blocks
    │   ├── vpc/
    │   ├── eks/
    │   ├── rds/
    │   └── iam/
    │
    ├── live/                        # Environment-specific configs
    │   ├── dev/
    │   │   ├── network/
    │   │   └── app/
    │   │
    │   ├── staging/
    │   └── prod/
    │
    ├── global/
    │   └── backend-bootstrap/       # Creates S3 + locking table
    │
    └── policies/                    # Security guardrails
    

    Module Design Principles

    Each module must:

    • Be stateless

    • Accept only variables

    • Never reference environments

    • Be version tagged (v1.2.0)

    • Be reusable across accounts


    Remote State Bootstrap (Run Once)

    You create the backend first, then everything else consumes it.

    This prevents circular dependency problems — a detail many candidates miss.
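
A bootstrap configuration along these lines could live in global/backend-bootstrap. It is a sketch that reuses the bucket and table names from the backend example earlier; the bootstrap stack itself starts with local state, because the bucket it creates does not exist yet.

```hcl
# State bucket
resource "aws_s3_bucket" "state" {
  bucket = "org-terraform-state"
}

# Versioning allows recovery of earlier state revisions
resource "aws_s3_bucket_versioning" "state" {
  bucket = aws_s3_bucket.state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Encrypt state at rest with KMS
resource "aws_s3_bucket_server_side_encryption_configuration" "state" {
  bucket = aws_s3_bucket.state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

# State must never be publicly readable
resource "aws_s3_bucket_public_access_block" "state" {
  bucket                  = aws_s3_bucket.state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Lock table; the S3 backend expects a string hash key named LockID
resource "aws_dynamodb_table" "locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

After the first apply, the bootstrap state can be migrated into the new bucket by adding a backend block and running terraform init -migrate-state.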


    2. CI/CD Pipeline Example (Production Safe)

    Terraform should always run via automation like GitHub Actions, not laptops.


    Deployment Workflow

    Developer Change → PR → Plan → Approval → Apply → Audit Log
    

    Example Pipeline (terraform.yml)

    name: Terraform Deploy
    
    on:
      pull_request:
        branches: [ main ]
    
    jobs:
      terraform-plan:
        runs-on: ubuntu-latest
    
        steps:
          - name: Checkout
            uses: actions/checkout@v4
    
          - name: Setup Terraform
            uses: hashicorp/setup-terraform@v3
    
          - name: Terraform Init
            run: terraform init
    
          - name: Terraform Format Check
            run: terraform fmt -check
    
          - name: Terraform Validate
            run: terraform validate
    
          - name: Security Scan
            run: |
              pip install checkov
              checkov -d .
    
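          - name: Terraform Plan
            run: terraform plan -out=tfplan
    
          # Publish the saved plan so the apply job, which runs on a
          # separate runner, applies exactly what was reviewed.
          - name: Upload Plan
            uses: actions/upload-artifact@v4
            with:
              name: tfplan
              path: tfplan
    

    Apply Stage (After Manual Approval)

      terraform-apply:
        needs: terraform-plan
        runs-on: ubuntu-latest
        # Approval comes from required reviewers on this environment.
        environment: production
    
        steps:
          - uses: actions/checkout@v4
    
          - name: Setup Terraform
            uses: hashicorp/setup-terraform@v3
    
          - name: Terraform Init
            run: terraform init
    
          - name: Download Plan
            uses: actions/download-artifact@v4
            with:
              name: tfplan
    
          # A saved plan applies without prompting, so -auto-approve is not needed.
          - name: Terraform Apply
            run: terraform apply tfplan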
    

    Why This Matters

    This ensures:

    • No uncontrolled changes

    • Full audit trail

    • RBAC via CI permissions

    • Zero credential leakage

    • Repeatable deployments


    3. Migration Roadmap (Manual Infra → Terraform at Scale)

    This is the real-world transformation plan companies expect senior engineers to know.


    Phase 1 — Discovery (Do NOT Write Code Yet)

    Inventory everything:

    • VPCs

    • Databases

    • IAM roles

    • Clusters

    • DNS

    • Secrets

    Use read-only access to map dependencies.

    Goal: Avoid breaking hidden integrations.


    Phase 2 — Establish Terraform Foundation

    Create only:

    • Remote backend

    • CI pipeline

    • IAM deployment roles

    No resources managed yet.

    This creates the “landing zone”.


    Phase 3 — Import Existing Infrastructure

    Bring resources under Terraform control safely:

    terraform import aws_vpc.main vpc-xxxx
    terraform import aws_db_instance.prod db-xxxx
    

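    The terraform import command requires the target resource block (for example, resource "aws_vpc" "main") to already exist in the configuration. After importing, refine that block until terraform plan reports no changes.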

    Golden Rule: Import first, modify later.
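
On Terraform 1.5 and later, the same imports can be declared in configuration with import blocks, which makes them reviewable in a pull request. The sketch below keeps the elided IDs from the commands above as placeholders, and the cidr_block value is an assumption that would have to match the real VPC.

```hcl
# Declarative import: processed during terraform plan / apply
import {
  to = aws_vpc.main
  id = "vpc-xxxx" # placeholder, as in the CLI example
}

resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16" # assumed; must match the existing VPC
}
```

If the resource block is not written yet, terraform plan -generate-config-out=generated.tf can draft one from the imported object for you to review and clean up.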


    Phase 4 — Modularization

    Refactor imported configs into modules:

    Before:

    5000-line main.tf ❌
    

    After:

    modules/network
    modules/data
    modules/compute
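
Moving an imported resource into a module changes its address, which Terraform would otherwise treat as a destroy and recreate. On Terraform 1.1 and later, a moved block records the rename instead. The sketch assumes the VPC imported earlier is relocated into a network module that declares aws_vpc.main.

```hcl
module "network" {
  source = "./modules/network" # hypothetical module containing aws_vpc.main
}

# Tell Terraform the existing object now lives at the module address
moved {
  from = aws_vpc.main
  to   = module.network.aws_vpc.main
}
```

A plan after this refactor should report the move and no infrastructure changes; anything else suggests the module's configuration differs from the original.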
    

    Phase 5 — Introduce Guardrails

    Add:

    • Policy checks (block public exposure)

    • Drift detection jobs

    • Cost visibility tags

    • Change approval workflow

    Now Terraform becomes a governance system, not just IaC.


    Phase 6 — Gradual Ownership Transition

    Teams move from:

    ClickOps → Controlled IaC → Self-Service Platform
    

    Application teams consume modules instead of writing infra.


    Common Migration Failure (Interview Trick Question)

    Many teams try:

    “Rewrite everything in Terraform.”

    That causes outages.

    Correct strategy:

    Adopt → Import → Stabilize → Improve


    How to Explain (30-Second Answer)

    “For large-scale Terraform adoption we centralize remote state with locking, run all applies through CI using cross-account roles, split infrastructure into isolated stacks to limit blast radius, and migrate existing environments via import before modularizing into reusable, versioned components.”


    Top 50 Terraform Interview Questions and Answers

    https://shyam.kubeify.com/2025/12/top-50-terraform-interview-questions.html






