Terraform Best Practices for Multiple Environments
What is Terraform?
Terraform is an open-source, vendor-neutral infrastructure as code (IaC) tool created by HashiCorp that enables developers to define, provision, and manage cloud and on-premises infrastructure using declarative configuration files. It uses HashiCorp Configuration Language (HCL) to automate the lifecycle of resources like servers, networks, and databases across providers (e.g., AWS, Azure, GCP).
How to Install Terraform?
HashiCorp distributes Terraform as an executable CLI that you can install on supported operating systems, including Microsoft Windows, macOS, and several Linux distributions. You can also compile the Terraform CLI from source if a pre-compiled binary is not available for your system.
Homebrew is a free and open-source package management system for macOS. If you have Homebrew installed, use it to install Terraform from your command line.
First, install the HashiCorp tap, HashiCorp's official repository of Homebrew packages.
$ brew tap hashicorp/tap
Now, install Terraform from hashicorp/tap/terraform.
$ brew install hashicorp/tap/terraform
Verify the Installation
Verify that the installation worked by opening a new terminal session and listing Terraform's available subcommands.
$ terraform -help
Usage: terraform [global options] <subcommand> [args]
The available commands for execution are listed below.
The primary workflow commands are given first, followed by
less common or more advanced commands.
Main commands:
...
Add -help to any Terraform command to learn more about what it does and available options.
$ terraform plan -help

Key Features of Terraform
- Infrastructure as Code (IaC): Instead of manual configuration, infrastructure is defined in version-controlled text files, promoting collaboration and repeatability.
- Declarative Approach: Users define the desired end state (e.g., "I need two servers"), and Terraform automatically determines the actions needed to achieve that state, unlike imperative scripting which requires defining specific, manual steps.
- Cloud-Agnostic: A single tool and workflow can be used to manage multiple providers, reducing vendor lock-in.
- State Management: Terraform maintains a terraform.tfstate file, which tracks the actual, deployed infrastructure, allowing it to detect "drift" and safely update or destroy resources.
- Lifecycle Management: Automates the creation, modification, and deletion of infrastructure components.
The Core Terraform Workflow
- Write: Define infrastructure in .tf configuration files using HCL.
- Plan: Run terraform plan to compare the desired configuration with the current state and preview the changes.
- Apply: Run terraform apply to execute the planned actions and make the actual infrastructure match the configuration.
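A minimal configuration to exercise this write/plan/apply loop might look like the following sketch (the provider version constraint and bucket name are illustrative placeholders, not requirements):

```hcl
# main.tf — a minimal configuration to walk through the workflow
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # illustrative version constraint
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

resource "aws_s3_bucket" "example" {
  bucket = "my-example-bucket-placeholder" # must be globally unique
}
```

Running terraform init, terraform plan, and terraform apply against this file walks through the full loop; terraform destroy cleans it up afterwards.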
- Providers: Plugins (e.g., AWS, Azure, Google Cloud) that communicate with cloud provider APIs.
- Modules: Reusable, shared configuration templates that simplify complex, repetitive deployments.
Standard Project File Layout
- main.tf: Contains the primary resource and data source definitions, as well as module calls.
- variables.tf: Declares input variables with descriptions.
- outputs.tf: Declares output values from the resources created.
- providers.tf: Configures the required providers and their versions.
- locals.tf: Contains local values for cleaner, more readable configurations.
- backend.tf: Configures the remote backend for storing the state file securely and enabling collaboration.
- terraform.tfvars: An optional file that assigns values to variables for the current environment.
Approach 1: Single Configuration with Per-Environment .tfvars Files
Terraform loads all .tf and .tfvars files in the working directory and processes them as a single configuration; environment differences are captured in separate .tfvars files.
- Usage: You run Terraform commands from the root directory, specifying the environment's variable file (e.g., terraform plan -var-file="environments/dev.tfvars").
- Pros: Keeps code DRY (Don't Repeat Yourself), as the main configuration is shared.
- Cons: Less flexibility for per-environment customization (e.g., a resource existing in prod but not in dev).
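A sketch of this layout, using a hypothetical instance_type variable to illustrate how the shared declaration and per-environment values fit together:

```hcl
# variables.tf — shared declaration used by every environment
variable "instance_type" {
  description = "EC2 instance type for the app tier"
  type        = string
}

# environments/dev.tfvars would then contain, e.g.:
#   instance_type = "t3.micro"
#
# environments/prod.tfvars:
#   instance_type = "m5.large"
```

At plan time you select the environment explicitly: terraform plan -var-file="environments/dev.tfvars".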
Approach 2: Separate Directory per Environment
Each environment gets its own directory with its own .tf files and a dedicated state file.
- Usage: You navigate into the specific environment's directory to run commands (e.g., cd terraform/dev && terraform apply).
- Pros: Limits the "blast radius" of changes to a single environment and allows for vastly different configurations or even different providers.
- Cons: Involves code duplication, requiring more effort to keep environments consistent.
Files to Exclude from Version Control
- .terraform/: A hidden directory Terraform uses to store cached plugins, modules, and other metadata. This is managed automatically by Terraform.
- *.tfstate: The local state file. In real-world scenarios, remote backends should be used, but this file should still be ignored locally.
- *.tfstate.backup: Backup state files created during operations.
- *.tfvars and *.tfvars.json: Files containing sensitive variable values should be kept out of source control; use environment variables or a secure vault to manage secrets.
- .terraform.lock.hcl: The dependency lock file, which should be committed to ensure consistent provider versions across runs.
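Collected into a .gitignore, those rules look like this sketch:

```gitignore
.terraform/
*.tfstate
*.tfstate.backup
*.tfvars
*.tfvars.json
# Do NOT ignore .terraform.lock.hcl — commit it so every run
# resolves the same provider versions.
```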
These fundamentals map directly onto how Terraform is run in enterprise environments.
The sections below refine them into a production-grade reference architecture and add the piece most interviewers expect: cross-account access using AssumeRole on Amazon Web Services.
Enterprise Terraform Operating Model (Large Teams)
1. Remote State Is Non-Negotiable
Use a centralized backend to ensure:
Single source of truth
State locking (prevents concurrent applies)
Auditability
Disaster recovery via versioning
Recommended Backend (AWS Example)
S3 Bucket → State storage (versioned + encrypted)
DynamoDB Table → State locking
KMS Key → Encryption
Bucket Policy → Restricted to CI roles only
Backend Configuration
terraform {
  backend "s3" {
    bucket         = "org-terraform-state"
    key            = "network/prod.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}
2. Multi-Account Strategy (Real Enterprise Pattern)
Large organizations NEVER deploy everything from one AWS account.
Recommended Layout
AWS Organization
│
├── Shared Services Account
│ ├── Terraform State Bucket
│ ├── CI/CD Runners
│ └── Logging / Security Tools
│
├── Network Account
│ └── VPC / Transit Gateway
│
├── Dev Account
├── Staging Account
└── Prod Account
Terraform runs from Shared Services Account and assumes roles into target accounts.
3. Cross-Account Access Using AssumeRole (Critical)
Instead of storing credentials, Terraform uses STS AssumeRole.
Provider Configuration
provider "aws" {
  region = "us-east-1"

  assume_role {
    role_arn = "arn:aws:iam::123456789012:role/TerraformExecutionRole"
  }
}
Target Account IAM Role
Allow only the CI system to assume:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::SHARED_ACCOUNT_ID:role/ci-runner"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
Benefits:
No static credentials
Full audit trail (CloudTrail)
Easy revocation
Works across 100+ accounts
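One common way to target several accounts from a single configuration is provider aliases, each assuming a different role. A sketch (all account IDs, role names, and the CIDR below are placeholders):

```hcl
# Aliased providers, one per target account, each using AssumeRole
provider "aws" {
  alias  = "network"
  region = "us-east-1"
  assume_role {
    role_arn = "arn:aws:iam::111111111111:role/TerraformExecutionRole"
  }
}

provider "aws" {
  alias  = "prod"
  region = "us-east-1"
  assume_role {
    role_arn = "arn:aws:iam::222222222222:role/TerraformExecutionRole"
  }
}

# Resources pick their account via the provider meta-argument
resource "aws_vpc" "core" {
  provider   = aws.network
  cidr_block = "10.0.0.0/16"
}
```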
4. State Separation Strategy (Avoid Monolithic State)
Never keep everything in one state file.
Split by Responsibility
| Stack | Owns |
|---|---|
| network | VPC, subnets, routing |
| security | IAM, guardrails |
| platform | EKS, databases |
| apps | Application resources |
This reduces:
Apply time
Blast radius
Merge conflicts
Risk during failure
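Split stacks are typically wired together with the terraform_remote_state data source. A sketch, assuming the bucket and key names from the backend example and a hypothetical private_subnet_id output exposed by the network stack:

```hcl
# In the "apps" stack: read outputs published by the "network" stack
data "terraform_remote_state" "network" {
  backend = "s3"
  config = {
    bucket = "org-terraform-state"   # assumed: same bucket as the backend example
    key    = "network/prod.tfstate"
    region = "us-east-1"
  }
}

resource "aws_instance" "app" {
  ami           = "ami-12345678" # placeholder AMI ID
  instance_type = "t3.micro"
  # Consume the network stack's output without sharing its state file
  subnet_id = data.terraform_remote_state.network.outputs.private_subnet_id
}
```

This keeps each stack's state small while still letting downstream stacks consume upstream outputs read-only.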
5. Mandatory Modular Design
Reusable modules must be versioned like software.
modules/
├── vpc/
├── eks/
├── rds/
└── observability/
Each module:
Has inputs/outputs only
Contains no environment logic
Is version tagged (Git release)
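Version pinning is usually done through the module source reference. A sketch with a placeholder Git repository and tag:

```hcl
# Consume a versioned module via a Git tag; repo URL and tag are placeholders
module "vpc" {
  source = "git::https://github.com/example-org/terraform-modules.git//vpc?ref=v1.2.0"

  # Only variables cross the module boundary — no environment logic inside
  cidr_block = "10.10.0.0/16"
  name       = "dev"
}
```

Bumping the ref is then an explicit, reviewable change, exactly like upgrading a software dependency.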
6. CI/CD-Only Execution Model
Humans must never run terraform apply locally.
Pipeline should:
terraform fmt → formatting gate
terraform validate → syntax gate
security scan → checkov / terrascan
terraform plan → PR visibility
manual approval → required
terraform apply → controlled execution
This enforces:
Change traceability
Peer review culture
Compliance alignment
7. Secrets Must Come From Secret Managers
Never allow this:
password = "Hardcoded123"
Instead:
data "aws_secretsmanager_secret_version" "db" {
  secret_id = "prod/db/password"
}
Note that values read this way are still recorded in the state file, so the state backend itself must be encrypted and access-restricted, and variables or outputs carrying secrets should be marked sensitive to keep them out of plan output and logs.
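A sketch of wiring the secret into a resource (the database arguments shown are illustrative, not a complete configuration):

```hcl
resource "aws_db_instance" "prod" {
  identifier        = "prod-db"
  engine            = "postgres"
  instance_class    = "db.t3.medium"
  allocated_storage = 20
  username          = "app"

  # Pulled from Secrets Manager at plan time — never hardcoded in .tf files.
  # The value still lands in state, so the state backend must be locked down.
  password = data.aws_secretsmanager_secret_version.db.secret_string
}
```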
8. Policy & Governance Layer
Large teams enforce guardrails using HashiCorp Sentinel or Open Policy Agent (OPA).
Example Rules:
Deny public S3 buckets
Enforce tagging
Restrict instance sizes
Block unapproved regions
This converts Terraform into a governed platform, not just IaC.
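One lightweight way to wire such checks into a pipeline is scanning the machine-readable plan. This sketch assumes checkov is installed; Sentinel and OPA/conftest slot into the same position:

```shell
# Produce a JSON plan and scan it against policy rules before apply
terraform plan -out=tfplan
terraform show -json tfplan > tfplan.json
checkov -f tfplan.json   # a non-zero exit fails the pipeline on violations
```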
9. Observability & Drift Detection
Add scheduled jobs:
terraform plan -detailed-exitcode
Detects:
Manual infra changes
Security drift
Cost leaks
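A scheduled drift job can key off the exit code of -detailed-exitcode (0 = no changes, 1 = error, 2 = pending changes, i.e. drift). A sketch of such a job:

```shell
#!/usr/bin/env sh
# Scheduled drift check; -lock=false avoids blocking real deployments
terraform plan -detailed-exitcode -lock=false > /dev/null 2>&1
status=$?

case "$status" in
  0) echo "No drift detected" ;;
  2) echo "Drift detected: infrastructure differs from code" ;;  # alert/page here
  *) echo "Plan failed (exit $status)" ;;
esac
```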
10. Reference Architecture (What “Good” Looks Like)
Developers
│
Pull Request
│
CI Pipeline
│
┌─────────────────────────┐
│ Terraform Runner (Shared)
└─────────────────────────┘
│ AssumeRole
┌───────────────┼────────────────┐
│ │ │
Network Account Dev Account Prod Account
│ │ │
└──── Remote State (S3 + Locking)┘
What Interviewers Want to Hear
If you explain Terraform for large teams, they expect these keywords:
- Remote state with locking
- Cross-account AssumeRole model
- CI-driven applies (no local execution)
- State isolation by stack
- Versioned reusable modules
- Policy enforcement
- Secret externalization
- Drift detection strategy
Below is a complete enterprise-ready reference you can use in real implementation.
1. Reference Architecture Diagram + Repo Structure
Enterprise Terraform Architecture (Multi-Account Model)
Developers
│
Pull Request (Review)
│
CI/CD Pipeline Runner
│
(Assumes Deployment Role)
│
┌────────────────────────────────────────┐
│ Shared Services Account │
│ - Remote State (S3) │
│ - Lock Table (DynamoDB) │
│ - CI Runner / Audit Logs │
└────────────────────────────────────────┘
│ │ │
AssumeRole │ │ │
▼ ▼ ▼
Network Account Dev Account Prod Account
(VPC / TGW) (Apps) (Live Infra)
Key Idea:
Terraform runs centrally and assumes roles into each environment — no credentials stored anywhere.
Used heavily in enterprises running on Amazon Web Services.
Recommended Repository Layout
terraform-platform/
│
├── modules/ # Reusable building blocks
│ ├── vpc/
│ ├── eks/
│ ├── rds/
│ └── iam/
│
├── live/ # Environment-specific configs
│ ├── dev/
│ │ ├── network/
│ │ └── app/
│ │
│ ├── staging/
│ └── prod/
│
├── global/
│ └── backend-bootstrap/ # Creates S3 + locking table
│
└── policies/ # Security guardrails
Module Design Principles
Each module must:
Be stateless
Accept only variables
Never reference environments
Be version tagged (e.g., v1.2.0 via a Git release)
Be reusable across accounts
Remote State Bootstrap (Run Once)
You create the backend first, then everything else consumes it.
This prevents circular dependency problems — a detail many candidates miss.
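A minimal bootstrap sketch, assuming the bucket and table names from the backend example above (run it once with local state, then migrate that state into the bucket it just created):

```hcl
# backend-bootstrap: the only stack that starts life with local state
resource "aws_s3_bucket" "tf_state" {
  bucket = "org-terraform-state" # assumed: matches the backend config
}

resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id
  versioning_configuration {
    status = "Enabled" # enables state recovery after bad writes
  }
}

resource "aws_dynamodb_table" "tf_locks" {
  name         = "terraform-locks" # assumed: matches dynamodb_table in the backend
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID" # the key name the S3 backend expects

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

After apply, add the backend block and run terraform init -migrate-state to move the bootstrap's own state into the new bucket.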
2. CI/CD Pipeline Example (Production Safe)
Terraform should always run via automation like GitHub Actions, not laptops.
Deployment Workflow
Developer Change → PR → Plan → Approval → Apply → Audit Log
Example Pipeline (terraform.yml)
name: Terraform Deploy

on:
  pull_request:
    branches: [ main ]

jobs:
  terraform-plan:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3

      - name: Terraform Init
        run: terraform init

      - name: Terraform Format Check
        run: terraform fmt -check

      - name: Terraform Validate
        run: terraform validate

      - name: Security Scan
        run: |
          pip install checkov
          checkov -d .

      - name: Terraform Plan
        run: terraform plan -out=tfplan

      - name: Upload Plan
        uses: actions/upload-artifact@v4
        with:
          name: tfplan
          path: tfplan

Apply Stage (After Manual Approval)
  terraform-apply:
    needs: terraform-plan
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v4

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3

      - name: Download Plan
        uses: actions/download-artifact@v4
        with:
          name: tfplan

      - name: Terraform Init
        run: terraform init

      - name: Terraform Apply
        run: terraform apply -auto-approve tfplan
Why This Matters
This ensures:
No uncontrolled changes
Full audit trail
RBAC via CI permissions
Zero credential leakage
Repeatable deployments
3. Migration Roadmap (Manual Infra → Terraform at Scale)
This is the real-world transformation plan companies expect senior engineers to know.
Phase 1 — Discovery (Do NOT Write Code Yet)
Inventory everything:
VPCs
Databases
IAM roles
Clusters
DNS
Secrets
Use read-only access to map dependencies.
Goal: Avoid breaking hidden integrations.
Phase 2 — Establish Terraform Foundation
Create only:
Remote backend
CI pipeline
IAM deployment roles
No resources managed yet.
This creates the “landing zone”.
Phase 3 — Import Existing Infrastructure
Bring resources under Terraform control safely:
terraform import aws_vpc.main vpc-xxxx
terraform import aws_db_instance.prod db-xxxx
Then write matching configuration.
Golden Rule: Import first, modify later.
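On Terraform 1.5 and later, the same adoption step can be expressed declaratively with an import block, which plan and apply then execute (the IDs below reuse the placeholders from the CLI example):

```hcl
import {
  to = aws_vpc.main
  id = "vpc-xxxx" # placeholder ID, as in the CLI example
}

resource "aws_vpc" "main" {
  # Arguments must match the real resource's current settings;
  # `terraform plan -generate-config-out=generated.tf` can draft them for you.
  cidr_block = "10.0.0.0/16"
}
```

Declarative imports go through plan review like any other change, which fits the CI-only execution model above better than ad-hoc CLI imports.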
Phase 4 — Modularization
Refactor imported configs into modules:
Before:
5000-line main.tf ❌
After:
modules/network
modules/data
modules/compute
Phase 5 — Introduce Guardrails
Add:
Policy checks (block public exposure)
Drift detection jobs
Cost visibility tags
Change approval workflow
Now Terraform becomes a governance system, not just IaC.
Phase 6 — Gradual Ownership Transition
Teams move from:
ClickOps → Controlled IaC → Self-Service Platform
Application teams consume modules instead of writing infra.
Common Migration Failure (Interview Trick Question)
Many teams try:
“Rewrite everything in Terraform.”
That causes outages.
Correct strategy:
Adopt → Import → Stabilize → Improve
How to Explain (30-Second Answer)
“For large-scale Terraform adoption we centralize remote state with locking, run all applies through CI using cross-account roles, split infrastructure into isolated stacks to limit blast radius, and migrate existing environments via import before modularizing into reusable, versioned components.”
Top 50 Terraform Interview Questions and Answers
https://shyam.kubeify.com/2025/12/top-50-terraform-interview-questions.html
Need help with system design using Terraform on AWS / GCP / Azure / DigitalOcean, etc.?
https://kubeify.com/schedule-meeting
