Having used Ansible to easily manage AWS, I have been horrified by how my new org uses Terraform for AWS, precisely because TF maintains state. It is both hard to develop with in a team and a security nightmare.
They diligently recorded all secrets in AWS Secrets Manager. They used IAM roles and policies to control who can see which environment’s secrets. Great!
But then I discovered all the secrets in PLAIN TEXT dumped into the state file?! And this is known behavior that Hashicorp defends as reasonable. They suggest you just encrypt the state file. So the next time there’s a big oil spill, let’s just throw a big blanket over it and call it a day.
Terraform’s solution swiftly obliterates all the audit, key rotation, separation of duties built into secrets manager. Indeed, what is the point of using Secrets Manager at all if you use Terraform?
Next, they recorded state in a git repo. So all those secrets are committed to the repo. So now how am I supposed to encrypt the state file? What a disaster.
But wait. There’s more. With the team growing, how are we supposed to manage shared resources? Do I have to run tf apply, wait for completion, and then immediately commit and push, and hope I don’t have to manually manage a state file merge conflict? Or should I use some bizarre, self-built mutex?
Ugh. Terraform is the worst. I was always unimpressed by its feature set and documentation. Now I hate it. I don’t understand why it is so popular when there are such better alternatives.
Why on earth is your state in git? The tool has built-in functionality to handle just these kinds of workflows. This reads a lot like hitting your thumb with the hammer and blaming hammers.
If it has to be, then create a separate, locked-down Git repo just for this. Protecting your state file was a big deal when I was first reading about Terraform. It was really drilled in.
And that's why many people don't like the idea of a state file. Sure there are benefits, but there are also drawbacks. You now need another system to manage your state. You don't with ansible.
Ansible is a different system, with a subtly different use case. It generally manages a preexisting list of targets. In that sense, there is some initial "state" in Ansible, this being your inventory.
Terraform (or CloudFormation, or Pulumi, or Crossplane, for that matter) shines when you need to create resources. Think of the state as the inventory of what you've created (or imported).
If you think of the resource you are managing with ansible being your AWS account (or your VMWare system, or whatever), then I guess it makes more sense. That state (the account you manage) doesn't really change. (I don't use ansible but that is my understanding)
Having 3 different sources of truth (what is, in AWS, what should be, in the .tf, and something else -- in the statefile) can mean nasty 3 way merges, which i
But I don't manage thousands of different resources, I manage 50. It feels to me that the overhead needed to manage thousands struggles to scale down without bringing all the required baggage. It feels like kubernetes vs docker-compose.
That said, the concept of using an S3 bucket for storing state, which I saw elsewhere in these comments, is an interesting idea, so I may revisit terraform.
> But wait. There’s more. With the team growing, how are we supposed to manage shared resources? Do I have to run tf apply, wait for completion, and then immediately commit and push, and hope I don’t have to manually manage a state file merge conflict? Or should I use some bizarre, self-built mutex?
In one project, we had Terraform run from GitLab CI. The CI was creating a plan. The reviewer had to approve applying that plan.
I'm curious about your usage of Ansible for AWS resources. How do you delete them? Do you have a policy of always having 'state: absent' for at least one commit?
The usual practice is to keep the terraform state in an encrypted S3 bucket tucked away in a separate account (CI/CD, management or similar), with IAM policies controlling who can actually access the terraform state file in a cross-account setup. Limited access to the bucket with the terraform state can be controlled via the S3 bucket access IAM policy. Typically, there is an overarching, cross-account organisational IAM role that controls such access.
Each terraform project typically gets its own dedicated state bucket. Sharing the same S3 bucket for multiple solutions is unusual.
The encrypted S3 bucket persisting the terraform state file has to have the S3 versioning enabled. If one stores the terraform state in git, they are cooking it wrong.
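A minimal sketch of such a state bucket, assuming the AWS provider v4 or later (where versioning and encryption are separate resources). The bucket name and resource labels are made up for illustration, and the cross-account IAM policy described above is left out:

```hcl
# Hypothetical bucket name; S3 bucket names must be globally unique.
resource "aws_s3_bucket" "tf_state" {
  bucket = "example-org-terraform-state"
}

# Keep every revision of the state file so an earlier one can be restored.
resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Encrypt objects at rest by default (here with the AWS-managed KMS key).
resource "aws_s3_bucket_server_side_encryption_configuration" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

# Block every form of public access to the bucket.
resource "aws_s3_bucket_public_access_block" "tf_state" {
  bucket                  = aws_s3_bucket.tf_state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
```

This bucket would normally be created once, from a small bootstrap configuration, since it has to exist before any project can use it as a backend.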
Static and third party provided secrets are stored manually in the Parameter Store and are sourced via «data» blocks in terraform programmatically. Access to the secrets is controlled via the appropriate IAM roles and policies. Access to the secrets by humans is by exemption that is attached to a (typically) SSO role associated with an access group in the organisation's own IDP. This is no different from the non-AWS secret management solutions and tools.
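A sketch of what sourcing such a parameter through a «data» block might look like. The parameter path and the label `vendor_api_key` are hypothetical:

```hcl
# Reads a SecureString parameter that was created by hand, outside terraform.
# The path below is a made-up example.
data "aws_ssm_parameter" "vendor_api_key" {
  name            = "/prod/vendor/api-key"
  with_decryption = true
}

# Elsewhere in the configuration the value is referenced as:
#   data.aws_ssm_parameter.vendor_api_key.value
```

One caveat: the value is treated as sensitive in plan output, but as far as I know anything read through a data source is still recorded in the state file, so this pattern relies on the state bucket being locked down as described above.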
Ephemeral or «low value»[0] secrets in the Secrets Manager can be rotated daily (e.g. every 24 hours) – to discourage manual access to, storage of and reliance on them, as well as to encourage retrieving the secrets programmatically.
terraform is a vehicle to get from point A to point B, and is not an AI nor a substitute for the knowledge of the platform.
The terraform documentation is excellent, by the way. It is no replacement for the knowledge of AWS though.
For secrets or credentials, I've had a reasonable experience putting the name of the secret into a terraform resource, but then setting the secret's actual value outside of terraform (i.e. via web or cli).
This avoids storing the secret's value in the state file (it's stored as an empty string) and also keeps the terraform plan clean.
As soon as I read that Ansible was easier than Terraform, I knew you were probably doing something wrong.
Ansible is fine to be clear, for managing things like configuration within compute instances and databases (such as user accounts for example). It is good for this, but for building and maintaining the creation and lifecycle of the infrastructure itself, it is the wrong tool for the job.
First of all, you are using local state in Terraform which is a terrible, terrible idea. You are already seeing why based on your problems with it. I've never seen an organization that actually works on a team with local state backends. You need to use a remote state backend. The most popular is to just use a provider like S3 (or the equivalent with other clouds). The state file is pulled from there at plan/apply-time and then pushed back up there when complete. Then everyone is always getting the latest statefile and it is shared automatically without needing to commit it.
This solves other problems too. Like what if two people try to apply at the same time? Well, that is why state locking exists. Before the remote state file is pulled down, it is "checked out" and locked. Now only that person can use the state file until it is checked back in and unlocked. This happens seamlessly in Terraform once you set up a state lock backend. You still just run `terraform apply`. Now if someone else tries to do it while you're already updating things, their CLI will fail with a lock error or, if they pass `-lock-timeout`, wait until you are complete. When their run does go through, they will end up pulling down all of the changes you just made. If they were running the same code as you, then it would say there is nothing to update. Easy.
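For reference, a remote backend with locking along those lines could look roughly like this. The bucket, key, region and table names are all hypothetical, and it assumes the bucket and a DynamoDB table with a string partition key named `LockID` already exist:

```hcl
terraform {
  backend "s3" {
    bucket = "example-org-terraform-state" # hypothetical bucket name
    key    = "network/terraform.tfstate"   # hypothetical path of this project's state
    region = "eu-west-1"                   # hypothetical region

    # Ask S3 to encrypt the state object at rest.
    encrypt = true

    # DynamoDB table used for state locking (hypothetical name).
    dynamodb_table = "terraform-locks"
  }
}
```

After adding or changing a backend block you run `terraform init` once, and from then on `terraform plan` and `terraform apply` read and write the shared state.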
Encryption of your state file also happens this way, by encrypting the remote backend. It doesn't solve local encryption and this is admittedly a problem with Terraform which Hashicorp has refused to address, but is already being addressed with OpenTofu. But as a result, secrets should simply not be in your state file. You should use a provider like Secrets Manager to do this for you and then you only have references to your secrets in state, instead of the secrets themselves. This is simply a known rule: just like you don't commit anything secret to a git repo, you don't put anything secret in your terraform. It's the same way.
Lastly, state versioning. This is accomplished through your remote backend state provider too. We use S3 at work, so we get native versioning there and we can roll back state in emergencies (and have). This itself is not a great practice, just like how you should never edit previous commits in git, but it can be done and preserved this way as a safety mechanism.
As for your auditing and key rotation of secrets, it once again sounds like you are doing this wrong. You shouldn't be updating the secret values themselves in Terraform. You should only be creating the secrets and passing around references to them in Terraform. This requires a better understanding of how Secrets Manager works. Creating a secret in AWS, for example, only creates an empty "repo" for a secret. It has an id and metadata, but no inherent value. This is what you do in Terraform. Adding the value to the secret is a separate (hopefully automated) process. The secret value itself should be hidden from honestly all of your users (even yourself). Rotation should happen automatically with Lambdas or automated jobs of some kind. You could even do something with ansible for this. You rotate the secret value, but the secret reference doesn't change and everything should continue to work with secret versioning and some proper architecture on your side. This should NOT be done in terraform. Auditing your secrets, again, is not a job of Terraform. This is a separate process from IaC. Terraform is IaC. Use the right tool for the job.
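To make the "empty repo" idea concrete, here is a rough sketch. The secret name and labels are invented for the example:

```hcl
# Creates only the secret container: a name, an ARN and metadata.
# The secret name is a made-up example.
resource "aws_secretsmanager_secret" "db_password" {
  name        = "prod/app/db-password"
  description = "Value is set and rotated outside Terraform"
}

# There is deliberately no aws_secretsmanager_secret_version resource here,
# so the secret's value never passes through Terraform or its state.

# Other code only needs the reference, not the value.
output "db_password_secret_arn" {
  value = aws_secretsmanager_secret.db_password.arn
}
```

The application (or its IAM role) then fetches the current value from Secrets Manager at runtime using that ARN, and the rotation job can change the value without Terraform ever noticing.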
State merge conflicts should never happen as you fear because it is wholly managed by the terraform binary. Yes you might have to resolve state conflicts where someone added too much drift outside of Terraform and you need to fix it, but the cli provides tools for making these changes and you shouldn't edit them yourself. Using these tools (like `terraform state mv` or `terraform state rm` or `terraform import`) should resolve state conflict without causing any sort of merge conflict. The state file (with proper state locking) will only ever be edited by the tf binary and only by one person at a time.
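Besides those CLI commands, newer Terraform versions let you express the same operations in configuration, so they go through code review like everything else. A sketch, assuming Terraform 1.5 or later (`moved` blocks arrived in 1.1 and `import` blocks in 1.5); all resource labels and bucket names are hypothetical:

```hcl
# Counterpart of `terraform state mv`: the resource was renamed in code,
# so tell Terraform it is the same object instead of destroy-and-recreate.
moved {
  from = aws_s3_bucket.logs
  to   = aws_s3_bucket.access_logs
}

resource "aws_s3_bucket" "access_logs" {
  bucket = "example-org-access-logs"
}

# Counterpart of `terraform import`: adopt a bucket that was created by
# hand. For S3 buckets the import id is the bucket name.
import {
  to = aws_s3_bucket.assets
  id = "example-org-assets"
}

resource "aws_s3_bucket" "assets" {
  bucket = "example-org-assets"
}
```

Both blocks show up in `terraform plan` before anything is written to state, which makes drift fixes less of a leap of faith.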
So I wouldn't hate on Terraform just yet. It sounds like you and your whole team are using it wrong.
This is a very informative reply and I appreciate it. With the improvements you suggest, many of which I had already started to make, we could mitigate many of the issues.
However, these issues are just a few of the most recent reasons I dislike TF. I’ve never been impressed by TF documentation. It always comes up lacking for me. Subjective, I know.
Similarly subjectively, when I was looking around about 6 years ago, TF had way less coverage of AWS resources than Ansible.
Basically, my experience is that the level of thought and quality of engineering that went into Terraform is way less than Ansible. And I am annoyed because somehow Terraform won in spite of that. My hope is Terraform’s licensing change will fracture the market and the next thing will emerge.