Minimum Viable DevOps: What Every Startup Needs Before Going to Production

By Marc Molas · February 20, 2024 · 10 min read

You don't need Kubernetes. You probably don't need microservices. You definitely don't need a dedicated SRE team at this stage. But you do need a DevOps foundation. Without one, every deployment is a game of Russian roulette and every production incident becomes a fire drill at 3 AM.

I've seen startups with a solid product and real traction lose weeks because they didn't have a decent CI/CD pipeline. And I've seen others spend months building enterprise-grade infrastructure when they had 3 engineers and 0 paying customers. Both extremes are mistakes. What you need is Minimum Viable DevOps.

The Minimum Viable DevOps checklist

These are the 8 elements every startup needs before putting a product in the hands of real users. None of them is optional. If you're missing any, you're accumulating operational debt that will blow up in your face at the worst possible time.

1. Version control with a branching strategy

This seems obvious, but the number of startups working directly on main with no branching strategy is staggering.

You have two reasonable options:

Trunk-based development. Short-lived branches (hours, not days), frequent merges to main, feature flags for unfinished code. Ideal for small teams that deploy multiple times a day.
Simplified Git Flow. Feature branches, a develop branch, releases from main. More structure, useful when you need clear staging environments.

For most early-stage startups, trunk-based with feature flags is the right call. Less overhead, fewer merge conflicts, faster cycles.
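
To make the feature-flag half of that concrete, here is a minimal Python sketch. It assumes flags live in environment variables named FEATURE_<NAME>, a convention invented for this example, and checkout_total is a hypothetical function standing in for your own code. A real setup might read flags from a config service instead.

```python
import os
from typing import Mapping


def flag_enabled(name: str, env: Mapping[str, str] = os.environ) -> bool:
    """Return True if the feature flag `name` is switched on.

    Assumption for this sketch: flags are environment variables
    named FEATURE_<NAME>, and anything unset counts as off.
    """
    raw = env.get(f"FEATURE_{name.upper()}", "")
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def checkout_total(prices: list[float], env: Mapping[str, str] = os.environ) -> float:
    """Hypothetical business function with an unfinished code path behind a flag."""
    total = sum(prices)
    if flag_enabled("new_discount", env):
        # Merged to main, but dark until the flag is flipped.
        total *= 0.9
    return round(total, 2)
```

With the flag unset, checkout_total([10.0, 5.0]) returns 15.0. The unfinished discount path only runs once FEATURE_NEW_DISCOUNT is set to true, so the branch can merge in hours without exposing half-built behavior.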

2. CI pipeline: lint, test, build on every PR

Every pull request should pass through an automated pipeline before anyone reviews it. At a minimum:

Linting. ESLint, Pylint, whatever fits your stack. This isn't about aesthetics -- it's about preventing bugs.
Automated tests. Unit tests at a minimum. Integration tests if you have them. The pipeline fails if any test fails. No exceptions.
Build. If your application compiles, compile it in CI. A PR that breaks the build doesn't get merged. Period.

This gives you a basic safety net. Not perfect, but infinitely better than "works on my machine."
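
As a rough illustration, the three steps can be chained in a small Python script that stops at the first failure. The commands in STEPS are assumptions for a Node project with ESLint. Most teams would express the same thing in their CI provider's own config rather than a script, but the fail-fast logic is the same.

```python
import subprocess
import sys
from typing import Optional

# Assumed commands for a Node project; swap in whatever fits your stack.
STEPS = [
    ("lint", ["npx", "eslint", "."]),
    ("test", ["npm", "test"]),
    ("build", ["npm", "run", "build"]),
]


def run_pipeline(steps) -> tuple[bool, Optional[str]]:
    """Run each step in order and stop at the first non-zero exit code.

    Returns (True, None) on success, or (False, name_of_failed_step).
    """
    for name, command in steps:
        print(f"--- {name}: {' '.join(command)}")
        result = subprocess.run(command)
        if result.returncode != 0:
            return False, name
    return True, None


def main() -> int:
    """Exit code for the CI job: 0 if every step passed, 1 otherwise."""
    ok, failed_step = run_pipeline(STEPS)
    if not ok:
        print(f"Pipeline failed at step: {failed_step}")
    return 0 if ok else 1
```

A CI job would call sys.exit(main()), so a failing lint, test, or build step marks the PR as red and blocks the merge.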

3. CD pipeline: automated deployment to staging and production

If you're doing manual deployments -- SSH into the server, git pull, npm run build, pray -- you're living dangerously. A basic CD pipeline does this:

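Merge to develop = automatic deployment to staging.
Merge to main (or release tag) = automatic deployment to production. If you work trunk-based and have no develop branch, deploy every merge to main to staging and promote to production with a release tag.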
Accessible rollback. A button or a command to revert to the previous version in under 5 minutes.

Deployments should be boring events. If every time you deploy to production the entire team holds its breath, you have a process problem, not a product problem.
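
The routing and rollback rules in that list are simple enough to sketch in a few lines of Python. The ref names follow the branch layout described above. The "v" tag prefix and the idea of keeping an ordered release history are assumptions made for this example, not features of any particular deployment tool.

```python
from typing import Optional


def target_environment(git_ref: str) -> Optional[str]:
    """Map a pushed Git ref to the environment it should deploy to."""
    if git_ref == "refs/heads/develop":
        return "staging"
    # Assumption: release tags are prefixed with "v", e.g. v1.4.0.
    if git_ref == "refs/heads/main" or git_ref.startswith("refs/tags/v"):
        return "production"
    return None  # feature branches are not deployed anywhere


def rollback_target(release_history: list[str]) -> str:
    """Pick the release to redeploy when rolling back.

    release_history is ordered oldest to newest; the last entry is live.
    """
    if len(release_history) < 2:
        raise ValueError("no previous release to roll back to")
    return release_history[-2]
```

For example, rollback_target(["v1.2.0", "v1.3.0", "v1.4.0"]) returns "v1.3.0". Keeping old release artifacts around is what makes the 5-minute rollback realistic: you redeploy something already built instead of rebuilding under pressure.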

4. Basic monitoring: uptime, errors, response times

You don't need dashboards with 47 metrics. You need to know three things at all times:

Is your application online? Uptime monitoring. If it goes down, you find out in minutes, not when a user tweets at you.
Are there errors? 5xx error rate, uncaught exceptions. Tools like Sentry are perfect for this.
Is it fast? Response times on your critical endpoints. If your API goes from 200ms to 2 seconds, you want to know before your users do.

The options: Datadog if you have the budget, New Relic with its free tier, or even CloudWatch if you're on AWS. The tool doesn't matter -- what matters is that monitoring exists and alerts reach someone who will see them.
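
Whichever tool you pick, the error and latency checks boil down to something like the Python sketch below. The Request type and the thresholds (1% error rate, 1,000 ms at the 95th percentile) are assumptions chosen for illustration. Uptime is left out because it is normally checked by an external ping rather than computed from request data.

```python
from dataclasses import dataclass


@dataclass
class Request:
    status: int         # HTTP status code returned
    duration_ms: float  # time taken to serve the request


def error_rate(requests: list[Request]) -> float:
    """Share of requests that ended in a 5xx response (0.0 to 1.0)."""
    if not requests:
        return 0.0
    return sum(1 for r in requests if r.status >= 500) / len(requests)


def p95_ms(requests: list[Request]) -> float:
    """95th percentile response time, using the nearest-rank method."""
    if not requests:
        return 0.0
    durations = sorted(r.duration_ms for r in requests)
    rank = (95 * len(durations) + 99) // 100  # integer ceil(0.95 * n)
    return durations[rank - 1]


def alerts(requests: list[Request], max_error_rate: float = 0.01,
           max_p95_ms: float = 1000.0) -> list[str]:
    """Return one message per breached threshold; empty means all clear."""
    found = []
    rate = error_rate(requests)
    if rate > max_error_rate:
        found.append(f"5xx error rate is {rate:.1%}")
    slow = p95_ms(requests)
    if slow > max_p95_ms:
        found.append(f"p95 response time is {slow:.0f} ms")
    return found
```

A percentile is used instead of an average because a handful of very slow requests can hide behind a healthy-looking mean.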

5. Centralized, searchable logging

console.log in production isn't logging. Logs scattered across 3 different servers aren't useful when you have an incident at 11 PM.

You need centralized logs in a place where you can search them. The options:

ELK Stack (Elasticsearch, Logstash, Kibana). Powerful, but requires maintenance if you self-host.
CloudWatch Logs if you're on AWS. Easy to configure, searchable, integrated.
Papertrail or Logtail. Simple, cheap, good enough for early-stage startups.

The golden rule: if a user reports a bug, you should be able to find the corresponding log in under 5 minutes.
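
Hitting that 5-minute target is much easier when every log line is structured and carries an identifier you can search for. Here is a minimal sketch using Python's standard logging module. The request_id and user_id fields are hypothetical names for this example, and shipping the output to your log service is left to whatever agent that service provides.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Hypothetical fields, supplied by the caller via `extra=`.
            "request_id": getattr(record, "request_id", None),
            "user_id": getattr(record, "user_id", None),
        }
        return json.dumps(payload)


def build_logger(name: str, stream=None) -> logging.Logger:
    """Create a logger that writes JSON lines to `stream` (stderr if None)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate lines if called twice
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False
    return logger
```

A call such as log.error("payment failed", extra={"request_id": "req-123", "user_id": 42}) then produces a line you can find by searching for the request ID or the user ID, on whichever server it was written.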

6. Backup and recovery: database backups with tested restores

Having backups doesn't count if you've never tested restoring them. Here's the minimum:

Automatic daily backups of your database. If you're using RDS or Cloud SQL, this comes built in.
Minimum 7-day retention. Ideally 30.
Tested restore. At least once a quarter, restore a backup to a test environment and verify the data is there. If you've never tested your restore, you don't have backups -- you have an illusion of safety.
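
Those three rules can be turned into an automated check. The Python sketch below assumes you can list backup timestamps from your provider and that you record the date of your last restore drill somewhere. Both inputs, the 26-hour slack, and the function itself are illustrative, not part of any provider's API.

```python
from datetime import datetime, timedelta
from typing import Optional


def backup_problems(
    backup_times: list[datetime],
    last_restore_test: Optional[datetime],
    now: datetime,
    min_days_retained: int = 7,
) -> list[str]:
    """Return a list of problems with the backup setup; empty means healthy."""
    problems = []
    if not backup_times:
        problems.append("no backups found")
    else:
        # Daily backups: allow some slack over 24 hours for job runtime.
        if now - max(backup_times) > timedelta(hours=26):
            problems.append("latest backup is more than a day old")
        # Retention: count the distinct calendar days that have a backup.
        days_covered = len({t.date() for t in backup_times})
        if days_covered < min_days_retained:
            problems.append(f"only {days_covered} days of backups retained")
    # Restore drill: at least once a quarter (roughly 90 days).
    if last_restore_test is None:
        problems.append("restore has never been tested")
    elif now - last_restore_test > timedelta(days=90):
        problems.append("last restore test was more than a quarter ago")
    return problems
```

Run on a schedule and wired to your alerts, a check like this catches the silent failure mode: backups that stopped weeks ago without anyone noticing.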

7. Environment parity: staging mirrors production

Your staging environment should be as close to production as possible. Same database version, same server configuration, same environment variables (with different values, obviously).

If something works in staging but fails in production, your staging is useless. The most common problems:

Different dependency versions. Staging on Node 18, production on Node 16. Use Docker or at least .nvmrc to pin versions.
Different database. Staging with SQLite, production with PostgreSQL. No. Use the same database in both environments.
Unrealistic test data. Your staging has 10 records. Your production has 100,000. Performance problems don't show up with 10 records.
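
The first two problems are easy to detect automatically. The Python sketch below compares two environment configurations, represented as plain dictionaries. It assumes runtime versions are exposed as NODE_VERSION and POSTGRES_VERSION, which are hypothetical variable names picked for this example.

```python
def parity_report(
    staging: dict[str, str],
    production: dict[str, str],
    must_match: tuple[str, ...] = ("NODE_VERSION", "POSTGRES_VERSION"),
) -> list[str]:
    """List the differences that break parity between two environments.

    Variable names must exist on both sides (values may differ), while
    the keys in `must_match` must also have identical values.
    """
    issues = []
    for key in sorted(set(production) - set(staging)):
        issues.append(f"{key} is set in production but missing in staging")
    for key in sorted(set(staging) - set(production)):
        issues.append(f"{key} is set in staging but missing in production")
    for key in must_match:
        if key in staging and key in production and staging[key] != production[key]:
            issues.append(
                f"{key} differs: staging={staging[key]}, production={production[key]}"
            )
    return issues
```

Only the version keys are compared by value. Secrets and URLs are expected to differ between environments, so the check looks at their names and never at their contents.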

8. Secrets management: zero hardcoded credentials

If there's an API key, database password, or access token in your source code, you have a security problem, and it's only a matter of time before it blows up.

The minimum:

Environment variables for all secrets. Never in the code, never in the repository.
.env in .gitignore. Always. No exceptions.
Secret rotation. If a secret leaks, you should be able to rotate it in minutes, not hours.

Tools like AWS Secrets Manager, HashiCorp Vault, or even GitHub Actions' built-in encrypted secrets are enough to get started.
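
On the application side, the pattern is to read every secret from the environment at startup and refuse to boot if one is missing. A minimal Python sketch follows. The names in REQUIRED_SECRETS are placeholders for this example, and however the variables get injected (a secrets manager, your CI, your hosting platform) is outside its scope.

```python
import os
from typing import Mapping

# Placeholder names; list whatever your application actually needs.
REQUIRED_SECRETS = ("DATABASE_URL", "PAYMENT_API_KEY")


class MissingSecrets(RuntimeError):
    """Raised at startup when required secrets are not configured."""


def load_secrets(
    env: Mapping[str, str] = os.environ,
    required: tuple[str, ...] = REQUIRED_SECRETS,
) -> dict[str, str]:
    """Read required secrets from the environment, failing fast if any is absent.

    The error message lists the missing names only, never any values.
    """
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise MissingSecrets("missing required secrets: " + ", ".join(missing))
    return {name: env[name] for name in required}
```

Failing at startup turns a misconfigured deployment into an immediate, obvious error instead of a crash on the first request that touches the database. Because nothing is hardcoded, rotating a leaked secret means updating one value and restarting.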

What you DON'T need (yet)

Just as important as what you need is what you don't. These are the most common over-engineering traps:

Kubernetes. Unless you have a team of 10+ engineers and genuine container orchestration needs, Kubernetes is complexity that adds no value. A simple ECS, Railway, or Fly.io setup is more than enough.
Service mesh. Istio, Linkerd... solutions to problems you don't have with 3 microservices (which probably should be a monolith anyway).
Custom metrics dashboards. Grafana with 15 panels doesn't make you faster. Basic alerts do.
Multi-region failover. If your startup has 500 users, you don't need geographic redundancy. You need solid uptime in one region.

The rule: if you can't explain in one sentence why you need a tool or practice, you probably don't need it.

When to level up

Minimum Viable DevOps isn't the destination -- it's the starting point. You should start investing in more robust infrastructure when:

You have paying customers. Now there are implicit SLAs. Downtime costs real money.
Your team exceeds 5 engineers. More people = more need for automation, better development environments, more sophisticated pipelines.
You have regulatory requirements. If you handle health, financial, or personal data with compliance requirements, your infrastructure needs to reflect that.

Tools by budget

Zero budget (free tier): GitHub Actions for CI/CD, Vercel or Railway for hosting, Sentry's free tier for error tracking, UptimeRobot for uptime monitoring. This covers you surprisingly well up to your first few thousand users.

Mid-range budget (€200-500/month): AWS with Terraform for infrastructure as code, GitHub Actions or CircleCI for CI/CD, Datadog or New Relic for monitoring, managed ELK or CloudWatch for logs.

Enterprise budget (€1,000+/month): Managed AWS or GCP services, ECS or EKS for containers, Datadog full suite, PagerDuty for incident management, Terraform Cloud for state management.

The profile you need

Setting all of this up doesn't require a DevOps team. It requires a senior engineer who's done it before. Someone who knows the difference between necessary and aspirational, who can configure a CI/CD pipeline in a day -- not a sprint -- and who understands that infrastructure should serve the product, not the other way around.

At Conectia, we have senior DevOps and infrastructure engineers who've built exactly this kind of foundation for startups -- no over-engineering, no enterprise solutions for startup problems. Every one of them passed our CTO-led technical vetting, where we evaluate real production experience, not certifications. And we respond within 72 hours.

Minimum Viable DevOps isn't glamorous. You're not going to write a LinkedIn post bragging about your CI pipeline. But it's the difference between a startup that can iterate with confidence and one that's afraid to deploy on a Friday.

Need a senior DevOps engineer to lay the foundation without over-engineering? Talk to a CTO -- the right infrastructure for your stage, ready in days.