Topic: FAO: AC

tteh !MemesToDNA started this discussion 6 years ago #96,224

do you use kubernetes @ work

Anonymous B joined in and replied with this 6 years ago, 56 seconds later #1,085,512

Who is AC, bud?

Is it AntiChristos?

(Edited 1 minute later.)

beckyderp !3NeoVaGFAg joined in and replied with this 6 years ago, 13 minutes later, 14 minutes after the original post #1,085,513

@previous (B)
ugh newfag

tteh !MemesToDNA (OP) replied with this 6 years ago, 47 seconds later, 15 minutes after the original post #1,085,514

@previous (beckyderp !3NeoVaGFAg)

> ugh newfag

jodi !ariasXXmaE joined in and replied with this 6 years ago, 1 minute later, 16 minutes after the original post #1,085,516

@1,085,512 (B)
the grape on the far left

Anonymous B replied with this 6 years ago, 2 minutes later, 19 minutes after the original post #1,085,517

@1,085,513 (beckyderp !3NeoVaGFAg)
@1,085,514 (tteh !MemesToDNA)

Mad. Thanks.

Reid joined in and replied with this 6 years ago, 1 hour later, 1 hour after the original post #1,085,536

Yes. Quite a lot. It’s great.

We tinkered with Apache Mesos in late 2017 and went all-in on Kubernetes in early 2018. By late 2018 we had moved all production systems to Kubernetes.

My team has four regional clusters comprising about $50k a month in computing costs and we are on version 1.13. Soon we will upgrade to version 1.14.

Another team in our department runs the data platform from which we read and write. They have a massive cluster that costs something like $500k a month to run.

Kubernetes does what it says on the tin and I’ve enjoyed working with it. Our uptime went from about 98 to 99.999 percent once we got serious about fixing cruft in our architecture.
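For a sense of scale, those two uptime figures can be turned into yearly downtime. This is only a back-of-the-envelope sketch: it assumes a 365-day year, and the function name is made up for illustration.

```python
# Rough conversion from an uptime percentage to downtime per year.
# Assumes a 365-day year; the function name is hypothetical.
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(uptime_percent: float) -> float:
    """Return the minutes of downtime per year implied by an uptime percentage."""
    return MINUTES_PER_YEAR * (100.0 - uptime_percent) / 100.0


if __name__ == "__main__":
    for uptime in (98.0, 99.999):
        minutes = downtime_minutes_per_year(uptime)
        print(f"{uptime}% uptime -> about {minutes:.1f} minutes of downtime per year")
```

Under that assumption, 98 percent works out to roughly a week of downtime per year, while 99.999 percent is a little over five minutes.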

The biggest mistake we've made so far with Kubernetes was not running enough DNS replicas. The second biggest was relying on too many default settings for workload lifecycles. Right now we are pretty heavy on the amount of configuration we specify, but it feels right.

tteh !MemesToDNA (OP) replied with this 6 years ago, 8 minutes later, 1 hour after the original post #1,085,538

@previous (Reid)
Nice! $50k/month is insane; we have three clusters, one for internal projects plus a few clients' projects, and two that are each dedicated to another client, yet our total costs come nowhere near that.

Do you use Helm + helmfile/helmsman, out of interest? Or any other k8s templating?

We're now finally throwing Terraform into the mix and I'm feeling a bit overwhelmed.

(Edited 28 seconds later.)

tteh !MemesToDNA (OP) double-posted this 6 years ago, 5 minutes later, 1 hour after the original post #1,085,539

@1,085,536 (Reid)
And $500k/month is just... like... what the fuck. It's taken me a few minutes to process that. Bonkers!

(Edited 22 seconds later.)

Reid replied with this 6 years ago, 10 minutes later, 1 hour after the original post #1,085,549

@1,085,538 (tteh !MemesToDNA)
$50k is wild, and some of it comes down to legal and regulatory complexities: we must have separate clusters for EU and Japan. Our development cluster costs as much as our combined production clusters because we have such a large team with so many concurrent projects.

For comparison, the entire annual revenue of my employer back in 2009 was something like 1.5 million. Things have changed!

We use Helm for some of our monitoring daemons.

When we moved to Kubernetes, Helm wasn’t yet suitable for our needs, so we built our own templating and tooling to manage our deployments. At this point we could probably migrate it all over to our own custom Helm charts, but our workflow is so well integrated and reliable that it just might not be worth it.

Terraform is cool and great, but also complicated and overwhelming. It’s hard to wrap your head around it!

For AWS, Terraform is quite good. We use GCP and Terraform is still very inadequate for those resources.

You’re definitely not the only one struggling with it!

Reid double-posted this 6 years ago, 8 minutes later, 1 hour after the original post #1,085,553

@1,085,539 (tteh !MemesToDNA)
It’s… stupefying. And a huge money pit.

Those two clusters (in Frankfurt and Iowa) are mostly Cassandra workloads with something like 600 TB of data. The data is ingested through a message queuing system (it used to be Kafka, but might now be Pub/Sub) and processed by Spark jobs.

That department is a shitshow of toxic engineering culture and they’ve burned through a lot of good talent. I’m glad to keep my distance.

Reid triple-posted this 6 years ago, 6 minutes later, 2 hours after the original post #1,085,555

@1,085,538 (tteh !MemesToDNA)
What versions of k8s do y’all run? How many people do you work with?

Our custom templating tooling comprises several Ruby scripts that essentially find-and-replace placeholder strings in static YAML files. We also make heavy use of configmaps, secrets, and projected volumes to modify how the application runs.
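To give a rough idea of what find-and-replace templating over static YAML can look like, here is a minimal sketch. It is written in Python rather than Ruby, and everything specific in it is an assumption made for illustration: the `__NAME__` placeholder style, the `render` function, and the sample manifest are not taken from the actual tooling.

```python
import re

# Hypothetical placeholder convention: __UPPER_SNAKE_CASE__ tokens in the YAML.
PLACEHOLDER = re.compile(r"__([A-Z0-9_]+)__")

# Sample static manifest, invented for this example.
TEMPLATE = """\
apiVersion: apps/v1
kind: Deployment
metadata:
  name: __APP_NAME__
spec:
  replicas: __REPLICAS__
  selector:
    matchLabels:
      app: __APP_NAME__
  template:
    metadata:
      labels:
        app: __APP_NAME__
    spec:
      containers:
        - name: __APP_NAME__
          image: __IMAGE__
"""


def render(template: str, values: dict) -> str:
    """Replace every __KEY__ placeholder with values[KEY].

    Raises KeyError for a placeholder with no supplied value, so a
    half-rendered manifest is never produced silently.
    """

    def substitute(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value supplied for placeholder {match.group(0)}")
        return str(values[key])

    return PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    print(render(TEMPLATE, {
        "APP_NAME": "web",
        "REPLICAS": 3,
        "IMAGE": "example/web:1.0",
    }))
```

Failing loudly on an unknown placeholder is a design choice of this sketch; plain string substitution knows nothing about YAML structure, so the values themselves still have to be valid YAML scalars.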

I’ve been learning a lot in the past few years from some really smart and talented peers and enjoy it a lot.

Fake anon !ZkUt8arUCU joined in and replied with this 6 years ago, 4 minutes later, 2 hours after the original post #1,085,556

Oh yeah I also uh use the computer machine thing like what they're talking about.

Reid replied with this 6 years ago, 2 minutes later, 2 hours after the original post #1,085,557

@previous (Fake anon !ZkUt8arUCU)
You’re doing great, sweetheart.

tteh !MemesToDNA (OP) replied with this 6 years ago, 8 minutes later, 2 hours after the original post #1,085,559

@1,085,549 (Reid)
Yeah, I've experienced the complexities of regulation — GDPR is a bitch, and most providers consider their "EU" regions to still include the UK (rightfully), yet clients' expectations vary. It's bloody 2 days until the UK 'officially' leaves, yet I'm as lost as anybody else in much of the minutiae. I set up VPC peering on one of our newer clusters recently, and discovered Google really makes it difficult to strictly keep within a single zone in Europe. In fact, using Terraform, you sometimes have no ability to choose the zone.

Honestly, I feel like Terraform + GCP just plain sucks (we don't use Terraform with AWS/Azure at all, unfortunately (not that I'm a fan of Azure...)).

We use Helm + Helmsman, but we've had to modify it fairly extensively for our needs. We're indefinitely putting off the move from Helm v2 to v3 (tillerless) because of our adjustments.

Reid replied with this 6 years ago, 1 minute later, 2 hours after the original post #1,085,560

@previous (tteh !MemesToDNA)
Yes, Terraform sucks on GCP ^_^

tteh !MemesToDNA (OP) replied with this 6 years ago, 1 minute later, 2 hours after the original post #1,085,562

@1,085,555 (Reid)
1.13, but hoping to upgrade "soon". The company I'm with is fairly small but rapidly expanding, and our infra team is somehow still only a handful of people.

I've learnt so much from this job, but I feel like we need more devops/infra colleagues, and soon. Any tips for convincing management?

tteh !MemesToDNA (OP) double-posted this 6 years ago, 4 minutes later, 2 hours after the original post #1,085,568

@1,085,553 (Reid)
Seriously, 600 TB is beyond my ability to really comprehend. We've handled a lot of data but... shit, man.

Reid replied with this 6 years ago, 9 minutes later, 2 hours after the original post #1,085,574

@1,085,562 (tteh !MemesToDNA)
I’ve had luck getting more resources from management by being really stern about how uncomfortable I am with the risks we are taking, the likelihood that we will burn somebody out, and the fact that too few people have knowledge of particular subjects. Good luck!

Anonymous B replied with this 6 years ago, 28 minutes later, 3 hours after the original post #1,085,579

@1,085,568 (tteh !MemesToDNA)

My brother was working in the IT department at a local community college, and he was the only one who could program in several languages. Basically, everyone else was limited to Python and C++. He was doing his supervisor's job, but not getting paid for it. He now works for the local ISD, and makes more than twice what he was.