SRE / DevOps / Kubernetes Weekly Report Roundup #63 (2021/4/11 to 4/16) - 運び屋 (A carrier(forwarder) changed his career to an engineer)

The title is "Building Momentum with Tinkerbell".
It covers the history and future of Tinkerbell, an open-source bare-metal provisioner, through the following points:
- The journey of building an open-source platform
- What’s new?
- Building what users want
- Growing as part of the CNCF community

A purposefully provocatively titled post with some good observations; nobody cares about the operating system anymore.

The title is "Nobody Cares About the Operating System Anymore".
It covers the following points in line with the title:
- Isn’t that a bit dramatic?
- Enter cloud
- We’re in what looks like a post-OS world
- It goes beyond the OS
- The Distro Wars are now about Kubernetes implementations
- I’m expecting angry letters

A quick post on getting started with CINC Auditor, the open packaging for Chef Inspec.

The title is "My New Friend, Cinc-Auditor".
The content is as described in the editor's comment above and in the title. The TL;DR is:
- TL;DR: The other gems being pulled from Package Cloud are all dependencies of cinc-auditor-bin, so we pull them from PackageCloud and not RubyGems.

A post explaining the concepts behind policy as code and how the author came to appreciate the idea and the open policy agent project.

The title is "WTF Is Policy as Code?".
It explains how the authors arrived at OPA (Open Policy Agent), an open-source solution, to achieve their "Policy as Code" goals.

A case study of adopting the new ARM-based Graviton instances in AWS, and the resulting price and performance improvements.

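The title is "One Year of Graviton2 at Honeycomb".
The post shares the past year's efforts and results as an early adopter of Graviton2.
Follow-up blog posts appear to be coming within a few weeks. They will show how Graviton2 instances make it possible to serve more than twice the customer traffic from queries, triggers, and SLOs on the same number of shards, while keeping reliability, latency, and cost at roughly the same level.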

A post on building decision trees for threat modelling with Graphviz. It’s a nice graphviz DSL tutorial as well.

The title is "Creating Security Decision Trees With Graphviz".
One of the co-authors of the "Security Chaos Engineering e-book" explains how to use Graphviz and .DOT files to create the decision tree examples from the e-book.
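
To get a feel for what a decision tree in a .DOT file looks like, here is a minimal sketch that emits DOT source from a nested dictionary. It is not taken from the post: the function name `tree_to_dot`, the node naming scheme, and the example tree contents are all hypothetical and exist only for illustration.

```python
def tree_to_dot(tree, name="decision_tree"):
    """Render a nested {label: subtree} dict as Graphviz DOT source."""
    lines = [f"digraph {name} {{", "    node [shape=box];"]
    counter = 0

    def visit(label, children, parent_id):
        nonlocal counter
        node_id = f"n{counter}"
        counter += 1
        # Escape double quotes so the label stays a valid DOT string.
        escaped = label.replace('"', '\\"')
        lines.append(f'    {node_id} [label="{escaped}"];')
        if parent_id is not None:
            lines.append(f"    {parent_id} -> {node_id};")
        for child_label, grandchildren in children.items():
            visit(child_label, grandchildren, node_id)

    for root_label, children in tree.items():
        visit(root_label, children, None)
    lines.append("}")
    return "\n".join(lines)


# Hypothetical example tree (not from the e-book).
example_tree = {
    "Attacker reads sensitive data": {
        "Storage bucket is public": {},
        "Credentials are leaked": {
            "Key committed to a repository": {},
        },
    },
}

dot_source = tree_to_dot(example_tree)
print(dot_source)
```

Saving the printed text to a file and running it through the Graphviz `dot` command (for example `dot -Tpng tree.dot -o tree.png`) should render the tree as an image.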

Ever discussed DNS propagation? As this post points out, it doesn’t exist and it’s a matter of layered caches.

The title is "DNS propagation does not exist".
It explains that DNS records are not pushed (propagated) but pulled (queried and cached), and makes the following proposal about the terminology:
- So let's eliminate this fallacy, and call it cache expiration instead of propagation.
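
The pull-and-cache behavior can be sketched with a toy resolver. This is only an illustration under simplifying assumptions: `CachingResolver`, the in-memory `zone` dictionary standing in for an authoritative server, and the fake clock are all hypothetical, and real resolvers involve several cache layers.

```python
class CachingResolver:
    """Toy resolver that pulls records on demand and caches them until the TTL expires."""

    def __init__(self, authoritative, clock):
        self.authoritative = authoritative  # name -> (address, ttl_seconds)
        self.clock = clock                  # callable returning the current time in seconds
        self.cache = {}                     # name -> (address, expires_at)

    def resolve(self, name):
        now = self.clock()
        cached = self.cache.get(name)
        if cached is not None and cached[1] > now:
            return cached[0]                # cache hit: the authoritative data is not consulted
        address, ttl = self.authoritative[name]
        self.cache[name] = (address, now + ttl)
        return address


now = [0]  # fake clock so the example is deterministic
zone = {"example.com": ("192.0.2.1", 300)}
resolver = CachingResolver(zone, lambda: now[0])

first = resolver.resolve("example.com")    # cache miss, pulled from the zone
zone["example.com"] = ("192.0.2.2", 300)   # the record changes; nothing is pushed anywhere
now[0] = 100
stale = resolver.resolve("example.com")    # still served from the cache
now[0] = 301
fresh = resolver.resolve("example.com")    # cache entry expired, new value is pulled
print(first, stale, fresh)
```

Nothing "propagates" in this sketch: the resolver only sees the new address once its cached entry has expired and it asks again.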

An interesting service management survey looking for input. The post has some good data points from the previous year's survey too.

The title is "Introducing the 2021 State of Service Management Survey".
As noted above, it is a request to take part in the survey. It also pulls out and explains data from past surveys.

Tools

Ledokku is a web frontend for the Dokku minimal platform as a service.

The web page for "Ledokku", a UI powered by dokku. It can deploy apps written in node.js, php, ruby, and so on, and link them to databases (postgresql, mongodb, redis).
The GitHub page is here.

Volcano is a Kubernetes-native batch processing system for compute-heavy workloads like machine learning or bioinformatics.

The web page for "Volcano", a system for running high-performance workloads on Kubernetes.
The GitHub page is here.

SRE Weekly Issue #265 April 11th, 2021

Articles

Insights into a Product SRE team at LinkedIn

Here’s a great look into how LinkedIn’s embedded SREs work.

[…] the mission for Product SRE is to “engineer and drive product reliability by influencing architecture, providing tools, and enhancing observability.”

Zaina Afoulki and Lakshmi Namboori — LinkedIn

It offers a glimpse into the problems LinkedIn's SREs deal with day to day and how they think about them.

DNS propagation does not exist

It’s all just other people’s caches.

Ruurtjan Pul

Already covered under DEVOPS WEEKLY ISSUE #537 above, so omitted here.

Advice for someone moving from SRE to backend engineering

Recently there was a Reddit post asking for advice about moving from Site Reliability Engineering to Backend Eng. I started writing a response to it, the response got long, and so I turned it into a blog post.

Charles Cary — Shoreline

As noted above, the author's answer to the Reddit post "Need advice for moving from SRE to Backend" grew long, so it became a blog post.

The Mightiest Monolith

This is the first in a series about lessons SREs can learn from the space shuttle program. The author likens earlier spacecraft to microservices and the Shuttle to a monolith.

Robert Barron

As noted above, it explains what modern developers, DevOps practitioners, and site reliability engineers can learn from the Space Shuttle program.

The 5 characteristics of high reliability organizations

This article is ostensibly about Emergency Medical Services (EMS), but as is so often the case, it’s directly applicable to SRE. The 5 characteristics are enlightening, and so is the fictitious anecdote about an EMT rattled from a previous incident.

Ems1

As the title says, it centers on the following five principles (characteristics) of a highly reliable organization (HRO). They are described in "Managing the Unexpected" by social psychologists Karl Weick, PhD, and Kathleen Sutcliffe, PhD.
- HRO Principle 1: Preoccupation with failure
- HRO Principle 2: Reluctance to simplify
- HRO Principle 3: Sensitivity to operations
- HRO Principle 4: Commitment to resilience
- HRO Principle 5: Deference to expertise

How we scaled the GitHub API with a sharded, replicated rate limiter in Redis

Simple solution meets reality. I like how we get to see what they did when things didn’t quite work out as they were hoping.

Robert Mosolgo — GitHub

GitHub shares several lessons learned about a year ago while migrating its old rate limiter to handle more traffic and to fit a more resilient platform architecture.
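
As a rough mental model of a sharded rate limiter, here is a minimal in-memory sketch. It is not GitHub's implementation: it assumes a simple fixed-window counter, uses plain dictionaries as stand-ins for Redis shards, leaves out replication entirely, and the names `ShardedRateLimiter` and `allow` are hypothetical.

```python
import zlib


class ShardedRateLimiter:
    """Fixed-window rate limiter whose counters are spread across several shards."""

    def __init__(self, shard_count, limit, window_seconds, clock):
        # Each dict stands in for one storage shard (for example one Redis instance).
        self.shards = [{} for _ in range(shard_count)]
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock

    def _shard_for(self, key):
        # A stable hash keeps all counters for one client on the same shard.
        return self.shards[zlib.crc32(key.encode("utf-8")) % len(self.shards)]

    def allow(self, key):
        window = int(self.clock() // self.window_seconds)
        shard = self._shard_for(key)
        counter_key = (key, window)
        # Old windows are never cleaned up here; a real store would expire them.
        count = shard.get(counter_key, 0) + 1
        shard[counter_key] = count
        return count <= self.limit


now = [0]  # fake clock so the example is deterministic
limiter = ShardedRateLimiter(shard_count=4, limit=2, window_seconds=60, clock=lambda: now[0])
results = [limiter.allow("client-a") for _ in range(3)]
now[0] = 60                        # a new window starts
after_reset = limiter.allow("client-a")
print(results, after_reset)
```

Hashing the client key to pick a shard spreads load across shards while keeping each client's count in one place, which is the general idea behind sharding this kind of state.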

GitHub Availability Report: March 2021

They did the work to convert a database column to a 64-bit integer before it was too late. Unfortunately, one of their library dependencies didn’t use 64-bit integers.

Keith Ballinger — GitHub

The Availability Report for March 2021. It explains the three incidents that occurred.

Learning from incidents: getting Sidekiq ready to serve a billion jobs

In this post, I’ll walk you through one of our first ever Sidekiq incidents and how we improved our Sidekiq implementation as a result of this incident.

Nakul Pathak — Scribd

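It explains one of Scribd's first Sidekiq incidents and how the incident led them to improve their Sidekiq implementation, including fixing the problem areas and adding Datadog dashboards.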

Outages

Let’s Encrypt
Uber
Multiple Airlines’ Online Booking Sites
An error in Google’s flight information service caused problems at multiple sites that consume it.
Tinder
BBC Website
Facebook, Instagram, and WhatsApp
Stellar.org (cryptocurrency)
WazirX (cryptocurrency exchange)
Microsoft Azure and other services

Azure DNS servers experienced an anomalous surge in DNS queries from across the globe targeting a set of domains hosted on Azure.

Outage information for the companies listed above.

KubeWeekly #259 April 16th, 2021

The Headlines

Editor’s pick of the highlights from the past week.

‘Master,’ ‘Slave’ and the fight over offensive terms in computing

Kate Conger, New York Times

The Inclusive Naming Initiative provides "guidance to standards bodies and companies that want to change their terminology but don’t know where to begin." Read about the awesome work the Kubernetes community is doing to remove harmful language from code in the article.

As noted above, an article about the movement to remove and replace certain terms in code that are considered harmful.

ICYMI: CNCF online programs this week

A weekly summary of CNCF online programs from this week.

Enforce configuration and security checks for your YAML Files and Helm Charts with KubeLinter

Viswajith Venugopal, StackRox

It presents a hands-on demo of the app, covers use cases for local machines and CI pipeline integration, and explains how best to integrate "KubeLinter" into an organization.

What's new in Argo Workflows 3.0

Alex Collins, Intuit

A maintainer explains the new features of the recently released Argo Workflows 3.0 in detail.

Sensible step in the right direction, let’s make KEP-2572 happen. That is: moving to 3 from 4 releases per annum, as this allows for more soaking/triaging, and should result in more resilient and secure infra.

Spread the word & get heard! https://t.co/O8wxDlr6nU #kubernetes pic.twitter.com/rLaFkSJLvZ
— Michael Hausenblas (@mhausenblas) April 14, 2021

The Technical

Tutorials, tools, and more that take you on a deep dive into the code.

How we use metamonitoring Prometheus servers to monitor all other Prometheus servers at Grafana Labs

Jeroen Op 't Eynde, Grafana Labs

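It explains the solution Grafana Labs has adopted, based on the concept of metamonitoring.
A small number of geographically distributed metamonitoring Prometheus servers monitor all the other Prometheus servers and each other across clusters, and the alerting chain is protected by a mechanism similar to a dead man's switch.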
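
The dead man's switch idea is that silence itself is the alarm: a heartbeat is expected at regular intervals, and the switch trips when the heartbeat stops arriving. Below is a minimal sketch of that logic, assuming a fake clock for determinism. The class `DeadMansSwitch` and its methods are hypothetical and do not reflect how Grafana Labs implements it.

```python
class DeadMansSwitch:
    """Trips when no heartbeat has been received within the timeout."""

    def __init__(self, timeout_seconds, clock):
        self.timeout_seconds = timeout_seconds
        self.clock = clock                  # callable returning the current time in seconds
        self.last_heartbeat = clock()

    def heartbeat(self):
        # Called whenever the always-firing "I am alive" signal arrives.
        self.last_heartbeat = self.clock()

    def is_tripped(self):
        # True once the monitored alerting chain has been silent for too long.
        return self.clock() - self.last_heartbeat > self.timeout_seconds


now = [0]  # fake clock so the example is deterministic
switch = DeadMansSwitch(timeout_seconds=120, clock=lambda: now[0])

now[0] = 60
switch.heartbeat()                  # signal arrives on time
now[0] = 150
healthy = switch.is_tripped()       # 90 seconds of silence, still fine
now[0] = 200
tripped = switch.is_tripped()       # 140 seconds of silence, raise the alarm
print(healthy, tripped)
```

The point of this design is that a broken alerting pipeline cannot report its own failure, so the check has to live outside the pipeline and react to missing signals.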

Docker without Docker

Thomas Ptacek, Fly.io

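As the title suggests, it explains that Fly.io delivers software to most users as Docker containers but does not use Docker to run them.
Docker is excellent, but Fly.io is a high-density multi-tenant platform and Docker's isolation is not strong enough for its purposes, so it converts container images into Firecracker microVMs instead.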

Unveil hidden malicious processes with Falco in cloud-native environments

Kaizhe Huang, Sysdig

It explains in detail how the hidden-process scenario using the "libprocesshider" tool applies to containers and Kubernetes clusters.
It also explains how Falco detects and mitigates these attacks.

How to use Dex with Google Accounts to manage access in Kubernetes

Emil Vanneback and Pavan Gunda, Elastisys Engineering

A tutorial covering everything needed to use Google as an identity provider in a Kubernetes cluster.

HAProxy forwards 2 Million HTTP requests per second on AWS's Arm instances

Willy Tarreau, HAProxy

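An article the author wrote based on the following remark he made to the people around him. As the title says, the figure is now more than double that.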
- “The day we cross the million-requests-per-second barrier, I’ll write about it.”

First look at GKE Autopilot

Ahmet Alp Balkan, Google Cloud

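The author pokes at the nodes and the API to see for himself how GKE Autopilot works, then runs pod autoscaling from 0 to 500 and summarizes how well it scales from a user's perspective.
The draft was reportedly reviewed by the GKE Autopilot team.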

Best Kubernetes monitoring tools: Free, open source & paid

Adnan Rahic, Sematext

It shares a list of monitoring tools for keeping a Kubernetes environment performing well. Open source and commercial combined, there are the following 13.
1. Sematext
2. Kubernetes Dashboard
3. Prometheus
4. Grafana
5. Jaeger
6. Elastic Stack (ELK)
7. cAdvisor
8. Kubewatch
9. Kube-state-metrics
10. Datadog
11. New Relic
12. Sensu
13. Dynatrace
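Because Kubernetes is complex by nature, the list includes solutions with a range of capabilities. Some handle logs and others only metrics; some are Kubernetes-native and others general-purpose. Some act as data collectors, while others act as interfaces.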

Does Linkerd mesh with GitOps? Flux around and find out

Jason Morgan, Buoyant

It explains GitOps and how it relates to the Linkerd service mesh. After reading it, you should be able to use Flux to bootstrap a cluster with Linkerd and deploy an app to it.

Authorizing microservice APIs with OPA and Kuma

It covers the topic in the title with a focus on authorization, that is, the problem of controlling the actions that people or machines perform.

The simple way to connect existing apps to public cloud

In line with the title, it covers the following points:
- The progression and move away from on-premises datacenters to modern VM or microservice-based architectures
- How tunnels could be used to securely connect private clouds to the public cloud

Prometheus definitive guide part II - Prometheus query language

A series explaining Prometheus. This installment mainly explains how to query Prometheus data.
Part I is here.

First rule of the YAML club pic.twitter.com/bqldb5PQfn
— memenetes (@memenetes) April 15, 2021

The Editorial

Articles, announcements, and more that give you a high-level overview of challenges and features.

Kubernetes 1.21, with Nabarun Pal

Craig Box, Kubernetes Podcast from Google

A Kubernetes podcast by Google employees. This episode's hosts are Craig Box and guest host Daniel Smith. His previous appearance was:
- CRDs, Extensibility and API Machinery, with Daniel Smith
The guest is Nabarun Pal, a Software Engineer at VMware and the release team lead for Kubernetes 1.21.
The News of the week topics that caught my attention are as follows.

Emissary-ingress, Ambassador’s API Gateway, is officially an incubation project at the Cloud Native Computing Foundation

Richard Li, Ambassador Labs

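It briefly announces that the Ambassador API Gateway for Kubernetes (now known as Emissary-ingress) has been officially accepted as a CNCF incubation project, and recaps its history so far.
It also mentions the office hours session on Thursday, April 29. These appear to be held on the last Thursday of every month.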

Top 6 open source networking projects for a cloud native world

Twain Taylor, TechGenix

As the title says, it introduces the following six open-source networking projects.
1. Project Calico
2. Cilium
3. Envoy
4. Jaeger
5. Flannel
6. Kuma

No one wants to manage Kubernetes anymore

Scott Carey, InfoWorld

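With solid and varied managed Kubernetes options now available, more companies are hesitant to manage their own clusters. The article explains this background, drawing on interviews with people at major vendors.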

Cheryl Hung on trends in cloud native and DevOps for 2021

Matt Campbell, InfoQ

An InfoQ interview based on the previously covered "10 predictions for cloud native in 2021 – Keynote, The DevOps Conference".

Reminder: Take the CNCF Microsurveys on edge, CFM and diversity

A reminder for the surveys above, which were introduced previously. If you work in a related field, take a look.

Don’t forget, the scholarship for KubeCon + CloudNativeCon Europe 2021- Virtual closes on April 19, 2021 at 11:59 pm PDT! Be sure to apply today.

Upcoming CNCF Online Programs

Cloud Native Live

4/21/21: Automate & orchestrate databases & other stateful workloads with Kubernetes, Alex Chircop, Storage OS - RSVP

On-demand

4/22/21: What is cloud native and why should I care?, Jamie Dobson, Container Solutions - RSVP
4/22/21: Managing add-ons across clusters, Anubhav Sharma, Nirmata - RSVP

YouTube playlist submissions

Looking for more great curated content? Visit our Online Programs playlist on YouTube.

Learn more about CNCF Online Programs

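How was it? Did any articles or information catch your interest?

There is still a lot here that I have not fully digested myself, so I plan to deepen my understanding by making use of this combined memo and link collection.

See you next time.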

Bye now!!

Yoshiki Fujiwara