SRE / DevOps / Kubernetes Weekly Report Roundup #35 (9/27~10/2) - 運び屋 (A carrier (forwarder) changed his career to an engineer)

This post is a combined memo and link collection that I wrote after reading the following three weekly reports published between 2020/9/27 and 10/2.

I hope it serves as a source of information for someone, or saves them some time searching.

DEVOPS WEEKLY ISSUE #509 September 27th, 2020

SRE Weekly Issue #237 September 27th, 2020

KubeWeekly #235 October 2nd, 2020

English Version of this blog is here.

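If anything in this post is unclear or raises questions, I would appreciate it if you could check the original article via its URL and then point out the problem.
I try to comment even on topics I understand only superficially, so there are probably many misunderstandings caused by my own mistakes or insufficient explanations.
Because there is a lot of information, I have limited the post to text and links.
Some of the articles featured in each report date from 2019 or earlier, so they are not necessarily the latest.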

DEVOPS WEEKLY ISSUE #509 September 27th, 2020

News

A post on one open source project’s search for a new CI system. Lots of useful research for anyone investigating CI and build systems.

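The title is "Rebuilding Linkerd's continuous integration (CI) with Kubernetes in Docker (kind) and GitHub Actions".
An article transcribing the author's presentation at KubeCon EU 2020.
Two demo videos are embedded, and it is especially useful as a reference if you want to build CI/CD with OSS.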

A good post on the evolution of systems administration, the embrace of devops, shifting responsibility and changing job titles.

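The title is "The evolution of DevOps and why we are here".
The article explains the role of system administrators, how their responsibilities were forced to change over time into what is now called a DevOps engineer, and where things stand today.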

The idea of commercial off-the-shelf software is fine, but the reality often differs. This post explains the frustration and suggests it’s only COTS if you can get something up and running in a day.

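The title is "“Fake COTS” and the one-day rule".
The article looks at government agencies procuring commercial off-the-shelf (COTS) products. Some can be used right away as expected, while others fail to live up to the name and cannot be put to use within a day; the author calls the latter "Fake COTS" and explains the idea using product names as examples.
The following disclaimer is stated prominently at the top.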
- Extra-prominent disclaimer: The views expressed here are my own. Products mentioned in the examples below are not endorsements.

Dependency management is an unfortunate consequence of a library ecosystem. This post, reviewing a recent paper looking at the Python ecosystem, has some interesting ideas and tooling demonstrations.

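The title is "Watchman: monitoring dependency conflicts for Python library ecosystem". A blog post explaining a paper.
It explains how widespread dependency conflicts are in Python projects and what causes them, touching on "dependency hell" among other things.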

A look at Terraspace, a deployment tool for Terraform that provides some interesting high-level features and visualisation tools.

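The title is "Terraspace All: Deploy Multiple Stacks or Terraform Modules At Once".
It explains Terraspace, a framework for Terraform. Terraspace provides an organized structure and convention over configuration, keeps code DRY, and adds convenient tooling.
A video is also embedded.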

AWS has grown to have several different overlapping approaches to managing users and accounts. This post looks at some of the nuance and makes some recommendations.

The title is "AWS Account Structure: Think twice before using AWS Organizations".
It explains how the approaches to managing AWS accounts have changed over time, along with the key points. A podcast is also embedded.

Krew, the plugin manager for kubectl, now supports custom indexes. So you can distribute kubectl plugins for your own projects or for internal company usage.

The title is "Using Custom Plugin Indexes".
It explains how to use "custom plugin indexes" with several kubectl krew commands.

Tools

version-checker is a Kubernetes utility for observing the current versions of images running in the cluster, as well as the latest available upstream. These checks get exposed as Prometheus metrics to be viewed on a dashboard.

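The GitHub page for "version-checker", a Kubernetes utility for observing the current versions of images running in the cluster as well as the latest available upstream.
These checks are exposed as Prometheus metrics that can be viewed on a dashboard or used to soft-alert cluster operators.
The tool is currently experimental.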
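
To make the idea concrete, here is a minimal sketch of comparing running image tags against upstream tags and rendering the result in the Prometheus text exposition format. It is not version-checker's implementation: the metric name `image_is_latest`, the label names, and the helpers `parse_version` and `render_metrics` are all hypothetical, and the sketch assumes purely numeric dotted tags.

```python
from typing import Dict, List, Tuple


def parse_version(tag: str) -> Tuple[int, ...]:
    """Turn a tag such as 'v1.19.2' into a comparable tuple (1, 19, 2).

    Assumption: tags are numeric and dot-separated, optionally prefixed with 'v'.
    """
    return tuple(int(part) for part in tag.lstrip("v").split("."))


def render_metrics(running: Dict[str, str], upstream: Dict[str, List[str]]) -> str:
    """Render one gauge sample per image in the Prometheus text format.

    `running` maps image name -> tag in use; `upstream` maps image name ->
    tags available upstream. The metric and label names are hypothetical.
    """
    lines = [
        "# HELP image_is_latest 1 if the running tag is the newest upstream tag.",
        "# TYPE image_is_latest gauge",
    ]
    for image, current in sorted(running.items()):
        # Pick the newest upstream tag by numeric comparison, not string order.
        latest = max(upstream[image], key=parse_version)
        is_latest = int(parse_version(current) >= parse_version(latest))
        lines.append(
            f'image_is_latest{{image="{image}",current="{current}",latest="{latest}"}} {is_latest}'
        )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_metrics(
        {"nginx": "1.19.2", "redis": "6.0.8"},
        {"nginx": ["1.19.1", "1.19.2"], "redis": ["6.0.8", "6.0.10", "6.0.9"]},
    ))
```

Comparing versions as integer tuples rather than strings matters here: as strings, "6.0.9" would sort after "6.0.10".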

Portus is an open source authorization service that sits atop a Docker container registry. It provides user management and a number of other useful features.

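The .io site for "Portus", an open source authorization service and user interface for the next-generation Docker registry.
It is an on-premises application that lets users administer and secure their Docker registries.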

SRE Weekly Issue #237 September 27th, 2020

Articles

Postmortem — why Allegro went down

They fully expected their deep-discount sale to drive traffic, but they didn’t expect their system to handle the increase in the way that it did.

Michał Kosmulski — Allegro

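An article dated August 31, 2018. It is a postmortem on the outage in which Allegro's website went down for 20 minutes around midday on July 18, 2018.
Based on the internal postmortem, it explains publicly, for customers and the tech community, how the outage unfolded and what technical measures were taken to make similar events less likely in the future.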

Zero-Downtime Kubernetes Deployments

Pre-stop hooks, liveness probes, and readiness probes were key to smoothly transitioning their services from a home-grown container system to Kubernetes.

Oliver Leaver-Smith — Sky Betting & Gaming

It explains the work Sky Betting & Gaming has carried out over the past few months to migrate its OIDC / OAuth2 identity service from a tactical container platform to an on-premises Kubernetes cluster.
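
As a rough illustration of where the pre-stop hook, liveness probe, and readiness probe mentioned above live in a manifest, the following sketch builds a Deployment as a Python dict and prints it as JSON, which `kubectl apply -f -` accepts. The field names are standard Kubernetes API fields, but the values are assumptions and not taken from the article: the probe paths `/ready` and `/healthz`, the 10-second `sleep`, the replica count, and the helper name `build_deployment` are all hypothetical.

```python
import json


def build_deployment(name: str, image: str, port: int) -> dict:
    """Build a Deployment manifest with probes and a preStop hook (values are illustrative)."""
    container = {
        "name": name,
        "image": image,
        "ports": [{"containerPort": port}],
        # Readiness: the pod only receives traffic while this check passes.
        "readinessProbe": {
            "httpGet": {"path": "/ready", "port": port},
            "periodSeconds": 5,
        },
        # Liveness: the container is restarted if this check keeps failing.
        "livenessProbe": {
            "httpGet": {"path": "/healthz", "port": port},
            "initialDelaySeconds": 10,
            "periodSeconds": 10,
        },
        # preStop: runs before SIGTERM is sent, giving endpoint removal time to
        # propagate. Assumes the image ships a `sleep` binary.
        "lifecycle": {"preStop": {"exec": {"command": ["sleep", "10"]}}},
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": 2,
            "selector": {"matchLabels": {"app": name}},
            # Never take an old pod away before a new one is ready.
            "strategy": {
                "type": "RollingUpdate",
                "rollingUpdate": {"maxUnavailable": 0, "maxSurge": 1},
            },
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {
                    # Must exceed the preStop sleep plus the app's own shutdown time.
                    "terminationGracePeriodSeconds": 30,
                    "containers": [container],
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_deployment("idp", "example/idp:1.0", 8080), indent=2))
```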

Feelings during incident response

The experience of responding to an incident can evoke emotions that run the gamut.

Mads Hartmann

An excerpt from, and commentary on, the part of Glitch's podcast "Shift Shift Forward" in which the author was asked how it feels to respond to an incident.

Join SRE Classroom NALSD workshops

Google has released course materials for the first of a series of classes on NALSD (“non-abstract large systems design”). This first one is about a distributed Pub-Sub system.

Author: Jenny Liao and Salim Virji — Google

Google introduces the "Distributed Pub/Sub workshop", the first workshop in the SRE Classroom series on NALSD (Non-Abstract Large System Design).

Why you should write up your own incident

Usually, doing a post-analysis on an incident you were in is an anti-pattern because you’re likely to introduce bias. But sometimes, it can lead you to learn more than you would have otherwise.

Lorin Hochstein

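Drawing on the author's recent experience, it explains "why you should write up your own incident".
As quoted below, the author says that responders should avoid writing the post-incident analysis themselves where possible, but that it cannot be helped when necessity demands it. The author also recommends that responders carry out their own post-incident investigation, based on the new discoveries he made while investigating an incident he had recently responded to.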
- You shouldn’t write up your own incident if you can avoid it. To write up an incident well, you need to be able to capture the perspectives of the different people who were involved. If the write-up author was also one of the responders, then the writeup will be biased towards their perspective, at the expense of capturing the perspectives of the other engineers who were engaged.

Outages

Datadog
G Suite
Google Cloud Platform
Let’s Encrypt
Google CT logs had an issue, impairing Let’s Encrypt’s ability to issue.
Tesla
Apple
Reddit
Heroku
Connectivity Issues
Crypto.com (cryptocurrency exchange)
The CEO says a database issue (nearly) opened up the possibility for arbitrage.

Outage information for the companies listed above.

KubeWeekly #235 October 2nd

The Headlines

Editor’s pick of the highlights from the past week.

KubeCon + CloudNativeCon North America 2020 Virtual – schedule now available!

We’re so excited to announce that the schedule for KubeCon + CloudNativeCon North America 2020 Virtual is live! The fourth virtual event from CNCF this year will host ~200 maintainer sessions, tutorials, keynotes, and breakout sessions, including insights from end users on cloud native technology in production. This educational event will arm attendees – from beginner to advanced – with the insights they need to successfully implement and manage cloud native architectures within their organization. Don’t forget that you can save $25 when you register by the end of October!

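An announcement that the schedule for KubeCon + CloudNativeCon North America 2020 Virtual has been published, and a reminder to register. Paid registration is $75 ($25 off) through October and $100 if you register in November.
It is already next month. I plan to decide early which sessions I want to watch and prepare what I can.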

ICYMI: CNCF Webinars

You can view all CNCF recorded and upcoming webinars here.

CNCF Project webinar: Kubernetes 1.19

Kubernetes release team

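It explains the changes in Kubernetes 1.19 SIG by SIG, with reference to each feature's status such as "Alpha", "Beta", and "Stable". Starting with 1.19, the support period becomes one year.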

CNCF Member webinar: VanillaStack as a platform for a truly vendor-agnostic open-source ecosystem

Karsten Samaschke, CEO @Cloudical

It introduces the open source edition of VanillaStack, explains the ideas behind the platform, and presents the future roadmap for integrating and deploying open source projects.

CNCF Member webinar: Effective disaster recovery strategies for Kubernetes

Rasheed Amir, CEO @Stakater AB

It explains how enterprises are using Kubernetes, by way of DevOps, for mission-critical cloud native applications.
- Some concepts and terms to consider for disaster recovery business needs
- Kubernetes architecture for ensuring fault tolerance and high availability
- Factors to consider while creating a Disaster recovery plan
- The components for which to implement backup and restore

CNCF Member webinar: Self service Kubernetes for enterprises

Jim Bugwadia, Founder and CEO @Nirmata

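It explains best practices and emerging patterns that help deliver self-service Kubernetes clusters across the enterprise.
It is aimed at platform teams that need visibility and governance and want to enable business agility across the enterprise and promote the adoption of cloud native tools.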

CNCF Member webinar: Dapr, Lego for microservices

Mark Chmarny, Principal Program Manager @Microsoft

It shows how to use Dapr, the distributed application runtime, to efficiently build cloud native apps deployed to Kubernetes and other hosting platforms.

Super excited to announce the alpha release of @EnvoyProxy for @Windows. This has been an amazing cross-functional effort, especially by @VMware and @Microsoft. Well done and please test! 👏🎉🥳🚀https://t.co/qzYOu8HAdN
— Matt Klein (@mattklein123) September 30, 2020

The Technical

Tutorials, tools, and more that take you on a deep dive into the code.

The Level Up Hour (podman play kube)

Langdon White and Chris Short, Red Hat

A Twitch video from "The Level Up Hour", a program run by Red Hat's Chris Short, who is also the editor of KubeWeekly. This episode covers Podman with Red Hat's Langdon White as the guest.

Our online analytical processing journey with ClickHouse on Kubernetes

Sudeep Kumar, Mohan Garadi, Xiancheng Li, Amber Vaidya and Liangfei Su, eBay

On the theme of ClickHouse (a column-oriented database) running on Kubernetes, it explains the latest evolution of online analytical processing (OLAP) data.

A Linux sysadmin’s introduction to cgroups

Steve Ovens, Red Hat

The first article in a four-part series. It explains what cgroups are and how they help with resource management and performance tuning.
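
As a small companion to the definition of cgroups, the sketch below parses the contents of `/proc/<pid>/cgroup`, where each line has the form `hierarchy-ID:controller-list:cgroup-path` as documented in the cgroups(7) man page. It is not taken from the article; the names `CgroupEntry` and `parse_proc_cgroup` and the sample paths are hypothetical.

```python
from typing import List, NamedTuple


class CgroupEntry(NamedTuple):
    hierarchy_id: int
    controllers: List[str]  # empty for the cgroup v2 unified hierarchy
    path: str


def parse_proc_cgroup(text: str) -> List[CgroupEntry]:
    """Parse the text of /proc/<pid>/cgroup into structured entries."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first two colons only; the path itself may contain ':'.
        hierarchy_id, controllers, path = line.split(":", 2)
        entries.append(
            CgroupEntry(
                int(hierarchy_id),
                [c for c in controllers.split(",") if c],
                path,
            )
        )
    return entries


if __name__ == "__main__":
    # On a Linux host this shows which cgroups the current process belongs to.
    with open("/proc/self/cgroup") as f:
        for entry in parse_proc_cgroup(f.read()):
            print(entry)
```

On a cgroup v1 host there is one line per mounted hierarchy, such as memory or cpu,cpuacct; on a pure cgroup v2 host there is a single line starting with `0::`.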

RabbitMQ monitoring on Kubernetes

Piotr Minkowski

It explains how to run a monitoring stack for RabbitMQ on Kubernetes.
- RabbitMQ monitoring tools let you see general node metrics and detailed logs of all messages.
- Spring Boot AMQP provides metrics specific to applications that interact with RabbitMQ.

Build a data streaming pipeline using Kafka Streams and Quarkus

Kapil Shukla, Red Hat

It builds and walks through a Quarkus app that streams and processes data in real time using Kafka Streams.

Chaos Mesh 1.0: Chaos Engineering on Kubernetes made easier

Chaos Mesh Maintainers

It announces the GA release of "Chaos Mesh®" 1.0, which joined the CNCF as a sandbox project in July 2020, and gives an overview of the project.

Rootless containers with Podman: The basics

Prakhar Sethi, Red Hat

It covers the benefits of using containers and Podman, what rootless containers are and why they matter, and shows with examples how to use rootless containers with Podman.

couler-proj/couler

Unified interface for constructing and managing workflows on different workflow engines, such as Argo Workflows, Tekton Pipelines, and Apache Airflow

The GitHub page for "Couler", a unified interface for constructing and managing workflows on various workflow engines such as Argo Workflows, Tekton Pipelines, and Apache Airflow. It provides the following.
- Simplicity: Unified interface and imperative programming style for defining workflows with automatic construction of directed acyclic graph (DAG).
- Extensibility: Extensible to support various workflow engines.
- Reusability: Reusable steps for tasks such as distributed training of machine learning models.
- Efficiency: Automatic workflow and resource optimizations under the hood.

jetstack/version-checker

Kubernetes utility for exposing image versions in use, compared to latest available upstream, as metrics

Omitted here because it is covered above under DEVOPS WEEKLY ISSUE #509.

TiDB Operator: Your TiDB operations expert in Kubernetes

Aylei Wu, PingCap

It explores how TiDB Operator runs TiDB smoothly on Kubernetes and keeps data safe, and explains how companies use TiDB Operator in production along with best practices.

GitHub actions demystified

Pooja Dhoot

It shares a workflow for building a pipeline that deploys Fission to a GKE Kubernetes cluster created with GitHub Actions, along with some code validation actions and, finally, some monitoring actions.

Use Terraform to create and manage a HA AKS Kubernetes cluster in Azure

Kentaro Wakayama, Coder Society

It explains how to use Terraform to manage a highly available Azure AKS Kubernetes cluster with Azure AD integration and Calico network policies enabled.

The Editorial

Articles, announcements, and more that give you a high-level overview of challenges and features.

Contributing to the Development Guide

Erik L. Arneson

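A new contributor describes the experience of writing and submitting changes to the "Kubernetes Development Guide". I have also started taking part in localizing the website documentation, and I would like to broaden the scope of my contributions.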

Anthos in depth: Easy load balancing for your on-prem workloads

Mahesh Narayanan, Product Manager, GKE and Yuan Liu, Software Engineer, GKE

It introduces the three different options Anthos offers for deploying an external load balancer, and explains the load balancer bundled with Anthos in detail.

Kubernetes: When to use, and when to avoid, the operator pattern

Mary Branscombe, The New Stack

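It examines where Operators fit and what to watch out for, citing the fact that Darren Shepherd, chief technology officer of Rancher Labs, strongly recommends that developers consider GitOps and other configuration management options before writing their own Kubernetes Operator.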

Leader Election, with Mike Danese

Adam Glick and Craig Box, Kubernetes Podcast from Google

A Kubernetes podcast by Google employees. The current co-hosts are Craig Box and Adam Glick.
The guest is Mike Danese, a SWE at Google and chair and TL of Kubernetes SIG Auth.
The News of the week topics that caught my attention are as follows.
- OpenServiceMesh joins the CNCF Sandbox
- Chaos Mesh 1.0
- Determined AI on Kubernetes
- KubeAcademy Pro from VMware
- Scholarships for KubeCon NA 2020 are open for application

Security in all its forms – detection of undesirable behavior thanks to Falco with Thomas Labarussias

Electro Monkeys podcast (in French)

The French-language podcast "Electro Monkeys podcast" appears to cover how Falco detects undesirable behavior.

The Cloud Native Landscape: The runtime layer explained

Catherine Paganini and Jason Morgan

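An article in the series explaining each category of the CNCF "Cloud Native Landscape". It focuses on the runtime layer, which encompasses everything a container needs in order to run in a cloud native environment.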

With an eye toward standardization and security for its media brands, Verizon Media turned to cloud native

CNCF Case Study

A user case study published by the CNCF. It covers Verizon Media's adoption of cloud native technology.
The company owns many brands, from Yahoo to HuffPost, TechCrunch, and more. The full case study is here.

Kubernetes people!

If you are eligible to vote(*) in the steering committee election, PLEASE DO. Turnout is ... low so far. It just takes a minute or two!

(*) https://t.co/uVtAQSJX0z
— Tim Hockin (@thockin) September 29, 2020

Upcoming CNCF webinars

If any webinar interests you, register and check it out. The following are the ones listed as upcoming.

Member Webinar: Multi-Cluster & multi-cloud service mesh with CNCF’s Kuma and Envoy
Marco Palladino, CTO & Co-Founder @Kong
Oct 6, 2020 10:00 AM Pacific Time
REGISTER NOW »

Member Webinar: The evolution of cloud orchestration systems from ephemeral to persistent storage
Boyan Krosnov, CPO @StorPool
Oct 7, 2020 8:00 AM Pacific Time
REGISTER NOW »

Member Webinar: Kubernetes native two-level resource management for AI/ML workloads
Diana Arroyo, Software Engineer @IBM Research
Alaa Youssef, Manager, Container Cloud Platform @IBM Research
Oct 7, 2020 10:00 AM Pacific Time
REGISTER NOW »

Member Webinar: Building dynamic machine learning pipelines with KubeDirector
Tom Phelan, Fellow, Software Organization @Hewlett Packard Enterprise
Oct 8, 2020 10:00 AM Pacific Time
REGISTER NOW »

Member Webinar: You can be a Kubernetes contributor too!
Jeremy L. Morris, Software Engineer @DigitalOcean
Oct 13, 2020 10:00 AM Pacific Time
REGISTER NOW »

Member Webinar: ephemeral.run: A full application environment for every PR–before you merge to master!
Vishal Biyani, CTO @InfraCloud
Jono Spiro, Staff Software Engineer, Engineering Operations @OpenGov
Oct 14, 2020 10:00 AM Pacific Time
REGISTER NOW »

Member Webinar: GitOps at scale for a multicloud, multi-region stateful application
Rick Spencer, Head of Platform @InfluxData
Oct 14, 2020 1:00 PM Pacific Time
REGISTER NOW »

Member Webinar: S&P experience report: multi-cloud serverless on Knative
Evan Anderson, Software Engineer @VMware
Mark Wang, Head of Cloud Engineering @S&P Global Ratings
Oct 15, 2020 10:00 AM Pacific Time
REGISTER NOW »

Member Webinar: Delivering cloud native apps to Kubernetes using werf
Dmitry Stolyarov, CTO @Flant
Oct 16, 2020 10:00 AM Pacific Time
REGISTER NOW »

Member Webinar: How to migrate NF or VNF to CNF without vendor lock-in
Grzegorz Sikora, VP Business Development @OVOO
Oct 20, 2020 10:00 AM Pacific Time
REGISTER NOW »

Member Webinar: Deploying Kubernetes to bare metal using cluster API
Seán McCord, Principal Senior Software Engineer @Talos Systems, Inc.
Oct 21, 2020 1:00 PM Pacific Time
REGISTER NOW »

Member Webinar: K8s audit logging deep dive
Randy Abernethy, Managing Partner @RX-M
Oct 22, 2020 10:00 AM Pacific Time
REGISTER NOW »

Member Webinar: Building 12 factor streaming data apps on Kubernetes
Stelios Charmpalis, Frontend Engineer @Lenses.io
Francisco Perez, Senior Backend Engineer @Lenses.io
Oct 23, 2020 10:00 AM Pacific Time
REGISTER NOW »

Member Webinar: Admission controllers: one part of your Kubernetes security and governance toolkit
Gunjan Patel, Cloud Architect @Palo Alto Networks
Robert Haynes, Cloud Security Evangelist @Palo Alto Networks
Oct 28, 2020 7:00 AM Pacific Time
REGISTER NOW »

Member Webinar: Developer-friendly platforms with Kubernetes and infrastructure as code
Lee Briggs, Staff Software Engineer @Pulumi
Nov 6, 2020 10:00 AM Pacific Time
REGISTER NOW »

Member Webinar: Metal³: Kubernetes-native bare metal host management
Maël Kimmerlin, Senior Software Engineer @Ericsson Software Technology
Dec 10, 2020 10:00 AM Pacific Time
REGISTER NOW »

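How was it? Did any of the articles or information catch your interest?

There is still a lot here that I have not fully digested myself, so I intend to deepen my understanding while making use of this memo and link collection.

See you next time.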

Bye now!!

Yoshiki Fujiwara