SRE / DevOps / Kubernetes Weekly Report Roundup #78 (2021/7/25-7/30) - 運び屋 (A carrier(forwarder) changed his career to an engineer)

The English Version of this blog is here.
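This article is a memo and link collection that I keep after reading the following three Weekly Reports, published between 2021/7/25 and 7/30.
Because I want to deliver and share information as early as possible, I publish the post as soon as I have checked the blog links, and I add my own comments as I go.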
DEVOPS WEEKLY ISSUE #552 July 25th, 2021
- News
- Tools
SRE Weekly Issue #280 July 25th, 2021
- Articles
- Outages
KubeWeekly #270 July 30th, 2021

I hope this serves as an information source for someone, or reduces the effort they spend searching.

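If anything in this article is unclear or raises questions, I would appreciate it if you could check the original text at the URL and then point it out.
I make a point of commenting even on areas I understand only superficially, so there are probably many errors caused by my own misunderstandings or insufficient explanations.
Because of the volume of information, I limit the post to text and links.
Some of the articles featured in each report date from 2020 or earlier, so they are not necessarily the latest.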

DEVOPS WEEKLY ISSUE #552 July 25th, 2021

News

The latest State of Devops report is out, with some interesting observations based on this year's survey around cultural blockers holding up adoption of mature devops practices.

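The title is "The 2021 State of DevOps Report is here!".
It excerpts and explains parts of the "2021 State of DevOps Report" under the following headings, while strongly recommending that readers download the report and read it themselves.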
- Cultural blockers are keeping mid-evolution firms stuck in the middle
- Team identities and clear interaction paradigms matter
- DevOps is not automation and DevOps is not the cloud
- Read the report
- Learn more

Discussion of the importance of team structures and communication when it comes to devops transformation. Promises, feedback and customer centricity.

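The title is "After Team Topologies: Getting Past the DevOps Middle".
It builds its discussion on the "2021 State of DevOps Report" covered in the previous item. It opens by noting that most organizations are "stuck" in their DevOps evolution, and touches on the following three reasons.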
1. Lack of well-defined team structures, responsibilities, and interactions, especially with respect to internal platform teams
2. Insufficient feedback loops
3. Risk avoidance
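While agreeing with the recommendation of the Team Topologies strategy, the author says a question remains: agile and continuous delivery often optimize the flow of work rather than, necessarily, the flow of customer value. The article then offers ideas in response.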

It’s easy to have an SRE team or function that feels siloed from other engineering teams. This post examines why, and what you can do about it.

The title is "De-Siloing Incident Management: How to Make Reliability Engineering Everyone’s Job".
Omitted here, since it was covered in last week's SRE Weekly Issue #279.

A look at solving the thundering herd problem after clearing a higher level cache.

The title is "Solving The Three Stooges Problem".
It explains how traffic to Reddit's search infrastructure is reminiscent of the doorway sketch from "The Three Stooges", and outlines the approach taken to fix these request patterns.
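As a rough illustration of the thundering herd pattern mentioned above, here is a minimal Python sketch of request coalescing (sometimes called "single flight"): concurrent callers asking for the same key share one backend call instead of each hitting the backend. This is a generic technique, not necessarily what Reddit implemented. The names `SingleFlight`, `slow_search`, and the key `"q=cats"` are hypothetical and exist only for this example.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor


class SingleFlight:
    """Coalesce concurrent calls for the same key into one backend call."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = {}  # key -> Future shared by every waiting caller

    def do(self, key, fn):
        # Decide under the lock whether this caller leads or follows.
        with self._lock:
            future = self._in_flight.get(key)
            is_leader = future is None
            if is_leader:
                future = Future()
                self._in_flight[key] = future
        if is_leader:
            # Only the leader runs fn; followers wait on the shared Future.
            try:
                future.set_result(fn())
            except Exception as exc:
                future.set_exception(exc)
            finally:
                with self._lock:
                    del self._in_flight[key]
        return future.result()


calls = []  # one entry per real backend call (hypothetical instrumentation)


def slow_search():
    # Stand-in for an expensive backend query (assumption for the demo).
    calls.append(1)
    time.sleep(0.5)
    return ["result-1", "result-2"]


flight = SingleFlight()
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(lambda _: flight.do("q=cats", slow_search), range(8)))

print(len(calls), len(results))
```

The sketch relies on the backend call taking long enough for all eight workers to arrive while the first call is still in flight. A real deployment would usually pair this with a cache, so that later requests are also served without reaching the backend.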

A look at designing useful alerting systems, with a good list of dos and don’ts.

The title is "Monitoring Alerts That Don't Suck".
The author explains how they tried to improve alert quality while working, for the past year and a half, on a project that practices on-call.

A reasoned argument for not using Kubernetes.

The title is "No, we don’t use Kubernetes".
As the title says, it carefully explains why they do not use Kubernetes. Reading just the Conclusion section is enough to understand the points they want to make.

A look at the Headlamp Kubernetes dashboard and why you may consider using it.

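The title is "Kubernetes Dashboards: Headlamp".
It covers the Kubernetes dashboard "Headlamp": its good design, the choice of where to run it, and the recommended authentication path, which is the author's main complaint.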

If you’re managing the nodes under a Kubernetes cluster you might find some nodes causing problems. This post looks at using the Kubernetes API and extension points to make the cluster self-healing.

The title is "Automatic Remediation of Kubernetes Nodes".
Omitted here, since it was covered in last week's SRE Weekly Issue #279.

Tools

SchemaHero is a Kubernetes-native implementation of declarative database schema management.

The web page of "SchemaHero", a Kubernetes Operator that performs declarative schema management for various databases.
The GitHub page is here.

Ortelius is a platform for managing microservices. It provides a central catalog of services with their deployment specs, application teams can easily consume and deploy services across a cluster.

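The web page of the microservice management platform "Ortelius". It versions and tracks microservices, the applications that consume them, their ownership, their blast radius, and where they are deployed, along with all the critical deployment metadata.
The GitHub page is here.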

N8n is a tool for connecting services together, with a visual editor, command line tools, and commons clause self-hosted version available.

The web page of "N8n", an extendable workflow automation tool.
The GitHub page is here.

SRE Weekly Issue #280 July 25th, 2021

Articles

The Harmful Consequences of the Robustness Principle

The Robustness Principle (“be conservative in what you send, and liberal in what you accept”) has its uses, but it may not be best for the development of mature protocols, according to this IETF draft.

Martin Thomson

An Internet-Draft that gives an opinion on the Robustness Principle. Its revision suffix is 05, and since the first version is 00, this is the sixth version. Internet-Drafts have a six-month expiry period.

No, we don’t use Kubernetes

Docker without Kubernetes, does it make sense? These folks have a well-reasoned argument explaining why Kubernetes is not for them.

Maik Zumstrull — Ably

Omitted here, since it is covered above under DEVOPS WEEKLY ISSUE #552.

Personal data breach reporting for service outages (such as when your CDN is down)

Can a service outage unrelated to security count as a “personal data breach” in terms of GDPR and other regulations? If you agree with this article’s logic, then maybe it can.

Neil Brown

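Guidance from both the UK regulator and the European Data Protection Board suggests that a "loss of availability" can constitute a personal data breach, so the article examines whether that reading is reasonable.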

When You Do DevSecOps, Don’t Forget the SREs

The interactions between security and reliability incidents can be complex and hard to navigate. The example scenarios in this article really made me think.

Quentin Rousseau — Rootly

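The article warns that, when people talk about DevSecOps, little attention is paid to the importance of integrating security more closely into the incident management work carried out by SREs, and it explains how to address this.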

Solving the Three Stooges Problem

To deal with thundering herds, reddit implements caching in front of each of its microservices.

Raj Shah — reddit

Omitted here, since it is covered above under DEVOPS WEEKLY ISSUE #552.

What’s allowed to count as a cause?

Incident causes are a social construct, and it may be that your organizational structure prevents something from being counted as a cause.

Lorin Hochstein

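The view that what can be labeled as the cause of an incident depends on the organization's cultural norms struck home for me. This is not limited to incidents: when a proposal is turned down without clear feedback, I sometimes sense this kind of norm within the group.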

IC1 Reliability Engineer – Dropbox Engineering Career Framework

Check it out, Dropbox publicly released their SRE career ladder.

Dropbox

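Dropbox has published career indicators for each of its SRE levels (IC1-8), which makes a very useful reference.
Starting from the sentence directly below the position title that states what the role does, everything is written consistently with "I" as the subject, which I like. I think this structure encourages people to act with an awareness of what is expected of them.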

Incidents, Response, and the People With Tim Nicholas

There’s a moment halfway through this episode of Page It to the Limit where they talk about blamelessness. If you just tell people to “do blameless postmortems”, but you don’t tell them how, then they’ll be afraid to talk about anything people did, and that will hamper learning.

Julie Gunderson, with guest Tim Nicholas — Page It to the Limit

A podcast on the theme of incident response and learning. It discusses how to view incidents, communication patterns, psychological safety and accountability, and more.

Migrating Facebook to MySQL 8.0

This was a monumental task, considering the 1000+(!!) internal code patches they had to port from MySQL 5.6 to 8.0.

Herman Lee, Pradeep Nayak — Facebook

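As stated above, this is the story of Facebook's migration from MySQL 5.6 to 8.0. Just looking at the number of code patches made me feel faint. I think it is great that they share such valuable knowledge in a timely manner.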

Outages

Akamai
Akamai had what they’re calling an “Edge DNS Service Incident”. It made headlines this week because it took down many of their customers, similar to the Akamai incident last month.
Let’s Encrypt
Disney park-related apps
Heroku

Outage information for the companies listed above.

KubeWeekly #270 July 30th, 2021

The Headlines

Editor’s pick of the highlights from the past week.

Cloud Native Computing Foundation announces Linkerd graduation

Congratulations to Linkerd for hitting Graduated status! Linkerd was the first project to join the CNCF Sandbox, known as inception at the time, and is now the first service mesh project to achieve graduated status.

Linkerd is a service mesh that provides critical observability, security, and reliability features to cloud native applications without requiring code changes. The project was created in 2016 by Buoyant and joined CNCF in early 2017 as the foundation’s fifth project. It was the first service mesh project and the first CNCF project to adopt the Rust programming language to improve security and performance. Today, organizations like Microsoft, Nordstrom, Expedia, JPMC, Clover Health, Entain, H-E-B, and more rely on Linkerd to power mission-critical production systems.

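An article reporting that "Linkerd" has reached Graduation, one of the CNCF maturity levels.
As stated above, Linkerd was the first project to join the CNCF at the "Sandbox" level, and it is the first service mesh project to reach Graduation.
The maintainers' article on the graduation is here.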

KubeCon + CloudNativeCon North America Co-located events CFP reminder

Production Identity Day: SPIFFE + SPIRE North America hosted by CNCF

CFP Closes, Sunday, Aug 1 at 11:59 PM PST

The link above appears to point to a past event. The page showed the following information.
- This event has passed. View the upcoming KubeCon and other CNCF Events.
- November 17, 2020
The correct link is here. The event is scheduled for October 11, 2021 (local time) as a co-located event of KubeCon + CloudNativeCon North America. The CFP deadline is as stated above.

Cloud Native Wasm Day North America hosted by CNCF

CFP Closes, Monday, Aug 9 at 11:59 PM PST

This is likewise a co-located event of KubeCon + CloudNativeCon North America, scheduled for October 12, 2021 (local time). The CFP deadline is as stated above.

ICYMI: CNCF online programs this week

A weekly summary of CNCF online programs from this week.

Building the Telegraf Kubernetes Operator Wojciech Kocjan, InfluxData

A roughly one-hour session in which the maintainers of the Telegraf Kubernetes Operator jointly explain how to use it in Kubernetes environments, along with best practices for improving service handling.

Visit our Online Programs playlist on YouTube for more content.

Taking baby steps on open-source contribution! (Got my first PR approved 🥺)
Thanks to @dims for introducing me to the community!

Things I am learning:
- @golang
- @kubernetesio
- @Docker

Also, there's this thing I learnt today: how to squash commits.

It was a good day! :)
— Haimantika Mitra (@HaimantikaM) July 29, 2021

The Technical

Tutorials, tools, and more that take you on a deep dive into the code.

6 Ways to leverage Insomnia as a gRPC client

Alvin Lee, Kong

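It starts by explaining the core technology concepts, builds a fun, simple gRPC server in Node.js, and then shows how to make gRPC requests against that server using "Insomnia". Since summer nights can make it hard to sleep, the name Insomnia caught my eye.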

My Prometheus is overwhelmed! Help!

Ryan Dawson, ThoughtWorks

It explains several situations that can arise as you use Prometheus, and various options for scaling Prometheus. After reading the final "You Are Not Alone" section...

Verify container image signatures in Kubernetes using Notary or Cosign or both

Christoph Hamsen, SSE Blog

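An introduction to the version 2.0 release of "Connaisseur", an admission controller that integrates container image signature verification and trust pinning into Kubernetes clusters.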

Debugging apps in Kubernetes with Bridge

Thorsten Hans, Blog

An explanation and demo of how to debug apps running in Kubernetes using Bridge to Kubernetes (Bridge).
There is a Bridge extension on the Visual Studio Marketplace and a Bridge to Kubernetes extension for VS Code, which give developers an environment for seamless debugging and testing.

Announcing Deckhouse, the Kubernetes platform from Flant is now generally available

Dmitry Shurupov, Flant

Flant announces the open source release of its Kubernetes platform "Deckhouse". It has the following five features.
1. Infrastructure Agnostic
2. Providing everything you need to maintain your production cluster
3. Renders K8s usage more straightforward thanks to the NoOps approach
4. Deploys clusters in 8 minutes
5. Offers a 99.95% SLA guarantee
There are two editions, Deckhouse CE and Deckhouse EE. For the SLA, the Deckhouse web page above carries the note "* This applies to the Enterprise Edition only.", so it applies to EE only.

I'm so grateful for the opportunity to collaborate with amazing people on OSM this summer! I loved learning about both OSM and the open source community, as well as exploring the use cases for the Multicluster functionality. Excited to see this project continue to grow :) https://t.co/0RCo9BvI8H
— Annie Wang (@annieee_wang) July 29, 2021

The Editorial

Articles, announcements, and more that give you a high-level overview of challenges and features.

Opstrace, with Sebastien Pahl

Craig Box, Kubernetes Podcast from Google

The Kubernetes Podcast, run by Google employees. This episode's host is Craig Box, with guest host Jimmy Moore.
The guest is Sebastien Pahl, co-founder and CEO of Opstrace and co-founder of "Dotcloud", the predecessor of Docker.
The News of the week topics that caught my attention are as follows.

Cosign 1.0 is now GA

Dan Lorenc, Sigstore

As the title says, it reports that cosign 1.0 has been released and is now GA, and discusses what comes next.

GKE best practices: Create a cost-optimized cluster in just a few clicks

Roman Arcea, Google Cloud

It introduces a best-practices guide for running cost-optimized Kubernetes applications on GKE, aimed at users who are not yet ready to use GKE Autopilot, along with the built-in GKE cost-optimized cluster setup guide.

How Riskfuel is using Inlets to build machine learning models at scale

Addison van den Hoeven, Riskfuel

It shows how to use "Inlets" to securely monitor deployments to a fully remote hybrid cloud.
It also touches on how Inlets is used to train machine learning models on client infrastructure and to send millions of control messages.

Announcing Vitess 11

Alkin Tezuysal, Vitess maintainer

As the title says, Vitess 11 has been released, and the post touches on the following as the Major Themes of this release.
- Schema Tracking
- Schema Management
- Performance Optimizations
- VTAdmin
- VReplication
- Benchmarking
For details, it points readers to the "Release Notes".

Take the CNCF microsurvey on Cloud Native Security:

www.surveymonkey.co.uk

Upcoming CNCF Online Programs

Cloud Native Live

August 4 at 9am PT: Humanising your cloud native platform by Lee Briggs, Pulumi - RSVP

On-demand Webinars

August 5: Securing your continuous everything strategy by Abubakar Siddiq Ango, GitLab - RSVP
August 5: Kubernetes clusters need persistent data by James Spurin, StorageOS - RSVP
Looking for more great curated content? Visit our Online Programs playlist on YouTube.

Learn more about CNCF Online Programs

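How was it? Did any of the articles or information catch your interest?

There is still a lot here that I have not fully digested myself, so I intend to deepen my understanding while making use of this memo and link collection.

See you next time.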

Bye now!!

Yoshiki Fujiwara