SRE / DevOps / Kubernetes Weekly Report Roundup #51 (2021/1/17-1/22) - 運び屋 (A carrier(forwarder) changed his career to an engineer)

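This post is a memo and link collection based on my reading of the following three weekly reports, published between 2021/1/17 and 2021/1/22.
Because I want to deliver and share the information as early as possible, I publish the post ahead of time as soon as I have confirmed the blog links. I add my own comments as I go.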
English Version of this blog is here.
DEVOPS WEEKLY ISSUE #525 January 17th, 2021
- News
- Tools
  - driftctl tracks how well your Terraform/AWS codebase covers your cloud configuration and warns you about drift.
  - Please is a cross-language build system with an emphasis on high performance, extensibility and reproducibility. It supports a number of popular languages and can automate nearly any aspect of your build process.
SRE Weekly Issue #253 January 17th, 2021
- Articles
- Outages
KubeWeekly #247 January 22nd, 2021 (the web page does not appear to have been uploaded yet, as of 15:00 on 2021/01/23)

I hope this serves as an information source for someone, or saves some search effort.

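If anything in this post raises questions or is unclear, I would appreciate it if you could check the original text via the URL and then point it out.
I make a point of commenting even on areas I understand only shallowly, so there are likely many misunderstandings caused by my own mistakes or insufficient explanation.
Because there is a lot of information, I limit the post to text and links.
Some of the articles featured in each report date from 2019 or earlier, so they are not necessarily the latest.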

DEVOPS WEEKLY ISSUE #525 January 17th, 2021

News

A good argument for service mesh disappearing out of sight, making the point that service mesh is the dynamic linker for cloud based environments.

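The title is "Why The Service Mesh Should Fade Out Of Sight".
Before reading the article, I misread the title as questioning the value of the service mesh itself, but that was not the case: it questions the way service meshes are currently delivered. The author's summary below, read alongside the title, makes the point clear.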
- In sum, the service mesh should be a platform feature, not a product category — as far out of sight and mind from the DevOps team as possible.

A good checklist of things to do to protect your GitHub projects. Supply chain attacks are increasingly in the news.

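The title is "Securing Your GitHub Project".
The author, who has recently been thinking and talking a lot about open source security, became more convinced with every conversation that this is a very complex topic, and arrived at the following idea.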
- Making good security practices the path of least resistance is a solid way to raise the bar in this space.
The article compiles the following 15-item checklist so that readers can get started even if they do not yet know how to protect an open source project.
1. Use a credential manager to protect your access credentials
2. Configure two-factor authentication (2FA)
3. Enforce signed commits
4. Protect the release branch
5. Require pull request reviews and approvals
6. Scan source code for sensitive data leaks
7. Scrub leaked secrets from git history
8. Only use trusted GitHub Actions
9. Protect the secrets used by GitHub Actions
10. Review project dependencies for vulnerabilities
11. Patch dependencies with vulnerabilities
12. Scan project source code for vulnerabilities
13. Publish a security policy
14. Collaborate on fixes for security vulnerabilities in private forks
15. Publish maintainer advisories for security fixes
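To make item 6 of the checklist a little more concrete for myself, here is a minimal sketch of scanning text for something that looks like a leaked credential. It is a toy illustration and not the tooling the article recommends: the function name `find_suspected_secrets` and the sample text are my own assumptions, and the only pattern checked is the well-known shape of an AWS access key ID (the key in the sample is the placeholder used in AWS documentation). Real scanners cover many more patterns.

```python
import re

# Assumption: we only look for strings shaped like an AWS access key ID
# ("AKIA" followed by 16 uppercase letters or digits).
AWS_ACCESS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def find_suspected_secrets(text):
    """Return (line_number, matched_text) pairs for suspicious strings."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in AWS_ACCESS_KEY_ID.finditer(line):
            findings.append((lineno, match.group()))
    return findings


# Hypothetical config file content used only for this illustration.
sample = 'region = "us-east-1"\naws_access_key_id = "AKIAIOSFODNN7EXAMPLE"\n'
findings = find_suspected_secrets(sample)
for lineno, value in findings:
    print(f"line {lineno}: possible AWS access key ID {value}")
```

A check like this could run before a commit is pushed, but it only catches the one pattern it knows about.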

A set of posts on best practices for creating container images for your .NET applications, including configuration and connecting to a database.

As the editor notes, this is a two-part series. The title of the article linked above is "A container journey: .NET 5 web app dockerization".
The title of the second article is "The journey continues: Containerized .NET5 web app on Docker connects to database-container".

A few posts on less-well-known capabilities of the Kubernetes role-based-access system, looking closely at bind and escalate.

The editor picked two articles by the same author on the RBAC verbs "bind" and "escalate". The title of the article linked above is "Escalating Away".
The title of the second article is "Getting into a bind with Kubernetes".

An interesting walkthrough of the test suite of a reasonably complex project, discussing tradeoffs, configuration and the importance of optimising CI.

The title is "Improving Testing & Continuous Integration in Phoenix".
It explains how the "Phoenix" project approaches testing and CI, and how recent changes have made the process much smoother.

Most internal development teams have documentation for new starters to get set up with all of the needed software. It’s an interesting insight into a team’s stack. But it’s interesting to see this set of documentation posted publicly for others to explore.

The title is "Deploying Software at GoCardless: Open-Sourcing our “Getting Started” Tutorial".
As the editor notes, the article announces that "Utopia", the company's documentation and framework for new joiners, has been published on GitHub.
The GitHub page is here.

A good post for anyone needing to learn Gradle, or interested in building understandable software.

The title is "The Problem with Gradle".
To reduce the frustration of learning Gradle, it focuses on the following problems.
1. You’re not Configuring, You’re Programming
2. Groovy is Not Java
3. Gradle Uses a Domain-Specific Language
4. There are Many Ways to do the Same Thing
5. Magic

A comprehensive guide to vertical pod autoscaling in Kubernetes.

The title is "VERTICAL POD AUTOSCALING: THE DEFINITIVE GUIDE".
This was covered in last week's KubeWeekly #246, so I will skip it here.

A big list of patterns for working with environment variables on the shell.

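The title is "How to Set Environment Variables in Linux and Mac: The Missing Manual".
The content is as the title and the editor's description above say. I bookmarked it because I want to work through it hands-on.
At the end, the article lists the following resources for taking the next step.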
- Bite Size Bash by Julia Evans (not free but totally worth it)
- Shellcheck static analysis tool
- Google shell style guide
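As a small hands-on note of my own (not taken from the article), the sketch below shows one behaviour that is easy to trip over with environment variables: a variable passed to a child process is visible to that child only, and the parent's environment is left untouched. The variable name `WEEKLY_DEMO_GREETING` and the helper `run_child_with_env` are hypothetical names I made up for the illustration.

```python
import os
import subprocess
import sys


def run_child_with_env(extra_env):
    """Run a child Python process with extra variables and return what it sees."""
    # Copy the parent's environment and add the extra variables for the child only.
    child_env = {**os.environ, **extra_env}
    code = "import os; print(os.environ.get('WEEKLY_DEMO_GREETING', ''))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=child_env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


seen_by_child = run_child_with_env({"WEEKLY_DEMO_GREETING": "hello"})
seen_by_parent = os.environ.get("WEEKLY_DEMO_GREETING")
print("child sees:", seen_by_child)
print("parent sees:", seen_by_parent)
```

This is roughly the same idea as writing `NAME=value command` on the shell, where the variable applies to that one command.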

Tools

driftctl tracks how well your Terraform/AWS codebase covers your cloud configuration and warns you about drift.

As the editor describes above, this is the GitHub page for "driftctl", an OSS tool that tracks how well an IaC codebase covers the cloud configuration and warns about drift.
The features are as follows.
- Scan cloud provider and map resources with IaC code
- Analyze diff, and warn about drift and unwanted unmanaged resources
- Allow users to ignore resources
- Multiple output formats
The web page is here.
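To check my own understanding of what "drift" and "coverage" mean here, I wrote the rough sketch below. It is not how driftctl is implemented: the function `detect_drift`, the resource names, and the coverage formula are all assumptions of mine, and the idea is reduced to comparing two sets of resource identifiers with an optional ignore list.

```python
def detect_drift(iac_resources, cloud_resources, ignored=()):
    """Compare resources declared in IaC with resources found in the cloud."""
    managed = set(iac_resources) - set(ignored)
    actual = set(cloud_resources) - set(ignored)
    covered = managed & actual
    # Assumed definition: share of cloud resources that the IaC code manages.
    coverage = 100.0 * len(covered) / len(actual) if actual else 100.0
    return {
        "unmanaged": sorted(actual - managed),  # exist in the cloud, not in code
        "missing": sorted(managed - actual),    # declared in code, not found
        "coverage_percent": coverage,
    }


# Hypothetical resource identifiers used only for this illustration.
iac = ["aws_s3_bucket.logs", "aws_instance.web", "aws_db_instance.main"]
cloud = [
    "aws_s3_bucket.logs",
    "aws_instance.web",
    "aws_iam_user.temp",
    "aws_sqs_queue.jobs",
    "aws_s3_bucket.scratch",
]
report = detect_drift(iac, cloud, ignored=["aws_s3_bucket.scratch"])
print(report)
```

In this toy data, two of the four non-ignored cloud resources are managed by code, so the coverage comes out at 50 percent.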

Please is a cross-language build system with an emphasis on high performance, extensibility and reproducibility. It supports a number of popular languages and can automate nearly any aspect of your build process.

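As the editor describes above, this is the web page for "Please", a cross-language build system with an emphasis on high performance, extensibility, and reproducibility.
The GitHub page is here.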

SRE Weekly Issue #253 January 17th, 2021

Articles

May 30 SSL incident

TLS can be such a headache.

This was an interesting situation. There was a valid path to the USERTrust RSA Certification Authority, and there was also an expired path. The browser was able to find the valid chain, but the curl was not able to find it.

Adam Surak — Algolia

A retrospective, dated 2020/06/02, on the SSL-related incident of 2020/05/30.
The part the editor quotes above looked puzzling at first glance, which made it interesting.
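To convince myself that "one client finds the valid chain and another does not" can really happen, I put together the toy model below. It is only a sketch under my own assumptions and does not reproduce how browsers or curl actually build chains: the certificate names, the dates, and the two client functions are hypothetical. One client checks only the first path it finds, while the other keeps trying alternative paths.

```python
from datetime import date

# Hypothetical toy data, not the real certificates from the incident.
CERTS = {
    "leaf": {"issuers": ["intermediate"], "expires": date(2021, 1, 1)},
    "intermediate": {
        "issuers": ["cross-signed-ca", "self-signed-ca"],
        "expires": date(2025, 1, 1),
    },
    "cross-signed-ca": {"issuers": ["old-root"], "expires": date(2020, 5, 30)},
    "old-root": {"issuers": [], "expires": date(2020, 5, 30)},
    "self-signed-ca": {"issuers": [], "expires": date(2038, 1, 1)},
}
TRUSTED_ROOTS = {"old-root", "self-signed-ca"}


def paths_to_root(name, certs, roots):
    """Yield every path from a certificate up to a trusted root."""
    if name in roots:
        yield [name]
        return
    for issuer in certs[name]["issuers"]:
        for rest in paths_to_root(issuer, certs, roots):
            yield [name] + rest


def path_is_valid(path, certs, today):
    """A path is valid only if no certificate on it has expired."""
    return all(certs[name]["expires"] > today for name in path)


def first_path_client(name, certs, roots, today):
    """Checks only the first path found and gives up if it is invalid."""
    first = next(paths_to_root(name, certs, roots), None)
    return first is not None and path_is_valid(first, certs, today)


def backtracking_client(name, certs, roots, today):
    """Accepts the certificate if any path to a trusted root is valid."""
    return any(path_is_valid(p, certs, today) for p in paths_to_root(name, certs, roots))


today = date(2020, 6, 1)
print("first-path client accepts:", first_path_client("leaf", CERTS, TRUSTED_ROOTS, today))
print("backtracking client accepts:", backtracking_client("leaf", CERTS, TRUSTED_ROOTS, today))
```

With this data the first path runs through the expired root, so the first client rejects the certificate even though a valid path exists.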

Shifting Modes: Creating a Program to Support Sustained Resilience

A well-researched article on shifting emphasis from incident prevention to learning and resilience.

Incidents cannot be prevented, because incidents are the inevitable result of success.

Alex Elman

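To explore what resilience means to the study participants and their organizations, the author shares the lessons they learned through two sources: (1) the literature on resilience, and (2) case studies from engineers and engineering managers across the industry about their own organizations.
The "Key Takeaways" other than the one the editor quotes above are as follows.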
- Organizations must shift from a “prevent and fix” safety mode to a “learn and adapt” (Learn & Adapt) safety mode to manage reliability and resilience. This shift helps to more effectively cope with increasing complexity and scale.
- Finding advocates to help socialize the movement and communicating broadly are key aspects of creating a sustained shift to a Learn & Adapt mode.
- Normalizing behaviors — such as stating assumptions, asking more questions, increasing cooperation between diverse roles, and broadly sharing incident write-ups across the organization — help with the mode shift by increasing the flow of information.
- Developing the cultural traits of opportunity creation, flexibility, agility, and trust are necessary for an organization poised to shift to Learn & Adapt.

Error budgets and the legacy of Herbert Heinrich

This one’s worth reading through twice to let it sink in. It puts me in mind of this article by Will Gallego, which is another thoughtful critique of error budgets.

Here are the claims I’m going to make:
1. Large incidents are much more costly to organizations than small ones, so we should work to reduce the risk of large incidents.
2. Error budgets don’t help reduce risk of large incidents.

Lorin Hochstein

The article poses a question that software developers face, and explains the problems with using the SRE world's "error budget" as a way to approach it.
- How can we best use our knowledge about the past behavior of our system to figure out where we should be investing our time?
The author is skeptical of quantitative, metric-based approaches such as error budgets, and prefers a qualitative approach that draws on engineers' judgment.
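For my own reference, the arithmetic behind an error budget can be sketched as below. This is the common textbook definition as I understand it, not something taken from the article: the function names, the 99.9% SLO, the 30-day window, and the two incident histories are all assumptions for illustration.

```python
def error_budget_minutes(slo, window_days):
    """Allowed minutes of unavailability for a given SLO over a window."""
    total_minutes = window_days * 24 * 60
    return (1 - slo) * total_minutes


def budget_remaining(slo, window_days, incident_minutes):
    """Budget left after subtracting the duration of each incident."""
    return error_budget_minutes(slo, window_days) - sum(incident_minutes)


budget = round(error_budget_minutes(0.999, 30), 2)

# Two hypothetical histories: thirty 1-minute blips versus one 30-minute outage.
many_small = round(budget_remaining(0.999, 30, [1] * 30), 2)
one_large = round(budget_remaining(0.999, 30, [30]), 2)

print("budget (minutes):", budget)
print("remaining after many small incidents:", many_small)
print("remaining after one large incident:", one_large)
```

In this sketch both histories leave the same remaining budget, because the calculation only counts minutes and says nothing about how large any single incident was.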

97 things every SRE should know – Part 01

This is a review of a few of the chapters of the book of the same title by Emil Stolarsky and Jaime Woo.

Have you read it too? I’d love to read your take on it!

Dean Wilson

Having read the O'Reilly book "97 Things Every SRE Should Know", the author published some reading notes for his future self, organized chapter by chapter.

Understanding Incidents: Three Analytical Traps

This one’s worth reading the next time you need to do an incident retrospective. The traps are:

1. Counterfactual reasoning
2. Normative language
3. Mechanistic reasoning

John Allspaw — Adaptive Capacity Labs

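The author transcribed a roughly seven-minute video explaining three common analytical traps that incident analysts and accident investigators can fall into, and marked the parts he wants to emphasize in red and bold.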
1. Counterfactual reasoning
2. Normative language
3. Mechanistic reasoning

This Is the Most Underappreciated Skill for SREs

The skill in question is glue work, and I sure appreciate a good gluer when I see one.

Emily Arnott — Blameless

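In line with the title, the article gives the following examples of glue work (tasks that are essential to a project's success even though they do not contribute to the codebase) that SREs perform.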
- SREs align stakeholders’ goals with common language
- SREs bring people together in inspiring ways
- SREs grow an empathetic, trusting culture

Building and Scaling Your SRE Team

This one starts out by defining SRE, then goes into how to define your team and fill it with people.

Julie Gunderson — PagerDuty

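Based on several best practices that Tammy Bryant, Principal SRE at Gremlin, shared on the "Page it to the Limit podcast", the article not only defines the SRE role but also explains practical ways to build and scale an SRE team, under the following headings.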
- What is an SRE?
- SRE Skills & Responsibilities
- Establishing an SRE Team
- Scaling Your SRE Team

Outages

Fastly
Fastly is my employer.
Slack
Tyro Payments
Signal
.ke TLD (Kenya)
Microsoft Teams, Office 365 and OneDrive
Instagram

Incident information for the companies listed above.

KubeWeekly #247 January 22nd, 2021 (the web page does not appear to have been uploaded yet, as of 15:00 on 2021/01/23)

The Headlines

Editor’s pick of the highlights from the past week.

The First Six Months: CNCF Observations and 2021 Vision

Priyanka Sharma, CNCF

Priyanka Sharma reflects on her role as the General Manager of CNCF (over the last six months) and shares her philosophy for enabling #teamCloudNative going into 2021.

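As described above, Priyanka Sharma, General Manager (GM) of CNCF, looks back on her first six months as GM and sets out the vision for 2021 through her philosophy. It includes the phrase below, which also stuck with me from her KubeCon EU presentation. The writing conveyed a strong will to move forward as #teamCloudNative.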
- CNCF is a foundation of doers

CNCF and the Linux Foundation, with Chris Aniszczyk

Adam Glick and Craig Box, Kubernetes Podcast from Google

With his unique vantage point of cloud native trends, Chris Aniszczyk shares his technology journey and his predictions for 2021.

The Kubernetes Podcast, hosted by Google employees. The current co-hosts are Craig Box and Adam Glick.
The guest is Chris Aniszczyk, VP of DevRel at the Linux Foundation, CTO of CNCF, and Executive Director of the Open Container Initiative.
The topic that caught my attention in News of the week is as follows.

The Cloud Native Network Function (CNF) Working Group was launched by @CloudNativeFdn to help the telco industry define what #cloudnative means for them.

Your participation is welcomed!

📅 Mondays at 16:00 UTC
📝 Meeting details at https://t.co/aYWQ8yOAOT https://t.co/mZjQYMMnx2
— CNF Conformance (@cnfconformance) January 21, 2021

kubernetes.us10.list-manage.com

The Technical

Tutorials, tools, and more that take you on a deep dive into the code.

Hoot: Advanced Istio Configuration with Envoy CRDs

Scott Weiss, Solo.io

A roughly 26-minute webinar video in which Scott Weiss, Architect at Solo.io, explains the topic in the title, focusing on EnvoyFilter.

Implement Policy-based Governance Using Configuration Management of Red Hat Advanced Cluster Management for Kubernetes

Jaya Ramanathan and Christian Stark, Red Hat

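It gives an overview of best practices for writing declarative policies that cover security and regulatory compliance, resiliency, and software engineering aspects. All of this can be done without programming.
It explains how to complete the following actions using the built-in configuration policy controller of Red Hat Advanced Cluster Management.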
- Use best practices to configure Kubernetes resources used to ensure various security aspects such as access control and encryption.
- Deploy operators, check if they are operating and are configured properly, as well as receive status results from the operators.

Kubernetes Readiness Probes - Examples & Common Pitfalls

Levent Ogut

In line with the title, the article examines the effect of readiness probes and explains the parameters that can be configured.
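To get a feel for how the threshold parameters interact, I wrote the simplified simulation below. It is my own rough model and not Kubernetes code: the function `simulate_readiness` is hypothetical, it ignores timing parameters such as `initialDelaySeconds` and `periodSeconds`, and it assumes the container starts as not ready. The defaults mirror the Kubernetes defaults of `successThreshold: 1` and `failureThreshold: 3`.

```python
def simulate_readiness(probe_results, success_threshold=1, failure_threshold=3):
    """Return the ready/not-ready state after each probe result."""
    ready = False  # assumption: a container is not ready until a probe succeeds
    successes = 0
    failures = 0
    history = []
    for ok in probe_results:
        if ok:
            successes += 1
            failures = 0
            if successes >= success_threshold:
                ready = True
        else:
            failures += 1
            successes = 0
            if failures >= failure_threshold:
                ready = False
        history.append(ready)
    return history


# Hypothetical sequence of probe outcomes (True = probe succeeded).
results = [True, False, False, True, False, False, False, True]
history = simulate_readiness(results)
print(history)
```

In this model a single failed probe does not remove the pod from service; only three consecutive failures flip it to not ready, and one success brings it back.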

Kubernetes Cost Reporting using Kubecost

Aman Juneja, Infracloud Technologies

It explains in detail how to improve visibility by using Kubecost on a multi-tenant EKS cluster.
The conclusion includes the following comments, and the tool looks promising.
- Kubecost covered almost all our requirements but it comes with a slight operational overhead to set it up properly compared to many other paid solutions in the market. But I feel the value it provides is way more than efforts to configure it correctly.
- Kubecost support is also very prompt and the team is always up for help. If you are looking for any open source tool to get your Kubernetes cluster cost insights coupled with your cloud provider’s costing details then Kubecost is worth trying.

The Editorial

Articles, announcements, and more that give you a high-level overview of challenges and features.

Cilium, with Thomas Graf

Adam Glick and Craig Box, Kubernetes Podcast from Google

This week the newsletter features two Kubernetes Podcast episodes, counting the one under The Headlines.
The guest is Thomas Graf, the inventor of Cilium and co-founder of Isovalent.
The topic that caught my attention in News of the week is as follows.

GitOps-based Policy Management: How to Scale in a Multi-Node, Multicloud World

Anita Buehrle, WeaveWorks

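It explains the common challenges of multi-cluster environments, and how GitOps and effective policy management make large-scale Kubernetes deployments easy anywhere, under the following headings.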
- Diverse cluster stacks add complexity
- Manage cluster configuration definitions with GitOps
- What does Git-based policy look like?
- Self-service Kubernetes with guardrails
- Managing fleets of clusters
- Achieving consistency between local environments and the cloud
- Streamlining access control across the organization
- Conclusion

Cloud DevOps With OpenShift and JFrog

Alex Handy, Red Hat and Jeff Fry, JFrog

For developers, it explains how to deliver software using OpenShift, OpenShift Pipelines, and the JFrog Platform.

# 61 - Containers and Security with Liz Rice (in French)

Electro Monkeys Podcast

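The episode, on the theme of "Container Security", welcomes Liz Rice as the guest and is conducted in French. I tried my best to listen but could not follow it (no surprise there)... I would like to study French.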

Savithru Lokanath, Salesforce Engineering

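The article explains how Salesforce used the extensibility of Kubernetes to address the following use case, which also appears in the title, with a custom controller called "Agumbe". Agumbe is named after a small coastal town in the Indian state of Karnataka.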
- At Salesforce, we use Kubernetes to orchestrate our services layer and recently ran into a use case where we wanted to apply and manage certain common objects across Kubernetes namespaces.

The LFX Mentorship program for Spring 2021 is OPEN NOW for project applications!

Project maintainers and potential mentors are welcome to submit their ideas via GitHub PR by Jan 31st 📅

Learn more: https://t.co/D3eSAzI0Md

Submit: https://t.co/4tGHagwHS0
— CNCF (@CloudNativeFdn) January 22, 2021

kubernetes.us10.list-manage.com

Upcoming CNCF Online Programs

CNCF Project Webinar: Kubernetes 1.20
Jeremy Rickard, Software Engineer @VMware
Kirsten Garrison, Software Engineer @Red Hat
January 27, 2021 at 11:00 am PT
Register Now

For more information, please visit our updated Online Programs page.

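Following the link above, I found that "Upcoming events", "Past events", and "Organizers" sections had been added, and that upcoming events were starting to be listed.
A group was created for CNCF Online Programs, so I registered as its eighth member.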

How was it? Did any article or piece of information catch your interest?

There is still a lot here that I have not fully digested myself, so I plan to deepen my understanding while making use of this memo and link collection.

See you next time.

Bye now!!

Yoshiki Fujiwara