SRE / DevOps / Kubernetes Weekly Report Summary #82 (2021/8/22 - 8/27) - 運び屋 (A carrier(forwarder) changed his career to an engineer)

The title is "How to audit and secure an AWS account".
It covers the topic in detail under the following items.
- How to audit an AWS account
  1. Generate and maintain a complete list of assets
  2. Secure IAM
  3. Find public resources
  4. Use AWS Organizations
  5. Ensure audit logs are enabled
  6. Turn on security controls
  7. Build data flow diagrams and network maps if none exist
  8. Pick a standard
  9. Build a risk register to track findings
- How to secure an AWS account
- Improving security of AWS environments

A good post on service level objectives. It starts out with a good introduction, but it’s nice to see concrete examples and discussion of how to implement this in code, in this case with Ruby and Java.

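The title is "Focusing on What Matters: Using SLOs to Pursue User Happiness".
It outlines several philosophical approaches to defining SLOs, how they help with prioritization, and the tooling currently available to Betterment engineers that makes this process a little easier.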

If you’ve read much about SRE, you’ll probably have heard of the four golden signals of monitoring. This post provides a quick introduction, suggests some improvements, and points out some gaps.

The title is "How to Improve Upon Google’s Four Golden Signals of Monitoring".
It covers the topic through the following points.
- The Four Golden Signals, defined
- Benefits of the Four Golden Signals
- Thinking beyond the Four Golden Signals
- Conclusion: Getting more from the Golden Signals

The infrastructure for storage and usage of internal data is an ever-growing part of many operations teams' responsibilities. This post provides a useful high-level view of such a modern data platform.

The title is "The Anatomy of an Active Metadata Platform".
It covers the topic through the following points.
1. The metadata lake: A single central store for metadata
2. Programmable-intelligence bots
3. Embedded collaboration plugins
4. Data process automation
5. Reverse metadata
- Summing up

Tools

Secrets and Kubernetes can be a challenge. This webhook provides one option, injecting secrets into Kubernetes resources from various secrets managers including Vault, AWS, GCP and Azure secrets managers.

The GitHub page for "k8s-vault-webhook", a Kubernetes admission webhook that listens for events related to Kubernetes resources in order to inject secrets directly from a secrets manager into Pods, Secrets, and ConfigMaps.

Kubescape is a new security scanning tool for checking the setup of a Kubernetes cluster, based on the recently published NSA and CISA guidance.

The GitHub page for "Kubescape", described as the first tool for testing whether Kubernetes is deployed securely as defined in the "Kubernetes Hardening Guidance" by NSA and CISA.

SRE Weekly Issue #284 August 22nd, 2021

Articles

Alerting on SLOs like Pros

Soundcloud is very clear on the fact that they are not at Google scale. It’s interesting to see how they apply SRE principles at their scale.

Björn “Beorn” Rabenstein — SoundCloud

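An article dated 2019/06/04. It explains how SoundCloud implemented the concepts of SLOs, error budgets, and actions based on error budget consumption, and met its SLOs without subjecting engineers to an unsustainable number of pages.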
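
To make "actions based on error budget consumption" concrete, here is a minimal sketch of a burn-rate check, a common way to decide when budget consumption is fast enough to page someone. This is not SoundCloud's implementation; the function names and the paging threshold of 10 are assumptions chosen for illustration.

```python
def burn_rate(error_ratio: float, slo_percent: float) -> float:
    """Return how many times faster than 'exactly on budget' errors are occurring.

    error_ratio is the observed fraction of failed requests (0.0 to 1.0).
    A burn rate of 1.0 would use up the whole budget exactly at the end of
    the SLO window.
    """
    budget_ratio = (100.0 - slo_percent) / 100.0
    if budget_ratio <= 0:
        raise ValueError("an SLO of 100% leaves no error budget to burn")
    return error_ratio / budget_ratio


def should_page(error_ratio: float, slo_percent: float, threshold: float = 10.0) -> bool:
    """Page only when the budget is burning at least `threshold` times too fast.

    The default threshold is a hypothetical value, not a recommendation.
    """
    return burn_rate(error_ratio, slo_percent) >= threshold


if __name__ == "__main__":
    # 2% of requests failing against a 99.9% SLO
    print(should_page(0.02, 99.9))
```

Paging on burn rate rather than on every error is one way to keep the number of pages sustainable while still protecting the SLO.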

Distributed Troubleshooting

Here’s why Target set up their ELK stack, and how they used it to troubleshoot a problem in ElasticSearch itself.

Dan Getzke — Target

An article dated 2017/04/05. Target shares what it has learned from scaling its platform for troubleshooting distributed systems.

Error Budgets and their Dependencies

A key point in this article is that calculating your error budget as just “100% – SLO” goes about things backward.

Adam Hammond — Squadcast

It explains how to account for the planned and unplanned outages that can occur in a system, and how teams can calculate their error budgets efficiently.
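
For reference, the naive "100% - SLO" arithmetic that the article treats as a starting point, rather than the whole story, can be sketched as follows. The 30-day window, the function names, and the planned-maintenance figure are assumptions for illustration and are not taken from the article.

```python
def error_budget_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for an SLO over a window (naive 100% - SLO view)."""
    window_minutes = window_days * 24 * 60
    return window_minutes * (100.0 - slo_percent) / 100.0


def remaining_budget_minutes(slo_percent: float,
                             planned_outage_minutes: float,
                             window_days: int = 30) -> float:
    """Budget left for unplanned outages once planned maintenance is subtracted."""
    return error_budget_minutes(slo_percent, window_days) - planned_outage_minutes


if __name__ == "__main__":
    # A 99.9% SLO over 30 days, with a hypothetical 20 minutes of planned maintenance
    print(f"total budget: {error_budget_minutes(99.9):.1f} minutes")
    print(f"left for unplanned outages: {remaining_budget_minutes(99.9, 20):.1f} minutes")
```

Even this small example shows why planned outages matter: with these assumed numbers, a single maintenance window uses up close to half of the monthly budget.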

Capacity Planning at Scale

They periodically scale up their systems just to test and be sure they’ll be ready for big events like Black Friday / Cyber Monday.

Kathryn Tang — Shopify

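It explains their approach to capacity planning and how to roll it out across the whole organization and dozens of teams, and how to validate capacity plans with scalability testing to confirm that they work.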

How to drive ownership in microservices

In this post, we’ll focus on service ownership. Why is service ownership important? How should teams self-organize to achieve it? Where’s the best place to start?

Cortex

Focusing on service ownership, it poses and then answers the three questions that the editor excerpts above.

One, Two, Skip a Few…

This fun troubleshooting story hinges around the internal details of how PostgreSQL’s sequences work.

Pete Hamilton — incident.io

It walks through a case in which a customer's report of unexpected jumps in incident IDs led to an investigation of Postgres internals and a fix.

KubeWeekly #274 August 27th, 2021

The Headlines

Editor’s pick of the highlights from the past week.

KubeCon + CloudNativeCon North America co-located event schedules are live!

We’re excited to announce that the schedules* for all CNCF-hosted co-located events are now available! Co-located events, including those that are sponsor-hosted, will take place on October 11-12 both in-person and virtually. Be sure to add to your registration today! Please note that an additional fee is required to attend.
*The schedule for the Kubernetes Contributor Summit will be announced on October 5.

As noted above, the schedules for the KubeCon + CloudNativeCon North America co-located events have been published. I am still deciding which ones to attend.

ICYMI: CNCF online programs this week

A weekly summary of CNCF online programs from this week.

Easy, secure Kubernetes authentication with pinniped

Matt Moyer & Margo Crawford, VMware

A roughly 35-minute session in which Matt Moyer and Margo Crawford, two maintainers of "Pinniped", show how to install and use Pinniped to link Kubernetes clusters to common enterprise SSO solutions and how to give cluster users a simple login flow.

Gear up for performance - Leveraging eBPF on Openshift with Project Calico

Chris Tomkins, Tigera

A roughly one-hour session that shows how to use the eBPF data plane on OpenShift, covering the following points.
- How to leverage Calico eBPF on OpenShift
- How eBPF brings improved performance
- How eBPF brings improved service handling
- Best practices for an eBPF implementation in OpenShift

What is the cost of a secret?

Presented by Steve Giguere, Bridgecrew by Prisma Cloud and sponsored by Palo Alto Networks

A roughly 22-minute session that explains the pitfalls that lead to accidentally exposing Secrets and best practices for avoiding future exposure.

Next generation observability using open source monitoring

Presented by Karl Gouverneur, Northwestern Mutual and sponsored by OpsCruise

A roughly 57-minute session that, in line with its title, covers the following three points.
1. Get deep insights into your application from open-source CNCF monitoring
2. Leverage real-time analytics for proactively detecting, isolating and resolving problems, and
3. Learn how Ops teams can stay on top of your modern applications and infrastructure

It's official! I can now share more about the project @mrsabath and I have been working on with @SPIFFEio.

We've donated "Tornjak" (https://t.co/2R0Z1iSTUr) to @CloudNativeFdn - a control plane/UI for managing SPIRE workload identities and servers. https://t.co/6w0pBMrtUo
— Brandon Lum (@lumjjb) August 26, 2021

The Technical

Tutorials, tools, and more that take you on a deep dive into the code.

How to secure containers with Cosign and Distroless images

Jeswin K. Ninan, InfraCloud

As the title says, it introduces Cosign and Distroless container images, which help deploy application containers more securely and run them in production.

Writing a Kubernetes validating webhook using Python

Kristijan Mitevski, Mitevski blog

As the title says, the author describes trying out writing a validating webhook in Python.
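
As a rough idea of what such a webhook has to do, the sketch below builds the `AdmissionReview` response that the Kubernetes API server expects back from a validating webhook. It is not the code from the linked post: the `review_pod` function and the "owner" label policy are hypothetical, and the HTTPS server and `ValidatingWebhookConfiguration` a real webhook needs are omitted.

```python
import json


def review_pod(admission_review: dict, required_label: str = "owner") -> dict:
    """Build an admission.k8s.io/v1 AdmissionReview response for one request.

    Hypothetical policy: reject any object that lacks `required_label`.
    """
    request = admission_review["request"]
    labels = request["object"].get("metadata", {}).get("labels") or {}
    allowed = required_label in labels

    # The response must echo the uid of the request it answers.
    response = {"uid": request["uid"], "allowed": allowed}
    if not allowed:
        response["status"] = {"message": f"missing required label: {required_label}"}

    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }


if __name__ == "__main__":
    sample = {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "request": {
            "uid": "example-uid",
            "object": {"metadata": {"name": "demo", "labels": {"app": "demo"}}},
        },
    }
    print(json.dumps(review_pod(sample), indent=2))
```

Keeping the decision logic in a plain function like this makes it easy to unit-test before wiring it into a web framework.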

How to leverage Insomnia as a GraphQL client

Garen Torikian, Kong

As the title says, it presents some of the highlights of using "Insomnia" as a GraphQL client. A roughly 11-minute demo video is embedded in the web page.

Serverless storage for your functions from the Datastax Astra DB

Alex Ellis, OpenFaaS

It explains how Datastax's "AstraDB" provides convenient pay-as-you-go storage for serverless functions.

kubescape is the first tool for testing if Kubernetes is deployed securely as defined in Kubernetes Hardening Guidance by NSA and CISA

Omitted here because it is already covered under DEVOPS WEEKLY ISSUE #556 above.

Thanks for all your hard work, @mhickeybot 👏 https://t.co/29NLWiS14W
— Helm (@HelmPack) August 24, 2021

The Editorial

Articles, announcements, and more that give you a high-level overview of challenges and features.

KEDA, with Tom Kerkhove

Craig Box, Kubernetes Podcast from Google

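The Kubernetes Podcast, hosted by Google employees. This episode's host is Craig Box, with guest host Jimmy Moore.
To celebrate KEDA (Kubernetes Event-driven Autoscaling) reaching the Incubation maturity level in the CNCF, the episode welcomes Tom Kerkhove of Codit, a maintainer of the project, as its guest.
The News of the week topics that caught my attention were the following.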
- Gloo Mesh 1.1
- Cron jobs and timezones in Kubernetes

Maintaining Envoy proxy with Snow Pettersen

Curiefense podcast

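A roughly 34-minute podcast episode, with transcript, featuring Snow Pettersen, a member of Lyft's Resilience team and a Senior Maintainer of Envoy Proxy. It opens with his experience working with cloud-native technologies at Square, Netflix, and Lyft, and with how things have changed over the past few years.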