Hakobiya (The Carrier): a freight forwarder who changed careers to become an engineer

Network / Cloud Native / Kubernetes / Containers / SRE / DevOps

SRE / DevOps / Kubernetes Weekly Report Summary #28 (8/9 to 8/14)

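This article is a set of notes and links I put together after reading the following three weekly reports, published between 2020/8/9 and 8/14.

  • I hope it serves as a source of information or saves someone some searching.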
DEVOPS WEEKLY ISSUE #502 August 9th, 2020
SRE Weekly Issue #231 August 10th, 2020
KubeWeekly #229 August 14th

The English version of this blog is here.

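  • If anything in this article is unclear or raises questions, please check the original text via the URL and then let me know.
  • I make a point of commenting even on topics I understand only superficially, so there are likely many misunderstandings caused by my own mistakes or incomplete explanations.
  • Because there is a lot of information, I have limited the article to text and links.
  • Some of the articles featured in each report date from 2019 or earlier, so they are not necessarily the latest.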

DEVOPS WEEKLY ISSUE #502 August 9th, 2020

News

A list of anti-patterns for transformation projects in large organisations. Good advice on choosing technology, technical management, roadmaps and more.
  • The title is "Ten ‘antipatterns’ that are derailing technology transformations".
  • Explains the following "10 antipatterns", observed by the authors at more than 50 major organizations, as obstacles to digital transformation.
    1. Force-fitting technology solutions: Are you choosing technology out of context?
    2. Adopting cutting-edge tech that’s not fully mature: Are you adopting new technology that seems promising but doesn’t have a proven track record?
    3. Building out your own cloud infrastructure without sufficient capabilities: Have you let security and regulation block your adoption of public cloud?
    4. Initiating big-system-replacement programs: Are you focusing on system replacement rather than improving existing systems in a way that is faster and more cost-effective?
    5. Focusing on architecture and tooling improvements without enhancing process and delivery discipline: Did you re-architect and implement new tooling but forget to adapt the delivery processes?
    6. Focusing on outputs rather than business outcomes: Are your technologists focused on output instead of business/technology outcome?
    7. Managing IT purely for cost: Are you sacrificing significant value by overindexing on price and cost?
    8. Investing in developing new platforms without involving the business: Is your primary focus platform development instead of platform adoption by the business?
    9. Outsourcing your core value streams: Are vendors doing the work that creates the most value for your business?
    10. Building up an army of managers rather than developing an engineering culture: Do you value your managers more than your engineers?
  • The title is "About the Quay.io Outage: Post Mortem".
  • A post-mortem of two Quay.io outages (early morning of 5/19 and late morning of 5/28, both Eastern Daylight Time (EDT)).
A great post on dashboard design, with lots of reasoning, hints, tips and examples.
  • The title is "Building dashboards for operational visibility".
  • An article from AWS's "Amazon Builders' Library" (under "SOFTWARE DELIVERY AND OPERATIONS | LEVEL 300") that explains how dashboards are used at AWS, organized into the following sections:
    • Dashboarding at Amazon
    • Types of dashboards
    • High-level dashboards
    • Low-level dashboards
    • Dashboard design
    • Dashboard maintenance
    • Conclusion
A look at several tools that are useful to validating and testing Kubernetes configuration files. Useful comparison table and examples of each of the different tools.
  • The title is "Validating Kubernetes YAML for best practice and policies".
  • TL;DR:
    • This article compares six static tools for validating and scoring Kubernetes YAML files against best practices and compliance requirements.
      1. Kubeval
      2. Kube-score
      3. Config-lint
      4. Copper
      5. Conftest
      6. Polaris
  • The ecosystem for statically checking Kubernetes YAML files can be grouped into the following three categories:
    1. API validators — Tools in this category validate a given YAML manifest against the Kubernetes API server.
    2. Built-in checkers — Tools in this category bundle opinionated checks for security, best practices, etc.
    3. Custom validators — Tools in this category allow writing custom checks in several languages such as Rego and Javascript.
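To make the "built-in checkers" category more concrete, here is a minimal Python sketch of what an opinionated static check might look like. It is not the implementation of any of the six tools; the function name `check_manifest`, the two rules, and the sample manifest are all assumptions chosen for illustration, and the manifest is supplied as an already-parsed dict rather than read from a YAML file.

```python
from typing import Any


def check_manifest(manifest: dict[str, Any]) -> list[str]:
    """Return findings for a Deployment-shaped manifest (already parsed into a dict)."""
    findings: list[str] = []
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        limits = container.get("resources", {}).get("limits", {})
        # Hypothetical rule 1: every container should declare a memory limit.
        if "memory" not in limits:
            findings.append(f"{name}: no memory limit set")
        # Hypothetical rule 2: every container should declare a readiness probe.
        if "readinessProbe" not in container:
            findings.append(f"{name}: no readinessProbe set")
    return findings


# Sample manifest (made up for this example).
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {"name": "web", "image": "example/web:1.0"},
                    {
                        "name": "sidecar",
                        "image": "example/sidecar:2.3",
                        "resources": {"limits": {"memory": "128Mi"}},
                        "readinessProbe": {"httpGet": {"path": "/healthz", "port": 8080}},
                    },
                ]
            }
        }
    },
}

for finding in check_manifest(deployment):
    print(finding)
```

A "custom validator" in the third category differs mainly in that the rules are written by the user rather than shipped with the tool.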
A good discussion of all things Service Mesh and the SMI specification.
  • The title is "Service Mesh With Michelle Noorali and Delyan Raychev".
  • Microsoft engineers who contribute to OSS and the CNCF discuss service mesh, SMI (Service Mesh Interface), and Open Service Mesh.
  • It also covers how these relate to Kubernetes, which makes it good preparation for KubeCon + CloudNativeCon EU.
A post on using Conftest and Regula to help write secure Terraform code and test as part of a CI process.
  • The title is "Securing Your Terraform Pipelines with Conftest, Regula, and OPA".
  • Explains the tools (Terraform, OPA, Conftest, Regula) that security and operations teams can use to codify their requirements as "policy-as-code".

Tools

Open Service Mesh is a new lightweight, extensible, service mesh for dynamic microservice environments. It provides out-of-the-box observability features and uses SMI for configuration.
  • The .io page for "Open Service Mesh (OSM)", a new lightweight and extensible open source service mesh. The GitHub page is here.
Sysbox is a new container runtime that makes it easier to run low-level software, like Systemd, Docker, and Kubernetes, in containers. You can run it with Docker too due to the pluggable runtimes feature.
  • The GitHub page for "Sysbox", a new open source container runtime.
We’re starting to see application frameworks and developer tools provide high-level abstractions for running on platforms like Kubernetes. Tye is an interesting .NET tool that eases running .NET applications on cloud native platforms.
  • The GitHub page for "Tye", a developer tool that makes it easier to develop, test, and deploy microservices and distributed applications.
Turandot allows for using TOSCA with Kubernetes. TOSCA provides a high-level service description aimed at portability and interoperability between underlying infrastructure.
  • The web page for "Turandot", a tool that uses TOSCA (Topology and Orchestration Specification for Cloud Applications) to orchestrate and configure Kubernetes workloads.
  • The FAQ answers the following questions:
    • Is this a lifecycle manager (LCM) for Kubernetes workloads?
    • Why doesn’t Turandot include a workflow engine?
    • Why is there a built-in inventory? Shouldn’t the inventory be managed externally?
    • Why use TOSCA and CSARs instead of packaged Helm charts?
    • Why is it called “Turandot”?
Copper is a configuration file validator for Kubernetes. It supports writing bespoke tests using a built-in Javascript DSL.
  • The GitHub page for "Copper", a simple open source tool that validates configuration files for Kubernetes and similar systems.

SRE Weekly Issue #231 August 10th, 2020

Articles

Improving Postmortems from Chores to Masterclass with Paul Osman

The lead SRE at Under Armour(!) has a ton of interesting things to share about how they do SRE. I love their approach to incident retrospectives that starts with 1:1 interviews with those involved.

Paul Osman — Under Armour (Blameless Summit)

  • A transcript of a 2019 Blameless Summit talk on postmortems, or incident retrospectives. The video is embedded.
  • It reframes what a postmortem is for. Its take on root cause was also useful to me.
    • It's coming to the conclusion that our goal in doing these postmortems is not actually to understand what happened. It's not to understand a clear causal chain of events that led to an incident. It's actually to understand the context that people were operating within when responding to an incident that either helped or hindered their ability to make decisions.
About the Quay.io Outage: Post Mortem

A routine infrastructure maintenance had unintended consequences, saturating MySQL with excessive connections.

Daniel Messer — RedHat

  • Covered above under DEVOPS WEEKLY ISSUE #502, so omitted here.
The 2020 Midland County Dam Failure

This report details the complex factors that contributed to the failure of a dam in Michigan in May of this year.

Jason Hayes — Mackinac Center for Public Policy

  • A report on the flooding in central Michigan (2020/05/19) that damaged more than 2,500 homes and buildings.
Heroku Incident #2090 Follow-up

This incident involved a DNS failure in Heroku’s infrastructure provider (presumably AWS).

Heroku

  • Follow-up information on an incident at Heroku that lasted from 2020/07/28 08:22 UTC to 10:28 UTC.
  • User impact included intermittent errors from the API and other tools, and possible problems connecting to data services in the US.
Theory vs. Practice: Learnings from a recent Hadoop incident

This incident at LinkedIn impacted multiple internal customers with varying requirements for durability and latency, making recovery complex.

Sandhya Ramu and Vasanth Rajamani — LinkedIn

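  • An article in which LinkedIn uses a recent Hadoop incident to compare how its disaster recovery (DR) strategy held up in practice against the theory in a cloud environment.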
GitHub Availability Report: July 2020

This report includes a description of an incident involving Kubernetes pods and an impaired DNS service.

Keith Ballinger — GitHub

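  • The July edition, dated Wednesday 8/5, of the Availability Report that GitHub has started publishing on the first Wednesday of each month, which this blog has covered before.
  • It covers an incident that began on 7/13 at 08:18 UTC and lasted 4 hours and 25 minutes.
  • There were three causes:
    1. Kubernetes Pods were terminated because they exceeded their configured memory limits.
    2. ImagePullPolicy was set to Always, so starting a Pod always went to the registry for the image.
    3. DNS maintenance carried out before the incident made it impossible to fetch the image from the registry.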
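The second cause is easy to overlook because the pull policy can be set implicitly. The Python sketch below is my own illustration, not something from GitHub's report: it applies the documented Kubernetes defaulting rules (an omitted `imagePullPolicy` defaults to `Always` when the tag is `:latest` or missing, and to `IfNotPresent` when a different tag or a digest is given) to list the containers that need a reachable registry on every start. The function names and the sample container specs are assumptions.

```python
def effective_pull_policy(container: dict) -> str:
    """Return the pull policy Kubernetes would apply to this container spec."""
    explicit = container.get("imagePullPolicy")
    if explicit:
        return explicit
    image = container.get("image", "")
    # Only the last path segment can carry a tag; a registry host may contain ":port".
    last_segment = image.rsplit("/", 1)[-1]
    if "@" in last_segment:
        return "IfNotPresent"  # image pinned by digest
    tag = last_segment.split(":", 1)[1] if ":" in last_segment else ""
    return "Always" if tag in ("", "latest") else "IfNotPresent"


def registry_dependent(containers: list[dict]) -> list[str]:
    """Names of containers that contact the registry on every start."""
    return [c["name"] for c in containers if effective_pull_policy(c) == "Always"]


# Sample container specs (made up for this example).
containers = [
    {"name": "app", "image": "registry.example.com/team/app:1.4.2", "imagePullPolicy": "Always"},
    {"name": "proxy", "image": "registry.example.com:5000/proxy"},
    {"name": "metrics", "image": "registry.example.com/metrics:2.0"},
]

print(registry_dependent(containers))
```

In this sample, `app` is flagged because of its explicit setting and `proxy` because it has no tag, while `metrics` could start from a locally cached image.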
Incident Report: Investigating an Incident That’s Already Resolved

In this report, Honeycomb describes how they investigated an incident from the prior week that their monitoring had missed.

Martin Holman — Honeycomb

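  • The story of how they learned of an incident from a user inquiry a week after it occurred, and then investigated it.
  • Personally, I was glad to see an example of a postmortem's "where we got lucky" section in the comments of a leading SRE figure, Liz Fong-Jones.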

Outages

Outage information for the companies above.

KubeWeekly #229 August 14th

The Headlines

Editor’s pick of the highlights from the past week.

Back by Popular Demand – Free Keynote and Expo Hall Pass for KubeCon + CloudNativeCon EU 2020 Virtual!

KubeCon + CloudNativeCon EU 2020 Virtual is happening next week! Are you new to cloud native or CNCF? First time attending? We have you covered! Our free Keynote + Expo Hall Only Pass brings the key pieces of the conference together, including access to:

All Keynote Sessions available with closed captioning in 18 languages
Virtual Expo Hall where you can visit our sponsors to try the latest demos, talk to experts, and score some swag.
Sponsor Demo Theater showcasing community leaders as they demonstrate how they are adopting Kubernetes and other open source technologies
Project Pavilion where you can engage with Project Maintainers + Leads
Looking for the immersive experience including keynote and breakout sessions, sponsor showcase, and conference activities (co-located events not included)? The Full Access Pass is for you. Register today!

  • Information on passes, accessible sessions, and registration for KubeCon + CloudNativeCon EU 2020 Virtual, which takes place at the start of next week.

Special Offer – Save on LF Training with KubeCon + CloudNativeCon EU 2020 Virtual

When you register for the Full Event Pass AND attend KubeCon + CloudNativeCon Europe Virtual, you are eligible to receive a training discount. The offer includes:

50% off CKA exam OR CKAD exam
30% off other courses or exams from LF Training!*
Be sure to reserve your spot today and save on an upcoming training session. It’s a win-win!

Details on how to access and download the coupon will be provided to registered attendees in the pre-event attendee email (coming the week of August 10). The coupon is only available for attendees of KubeCon + CloudNativeCon Europe Virtual and will expire on 23:59 UTC on August 20, 2020. Cannot be combined with any other discount or promotion. Only valid for net new training purchases, cannot be applied to previously purchased exam or bundle.

50% off CKA exam OR CKAD exam
Visit the Linux Foundation training websites below to purchase your CKA or CKAD exam for $150 (typically $300). You may choose between the exam-only option listed at the top or a course+exam bundle listed in the Combine & Save section.

30% off other courses or exams from LF Training!* Visit the Linux Foundation training website and view the course catalog. This voucher may be applied towards any course or exam available for purchase in the catalog. Just add your coupon code during checkout to see your total discount.

  • Information on the training and certification exam discounts available to those who register for KubeCon + CloudNativeCon EU 2020 Virtual with a Full Event Pass and attend.
KubeCon + CloudNativeCon EU Virtual Session Spotlight

The countdown to KubeCon + CloudNativeCon EU Virtual on August 17-20, 2020 is on! As we approach the event, we curated a few recommended sessions that we don’t want you to miss. Please see the feature for this week and be sure to register today!

Don’t miss our co-located events on August 17 (additional registration required)!
Jump start your education or get that topic deep dive by attending a co-located event! There are many options, so you are sure to find that extra something you’ve been looking for.

AWS Container Day 2020 hosted by AWS
Building a DevOps Pipeline with Kubernetes and Apache Cassandra™ hosted by DataStax
Cloud Native ROOST Hack-a-thon presented by Zettabytes
Cloud Native Security Day hosted by CNCF
KubeAcademy: Introduction to Containers and Kubernetes hosted by VMware
NSMCon hosted by the Network Service Mesh Community
Serverless Practitioners Summit hosted by CNCF
ServiceMeshCon hosted by CNCF

It’s easy to add a co-located event to an existing registration! Log into your existing registration, enter your confirmation number, and modify by going back through the registration pages to add.

Register now!

  • Puts the spotlight on the co-located events of KubeCon + CloudNativeCon EU Virtual.
  • Note that "AWS Container Day 2020 hosted by AWS" also runs for APAC on 8/19, 10:00 to 18:00 (JST). Register here.

ICYMI: CNCF Webinars

You can view all CNCF recorded and upcoming webinars here.

CNCF Ambassador Webinar: GitOps, DSL and App Model – Getting Started Building Developer Centric Kubernetes

Lei Zhang, Staff Engineer @Alibaba

  • Explains in concrete terms, through the following five points, the lessons learned from serving end users across the broad cloud native community.
    1. Why are end users not satisfied in Kubernetes?
    2. Is PaaS the right answer?
    3. What is “developer-centric” Kubernetes?
    4. How can we build it? Anything is missing in the picture?
    5. Is GitOps and DSL part of the story? What about OAM?
CNCF Member Webinar: Hardware for Kubernetes, Peeling Back the Layers

Erik Reidel, SVP Compute & Storage Solutions, ITRenew

  • Shows how to harness a broad and deep hardware ecosystem to deliver cloud native applications.
  • Peels back the infrastructure layers that are normally hidden and shows the latest innovations in hyperscale design.
CNCF Member Webinar: Migrating Real-Time Communication Applications to Kubernetes at Scale: Learnings from 8×8’s Experience

Lance Johnson, Director of Engineering, Cloud R&D @8×8 Michael Laws, Sr. Site Reliability Engineer @8×8 Pankaj Gupta, Senior Director @Citrix

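  • Shares the first-hand experience of the DevOps team at 8x8, a company that provides customers with a global cloud communications platform.
  • Explains the key considerations and lessons learned from successfully migrating VoIP from an on-premises environment to Kubernetes on AWS.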
CNCF Member Webinar: The Open-Source Observability Playbook

Hen Peretz, Head of Solutions Engineering @Epsagon

  • Aims to go beyond understanding the tools themselves and explain best practices for using them.

The Technical

Tutorials, tools, and more that take you on a deep dive into the code.

5 reasons to run Kubernetes on your Raspberry Pi homelab

Seth Kenlon, Red Hat

  • For readers who download Chris Collins's new eBook "Running Kubernetes on your Raspberry Pi homelab" and then wonder what to do with it, the article offers the following five reasons to run Kubernetes at home on a Raspberry Pi as ideas.
    1. Network-attached storage for your home
    2. Education and upskilling
    3. Web server
    4. Containers
    5. Web development
7 Best Practices for Writing Kubernetes Operators: An SRE Perspective

Manuel Dewald, Red Hat

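  • Starts by introducing Red Hat OpenShift Dedicated (OSD), which runs on AWS and GCP, and notes that the OSD operations team has changed from a traditional operations team into an SRE team. It then presents what that SRE team learned from building and maintaining Operators as the following seven best practices.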
    1. Use the Operator SDK
    2. Avoid Overstuffed Functions
    3. Idempotent Subroutines
    4. One Custom Resource Modification at a Time
    5. Wrap External Dependencies
    6. Test Your Code
    7. Reconciling Return Values
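Several of these headings ("Idempotent Subroutines", "One Custom Resource Modification at a Time", "Reconciling Return Values") describe a shape of code, so a small sketch may help. The Python below is only my reading of those headings, not code from the article or from the Operator SDK (which is Go-based): `FakeCluster` is a hypothetical in-memory stand-in for the API server, ConfigMaps stand in for whatever resource the operator manages, and the `{"requeue": ...}` return value is an assumed convention.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCluster:
    """Hypothetical in-memory stand-in for the API server (illustration only)."""
    config_maps: dict[str, dict[str, str]] = field(default_factory=dict)
    writes: int = 0

    def apply_config_map(self, name: str, data: dict[str, str]) -> None:
        self.config_maps[name] = dict(data)
        self.writes += 1


def ensure_config_map(cluster: FakeCluster, name: str, desired: dict[str, str]) -> bool:
    """Idempotent subroutine: write only if observed state differs from desired state.

    Returns True if a change was made.
    """
    if cluster.config_maps.get(name) == desired:
        return False
    cluster.apply_config_map(name, desired)
    return True


def reconcile(cluster: FakeCluster, desired_state: dict[str, dict[str, str]]) -> dict[str, bool]:
    """Make at most one modification per pass, then ask to be called again."""
    for name, data in desired_state.items():
        if ensure_config_map(cluster, name, data):
            return {"requeue": True}
    return {"requeue": False}


cluster = FakeCluster()
desired = {"app-config": {"LOG_LEVEL": "info"}, "feature-flags": {"beta": "off"}}

# Keep reconciling until nothing is left to change.
while reconcile(cluster, desired)["requeue"]:
    pass

print(cluster.writes)  # one write per ConfigMap
reconcile(cluster, desired)
print(cluster.writes)  # unchanged: a repeated pass is a no-op
```

Because each subroutine checks the observed state first, the loop can be interrupted and restarted at any point without producing duplicate writes.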
DevNation Tech Talk: 10 awesome Kubernetes tools every user should know

Alex Soto and Burr Sutter, Red Hat

  • A Twitch video in which Red Hat's OpenShift team introduces the following 10 tools that every Kubernetes user should know.
    1. k9s
    2. Kubectl Aliases
    3. Stern
    4. Dive
    5. Kubens
    6. Kube-PS1
    7. Kubectx
    8. KubeSpy
    9. Kube-shell
    10. Kubectl
How to monitor etcd

David Lorite Solanas, Sysdig

  • Covers why etcd matters and how it works, then common failures, and explains how to monitor it.
The “podman play kube” command now supports deployments

Matthew Heon, Red Hat

  • Introduces the support for the Kubernetes Deployment resource in the podman play kube command as of Podman v2.0, along with future plans.

The Editorial

Articles, announcements, and more that give you a high-level overview of challenges and features.

Independent Open Source, with Alex

Craig Box and Adam Glick, Kubernetes Podcast from Google

Service Mesh With Michelle Noorali and Delyan Raychev

Arrested DevOps Podcast, Bridget Kromhout, Microsoft

  • Covered above under DEVOPS WEEKLY ISSUE #502, so omitted here.
Introducing Tekton Hub

CD Foundation

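  • Announces the preview release of "Tekton Hub", which Red Hat built in collaboration with the Tekton community to make it easier to search for and discover Tekton tasks, pipelines, and everything else Tekton.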
Envoy 1.15 introduces a new Postgres extension with monitoring support

Fabrízio Mello and Álvaro Hernández, OnGres

  • A CNCF blog post announcing the new Postgres extension and its monitoring support in Envoy v1.15.
Protecting Kubernetes applications data using Kanister

Vivek Singh, InfraCloud

  • An article explaining how to protect the data of applications running on Kubernetes using the open source tool "Kanister".
21 CNCF Interns Graduate from the Q2 2020 Linux Foundation CommunityBridge Program

CNCF staff

  • 21 interns completed Q2 2020 of the CNCF "CommunityBridge program".
  • They took part in 14 CNCF Graduated, Incubating, and Sandbox projects. The post introduces each intern's project, mentor, and comments, along with a photo.

Upcoming CNCF webinars

If a webinar catches your interest, register and check it out. The ones below were listed as upcoming.

Ambassador Webinar: Navigating the service mesh ecosystem
Lachie Evenson, Principal Program Manager @Azure & CNCF Ambassador
Aug 14, 2020 10:00 AM Pacific Time
REGISTER NOW »

Member Webinar: Modern Software Development Pipeline: A Security Reference Architecture
Vinay Venkataraghavan, Cloud CTO, Prisma Cloud @Palo Alto Networks
Aug 25, 2020 10:00 AM Pacific Time
REGISTER NOW »

Member Webinar: Local Development in The Age of Kubernetes
Misha Gusarov, Software Architect @Ridge Cloud
Aug 26, 2020 7:00 AM Pacific Time
REGISTER NOW »

Member Webinar: MLOps automation with Git Based CI/CD for ML
Yaron Haviv, Co-Founder and CTO, Iguazio
Aug 26, 2020 1:00 PM Pacific Time
REGISTER NOW »

Member Webinar: How to migrate databases into Kubernetes?
Alex Chircop, CEO & Founder @StorageOS
Aug 27, 2020 10:00 AM Pacific Time
REGISTER NOW »

Project Webinar: Kubernetes 1.19
Kubernetes release team
Aug 28, 2020 10:00 AM Pacific Time
REGISTER NOW »

Member Webinar: Let’s Untangle The Service Mesh
Dominik Tornow, Principal Engineer @Cisco
Sept 1, 2020 10:00 AM Pacific Time
REGISTER NOW »

Member Webinar: Getting started with container runtime security using Falco
Loris Degioanni, CTO and Founder @Sysdig
Sept 2, 2020 1:00 PM Pacific Time
REGISTER NOW »

Member Webinar: Running the next generation of cloud-native applications using Open Application Model (OAM)
Ryan Zhang, Staff Software Engineer @Alibaba Cloud
Sept 3, 2020 10:00 AM Pacific Time
REGISTER NOW »

Member Webinar: Arm Developer Experience Spanning Cloud, 5G and IoT
Darragh Grealish, Co-Founder @56K.Cloud
Marc Meunier, Sr. Manager, SW Ecosystem Development @Arm
Sept 8, 2020 10:00 AM Pacific Time
REGISTER NOW »

Member Webinar: Building a Cloud-Native Technology Stack that Supports Full Cycle Development
Daniel Bryant, Product Architect @Datawire
Sept 9, 2020 7:00 AM Pacific Time
REGISTER NOW »

Member Webinar: Achieving Least Privilege Access in Kubernetes
Eran Leib, Co-Founder and VP Product Management @Apolicy
Gregg Ogden, Senior Product Marketing Manager @Aqua Security
Sept 11, 2020 10:00 AM Pacific Time
REGISTER NOW »

Member Webinar: Effective Kubernetes Onboarding
Kathleen Juell, Developer, DODX @DigitalOcean
Sept 16, 2020 1:00 PM Pacific Time
REGISTER NOW »

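How was it? Did any article or piece of information catch your interest?

There is still a lot here that I have not fully digested myself, so I will keep using these notes and links to deepen my understanding.

See you next time.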

Bye now!!

Yoshiki Fujiwara