Speeding up helm dependency build
When working with Helm you may find yourself using helm dependency build. This resolves chart dependencies from scratch, places the resulting packages in charts/ and generates a lockfile to boot.
I’ve noticed it can be particularly slow, so I went about some sleuthing.
Test setup
I’m using Helm 3 (3.4.2) at the moment, though this behaviour has been around for some time. I’m using a fresh install of Helm with no extra repositories defined (this is important).
Let’s set up our test case. We create a barebones chart with a single dependency. For simplicity, let’s use the archived stable repository.
apiVersion: v2
name: foo
description: A Helm chart for Kubernetes
type: application
version: 0.1.0
appVersion: 1.16.0
dependencies:
- name: rabbitmq
  repository: https://charts.helm.sh/stable/
  version: "^1.0.0"
Now let’s see how helm dependency build behaves.
➜ foo git:(master) ✗ time ./clean-build
Getting updates for unmanaged Helm repositories...
...Successfully got an update from the "https://charts.helm.sh/stable/" chart repository
Saving 1 charts
Downloading rabbitmq from repo https://charts.helm.sh/stable/
Deleting outdated charts
real 0m26.427s
user 0m6.640s
sys 0m1.500s
➜ foo git:(master) ✗ ll -R
total 16
-rw-r--r-- 1 stew staff 214B 22 Mar 11:40 Chart.lock
-rw-r--r-- 1 stew staff 217B 22 Mar 11:35 Chart.yaml
drwxr-xr-x 3 stew staff 96B 22 Mar 11:40 charts
./charts:
total 24
-rw-r--r-- 1 stew staff 9.4K 22 Mar 11:40 rabbitmq-1.1.9.tgz
So far so good. It took about 26 seconds (bless my poor internet), and resulted in a single chart file being downloaded.
So what just happened? Helm downloaded the index file for the stable charts repository, resolved the chart version we need and downloaded it, producing a Chart.lock file along the way.
Let’s add a few more dependencies…
dependencies:
- name: rabbitmq
  repository: https://charts.helm.sh/stable/
  version: "^1.0.0"
- name: minio
  repository: https://charts.helm.sh/stable/
  version: "^1.0.0"
- name: mysql
  repository: https://charts.helm.sh/stable/
  version: "^1.0.0"
➜ foo git:(master) ✗ time helm dependency build
Getting updates for unmanaged Helm repositories...
...Successfully got an update from the "https://charts.helm.sh/stable/" chart repository
...Successfully got an update from the "https://charts.helm.sh/stable/" chart repository
...Successfully got an update from the "https://charts.helm.sh/stable/" chart repository
Saving 3 charts
Downloading rabbitmq from repo https://charts.helm.sh/stable/
Downloading minio from repo https://charts.helm.sh/stable/
Downloading mysql from repo https://charts.helm.sh/stable/
Deleting outdated charts
real 0m40.973s
user 0m17.711s
sys 0m2.583s
Interestingly, we see Helm make three attempts to update the same chart repository.
We also see a long delay, of multiple seconds, between fetching each individual chart. A chart is just a gzipped tarball, usually a few KiB in size.
The ...Successfully got an update from the "https://charts.helm.sh/stable/" chart repository messages arrive irregularly, as if the index were being downloaded in parallel.
In an ideal world, we would expect Helm to download the repository index exactly once, resolve dependencies and download the relevant chart packages. We instead see multiple attempts to fetch the index, and long delays between fetching individual packages.
Here’s how it behaves in relation to the total number of dependencies.
Build time grows linearly with the number of dependencies. However, we’d expect the lengthy index download to dominate when there are only a few dependencies.
That it doesn’t indicates something isn’t quite right.
So what’s going on?
- The same repository index is downloaded multiple times during the first stage
- There’s some undetermined delay causing resolution of individual charts to be slower than expected
Rolling our own
For fun, I wrote a hacky and limited implementation of helm dependency build. You can find it on my GitHub.
It supports only a limited set of repository locations (just http(s) and local file locations) and relies on the helm CLI to package local charts. It has loose support for v1 and v2 charts (using requirements.yaml vs Chart.yaml dependencies respectively), and is thoroughly untested.
➜ foo git:(master) ✗ time helm-dependency-fetch
Fetching rabbitmq @ ^1.0.0
Fetching index from https://charts.helm.sh/stable/
Fetching chart: https://charts.helm.sh/stable/charts/rabbitmq-1.1.9.tgz
Fetching minio @ ^1.0.0
Fetching chart: https://charts.helm.sh/stable/charts/minio-1.9.2.tgz
Fetching mysql @ ^1.0.0
Fetching chart: https://charts.helm.sh/stable/charts/mysql-1.6.9.tgz
Fetching coredns @ ^1.0.0
Fetching chart: https://charts.helm.sh/stable/charts/coredns-1.13.8.tgz
Fetching couchdb @ ^2.0.0
Fetching chart: https://charts.helm.sh/stable/charts/couchdb-2.3.0.tgz
Fetching dokuwiki @ ^1.0.0
Fetching chart: https://charts.helm.sh/stable/charts/dokuwiki-1.0.3.tgz
Fetching drone @ ^2.0.0
Fetching chart: https://charts.helm.sh/stable/charts/drone-2.7.2.tgz
Fetching drupal @ ^6.0.0
Fetching chart: https://charts.helm.sh/stable/charts/drupal-6.2.12.tgz
Fetching elastabot @ ^1.0.0
Fetching chart: https://charts.helm.sh/stable/charts/elastabot-1.2.1.tgz
Fetching elastalert @ ^1.0.0
Fetching chart: https://charts.helm.sh/stable/charts/elastalert-1.5.1.tgz
Fetching elastic-stack @ ^2.0.0
Fetching chart: https://charts.helm.sh/stable/charts/elastic-stack-2.0.6.tgz
Fetching elasticsearch @ ^1.0.0
Fetching chart: https://charts.helm.sh/stable/charts/elasticsearch-1.32.5.tgz
real 0m7.072s
user 0m0.883s
sys 0m0.290s
It takes just over 7s to fetch 12 dependencies, where helm dependency build previously took about 163s.
We’ve got it down to about 4.3% of the original time, and I suspect most of that speedup would hold even though this logic is greatly simplified.
Diving into Helm
Rolling our own tool is fun, but now it’s time to investigate Helm itself.
helm dependency build hands off directly to the downloader package, specifically the manager, which in turn calls chart_downloader’s methods.
We observe the following abridged call structure.
cmd/dependency_build:newDependencyBuildCmd ->
  manager:Build ->
    manager:Update ->
      manager:UpdateRepositories ->        # Updates 'unmanaged' repositories in parallel
        chartrepo:DownloadIndexFile        # Downloads the index file
      manager:downloadAll ->               # Downloads all charts found as dependencies
        chart_downloader:DownloadTo ->
          chartrepo:FindChartUrl ->
            chartrepo:FindChartInRepoURL ->
              chartrepo:DownloadIndexFile  # Downloads the index file (again!)
          chart_downloader:ResolveChartVersion ->
            chart_downloader:scanReposForURL  # Finds the chart, iterates over all repos
It’s clear that UpdateRepositories does not perform any de-duplication on unmanaged repositories. This explains the repeated index downloads we saw at start-up.
Those same repositories are then fetched again in FindChartInRepoURL. This does not happen with managed repositories, indicating the local cache is not being searched in this case.
Finally, we still see a delay when fetching charts. This is the result of scanReposForURL, which inefficiently searches all repository indexes for the given chart version.
func (c *ChartDownloader) scanReposForURL(u string, rf *repo.File) (*repo.Entry, error) {
	// FIXME: This is far from optimal. Larger installations and index files will
	// incur a performance hit for this type of scanning.
	for _, rc := range rf.Repositories {
		r, err := repo.NewChartRepository(rc, c.Getters)
		// ... (remainder elided)
In closing
All major Helm versions have a severe performance issue with resolving dependencies from unmanaged repositories.
There are 3 issues:
- Unmanaged Helm repository indexes are not de-duplicated before download
- Unmanaged Helm repository indexes are fetched up front, then fetched again for each dependency
- Chart resolution unnecessarily loads irrelevant repository indexes, which negatively affects charts with many dependencies, or dependencies from diverse sources
We know we can avoid most of these problems by simply managing all repositories explicitly. However, the underlying issues remain valid.
Next up is to propose some fixes. De-duplication seems a quick win, but the others need more investigation.