
kt doctor: Verifying Your Gateway Health

The fastest way to lose time during an outage is to debug the wrong layer first. kt doctor exists to prevent that. It is the Keeptrusts command that checks the local operator path end to end: API connectivity, config readability, state directory health, and local gateway liveness. It is deliberately not a deep inspection tool. It is the first pass that tells you where to start.

Use this page when

  • A gateway or CLI workflow suddenly stops behaving the way it did yesterday.
  • You changed configuration, credentials, or profiles and want a fast sanity check.
  • You need machine-readable diagnostics before a deployment or as part of an incident runbook.

Primary audience

  • Primary: Technical Engineers and platform operators
  • Secondary: Technical Leaders maintaining support and onboarding runbooks

The problem

Most gateway failures do not look unique at first. A request fails, events stop appearing, a rollout seems stuck, or a local command cannot reach the control plane. From the outside, those can all look like "Keeptrusts is broken." In reality, the failure domain is usually much smaller.

Maybe the API token is missing. Maybe the gateway process never started. Maybe the config file path changed. Maybe the state directory became unwritable after a permissions change. Maybe the API URL is wrong for the active environment.

If an operator starts with deep debugging immediately, they can spend a long time inspecting policy content or provider settings when the actual problem is that no valid CLI auth exists. That is an expensive habit, and it shows up most often in new environments and after routine operational changes.

The first question should be narrower: which layer is failing right now?

The solution

kt doctor answers that with a short set of targeted checks.

The command verifies whether the configured Keeptrusts API is reachable, whether the selected config file exists and parses as YAML, whether the state directory is available, and whether a local gateway appears reachable. If the command finds a failure, it reports the failed check directly instead of forcing the operator to infer it from a generic error.

That matters because each failed check suggests a different next action.

  • API connectivity failures point toward URL, token, or network issues.
  • Config failures point toward the file path or YAML content.
  • State directory failures point toward local permissions or environment setup.
  • Gateway liveness failures point toward the runtime not being up where you expect it.
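
The mapping above can be sketched as a small shell helper for a runbook. This is only an illustration: the check names it matches (`api`, `config`, `state`, `gateway`) are hypothetical placeholders, so substitute whatever names your kt doctor output actually reports.

```shell
# Map a failed check to the suggested next action from the list above.
# NOTE: the check-name prefixes below are hypothetical, not confirmed kt doctor names.
next_action() {
  case "$1" in
    api*)     echo "Check the API URL, token, and network path." ;;
    config*)  echo "Check the config file path and YAML content." ;;
    state*)   echo "Check state directory permissions and environment setup." ;;
    gateway*) echo "Check that the gateway runtime is up where you expect it." ;;
    *)        echo "Unknown check: $1" ;;
  esac
}
```

Calling `next_action config` prints the config hint, and any unrecognized name falls through to the last branch, so the helper never fails silently.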

There is also a machine-readable mode. kt doctor --json makes the same diagnostic set easy to feed into a CI preflight, a provisioning script, or a runbook step that decides whether to continue.

Implementation

The default invocation is enough for most local checks:

kt doctor

When you are working with an environment-specific file or profile, use the command with explicit inputs:

kt doctor \
  --config environments/staging/policy-config.yaml \
  --profile staging \
  --json

Passing the inputs explicitly does two useful things. First, --config and --profile remove ambiguity about which environment you are testing. Second, --json makes the output scriptable. For example, you can reduce the result to the failing checks:

# Keep only the checks whose status is not "ok"
kt doctor --config policy-config.yaml --json |
  jq '.[] | select(.status != "ok") | {name, status, message}'

This is a good preflight step in operational scripts because it keeps the script from moving into a rollout or verification phase when the basics are already failing.
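
A minimal sketch of such a preflight gate follows. It assumes the JSON output is an array of objects with `name`, `status`, and `message` fields, as the jq filter above implies. It inspects the reported statuses rather than the exit code of kt doctor, because this page does not state how the exit code behaves. The function name `doctor_gate` is made up for illustration.

```shell
# Read a kt doctor JSON report on stdin.
# Returns 0 if every check is "ok", 1 if any check failed,
# and 2 if the report is empty or jq cannot parse it.
# ASSUMPTION: the report is a JSON array of {name, status, message} objects.
doctor_gate() {
  local report failed
  report=$(cat)
  [ -n "$report" ] || return 2
  failed=$(printf '%s' "$report" |
    jq -r '.[] | select(.status != "ok") | "\(.name): \(.status) - \(.message)"') || return 2
  if [ -n "$failed" ]; then
    printf 'preflight failed:\n%s\n' "$failed" >&2
    return 1
  fi
  return 0
}

# Example use in a script (not executed here):
#   kt doctor --config policy-config.yaml --json | doctor_gate || exit 1
```

Treating an empty report as an error is a deliberate choice: a gate that passes when the diagnostic produced no output would hide exactly the kind of basic failure it is meant to catch.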

A practical local workflow looks like this:

  1. Run kt config show --json if you suspect config precedence or profile confusion.
  2. Run kt doctor to confirm the resolved inputs can actually work.
  3. If the gateway liveness check fails, start or inspect the gateway runtime.
  4. If the API connectivity check fails, fix auth or network before you inspect policies.

That order matters. kt doctor is strongest when it is used early enough to redirect the investigation before deeper work starts.

The command is also useful during onboarding. A new engineer does not need a long troubleshooting guide just to find out that their API token is missing or their config file path is wrong. If the team standardizes on "run kt doctor first," support load tends to drop because many setup issues become self-service.

In automation, the JSON form is especially valuable. A pipeline or bootstrap script can call kt doctor --json, examine the status of each check, and stop before it attempts to push a config or start a dependent workflow. That is a better failure boundary than letting a later command fail in a less informative way.

Another practical pattern is environment regression checks. If you rotate tokens, move state directories, or adjust machine-level config, running kt doctor before and after the change gives you quick evidence that the operator path still works. It is a very small command, but it catches the kind of small breakage that causes outsized confusion later.
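
One way to sketch that before-and-after comparison is to save the JSON report on each side of the change and compare the per-check statuses. The helper names and file names below are hypothetical, and the code assumes the same array-of-objects report shape that the jq filter earlier on this page relies on.

```shell
# Reduce a saved kt doctor JSON report to "name<TAB>status" lines, sorted by name.
# ASSUMPTION: the report is a JSON array of objects with "name" and "status" fields.
doctor_summary() {
  jq -r 'sort_by(.name) | .[] | "\(.name)\t\(.status)"' "$1"
}

# Compare two saved reports.
# Returns 0 if all statuses match, 1 if any differ, 2 if a report cannot be read.
doctor_regressions() {
  local before after
  before=$(doctor_summary "$1") || return 2
  after=$(doctor_summary "$2") || return 2
  if [ "$before" = "$after" ]; then
    return 0
  fi
  printf 'before:\n%s\nafter:\n%s\n' "$before" "$after"
  return 1
}

# Example use around a change (not executed here):
#   kt doctor --json > before.json
#   ... rotate the token, move the state directory, etc. ...
#   kt doctor --json > after.json
#   doctor_regressions before.json after.json
```

Comparing only names and statuses keeps the check stable even if message text varies between runs.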

Results and impact

Teams that use kt doctor consistently troubleshoot faster because they stop treating every failure as a policy problem. The command narrows the failure layer immediately, which usually means fewer blind edits and fewer unnecessary restarts.

It also improves automation quality. A release job or setup script that includes kt doctor --json is less likely to fail in the middle of a larger workflow for a reason that should have been detected at the start.

The soft benefit is standardization. When every operator starts from the same first diagnostic step, incident collaboration becomes cleaner. People spend less time comparing guesses and more time working from the same evidence.

Key takeaways

  • kt doctor is the right first command when a Keeptrusts CLI or gateway workflow behaves unexpectedly.
  • It checks API reachability, config validity, state directory health, and local gateway liveness.
  • Use --config and --profile when you need to validate a specific environment explicitly.
  • Use --json when the result needs to feed a script, CI job, or structured runbook.
  • The command is most valuable early, before you start debugging policies or providers in detail.

Next steps