GPU infrastructure that developers love
Run GPU inference workloads with a Python decorator. Instant provisioning, real-time logs, automatic scaling. No YAML, no Docker, no SSH.
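A minimal sketch of what the decorator pattern described above might look like. The `@function` decorator and its `gpu=`/`memory_gb=` parameters are illustrative assumptions, implemented here as a local toy so the shape of the API is concrete; they are not the documented SDK.

```python
from functools import wraps

# Toy stand-in for the platform SDK: the @function decorator and its
# gpu=/memory_gb= parameters are assumptions for illustration only.
def function(gpu: str = "T4", memory_gb: int = 16):
    """Declare hardware requirements next to the code that needs them."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            # In the real platform this call would be shipped to a
            # remote GPU container; in this sketch we just run locally.
            return fn(*args, **kwargs)
        # Hardware requirements travel with the function object.
        wrapper.requirements = {"gpu": gpu, "memory_gb": memory_gb}
        return wrapper
    return decorator

@function(gpu="A100", memory_gb=40)
def generate(prompt: str) -> str:
    # Placeholder for actual model inference.
    return f"completion for: {prompt}"

print(generate("hello"))
print(generate.requirements)
```

The point of the pattern: hardware requirements live in code, next to the function they apply to, instead of in a separate YAML file that can drift out of sync.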
Programmable infra
Define everything in code: no YAML or config files. Keep environment and hardware requirements in sync with the functions that use them.
Built for performance
Launch and scale containers in seconds to keep feedback loops tight and latency low.
[Chart: container launch times — VietKong (snapshots), VietKong, Provider A, Provider B, Kubernetes + EC2]
Elastic GPU scaling
Access thousands of GPUs across clouds with no quotas or reservations. Scale back to zero when not in use.
Unified observability
Detailed logs, metrics, and traces for every function call. Debug and optimize without leaving your workflow.
Products & Platform
Everything you need for GPU inference

Inference
Deploy and scale inference for LLMs, audio, image, and video generation.
Learn more →
Multi-cloud
Run on any GPU provider — Vast.ai, Lambda, RunPod, AWS, or your own hardware.
Learn more →
Python-native runtime
No containers to manage. Define your environment with Python decorators and VietKong handles the rest.
Built-in container images
Start from any Docker image, layer Python packages and system deps with a fluent builder API.
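A sketch of what a fluent builder like this might look like. The `Image` class and its `from_registry`, `pip_install`, and `apt_install` methods are assumptions for illustration, implemented as a runnable toy; the real API may differ.

```python
# Sketch of a fluent image-builder API. Class and method names
# (Image, from_registry, pip_install, apt_install) are assumptions
# for illustration, not the documented interface.
class Image:
    def __init__(self, base: str):
        self.base = base
        self.pip_packages: list[str] = []
        self.apt_packages: list[str] = []

    @classmethod
    def from_registry(cls, tag: str) -> "Image":
        # Start from any Docker image tag.
        return cls(tag)

    def pip_install(self, *packages: str) -> "Image":
        # Layer Python packages; returning self enables chaining.
        self.pip_packages.extend(packages)
        return self

    def apt_install(self, *packages: str) -> "Image":
        # Layer system dependencies the same way.
        self.apt_packages.extend(packages)
        return self

image = (
    Image.from_registry("nvidia/cuda:12.4.1-runtime-ubuntu22.04")
    .apt_install("ffmpeg")
    .pip_install("torch", "transformers")
)
print(image.base, image.apt_packages, image.pip_packages)
```

Each method returns the builder itself, so an image definition reads top-to-bottom like a Dockerfile while staying ordinary Python.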
Real-time log streaming
GPU metrics, function logs, and cost data stream back to your terminal via SSE as your code runs.
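To make the SSE mechanism concrete, here is a minimal parser for the Server-Sent Events wire format the text refers to. The event names (`log`, `gpu_metrics`) and payload fields are illustrative assumptions about what such a stream might carry, not the platform's actual schema.

```python
import json

def parse_sse(stream: str):
    """Yield (event, data) pairs from raw SSE-formatted text.

    SSE frames are "event:"/"data:" lines terminated by a blank line.
    """
    event, data_lines = "message", []
    for line in stream.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "":
            # A blank line terminates the current frame.
            if data_lines:
                yield event, "\n".join(data_lines)
            event, data_lines = "message", []

# Hypothetical stream contents; event names and fields are assumptions.
raw = (
    "event: log\n"
    'data: {"level": "info", "msg": "model loaded"}\n'
    "\n"
    "event: gpu_metrics\n"
    'data: {"util_pct": 87}\n'
    "\n"
)
for event, data in parse_sse(raw):
    print(event, json.loads(data))
```

SSE is a good fit here because it is plain HTTP: logs and metrics can stream to a terminal (or any HTTP client) with no websocket upgrade or custom protocol.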
Multi-cloud capacity pool
Access GPUs across Vast.ai, Lambda, RunPod, AWS, GCP, Azure, or your own on-prem hardware.