
Mechanical Sympathy is All You Need

Hi 👋, I'm Venkat!

Twitter LinkedIn GitHub

Built and scaled systems that handle 5k→250k RPS without breaking a sweat.

Got into model serving and inference; enjoyed solving cold starts, building intelligent routing, and optimizing GPU cluster utilization. Did a bit of RAG & agents infra. Currently in ML infra: training, inference, comms collectives, storage, compiler backends, custom kernel optimizations, and research into novel techniques.

High-agency individual deep in agentic-engineering mode. AI tools have enabled me to touch end-to-end infra, from user-facing APIs down to tensors and metal. Always looking to maximize my learning curve 📈

ʕ•ᴥ•ʔ venkat.systems


Highlights


Projects


Technologies

Home: Codex, Claude CLI, macOS, pi.dev, Tailscale, tmux, AutoResearch
Languages: Rust, Go, Python, Java, CUDA, English, Markdown (does it matter anymore?)
Inference: vLLM, SGLang, HuggingFace, TensorRT-LLM, Transformers
Infra: K8s, Helm, Argo, Docker, NVIDIA Dynamo, vLLM, AIBrix
Accelerators: PyTorch, Triton, CUTLASS, cuBLAS, Mojo, ThunderKittens
Storage: MySQL, PostgreSQL, Redis, S3, SlateDB
Middleware: Kafka, Apache Iggy, NATS, Redpanda, ZeroMQ, RabbitMQ
Cloud: AWS, GCP, Terraform, Ansible
Build: Earthly, Makefile, Bash, Bazel

Writings

Hashnode Medium Blogger

Acknowledgements

Inspired by


Pinned

  1. yali
     Speed-of-Light SW efficiency by using ultra-low-latency primitives for comms collectives
     CUDA · 13 stars

  2. vllm-project/aibrix
     Cost-efficient and pluggable infrastructure components for GenAI inference
     Go · 4.8k stars · 574 forks

  3. ai-dynamo/dynamo
     A datacenter-scale distributed inference serving framework
     Rust · 6.7k stars · 1.1k forks

  4. sgl-project/sglang
     SGLang is a high-performance serving framework for large language models and multimodal models.
     Python · 27.1k stars · 5.7k forks

  5. tokasaurus (forked from ScalingIntelligence/tokasaurus)
     Python

  6. LMCache (forked from LMCache/LMCache)
     Supercharge Your LLM with the Fastest KV Cache Layer
     Python