NVIDIA L40S dedicated servers

Experience consistently excellent AI and machine learning performance with the NVIDIA L40S GPU, well suited to AI training, graphics rendering, video transcoding, and virtualization.

Global locations · 24/7 support · 5-minute deployment

Turn your GPUs into passive monthly revenue

Have idle server or desktop GPUs? List them on the Primcast marketplace and earn steady monthly rental income from AI teams, developers, and enterprises that need production-grade compute.

NVIDIA L40S GPU specifications

The Ada Lovelace architecture delivers multi-workload acceleration for large language model inference and training, graphics, and video applications.

NVIDIA L40S

The L40S GPU delivers up to 1,466 TFLOPS of FP8 Tensor Core performance (with sparsity), 212 TFLOPS of RT Core performance, and 91.6 TFLOPS of single-precision (FP32) performance.

Architecture

Ada Lovelace

Video memory

48GB GDDR6 with ECC

CUDA cores

18,176

Max bandwidth

864 GB/s

Max power

350 W
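
As a rough cross-check of the bandwidth figure, peak memory bandwidth can be estimated as the bus width multiplied by the per-pin data rate. The sketch below assumes a 384-bit memory bus and 18 Gbps GDDR6; neither value is stated on this page, so treat both constants as assumptions.

```python
# Assumed values, not taken from this page: 384-bit bus, 18 Gbps per pin.
BUS_WIDTH_BITS = 384
DATA_RATE_GBPS_PER_PIN = 18.0


def peak_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s: bus width (bits) * per-pin rate (Gbit/s) / 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8.0


bandwidth = peak_bandwidth_gb_s(BUS_WIDTH_BITS, DATA_RATE_GBPS_PER_PIN)
print(f"Peak memory bandwidth: {bandwidth:.0f} GB/s")
```

Under those assumptions the result matches the 864 GB/s listed above. This is a theoretical peak; sustained bandwidth in real workloads is lower.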

Performance metrics

Advanced tensor processing and ray tracing capabilities optimized for AI workloads and high-fidelity rendering tasks.

FP32

91.6 teraFLOPS

FP16 Tensor Core

733 teraFLOPS

FP8 Tensor Core

1,466 teraFLOPS

RT Core

212 teraFLOPS
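
The FP32 figure can be approximated from the CUDA core count: each core can issue one fused multiply-add per clock, which counts as two floating-point operations. The sketch below assumes a 2.52 GHz boost clock, which is not stated on this page, and is an estimate of theoretical peak rather than measured throughput.

```python
CUDA_CORES = 18_176          # from the specifications above
BOOST_CLOCK_GHZ = 2.52       # assumption: boost clock, not stated on this page
FLOPS_PER_CORE_PER_CLOCK = 2 # one fused multiply-add = 2 floating-point operations


def peak_fp32_tflops(cores: int, clock_ghz: float) -> float:
    """Theoretical peak FP32 throughput in teraFLOPS."""
    gflops = cores * FLOPS_PER_CORE_PER_CLOCK * clock_ghz
    return gflops / 1000.0


fp32 = peak_fp32_tflops(CUDA_CORES, BOOST_CLOCK_GHZ)
print(f"Peak FP32: {fp32:.1f} TFLOPS")
```

With the assumed clock, the estimate rounds to the 91.6 teraFLOPS listed above.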

Perfect for AI training, rendering, and video workloads

Enterprise-grade NVIDIA L40S GPU servers built on Ada Lovelace architecture, delivering exceptional performance for AI model training, 3D rendering, and video production workflows.

AI model training

The L40S GPU accelerates AI model training using structural sparsity and the TF32 format, delivering strong performance for deep learning workloads.

LLM training and inference

The L40S uses fourth-generation Tensor Cores with FP8 support to accelerate training and inference of advanced LLMs and generative AI models.

Ray tracing

The L40S enhances ray tracing performance, speeding up renders for design and engineering workflows with advanced RT cores for realistic visualization.

Rendering and 3D graphics

3D workloads are enhanced with the NVIDIA L40S for faster rendering and increased productivity. Work in real-time on intricate designs with high-resolution textures.

Video and streaming

The NVIDIA L40S GPU boosts streaming and video workloads with three video encoding and decoding engines, featuring AV1 encoding for breakthrough performance.
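
As an illustration of AV1 hardware encoding, the sketch below builds an ffmpeg command line that decodes on the GPU and encodes with the `av1_nvenc` encoder. It assumes an ffmpeg build with NVENC support is installed on the server; the helper name `build_av1_transcode_cmd`, the file names, and the bitrate are hypothetical and chosen only for the example.

```python
import shlex


def build_av1_transcode_cmd(src: str, dst: str, bitrate: str = "6M") -> list[str]:
    """Build (but do not run) an ffmpeg command for GPU-accelerated AV1 transcoding."""
    return [
        "ffmpeg",
        "-hwaccel", "cuda",   # decode on the GPU where the input codec allows it
        "-i", src,
        "-c:v", "av1_nvenc",  # NVENC AV1 encoder (requires an NVENC-enabled ffmpeg build)
        "-b:v", bitrate,      # target video bitrate (illustrative value)
        "-c:a", "copy",       # pass audio through unchanged
        dst,
    ]


cmd = build_av1_transcode_cmd("input.mp4", "output.mkv")
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` to start the transcode. Suitable bitrate and quality settings depend on the content and are not specified on this page.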

DLSS 3 technology

DLSS 3 improves rendering and frame rates, using deep learning to deliver higher FPS and reduced latency in demanding applications.

Compare A100 vs L40S vs H100

Compare performance metrics and pricing across NVIDIA GPU options to find the best fit for your AI and graphics workloads.

Specification         | L40S         | A100          | H100
----------------------|--------------|---------------|------------
Architecture          | Ada Lovelace | NVIDIA Ampere | Hopper
Memory                | 48GB GDDR6   | 80GB HBM2e    | 80GB HBM3
Memory Bandwidth      | 864 GB/s     | 2039 GB/s     | 3352 GB/s
FP32                  | 91.6 TFLOPS  | 19.5 TFLOPS   | 66.9 TFLOPS
TF32 Tensor Core      | 366 TFLOPS   | 312 TFLOPS    | 989 TFLOPS
FP16/BF16 Tensor Core | 733 TFLOPS   | 624 TFLOPS    | 1979 TFLOPS
Power                 | Up to 350W   | Up to 400W    | Up to 700W
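
One way to read the comparison above is to normalize peak throughput by maximum power. The sketch below does this using only the figures from the comparison; it divides peak numbers by the maximum power rating, so it is a rough indicator under that assumption rather than a measured efficiency.

```python
# Peak figures copied from the comparison above (TFLOPS and watts).
GPUS = {
    "L40S": {"fp32": 91.6, "tf32": 366, "fp16": 733, "power_w": 350},
    "A100": {"fp32": 19.5, "tf32": 312, "fp16": 624, "power_w": 400},
    "H100": {"fp32": 66.9, "tf32": 989, "fp16": 1979, "power_w": 700},
}


def tflops_per_watt(metric: str) -> dict[str, float]:
    """Peak TFLOPS divided by maximum power for each GPU."""
    return {name: spec[metric] / spec["power_w"] for name, spec in GPUS.items()}


def best(metric: str) -> str:
    """Name of the GPU with the highest peak TFLOPS per watt for a metric."""
    ratios = tflops_per_watt(metric)
    return max(ratios, key=ratios.get)


for metric in ("fp32", "tf32", "fp16"):
    ratios = tflops_per_watt(metric)
    summary = ", ".join(f"{name} {value:.2f}" for name, value in ratios.items())
    print(f"{metric.upper()} TFLOPS/W: {summary} (best: {best(metric)})")
```

On these peak figures the L40S leads on FP32 per watt, while the H100 leads on Tensor Core throughput per watt. Memory capacity and bandwidth, which this ratio ignores, also matter for large models.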

FAQ about NVIDIA L40S GPU servers

Common questions about deploying and managing your NVIDIA L40S GPU-accelerated servers for AI, rendering, and video production workloads.

What makes NVIDIA L40S GPUs ideal for AI and creative workloads?

NVIDIA L40S GPUs are built on the Ada Lovelace architecture, offering a unique combination of AI acceleration and graphics performance. With 18,176 CUDA cores, 48GB GDDR6 memory, and fourth-generation Tensor Cores with FP8 support, the L40S excels at AI training, LLM inference, 3D rendering, and video production. It delivers 1,466 teraFLOPS of FP8 performance for AI workloads while providing advanced ray tracing capabilities for visualization tasks.

How long does it take to deploy an L40S GPU server?

Instant L40S configurations are delivered within 5 minutes of payment verification. Your dedicated GPU server includes instant OS reloads, so you can iterate quickly without opening support tickets. Deploy in global locations with optimized network routes for immediate productivity.

What are the key advantages of L40S for AI and rendering?

L40S offers a balanced combination of AI and graphics performance. For AI workloads, it provides FP8 Tensor Core acceleration for efficient training and inference of large language models. For rendering, it includes advanced RT cores for ray tracing and three video encode/decode engines with AV1 support. The 48GB GDDR6 memory enables handling large models and high-resolution assets, while DLSS 3 technology delivers enhanced frame rates for real-time visualization.
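
To get a feel for which models fit in 48GB, the memory needed for model weights alone can be estimated as parameter count times bytes per parameter. The sketch below is a back-of-the-envelope check, not a sizing tool: the 20% headroom reserved for activations, KV cache, and framework overhead is an assumed illustrative value, and the function names are hypothetical. Training needs considerably more memory than this for gradients and optimizer state.

```python
L40S_MEMORY_GB = 48  # from the specifications on this page

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "fp8": 1}


def weights_gb(params_billions: float, precision: str) -> float:
    """Memory for weights only, in GB (1 GB = 1e9 bytes)."""
    return params_billions * BYTES_PER_PARAM[precision]


def fits_on_one_gpu(params_billions: float, precision: str,
                    memory_gb: float = L40S_MEMORY_GB,
                    headroom: float = 0.2) -> bool:
    """True if the weights fit after reserving an assumed headroom fraction."""
    return weights_gb(params_billions, precision) <= memory_gb * (1 - headroom)


for size, precision in [(13, "fp16"), (30, "fp16"), (30, "fp8"), (70, "fp8")]:
    verdict = "fits" if fits_on_one_gpu(size, precision) else "does not fit"
    print(f"{size}B @ {precision}: {weights_gb(size, precision):.0f} GB of weights, {verdict}")
```

Under these assumptions, a 13B-parameter model in FP16 or a 30B-parameter model in FP8 fits on a single L40S for inference, while larger models need lower precision or multiple GPUs.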

What workloads are best suited for L40S GPU servers?

L40S GPU servers are ideal for multi-workload environments requiring both AI and graphics capabilities. Perfect use cases include: AI model training and inference (especially LLMs), 3D rendering and CAD visualization, video transcoding and streaming with AV1 encoding, virtual desktop infrastructure (VDI), and mixed workloads that combine AI processing with high-quality graphics output.