Two weeks ago during their GDC Keynote Nvidia announced the first few video cards from their upcoming 4000 series of GPUs based on the Ada Lovelace architecture. They announced the RTX 4090 and two RTX 4080 cards, one with 16GB of VRAM and the other with 12GB which in addition to the memory those cards have a different base GPU as well. But today the performance embargo for the RTX 4090 is lifted so we are going to focus on that and the RTX 4090 will be available for sale tomorrow should you be interested in picking one up. Today I’m going to dive into the 4000 series of cards to tell you a little about them if you haven’t read about them yet then we will check out the RTX 4090 Founders Edition closer and then dive into testing, it’s a lot to go over so let’s dive in!

Product Name: Nvidia RTX 4090 Founders Edition

Review Sample Provided by: Nvidia

Written by: Wes Compton

Amazon Affiliate Link: HERE

 

What is the 4000 Series all about

Nvidia’s 3000 Series of cards launched back in September of 2020 and with that brought their Ampere microarchitecture, the Samsung 8N manufacturing process, and a completely new card design as well. Here we are two years later but it feels like it's been four years and a lot has changed and a lot is changing with the new 4000 Series of cards. Ampere is in the mirror and the new cards are now based on the Ada Lovelace microarchitecture which is named after the English mathematician Ada Lovelace who is often regarded as the first computer programmer. The new Ada Lovelace based GPUs would also be produced by TSMC’s 4N process which is a 5 nm process and itself brings the possibility of efficiency improvements alongside the new microarchitecture. So what all has changed with Ada Lovelace?

The new microarchitecture has a new fourth-generation tensor core design and a new third-generation ray tracing core design to continue to expand on the ray tracing performance established over the previous two generations of cards and the AI processing originally introduced with Volta. The new tensor cores are capable of up to 1400 Tensor TFLOPs, four times that of the 3090 Ti which alongside the introduction of DLSS 3 are set to bring huge leaps in DLSS performance. Lovelace now has what they call Shader Execution Reordering or SER that can reschedule the workload to ensure that similar shaders are being processed together which helps a lot with processing ray tracing by allowing the multiprocessors to work more efficiently.

nvidia 1

nvidia 2

nvidia 3

Nvidia has also doubled up on the AV1 processing which was added last generation with the new cards having two AV1 processors. This is important because AV1 which is a royalty-free video encoding format which is also 30% more efficient than H.265 meaning with it you can stream higher bitrate video with less bandwidth. AV1 is set to be added to streaming tools like OBS and Discord as well as video production tools as well and having two encoders in NVENC will be huge.

nvidia 8

DLSS 3 in itself is a big change and sadly it is exclusive to the new 4000 series cards because it relies on Lovelace’s new Optical Flow Accelerator as well as the new much more powerful Tensor Cores. On the plus side though if you have a previous generation Nvidia card DLSS 3 is an extension of DLSS 2 so games that add DLSS 3 will also have DLSS 2. DLSS 3 combines the DLSS Super Resolution from DLSS 2 and adds in DLSS Frame Generation. So previously with DLSS Super Resolution, each frame generated would use the AI tensor cores to scale the resolution up to get the detail you want while keeping the processing time down. DLSS Frame Generation on the other hand takes every other frame and generates a complete frame. With the two combined DLSS 3 is reconstructing 7/8th of the pixels displayed. The DLSS Frame Generation has a really interesting side effect as well that DLSS Super Resolution didn’t have. When you are rendering an entire extra frame like that, it also brings performance improvements to games that are CPU limited which means that games that are CPU limited could implement DLSS 3 and see up to twice the frame rate improvement.

nvidia 4

nvidia 5

As far as 4000 Series cards, Nvidia did announce three different cards with their announcement of Ada Lovelace and the new generation of cards. The flagship of those cards is the RTX 4090 and then they have two different RTX 4080 models which have caused some confusion and uproar in the hardware community. All three are based on Ada Lovelace but they are on three different GPU variations. The RTX 4090 is on the AD102, the RTX 4080 16GB is on the AD103, and the RTX 4080 12GB is on the AD104. I will dive more into the new 4080 models as those launch later but obviously with them sharing the same name but having different memory levels has happened in the past with Nvidia cards but never before have those different memory capacities had a completely different GPU or memory controller which they have done here. The “104” designation on the GPU is normally seen on the xx70 cards which have had some people accusing Nvidia of rebranding their xx70 up to an xx80 so they could raise pricing but on the previous calls with Nvidia, they have responded that they felt that the cards performance was that of an xx80 card and that it should get the naming. I agree that the naming is going to be confusing, I don’t think we will know on the rest until the 4080s are tested and more importantly until we see the xx70 card from the 4000 series which may be a while.

nvidia 6

With our focus on the new RTX 4090 today I put together a table to compare its specs alongside the RTX 3090 Ti which was the former flagship, the RTX 3090, and the RTX 3080 Ti which are all Ampere-based cards at the top of Nvidia’s lineup over the last few years. It is interesting right in the top few specifications to see that the RTX 4090 has 11 graphics processing clusters whereas all three of the other cards had 7. This means the RTX 4090 is using 11 out of the 12 GPCs that Ada Lovelace has on its full size. TPCs or texture processing clusters and SMs (streaming multiprocessors) also scaled up which the TPC consists of SMs inside so that isn’t a surprise. This leads us to the big number though, all together the RTX 4090 has 16384 CUDA cores which is 52% more than the RTX 3090 Ti, which is on top of the architectural improvements as well. So with these numbers in mind, we know that each SM has a total of 128 CUDA cores. The 512 new 4th generation Tensor cores tell us each SM has 4 total and then the 128 total RT or ray tracing cores tells us each SM also has one ray tracing core as well.

On top of the brute force rise in processing power with more CUDA, Tensor, and RT cores as well as architectural improvements with the new tensor and rt cores Nvidia has also really stepped things up in clock speeds. They were able to do this because of the efficiencies that the TSMC 4N process brings with making things smaller. But the RTX 4090 has a GPU boost clock speed of 2520 MHz which is almost a full GHz over the clock speed on the RTX 3080 Ti and even compared to the RTX 3090 Ti which was clocked at 1860 MHz is a massive jump. Memory clocks have seen the same clock speed improvement as well. The RTX 4090 has the same 24 GB capacity that the RTX 3090 and RTX 3090 Ti had before and 384-bit memory interface that all four cards have. The L2 cache has been scaled up from 6144 KB up to a full MB which is an improvement of just under 71%. Altogether the changes have the Ada Lovelace die size at 76.3 billion compared to 28.3 Billion for the Ampere cards. Interestingly the TDP and minimum power supply requirements haven’t changed at all but the power connection lists having a new 4 8-pin to 16-pin dongle compared to the 3 8-pin to 16-pin dongle used on the RTX 3090 and RTX 3090 Ti.

Specifications

RTX 3080 Ti

RTX 3090

RTX 3090 Ti

RTX 4090

Graphics Processing Clusters

7

7

7

11

Texture Processing Clusters

40

41

42

64

Streaming Multiprocessors

80

82

84

128

CUDA Cores

10240

10496

10752

16384

Tensor Cores

320 (3rd Gen)

328 (3rd Gen)

336 (3rd Gen)

512 (4th Gen)

RT Cores

80 (2nd Gen)

82 (2nd Gen)

84 (2nd Gen)

128 (3rd Gen)

Texture Units

320

328

336

512

ROPs

112

112

112

176

Boost Clock

1665 MHz

1695 MHz

1860 MHz

2520 MHz

Memory Clock

1188 MHz

1219 MHz

1860 MHz

2520 MHz

Memory Data Rate

19 Gbps

19 Gbps

21 Gbps

21 Gbps

L2 Cache Size

6144 KB

6144 KB

6144 KB

10501 KB

Total Video Memory

12 GB GDDR6X

24 GB GDDR6X

24 GB GDDR6X

24 GB GDDR6X

Memory Interface

384-bit

384-bit

384-bit

384-bit

Total Memory Bandwidth

912 GB/s

936.2 GB/s

1008 GB/s

1008 GB/s

Texture Rate (Bilinear)

532.8 GigaTexels/second

556 GigaTexels/second

625 GigaTexels/second

1290.2 GigaTexels/second

Fabrication Process

Samsung 8 nm 8N

NVIDIA Custom Process

Samsung 8 nm 8N

NVIDIA Custom Process

Samsung 8 nm 8N

NVIDIA Custom Process

TSMC 4 nm

NVIDIA Custom Process

Transistor Count

28.3 Billion

28.3 Billion

28.3 Billion

76.3 Billion

Connectors

3 x DisplayPort

1 x HDMI

3 x DisplayPort

1 x HDMI

3 x DisplayPort

1 x HDMI

3 x DisplayPort

1 x HDMI

Form Factor

Two Slots

Triple Slot

Triple Slot

Triple Slot

Power Connectors

1x16-pin

(Dongle to 2x 8-Pins)

1x16-pin

(Dongle to 3x 8-Pins)

1x16-pin

(Dongle to 3x 8-Pins)

1x16-pin

(Dongle to 4x 8-Pins)

Minimum Power Supply

750 Watts

850 Watts

850 Watts

850 Watts

Total Graphics Power (TGP)

350 Watts

450 Watts

450 Watts

450 Watts

Maximum GPU Temperature

93° C

92° C

93° C

90° C

PCI Express Interface

Gen 4

Gen 4

Gen 4

Gen 4

Launch MSRP

$1199

$1499

$1999

$1599

 

Before getting into testing I did also run GPUz to double-check that our clock speeds match up with the specifications which it did. I also have this to document the card BIOS revision as of testing as well as the driver which is the 521.90 Beta driver for before the launch.

image 37

 

Log in to comment

We have 3243 guests and one member online

supportus