Steam Deck - SteamOS, Zen2 4C/8T, RDNA2 1.0-1.6 TF, 16 GB LPDDR5 88 GB/s, starting at $399 [2021-12]

It really annoys me that the Steam Deck has such cutting edge hardware inside, but from the outside looks like something straight from the 90s. Why those huge bezels?

I mean, even the Switch OLED looks much more modern.
 
Very excited about this, super nice and super efficient. Might as well just stop waiting for a desktop GPU given the current situation and buy a Steam Deck.

No way I will use Linux though; I'll run Windows 11 instead. I have most of my games on Game Pass PC and especially GOG, although I also like Steam and have many games there.
 
It really annoys me that the Steam Deck has such cutting edge hardware inside, but from the outside looks like something straight from the 90s. Why those huge bezels?

I mean, even the Switch OLED looks much more modern.

My guess is

1) Gabe said it was painful to hit the current price point. At $400, the newer small-bezel tech might have been too much, or a larger screen might have been too much.

2) A larger screen would have meant a higher resolution, which would have broken the performance profile they were going for?

3) As part of, but separate from, 2: using the same size screen with no bezels would have left a lot more empty space on the device, or they would have had to make it smaller, compromising cooling?

Very excited about this, super nice and super efficient. Might as well just stop waiting for a desktop GPU given the current situation and buy a Steam Deck.

No way I will use Linux though; I'll run Windows 11 instead. I have most of my games on Game Pass PC and especially GOG, although I also like Steam and have many games there.

I am hoping that Valve or another company provides a good boot solution, so that you get a nice menu at startup to select SteamOS or a third-party OS you have installed.

I bought the 256GB version and would most likely buy a 512GB or 1TB SD card to use with this, depending on the pricing and performance of the cards. I do agree Game Pass on this will be amazing.
 
It really annoys me that the Steam Deck has such cutting edge hardware inside, but from the outside looks like something straight from the 90s. Why those huge bezels?

I mean, even the Switch OLED looks much more modern.

Gabe said in an interview that it was quite painful to achieve the lowest price.
 
Gabe said in an interview that it was quite painful to achieve the lowest price.

I'd imagine they'll have a screen upgrade later if it's successful.

I don't think the bezels are that big compared to a laptop or tablet. Rendering at 720p rather than the panel's 800p doesn't help though. Being LCD probably forces their hand a bit on bezels, as the screen controller can't be bent around out of the way like with OLED.
 
Will be interesting to see whether or not they've included any Infinity Cache, though that might be overkill in terms of performance with LPDDR5. And while it may save power at the system level, it would further strain the targeted 4 W lower TDP boundary.
 
If so, the dual C5 alone is seemingly capable of 2 TMACs INT8 per GHz, or up to 3.2 TMACs if it uses the same clocks as the GPU. Not too shabby on paper.
So paging @Nebuchadnezzar who wrote this article.
Would it be possible to use the dual-C5 + dual-Q6 in Van Gogh to help offload the GPU in a ML-based upscaler?
Are the buses fast enough for real-time usage on a 1280×800 60fps stream of frames (i.e. getting data + running inference + returning data within ~5ms), or is it only capable of operating at higher latencies?
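As a back-of-the-envelope check on that latency question — the ~5ms window, resolution, and 60fps come from the post above; the 4-bytes-per-pixel (RGBA) format and the assumption that each frame crosses the bus twice (to the DSP and back) are mine:

```python
# Rough bus-bandwidth floor for the hypothetical DSP-offloaded upscaler.
width, height, bpp = 1280, 800, 4        # one 800p frame, 4 B/px assumed
frame_bytes = width * height * bpp       # 4,096,000 bytes ≈ 4.1 MB

frame_time_ms = 1000 / 60                # ~16.7 ms total per frame at 60 fps
budget_ms = 5                            # transfer + inference window from the post

# Frame has to move to the DSP and back within the budget:
min_bus_gbps = 2 * frame_bytes / (budget_ms / 1000) / 1e9
print(round(min_bus_gbps, 2))            # ~1.64 GB/s sustained, before inference time
```

So the raw transfer is modest next to 88 GB/s of system bandwidth; the open question is really the latency of crossing memory/power domains plus the inference itself.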



Will be interesting to see whether or not they've included any Infinity Cache,
I'm 85% sure it doesn't, because Van Gogh's bandwidth-per-GPU-TFLOP is practically the same as the consoles.

If we assume around 20GB/s per 4-core CCX, the PS5's CPU will take 40GB/s, leaving 408GB/s for 10.23 TFLOPs on the GPU. That leaves ~40 GB/s per GPU TFLOP on the PS5.
A similar calculation puts the Series S with 46 GB/s per GPU TFLOP and the Series X with 42.8 GB/s.

In the case of Steam Deck's Van Gogh, if the single 4-core CCX takes 20GB/s out of 88GB/s, then that leaves 68 GB/s for 1.6 TFLOPs of GPU, so the result is a similar 42.5 GB/s per TFLOP.

In the end, it seems RDNA2's sweet spot for bandwidth-per-TFLOP stands at 40-45 GB/s, and we can see it on all the consoles and the Steam Deck alike.
If Van Gogh had only 44GB/s at its disposal, then I guess the inclusion of Infinity Cache would be more plausible.
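The arithmetic above can be sketched in a few lines — the ~20 GB/s of bandwidth per 4-core CCX is the post's own assumption, and the total-bandwidth figures (448, 560, 88 GB/s) are the published specs:

```python
# Bandwidth left over per GPU TFLOP after the CPU's assumed share.
def gbps_per_tflop(total_bw_gbps, ccx_count, gpu_tflops, ccx_bw_gbps=20):
    return (total_bw_gbps - ccx_count * ccx_bw_gbps) / gpu_tflops

print(round(gbps_per_tflop(448, 2, 10.23), 1))  # PS5        -> 39.9
print(round(gbps_per_tflop(560, 2, 12.15), 1))  # Series X   -> 42.8
print(round(gbps_per_tflop(88, 1, 1.6), 1))     # Steam Deck -> 42.5
```

All three land in the same 40-45 GB/s-per-TFLOP band the post identifies.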
 
Would it be possible to use the dual-C5 + dual-Q6 in Van Gogh to help offload the GPU in a ML-based upscaler?
Not sure about Cadence, but I don't think they are much good for that.
We are using Snapdragon inbuilt aDSP, mDSP and sDSP and you need a different SDK (Hexagon Tool chain) to build the modules to offload processing from apps processor.
They don't really do flops, more like MIPS (Millions of instructions per sec).
You do things like for example Android compressed offload decoding, offload Image filtering etc.
You can make a topology of modules which pass the data samples through the pipeline to apply all sorts of signal processing, filtering etc.
These modules are basically things like Band pass filters, gain, transform etc
Not really sure they are any good at ML, even if you can make some topology to do matrix or tensor operations.
 
Not sure about Cadence, but I don't think they are much good for that.
We are using Snapdragon inbuilt aDSP, mDSP and sDSP and you need a different SDK (Hexagon Tool chain) to build the modules to offload processing from apps processor.
They don't really do flops, more like MIPS (Millions of instructions per sec).
You do things like for example Android compressed offload decoding, offload Image filtering etc.
You can make a topology of modules which pass the data samples through the pipeline to apply all sorts of signal processing, filtering etc
Not really sure they are any good at ML, even if you can make some topology to do matrix or tensor operations.

According to Cadence, at least the C5 is indeed oriented at ML.
Here is what Cadence says about the DSPs:

The Cadence® Tensilica® Vision digital signal processor (DSP) family is designed for demanding imaging, computer vision, and neural network (NN) applications in the mobile, automotive, surveillance, gaming, drone, and wearable markets. The Vision P5 DSP and the Vision P6 DSP are our two imaging- and computer vision-specific products that establish a new standard in high-performance, low-energy digital signal processing. With the addition of the Vision C5 DSP, we now have a member designed specifically for NN processing.
The Q6 seems to be the same as the P6 but optimized for higher clock speeds.


Each Q6 does 256 MACs per cycle, and each C5 does 1024 MACs per cycle, all on 8bit matrix operations. With two of each, we get 2560 MACs per cycle.
Assuming the same 1.6GHz clocks as the GPU*, the total combined throughput of the 4 DSPs is 4.096 TMACs, or 8.192 TOPs.

If it's not to enhance some future FSR / ML-upscaler (like the one we saw in AMD's patents), then I have no idea why AMD or Microsoft would want this on a low-power SoC for a Surface device. It sounds like too much throughput compared to Intel's Gaussian Neural Accelerator in Ice/Tiger Lake, and at the same time too little compared to the latest automotive solutions out there.

* assuming 1.6GHz on these DSPs should be on the conservative side, since Cadence says they clock up to 1.5GHz on 16FF and Van Gogh is on N7.
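Spelling out that throughput estimate — the per-core MACs-per-cycle figures are from the Cadence material quoted above, and the 1.6 GHz clock is the post's assumption (same as the GPU):

```python
# Peak INT8 throughput of Van Gogh's assumed dual-C5 + dual-Q6 DSP complex.
macs_per_cycle = 2 * 1024 + 2 * 256   # dual C5 + dual Q6 = 2560 MACs/cycle
clock_ghz = 1.6                       # assumed, matching the GPU clock

tmacs = macs_per_cycle * clock_ghz / 1000   # 4.096 TMACs
tops = 2 * tmacs                            # 1 MAC = 2 ops -> 8.192 TOPs
print(tmacs, tops)
```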
 
According to Cadence, at least the C5 is indeed oriented at ML.
Here is what Cadence says about the DSPs:
I see, seems they want to stick NN everywhere.
OK, for us using Snapdragon we don't have NN in the Hexagon SDK, you can do matrix operations for sure, but most of these modules are very tailored for signal processing.
On top of that, there is a memory copy from the SoC side (the HAL in Android, through the Graph Service Library) to the DSP, because they are in different memory and power planes.

If it's not to enhance some future FSR / ML-upscaler (like the one we saw on AMD's patents) then I have no idea why AMD or Microsoft would want this on a low-power SoC for a Surface device
A DSP is very well suited to a low-power SoC. In Android you can do compressed offload for audio playback and put the SoC cluster to sleep, same for media. The DSP will take a fraction of the power needed to do the same thing on the SoC.
A DSP can do in a few cycles what would take a CPU many cycles, for FFTs and such.
 
A DSP is very well suited to a low-power SoC. In Android you can do compressed offload for audio playback and put the SoC cluster to sleep, same for media. The DSP will take a fraction of the power needed to do the same thing on the SoC.
A DSP can do in a few cycles what would take a CPU many cycles, for FFTs and such.

I know why DSPs would be used to save power in mobile applications. The DSPs in question should be for computer vision and machine learning (in AMD's leaked roadmaps, Van Gogh and Rembrandt appear with a "CMVL" block). Unless Microsoft was planning to release a Surface device with the capability of taking many AI-enhanced pictures and videos, the only other use case for power saving I see here would be Augmented Reality. Perhaps Microsoft wanted to release a Surface companion to HoloLens, like a pocketable or belt-clipped PC that locally feeds a consumer version of HoloLens.

Regardless, I wonder if those dual C5s that are made for NN matrix operations wouldn't be over-engineered for either use case.
 
I know why DSPs would be used to save power in mobile applications. The DSPs in question should be for computer vision and machine learning (in AMD's leaked roadmaps, Van Gogh and Rembrandt appear with a "CMVL" block). Unless Microsoft was planning to release a Surface device with the capability of taking many AI-enhanced pictures and videos, the only other use case for power saving I see here would be Augmented Reality. Perhaps Microsoft wanted to release a Surface companion to HoloLens, like a pocketable or belt-clipped PC that locally feeds a consumer version of HoloLens.

Regardless, I wonder if those dual C5s that are made for NN matrix operations wouldn't be over-engineered for either use case.

Microsoft has a really big Surface Pro change in the works. I just don't know if it will be 8 or 9 at this point. AMD in the Surface Laptop has also been really successful, and we might see it come to the Pro and also the Book in the nearish future.

The chip shortages are real; fulfillment on models can be 2-3 months out depending on which one.
 
I know why DSPs would be used to save power in mobile applications. The DSPs in question should be for computer vision and machine learning (in AMD's leaked roadmaps, Van Gogh and Rembrandt appear with a "CMVL" block). Unless Microsoft was planning to release a Surface device with the capability of taking many AI-enhanced pictures and videos, the only other use case for power saving I see here would be Augmented Reality. Perhaps Microsoft wanted to release a Surface companion to HoloLens, like a pocketable or belt-clipped PC that locally feeds a consumer version of HoloLens.

Regardless, I wonder if those dual C5s that are made for NN matrix operations wouldn't be over-engineered for either use case.
Most Snapdragons (and Exynos, and others too I believe) have something called an AOP (Always On Processor, a block which never shuts down) which can wake up CPU clusters in deep sleep when events come from the DSP. The DSP never goes into LPM as long as there are active sessions, even if the main CPU clusters are in deep sleep.
You can use the DSP to trigger a wake event on detection of a face, for example. Very useful for Windows Hello or something similar to run without the main processor working.

On Android, the SoundTrigger HAL works the same way: there is hotword detection in the DSP without ever engaging the SoC. For Windows Hello to work without ever waking the main CPU, they need such a mechanism. Or for Cortana, for example.
In my opinion though, using the DSP for face recognition would be too much work; better to leave that to an NN block, which can be programmed by somebody without a master's degree in signal processing, and use the DSP only for image processing.
 
Microsoft has a really big Surface Pro change in the works. I just don't know if it will be 8 or 9 at this point. AMD in the Surface Laptop has also been really successful, and we might see it come to the Pro and also the Book in the nearish future.

<2cents>
I think if MS is happy with the partnership with AMD for XSX|S, they might be looking at another custom SoC.
VGH has HSP which is probably the first SoC integrating MS IP.
I bet the VGH-based device got canned when it became clear it would fall way short of the likes of the M1.
A semi-custom device with basically all the blocks found in a modern Snapdragon or Exynos, complete with TZ, image/sound DSPs, sensor fusion, an NN block and a few more, is what they would have ordered, hence the collab with Cadence. Using N5 and the latest Zen 4, for example.
There is no way the likes of Lenovo, using run-of-the-mill CPUs, can put out something in the PC ecosystem that gets anywhere close to what Apple can offer.
They might feel the need to take the matter in their own hands.
</2cents>
 
Steam Deck has better RAM specs than initially thought, and that's awesome.

https://www.gamesradar.com/updated-steam-deck-specs-have-even-better-ram-than-we-thought/
Locuza offered a more pointed comparison in a follow-up tweet. According to their calculations, this memory setup will afford the Steam Deck GPU more GB/s per teraflop (an increasingly common metric for raw computational power) than what the PS5 and Xbox Series X can deliver. It's not necessarily a tremendous margin, and the new-gen consoles are unquestionably more powerful overall – you can see how the Steam Deck compares to Nintendo Switch, PS5, and Xbox Series X in our breakdown – but simply put, the Steam Deck's memory is punching above its weight.

It is actually quad-channel 32-bit LPDDR5, which makes a huge difference to overall system performance on Ryzen-based computers.

 
Battery life?

But power users will surely sacrifice parts of the convenience and ease of use to customize this PC to their liking.
Actually I'd say that, battery-life-wise, Windows 11 might be a much better choice.

Not 'cos Linux has a problem with that, but because the same game on Linux usually runs worse than on Windows.

Whether you use a locked or unlocked framerate, Linux consumes the same or more power for fewer fps.

If you lock the framerate at 60fps, then the advantage for Windows should be more noticeable, 'cos with an unlocked framerate both would use as much power as the system can deliver.

Bero Tech channel has a lot of videos on Linux vs Windows performance comparisons. You can judge for yourself:

https://www.youtube.com/user/beronori/videos




 