What to expect from Adreno 225 & Krait

Discussion in 'Mobile Graphics Architectures and IP' started by french toast, Jan 5, 2012.

  1. Arun

    Arun Unknown.
    Moderator Legend Veteran

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
    AFAICT, 225/305/320 have 2/1/4 TMUs respectively. So with only twice as many TMUs and possibly not twice as much effective memory bandwidth, ~2x performance seems like a good guess to me given that Adreno 2xx was already very strong in the ALU department. That doesn't mean it cannot be a better/more balanced/whatever architecture for other reasons but it seems to me that Qualcomm's performance estimates make sense.
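    For a rough sense of how that estimate falls out, here is a back-of-envelope sketch; the TMU counts come from the post above, but the bandwidth ratio is a pure assumption, not a Qualcomm figure.

    ```c
    /* Back-of-envelope sketch of the TMU/bandwidth argument above.
     * TMU counts are from the post; the bandwidth ratio is assumed. */
    #include <stdio.h>

    int main(void) {
        double tmu_ratio = 4.0 / 2.0;   /* Adreno 320 vs 225: 4 vs 2 TMUs        */
        double bw_ratio  = 1.7;         /* assumed effective memory BW increase  */

        /* Texture-heavy workloads scale with whichever resource runs out first. */
        double estimate = tmu_ratio < bw_ratio ? tmu_ratio : bw_ratio;

        printf("TMU scaling:        %.1fx\n", tmu_ratio);
        printf("Bandwidth scaling:  %.1fx\n", bw_ratio);
        printf("Rough upper bound: ~%.1fx\n", estimate);
        /* Lands a bit under 2x, consistent with the "~2x" guess above. */
        return 0;
    }
    ```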
     
  2. french toast

    Veteran

    Joined:
    Jan 5, 2012
    Messages:
    1,667
    Likes Received:
    9
    Location:
    Leicestershire - England
    Well, I suppose I was comparing the 225-320 projections to ARM's jump from the Mali-400 MP4 to the T604, which was '5' times, and then the T658, which they say is 2-4 times faster than that with the same number of unified shaders as the Adreno 220!? :???:

    According to Anand the 220 has a lot of untapped performance under the hood, so even with the same TMUs and the same number of shaders, the efficiency increase a new architecture should bring would deliver a 2x improvement without any new hardware.. just my take, and that's without bringing IMG Tech's Series 5 - Rogue into it.

    Nevertheless, I agree Qualcomm usually delivers a solid 2x when they say they will.
     
  3. metafor

    Regular

    Joined:
    May 26, 2010
    Messages:
    463
    Likes Received:
    0
    The Adreno design hasn't changed much since the first Snapdragon. Additional ALUs have been added and frequencies have increased, but the base shader architecture and (relatively bad) drivers have remained. This makes performance increases relatively predictable compared to others.

    The 300 series will be the first time a new shader architecture is used, so we'll see how well that fares. A surprisingly big limitation thus far has been the CPU's ability to bin scenes. That can be taken care of either by brute force (a faster CPU) or by having the driver filter out cases where someone issues 50k more draw calls than they should.
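    As an illustration of that driver-side filtering idea, here is a toy sketch (not anything from Qualcomm's actual driver; the structures and names are made up): consecutive draws that share state and are contiguous in the vertex buffer can be collapsed before the binner ever walks them.

    ```c
    /* Toy sketch: collapsing a pathological draw-call stream before binning.
     * Not real driver code; Draw and its fields are invented for illustration. */
    #include <stdio.h>

    typedef struct { int state_id; int first_vertex; int count; } Draw;

    /* Merge consecutive draws that share render state and are contiguous
     * in the vertex buffer, so the binner walks far fewer commands. */
    static int coalesce(const Draw *in, int n, Draw *out) {
        int m = 0;
        for (int i = 0; i < n; i++) {
            if (m > 0 &&
                out[m - 1].state_id == in[i].state_id &&
                out[m - 1].first_vertex + out[m - 1].count == in[i].first_vertex) {
                out[m - 1].count += in[i].count;   /* extend the previous draw */
            } else {
                out[m++] = in[i];                  /* keep as a new draw       */
            }
        }
        return m;
    }

    int main(void) {
        Draw in[4] = { {1, 0, 3}, {1, 3, 3}, {1, 6, 3}, {2, 9, 3} };
        Draw out[4];
        int m = coalesce(in, 4, out);
        printf("4 draw calls collapsed to %d\n", m);  /* prints 2 */
        return 0;
    }
    ```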
     
  4. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    Why not just add some binning hw?
     
  5. metafor

    Regular

    Joined:
    May 26, 2010
    Messages:
    463
    Likes Received:
    0
    Idk, blame Canadians?
     
  6. argor

    Newcomer

    Joined:
    Nov 25, 2008
    Messages:
    96
    Likes Received:
    0
  7. Lazy8s

    Veteran

    Joined:
    Oct 3, 2002
    Messages:
    3,100
    Likes Received:
    19
    Between the jump in clock speed and the refined drivers, graphics on S4 platforms can definitely compete with Tegra 3.
     
  8. french toast

    Veteran

    Joined:
    Jan 5, 2012
    Messages:
    1,667
    Likes Received:
    9
    Location:
    Leicestershire - England
    I think the only reason NVIDIA is able to put out competitive graphics chips with such crummy innards is because they are so much better than everyone else with their drivers, especially in the mobile space where no one else has the experience of a GPU dogfight :smile:

    I would love to see NVIDIA write drivers for Adreno... I think we would be surprised...
     
  9. ToTTenTranz

    Legend Veteran

    Joined:
    Jul 7, 2008
    Messages:
    12,144
    Likes Received:
    7,108
    There, I fixed that for you.
     
  10. Wishmaster

    Newcomer

    Joined:
    Nov 16, 2008
    Messages:
    238
    Likes Received:
    0
    Location:
    Warsaw, Poland
    hahaha, so true :smile:
     
  11. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,468
    Likes Received:
    187
    Location:
    Chania
    NVIDIA's marketing claimed two things when they started out with the initial Tegra:

    1. Tiling stinks for anything >DX7.
    2. USC is questionable for anything up to DX9.

    No, it wasn't worded exactly like that, but I don't recall the claims word for word either. Marketing wash and the usual funky excuses in the "what we don't have sucks" direction aside, I always try to keep a broader perspective just in case. Given that there may be some truth to both, and that the ULP GeForces in Tegras haven't come across as "weak" in terms of performance, I'd be happy to be corrected if there's no truth whatsoever behind the above. If there is some truth to it, then of course it is software related, but not in exactly the same sense as implied so far.

    One good indication pointing in that direction would be if USC-based tilers of the current generation suddenly fare quite a bit better against Tegra GPUs under OGL_ES3.0.

    Have a look here: http://www.codeplay.com/company/partners.html

    What would a company like Qualcomm need third party graphics compilers for if things aren't as complicated as I suspect them to be?

    Definitely; but I'd still expect 8 Vec4 USC ALUs @ 400MHz to deliver quite a bit more than 2 Vec4 PS + 1 Vec4 VS ALUs @ 520MHz.
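    A quick peak-ALU comparison of those two configurations, assuming one MAD (2 flops) per vec4 lane per clock; the per-lane throughput is an assumption, not a stated spec.

    ```c
    /* Rough peak-ALU comparison of the two configurations in the post.
     * Assumes 1 MAD (2 flops) per vec4 lane per clock, which is an assumption. */
    #include <stdio.h>

    int main(void) {
        double usc   = 8 * 4 * 2 * 400e6;       /* 8 Vec4 USC ALUs @ 400MHz      */
        double split = (2 + 1) * 4 * 2 * 520e6; /* 2 PS + 1 VS Vec4 ALUs @ 520MHz */
        printf("USC:   %.1f GFLOPS\n", usc / 1e9);
        printf("Split: %.1f GFLOPS\n", split / 1e9);
        printf("Ratio: %.2fx\n", usc / split);   /* roughly 2x on paper */
        return 0;
    }
    ```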
     
  12. french toast

    Veteran

    Joined:
    Jan 5, 2012
    Messages:
    1,667
    Likes Received:
    9
    Location:
    Leicestershire - England
    Ha:smile:
     
  13. darkblu

    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    2,642
    Likes Received:
    22
    I'd rather see the original statements you got those impressions from : )

    But I'll still comment on 2: the only truth it may have would be entirely power-savings related (e.g. limiting fragment cores to only mid/lowp, etc). Performance-wise, it's generally wrong: I'd always take a GPU that has N+M USCs over one that has N vertex and M fragment cores.
     
  14. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,468
    Likes Received:
    187
    Location:
    Chania
    http://forum.beyond3d.com/showpost.php?p=1267164&postcount=2

    My memory isn't as weak as I think it is most of the time.

    I won't disagree one bit; however, what I'm asking here is whether (apart from the above) <DX10-equivalent environments (and specifically OGL_ES2.0) could be complicating things for USCs. It doesn't make sense from where I stand, but it doesn't hurt to ask.

    Nothing directly comparable of course, but one thing that made me raise an eyebrow was computerbase's recent article on how older and newer GPUs of different generations compare nowadays: http://www.computerbase.de/artikel/...rten-evolution/3/#abschnitt_leistung_mit_aaaf

    They've used a collection of games released from 2005 up to recently, and while I recall the G80 having a sizeable lead over the G71 with AA/AF, it never seemed as big as this review shows. Almost a 6x difference with AA/AF for just one generation, right at the turn between DX9 and DX10, doesn't sound like a coincidence.
     
  15. Exophase

    Veteran

    Joined:
    Mar 25, 2010
    Messages:
    2,406
    Likes Received:
    430
    Location:
    Cleveland, OH
    I don't get NVIDIA's argument that increased per-vertex state adds to binning overhead. Binning should only have to touch coordinate data, so no increase there... and on a tiler that's a good incentive to keep the coordinate data from being interleaved with everything else.

    There probably is a much lower fragment-to-vertex ratio in newer games, I'll give them that. But I wonder what the tiling scheme they used for evaluation was like, vs IMG's.
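    A small sketch of that layout point: if positions sit in their own stream, the binning pass only has to read the position array and never touches the remaining attributes. The struct sizes and vertex count below are purely illustrative.

    ```c
    /* Illustration of de-interleaving positions for a binning pass.
     * Sizes and vertex count are illustrative, not from any real GPU. */
    #include <stdio.h>

    /* Interleaved layout: binning would drag all 36 bytes/vertex through memory. */
    typedef struct {
        float pos[4];       /* 16 bytes */
        float normal[3];    /* 12 bytes */
        float uv[2];        /*  8 bytes */
    } InterleavedVertex;

    /* De-interleaved: the binner reads only the position stream. */
    typedef struct {
        float (*positions)[4];
        float (*normals)[3];
        float (*uvs)[2];
    } VertexStreams;

    int main(void) {
        int n = 100000;  /* vertices in a frame, illustrative */
        printf("Binning reads, interleaved:   %zu bytes\n",
               n * sizeof(InterleavedVertex));
        printf("Binning reads, position-only: %zu bytes\n",
               n * sizeof(float[4]));
        return 0;
    }
    ```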
     
  16. JohnH

    Regular

    Joined:
    Mar 18, 2002
    Messages:
    595
    Likes Received:
    18
    Location:
    UK
    I'm guessing that, as a tiler can revisit multiple states from tile to tile, they incorrectly assumed this would significantly impact bandwidth, when the reality is that it remains a tiny proportion of overall BW.

    The vertex BW "issue" is also typically hugely overstated by IMR guys. Although vertex BW has increased, pixel-related BW has increased by an order of magnitude more over the same period, which tends to sway things even more in favour of a TBDR.
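    A crude numeric illustration of that ratio argument; every figure below is made up for the sake of the example, but the direction matches the point: vertex traffic grows while pixel traffic grows far more.

    ```c
    /* Illustrative only: made-up per-frame workloads showing vertex BW growing
     * while pixel-related BW grows by roughly an order of magnitude more. */
    #include <stdio.h>

    int main(void) {
        /* "Old" workload: modest vertex counts, small 16-bit framebuffer. */
        double v_old = 100e3 * 32;          /* verts * bytes per vertex          */
        double p_old = 640.0 * 480 * 2 * 2; /* pixels * bytes * overdraw         */

        /* "New" workload: 10x the vertices, but much fatter pixel traffic
           (higher resolution, 32-bit targets, more overdraw, MSAA samples). */
        double v_new = 1e6 * 32;
        double p_new = 1920.0 * 1080 * 4 * 3 * 4;

        printf("Vertex BW grew %.0fx per frame\n", v_new / v_old);
        printf("Pixel  BW grew %.0fx per frame\n", p_new / p_old);
        return 0;
    }
    ```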
     
  17. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    I guess the binning scheme they evaluated stored the vertex attributes in the parameter buffer as well, not just position.
     
  18. mboeller

    Regular

    Joined:
    Feb 7, 2002
    Messages:
    923
    Likes Received:
    3
    Location:
    Germany
    And we are still patiently waiting for the promised article about the bandwidth advantages of a TBDR. :)

    edit: ARRGGG..stupid missing "i" :oops:
     
    #38 mboeller, Feb 2, 2012
    Last edited by a moderator: Feb 2, 2012
  19. darkblu

    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    2,642
    Likes Received:
    22
    Apologies, but I just had to ask as my memory plays the occasional trick on me.

    IMO USCs handle gles2 just fine (keep in mind gles2 is just a very streamlined gl). The level of differentiation that USCs provide over split architectures is rather orthogonal to what most modern APIs expect from the hw. Perhaps the one API requirement most relevant to USCs is that the order of draw calls (and related draw state changes) should be effectively preserved on the output as that order arrives from the client on the input (which apparently could be a stronger limitation for scene capturers than for USCs). But drivers and hw are usually free to do whatever they like in the span between the client's draw/state emits and fragments reaching the framebuffer (which is why scene capturers exist in the first place).

    Now, USCs, by virtue of being more flexible workload schedulers, might face the dilemma of 'Will this thread I can schedule right now be a problem WRT the framebuffer's consistency with draw emit order?' more often than split architectures. But I've yet to see a combination of driver and hw that manages to break there, as these are things that are usually taken care of with high priority.

    I think the above could be mainly attributed to the advancements of FSAA and AF implementations during that timespan.
     
  20. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,468
    Likes Received:
    187
    Location:
    Chania
    8th November 2006 (G80 launch):

    http://www.computerbase.de/artikel/...orce-8800-gtx/28/#abschnitt_performancerating

    2560*1600
    W/o AA/AF
    8800GTX = 7900GTX+93%
    With AA/AF
    8800GTX = 7900GTX+137%

    Quite a difference from today's ~3x at 1xAA/AF and ~6x at 4xAA/16xAF.

    But anyway thanks for the detailed explanation above.
     