Another NV3x series and multiple precisions thread

I disagree. The only time FP32 all the time makes sense is if your pixel shaders require FP32 precision, and it is rare that they do given their length. Perhaps if people start running really long pixel shaders with FP textures as the default case it will make more sense. Other SIMD architectures have preserved the ability to do multiprecision even as they matured. If you know you only need an 8-bit or 16-bit add, then you can do 2x or 4x the work in the same space.
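The "2x or 4x the work in the same space" point can be sketched in software as SWAR (SIMD-within-a-register). This is purely illustrative; `packed_add16` is a hypothetical helper, not anything from an actual driver or shader compiler:

```python
# SWAR sketch: two independent 16-bit adds performed with one 32-bit
# integer operation. Masking the top bit of each lane keeps a carry in
# the low lane from spilling into the high lane; the XOR restores each
# lane's top bit without propagating a carry across the lane boundary.
def packed_add16(a, b):
    mask = 0x7FFF7FFF
    low = (a & mask) + (b & mask)      # lane-wise add, top bits held out
    high = (a ^ b) & 0x80008000        # carryless sum of the top bits
    return (low ^ high) & 0xFFFFFFFF

a = (100 << 16) | 200                  # lanes: 100 and 200
b = (5 << 16) | 7                      # lanes: 5 and 7
r = packed_add16(a, b)
print(r >> 16, r & 0xFFFF)             # 105 207
```

The same trick scales to four 8-bit lanes with mask 0x7F7F7F7F, which is the "4x the work" case.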

You should optimize for the common case, and the common case today appears to be short and medium shaders that don't require a lot of precision, but require a lot of performance.

(p.s. Yes, there are pathological cases where even a very short shader would need high precision. )

Even you recognize the case for integers. You could, in fact, reuse FP registers for counters, but of course, you would agree that it's a waste, especially if it ties up an FP unit. Well, ditto for simple shaders.

NVidia's problem is one of poor implementation, not really a poor idea. I argued months and months ago that writing a compiler or driver optimizer to handle these things wasn't simple, and that's one of the main reasons for Cg (too bad their compiler implementation is a hack, with crappy optimization).

FP32 all the way really just makes sense from an economic/development point of view: simplifies the silicon and simplifies the driver.
 
Humus said:
We will need a integer type too though (not talking fixed point) for loops and stuff.
Well, unless there's an increment instruction (for fast incrementing of values), integer types are not really necessary for loops. Floating point values will work just fine as well. Of course, if you're talking about one or more specialized registers meant to be used exclusively as counters. . . that's a different case entirely.
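The claim that floating point values work fine for loops is easy to check: an FP32 counter increments exactly by 1.0 until the count exceeds 2**24. This sketch emulates FP32 rounding with `struct` (Python's own floats are double precision), so it is illustrative only:

```python
import struct

def fp32(x):
    # Round a Python float (FP64) through IEEE single precision.
    return struct.unpack('f', struct.pack('f', x))[0]

c = fp32(0.0)
for _ in range(1000):
    c = fp32(c + 1.0)           # a loop counter held in FP32
print(c)                        # 1000.0 -- exact, no drift

big = fp32(float(2 ** 24))
print(fp32(big + 1.0) == big)   # True: the increment is lost above 2**24
```

So for any realistic shader loop count, an FP register really is a perfectly serviceable counter; the question is only whether tying up an FP unit for it is a waste.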

Here's a thought: do you think the GeforceFX's floating point capabilities would be so limited as they are if NVidia decided not to include support for FX12?
 
That's the million dollar question. Why int12? I think to even begin to justify its inclusion it would have to have been patently obvious (at least to Nvidia) that a significant majority of operations in any given pixel shader would be run using the lesser precision. It doesn't seem to get you to the quality level most people seem to be expecting at this point, though. That could be because the R300 really, really blind-sided Nvidia, to the point where ATI raised the bar higher than they anticipated the expected normal quality or speed would be. Only the engineers know the true reason, because it doesn't seem like a cost related decision.
 
Ostsol said:
Here's a thought: do you think the GeforceFX's floating point capabilities would be so limited as they are if NVidia decided not to include support for FX12?
At the same time, do you think the FX would have the same peak shader processing power if not for the support of FX12?
 
Chalnoth said:
Ostsol said:
Here's a thought: do you think the GeforceFX's floating point capabilities would be so limited as they are if NVidia decided not to include support for FX12?
At the same time, do you think the FX would have the same peak shader processing power if not for the support of FX12?
Well, I'm not implying FX12 is what is holding FP performance back, but I think that if NVidia had decided not to support FX12 they would have "spent" those transistors in a way that would improve FP performance. That's somewhat of a simplification, but it makes sense to suppose that without FX12 NVidia would try to ensure that their card could do everything in floating point precision shaders (PS2.0) approximately as fast as it could do the same thing in an integer based shader version (PS1.x, of course).

EDIT: I'm still a little confused, though. Obviously, I know nothing about how pixel shaders are implemented in hardware. Is it that PS versions are really nothing more than standards and all a GPU really supports is the highest version, with everything else being emulated? Thus, the GeforceFX would translate PS1.x code into PS2.0 code and use integer precision. The R3xx would do the same, but stay with floating point precision. However, that would mean that even PS1.x on the R3xx would retain the high dynamic range of PS2.0, rather than values being clamped to -1/1 or -8/8 (PS1.4). . .
 
DemoCoder said:
FP32 all the way really just makes sense from an economic/development point of view: simplifies the silicon and simplifies the driver.

I think those two reasons are good enough to go along that route.

The additional cost of supporting multiple precisions is quite high for floating point. For fixed point math it's very cheap to implement multiple precisions, not so for floating point. To support fp16 along with fp32 you basically need to add a full fp16 unit and may reuse parts of the fp32 unit for the other fp16 operation.
 
Ostsol said:
Here's a thought: do you think the GeforceFX's floating point capabilities would be so limited as they are if NVidia decided not to include support for FX12?

Maybe not. FX12 is kinda superfluous. Overall I think the GFFX architecture is a little odd, tries to do too much, keeps too much legacy stuff around. I think it should have dropped FX12, emulated the register combiners with fragment programs instead of smacking that on top of everything, and dropped fixed function T&L. Having two precisions is a reasonable inclusion, but I think it would be better at this time to just go with one.
 
Humus said:
Maybe not. FX12 is kinda superfluous. Overall I think the GFFX architecture is a little odd, tries to do too much, keeps too much legacy stuff around. I think it should have dropped FX12, emulated the register combiners with fragment programs instead of smacking that on top of everything, and dropped fixed function T&L. Having two precisions is a reasonable inclusion, but I think it would be better at this time to just go with one.
Well, now that the NV35 architecture apparently shows very little performance drop when changing all FX12 ops to FP16 ops, it is conceivable that those FX12 units were updated to FP16 units. Unless one is dealing with very long shaders where recursive errors are problematic (such as the Mandelbrot shader), this is probably a very good architecture. Peak shader performance will still be higher than an 8 PS op per clock architecture while keeping full dynamic range.
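The "recursive error" point is easy to demonstrate. The sketch below emulates FP16 rounding with Python's `struct` half-precision format and iterates a chaotic recurrence as a stand-in for a long dependent shader (it is not the actual Mandelbrot shader):

```python
import struct

def fp16(x):
    # Round a Python float (FP64) through IEEE half precision.
    return struct.unpack('e', struct.pack('e', x))[0]

print(fp16(0.1))                     # 0.0999755859375 -- already inexact
print(fp16(2048.0 + 1.0) == 2048.0)  # True: whole-number steps vanish at 2048

# Iterating a dependent chain compounds the per-step rounding error.
x = 0.1                              # FP64 reference orbit
h = fp16(0.1)                        # the same computation at FP16
for _ in range(20):
    x = 3.9 * x * (1.0 - x)          # logistic map, chaotic at r = 3.9
    h = fp16(3.9 * h * (1.0 - h))
print(abs(h - x))                    # the two orbits have visibly separated
```

Short shaders never accumulate enough of these per-step errors to matter, which is the sense in which the FX12-to-FP16 upgrade covers most realistic cases.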
 
Humus said:
Maybe not. FX12 is kinda superfluous. Overall I think the GFFX architecture is a little odd, tries to do too much, keeps too much legacy stuff around. I think it should have dropped FX12, emulated the register combiners with fragment programs instead of smacking that on top of everything, and dropped fixed function T&L. Having two precisions is a reasonable inclusion, but I think it would be better at this time to just go with one.
You make it sound rather hard to believe the NV30 is really an entirely new architecture. . .
 
You know what someone said to me recently that I found a little surprising: the texture address processors in the NV25 were already float. Given that the NV30 used its floating point processors for texture address generation, I'm beginning to wonder just how new it is as well...
 
Chalnoth, two things:

  • This question:

    Chalnoth said:
    Ostsol said:
    Here's a thought: do you think the GeforceFX's floating point capabilities would be so limited as they are if NVidia decided not to include support for FX12?
    At the same time, do you think the FX would have the same peak shader processing power if not for the support of FX12?

    is based on the NV30's FX12 decision not being a mistake, but necessary for performance.

    And it directly contradicts this assertion:

    Chalnoth said:
    Well, now that the NV35 architecture apparently shows very little performance drop when changing all FX12 ops to FP16 ops, it is conceivable that those FX12 units were updated to FP16 units.
    ...

    It seems simple: fp16 and fp32 is a good decision; fx12 and fp16 and fp32 was not. To me, it seems illogical to simultaneously propose that "FX12 was necessary and not wasteful" and "the NV35 is able to improve a similar design significantly", and it isn't even consistent with what nVidia themselves have recognized.

    I'm not sure at all why a persistence in defending FX12, with greater transistor count, higher clock speed, and many shader limitations being necessary to show advantage versus the R300, made much sense before, and I don't understand at all how you propose it makes any sense now, when a viable alternative in the same family is apparently ready to be delivered within a similar transistor budget: the NV35. The NV35 simply seems to remove the question of whether the NV30 was a wasteful design, unless there are hidden drawbacks (which doesn't seem too likely).
  • Another comment:

    Chalnoth said:
    Well, now that the NV35 architecture apparently shows very little performance drop when changing all FX12 ops to FP16 ops, it is conceivable that those FX12 units were updated to FP16 units.

    Why do you still persist in concentrating on the peak performance of the NV3x (NV35 in this case), ignoring the limitations affecting its ability to reach its peak, ignoring the peak performance of the R3xx (which, btw, is 16 ops if you want to ignore limitations), and then concluding that "shader performance will still be higher than an 8 PS per clock architecture"?

    How common is vec3/scalar non dependent op occurrence?
    How often does it need to access textures?
    How important is granularity of optimization opportunity?
    What role does parallelism play in accomplishing the workload?
    How significant a role do instruction execution performance differences play in performance for a particular shader?

    All these questions are very important and directly relevant for comparison, and they are questions you consistently ignore when you state "12 versus 8" in what seems to me to be a useless fashion... you're quoting maximum versus minimum, and surely you must realize that such a comparison is at least slightly biased?!
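For what it's worth, the gap between peak and achieved throughput that these questions probe can be shown with a toy issue model. The scheduling rule here is purely illustrative and hypothetical, not the actual NV3x or R3xx behavior:

```python
# Toy model: a pipe that can co-issue one vec3 op with one adjacent,
# independent scalar op per clock, versus a pipe that issues one op per
# clock regardless. The op stream below is an arbitrary example.
def clocks(ops, co_issue):
    """ops is a list of 'vec3'/'scalar'/'vec4' tags, issued in order."""
    if not co_issue:
        return len(ops)
    n = i = 0
    while i < len(ops):
        # pair a vec3 with an immediately following scalar (or vice versa)
        if i + 1 < len(ops) and {ops[i], ops[i + 1]} == {'vec3', 'scalar'}:
            i += 2
        else:
            i += 1
        n += 1
    return n

shader = ['vec3', 'scalar', 'vec4', 'vec3', 'scalar', 'vec3']
print(clocks(shader, co_issue=False))  # 6
print(clocks(shader, co_issue=True))   # 4
```

The point being that quoted per-clock peaks only translate into achieved throughput to the extent the instruction mix cooperates, which is exactly what the questions above are asking.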

This is before we get into discussing things like:

  • The NV30 introduces an additional dependency on using FX12 to reach the peak... why is FX12 good at all? Surely you can at least recognize that the NV35's ability to do better within a similar transistor budget is a much better design, so why defend FX12 on a similar design and transistor count?
  • The impact on performance of exceeding register usage restrictions... perhaps likely to have at least some minor impact on long shaders?

...

For some other (not self-contradicting) statements:

DemoCoder,

Why do we need integer datatypes for looping with integer processing being a subset of floating point processing? I'm assuming the answer is efficiency of hardware utilization for indexing such registers. If I have that correct, why are we looking at integer processing as the only way to efficiently address this looping? How about the idea of being able to use a separate scalar operation instead of requiring a full 4 component unit to be tied up for the same loop incrementing operation? Isn't that even more efficient than a separate 4 component integer unit that can only do the same op for all components? This example is regarding existing designs, R3xx versus NV30-NV34.

Further, if this is indeed efficient, what sense does it make to add an extra integer processing unit for looping, when you can just have the same unit capable of being used for other scalar opportunities, including floating point, by simply allowing it to handle multiple data types on input and output instead of being restricted? This example is regarding the actual R3xx, versus a hypothetical R3xx with integer only units or a hypothetical NV35-alike with the ability to split off separate scalar/vec3 ops (assuming the architecture doesn't make that too difficult).

Is it just a matter of not considering that the R3xx limitation is only in processing, not input and output, and that its fp processing is only "wasted" for one clock cycle and wastes, at worst, fp24 vec3 for that pipe and clock? I.e., viewing them as "FP only" when they could also be viewed as FP/INT/VEC3/SCALAR "flexible".

Isn't the waste over the entire scene what is significant? Having units that are "int only" and "vec4 only" seems to me to be more likely to be wasteful, and focusing only on the "waste" of "FP only" seems to be ignoring that AFAICS.

...

My view on the NV3x is that it is a "bust" for PS 2.0 in comparison to the R3xx, pure and simple. I do see the possibility of good advantage with data creation operations like SINCOS for aiding in reaching peak throughput, with the NV35 specifically, for long shaders and trying to minimize texture fetches (in contrast to fixed point dependency, which to me seems more likely to depend on a greater proportion of texture usage for effects...the specular normalization for Doom 3 being an example of this, I think), but that's the only advantage and depends on ignoring significant disadvantages as well. Maybe if the NV35 has vec3/scalar optimization opportunity, hidden thus far for some reason, it could excel.

What I think is that the R3xx is a, relatively, poor design approach to build upon for the full PS 3.0/VS 3.0 spec, and that the NV3x approach is a better one. But the NV3x doesn't do the full PS 3.0/VS 3.0 spec, and is a, relatively, poor design for what it does do. I think that nVidia got caught flat-footed in taking their own time in transitioning to the goal of PS/VS 3.0, and their adherence to incremental improvements with performance gained through process implementation, versus ATI's commitment to an, apparently, more extensive re-invention of design, with process dependency being somewhat more secondary, bit nVidia in the rear.
I think NV40 versus "R390" (I think likely to be the more proper name) could easily be PS 3.0/VS 3.0 versus speedy PS 2.0/VS 2.0, with a lot more nice things being able to be said about nVidia compared to ATI's part at that time.

But we're not at that time yet, and the comparative praise that I'm seeing either seems nonsensical to me, Chalnoth, or is something coherent that still seems flawed from what I can see, Demo.
 
My take is still that what really got the NV30/31/34 in trouble is that the PS 2.0 and ARB_fragment_program specs ended up far closer to what ATI had in mind than what Nvidia did. Wavey's point that both include hints which allow FP16 is well taken, but even this is (as I understand it) only on a per-shader basis, not per instruction a la NV_fragment_program.

From this perspective, the real problem with NV30/31/34's focus on FX12 is not that it is inherently a poor design but that it matches up poorly with the important low level pixel shader language specs. I don't have any idea when the decisions were made relative to each other, but it would not surprise me if by the time MS and the ARB had made their decisions, it was too late for Nvidia to replace the FX12 functionality of NV30/31/34 with FP16; instead, that move would have to wait for NV35.

demalion said:
What I think is that the R3xx is a, relatively, poor design approach to build upon for the full PS 3.0/VS 3.0 spec, and that the NV3x approach is a better one.

What makes you say that?
 
Why would R3xx's approach be detrimental to a VS/PS 3.0 implementation? The R3xx is flexible enough as it is and seems to be quite scalable.
 
An aside and thoroughly a matter of my own personal opinion about the NV3x design philosophy (as it appears to me) :

NVIDIA listened a lot (perhaps far too much insofar as you geeks here are concerned, but perhaps not enough to the hardcore gamer, who only cares about available games) to the 3dfx guys they took on when they devoured 3dfx.

:)
 
I didn't mean to say detrimental, I meant inferior to NV3x design elements as an approach to achieving that particular end. For the "less than 3.0" end, the R3xx seems to me to be decisively superior.

1) I view a unified shader model as an efficient goal for hardware utilization in and of itself, and my take is that the NV3x looks like more work has been done towards that end.

2) With conditional branching for both, it seems to me that work along this line is convergent with VS 3.0/PS 3.0 support.


The lack of fp32 is a drawback with regard to 1), where it is not established as one for PS 2.0 or ARB_fragment_program, and the R3xx pixel processing pipeline does not look like it has a focus on 2).


This is exactly the way in which the NV35 is "poor" for PS/VS 2.0...the virtue of an "fp32 checkbox" in the pixel shader without load balancing architecture has not been made clearly evident, especially not with register usage performance limitations (a hurdle for the NV40 as well, AFAICS).

For ATI, what I expect is, as per SOP, ATI will substantially reinvent the architecture to be more optimal for both VS/PS 3.0 and load balancing, and I think they may have been "bitten" in the delay of the "original" R400 when trying to achieve this. I do happen to think they can implement PS/VS 3.0 on a R3xx based design like the part for the end of the year, but I don't think fp32 support can be added in a simple fashion, so it will be somewhat disadvantaged, as well as likely being inefficient without drastic redesign (the only one who can answer conclusively is ATI, I think).

How much the "R390" is disadvantaged for not supporting both fp32 and PS/VS 3.0 (which I am presuming, with reason I feel, it is unlikely to do) would depend on what is done in the lifetime of the product and what the competition is able to deliver in features and performance... given the timeframe of the R500's expected delivery, this doesn't look to be much of a disadvantage with the rumored performance expectations, but that depends on NV40 execution.

IMO, I expect the NV3x design elements in the NV40 to either be more useful for the above expectations than a R3xx design, or what ATI calls a R3xx design to be more significantly redesigned than I expect. I do think the R3xx could be modified in this way, but I don't see how while at the same time achieving performance figures.

ATI is free to be amazing again, I guess, but with the R500 timeframe and what PS 2.0/VS 2.0 offers in terms of product inflection points (HL2, Doom 3) in that time period, I don't see why they'd try.
 
demalion-

Correct me if I'm wrong, but PS/VS 3.0 don't present a unified shader model, nor does PS 3.0 require FP32. If I understand you correctly, you're making a leap from the fact that VS 3.0 requires texture lookup functionality to the conclusion that a unified shader is the best implementation to achieve PS/VS 3.0. (Which would then require FP32 in the pixel shaders.)

I'm sure I'm missing a lot of the issues here, but this conclusion seems premature IMO. But I'm sure you're thinking of something that I'm not aware of.

2) With conditional branching for both, it seems to me that work along this line is convergent with VS 3.0/PS 3.0 support.

...

and the R3xx pixel processing pipeline does not look like it has a focus on 2)

What makes you say that?

but I don't think fp32 support can be added in a simple fashion

Here I feel informed enough to disagree, for certain values of "simple". That is, converting from FP24 to FP32 ought to be (AFAICT) straightforward: you add the extra bits to all your functional units, enlarge internal buses and registers to handle the extra bits, enlarge/rebalance your cache sizes, and relayout. Not so tough. The primary negatives are the higher transistor budget, the possibility of a more or less extensive relayout, and a decent amount of tweaking.
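A sketch of why the widening itself is mechanical, assuming R300's s1e7m16 FP24 layout (the DX9 minimum float format) with the conventional exponent bias of 63. The layout and bias are assumptions, and denormals/infinities/NaNs are ignored for brevity:

```python
import struct

def fp24_bits_to_float(bits):
    # Unpack an assumed s1e7m16 FP24 encoding and widen it to FP32
    # (s1e8m23): rebias the exponent (63 -> 127) and zero-pad the
    # mantissa. Every finite normal FP24 value maps exactly.
    sign = (bits >> 23) & 1
    exp = (bits >> 16) & 0x7F
    man = bits & 0xFFFF
    out = (sign << 31) | ((exp - 63 + 127) << 23) | (man << 7)
    return struct.unpack('<f', struct.pack('<I', out))[0]

# exponent field 63 (unbiased 0), mantissa 0x8000 (= .5) -> 1.5 exactly
print(fp24_bits_to_float((63 << 16) | 0x8000))  # 1.5
```

Since no value is lost in the trip, the datapath widening really is the easy part; the transistor, bus, and layout costs are the real bill, as noted above.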

That said, I doubt we'll see it for "Loci" (or whatever we're calling that chip these days), but more because I don't see a need for it than because it can't be done. And more to the point because I doubt Loci will support PS/VS 3.0.
 
Dave H said:
Correct me if I'm wrong, but PS/VS 3.0 don't present a unified shader model, nor does PS 3.0 require FP32. If I understand you correctly, you're making a leap from the fact that VS 3.0 requires texture lookup functionality to the conclusion that a unified shader is the best implementation to achieve PS/VS 3.0. (Which would then require FP32 in the pixel shaders.)
The only way that I can figure it is that since VS 3.0 allows for a texture lookup, there is a need for texture addressing to be performed in FP32. The R3xx already uses FP32 in the vertex shader, of course, but its texture addressing (as is used in the fragment shader) is only FP24. Of course, this assumes that the instruction will be derived from the same capabilities. Perhaps a texture addressing instruction for the vertex shader would have to be entirely different, which seems likely since the vertex and fragment pipelines are separate anyway. Of course, if ATI could achieve FP32 texture addressing in the vertex shader, why not go the extra step and provide FP32 support in the pixel pipeline?

Of course, there's the other side of this, which says that the address for the VS texture lookup could be converted from FP32 to FP24, thus losing some precision, but remaining more than enough for texture sampling, and meaning that the instruction could be based on already existing technology from the fragment pipeline (actually, it's not likely to be nearly so simple).
 
demalion said:
It seems simple: fp16 and fp32 is a good decision; fx12 and fp16 and fp32 was not. To me, it seems illogical to simultaneously propose that "FX12 was necessary and not wasteful" and "the NV35 is able to improve a similar design significantly", and it isn't even consistent with what nVidia themselves have recognized.
Well, going for higher-speed FP16 is obviously better, but that does not mean FX12 was necessarily bad, either. Still, Microsoft is making it very hard to get good performance and image quality from the NV31-34 cards through DirectX.

Since I really don't have much information on exactly what kinds of shaders and how many shaders used in real games will need what sorts of precision, I can't give an accurate picture as to whether FX12 was a good decision or not. FP16 is just clearly better (for performance... it has a larger transistor count, which may not have been possible in the NV30's timeframe, esp. given the other development problems). All that I do know is general information on what precision is needed where, and common sense from this knowledge tells me that it will be rare to need FP32 throughout most shaders.

Why do you still persist in concentrating on the peak performance of the NV3x (NV35 in this case), ignoring the limitations affecting its ability to reach its peak, ignoring the peak performance of the R3xx (which, btw, is 16 ops if you want to ignore limitations), and then concluding that "shader performance will still be higher than an 8 PS per clock architecture"?
The implication is that with enough optimization (hopefully available through a HLSL compiler, eventually if it's not there yet), performance close to that peak can be realized. The FX architecture is hard to write assembly for. Hopefully these compilers can help (DX9 HLSL and Cg now, GL2 HLSL later).

As for the coincidence of vec3/scalar and texture/32FP ops, these will prevent the architecture from reaching peak performance, but if other DX9-level games are anything like DOOM3, they won't be enough to drop the optimized shader performance of the FX architecture below R3xx levels.

All these questions are very important and directly relevant for comparison, and they are questions you consistently ignore when you state "12 versus 8" in what seems to me to be a useless fashion...
All I can state is what I know. Real, solid info on the PS2-level shader performance capabilities of the NV3x in real games just isn't yet available. All we have is conjecture.
 
Dave H said:
demalion-

Correct me if I'm wrong, but PS/VS 3.0 don't present a unified shader model, nor does PS 3.0 require FP32. If I understand you correctly, you're making a leap from the fact that VS 3.0 requires texture lookup functionality to the conclusion that a unified shader is the best implementation to achive PS/VS 3.0. (Which would then require FP32 in the pixel shaders.)

Hmmm...I tried to make it clear that I was proposing a unified shader model, executed with a load balancing architecture, as having the virtue of being efficient, period.
Then, I was trying to point out that achieving that efficiently is conducive to executing PS/VS 3.0 efficiently as well, since that exposes a high degree of convergence between the PS and VS, in ways that neither the R3xx nor the NV3x achieve.

I'm mentioning fp32 and being "closer" to conditional branching in the pixel shader as indicating to me that the NV3x design requires less rethinking to achieve both than the R3xx does.

An analogy (not proposed as fact, but as parallel to the way I'm seeing things)...you have two archers aiming at two successively farther targets ("2.0" and "3.0"), and as the observer, we are taking snapshots in time A,B, and C:

ATI is an archer that quickly took aim at the closer target, "2.0", fired, and hit the mark pretty well (R300). They then drew again, and split the arrow, for extra marksmanship points, at time A. Ambitious and speedy archer that they are, they then nocked another arrow intending to aim at the farther target, "3.0", but happened to fumble a bit in haste (R400) and are trying to recover and re-aim to actually punch through the first target (R390), getting a boatload of archery points at time B. They are also planning ahead to take a breather, mentally reset, and take more careful aim at the farther target to hit that well too, maybe with the same marksmanship bonuses, on or around time C.

nVidia is an archer initially thinking about the farther target, "3.0". Primarily concerned with warming up to aim for the farther target, they suddenly notice the other archer has scored well, and that they'll have to try and refocus on the near target. They do so, and manage a relatively poor hit on the near target at time A, when the other archer has gotten off their other arrow. They are still thinking of the farther target, and decide to try and use that focus to quickly nock and hit somewhere on the farther target at time B. However that shot turns out, they'll try to refine the next shot at time C.


In short, what I'm saying is that nVidia's design approach for NV3x looks like it needs less rethinking to be suitable for PS/VS 3.0 than the R3xx's design does. I.e., incremental improvement versus redesign.

I'm sure I'm missing a lot of the issues here, but this conclusion seems premature IMO. But I'm sure you're thinking of something that I'm not aware of.

I'm using this to say something about the modification to design approach required by ATI and nVidia... going from the design approaches I believe are displayed on current hardware, to trying to achieve what I'm thinking of as a good approach to PS/VS 3.0 implementation for a future architecture. I can't conclude about implementation; the arrows haven't landed for time "B" yet. I do think I have reason to conclude about the aiming involved for those shots, though, and my evaluation of where the shots at time A have landed already.

2) With conditional branching for both, it seems to me that work along this line is convergent with VS 3.0/PS 3.0 support.

...

and the R3xx pixel processing pipeline does not look like it has a focus on 2)

What makes you say that?

Comparing PS 2.0 extended flow control to base PS 2.0 and MRT: I evaluate the latter as able to usefully offer similar functionality to the first, but also consider the first closer to PS 3.0 implementation. VS 2.0 and higher flow control seems closer than both.

but I don't think fp32 support can be added in a simple fashion

Here I feel informed enough to disagree, for certain values of "simple". That is, converting from FP24 to FP32 ought to be (AFAICT) straightforward; you just add the extra bits to all your functional units, enlarge internal buses and registers to handle the extra bits, enlarge/rebalance your cache sizes, and relayout. Not so tough. The primary negatives are the higher transistor budget, the possibility of a more or less extensive relayout, and a decent amount of tweaking.

"Nothing's for free in 3D" applies to everyone, though. :D That's a lot of functional units; don't forget the amount of replication. In any case, "simple" there was referring to the task of trying to achieve a performance improvement at the same time, within a limited transistor budget. I do think a R3xx based design could manage fp32 in the pixel shader, and it could manage PS/VS 3.0, and it could manage a performance increase. However, I don't see how it can manage all of them on 0.13 by the end of the year.

ATI seems to indicate the last of that list as a priority for R390, and I'd consider it pretty amazing, even for a continuation of past execution achievements, to achieve anything else on the list at the same time. I think the NV40 is going to have the first two, and its hurdle is the last. The opportunity offered to execute using a load balancing architecture, which to me would help in that regard, seems to me to be one way to do that. Again, limited transistor budget.

That said, I doubt we'll see it for "Loci" (or whatever we're calling that chip these days), but more because I don't see a need for it than because it can't be done. And more to the point because I doubt Loci will support PS/VS 3.0.

Well, I think the need for it and whether money is spent on getting it done are closely related factors. I don't remember saying it couldn't be done based on a R3xx design, though, just that it seemed to require more changes to be "PS/VS 3.0" than the NV3x would. How each would perform with PS/VS 3.0 if it were done is a different question.
 
demalion said:
ATI is an archer that quickly took aim at the closer target, "2.0", fired, and hit the mark pretty well (R300). They then drew again, and split the arrow, for extra marksmanship points, at time A. Ambitious and speedy archer that they are, they then nocked another arrow intending to aim at the farther target, "3.0", but happened to fumble a bit in haste (R400) and are trying to recover and re-aim to actually punch through the first target (R390), getting a boatload of archery points at time B. They are also planning ahead to take a breather, mentally reset, and take more careful aim at the farther target to hit that well too, maybe with the same marksmanship bonuses, on or around time C.

nVidia is an archer initially thinking about the farther target, "3.0". Primarily concerned with warming up to aim for the farther target, they suddenly notice the other archer has scored well, and that they'll have to try and refocus on the near target. They do so, and manage a relatively poor hit on the near target at time A, when the other archer has gotten off their other arrow. They are still thinking of the farther target, and decide to try and use that focus to quickly nock and hit somewhere on the farther target at time B. However that shot turns out, they'll try to refine the next shot at time C.

I think Nvidia has been trying to shoot the Judge. ;)
 