If you only need 1GB of RAM to do a job and you end up using 2, that's not "maximising your cache", that's being lazy.
Oh obviously; my point is that if your working set is smaller than your physical memory, there should be *no* paging - that much is trivially true. So granting that the OS is going to take up a chunk of memory while you're using it, I'd like that memory to be resident whenever nothing else needs it. If the OS is now speculatively *not* pulling in stuff that I may well need (stuff that could be instantly paged out if it were never touched), and I have to wait for it to be paged in when I do need it, that's a step backwards.
And while your usage model may change, it's almost certainly remarkably predictable when you look at memory accesses at the page-table level. That's why boot-time optimization is so important, and there's no good reason that sort of thing can't be applied empirically/heuristically to other applications, which is exactly what the Vista pre-fetcher takes advantage of.
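If anyone wants to see what "page-table level" means in practice, here's a rough Win32 sketch - entirely my own illustration, nothing to do with the actual prefetcher internals - that watches the demand faults a prefetcher is trying to pay for ahead of time. First touch of freshly committed memory faults each page into the working set:

```c
/* Rough sketch: watch page faults as freshly committed memory is first
 * touched. Assumes MSVC; compile with something like: cl demo.c psapi.lib */
#include <windows.h>
#include <psapi.h>
#include <stdio.h>

int main(void)
{
    PROCESS_MEMORY_COUNTERS pmc = { sizeof(pmc) };
    SIZE_T size = 64 * 1024 * 1024;                 /* 64 MB */

    /* Commit address space; no physical pages are resident yet. */
    volatile BYTE *buf = VirtualAlloc(NULL, size, MEM_COMMIT | MEM_RESERVE,
                                      PAGE_READWRITE);
    if (!buf) return 1;

    GetProcessMemoryInfo(GetCurrentProcess(), &pmc, sizeof(pmc));
    printf("faults before touch: %lu\n", pmc.PageFaultCount);

    /* First touch demand-faults each 4 KB page into the working set --
     * exactly the cost a prefetcher tries to pay ahead of time. */
    for (SIZE_T i = 0; i < size; i += 4096)
        buf[i] = 1;

    GetProcessMemoryInfo(GetCurrentProcess(), &pmc, sizeof(pmc));
    printf("faults after touch:  %lu (working set %Iu MB)\n",
           pmc.PageFaultCount, pmc.WorkingSetSize >> 20);
    return 0;
}
```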
That aside, as far as desktop stuff goes, there just aren't performance issues at all on modern hardware. Any Core 2 with 2GB RAM (or even 4... it's <$50 now!) and a $10 graphics card shouldn't be seeing any discernible slowdown in typical user applications. I don't see how you'd be seeing any differences with your custom build unless you're running on fairly low-end hardware.
Also liking the anecdotal Win7 evidence.
Cool! And actually that's precisely the type of data that - while anecdotal - is a lot more useful than Task Manager numbers after boot.
If possible, the OS should also keep some memory unused in anticipation of a new process being launched or an existing, active process acquiring more resources (not being forced to page out to make room for new data improves UX).
Ah, but clean (not dirty) physical memory pages are "free" to page out - they need no write-back and can simply be repurposed - so excepting trade-offs with pre-fetching IO and so forth, it's always better to have "potentially something useful" in a page than "guaranteed nothing useful".
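As a rough demonstration (again just a sketch of mine, assuming MSVC and the usual psapi.lib link), you can trim your own working set and watch the pages come back when touched - the comment in the middle is the clean-vs-dirty distinction in action:

```c
/* Sketch: trim our own working set, then fault the pages back in.
 * Clean pages are simply repurposed on eviction; dirty ones must be
 * written to the pagefile first. */
#include <windows.h>
#include <psapi.h>
#include <stdio.h>

static SIZE_T ws_mb(void)
{
    PROCESS_MEMORY_COUNTERS pmc = { sizeof(pmc) };
    GetProcessMemoryInfo(GetCurrentProcess(), &pmc, sizeof(pmc));
    return pmc.WorkingSetSize >> 20;
}

int main(void)
{
    SIZE_T size = 128 * 1024 * 1024;
    volatile BYTE *buf = VirtualAlloc(NULL, size, MEM_COMMIT | MEM_RESERVE,
                                      PAGE_READWRITE);
    if (!buf) return 1;

    for (SIZE_T i = 0; i < size; i += 4096) buf[i] = 1;   /* dirty them */
    printf("working set after touch: %Iu MB\n", ws_mb());

    /* Ask the memory manager to evict everything it can. Dirty pages must
     * be written back first; clean ones cost nothing to let go of. */
    SetProcessWorkingSetSize(GetCurrentProcess(), (SIZE_T)-1, (SIZE_T)-1);
    printf("working set after trim:  %Iu MB\n", ws_mb());

    for (SIZE_T i = 0; i < size; i += 4096) (void)buf[i]; /* fault back */
    printf("working set after read:  %Iu MB\n", ws_mb());
    return 0;
}
```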
First: knowing how much memory background services are consuming is an indication (not 100% reliable but a reasonable one) of how much physical memory will be available to your foreground app.
I disagree with this. The OS needs to allocate lots of virtual address space for various things - and even uses a good chunk of that while it's compositing the desktop, for instance - but if a full-screen game is running, the OS rarely needs to touch any of that memory and it can safely be paged out. Even for a non-game/non-exclusive-mode application, the working set of the OS that actually needs to be touched is much smaller than when the user is actively interacting with OS-level services.
(Note that there was a very similar situation when the .NET-based control panels launched with the ATI drivers and people freaked out about the numbers in the task manager, while in reality all of that memory was efficiently paged out when required.)
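That freak-out is a nice example of reading the wrong counter. Here's a small sketch of mine showing how committed private bytes (what most task-manager-style memory columns show) and the actual resident working set can diverge arbitrarily:

```c
/* Sketch: "memory used" depends entirely on which counter you read.
 * Committing address space inflates private bytes without adding a
 * single resident page to the working set. */
#include <windows.h>
#include <psapi.h>
#include <stdio.h>

static void report(const char *when)
{
    PROCESS_MEMORY_COUNTERS_EX pmc = { sizeof(pmc) };
    GetProcessMemoryInfo(GetCurrentProcess(),
                         (PPROCESS_MEMORY_COUNTERS)&pmc, sizeof(pmc));
    printf("%-14s private %4Iu MB, working set %4Iu MB\n",
           when, pmc.PrivateUsage >> 20, pmc.WorkingSetSize >> 20);
}

int main(void)
{
    report("at start:");

    /* 256 MB committed but never touched: scary in a memory column,
     * yet it costs no physical pages until someone faults it in. */
    void *p = VirtualAlloc(NULL, 256 * 1024 * 1024,
                           MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    report(p ? "after commit:" : "alloc failed:");
    return 0;
}
```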
Second: knowing how working sets in Win7 compare to working sets in Vista can give you some idea of how the load of background tasks changed.
Maybe, but nothing says that in Win7 the processes that are presumably using less memory after boot aren't just going to go ahead and allocate/start using an arbitrarily large amount of memory when their services are called upon. For instance, one could theoretically "save" a chunk of memory by delay-loading most of the OS services, but that doesn't actually solve any problem and indeed slows things down in the long run.
Again, I'm not sure much can be gleaned from looking at a snapshot of virtual/physical memory resources after boot. As an application/service writer, I could make that snapshot look arbitrarily different with no relation to system responsiveness (either idle or under load).
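To be concrete: making a service look "lean" in an after-boot snapshot takes exactly one call. This is a hypothetical service skeleton of mine, not anything a real OS component does:

```c
/* Sketch: a background service that wants a tiny number in Task Manager
 * can just dump its own working set on startup. Nothing about system
 * responsiveness improves; the pages simply fault back in on first use. */
#include <windows.h>
#include <psapi.h>

int main(void)
{
    /* ... normal startup work: load modules, warm caches, etc. ... */

    /* Cosmetic trim: the after-boot snapshot now looks great, at the
     * price of extra faults later when real work arrives. */
    EmptyWorkingSet(GetCurrentProcess());

    Sleep(INFINITE);   /* pretend to be an idle background service */
    return 0;
}
```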
There are at least two more important things to consider: the amount of IO generated by background threads and whether or not there are some improvements in "idle state" detection.
I too would love to see info on that sort of thing in Windows 7, but I'd like to see the theory first. There are just too many variables to consider in a modern OS/virtual memory implementation for a few performance counters to be very meaningful IMHO.
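In the meantime, the background-IO half is at least easy to measure per process. A rough sketch (mine; assumes you have rights to open the target process) that samples the cumulative IO counters over a ten-second window:

```c
/* Sketch: sample a process's cumulative IO counters twice to see how
 * much IO it generates while nominally idle. Pass a PID on the command
 * line, or omit it to measure this process itself. */
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    DWORD pid = (argc > 1) ? (DWORD)atoi(argv[1]) : GetCurrentProcessId();
    HANDLE h = OpenProcess(PROCESS_QUERY_INFORMATION, FALSE, pid);
    if (!h) { fprintf(stderr, "OpenProcess failed\n"); return 1; }

    IO_COUNTERS a, b;
    GetProcessIoCounters(h, &a);
    Sleep(10 * 1000);                      /* sample a 10 second window */
    GetProcessIoCounters(h, &b);

    printf("pid %lu: %llu reads / %llu writes, %llu KB transferred in 10 s\n",
           pid,
           b.ReadOperationCount  - a.ReadOperationCount,
           b.WriteOperationCount - a.WriteOperationCount,
           ((b.ReadTransferCount - a.ReadTransferCount) +
            (b.WriteTransferCount - a.WriteTransferCount)) >> 10);

    CloseHandle(h);
    return 0;
}
```

Run it against a supposedly idle service before and after boot settles and you get exactly the kind of number that a memory snapshot can't give you. "Idle state" detection, on the other hand, really would need the theory first.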