Solid state drives?

randycat99 · Jan 31, 2009

Wow! This is some neat stuff coming out of this discussion!

It is also very encouraging that some have actually found ways to make the most of MLC SSDs through a careful choice of I/O options.

randycat99 · Jan 31, 2009

Another question I have (given all of the concern over write performance): does this JMicron controller issue emerge for any kind of write operation, or only when the writes fall into a particular pattern? Will the stall behavior be the same if I am writing a thousand 100 KB files, the same 100 KB file onto itself 1000x, or one 100 MB file? Is it the frequency of transactions over time that eventually overwhelms the garbage collection, or simply the size of each transaction that ends up consuming the currently "unused" cells?

I guess the answer is partly posed in the question itself, eh? If I am writing a bunch of files to an essentially empty drive, everything should run swimmingly, whereas overwriting existing files with updated ones causes a BIG problem? Is that the gist of this problem, or is there no real functional distinction between writing to empty space and overwriting?

If so, does this trouble scenario pretty much match any application (a web browser, for example) that relies heavily on temp files rather than a RAM cache? Along the same lines, could a database-oriented application be pure death for an SSD? Or maybe it cuts both ways: it might work fantastically for a database that strictly reads records to disseminate information to clients, but be spectacularly poor if called upon to update and add records in real time?

Silent_Buddha · Jan 31, 2009

BRiT said:
You mean Gig instead of Meg? Or are you talking at a per chip level so it's Meg? I'm a bit surprised that 8Gig out of 128Gig wouldn't be enough for garbage collection/clean up in typical scenarios.

Oops, yeah, I meant Gig, not Meg. /blush

Regards,
SB

Silent_Buddha · Jan 31, 2009

randycat99 said:
Another question I have (given all of the concern over write performance) is- does this JMicron controller issue emerge for just any kind of writing operation or just when the writing falls into a particular pattern? Will the stall behavior be the same if I am writing a thousand 100 KB files, the same 100 KB file onto itself 1000x, or one 100 MB file? Is it the frequency of transactions over time that eventually overwhelms the trash collection or simply the size of the transaction that ends up consuming the "unused" cells at the moment?

I guess the answer is somewhat posed in the question, itself, eh? If I am writing a bunch of files to essentially an empty drive, everything should run swimmingly, as opposed to if I am overwriting existing files with updated ones, this causes a BIG problem? Is that the gist of this problem, or is there no real functional distinction between empty writing and overwriting?

If you don't do a lot of writes, the issue doesn't come up as often. The drive will periodically do garbage collection. However, if you browse the web a lot, install programs (especially games), etc., you'll end up writing a lot.

In theory, if you left, say, 80 GB free on a 120 GB SSD and then only did light activity that wrote to the disk, you might not run into the stuttering much.

An SSD used as a program drive rather than the primary drive won't be affected as badly either, unless your games write to the drive frequently. Again, leaving a large portion of it free will reduce how often the drive is forced into garbage collection at a bad time.

The garbage collection itself isn't a horrible thing. The problem is when Windows is trying to write, the SSD goes into garbage collection, and Windows has to wait for it.

Regards,
SB

BRiT · Jan 31, 2009

The JMicron controller has no cache, so the problem seems likely to occur with any sort of write operation, which is why the OCZ drives using SLC have issues too. Controllers that have a cache can accept more writes while the flash is doing its clearing operations; those without one end up waiting until the clearing finishes. This seems to happen with both small and large file writes. The lack of cache and the clearing issues are magnified in MLC drives because they need to do more work per write than SLC does.

randycat99 · Jan 31, 2009

I guess I am curious whether my recent experiences with a USB thumb drive are at all indicative of how these MLC SSDs like to be treated. Using a thumb drive, I have never really encountered any stuttering when transferring files. It just flies whether reading or writing, and that's with relatively large chunks of data at a time.

Now if I tried transferring a similar amount of data made up of a mind-boggling number of very small files (1 KB, 10 KB, 100 KB), would it still be as fast?

And what if the thumb drive isn't even empty, and I am actually overwriting older versions of all those tiny files? ...certain controller-stall death?

...or is it that the thumb drive will perform quite well in any of those scenarios the first time around, but if I leave it connected and mounted long enough, operations will degrade considerably once some threshold of writes has gone over the wire? The drive size and wear-leveling behavior add extra trickiness, I'm sure. Maybe it's just a matter of how quickly/often you pass 128 MB worth of write data (just picking a number) that determines how often you hit a garbage collection stall? Maybe on a larger drive that "128 MB" number is larger, so you can write more before garbage collection kicks in, but the downside is that garbage collection takes commensurately longer when it does happen?

...or are the thumb drive and the SSD pretty far removed in nature altogether? ...or do the typically smaller sizes of thumb drives suggest that their behavior all along would have been closer to how SLC SSDs behave anyway?

Silent_Buddha · Jan 31, 2009

Here's the thing with flash drives and wear-leveling algorithms: the drive fills the entire flash before it goes back to rewrite cells that have already been written. Before it can write to a used cell, the cell must be emptied (garbage collection).

When you delete a file in Windows, the flash drive doesn't know the information is no longer valid. Deleting a file on a flash drive just marks it as deleted. Windows has no way of explicitly telling a flash drive that those cells are no longer needed and that it can go ahead and clear them.

So if you write to the entire drive before it has had a chance to do garbage collection, that's when you run into serious stalls. In other words, you are forcing the drive to do garbage collection because all flash cells have been written to and there are no free cells left. It has to start clearing cells (a very slow process) before it can write to them. This is where pathological continuous-write cases cause even the Intel MLC drive to go into long, stuttering pauses, as they overwhelm both the reserved (unformatted) flash cells and the memory cache. It takes a LOT of continuous writes to do this on the Intel drive, BTW, as Intel reserves far more cells for garbage collection than other manufacturers do.

If you don't write a lot, you won't run into the stalls as often, because the drive periodically does its garbage collection. I believe the drive tries to do this when the computer is idle.
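A toy Python model of the stall described above, under loudly simplified assumptions: every write overwrites existing data, so it consumes one pre-erased block and leaves a stale block behind; when no erased block is left, a stale block has to be erased on the spot (the stutter). The function name and the `idle_every`/`idle_erases` knobs standing in for idle-time garbage collection are hypothetical, not how any real controller is parameterized.

```python
def forced_erases(physical_blocks, live_blocks, writes, idle_every=0, idle_erases=0):
    """Toy model: count writes that must wait for a synchronous erase.

    Assumes every write overwrites existing data, so each write uses one
    pre-erased block and leaves the old copy behind as a stale block.
    idle_every/idle_erases are hypothetical knobs standing in for
    background garbage collection during idle time (0 disables it).
    """
    erased = physical_blocks - live_blocks  # spare blocks, all erased when new
    if erased < 1:
        raise ValueError("need at least one spare block")
    stale = 0
    stalls = 0
    for i in range(1, writes + 1):
        if erased == 0:
            # No pre-erased block left: erase a stale one right now (the stutter).
            stale -= 1
            erased += 1
            stalls += 1
        erased -= 1  # program the new data into an erased block
        stale += 1   # the previous copy of that data is now stale
        if idle_every and i % idle_every == 0:
            # Idle period: background GC erases some stale blocks ahead of time.
            n = min(stale, idle_erases)
            stale -= n
            erased += n
    return stalls


print(forced_erases(100, 80, 50))                                 # 30
print(forced_erases(100, 80, 50, idle_every=10, idle_erases=10))  # 0
```

With 20 spare blocks and no idle time, every write after the 20th stalls; the same load with regular idle-time cleanup never stalls, which is the point about light workloads.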

As for a USB flash drive, it's unlikely that you are writing to it continuously.

Regards,
SB

randycat99 · Jan 31, 2009

So I guess what is happening here is that the garbage collection operation is sort of like a "reformat" operation that happens on some portion of the drive, ahead of what you have used already. The cells are essentially read-only until a reformat operation has passed over them. The reformat operation is a relatively slow operation, so it is hoped that it is allowed to happen "invisibly" during an idle period, but if no such opportunity comes about, it will definitely be noticed?

That is where the traditional HDD demonstrates a significant advantage: it offers large capacity, and any given block can be overwritten with new data on the fly (albeit at typical HDD speeds)?

The HDD is more like a true random-access read and write system (across the entire capacity, at least), while the SSD is more like a random-access read and sequential/revolving write system plus a tracking reformat-ahead partition?

MfA · Feb 1, 2009

It's just an artefact of implementation ... nothing fundamental.

hoom · Feb 1, 2009

Need better solid state tech.
One that both reads & writes fast & has decent rewrite life.
Like MRAM...

pcchen · Feb 1, 2009

randycat99 said:
So I guess what is happening here is that the garbage collection operation is sort of like a "reformat" operation that happens on some portion of the drive, ahead of what you have used already. The cells are essentially read-only until a reformat operation has passed over them. The reformat operation is a relatively slow operation, so it is hoped that it is allowed to happen "invisibly" during an idle period, but if no such opportunity comes about, it will definitely be noticed?

That is where the traditional hdd demonstrates a significant advantage, in that it offers large capacity and any given memory block can be overwritten with new data on-the-fly (albeit, at typical hdd speeds)?

The hdd is more like a true random access read and write system (across the entire capacity, at least), while the ssd is more like a random access read and sequential/revolving write system plus a tracking reformat-ahead partition?

Rewrites in flash work much like they do on an HDD: they are block based. You can't just rewrite one byte; you have to erase a whole block and then write the new data. So if you just want to change one byte of a block, you have to read the whole block back, erase the block, then write the block back with that byte changed. HDDs work like this too.
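A minimal Python sketch of that read-modify-write cycle. The `FlashBlock` class and `update_byte` function are hypothetical names; the only real flash behavior modeled is that erased cells read as 1 bits (0xFF) and programming can only clear bits, which is why changing a byte forces an erase of the whole block.

```python
class FlashBlock:
    """Toy flash erase block. Erased flash reads as all 1 bits (0xFF)."""

    def __init__(self, size):
        self.data = bytearray(b"\xff" * size)
        self.erase_count = 0

    def erase(self):
        self.data[:] = b"\xff" * len(self.data)
        self.erase_count += 1

    def program(self, new_data):
        # Programming can only clear bits (1 -> 0); setting bits needs an erase.
        for i, b in enumerate(new_data):
            self.data[i] &= b


def update_byte(block, offset, value):
    """Change one byte the only way flash allows: read, erase, write back."""
    copy = bytearray(block.data)  # 1. read the whole block into RAM
    copy[offset] = value          # 2. modify one byte in RAM
    block.erase()                 # 3. erase the whole block
    block.program(copy)           # 4. write the whole block back


blk = FlashBlock(8)
blk.program(b"ABCDEFGH")
update_byte(blk, 2, ord("z"))
print(bytes(blk.data), blk.erase_count)  # b'ABzDEFGH' 1
```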

However, there is a big difference between flash and HDDs: flash can't sustain very many block erases. MLC flash can only sustain about 10k erases per block (SLC around 100k). If you use flash just like a normal HDD, that limit will be reached very quickly.
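Some rough Python arithmetic for "reached very quickly", assuming a single hot block (say, file-system metadata) is rewritten once per second and taking 10k erases per block as the MLC endurance figure. Both numbers, the hypothetical 250,000-block spread, and the function name are illustrative assumptions, not measurements.

```python
def hours_until_worn(endurance_erases, erases_per_second, blocks_sharing_load=1):
    """Hours until a block reaches its erase limit.

    The erase load is assumed to be spread evenly over `blocks_sharing_load`
    blocks; 1 means one block takes every erase (no wear leveling).
    """
    per_block_rate = erases_per_second / blocks_sharing_load
    return endurance_erases / per_block_rate / 3600


print(round(hours_until_worn(10_000, 1), 2))  # one hot block: 2.78 hours
# The same load spread evenly over 250,000 blocks, converted to years:
print(round(hours_until_worn(10_000, 1, 250_000) / (24 * 365), 1))  # 79.3
```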

To handle this problem, the basic method is to spread the erases out (this is called "wear leveling"). For example, suppose a block needs to be updated: you don't erase that block directly, but instead find an unused block that has been erased less often and erase that one instead. All future accesses to the original block are then redirected to the new block. With this method, you need an internal mapping table to record which logical block maps to which physical block, and you need an algorithm to quickly find the least-erased currently unused block.

Furthermore, many SSDs have spare blocks (more physical blocks than logical blocks) so that if some blocks become unerasable, the user won't notice.
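A minimal Python sketch of the mapping-table approach described above, assuming at least one spare physical block and a plain linear scan to find the least-erased free block. `WearLevelingMap` and its methods are hypothetical; real controllers use faster structures than `min()` over a set.

```python
class WearLevelingMap:
    """Toy logical-to-physical block map with least-erased allocation.

    Assumes more physical blocks than logical ones (spare blocks), so a
    free block is always available when a logical block is updated.
    """

    def __init__(self, num_logical, num_physical):
        if num_physical <= num_logical:
            raise ValueError("need at least one spare physical block")
        self.erase_counts = [0] * num_physical
        self.l2p = list(range(num_logical))               # logical -> physical
        self.free = set(range(num_logical, num_physical))  # unused physical blocks

    def update(self, logical):
        """Redirect `logical` to the least-erased free block and return it."""
        old = self.l2p[logical]
        new = min(self.free, key=lambda p: self.erase_counts[p])
        self.free.remove(new)
        self.erase_counts[new] += 1  # erase the chosen block, then write to it
        self.l2p[logical] = new      # future accesses go to the new block
        self.free.add(old)           # the old copy is stale and can be reused
        return new
```

Hammering logical block 0 of a `WearLevelingMap(4, 6)` 100 times rotates the erases through physical block 0 and the two spares, so none of them ends up more than one erase ahead of the others, while blocks 1 to 3 are never touched.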

So the performance of an SSD is actually best when new. After some use, the performance degrades a bit and then stays there. Therefore, to benchmark an SSD more accurately, it should be tested used, not new.

GourdFreeMan · Feb 1, 2009

pcchen said:
Rewrites in flash works quite like in HDD: they are block based. You can't just rewrite one byte, you have to erase a whole block, and then write the new data. So basically if you just want to change one byte of a block, you have to read the whole block back, erase the block, then write the block back with that byte changed. HDD works like this too.

Almost all hard drives produced in the last 30 years use a 512-byte sector size (there have been proposals in recent years to move to a 4 KiB sector size).

Most modern file systems (e.g. NTFS and ext3) use a 4 KiB cluster size by default. Doing random writes with a conventional hard drive thus doesn't incur any extra write penalty beyond what the file system imposes.

Testing of early MLC SSDs such as the OCZ Core series suggested the first-generation JMicron 602 controller used an internal 128 KiB block size. This means writing a single 4 KiB cluster actually means writing 32 times as much data as necessary.
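The 32x figure is the ratio of the internal block size to the cluster size. A small Python helper makes the arithmetic explicit, assuming writes start on an internal block boundary and that every touched internal block is rewritten in full (a simplification; the function name is hypothetical).

```python
KIB = 1024


def write_amplification(host_write_bytes, internal_block_bytes):
    """Bytes the controller rewrites per byte the host asked to write.

    Assumes the host write starts on an internal block boundary and that
    every internal block it touches is rewritten in full.
    """
    blocks_touched = -(-host_write_bytes // internal_block_bytes)  # ceiling division
    return blocks_touched * internal_block_bytes / host_write_bytes


print(write_amplification(4 * KIB, 128 * KIB))     # 32.0: one 4 KiB cluster
print(write_amplification(1024 * KIB, 128 * KIB))  # 1.0: a large sequential write
```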

SSDs based on Intel and Samsung controllers don't have this problem, as they include more onboard RAM to permit a smaller block size. I haven't seen any testing done yet on the new JMicron 602b-based SSDs, so I cannot comment on whether newer drives have this issue.

BRiT said:
JMicron controller has no cache, so it seems to more likely occur with any sort of write operations, which is why the OCZ drives using SLC has issues too. The controllers that have cache can accept more writes while the flash unit may be doing the clearing operations. Those without cache end up waiting until the clearing operations finish. This seems to occur during small or large file writes. The no cache and clearing issues are magnified in MLC drives because they need to do more per write than SLC does.

AFAIK, no commercial SSD uses onboard DRAM as a write-back cache between the OS and the drive. Doing so would risk data loss if a large number of random writes were pending during a power loss. The RAM on the Intel controller is used to permit a smaller sector size and to implement an improved wear-leveling algorithm. Anand's article mentions this for the Intel SSDs, and I presume it is true for the Samsung controller as well.

pcchen said:
So, the performance of a SSD is actually best when new. After some uses, the performance will degrade a bit, then it stays there. Therefore, to more accurately benchmark a SSD, it should be done on a used one, not a new one.

This is very true. However, another issue is involved: the internal map of blocks with free space and of spares used in wear leveling can become fragmented, degrading the performance of sequential writes. AFAIK all existing SSDs are eventually affected by this issue. Here is a link showing the effect on even Intel's premium X25-E SSD.

---
beyond3d lies technical marketing

Whoops, my post (which I presume is still pending moderator approval) should read KiB not kib.

<mutter>Damn fool IEC kowtowing to the telecoms! I remember when a mb was a MB and a kilo depended upon who you were talking to...</mutter>


randycat99 · Feb 2, 2009

Seems like growing file fragmentation could further exacerbate this process of respawning data blocks anew with each subsequent change of data.

It also illustrates my suspicion about how dramatically different writing "a" 1 GB file of entirely new data to an SSD is from updating 1 GB worth of assorted existing 100/500/1000 KB files already on the SSD. If significant fragmentation is involved, it only increases the number of blocks involved in the management process (easily by a factor of 2, more likely even higher, 5x, 10x, 100x?) just to change a logical x amount of data.

That comes across as most ironic, given that file fragmentation is something most would expect to be the Achilles' heel of any physical/mechanical storage device. Yet it becomes an even stronger penalty to theoretical performance on an SSD, the epitome of no moving parts and random access at the speed of electrons and transistors.

BRiT · Feb 2, 2009

File fragmentation is not an issue on SSD since the random access times are so quick.

The only part of what you said that's correct is updating files on SSD has higher costs than writing a file once and never updating it.

randycat99 · Feb 2, 2009

File fragmentation could be a problem, but not for the same reasons it is a problem on a classic HDD. If fragmentation increases the number of blocks that need to be updated in order to update one changed file, then that puts more load on the whole write process. At that point it's not about access times, but rather about accelerating the onslaught of the read block / change contents / write to new block process (and the subsequent garbage collection).

It is agreed, though, that fragmentation would remain irrelevant for any read-only operation on an ssd.

BRiT · Feb 2, 2009

I see what you mean now. Initially I thought you were talking about fragmentation of a single file as it's laid out on the disk. However, what you meant was SSD block-level fragmentation. I really don't like that name, as I'm not sure 'fragmentation' applies; it's more like Block Splitting (or something else). If multiple files use the same block and one file needs to be updated, it could cause issues. However, if the controller were smart enough, it wouldn't move the pages that weren't modified, which may minimize block usage.

Also, given a page = 4K and a block = 64 to 128 pages (256K - 512K), one shouldn't use file system cluster sizes below the page size, if that's even possible.

Silent_Buddha · Feb 2, 2009

It's not the wear leveling that causes a slowness in writing data.

MLC flash cells take an extremely long time to erase compared to SLC flash cells. They also take slightly longer to read, but the read difference isn't nearly as bad as the write difference.

Compared to an HD overwriting data, an HD would be like a waterfall, while the flash cells would be like a glacier.

And that's one of the reasons garbage collection is done when there is very little activity. The stuttering comes about when the flash drive is forced to erase cells before it can write to them.

In other words a new drive will always be fast because all cells are empty. If you fill up the drive before garbage collection can erase cells, then you get massive pauses.

If you have a large enough memory cache, then small erases can be hidden. The cache will just hold the data until enough cells are erased then write to them. This is one of the main benefits of the Intel MLC drives.

However, if the writes are both large and prolonged, even the cache on the Intel drive will not be able to hold all the data and you will then start to experience pauses.
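A toy Python model of the buffering described above (GourdFreeMan questions earlier in the thread whether shipping SSDs actually use DRAM this way, so treat it as an illustration of the idea only). Each tick the host offers some pages, the cache takes what fits, then drains a fixed number to flash; the tick rates, cache size, and function name are all hypothetical.

```python
def host_wait_ticks(ticks, arrive_per_tick, drain_per_tick, cache_pages):
    """Toy model of a write cache in front of slow flash.

    Each tick the host offers `arrive_per_tick` pages; the cache accepts what
    fits, then drains up to `drain_per_tick` pages to flash. Returns how many
    ticks the host could not hand over all of its data (a visible pause).
    """
    queued = 0
    waits = 0
    for _ in range(ticks):
        accepted = min(arrive_per_tick, cache_pages - queued)
        if accepted < arrive_per_tick:
            waits += 1
        queued += accepted
        queued -= min(queued, drain_per_tick)
    return waits


print(host_wait_ticks(8, 8, 5, 30))    # 0: a short burst is absorbed
print(host_wait_ticks(100, 8, 5, 30))  # 92: sustained writes overflow the cache
```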

So that's why they also reserve part of the flash drive as "unformatted." For example, say a drive has 128 GB of physical flash memory but is only "formatted" to show 80 GB. If the firmware is written well, the drive will still write to all 128 GB, but only 80 GB is available for actual storage.

I'm sure many people are now going..."What?" It's simple and elegant. What happens is that instead of forcing garbage collection to kick in and erase used cells for new data, the drive instead just continues writing to (hopefully) empty cells. The controller will keep track of what is and isn't available to the OS.

In this way, if you need prolonged writes and deletes, it'll take far longer before the drive is forced into garbage collection, rather than getting to do it when activity is low.

Some may consider it wasted space, but if you need performance above all in a write intensive environment it's the only way to go. Some of the more expensive professional systems use up to and sometimes more than 50% of the Flash drives for this purpose.
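The over-provisioning example works out as follows in a quick Python calculation (the helper name is hypothetical). "How much is reserved" can be stated as a share of the physical flash or relative to the visible capacity, so both are returned.

```python
def spare_area(physical_gb, visible_gb):
    """Return (spare GB, spare / physical, spare / visible)."""
    spare = physical_gb - visible_gb
    return spare, spare / physical_gb, spare / visible_gb


print(spare_area(128, 80))  # (48, 0.375, 0.6)
print(spare_area(128, 64))  # (64, 0.5, 1.0): half the physical flash reserved
```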

Regards,
SB

pcchen · Feb 2, 2009

To my understanding, NAND MLC erases pretty quickly, but NOR MLC is dead slow (in the region of several hundred ms).

Of course, it's still somewhat related to wear leveling: if there's no need for wear leveling, one can always erase a whole page when writing something and load everything to another page. There would be no need for garbage collection.

Mintmaster · Feb 2, 2009

randycat99 said:
File fragmentation could be a problem, but not for the same reasons that it is a problem in the classic hdd. If the fragmentation increases the number of blocks that need to be updated in order to update one changed file, then that puts more load on the whole disk write process. It's not about disk access times, at that point, rather accelerating the onslaught of the read block-change contents-write to new block process (and subsequent garbage collection operation).

Okay, but I don't see how many small files are any different from one big file. All that matters is how much is being modified. 100 KB being changed in a fragmented 1 GB file is the same as one out of 10,000 100 KB files being changed.
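The point can be checked at the page level with a short Python sketch, assuming 4 KiB pages (BRiT's figure) and page-aligned extents; the offsets are made up. Both edits dirty the same 25 pages; splitting the edit into two extents only adds partial pages at the extent boundaries.

```python
PAGE = 4 * 1024  # assumed 4 KiB flash page
KB = 1024


def pages_touched(extents):
    """Return the set of page indices covered by (byte_offset, length) extents."""
    pages = set()
    for offset, length in extents:
        first = offset // PAGE
        last = (offset + length - 1) // PAGE
        pages.update(range(first, last + 1))
    return pages


# 100 KB changed inside one 1 GB file, at a made-up page-aligned offset.
big_file_edit = [(512 * 1024 * KB, 100 * KB)]
# One of 10,000 files of 100 KB rewritten in full, at a made-up aligned offset.
small_file_edit = [(37 * 100 * KB, 100 * KB)]
# The same 100 KB split into two 50 KB fragments.
fragmented_edit = [(0, 50 * KB), (1000 * PAGE, 50 * KB)]

print(len(pages_touched(big_file_edit)))    # 25
print(len(pages_touched(small_file_edit)))  # 25
print(len(pages_touched(fragmented_edit)))  # 26
```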

Mintmaster · Feb 2, 2009

pcchen said:
Of course, it still somewhat related to wear leveling: if there's no need for wear leveling, one can always erase a whole page when writing something and load everything to another page. There would be no need for garbage collection.

If you look at that Micron blog, you can see that you can't erase one page. You have to erase a block of 128 pages, i.e. about 500 KB (512 KB with 4 KB pages).

Your method would require reading everything in a block and re-writing it, so at 100MB/s any write would take a minimum of 10ms. It'll change a bit for different reading/writing/erasing throughputs, but this is in the vicinity of HDD times. Writing can be done one page at a time, though, so using garbage collection it improves by a factor of 128, and now you see the big performance benefits.
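The timing arithmetic, spelled out in Python under the same assumptions as the post: 4 KiB pages, 128-page (512 KiB) blocks, 100 MB/s taken as 10^8 bytes/s for both reading and writing, and erase time ignored. The function names are hypothetical.

```python
def block_rewrite_ms(block_bytes, bytes_per_s):
    """Read a whole block and write it back; erase time is ignored."""
    return 2 * block_bytes / bytes_per_s * 1000


def page_write_ms(page_bytes, bytes_per_s):
    """Program a single page; no read-back or erase needed."""
    return page_bytes / bytes_per_s * 1000


PAGE = 4 * 1024
BLOCK = 128 * PAGE  # 512 KiB
RATE = 100e6        # 100 MB/s, taken as 10**8 bytes per second

print(round(block_rewrite_ms(BLOCK, RATE), 2))  # 10.49
print(round(page_write_ms(PAGE, RATE), 3))      # 0.041
print(BLOCK // PAGE)                            # 128 pages per block
```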

Maybe ~10ms for writes is acceptable, and some would say the up-to-128x increase in wear is the showstopper for this type of scheme, but IMO it isn't. Even if wear weren't an issue, a product with 10ms writes is not going to compete well against one with 0.1ms writes (assuming stuttering from garbage collection is properly addressed in the latter).

Finding unused blocks is very easy. Even bubble sorting a 256,000 entry list after each write is a piece of cake at these time scales, because each block's entry is only going to move a few places after the wear counter is incremented.
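A minimal Python sketch of the local re-sort described above, assuming blocks are kept in a list ordered by ascending erase count plus a position index so a block can be found without scanning. `WearOrderedList` and its methods are hypothetical; after one increment, a single bubble pass moves the block past only the entries with a lower count.

```python
class WearOrderedList:
    """Blocks kept in ascending order of erase count."""

    def __init__(self, num_blocks):
        self.wear = [0] * num_blocks
        self.order = list(range(num_blocks))  # least-worn block first
        self.pos = list(range(num_blocks))    # pos[block] = index in order

    def least_worn(self):
        return self.order[0]

    def bump(self, block):
        """Increment a block's erase count and bubble it toward the end."""
        self.wear[block] += 1
        i = self.pos[block]
        while i + 1 < len(self.order) and self.wear[self.order[i + 1]] < self.wear[block]:
            nxt = self.order[i + 1]
            self.order[i], self.order[i + 1] = nxt, block
            self.pos[nxt], self.pos[block] = i, i + 1
            i += 1


wl = WearOrderedList(5)
for _ in range(12):
    wl.bump(wl.least_worn())  # always erase the least-worn block
print(sorted(wl.wear))        # [2, 2, 2, 3, 3]
```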
