Discussion:
Kingston SSD filesystem corruption
Jeppe Øland
2011-08-09 15:33:12 UTC
Permalink
Hi all,

About a year ago, I switched to running the full pfSense 2.0 (beta
something at the time) on a Kingston SS100S2/8G embedded SSD.

Since then, every 3 months or so I noticed (in connection with
installing a new release) that the filesystem was corrupted, and I
have had to format the drive some 3 times in that year!

I believe I read somewhere that some of the earlier beta/RC's had
filesystem corruption issues, so I guess that *could* be the issue.

I have also noticed that the system log occasionally has this line:
ad4: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=4892879
(The LBA is fixed ... it only changes if I reformat/reinstall).

Apparently I'm not the only one with problems using this drive (as
evidenced by the low feedback scores here; some reviewers are pfSense users):
http://www.newegg.com/Product/Product.aspx?Item=20-139-427
http://www.newegg.com/Product/Product.aspx?Item=20-139-428

Does anybody have more information about this issue?
Is there a problem with the motherboard?
Is the drive crap? (Kingston support isn't exactly helpful).
Are there known problems with AHCI or NCQ?

Any suggestions would be much appreciated.

Regards,
-Jeppe

---------------------------------------------------------------------
To unsubscribe, e-mail: support-unsubscribe-***@public.gmane.org
For additional commands, e-mail: support-help-***@public.gmane.org

Commercial support available - https://portal.pfsense.org
Jim Pingle
2011-08-09 15:47:51 UTC
Permalink
Post by Jeppe Øland
About a year ago, I switched to running the full pfSense 2.0 (beta
something at the time) on a Kingston SS100S2/8G embedded SSD.
Since then, every 3 months or so I noticed (in connection with
installing a new release) that the filesystem was corrupted, and I
have had to format the drive some 3 times in that year!
I believe I read somewhere that some of the earlier beta/RC's had
filesystem corruption issues, so I guess that *could* be the issue.
ad4: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=4892879
(The LBA is fixed ... it only changes if I reformat/reinstall).
If the LBA is fixed, that usually (but not always) points to an issue
with the disk itself. If it's random, then one might suspect the
controller/cables/memory/etc. Doing a full read/write test on the drive
might be helpful. I'm not sure about that model and how it reports
things, but the S.M.A.R.T. data might be useful. Though a good report
from SMART doesn't always mean everything is fine and dandy, if it shows
issues it's generally correct.
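
A quick way to check whether the LBA really is fixed is to tally the timeout lines from the system log. The following is only a minimal sketch: it assumes the log line looks exactly like the one quoted above, and the function names are made up for illustration.

```python
import re
from collections import Counter

# Matches the timeout line quoted in this thread. Other drivers or FreeBSD
# releases may word the message differently (assumption).
TIMEOUT_RE = re.compile(
    r"(?P<dev>\w+): TIMEOUT - (?P<op>\w+) retrying .*?LBA=(?P<lba>\d+)"
)

def tally_timeout_lbas(lines):
    """Count how often each (device, LBA) pair appears in TIMEOUT lines."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[(match.group("dev"), int(match.group("lba")))] += 1
    return counts

def classify(counts):
    """Rough heuristic: one repeated LBA versus several different ones."""
    if not counts:
        return "no timeouts"
    return "fixed LBA" if len(counts) == 1 else "varying LBAs"

if __name__ == "__main__":
    sample = [
        "ad4: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=4892879",
        "ad4: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=4892879",
    ]
    print(classify(tally_timeout_lbas(sample)))
```

A "fixed LBA" result would be consistent with a problem on the disk itself, while "varying LBAs" would make the controller, cables or memory more suspect. Either way it is a hint, not proof.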
Post by Jeppe Øland
Does anybody have more information about this issue.
Is there a problem with the motherboard?
Is the drive crap? (Kingston support isn't exactly helpful).
Are there known problems with AHCI or NCQ?
I'd lean more toward blaming the drive, given the symptoms. Unless you
are seeing a bunch of kernel panics/random sudden unclean reboots, you
shouldn't be getting filesystem corruption.

Jim

Tim Dickson
2011-08-09 16:33:15 UTC
Permalink
About a year ago, I switched to running the full pfSense 2.0 (beta something at the time) on a Kingston SS100S2/8G embedded SSD.
I installed the 30G version in 12 systems, all of which failed within 6 months. I moved to Intel 320s and/or WD Greens (depending on budget of the site) so we'll see how they hold up.
I also had the 64G version running Untangle systems which failed as well... in short I would not recommend the Kingston SSDs at all... it's been a major pain having to swap them all out of live systems.
Bao Ha
2011-08-09 16:36:15 UTC
Permalink
Post by Jeppe Øland
Post by Jeppe Øland
About a year ago, I switched to running the full pfSense 2.0 (beta
something at the time) on a Kingston SS100S2/8G embedded SSD.
I installed the 30G version in 12 systems, all of which failed within 6
months. I moved to Intel 320s and/or WD Greens (depending on budget of the
site) so we'll see how they hold up.
I also had the 64G version running Untangle systems which failed as well...
in short I would not recommend the Kingston SSDs at all... it's been a major
pain having to swap them all out of live systems.
SSD is just flash memory. You will need to mount the filesystem with sync
and noatime.
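
To see which filesystems are missing those options, something like the sketch below could be run against the contents of /etc/fstab. It assumes the standard whitespace-separated fstab layout (device, mount point, type, options, dump, pass). The sample entries and the function name are hypothetical.

```python
def missing_mount_options(fstab_text, wanted=("sync", "noatime"), fstype="ufs"):
    """Map each mount point of the given type to the wanted options it lacks."""
    report = {}
    for raw in fstab_text.splitlines():
        # Drop comments (simplification: '#' anywhere starts a comment).
        line = raw.split("#", 1)[0].strip()
        fields = line.split()
        if len(fields) < 4 or fields[2] != fstype:
            continue
        options = set(fields[3].split(","))
        lacking = [opt for opt in wanted if opt not in options]
        if lacking:
            report[fields[1]] = lacking
    return report

# Hypothetical fstab contents, for illustration only.
SAMPLE = """\
# Device      Mountpoint  FStype  Options  Dump  Pass
/dev/ad4s1a   /           ufs     rw       1     1
/dev/ad4s1b   none        swap    sw       0     0
"""

if __name__ == "__main__":
    print(missing_mount_options(SAMPLE))
```

The sketch only reports what is missing. Editing fstab and remounting is left to the admin.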
--
Best Regards.
Bao C. Ha
Hacom - Embedded Systems and Appliances
http://www.hacom.net
voice: (714) 564-9932
Adam Thompson
2011-08-09 17:26:07 UTC
Permalink
Even though it’s flash memory under the hood, SSDs are supposed to have wear-levelling algorithms baked into the firmware so that they function like a normal ATA HDD. I know of someone else who has deployed the Kingston SSDs into Windows XP machines (a *recommended* use case by Kingston), where there is no possibility of modifying mount options, and all of his Kingston SSDs have now failed, too. (For what it’s worth, 3M’s SSDs are also not recommended, for the same reason.)

It appears that lower-price SSD manufacturers don’t do much testing of their gear to validate expected service life. Either that, or their expectations of what ‘normal’ disk I/O patterns are don’t match reality.

So far, the Intel SSDs appear to be the most reliable in general-purpose use (i.e. HDD replacement), but you pay a substantial premium: perhaps in the SSD world, you really do get what you pay for?

Worth noting that there is a reasonably complete reference chart of which vendor uses which parts at http://pcper.com/ssd-decoder. Note that some newer Kingston models use Intel controllers, which should help – *IF* Kingston ever gets around to releasing firmware updates! (Or if the Intel f/w updates work on Kingston drives – not tested.)

The Sandforce controllers seem to generally have decent service life combined with decent performance.

FreeBSD 8.2 (apparently) fully supports TRIM for UFS filesystems, so using SSDs in long-life applications should become a more viable option if you’re on an 8.2 or newer kernel. (IIRC, pfSense 2.0 is based on 8.1, while 2.1 is to be based on 9.x.)

It looks like ad(4) supports BIO_DELETE in 8.1-Release (and therefore pfSense 2.0), but you have to use newfs(8) to make that happen ... not exactly suitable for daily use! That means that during installation of pfSense 2.0, your SSD should release all blocks, which will still help somewhat.



-Adam Thompson

athompso-gKoiEJA+***@public.gmane.org

(204) 291-7950 - direct

(204) 489-6515 - fax



Jeppe Øland
2011-08-09 18:19:24 UTC
Permalink
It appears that lower-price SSD manufacturers don’t do much testing of their
gear to validate expected service life.  Either that, or their expectations
of what ‘normal’ disk I/O patterns are doesn’t match reality.
It's amazing how unreliable many SSDs still are :-(

I had an OCZ Vertex 1 (Indilinx) in my home PC for 2 years ... every 3
months it would corrupt fatally (BIOS wouldn't even see it).
After 3 RMAs I got them to replace it with a Vertex 2 (Sandforce), and
that one is stable as a rock.
... Slightly slower than the Indilinx - but who cares about that when
it's at the expense of stability.

SMART on the Kingston is useless.
No errors registered at all.
Full surface write/read shows no problems either.

I guess I'll be best off putting something else in the box ... too
bad. I really liked the idea of a tiny SSD.

Regards,
-Jeppe

David Rees
2011-08-09 18:44:37 UTC
Permalink
Post by Jeppe Øland
It's amazing how unreliable many SSDs still are :-(
I had a OCZ Vertex 1 (Indilinx) in my home PC for 2 years ... every 3
months it would corrupt fatally (BIOS wouldn't even see it).
After 3 RMAs I got them to replace it with a Vertex 2 (Sandforce), and
that one is stable as a rock.
... Slightly slower than the Indilinx - but who cares about that when
it's at the expense of stability.
Interesting. I have a few 30-120 GB Vertex 1s around here. They've been
OK, and pretty stable, once OCZ got the firmware stabilized.

The Vertex 2 should be MUCH faster than the Vertex 1 - at least that's
what all the benchmarks say.

Have a Vertex 2 around here somewhere - it also has had a few minor
issues where it wasn't always detected at boot, but OK now that the
firmware has stabilized.

I have a 120GB Intel 320 in my laptop - been flawless so far - but the
Intel forums report that if it loses power unexpectedly it can
basically "brick" and you lose all your data. Intel is still working
on a firmware fix for this.

Seems that SSDs have traded one type of failure mode for another at
this point. I expect them to get all the bugs worked out eventually.
The performance and power usage of them is so great that I use them in
any new build where random IO performance is an issue.

-Dave

Jeppe Øland
2011-08-09 19:19:27 UTC
Permalink
Post by Jeppe Øland
I had a OCZ Vertex 1 (Indilinx) in my home PC for 2 years ... every 3
months it would corrupt fatally (BIOS wouldn't even see it).
After 3 RMAs I got them to replace it with a Vertex 2 (Sandforce), and
that one is stable as a rock.
... Slightly slower than the Indilinx - but who cares about that when
it's at the expense of stability.
Interesting.  Have a few 30-120 GB Vertex 1s around here.  Been OK
once OCZ got the firmware stabilized and pretty stable.
The thing with V1 is that they don't move data around on the flash cells.
In other words, if you fill the drive 90% with static data
(Windows/Applications), and then write like crazy ... the remaining
10% + the overprovisioned area will be wearing out very quickly.
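
To put rough numbers on that argument, here is a toy simulation. It is only a sketch under loud assumptions: every write erases exactly one block, the overprovisioned area is ignored, and "static wear levelling" simply means the controller always picks the least-worn block on the whole device. It does not model any real controller firmware, and all names are made up.

```python
import random

def simulate_wear(blocks=100, static_fraction=0.9, writes=10_000,
                  move_static_data=False, seed=0):
    """Return per-block erase counts for a toy flash device.

    Without static wear levelling only the non-static blocks take writes.
    With it, the least-worn block of the whole device is chosen each time
    (the cost of relocating static data is not charged in this model).
    """
    rng = random.Random(seed)
    erase_counts = [0] * blocks
    first_free = int(blocks * static_fraction)
    for _ in range(writes):
        if move_static_data:
            target = min(range(blocks), key=erase_counts.__getitem__)
        else:
            target = rng.randrange(first_free, blocks)
        erase_counts[target] += 1
    return erase_counts

if __name__ == "__main__":
    for levelled in (False, True):
        counts = simulate_wear(move_static_data=levelled)
        print(f"static data moved: {levelled}, "
              f"min erases: {min(counts)}, max erases: {max(counts)}")
```

In this model the same 10,000 writes land on 10 blocks when static data stays put, and on all 100 blocks when it is moved. The minimum erase count stays at zero in the first case and climbs in the second.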
The Vertex 2 should be MUCH faster than the Vertex 1 - at least that's
what all the benchmarks say.
V2 is faster with *some* data.
The controller employs data compression - partly to give you longer
life by having to write fewer physical bytes to the flash - and partly
to get speed.
The numbers quoted are for "average" data that compresses 2:1 or even 3:1.
Use the drive for incompressible data, and the speed is actually
slower than a V1.
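
The difference between compressible and incompressible data is easy to see with a general-purpose compressor. The sketch below uses Python's zlib purely as a stand-in. Which algorithm the Sandforce controller actually uses is not something this thread establishes, so the ratios are illustrative only.

```python
import os
import zlib

def compression_ratio(data: bytes) -> float:
    """Original size divided by zlib-compressed size (higher = more compressible)."""
    return len(data) / len(zlib.compress(data))

if __name__ == "__main__":
    # Highly repetitive data, standing in for "easy" benchmark payloads.
    text_like = b"pfSense system log entry: nothing unusual happened\n" * 2000
    # Random bytes, standing in for already-compressed or encrypted data.
    random_like = os.urandom(len(text_like))
    print(f"repetitive data: {compression_ratio(text_like):.1f}:1")
    print(f"random data:     {compression_ratio(random_like):.2f}:1")
```

Random bytes come out at roughly 1:1, so a controller that relies on compression has nothing to gain from them.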
Seems that SSDs have traded one type of failure mode for another at
this point.  I expect them to get all the bugs worked out eventually.
The performance and power usage of them is so great that I use them in
any new build where random IO performance is an issue.
I completely agree.
Just don't trust any important data to them .... either back up
religiously, or just use the SSD for the boot/applications drive, and
keep your hard-to-replace data on an HDD.
(And spend the money that a bigger SSD would have cost on lots and
lots of RAM instead).

Regards,
-Jeppe

David Rees
2011-08-09 19:45:16 UTC
Permalink
Post by Jeppe Øland
Post by Jeppe Øland
I had a OCZ Vertex 1 (Indilinx) in my home PC for 2 years ... every 3
months it would corrupt fatally (BIOS wouldn't even see it).
After 3 RMAs I got them to replace it with a Vertex 2 (Sandforce), and
that one is stable as a rock.
... Slightly slower than the Indilinx - but who cares about that when
it's at the expense of stability.
Interesting.  Have a few 30-120 GB Vertex 1s around here.  Been OK
once OCZ got the firmware stabilized and pretty stable.
The thing with V1 is that they don't move data around on the flash cells.
In other words, if you fill the drive 90% with static data
(Windows/Applications), and then write like crazy ... the remaining
10% + the overprovisioned area will be wearing out very quickly.
I can tell you that it definitely does move data around looking at the
smart data for drives I have. The minimum erase count climbs on all
drives I have even with plenty of static data.
Post by Jeppe Øland
The Vertex 2 should be MUCH faster than the Vertex 1 - at least that's
what all the benchmarks say.
V2 is faster with *some* data.
The controller employs data compression - partly to give you longer
life by having to write fewer physical bytes to the flash - and partly
to get speed.
The numbers quoted are for "average" data that compresses 2:1 or even 3:1.
Use the drive for incompressible data, and the speed is actually
slower than a V1.
OK, so I reviewed the benchmarks and the Vertex 2 is only slower when
writing incompressible (random) data sequentially. That doesn't really
matter for most use cases (especially pfSense), as its random IO
performance kills the Vertex 1 - with or without incompressible data.
Post by Jeppe Øland
Just don't trust any important data to them .... either back up
religiously, or just use the SSD for the boot/applications drive, and
keep your hard-to-replace data on an HDD.
(And spend the money that a bigger SSD would have cost on lots and
lots of RAM instead).
My luck with rotating drives isn't any better than with SSDs - those
need to be backed up as well. Regardless of the type of drive I'm
using, if the data and uptime are important, the drive needs to be
in a RAID array and it needs to be backed up to separate media
regularly.

-Dave

Eugen Leitl
2011-08-09 19:33:47 UTC
Permalink
Post by Jeppe Øland
I guess I'll be best off putting something else in the box ... too
bad. I really liked the idea of a tiny SSD.
I found all my small SSDs (with SLC, though) to be very reliable.
As for my Intel SSDs (several tens of them), none have failed so far
except two, which were from the same batch.

Intel 311 Series "Larson Creek" (20 GBytes, SLC) should be quite
reliable (or so I hope, I'm using one as slog/cache device for
a zfs storage appliance).
--
Eugen* Leitl http://leitl.org
______________________________________________________________
ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE
