Researchers publish first large-scale, in-field SSD reliability report

Discussion in 'hardware' started by ronjor, Jun 24, 2015.

  1. ronjor

    ronjor Global Moderator

    Joined:
    Jul 21, 2003
    Posts:
    57,719
    Location:
    Texas
  2. Rasheed187

    Rasheed187 Registered Member

    Joined:
    Jul 10, 2004
    Posts:
    8,010
    Location:
    The Netherlands
    If anyone is up for it, I would like to read a quick summary of the report.
     
  3. newbino

    newbino Registered Member

    Joined:
    Aug 13, 2007
    Posts:
    377
    There you are: an extract from the abstract of the paper, which you can find in full by following the link in the original article:

    This paper presents the first large-scale study of flash-based SSD reliability in the field. We analyze data collected across a majority of flash-based solid state drives at Facebook data centers over nearly four years and many millions of operational hours in order to understand failure properties and trends of flash-based SSDs.
    ...
    Based on our field analysis of how flash memory errors manifest when running modern workloads on modern SSDs, this paper is the first to make several major observations:
    (1) SSD failure rates do not increase monotonically with flash chip wear; instead they go through several distinct periods corresponding to how failures emerge and are subsequently detected,
    (2) the effects of read disturbance errors are not prevalent in the field,
    (3) sparse logical data layout across an SSD’s physical address space (e.g., non-contiguous data), as measured by the amount of metadata required to track logical address translations stored in an SSD-internal DRAM buffer, can greatly affect SSD failure rate,
    (4) higher temperatures lead to higher failure rates, but techniques that throttle SSD operation appear to greatly reduce the negative reliability impact of higher temperatures, and
    (5) data written by the operating system to flash-based SSDs does not always accurately indicate the amount of wear induced on flash cells due to optimizations in the SSD controller and buffering employed in the system software.
     
  4. Rasheed187

    Rasheed187 Registered Member

    Joined:
    Jul 10, 2004
    Posts:
    8,010
    Location:
    The Netherlands
    @ newbino

    Thanks. So the current state of SSD reliability is not THAT bad, if I'm correct.
     
  5. jwcca

    jwcca Registered Member

    Joined:
    Dec 6, 2003
    Posts:
    716
    Location:
    Toronto
    My first SSD failure, a SATA2 60GB drive purchased March 2011, occurred after only 55 days. It was the system drive, but I had a backup image and had bought two of the drives so I'd have a spare, so I restored the image and was OK. The failed drive was RMA'd without question. Those two drives have since been relegated to serving as a data drive and a backup drive for the data drive.

    My second SSD failure, a SATA3 120GB drive purchased March 2012, occurred today. Luckily it was only being used as a backup drive to store Reflect images, so I'm still OK: I just changed the target to a WD Passport and Reflect created the image, just a bit slower. This failure came 3 months after the 3-year warranty expired, so although I'll contact the manufacturer, an RMA is unlikely.

    Both failures were from the same manufacturer.

    I had a hint that a failure might occur when I ran a performance test and the drive performed well below its theoretical specs: it's a SATA3 drive but was performing like a SATA2.

    (I also back up to removable HDDs... just to be very safe)
     