Hard disk reliability study - 2005-2020

Discussion in 'hardware' started by Mrkvonic, Feb 19, 2020.

  1. Mrkvonic

    Mrkvonic Linux Systems Expert

    Joined:
    May 9, 2005
    Posts:
    10,223
    Yuki, how about you add a third dimension - you get five disks AND you replace faulty ones AND you refresh disks every, say, 10 years.

    Bill, yes, for big orgs money is a factor; for home users, the 10-dollar difference for a couple of disks of this or that model over 3-5 years really isn't.

    Mrk
     
  2. Bill_Bright

    Bill_Bright Registered Member

    Joined:
    Jun 29, 2007
    Posts:
    4,042
    Location:
    Nebraska, USA
    I agree for home users the money is not really a factor. But for home users it's not about the money. It's all about stress, blood pressure, receding hairlines, etc. If I knew I had to replace my drive in 4 years, 11 months and 30 days because if I didn't it would die the next day, that would really be helpful to my health! ;)
     
  3. reasonablePrivacy

    reasonablePrivacy Registered Member

    Joined:
    Oct 7, 2017
    Posts:
    2,010
    Location:
    Member state of European Union
    Wow, I didn't expect that. Great work!
     
  4. 142395

    142395 Guest

    Not even one day, unless some ultra-ESP prophet or alien tech comes to Earth. It's simply impossible to draw a deterministic equation for disk failure unless you can incorporate all the variables, internal and external, into it. OTOH drawing a probabilistic equation is not that hard if we had enough data - I'll be surprised if Amazon or Microsoft haven't utilized such data-driven decisions. The problem with a probabilistic model is that while it may predict what percentage of thousands of disks will fail at what times and under what conditions, it's silent about whether your disk will fail tomorrow or not. In my previous simulation, 49,895 users who replaced failed disks had no data loss, but that fact doesn't help the 6 users who lost data within the first year. All of them strictly followed the equations I put in.
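    To make that point concrete, here's a tiny Python sketch of my own (with a made-up monthly failure probability of 1/360, not the exact figures from the simulation): the expected number of first-year failures across a population is easy to compute, but each individual outcome is still a random draw.

```python
import random

# Hypothetical monthly failure probability for a single disk (~1/360,
# i.e. an expected lifetime on the order of 30 years) - my assumption,
# not the simulation's exact parameter.
p_fail = 1 / 360
months = 12
n_users = 50_000

# Population level: the expected number of users whose disk fails
# within the first year is easy to state...
p_year = 1 - (1 - p_fail) ** months
expected_failures = n_users * p_year

# ...but each individual user is still a sequence of Bernoulli draws,
# so the realized count merely fluctuates around the expectation.
random.seed(42)
actual_failures = sum(
    any(random.random() < p_fail for _ in range(months))
    for _ in range(n_users)
)

print(f"expected ~{expected_failures:.0f}, simulated {actual_failures}")
```

    The model can tell you roughly 3% of disks fail in year one; it can't tell you whether yours is among them.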

    That's obviously better even without a simulation, isn't it (*)? I simulated because it's not so obvious whether 'replace-on-break' with 2 disks is better than no-replace with 5 disks. In fact, while it's better on 40- or 20-year scales, it was not on a 1-year scale. Everyone knows more backups are better, but I don't think that makes keeping 10 simultaneous backups reasonable for every home user. Maybe what people are really interested in is how many backups are reasonable given the value of their data. I think it's actually hard to gauge the value of data in money - long ago I lost photos whose value can't be counted in bucks for me. But we still need to assign a measurable value to allow some calculation, so as a test I assumed these:

    1. I think 40y in the previous simulation is too long to be meaningful, so I reduced it to 20y. Instead, I cranked up the monthly failure probability for external disks to 0.0027778 (about 1/360, which amounts to 100% failure after 30y as an expectation), and separated out a main disk which will be shorter-lived thanks to heavier usage, with a 0.0041667 (about 1/240) monthly probability.

    2. Users replace any failed disk within a month, and any disk other than the initial main disk costs $50. OTOH if all data are lost it costs N * $10000, where N is a parameter ranging from 1 to 10. Users always keep (M - 1) external disks in addition to the main disk, where M ranges from 1 to 4. In this setting, more disks mean more failed disks and thus more cost, but less chance of losing all the data.

    Needless to say, these are not meant to replicate reality - there are a number of ways to make them more realistic; e.g. using a more realistic curve rather than a linear assumption (the best γ for the bathtub curve can be calculated if we know in what month these 5 disks were bought and failed), incorporating sales where users can buy many disks cheaply even before a failure, mixing in some bad disks which are significantly more prone to fail, assuming users won't always replace a failed disk promptly, etc., all of which can be trivially implemented, but that's not the point. Results are shown below, where each cell represents the expected cost in dollars over 20y under each condition, averaged over 100,000 users. As you see, if the cost of data loss amounts to only $10,000 for you, one external disk is the most reasonable, while under all the other conditions it's 2 external disks - this is because the 'replace when fail' strategy works so well. Personally I use 2 external disks and 2 cloud services for data, and a different 2 external disks for system backup. I guess that's enough for me.
    https://i.imgur.com/jAdvWBd.png

    (*) Or you might want to know if that can achieve a 0% failure rate - but you know, 0 failures are impossible thanks to the stochastic nature of these models, as long as I use a large enough sample size. Before we find a condition that achieves almost 0 failures, we'll run into the limit of significant digits.
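    For anyone who wants to play with it, the setup in 1-2 can be sketched roughly like this (my own simplified reconstruction, not the code I actually ran; in particular it assumes total data loss happens only when every disk fails within the same one-month replacement window, and the run simply continues with fresh disks afterwards):

```python
import random

def simulate_user(n_disks, loss_cost, months=240, p_main=1/240, p_ext=1/360,
                  disk_cost=50, rng=random):
    """One user's total cost over `months`: 1 main disk + (n_disks - 1) externals.

    Simplifications (mine): every failed disk is replaced within the month
    for disk_cost (any disk other than the initial one costs $50), and all
    data are lost (costing loss_cost) only if every disk fails in the same month.
    """
    cost = 0
    for _ in range(months):
        failed = [rng.random() < (p_main if i == 0 else p_ext)
                  for i in range(n_disks)]
        cost += disk_cost * sum(failed)   # replacement disks
        if all(failed):                   # everything gone at once
            cost += loss_cost
    return cost

def expected_cost(n_disks, loss_cost, n_users=10_000, seed=0):
    """Average cost over many simulated users (one cell of the table)."""
    rng = random.Random(seed)
    return sum(simulate_user(n_disks, loss_cost, rng=rng)
               for _ in range(n_users)) / n_users
```

    With these numbers, expected_cost(1, 10_000) comes out near $10,050 (one expected main-disk failure plus one expected total loss over 20y), while expected_cost(2, 10_000) drops to roughly $110 - the same "replace-on-fail works surprisingly well" effect as in the table.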
     
  5. Bill_Bright

    Bill_Bright Registered Member

    Joined:
    Jun 29, 2007
    Posts:
    4,042
    Location:
    Nebraska, USA
    I'll believe alien tech before an ultra ESP prophet. There has to be a mind for mind readers to read. So that leaves a TARDIS.

    But I do believe advances in the technologies to mine and refine raw materials into purer and purer forms, along with advances in design and manufacturing techniques, will bring hard drive technologies to the point where they will be expected to, and actually will, easily last 10, 15 years or even longer. When humans learn to consistently create perfect, flawless motor bearings and bearing lubricants that totally eliminate friction, for example, we'll be getting close to motors that will last forever.

    But there is another stipulation: AS LONG AS "spinners" in general are not made totally obsolete and supplanted as the media of choice for mass, long-term data storage, hard drive life spans will continue to improve.

    That is, even "if" or rather "when" SSD technologies (or something better, newer, and cheaper) take over the data storage universe, the technologies and capability to produce hard drives with longer and longer life spans will continue to improve. I emphasize "technologies and capability to produce" because having the capability does not mean it will be implemented. There will still be cheap, "entry level", generic models that fail prematurely. So buyers will still need to do their homework.
     
    Last edited: Feb 26, 2020
  6. 142395

    142395 Guest

    @Bill_Bright Probably you know better than me on this, but as disk space increases, they're actually becoming more fragile. I also don't believe SSDs can fully replace HDDs; they're not suitable for saving large data for a long time, and while free from many mechanical troubles, they have their own problems, you know. As both HDD and SSD tech has entered the nano scale, quantum effects now play a role, meaning more unpredictable things can happen - row hammer cleverly abused this, though it was on DRAM; similar things can happen on SSDs, and cosmic rays are no longer negligible. There is a joke that future descendants, after we perished, investigated our past civilization. The first things they salvaged were optical disks, from which they couldn't recover any useful data. Next they found something from which they could barely extract some info: paper. Finally they found perfect records of the past civilization: stone slabs. As for bearings and such parts, I heard the best-precision ones for special use cases still rely on the very low-tech of a craftsman's cutaneous sensation, because nano-scale cutting by machine is not at all reliable currently. It may sound too pessimistic, but I think tech will go toward cloud-backups-by-default or similar redundancy strategies rather than the hard task of making disks last longer, regardless of what we wish.
     
  7. Bill_Bright

    Bill_Bright Registered Member

    Joined:
    Jun 29, 2007
    Posts:
    4,042
    Location:
    Nebraska, USA
    Umm, no. Not true. I have not seen one study that correlates higher platter densities to greater failure rates. Got a link to a study showing that?

    What is true is the risk of losing more data is much greater with higher density disks than smaller disks - but that's simply because they store more data. But there is nothing to suggest if you store a copy of a photo on a big drive, and store another copy on a small drive, that the copy on the big drive will become corrupt or the big drive will fail sooner than the small drive.

    And sorry, but your logic is flawed when comparing HDs and SSDs. First, DRAM is a totally different memory technology from that used in SSDs. So it is simply wrong to suggest it relates to SSDs. A primary characteristic of DRAM memory is that it dumps all data when power is removed! It is called "volatile" for that reason. So it is silly to compare it to SSDs, where no power is required to retain the data stored on them.

    Optical disks have nothing to do with this so it is moot to even mention them.

    Another HUGE flaw in your logic is you are talking about today and yesterday. Not tomorrow. Hard drive technology really has not changed since the very first hard drive nearly 70 years ago. They have gotten faster and can store more data, but they work pretty much exactly the same way: with motors, spinning platters, and a Read/Write head mounted to an arm that is swung back and forth over the spinning platters by another motor.

    SSD technologies are new and still evolving and not just getting bigger, but much more reliable too. You don't know where SSD technologies will be in 5 years but there is nothing to suggest they will not be even better and more reliable.

    Motors are archaic. They have moving parts that create friction and wear. They make noise and vibrate, and they are physically big and heavy. And while the magnetic particles on the platters may stay aligned to accurately reflect the appropriate 1s and 0s of the data for many years, motors stuck in long-term storage don't always survive as well. Lubricants can break down, separate, harden and crack, causing the bearings to seize. There are no such issues with SSDs.

    So I have no doubts, eventually hard drives will go away and be replaced by solid state technologies OR as I said above, with "something better, newer and cheaper".
     
  8. 142395

    142395 Guest

    @Bill_Bright
    I spoke more from a kind of common sense as one involved in nanotech, but if you insist on papers, this is publicly available, for example:
    https://www.researchgate.net/profile/Sudhanva_Gurumurthi/publication/4144849_Disk_Drive_Roadmap_from_the_Thermal_Perspective_A_Case_for_Dynamic_Thermal_Management/links/0deec537a0d2b172b0000000/Disk-Drive-Roadmap-from-the-Thermal-Perspective-A-Case-for-Dynamic-Thermal-Management.pdf
    So the industry addressed the increased fragility with cleverer error-correcting codes; that's great, but it does not hurt my statement. And this:
    http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.160.1618&rep=rep1&type=pdf
    Though it's about capacity and not density, usually they are proportional. You see, more capacity means a higher hazard rate.

    Actually DRAM and SSD (NAND flash) memory are much the same in their mechanisms. NAND flash memory stores the data "0" as charge (electrons) held in the floating gate, which is separated by a tunneling insulator film; that's it. And DRAM stores the data "1" as charge held in a capacitor (so separated by an insulator) connected to the transistor; the only reason it is "volatile" is that the charge leaks within milliseconds, but the basic mechanisms of the two are almost the same. And my point was that they are not free from quantum effects; in this regard your remark is irrelevant - I gave row hammer as an example only because it is one example of what the original designer didn't expect and what can't happen in a larger circuit - please, don't nitpick at details but look at the whole argument. Assume you built two identical circuits EXCEPT FOR their size - one micro-scale and the other nano-scale. Turn the switches on each. Do you expect them to work in the same manner? Not at all. The former is just a classic electric circuit following Ohm's law. The latter is a quantum circuit; there's no "current" or "resistance" in the usual sense, and your common sense about electric circuits doesn't help any more. Now various quantum effects such as the quantum Hall effect or the Aharonov–Bohm effect play essential roles, and you can no longer dismiss heat flux or spin as in the micro-scale case. The underlying tech of HDDs may not have changed, but when things enter the nano scale, the whole thing becomes a different story.

    And I believe the joke is a very important hint: the more technology advances, the shorter the storage lifetime. IDK how you expect a new tech to suddenly break this tendency, which has continued over thousands of years, particularly when there wouldn't be a STRONG incentive in the industry - I handle so-called big data, meaning hundreds of csv files each of which is GBs in size, so several TBs of storage tend to fill up soon. It's also important that we can read and write them very quickly. So I don't doubt there's strong incentive and pressure on the storage industry for quicker storage with more capacity in this big-data era, but as to reliability - yes, the more reliable the better, but needless to say we have redundant backups. If capacity increases at the same price, reliability can be complemented by backups, but the opposite doesn't hold.

    Not to mention that currently SSDs have much smaller capacity; they are BY DESIGN not for long-term storage. The charge on the gate eventually leaks, so I don't expect them to fully replace HDDs - of course I'm not sure about the "something new". I'd also note that paper and magnetic tape won't go away at least within 50y, as no perfect alternatives to them are known so far - but I won't speak of it too much; it's another and long story.
     
    Last edited by a moderator: Feb 27, 2020
  9. Bill_Bright

    Bill_Bright Registered Member

    Joined:
    Jun 29, 2007
    Posts:
    4,042
    Location:
    Nebraska, USA
    Did you read your paper? I did. And, unless I missed it (pretty sure I didn't), NOWHERE does it support your claim that higher density disks are less reliable. Not to mention that article is 15 years old.
    Gee whiz. Now you are changing your story to rationalize your previously incorrect statement. :(

    We were discussing long term storage of data. DRAM is volatile - it MUST constantly have voltage applied to retain its data. Remove the voltage and any stored data is instantly dumped. Remove the voltage from an SSD, and the data remains. Just because both DRAM and NAND memory are solid state and thus use similar "mechanisms" to flip their gates, claiming they are "much the same" is nonsense.

    NOWHERE did anybody say SSDs are "currently" designed for long term storage. I specifically said I was talking about "tomorrow", and that "eventually" hard drives will go away, but you keep ignoring that.

    And much smaller? Come on! You need to do your homework before talking because, sorry, you don't know what you are talking about. 100TB SSDs are already here. That is over 6 times larger than the world's largest hard drive, which is just 16TB. And that SSD has a 5-year warranty that covers an unlimited number of writes per day! Unlimited!!! So yes, long term storage with SSDs is almost here.

    Contrary to reports from 2015, SSDs do NOT lose their stored data within a few days of no power. That claim was totally debunked. If the SSD has not already exceeded its write limits (which would be rare with today's SSDs, and nearly impossible in the home environment), and if the SSD is stored in a normal home environment (that is, where the temperature and humidity is "comfortable" for humans), the data will remain viable for many years. And that is for today's technologies.

    ***

    I will not participate in this OT sidetrack further. This thread is about HD reliability.
     
  10. 142395

    142395 Guest

    @Bill_Bright I read it; ofc it doesn't state "higher density means less reliable" - that's not the theme of the first paper, so they just implied the increased fragility as context info. Note even I didn't say that; I meant "more density, thus smaller size, means more fragile by itself, thus IF not compensated by other technology (e.g. better error-correcting codes) less reliable". I thought it was so obvious; sorry if it's hard for you to get. The second paper didn't mention density, yet more capacity often means more density. My short search didn't turn up any such "density & reliability" study.

    That's completely irrelevant to my point and I don't need to rationalize. My point was that once things go nano-scale, quantum effects, of which we still don't have a clear understanding, play a role; THIS was my point in that example and NOTHING ELSE. What you said has never been my point from my first post. Predictable errors can be addressed and complemented, but more and more things become unpredictable at the nano scale.

    Okay, if you say HDDs may go away after 100y thanks to a new tech (but clearly not SSD - an SSD for long-term storage is no longer an SSD; it requires a complete redesign), then I don't oppose that, but for me (and probably most people) that's completely irrelevant.

    Of course I was not talking about just a few days or even a few years, but decades (BTW your link links to another article of theirs which states "SSDs simply shouldn't be relied upon for long-term storage"). Nevertheless, I didn't know about that 100TB SSD - great, thanks for the info.
     
    Last edited by a moderator: Feb 28, 2020
  11. 142395

    142395 Guest

    As trivially expected, a row-hammer-style attack is actually possible on SSDs too.
    https://www.semanticscholar.org/paper/Vulnerabilities-in-MLC-NAND-Flash-Memory-Analysis%2C-Cai-Ghose/ba9e1afd59fb0f2aa529e9bf3f2070464abe9919
    I have no interest in this attack here; it requires an unusual access pattern which won't happen in usual usage, and it's irrelevant to the discussion - it was simply used as an example of unexpected problems caused by a finer architecture (it's also evidence that error-correcting codes are not a panacea). Both HDD and SSD designers are already utilizing quantum effects, but utilizing doesn't mean controlling them, or knowing what can happen.

    BTW the most promising next-gen storage which we may see within our lifetimes seems to be some kind of molecular storage, e.g.
    https://www.uni-kiel.de/pressemeldungen/index.php?pmid=2017-387-nanodatenspeicher&lang=en
    https://www.wsj.com/articles/scient...c-dna-embedded-in-a-plastic-bunny-11575907200

    Once they come to production, capacity will jump, which is what's definitely required currently and in the coming days, besides cost. But I don't expect any more reliability than current enterprise-class HDDs - I hope they at least come to a comparable level.
     
    Last edited by a moderator: Feb 28, 2020
  12. Keatah

    Keatah Registered Member

    Joined:
    Jan 13, 2011
    Posts:
    1,029
    I've got a couple of hard disks from 1985-1993 (~35 years old) and they still retain their data just fine. Granted, they are out of regular usage nowadays because I keep them for sentimental reasons. I do give them a spin-up every year or so now. But there was a time from 2000-2016 when they just sat, and sat, and did nothing. So long-term retention is quite good.

    I also purchased a 1TB HDD around 2009 and it just died now in 2020. That's a solid 10 years on a modern drive. It had nearly 50,000 hours of power-up time at final retirement. A good portion of that was constant gentle use, interspersed with occasional busy workloads like hour-long copy sessions and such. I also took it out to friends' and relatives' places from time to time. Fantastic reliability in my humble opinion.

    First sign of trouble was throwing bad sectors during a surface scan. Running a little slow, and showing pending sectors in SMART. I refreshed the sectors thinking it was from a dirty power-off situation. It worked for a while. But more of them failed in the same area days later, then hours later. Ok. We're done. I ordered a replacement, restored from backup, and here we are.

    I'm also watching another drive that's going on 51,000 hours. Same time frame. Daily use. But the drive is rather stationary; it was bounced around in a laptop only for 2-3 years. It's been on desk duty since.

    An interesting note about the failed drives, the one I mentioned and a few others: the very first sign of failure seems to be the range of temperatures at which they work reliably. It becomes smaller and smaller.
     
    Last edited: Feb 28, 2020
  13. Bill_Bright

    Bill_Bright Registered Member

    Joined:
    Jun 29, 2007
    Posts:
    4,042
    Location:
    Nebraska, USA
    NO! This is just more of you trying to rationalize your incorrect and/or irrelevant claims. :(

    You said, "as disk space increase, they are actually becoming more fragile"

    I disagreed and asked for supporting evidence to backup your claim.

    So you post that link which does NOT backup your claim, then rationalize it by claiming it implied it did. :rolleyes:

    They did not imply anything. What they found was excessive heat may affect a bigger drive more so than a smaller drive. BUT nobody in this thread is talking about running drives in an excessive heat environment.

    Using your logic, you could truthfully say high density drives are more fragile because a speck of dust will corrupt more data than that same speck of dust would on a lower density drive. But drives have forever been sealed with a tiny filter specifically designed to block dust and other contaminants while allowing the equalization of pressure. So nobody is talking about running drives in a dusty environment either.
    And your point is completely irrelevant to the topic of this thread! So please! Let's get back on topic.

    3 or 4 years ago I still had a few drives from that era on the shelf. But of course, drives from that far back were ATA (IDE/EIDE, later called PATA) drives, since SATA became the new standard in 2001 and motherboards phased out EIDE interfaces a few years after that. Adapters are still available, but I knew they would be phased out too. And I know motors often refuse to spin up after sitting still for years.

    Plus, I wanted to clean out my basement store room and turn it into a spare bedroom. I think the biggest was 1GB but most were 500MB or less. So I unburied my EIDE adapter and transferred the data to a modern SATA drive, then destroyed the old EIDE drives. There was a mix of Seagate, Western Digital and Maxtor drives.
     
  14. 142395

    142395 Guest

    I clearly stated "that is not the theme (of the paper)" and "I didn't find any `density & reliability' study". Generally, the main theme is not all the info a paper gives. If you look up Google Scholar you can find a bunch of papers & docs with various different themes, all stating (as context info) how higher density causes more fragility to various kinds of noise and soft errors, and how the industry has been struggling to overcome it (*). It's more of a piece of common sense in nanotech, so apparently nobody bothered to "prove" it. The question is whether all that fragility can/will be and has been compensated by advances in technology like stronger ECC - particularly, soft errors from cosmic rays became a serious problem only after things came to such a small scale. IDK if all those soft errors can already be corrected by ECC or prevented by other mechanisms; any info on it is appreciated. I believe "reliability" is not only about failure (if it were, sure, mechanical problems would be dominant) but also includes whether all data on the disk remain sound for a long time - and BTW the dust example is NOT about fragility. Okay, I'll stop here.

    (*) in case you doubt, another example from dozens of such one:
    https://www.flashmemorysummit.com/English/Collaterals/Proceedings/2012/20120821_TA12_Yoon_Tressler.pdf
     
  15. Adric

    Adric Registered Member

    Joined:
    Feb 1, 2006
    Posts:
    1,762
    Might be of interest to some. SSD reliability in the enterprise: This survey yields a few surprises
     
  16. Bill_Bright

    Bill_Bright Registered Member

    Joined:
    Jun 29, 2007
    Posts:
    4,042
    Location:
    Nebraska, USA
    Gee whiz! :( Yes, you said that, and then you immediately, in the very same sentence, said it "implied more fragility". You can't have it both ways! And regardless, that study did not imply that. You just assumed it did, to "rationalize" using that link to support your inaccurate claims. It didn't imply that higher-density drives are more fragile. And that study involved drives being operated in excessive-heat environments - that is, not normal operating conditions. So again, it is irrelevant to the topic of this thread.

    It is time to let it go, Yuki! Let's get back on topic and move on. I am.
     
  17. 142395

    142395 Guest

    @Bill_Bright It seems we can agree that any more talk on this is not productive for either of us or anyone else. My last comment on this is: thank you for telling me about that 100TB SSD, because I didn't know about it. At least I learned one thing directly from you, and other things through the whole process, even though we're not on the same page.

    Interesting, thanks for the heads-up!
     