Dennis Technology Labs: PC Total Protection Suites 2012

Techfox1976 · Apr 11, 2012

S.B. said:

Thanks for your analysis Techfox1976. I agree that your analysis makes good objective sense, and undercuts the "real world" meaning of the testing results reported.

I'd also add that I've worked closely with analytical testing personnel in various industries and have repeatedly seen the complexity of statistical aspects of analytical testing design. Statistically reliable testing results require input and analysis by qualified statistical experts. I have a very hard time believing that the minimal tests and samples reported in the subject test report can be considered statistically reliable.

Many thanks. BTW I don't use WSA or have any personal interest in same. My only interest is in reliable internet security testing.

No problem. Pointing out issues with the testing methodology for one of the test participants, as well as with the reliability of the published test results, calls the whole test into question more than it defends a specific participant. When reliability comes into question in any one case, one may then speculate that the whole test is faulty. At that point, taking any of its results seriously can be a bad idea.

The number one rule of Statistics and Testing is that it's relatively easy to make the numbers say what you want with the proper data presentation. Weighting, scaling, omission of certain aspects, etc...

"99 out of 100 people said they like this product!"
That sounds great if that's the entirety of what they tell you. Dig into this fictional test and find out the methodology, and it tells a different story:
"We went to 200 people on the street and said, 'Say the words "I like this product" and we will pay you this five dollar bill.' 101 of them cussed us out and ranted about hating the product so much that they wouldn't lie for cash. 99 of them accepted the money and said they liked it. We tossed out 100 of the ranters and of the remaining 100, 99 of them said they like it because we paid them to say that. So we can publish that 99 out of 100 people say they like it."

Thankfully it's not normally that extreme, but at the same time, when results really need to be good, they can be massaged. For example, the "Malicious Site Detection System"... How does it work? Does it visit a site and then scan the results with Norton to find out if Norton says it's malicious, then count everything Norton sees as malicious and therefore it can be used in the testing? Less-extreme, does it scan and test the results with the same DB that Norton uses to make its own determinations? None of these are explained, so one can only assume that either situation could be the case. It could be fully legit, or it could be weighted in favor of a specific contender.
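
To make the arithmetic of the fictional survey above concrete, here is a minimal Python sketch. The variable names are my own, and the numbers are the made-up ones from the story, not data from any real test.

```python
# Fictional survey from the story: 200 people approached, 99 take the
# money and say they like the product, the rest refuse.
approached = 200
said_like = 99
refused = approached - said_like

# Honest rate: everyone who was approached stays in the denominator.
honest_rate = said_like / approached

# Massaged rate: quietly drop 100 of the refusals before reporting.
dropped = 100
remaining = approached - dropped
massaged_rate = said_like / remaining

print(f"honest:   {said_like}/{approached} = {honest_rate:.1%}")
print(f"massaged: {said_like}/{remaining} = {massaged_rate:.1%}")
```

Same raw responses, two very different headlines. The only thing that changed is which responses were allowed to count.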

40 samples of malware that can only affect under 2% of the web-browsing population just doesn't strike me as a reliable test though.

S.B. · Apr 11, 2012

There are some very non-intuitive statistical constraints in comparative testing. A simple example: mean and standard deviation. Unless we know the mean and standard deviation, we cannot say that any given difference in participant test results is significant. More specifically, a 'mean/standard deviation' value of '90 +/- 5' would dictate that result values of '85' and '95' have no statistically significant difference and therefore must be treated as equivalent.

All of this gets even more complicated when a test is conducted to measure multiple different variables, and more complicated still with different numbers of testing participants. Here we have issues such as whether the measurement of one variable can be considered independent of another, which holds only if the number and variety of tests is calculated to achieve this.

A proper scientific test would at least reference important values such as mean and standard deviation.
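
As a rough illustration of why small sample counts matter, here is a sketch that puts an approximate 95% interval around a detection rate measured on 40 samples. The Wilson score interval is my choice of method, not something the report uses, and the two detection counts (36 and 38 out of 40) are hypothetical numbers picked only for the example.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    p = successes / trials
    denom = 1 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return centre - half, centre + half

# Hypothetical results: two products blocking 36 and 38 of 40 samples.
low_a, high_a = wilson_interval(36, 40)
low_b, high_b = wilson_interval(38, 40)

# Overlapping intervals are only a rough indicator, but they suggest the
# two results cannot be confidently told apart at this sample size.
overlap = low_b <= high_a and low_a <= high_b

print(f"36/40: {low_a:.3f} to {high_a:.3f}")
print(f"38/40: {low_b:.3f} to {high_b:.3f}")
print("intervals overlap:", overlap)
```

With only 40 samples each interval is wide, so a gap of a couple of samples between two products sits well inside the uncertainty.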


STV0726 · Apr 11, 2012

steve1955 said:

Testing on unpatched OSes etc. is actually very relevant in some ways, as a very large number of users do use PCs in that condition. Lots of average users don't even know you have to update the PC in any way. The thing to counter that, though, is: would those users install an AV product?

You did not read what I said carefully enough...

...I clearly acknowledged the importance of running such "high risk" or "worst case scenario" tests, but my main point was that it is ALSO VERY IMPORTANT to be VERY careful with the term "real world test".

STV0726 · Apr 12, 2012

Techfox1976 said:

I still stand by saying "A test that is representative of at most 1-2% (or likely much lower) of the systems out there cannot claim to be representative of the 'Average User' or 'Average Computer Environment'." and also saying "Norton is great for the unaware corner cases who never patch, but Webroot (and others) is still better for power users and educated users who do."

Plus, you really have to wonder about the material errors such as impossible scores and tests marked as both Pass and Fail concurrently. You also have to be aware that they put arbitrary limitations on the functionality of the software. They straight out decided to not permit more than one rescan despite the fact that at least one of the packages uses multiple rescans to ensure cleanliness.

It doesn't take paranoia or strong conspiracy theory to say that this looks like the tests were directly stacked against Webroot either. I mean, really, fail a test because a broken VB script that can affect under 2% of the general computer population -if it actually worked- was allowed to remain inactive in TIF? So those few people who are vulnerable to it should go get Norton, and the rest of people who want a better system can get something else.

You already won the Internet. I don't know what else to award you with.

As for the AHAM certification, I'm not sure which parts of this test or test methods are certified for Dennis Labs, but these results definitely have issues that normally don't exist with the known-reputable/valid organizations like AV-Comparatives and AV-TEST. They are by no means perfect, as people will question their "real world" tests too...but there are not these MAAAAJOR issues with them at least.

FUNNY EDIT: Look at what I wrote above. It's so silly and stupid I can't even remove it. I put "AHAM". I meant "AMTSO". I'm confusing air purifier certification organizations with anti-malware ones!

Osaban · Apr 12, 2012

guest said:

I wonder why Norton is always first in the Dennis Technology Labs tests.

Furthermore I wonder why AVG, Panda, and Avira have done so badly....Oh wait, they all offer free solutions, what a coincidence! Isn't it Symantec's mantra that with free solutions you get what you pay for?

Stefan Kurtzhals · Apr 12, 2012

Nobody wondering why Avast is not included? I guess their results were too good?

Well, let's see:

- very low number of samples/malware URLs used
- penalties for not detecting every part of the infection chain, even if the final payload was blocked
- there are quite a few Blackhole samples, too focused?

STV0726 · Apr 12, 2012

For me personally:

GREEN: AV-Comparatives, AV-TEST, West Coast Labs Real-Time Testing

BLUE: PC Magazine, Maximum PC (Specifically those; most magazines I don't hold this highly)

YELLOW: Dennis Technology Labs, Malware Research Group

RED: Homegrown/informal testers, YouTesters, (and other types/bodies deemed untrustworthy by AV-Comparatives PDF report on which AV testing bodies can be trusted)

These of course can be subject to change, except for the case of informal testers or YouTesters. Those are Banquet frozen dinners. Forever the bottom of the barrel.

EDIT: Still deciding if Passmark is Blue or not. They do commissions, but they don't seem to conduct crap tests like Dennis does, so I'm leaning towards Blue for them.

Also undecided is Matousec...stay tuned...

si_ed · Apr 12, 2012

Thanks for taking an interest in this test. I'll address some of the issues below as best I can:

Techfox1976 said:

The numbers don't add up.
Pages 10-12. Testing FPs against 40 "legitimate" programs. In summary, a security program gets a point if it allows the legitimate program to run, and loses between 0.05 and 5 points for warning about the program and/or blocking it.

Please note that the points lost are affected by the prevalence of the legitimate file. Although the exact details for the FP tests are not published, I'll let you know the reason for Webroot's score, as you highlight it.

Webroot generated a blocking FP on a 'very low impact' file, which means a penalisation of -0.1. It also generated a warning FP on a 'medium impact' file, which means a penalisation of -0.5.

This comes to a total of -0.6.

40 - 0.6 = 39.4

Techfox1976 said:

Specifically neutering the way Webroot's WSA product works:
"In some cases a product might request a further scan to complete the removal. We consider secondary scans to be acceptable, but further scan requests would be ignored."

I don't think that it's fair to say that allowing all products an equal chance to protect the system, using real-time and on-demand scanners, cloud lookups and sandboxing is neutering any specific product.

The threat is introduced to the system and the security product is allowed a chance to stop it. Even if it fails it is given a second chance to scan the system. If the threat survives the real-time protection, any sandboxes, a reboot, cloud lookups and finally an on-demand scan then I think it's fair to draw the line. Obviously a product may detect and remove the threat properly at some point in the future, but these are the rules of the test and the conclusions don't make any claims that product X could never remove the threat. We just consider it more desirable that the threat is blocked, then less ideally if it can be stopped and removed.

It could be interesting to run a test in which systems are infected and then the products are given one chance each day to remove the threat. We'd then see which took the longest to receive useful updates. That would take a long time and I'm not sure if it would be much more useful, but if the demand for such tests is there we'll certainly explore that approach.

Techfox1976 said:

What I call "Exceptionally Stupid User" activities. The rules of engagement are effectively "Pick the default, or pick the Topmost or Leftmost option." Hand-typed malware URLs (Yep, everybody does that). XP SP3 with no patches beyond that (That's not as common as people claim). Intentionally out-of-date and exploitable third party software. IE 7. IE7 has somewhere between 0.1 and 2.5% of the total browser use right now depending on where you look. XP falls between 17 and 28% of the Windows visitors, however the prevalence of IE9 indicates that people are getting their patches (since IE9 doesn't exist on XP unless the system gets updates). So "XP SP3 with no other patches and IE7" is a "Realistic and normal state" for users? Further analytical data munging shows that evidence of an unpatched XP SP3 system running IE7 is a 0.003% rate. FFS, the test is "realistic" for three in 100,000 people?!

User behaviour: We have to choose some consistent approach. If a security application makes a recommendation it seems the fairest option to take. If it does not, then at least our 'user' needs to be consistent. In practice, all of the products make a recommendation. Those top/left rules are just there in case a particularly cowardly product says, "I don't know - you decide!"

Platform: We see that there are very many threats targeting this platform. This rather suggests that plenty of potential victims are too. I'd be genuinely interested to see any figures that suggest the contrary.

Vulnerable software: We are trying to test the security software, not the user's approach to patching.

Realism in general: We use this term to distinguish the methodology from on-demand scan tests, in which sites are not visited, exploits are not run and products are not given a chance to intercept any hostile actions.

Techfox1976 said:

And finally, the actual tests' results...
Visit a PHP page that uses a VB script in IE 7 and unpatched-beyond-SP3-XP to download and start an executable (already reducing the possibility of even encountering this infection vector to under 0.1%). WSA blocked and killed the executable, however the VB script was left in TIF. Failed. -2 points. Despite the fact that the VB Script couldn't do anything else anymore.

The presence of the files you mention is not sufficient for a compromise to be registered. The system will have a number of issues, including Registry entries (e.g. Runonce) and so on.

Techfox1976 said:

Did we mention the numbers don't add up? Test 21, WSC is labeled as "Complete Remediation", "Defended", and "Compromised" all on that one line, from which the score is presumably calculated automatically... ?!?!? Seriously?! Come on guys, pick one or the other, you can't have both.

You are right. This is an error and I am glad that I now know about it. Fortunately it is the only such error and I will ensure that the report is updated ASAP. Obviously removing one compromise from quite a large set of compromises isn't going to change the order of products in the tables and charts (I've just checked and WSC would score 52.4 in the Total Accuracy graph).

Techfox1976 said:

Oh, and of course if a VB downloader that fails to actually download anything is left on the system by WSA, it failed that test, despite the fact the downloader a: will never operate again unless the web page is revisited. b: Will only operate at all on 3 in 100,000 cases, aside from this test.

For the reasons mentioned above, that's not the case. The presence of an inert dropper following a scan would count as a neutralisation, not a compromise.

Best wishes,
Simon Edwards
Dennis Technology Labs

steve1955 · Apr 12, 2012

STV0726 said:

You did not read what I said carefully enough...

...I clearly acknowledged the importance of running such "high risk" or "worst case scenario" tests, but my main point was that it is ALSO VERY IMPORTANT to be VERY careful with the term "real world test".

I did read what you said, and in the "real world" I would bet there are more machines that are not up to date than ones running with all updates/patches applied, so what would you say is a real-world scenario?
But I also added as a caveat that those users are unlikely, to say the least, to be running an up-to-date anti-virus (or even any AV product!).

Techfox1976 · Apr 12, 2012

steve1955 said:

I did read what you said, and in the "real world" I would bet there are more machines that are not up to date than ones running with all updates/patches applied, so what would you say is a real-world scenario?

Looking up any web metric report indicates otherwise. You can acquire information about the OS, version of the browser, and often the version of Flash player and other things.

steve1955 · Apr 12, 2012

Techfox1976 said:

Looking up any web metric report indicates otherwise. You can acquire information about the OS, version of the browser, and often the version of Flash player and other things.

The number of unpatched machines we get exceeds fully updated ones. I'm not just talking about malware problems, which would be expected, but software and hardware issues in general. What is surprising is that it's the younger folk who seem not to care/know/bother about security, rather than the older folk you would think are less savvy.
What figures are you looking at regarding patched v unpatched?

kdcdq · Apr 12, 2012

steve1955 said:

The number of unpatched machines we get exceeds fully updated ones. I'm not just talking about malware problems, which would be expected, but software and hardware issues in general. What is surprising is that it's the younger folk who seem not to care/know/bother about security, rather than the older folk you would think are less savvy.
What figures are you looking at regarding patched v unpatched?

Steve1955, that is EXACTLY what my experience has been: many, MANY young folks either can't or won't be bothered by security. And it's not just their computers but their phones and tablets as well. My nephew, for example, doesn't even think security is his responsibility; he believes, sadly, that the makers of the software and hardware should "protect" them from themselves somehow. It beats anything I have ever seen...

Techfox1976 · Apr 12, 2012

Hi Simon! Thanks for the return responses. Though I usually don't like in-lining things on Wilders, I will.

si_ed said:

Please note that the points lost are affected by the prevalence of the legitimate file. Although the exact details for the FP tests are not published, I'll let you know the reason for Webroot's score, as you highlight it.

Webroot generated a blocking FP on a 'very low impact' file, which means a penalisation of -0.1. It also generated a warning FP on a 'medium impact' file, which means a penalisation of -0.5.

This comes to a total of -0.6.

40 - 0.6 = 39.4

From the report:
"Each time a product allowed a new legitimate program to install and run it was awarded one point."

So in order to get to the starting number of 40 that you quote above, it would have had to allow the one item that was, in theory, blocked. Which did it do? Block it, or allow it? If it blocked it, then it would not have gotten that 40th point for allowing it, so it would start at 39 points, minus 0.6, which would be 38.4 points total. I think most people see a difference between "Warn, but still allow", in which case the product would get the 40th point for allowing the program but lose points for warning, and "Block", which directly means "doesn't allow" and so would prevent it from getting the 40th point at all.
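
The two readings of the scoring rule can be put side by side in a short Python sketch. The penalty values are the ones Simon quoted; the variable names and the "reading A / reading B" labels are mine, and the exact rule the report applies is not published, so this only shows where the one-point gap comes from.

```python
programs = 40
blocking_penalty = 0.1   # blocking FP on a 'very low impact' file
warning_penalty = 0.5    # warning FP on a 'medium impact' file
penalties = blocking_penalty + warning_penalty

# Reading A (the arithmetic quoted above): start from all 40 points,
# then subtract the penalties.
score_a = programs - penalties

# Reading B: a blocked program never earns its "allowed to run" point,
# so only 39 points are available before the penalties apply.
allowed = programs - 1
score_b = allowed - penalties

print(f"reading A: {score_a:.1f}")
print(f"reading B: {score_b:.1f}")
```

The difference between the two is exactly the one point for the blocked program, which is why it matters whether "blocked" still counts as "allowed to install and run".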

si_ed said:

I don't think that it's fair to say that allowing all products an equal chance to protect the system, using real-time and on-demand scanners, cloud lookups and sandboxing is neutering any specific product.

The threat is introduced to the system and the security product is allowed a chance to stop it. Even if it fails it is given a second chance to scan the system. If the threat survives the real-time protection, any sandboxes, a reboot, cloud lookups and finally an on-demand scan then I think it's fair to draw the line. Obviously a product may detect and remove the threat properly at some point in the future, but these are the rules of the test and the conclusions don't make any claims that product X could never remove the threat. We just consider it more desirable that the threat is blocked, then less ideally if it can be stopped and removed.

It could be interesting to run a test in which systems are infected and then the products are given one chance each day to remove the threat. We'd then see which took the longest to receive useful updates. That would take a long time and I'm not sure if it would be much more useful, but if the demand for such tests is there we'll certainly explore that approach.

In the case of WSA specifically, I've seen numerous situations where it finds a threat, removes it, then runs scans multiple times until it decrees that the threat is fully removed. Due to the speed of the scans, this isn't a matter of being a multi-hour process. Even four repeat scans to fully clean can be done within 10 minutes in many cases. Given that that specific product is designed explicitly to perform multiple scans to fully remediate in a safe manner, stopping it after two of the several scans it requests is very likely to reduce its efficacy. Notably, that may not have applied in this situation; however, it's something to consider.

Familiarity with the operation of the product being tested should always be considered a prerequisite for testing. I do agree, though, that a test run across several days would be another interesting thing to look at, especially with the prevalence of cloud analysis these days: traditional definition-based detection could lag by weeks, while things are often addressed in days or hours in the cloud.

si_ed said:

User behaviour: We have to choose some consistent approach. If a security application makes a recommendation it seems the fairest option to take. If it does not, then at least our 'user' needs to be consistent. In practice, all of the products make a recommendation. Those top/left rules are just there in case a particularly cowardly product says, "I don't know - you decide!"

Understandable. This is one of those cases where missing information caused concern. The more up front the report is, the less cause for cynicism.

si_ed said:

Platform: We see that there are very many threats targeting this platform. This rather suggests that plenty of potential victims are too. I'd be genuinely interested to see any figures that suggest the contrary.

Looking up any public web stats is a start there. Since the platform specifically requires IE7, for example, looking at IE breakdowns is informative. Your own website's stats can be a start also, though the spread of technology and "Who is interested enough to visit this site?" will create a shift in a given direction simply due to demographics.

Looking at the analytics on two small sites I run, I see a 0.9% to 3.4% distribution of IE7 specifically. Though IE7 alone is not enough to indicate that the machine is also unpatched on the OS itself, it's a decent enough baseline.

Public locations such as:
http://gs.statcounter.com/#browser_version-ww-monthly-201103-201203
http://www.w3schools.com/browsers/browsers_explorer.asp

show IE 7 at 2.9% and 2.5% use respectively for March of this year. Although reducing it by machines that are actually patched would make the results lower, even giving a high estimate of 3% still results in an insignificant number of potential victims meeting the vector specification.

In all cases, it's very difficult to get accurate numbers for open attack vectors, simply due to a lack of non-discriminatory data, however I personally would prefer to see a test address attacks that can affect a larger number of potential victims.

The threats target that, yes. But not specifically. The threats don't run in a state wherein they will use the logic "I must have unpatched SP3 and IE 7 otherwise I will not work properly at all". Most web threats I have seen use the wall-stick method. Throw -everything- at it and see if anything sticks.

Even if a month-old flash exploit is more likely to find a target than a year-old browser, there is no reason not to target the year-old browser exploit as well. Adding another exploit to check against doesn't cost the criminal anything at all, especially with kits like Blackhole and others. When a test creates a situation where the "real life" targets must meet a specific set of requirements that are met by a ridiculously small portion of the population, it calls into question the realism and thus the applicability of the test.

To put it into perspective, it is similar to feeding chili to 100 people who are part of the 2% (fictional quantity) who always suffer from chronic heartburn to the point of needing prescription help, and testing an antacid against that. Perhaps the antacid will fail for 80 of them, despite the fact that it works perfectly fine for people who don't suffer from chronic extreme heartburn. If the report then said "This antacid is ineffective", it would not be true if the antacid is effective for 99% of the 98% of the population that don't suffer even though it is not effective for 80% of the 2% that do.
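
Worked through in Python, the antacid analogy comes out as follows. All four numbers are the fictional ones from the paragraph above (2% chronic sufferers, fails for 80 of 100 of them, works for 99% of everyone else); the variable names are mine.

```python
chronic_share = 0.02               # fictional share with chronic heartburn
healthy_share = 1 - chronic_share  # everyone else

works_for_chronic = 0.20           # fails for 80 of 100 chronic sufferers
works_for_healthy = 0.99           # works for 99% of the rest

# Effectiveness across the whole population is the weighted average.
overall = (healthy_share * works_for_healthy
           + chronic_share * works_for_chronic)

print(f"tested only on chronic sufferers: {works_for_chronic:.0%} effective")
print(f"across the whole population:      {overall:.2%} effective")
```

Testing only the 2% gives a headline of 20% effective, while the population-wide figure under these made-up numbers is above 97%. Which number gets reported depends entirely on who was chosen for the test.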

si_ed said:

Vulnerable software: We are trying to test the security software, not the user's approach to patching.

The downside there is that it once again reduces the reality link.

si_ed said:

Realism in general: We use this term to distinguish the methodology from on-demand scan tests, in which sites are not visited, exploits are not run and products are not given a chance to intercept any hostile actions.

See above. This antacid failed for 80 out of 100 real people.

si_ed said:

The presence of the files you mention is not sufficient for a compromise to be registered. The system will have a number of issues, including Registry entries (e.g. Runonce) and so on.

Good to know now.

si_ed said:

You are right. This is an error and I am glad that I now know about it. Fortunately it is the only such error and I will ensure that the report is updated ASAP. Obviously removing one compromise from quite a large set of compromises isn't going to change the order of products in the tables and charts (I've just checked and WSC would score 52.4 in the Total Accuracy graph).

The purpose behind pointing that out is not to try to "get something a better score" but rather to show that there were material errors, which calls reliability into question.

First thing of interest:
Play both defense and devil's advocate at the same time. Why do your results vary so dramatically from the results of very highly reputed and trusted testing organizations?

Second thing in general:
Security is a balance. The AV should be the final line of defense, not the first. By the extremes of a security test: "This computer that has been filled with cement was not compromised by any of the threats whatsoever. Thus, cement is the best security."

Less-Extreme, but seems to be the case in this situation: "This security suite monitors for threats that can only affect a tiny fraction of the online population, so we are going to test specifically against those threats, since it will A: force the suite to react, and B: also find out whether other suites monitor for these ancient threats." Spending the customer's computer resources catching corner cases that one knows the competitors don't bother to look for anymore due to obsolescence is an excellent way to ensure that a test done explicitly with those edge cases will put the competitor in bad light.

A quick investigation of several of the exploit sites (this does not cover all of the sites, so some sites may not exhibit this behavior) indicated a few things:
1: A fully patched system was not compromised.
2: An unpatched test system with general software specs of the test resulted in "First available exploit" behavior from the attacking site. This means that once it hit an obsolete exploit that was successful, it did not continue the attack attempt to other vectors. The downside to this is that any security suite that intelligently no longer checks for code that exploits the obsolete vectors will not catch this. Thus, the several hundred megabytes some suites dedicate on a customer's computer to blocking mostly-extinct vectors will definitely give it a better score than any suite that overlooks corner cases in order to not consume the customer's resources.

I still stand behind the view that "You always want to compare your brightest side against the opponent's weakest side" and the fact that it's possible to make numbers in tests give any answer you want when you want a specific answer. As this was commissioned by Norton specifically, the stigma of bias is very hard to shake.

So the most telling thing really will be the diagnosis of what caused this test to have results so distantly removed from numerous other tests. The second thing of course will be an evaluation of whether the test is actually relevant in general considering its target frame.

---- Segueing to a philosophical bent
All in all though, with the current threat landscape, I am beginning to wonder if anything besides very basic sanity checking and reality itself can provide good tests of AV software, and whether the software alone really means as much as it used to.
- Nothing is 100% effective 100% of the time.
- A bad week during a test can kill an otherwise-excellent record for a package.
- A good week or good random selection can exalt an otherwise horrible package.

With the way things are working these days, detection rates alone are not enough. System impact, assisted remediation, support, stability, and various other factors really need to be looked at. For every story or test showing how good a program is, you can find a story or test that tells how bad it is.

Reality tests...
- How much of my computer will belong to the suite and no longer belong to me?
- What kind of protection on average will I get for this cost in both money and computer resources?
- If I call for help, will I reach somebody I am comfortable speaking to who understands me and I can understand?
- If (when?) something gets by, what kind of help will I get to fix it?
- Can they repair tertiary damage?
- How much will the extra help cost?
- How much will it get in the way when I try to do legitimate things?

There are others, but I think everybody can see where I'm going with that and I've probably rambled on enough right now. ^.^

Techfox1976 · Apr 12, 2012

steve1955 said:

The number of unpatched machines we get exceeds fully updated ones. I'm not just talking about malware problems, which would be expected, but software and hardware issues in general. What is surprising is that it's the younger folk who seem not to care/know/bother about security, rather than the older folk you would think are less savvy.
What figures are you looking at regarding patched v unpatched?

I'm under the assumption that when you say "we get", you mean "in for repair" or something similar. If that is the case, then it's completely natural. Unpatched machines are intrinsically more likely to need repair than patched ones, while the patched ones are substantially less-likely to be brought in for repair. A repair shop is going to naturally get a higher percentage of unpatched machines by that simple concept.
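
The repair-shop effect can be sketched with a few lines of Python. Every number here is hypothetical and chosen only to show the mechanism: I am assuming 5% of machines are unpatched, that an unpatched machine has a 30% chance of needing repair in a given year, and a patched one a 1% chance. None of these come from real data.

```python
# All three rates below are hypothetical, for illustration only.
unpatched_share = 0.05       # assumed share of unpatched machines overall
repair_if_unpatched = 0.30   # assumed yearly repair chance if unpatched
repair_if_patched = 0.01     # assumed yearly repair chance if patched

patched_share = 1 - unpatched_share

# Fraction of all machines that end up in the shop, split by patch state.
unpatched_in_shop = unpatched_share * repair_if_unpatched
patched_in_shop = patched_share * repair_if_patched

# Share of the shop's workload that is unpatched (Bayes' rule).
share_in_shop = unpatched_in_shop / (unpatched_in_shop + patched_in_shop)

print(f"unpatched share of all machines:    {unpatched_share:.0%}")
print(f"unpatched share of machines in shop: {share_in_shop:.0%}")
```

Under these assumed rates, unpatched machines are a small minority overall yet a majority of what the shop sees, which is the selection effect described above.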

When you look at a larger demographic, millions of web surfers*, for example, things like "Surfer is using IE7" drop to under 3% this past month pretty much across the board. Given that the use of IE7 doesn't guarantee that the rest of the machine is unpatched, the number of unpatched machines becomes even lower. While there is no method from web metrics alone to get a very solid value or specifically check for the state of individual patches, certain general assumptions are relatively safe and do not introduce too large a margin of error or uncertainty.

(* http://www.w3schools.com/browsers/browsers_explorer.asp for example... Even this collection of statistics is skewed by the nature of the sites the stats are gathered from)

S.B. · Apr 12, 2012

steve1955 said:

The number of unpatched machines we get exceeds fully updated ones. I'm not just talking about malware problems, which would be expected, but software and hardware issues in general. What is surprising is that it's the younger folk who seem not to care/know/bother about security, rather than the older folk you would think are less savvy.
What figures are you looking at regarding patched v unpatched?

Although this may be correct, it doesn't justify the selection of an unpatched and attack susceptible system for the Norton-sponsored Dennis Technology Labs Tests. The Dennis Technology Labs Tests report was clearly directed at an audience that is interested in security. As you acknowledge, the individuals you discuss are not interested in security. One would not expect such individuals, i.e., those who are not interested in security, to read the Dennis Technology report.

By the same token, it is to be expected that the majority of individuals who are sufficiently interested in security to read the Dennis Technology Report would not be using an unpatched and attack susceptible system.

__

S.B. · Apr 12, 2012

si_ed said:

Thanks for taking an interest in this test. I'll address some of the issues below as best I can:

...

I don't think that it's fair to say that allowing all products an equal chance to protect the system, using real-time and on-demand scanners, cloud lookups and sandboxing is neutering any specific product.

The threat is introduced to the system and the security product is allowed a chance to stop it. Even if it fails it is given a second chance to scan the system. If the threat survives the real-time protection, any sandboxes, a reboot, cloud lookups and finally an on-demand scan then I think it's fair to draw the line. Obviously a product may detect and remove the threat properly at some point in the future, but these are the rules of the test and the conclusions don't make any claims that product X could never remove the threat. We just consider it more desirable that the threat is blocked, then less ideally if it can be stopped and removed....

Best wishes,
Simon Edwards
Dennis Technology Labs

With all due respect, I must say I believe your test is misleading and do not believe your explanation resolves the matter. You claim to be testing the efficacy of the software in question. The fact that the software manufacturer has chosen an operational methodology of which you do not approve, to remove malware from a user's system, does not justify your choice of not using the software's operational methodology and then leading your readers to believe that the software failed to achieve its intended purpose.

You have the choice of not including the software in your tests if you disagree with its methodology. You could alternatively accurately report how the software does function when properly used pursuant to the manufacturer's instructions, and further inform your readers of your objections to the software's methodology. But it clearly is not proper to claim to test software while using the software in a manner contrary to the manufacturer's instructions. My point is simply this: if you choose to test specific software, it is only fair that you use and test that software in accordance with the manufacturer's instructions for use of the software.

If I have misunderstood your explanation, please let me know.

__

si_ed · Apr 13, 2012

Thanks again for your logical and passionate responses. It's great to know that so many people really care about AV testing as much as we do.

The testing methodology we use is not defined by the sponsor. It's ours and the sponsor has to accept it or we won't do the work. In fact, a test using this same methodology was accepted by AMTSO as being 100 per cent compliant with its testing principles. Currently ours is the only test that has been found to match the fairness and accuracy of those principles.

As I mention AMTSO, did you know that it has its own public forum, which all of you are welcome to join? You'll find many vendors represented there, as well as testers including AV Test and AV Comparatives.

That forum is pretty quiet right now so you guys could really make a difference (and some noise!)

One word about AMTSO: the members and some associates meet fairly regularly and we all know and trust each other pretty well. As an example, two months back we had all of the major vendors (techie guys, not marketing) plus AV Comparatives, VB, DTL (me), Veszprog and ICSA all in attendance. We work out ways to conduct fair tests as best we can, while we all have our own strong opinions.

Can there ever be a perfect AV test? I don't think so, but we can all make our own valid approaches and you, the readers of the reports, can make judgements based either on those tests that you trust more *or* by looking at the entire set of information and forming an opinion from there.

You do get weird results sometimes, when a vendor's back-end develops a problem. I remember doing a test for Vendor A in which its product performed really well. One week after that test finished we started a similar test for Vendor B. Suddenly Vendor A's performance dropped off and my heart sank. These results are going to look biased, because how can a Vendor A-sponsored test show its product as being the best and one week later a Vendor B-sponsored test show its product as being the best?

The reason, it turned out, was that coincidentally Vendor A's cloud back-end developed a problem just after our test for them finished.

With that in mind, DTL is actually moving to a regular, non-sponsored testing model from July. This means that you'll be able to see the products' performance span months/years, rather than reading the odd test that pops up annually. Although vendors will have to pay to gain access to certain engineering information, the top-line results will be free to all. I can't think of a much fairer way to deliver good, inexpensive test results.

Given that the tests we have done up to now *are* sponsored, there are some fairly regular (incorrect) assumptions made whenever we publish a test, and I've seen a few of them appear in this thread. I've written a sort of FAQ that provides information on the bulk of these.

If you want some fun reading, you may also be interested to see a redacted internal report that we provided to a vendor when they challenged one of our results. This may be interesting to you because it shows the level of detail that we go into when analysing an incident to see if it was defended/neutralised/compromised.

Techfox1976 said:

From the report:
"Each time a product allowed a new legitimate program to install and run it was awarded one point."

I'll admit that this is misleading. What it should say, and I'll change this for future reports, is that each product is initially awarded 40 points and is then penalised for every FP that it generates during the test.
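
As a rough illustration of the corrected wording (a sketch only, not our actual scoring code), the idea is a starting allowance that is reduced for each false positive. The function name, the size of the penalty per FP and the floor at zero are all assumptions made for this example; only the 40-point starting value comes from the description above.

```python
def false_positive_score(false_positives, starting_points=40, penalty_per_fp=1.0):
    """Start from a fixed allowance and subtract a penalty for each false positive.

    penalty_per_fp and the floor at zero are assumptions for illustration.
    """
    if false_positives < 0:
        raise ValueError("false_positives must not be negative")
    return max(0.0, starting_points - penalty_per_fp * false_positives)


# A product with no false positives keeps the full allowance;
# one with three false positives loses three penalties' worth.
for fps in (0, 3):
    print(fps, false_positive_score(fps))
```

With a penalty of one point per FP and 40 legitimate programs, this gives the same number as "one point per program allowed", which is why the old wording read the way it did.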

I think I've rambled on enough myself now!

Best wishes to you all.
Simon Edwards
Dennis Technology Labs

S.B. · Apr 13, 2012

si_ed said:

...

Given that the tests we have done up to now *are* sponsored, there are some fairly regular (incorrect) assumptions made whenever we publish a test, and I've seen a few of them appear in this thread. I've written a sort of FAQ that provides information on the bulk of these.

If you want some fun reading, you may also be interested to see a redacted internal report that we provided to a vendor when they challenged one of our results. This may be interesting to you because it shows the level of detail that we go into when analysing an incident to see if it was defended/neutralised/compromised.

...

Just to get things straight. In your prior comments you used the following description:

"... allowing all products an equal chance to protect the system, using real-time and on-demand scanners, cloud lookups and sandboxing..."
to characterize your behavior in which you forced an internet security product to participate in a test having rules that precluded proper functioning of that product.

Given that level of doublespeak, why should anyone expect that a Dennis Technology Labs "FAQ" or "redacted internal report" would be fully forthcoming? As to the "level of detail" used by Dennis Technology Labs in its tests, it seems to me that the current thread is most enlightening, without more.

__

si_ed · Apr 13, 2012

S.B. said:

you forced an internet security product to participate in a test having rules that precluded proper functioning of that product.

Please explain how the product's proper functioning was precluded. I suspect that we may have our wires crossed here, because Webroot is familiar with our methodologies and has never raised an objection.

FYI, we always communicate with vendors involved in the tests and Webroot is no exception. In fact the people that I speak to at that company are always glad to hear from us. We have a good working relationship with all vendors, whether they are members of AMTSO or (like Webroot) are not.

Best wishes,
Simon

S.B. · Apr 13, 2012

si_ed said:

Please explain how the product's proper functioning was precluded. I suspect that we may have our wires crossed here, because Webroot is familiar with our methodologies and has never raised an objection.

FYI, we always communicate with vendors involved in the tests and Webroot is no exception. In fact the people that I speak to at that company are always glad to hear from us. We have a good working relationship with all vendors, whether they are members of AMTSO or (like Webroot) are not.

Best wishes,
Simon

Thank you for addressing this question. On page 17 of Dennis Technology Labs' 03/02/2012 report you state:

"In some cases a product might request a further scan to complete the removal. We considered secondary scans to be acceptable, but further scan requests would be ignored."

The above states that your testing rules provide that "scan requests [from a product undergoing testing] would be ignored". It seems clear that stopping a product from completing an ongoing task would preclude proper functioning of the product.

Thank you for any comments you have on this.

__

steve1955 · Apr 13, 2012

Techfox1976 said:

I'm assuming that when you say "we get", you mean "in for repair" or something similar. If that is the case, then it's completely natural. Unpatched machines are intrinsically more likely to need repair than patched ones, while the patched ones are substantially less likely to be brought in for repair. By that simple concept, a repair shop is naturally going to see a higher percentage of unpatched machines than exists in the general population.

When you look at a larger demographic, millions of web surfers*, for example, things like "Surfer is using IE7" drop to under 3% this past month pretty much across the board. Given that the use of IE7 doesn't guarantee that the rest of the machine is unpatched, the number of unpatched machines becomes even lower. While there is no method from web metrics alone to get a very solid value or specifically check for the state of individual patches, certain general assumptions are relatively safe and do not introduce too large a margin of error or uncertainty.

(* http://www.w3schools.com/browsers/browsers_explorer.asp for example... Even this collection of statistics is skewed by the nature of the sites the stats are gathered from)

The stats from that site show Firefox and Chrome to be 2x more popular than IE. Where are they getting their numbers from, just their customer base/website visitors?

si_ed · Apr 13, 2012

S.B. said:

Thank you for addressing this question. On page 17 of Dennis Technology Labs' 03/02/2012 report you state:

"In some cases a product might request a further scan to complete the removal. We considered secondary scans to be acceptable, but further scan requests would be ignored."

The above states that your testing rules provide that "scan requests [from a product undergoing testing] would be ignored". It seems clear that stopping a product from completing an ongoing task would preclude proper functioning of the product.

Thank you for any comments you have on this.

__

OK, I understand. We would not stop a scanning process that was undertaken automatically. So if Product X wanted to run a series of scans automatically, we wouldn't interfere with that.

What the above from the report means is:

1. The system is infected by the malware.
2. The system is rebooted and an on-demand scan is run.
3. The product says something to the effect of, "I couldn't remove it but please reboot and re-scan, then I'll be able to."
4. The product's instructions are followed.

If, however, it's clear that no progress is being made then we'd not follow these instructions for an infinite period of time. We draw the line at twice, although we might make an exception if the product was clearly making some progress.

Best wishes,
Simon

S.B. · Apr 13, 2012

si_ed said:

OK, I understand. We would not stop a scanning process that was undertaken automatically. So if Product X wanted to run a series of scans automatically, we wouldn't interfere with that.

What the above from the report means is:

1. The system is infected by the malware.
2. The system is rebooted and an on-demand scan is run.
3. The product says something to the effect of, "I couldn't remove it but please reboot and re-scan, then I'll be able to."
4. The product's instructions are followed.

If, however, it's clear that no progress is being made then we'd not follow these instructions for an infinite period of time. We draw the line at twice, although we might make an exception if the product was clearly making some progress.

Best wishes,
Simon

Just to be sure I understand what you are actually saying.

If the product requests a further scan to complete the removal, you do not ignore that request, as stated in your report. Instead, you grant the product's request for a further scan to complete the removal.

Is that correct?

If so, I would agree that such methodology isn't a problem (even though such methodology would be at odds with the description of the methodology in your report).

Kind regards.

__

si_ed · Apr 13, 2012

S.B. said:

Just to be sure I understand what you are actually saying.

If the product requests a further scan to complete the removal, you do not ignore that request, as stated in your report. Instead, you grant the product's request for a further scan to complete the removal.

Is that correct?

If so, I would agree that such methodology isn't a problem (even though such methodology would be at odds with the description of the methodology in your report).

Kind regards.

__

That is correct, up to a reasonable point.

If we ran a scan and the product claimed that it detected W32_Threat_ABC123, but that it needed just one more scan to remove this threat, then we'd allow it the second chance.

If it then made *the very same request*, for the third time, we'd stop the test because the product clearly isn't achieving what it hopes to.

In other words, it gets not just one attempt to remove the threat using an on-demand scan but two.

I believe that this matches what the report says we do: up to two manual scans.

We do reserve the right to be more flexible/reasonable if it is apparent that the product is making some progress. (e.g. It removes W32_Threat_ABC123 and now needs another scan to remove W32_Threat_DEF987).

But if it just repeatedly asks for a re-scan then continuing is pointless.
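
For anyone who prefers to see the rule written down, the logic above can be sketched roughly like this. It is a simplified illustration, not our actual test harness: the function name, the two-scan base allowance as a parameter, and the way "progress" is measured (fewer threats left than before the scan) are assumptions made for the example.

```python
def evaluate_cleanup(initial_threats, remaining_after_each_scan, base_scans=2):
    """Apply a 'two manual scans, more only if there is progress' rule.

    remaining_after_each_scan lists, in order, the set of threats still
    present after each manual scan the product would ask for.
    Returns (outcome, scans_run).
    """
    outstanding = set(initial_threats)
    scans_run = 0
    for remaining in remaining_after_each_scan:
        scans_run += 1
        remaining = set(remaining)
        # Assumed definition of progress: fewer threats left than before.
        made_progress = len(remaining) < len(outstanding)
        outstanding = remaining
        if not outstanding:
            return "removed", scans_run
        if scans_run >= base_scans and not made_progress:
            return "stopped: no progress", scans_run
    return "incomplete", scans_run


# Same request repeated: the test stops after the second scan,
# so a third scan that would have succeeded is never run.
print(evaluate_cleanup({"W32_Threat_ABC123"},
                       [{"W32_Threat_ABC123"}, {"W32_Threat_ABC123"}, set()]))

# Progress on the second scan earns a third one.
print(evaluate_cleanup({"W32_Threat_ABC123", "W32_Threat_DEF987"},
                       [{"W32_Threat_ABC123", "W32_Threat_DEF987"},
                        {"W32_Threat_DEF987"}, set()]))
```

The first call shows the cut-off case and the second shows the flexibility described above for a product that is visibly getting somewhere.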

Please also note that we have to figure out what most people will actually care to read about in a report. I think our methodology is pretty detailed in there - more so than in most.

It doesn't cover absolutely every possible situation and actually I think that future reports will contain even less detail. We'll probably put the methodology online as a separate document, though.

I hope that clears things up.
Have a good weekend all.
Simon

S.B. · Apr 13, 2012

si_ed said:

That is correct, up to a reasonable point.

If we ran a scan and the product claimed that it detected W32_Threat_ABC123, but that it needed just one more scan to remove this threat, then we'd allow it the second chance.

If it then made *the very same request*, for the third time, we'd stop the test because the product clearly isn't achieving what it hopes to.

In other words, it gets not just one attempt to remove the threat using an on-demand scan but two.

I believe that this matches what the report says we do: up to two manual scans.

We do reserve the right to be more flexible/reasonable if it is apparent that the product is making some progress. (e.g. It removes W32_Threat_ABC123 and now needs another scan to remove W32_Threat_DEF987).

But if it just repeatedly asks for a re-scan then continuing is pointless.

Please also note that we have to figure out what most people will actually care to read about in a report. I think our methodology is pretty detailed in there - more so than in most.

It doesn't cover absolutely every possible situation and actually I think that future reports will contain even less detail. We'll probably put the methodology online as a separate document, though.

I hope that clears things up.
Have a good weekend all.
Simon

Thank you for your clarification.

With that in mind, I must return to my original objection. In particular, if you choose to test a product that is designed to remove malware by employing three scans, it is only fair that you allow the product a chance to complete the third scan.

Although I can certainly understand that you might object to the third scan on the basis of convenience, such an objection is hardly a proper basis for presuming that the product would not remove the malware if allowed to continue as per the product manufacturer's design and intent.

Kind regards.
__
