The data comes from users' computers and NOT from the users themselves - that is the difference. Some products use a user-based approach: if a user clicks 'Allow', their database logs it, and once x% of users have clicked 'Allow' the product will always choose 'Allow' automatically. That is NOT what we do.
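For illustration only, the vote-based approach described above (the one we do not use) might look something like the sketch below. The function name, the 80% threshold, and the 'ask' fallback are all made up for the example and are not taken from any real product.

```python
def vote_based_decision(allow_clicks, total_prompts, threshold=0.8):
    """Hypothetical user-vote rule: auto-allow once enough users clicked 'Allow'.

    `threshold` stands in for the "x%" in the description; 0.8 is an
    arbitrary assumption, not a value from any real product.
    """
    if total_prompts == 0:
        # No votes recorded yet, so the product has to ask the user.
        return "ask"
    share = allow_clicks / total_prompts
    return "allow" if share >= threshold else "ask"
```

The weakness is visible straight away: the outcome depends entirely on what people chose to click, not on what the file actually did on the machine.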
Our researchers do not sit and write definitions. They periodically update heuristics, but they do not go and find a sample of, say, XP Antivirus and mark it. If a program was not found automatically, they will mold the heuristics to handle it better where necessary. However, we see literally tens of thousands of new malicious samples per day, and the reason we can stay on top of them without huge cost is that our infrastructure is scalable - it doesn't require more manpower to handle more samples. Infections are always changing and nothing is perfect, which is why we still have researchers to keep up with mutations that can't be caught automatically. Our systems prioritize infections and report many screens full of data to the researchers so that they can quickly make a decision on a file. The database then finds correlations between the researcher's decision and other samples, automatically marks similar infections as bad, and handles variants and mutated infections based on that original decision.
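As a very rough sketch of that last step, here is how one researcher decision could be propagated to similar samples. Everything in it is an assumption for illustration: representing a sample as a set of observed behaviors, using Jaccard similarity as the "correlation", and the 0.5 threshold. It is not a description of how our database actually works.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    name: str
    behaviors: frozenset      # hypothetical: behaviors observed on the PC
    verdict: str = "unknown"  # "unknown", "good" or "bad"


def jaccard(a, b):
    """Overlap between two behavior sets, from 0.0 (none) to 1.0 (identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def propagate_verdict(decided, samples, threshold=0.5):
    """Copy a researcher's verdict onto undecided samples that look similar.

    Returns the names of the samples that were marked automatically.
    """
    marked = []
    for s in samples:
        if s.verdict != "unknown":
            continue  # never overwrite an existing determination
        if jaccard(decided.behaviors, s.behaviors) >= threshold:
            s.verdict = decided.verdict
            marked.append(s.name)
    return marked


# One manual decision on the original sample...
original = Sample("rogue_av.exe",
                  frozenset({"fake_scan", "disable_updates", "autorun", "popup"}),
                  verdict="bad")
# ...is applied to a close variant but not to an unrelated program.
variant = Sample("rogue_av_v2.exe",
                 frozenset({"fake_scan", "disable_updates", "autorun"}))
unrelated = Sample("notepad_clone.exe", frozenset({"open_file", "save_file"}))

auto_marked = propagate_verdict(original, [variant, unrelated])
```

The point of the sketch is the scaling: one decision by a person covers every sample that correlates with it, so more samples does not mean more manpower.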
False positives are a completely different story. The majority of the false positives reported here are on unpopular software, and by the time I get a log with the file in it, the database has already corrected the determination, so I don't need to do anything. Of course, there are times when a signature has become a bit too aggressive and needs taming, and the opposite is true as well. Some pieces of software do bizarre things you would never expect, which is why they get flagged. When I "fix" a false positive, I mark the original file and then forward it to the research team so they can correct that part of the heuristic engine and prevent similar false positives in future.
Our false positive rate is barely noticeable (far less than 1/1000th of 1%, i.e. under 1 in 100,000, based on some rough math) compared to the staggering number of infections we block every day and the masses of good software we see every day. FPs just rise to the top of forum posts, while real detections remain hidden because most of those users aren't on Wilders with 15 active security products.
