Google search weirdnesses

Discussion in 'other security issues & news' started by axial, Jul 1, 2009.

Thread Status:
Not open for further replies.
  1. axial

    axial Registered Member

    Joined:
    Jun 27, 2007
    Posts:
    477
    I've been finding some really odd-ball results on Google searches and/or Google News alerts lately; is there any place in particular where one might read about or even report such apparent Google search hacks? Would it even do any good to report them?

    For example: search for the following (without quotation marks):

    olson's food emporium

    which is a store in Seattle. 4 out of 10 hits on the first page of results have URLs with a changing 6-letter code + ".ws" as the TLD (.ws is apparently Western Samoa, according to Wikipedia). One of the (slightly obfuscated here so as to be unclickable) results is:
    http://xxxx-yyyy-elegant-foods.hgg4id.ws/

    Is it nasty? I haven't clicked on the links so I don't know what they go to, but my bet is that the average surfer wouldn't even notice such strange links.

    This is just one example -- I have "collected" several other examples that can be quite easily reproduced for testing, if anybody would care for more details.
     
    Last edited: Jul 6, 2009
  2. JRViejo

    JRViejo Global Moderator

    Joined:
    Jul 9, 2008
    Posts:
    20,922
    Location:
    U.S.A.
    axial, the URLs look like doorway pages to more search results links. Use Google's Report a Spam Result unauthenticated form (no need to register) and yes, Google will take action if any abuse is found.
     
  3. axial

    axial Registered Member

    Joined:
    Jun 27, 2007
    Posts:
    477
    Thanks for the Google reporting link, JRViejo, I'll definitely submit the issue.
     
  4. JRViejo

    JRViejo Global Moderator

    Joined:
    Jul 9, 2008
    Posts:
    20,922
    Location:
    U.S.A.
    axial, you're welcome! Take care.
     
  5. callmebc

    callmebc Registered Member

    Joined:
    Jul 6, 2009
    Posts:
    7
    Umm...that's actually probably being caused by at least one really huge botnet that's been manipulating Google search results. It works via a combination of duping Google's search algorithms into thinking certain files and folders exist where they don't, and then changing the robots.txt file on the infected sites to block Google's Googlebot web crawler from verifying the actual contents of the site. I'm pretty sure Google and others know of the issue.
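The robots.txt trick described above is easy to check for. A minimal sketch, using Python's standard-library robots.txt parser; the sample robots.txt content is hypothetical, in the shape an infected site might serve:

```python
# Sketch: test whether a robots.txt denies Googlebot access, the symptom
# described above. SAMPLE_ROBOTS_TXT is a made-up example of what an
# infected site might serve.
from urllib.robotparser import RobotFileParser

SAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Allow: /
"""

def googlebot_blocked(robots_txt: str, path: str = "/") -> bool:
    """Return True if this robots.txt denies Googlebot access to path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch("Googlebot", path)

print(googlebot_blocked(SAMPLE_ROBOTS_TXT))  # True: Googlebot is shut out
```

A site that suddenly blocks only Googlebot while allowing every other crawler is a reasonable red flag, though by itself it proves nothing.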

    I just registered here mostly to ask a question, but I spotted this. The question I want to ask is semi-related: I mapped out a tiny portion of the probable botnet involved, and in the course of doing that I generated a 69-page list of infected web sites. Some were obviously throwaway sites, some were bogus results from DNS manipulation, but there were still a good many legitimate sites that were thoroughly infected, including a volunteer fire department in the US. I then discovered that there appears to be no US agency or department really geared up to do something with a list like that, regardless of whether they use the word "cyber" in their name or mission statement. US-CERT was especially useless -- the guy I talked to said they would only deal with it if the list contained government or critical infrastructure web sites (fire departments don't cut it, apparently) and that I should contact the site owners. And he had no idea who else I could contact.

    I ended up giving the info to someone at isc.sans.org who said they would deal with it, but that fire department site was infected for a while longer. When I noticed that it appeared to be no longer a botnet hub I contacted the site manager, a volunteer fireman himself, and he said he had to fix things himself (I had let him know about the problem when I noticed there was a fire department site involved, and gave him some suggestions), but that it still wasn't completely clean.

    My question is about what sort of experience any of you have had in dealing with federal and even state agencies when it comes to big cyber issues. I do have dealings with one US agency, but cyber is not their main mission and their interests there are pretty narrowly focused. I've seen and heard nothing but bad things regarding DHS, the NSA seems to have been only taking care of their own random, unhelpful side projects the past several years, and that US-CERT guy might as well have been a delivery guy who picked up the phone because nobody else was around.

    So if anyone has had similar or different experiences with these or other agencies, I'd be very curious to hear of them, especially any good experiences in particular.
     
  6. axial

    axial Registered Member

    Joined:
    Jun 27, 2007
    Posts:
    477
    callmebc, I appreciate your info on the possible reason for google search weirdnesses. I'm continuing to look for other examples.

    If I might make a suggestion, I think you would do well to create a brand new topic for your question, and repost your message -- it's an interesting topic, and I think folks here would be interested in it, but it'll be lost here under a different (wrong) msg title.
     
  7. 1boss1

    1boss1 Registered Member

    Joined:
    Jun 26, 2009
    Posts:
    401
    Location:
    Australia
    They have other domains such as wb5doj.ws and q1ww0a.com; they are being used as "feeder" sites to drive traffic to megaonlinesearch.com, which runs an affiliate script to generate revenue from clicks.

    You can report them to Google, but there's not a whole lot of point. These guys can push out thousands of these subdomains in minutes, and they do it knowing full well the domains will be banned in a week or so anyway.

    It's all about volume: they can buy a .ws for a few bucks, make a few hundred dollars before they get banned, and repeat it all over again. Most of it's automated, right down to scripts/APIs registering and paying for the domains, hence the random numbers and letters to ensure the domain isn't already taken.

    The wheels need to turn faster at Google, because the banning delay is a huge hole that's being heavily exploited.

    @callmebc - Often the best way is do a lookup on the Domain/IP and contact the sites web hosting provider if a legitimate site is infected. Most will act quite quickly because they don't want compromised servers, it's detrimental to their business and other customers sites. Hosts will have direct contact with the sites owner for billing and so forth, plus have the ability to pull the site offline so it doesn't infect regular traffic visiting the site.
     
  8. callmebc

    callmebc Registered Member

    Joined:
    Jul 6, 2009
    Posts:
    7
    I was thinking of that -- which forum do you think would be the most appropriate?

    Also if you do a Google search for an infected website plus one of the bogus folders on that site created by the botnet, that will generally list a pile of other infected sites. Some of the Google folder listings will likely be from DNS manipulation, though, and not actually physically present on the websites listed.

    For example, one infected site linked to that fire department site is studentmix.com -- some sort of site for students. According to info contained in infected files on the fire department site, the botnet created a folder called "blowjob" off of the student site -- which is actually very unusual: most of the created folders off the root have much more innocuous names like "admin" or "language" or such. So if you now do a Google search on: studentmix.com blowjob

    You should then get a few thousand or so listings of other bad sites linked in with that. There was a British high-tech company also linked to the fire department site, and I just had to search on their domain name plus a much more innocuous-sounding "sitebuilder" to generate a long list. But they seem to have cleaned things up, so there are only some lingering results now.

    If you guys are that interested, I can post some of the info I sent to the isc. Just let me know where.
     
  9. callmebc

    callmebc Registered Member

    Joined:
    Jul 6, 2009
    Posts:
    7
    The problem is that there are so friggin' many and I can pretty much generate lists of them at will now -- which gets back to my original point: is there anybody I can just hand this info to? I kind of have more than a full plate already....
     
  10. callmebc

    callmebc Registered Member

    Joined:
    Jul 6, 2009
    Posts:
    7
    This is an example of an infected webpage:

    ~Link removed. No links to possible malware are to be posted on the forums.~

    Typically the botnet will create a pile of files with names derived from legitimate names on the site. So the "l__atest-news.html" may be based on a file called "latest-news.html", and there may be several variants of the name used for other files. The infected files will be of two types: existing files like index.html getting a malicious code injection, or other existing files and stuff like the above that will usually contain a long list of other infected web sites and folders. If you load that above link into your browser (while it appears to be code-free, you might want to use Sandboxie or such) you should see an old broken Happy Valentine's Day Google page. If you then view the page source -- with word wrapping on -- you should see a long list of other infected web sites.
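A site owner could hunt for that naming pattern mechanically. A rough sketch, assuming (as in the example above) that the bot pads copied names with underscores; the filenames here are illustrative:

```python
# Sketch: flag files whose names look like mangled copies of legitimate
# filenames, per the pattern described above (e.g. "l__atest-news.html"
# derived from "latest-news.html"). The underscore-padding rule is an
# assumption based on that one example.
import re

def normalized(name: str) -> str:
    """Strip the underscore padding the bot inserts into copied names."""
    return re.sub(r"_+", "", name.lower())

def suspicious_variants(known_good: list, on_disk: list) -> list:
    """Return on-disk names that normalize to a known-good name
    without actually being one of the known-good files."""
    good = {normalized(n) for n in known_good}
    return [f for f in on_disk
            if normalized(f) in good and f not in known_good]

legit = ["latest-news.html", "index.html"]
found = ["latest-news.html", "l__atest-news.html", "in_dex.html", "about.html"]
print(suspicious_variants(legit, found))  # ['l__atest-news.html', 'in_dex.html']
```

In practice the variants could use other padding characters, so a real scan would want fuzzier matching (edit distance or similar).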

    The botnet seems to run updates pretty regularly, most frequently on the code-containing files. The code will generally not be detectable by most, if any, antivirus products right away -- it might take a week or so at best, and by that time the code will likely have changed.

    Now it's important to note that the above seemingly obviously infected page may not actually be on studentmix.com -- it might just be some DNS manipulation that only makes it appear that way, as counterintuitive as that might seem. But it's still very likely the web site -- or even the hosting site -- has been compromised in some manner. It's fairly tricky business....

    -BC
     
    Last edited by a moderator: Jul 6, 2009
  11. axial

    axial Registered Member

    Joined:
    Jun 27, 2007
    Posts:
    477
    I have other examples of Google search and Google Alert issues.

    In the "Alert" service, the results e-mailed to the user often contain "manufactured" links, i.e. web pages where the bulk of the page content is random text in English but the page is sprinkled with the Alert term, so it must have been created on the fly.

    Another search came up with an interesting twist, where the whole site was a collection of current -- and quite legitimate -- press releases. The page title (the original press release title) was intact, but the text on the page was horrendously mangled English that seemed to be slightly based on the original text.

    I hesitate to post the exact search terms for the two above examples so as not to lead anybody into trouble.
     
  12. 1boss1

    1boss1 Registered Member

    Joined:
    Jun 26, 2009
    Posts:
    401
    Location:
    Australia
    Yeah, this sort of thing is getting very common. Many of those hacked sites will not do anything malicious; most of it is link spam to game search rankings/traffic for profit, or redirects to affiliate sites.

    Typically what happens is the site owner's FTP credentials get compromised, or an insecure web script/form gets exploited, allowing the hacker to upload a c99 or r57 shell script. This gets them full control over the server or file system, allowing them to inject code into existing pages or make new ones.
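A crude first pass at finding uploaded shells like those is just scanning the web root for telltale strings. A minimal sketch; the marker list is illustrative, not a real signature set:

```python
# Sketch: flag file contents that carry strings known PHP shells (c99, r57)
# and common obfuscated droppers tend to contain. The marker list below is
# an assumption for illustration; real scanners use far richer heuristics.
SHELL_MARKERS = ("c99shell", "r57shell", "eval(base64_decode(")

def looks_like_shell(source: str) -> bool:
    """Return True if the file content contains a known shell marker."""
    lowered = source.lower()
    return any(marker in lowered for marker in SHELL_MARKERS)

print(looks_like_shell('<?php eval(base64_decode("aGVsbG8=")); ?>'))  # True
print(looks_like_shell('<?php echo "hello"; ?>'))                     # False
```

String matching like this is trivially evaded by any shell that renames or re-encodes itself, so treat a hit as a lead, not the whole audit.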

    Here it can go two ways, they can add trojan droppers and drive-by type code and be dangerous or they can inject links to boost their other spam sites or even add sub-directories with thousands of spam pages to "piggyback" on the sites authority.

    When they inject links or add pages, they usually try to make them as hidden as possible so they stay up longer. Then the botnet, as you mention (Google "Xrumer"), will spam forums, guestbooks, web 2.0 sites etc. and point links to the sites they have hacked, so these pages rank high and get more traffic.

    The "DNS Manipulation" you mentioned is likely a redirect, as soon as you land on the URL you are shunted off to some other nefarious page to either earn them money, or infect you.

    It can be hard trying to contact the site owners. For instance, the other week I emailed 20 domains letting them know their servers were exploited and hacked, and I got 6 replies. The other 14 either didn't reply or bounced.

    If it's actual malware, you can report the list of domains to Google here: http://www.google.com/safebrowsing/report_badware/

    This will get the "This site may harm your computer" warning in Google at least, and prevent a lot of people visiting the page.
     
  13. callmebc

    callmebc Registered Member

    Joined:
    Jul 6, 2009
    Posts:
    7
    Nope, it's actually full DNS trickery, especially where Google is concerned. One way it might work is that you do a search on whatever and in your search results you might get:

    1) Result 1
    http://thiswebsite/result1/

    2) Result 2
    http://anotherwebsite/admin/result2/

    3) Result 3
    http://yetanotherwebsite/result3/

    But the /result2/ and /result3/ folders don't actually exist on those sites. Google, however, thinks they are there. As I mentioned before, on these infected sites, usually the robots.txt file is changed to disallow Google's Googlebot from crawling the site. The real proof comes when that site gets cleaned up and all the bogus files and folders get deleted and the robots.txt file gets put back to normal, allowing Googlebot to do its thing. If you are the site owner and look at the subsequent error logs, you will then see Googlebot generate a pile of 404 errors looking for files and folders that were never there, even when the site was infected.
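That 404 evidence is easy to pull out of a standard access log. A sketch, assuming the Apache combined log format; the sample log lines are made up:

```python
# Sketch: list the paths Googlebot hit and got 404s for, the post-cleanup
# evidence described above. Assumes Apache combined log format; the sample
# lines below are invented for illustration.
import re

LOG_PATTERN = re.compile(
    r'"\w+ (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_404_paths(log_lines):
    """Return request paths that Googlebot requested and got a 404 for."""
    paths = []
    for line in log_lines:
        m = LOG_PATTERN.search(line)
        if m and m.group("status") == "404" and "Googlebot" in m.group("agent"):
            paths.append(m.group("path"))
    return paths

sample = [
    '1.2.3.4 - - [06/Jul/2009:10:00:00 -0400] "GET /admin/result2/ HTTP/1.1" 404 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '1.2.3.4 - - [06/Jul/2009:10:00:01 -0400] "GET /index.html HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '5.6.7.8 - - [06/Jul/2009:10:00:02 -0400] "GET /missing.html HTTP/1.1" 404 512 "-" "Mozilla/5.0 (Windows)"',
]
print(googlebot_404_paths(sample))  # ['/admin/result2/']
```

A pile of such 404s for folders that were never on disk is exactly the "files that were never there" signature described above.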

    As I said, it's somewhat tricky business....

    But getting back to my original point -- has anybody had good experiences with any federal agency dealing with stuff like this?

    -BC
     
  14. 1boss1

    1boss1 Registered Member

    Joined:
    Jun 26, 2009
    Posts:
    401
    Location:
    Australia
    Yeah, great content, huh? When they scrape (pinch) content from other sites, the chances of them ranking and getting traffic to the page are almost nil, because the document is an exact copy of an existing one. So they "spin" it using Markov chains, synonym replacement etc. and generally butcher it so it's classed as "original" and ranks.

    It reads like gibberish to humans, but search engine bots don't actually "read" articles; they look at keywords, word and phrase densities, word relationships and basic context.

    Sadly this junk often cuts the mustard enough to rank OK for less competitive search terms, so they churn these pages out with a script, slap some ads or a redirect on it, and call it a day.
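The Markov-chain spinning mentioned above can be boiled down to a toy sketch. Real spam tools are far more elaborate; this just shows why the output reads like gibberish while reusing the source's own vocabulary:

```python
# Toy sketch of word-level Markov-chain "spinning": each word is followed
# by a randomly chosen word that followed it somewhere in the source text.
# The source sentence is made up for illustration.
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words that follow it in the source."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def spin(chain, start, length, seed=0):
    """Walk the chain from 'start', picking a random successor each step."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        successors = chain.get(out[-1])
        if not successors:
            break
        out.append(rng.choice(successors))
    return " ".join(out)

source = "the quick brown fox jumps over the lazy dog while the quick dog sleeps"
chain = build_chain(source)
print(spin(chain, "the", 8))  # gibberish built only from words in the source
```

Every word in the output occurred in the source, and local two-word sequences are plausible, which is enough to fool keyword-and-density scoring even though the whole reads as nonsense.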

    With the Google Alerts, it's not an exploit or anything, since nobody knows who's got an alert set up for what terms. It's just coincidence: since they are making thousands of pages of gibberish, there's a high chance these sites will publish the word/phrase you have an alert set up for.
     
  15. 1boss1

    1boss1 Registered Member

    Joined:
    Jun 26, 2009
    Posts:
    401
    Location:
    Australia
    If I saw an actual URL showing this behavior I could tell you what was happening. Some of these exploits are pretty elaborate; this one, for example, could just be "if IP = Googlebot, show spam page, else show 404".

    This way only Google sees the spam page with links, and nobody else can, so people don't even know the pages exist. Some use cookies, some use the referrer: if I posted a link to the spam page here on Wilders, the page would show a 404; but if the referrer was the Google search result page for the term "Security" you would see the page, while if the search term was "Securities" you wouldn't.
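That cloaking logic can be sketched in a few lines. Purely illustrative: the user-agent check, the targeted search term, and the return strings are all assumptions standing in for whatever a real cloaking script does:

```python
# Sketch of the cloaking logic described above: serve the spam page only to
# Googlebot, or to visitors arriving from a Google search for the exact
# targeted term; serve a 404 to everyone else. Illustrative only.
from urllib.parse import urlparse, parse_qs

def cloaked_response(user_agent: str, referrer: str) -> str:
    """Decide what a cloaking script would serve for this request."""
    if "Googlebot" in user_agent:
        return "spam page"          # feed the crawler the link-stuffed page
    ref = urlparse(referrer)
    query = parse_qs(ref.query).get("q", [""])[0].lower()
    if "google." in ref.netloc and query == "security":
        return "spam page"          # visitor came from the targeted search
    return "404 not found"          # everyone else sees nothing

print(cloaked_response("Mozilla/5.0 (compatible; Googlebot/2.1)", ""))
print(cloaked_response("Mozilla/5.0", "http://www.google.com/search?q=security"))
print(cloaked_response("Mozilla/5.0", "http://www.google.com/search?q=securities"))
```

Note the exact-match on the query term: a search for "securities" gets the 404, matching the "Security" vs "Securities" behavior described above.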

    As for an Agency, have a look over this page: http://www.usdoj.gov/criminal/cybercrime/reporting.htm
     
  16. axial

    axial Registered Member

    Joined:
    Jun 27, 2007
    Posts:
    477
    1boss1 said:
    Isn't it more likely the opposite -- that it is an exploit -- because there would be an impossible number of alert terms that could be used, so the only way to "ensure" that the nefarious clicks get made is to do it automatically with an exploit?
     
  17. callmebc

    callmebc Registered Member

    Joined:
    Jul 6, 2009
    Posts:
    7
    It looks as though a good many of the DNS issues have been fixed since I created my initial list a couple of weeks ago. That's good at least (although enough people were told about it). One quick example I can show you that's also now fixed, but is still visible in Google's search and cache, is the result of doing a Google search on: PER-xml-20080205 talkdigger

    You should get only one result: a site called talkdigger.com that apparently is hosting a bunch of w3.org folders and files starting with the "/sioc/www.w3.org/xml/" folder. They aren't there, though, and very, very likely never were, even as bogus infected copies. But if you had clicked on that search result a couple of weeks back, you would probably have ended up on a Chinese site operated by some Russian hackers. Clicking on it now will just generate an error message. You can still click on Google's cached version, though, and see a rather odd stream-of-consciousness web page.

    As far as that DOJ site recommendation goes, I already know those guys, and they deal with specific instances of cybercrime and hacking, like, oh say, DNS poisoning involving banks and such. While they wouldn't mind getting information on this type of widespread infection, they really aren't equipped to try to contact and assist individual websites infected by a botnet. There are a lot of other agencies and government departments tasked with something involving cyber security, though, and most seem to have pretty large budgets but absolutely nothing to show for it. So I'm wondering mostly if I'm missing something here -- it doesn't seem quite right at all that the implied responsibility for contacting and trying to help these web site operators belongs to the people who notice something wrong and try to get them some help. Oh well....
     
    Last edited: Jul 6, 2009
  18. axial

    axial Registered Member

    Joined:
    Jun 27, 2007
    Posts:
    477
    The overwhelming complexity of weeding out honest trouble reports from the inevitable evil-doer-spawned reports makes it seem an impossible task.

    Maybe somebody like the Web of Trust folks could offer this as a service.
     