Even Anonymous Coders Leave Fingerprints

ronjor · Aug 10, 2018

Louise Matsakis 08.10.18

Researchers who study stylometry—the statistical analysis of linguistic style—have long known that writing is a unique, individualistic process. The vocabulary you select, your syntax, and your grammatical decisions leave behind a signature. Automated tools can now accurately identify the author of a forum post for example, as long as they have adequate training data to work with. But newer research shows that stylometry can also apply to artificial language samples, like code. Software developers, it turns out, leave behind a fingerprint as well.
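For anyone wondering what "statistical analysis of style" can look like in practice, here is a minimal sketch. It is not the researchers' method, just a toy illustration that assumes character trigram counts as the style features and cosine similarity as the comparison. The function names (`ngram_profile`, `cosine_similarity`, `attribute`) are made up for this example.

```python
from collections import Counter
from math import sqrt


def ngram_profile(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in a text sample."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sample carries no style signal
    return dot / (norm_a * norm_b)


def attribute(sample: str, known: dict) -> str:
    """Return the candidate whose known writing is most similar to the sample.

    This is a closed-world guess: it always picks someone from `known`,
    even when the true author is not in the pool.
    """
    profile = ngram_profile(sample)
    return max(
        known,
        key=lambda author: cosine_similarity(profile, ngram_profile(known[author])),
    )
```

Calling `attribute(sample, {"alice": "...", "bob": "..."})` returns whichever name has the closest trigram profile. Real attribution systems presumably use far richer features and much more training data, but the basic shape (turn samples into feature vectors, then compare against known authors) is the same idea.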

mirimir · Aug 10, 2018

I'm not surprised that it also applies to code. However, so far it seems to be language dependent:

There’s also the question of whether the same attribution methods could be used across different programming languages in a standardized way. For now, the researchers stress that de-anonymizing code is still a mysterious process, though so far their methods have been shown to work.

Also interesting is that spoofing is possible. And that implies that code stylometry can be readily defeated.

In a separate paper, for instance, a team led by Lucy Simko at the University of Washington found that programmers could craft code with the intention of tricking an algorithm into believing it had been authored by someone else. The team found that a developer may be able to spoof their "coding signature," even if they're not specifically trained in creating forgeries.

Also, regarding prose in different languages, they say:

Imagine you wrote a paper and used Google Translate to transform it into another language. While the text might seem completely different, elements of how you write are still embedded in traits like your syntax.

OK, so I'm somewhat active online as my meatspace identity, but in my native tongue, not in English. And I certainly don't just translate stuff. It's almost like I'm a different person when I'm thinking in American English. I draw on usage and slang that I've picked up from friends and coworkers, from many places: from "all y'all" (Southern US) to "gobsmacked" (British) to "take the decision" (Mexican). That's totally absent from prose in my native tongue, so I doubt that stylometry would link it to Mirimir.

deBoetie · Aug 11, 2018

I suspect this would depend on the programming language and the IDE. For the coding I do, the stylistic aspects are in fact constrained by the IDE and by the architectural, component, library and style guidelines imposed by the projects, so there is relatively little to go on, particularly when others are doing the same thing. I'd view what I do as being more a librarian and hooker-upper than a coder.

Of course, natural free text is far more likely to be individual, for example in the use of pronouns, so stylometry has more to work with there.

As usual, the article is somewhat coy about quantifying false positives, and obviously its reasonable success rate holds only for a known pool of candidates for whom previous examples of work are already available.

mirimir · Aug 11, 2018

They do note that those who code by searching and combining stuff are harder to identify than those who write from scratch.
