How antifraud works. The story of Russian bots, auction thieves and stolen billions.

Lord777

Professional
Messages
2,583
Reputation
15
Reaction score
1,296
Points
113

Article content​

  • Intro
  • Dangerous connections in Markov networks
  • Wrong friendship of Russian robots
  • Outro
Hunters of online scammers are unlikely to become the heroes of an action movie. No complicated multi-pass missions, no chases and shootouts. But who cares when there are hundreds of billions of dollars at stake? Huge amounts of money protect against criminals by using mathematical models that detect any deviation from the norm.

Intro​

In 2011, PricewaterhouseCoopers conducted a major study of online fraud. The data collected shows that during the twelve months preceding the study, 37% of Russian companies were victims of fraud, and 7% of companies admitted that they lose more than $ 100 million a year in this way. Since then, it is unlikely that anything has changed for the better.

In other countries, things are no more fun. Experts estimate that tens or even hundreds of billions of dollars flow into the pockets of online fraudsters every year. No one knows the exact scale of the disaster, because companies don't like to talk about how much money they lose because of scammers. They can be understood. Unnecessary details will only scare away customers.

Where do such monstrous sums come from? Everything is simple. Large Russian banks, such as Sberbank or Alfa-Bank, process more than a million transactions per day. Visa processed 150 million transactions a day four years ago.
Imagine that they lose ten dollars on one transaction out of a thousand. This means that such a large bank will lose approximately $10,000 per day, and $300,000 per month. At the end of the quarter, the loss will amount to $ 1 million.
It is obviously impossible to check all these transactions manually. You need automation here. Payment systems and banks have been using expert systems for many years, which, following a pre-selected set of rules, identify the most suspicious transactions. The rules are usually kept secret, but it is not difficult to guess the content of some of them. For example, tourists know that a sudden attempt to withdraw a significant amount from the account or make a large purchase in another country often leads to a card lock, and the same result is obtained by purchasing a foreign SIM card. These are the results of such rules being triggered.

The key word here is "sudden". The surest sign of fraud is abnormal behavior. This is what rule sets reveal. However, there are many other ways to look for deviations from the norm, and online fraudsters know them all. Recently, all sorts of statistical methods, machine learning and neural networks have become fashionable. In some cases, algorithms learn to distinguish cheaters by patterns (so-called supervised learning).

This is based on the same principle as email antispam, which works better if you show them what an unwanted email looks like. In other cases, the bet is made on the search for oddities or anomalies. This approach is valuable because it will not be deceived even by a completely new method of fraud. In addition, it is insured against errors that occur as a result of training on inaccurate data.

New methods provide more accurate results than traditional rule sets. A few years ago, the Visa payment system improved its system for detecting fraudulent transactions, which in the past checked about four dozen features of each transaction using a set of rules. Now it analyzes about five hundred features in real time, starting with statistics for a specific user (for example, the average number of transactions that they make during the day) and ending with the ATM number. Soon, Visa reported two billion dollars that were saved thanks to the new system.

Dangerous connections in Markov networks​

A significant proportion of this type of crime occurs at online auctions. It is understandable: it is much easier to deceive a simple user than a large bank or payment system. Customer reviews and various reputation systems don't solve the problem. On the contrary, sometimes they even help the fraudster. Building a reputation in an online auction is much easier than ingratiating yourself with a real person, and the result is the same.

A few years ago, Symantec specialists and researchers from Carnegie Mellon University discovered that criminals who trade on the largest online auction site eBay have developed a strategy that allows them to gain good ratings, deceive buyers and not be afraid of the inevitable ban.

Fraudsters assume from the very beginning that they will often have to change the accounts from which transactions are made. So that potential victims do not have any doubts, before using a fresh account, you must get a good reputation. The secret to success is to put the generation of fraudulent accounts with a good reputation on stream.

For this purpose, there are networks of accomplice accounts. When the need arises, they will quickly create a reputation for anyone. At the same time," accomplices " behave as naturally as possible, regularly interact with honest sellers and never break the law. They can be active for years without attracting the attention of the service administration.

Researchers from Carnegie Mellon University have suggested that analyzing the relationships between users of an online auction will automatically identify fraudulent and accomplice accounts. Indeed, accomplices are much more likely to interact with scammers than the average user. Fraudsters, on the contrary, never encounter other scammers — only with accomplices and honest users.

The researchers presented the auction in the form of a Markov network — an undirected graph whose vertices can be in one of several states. In our case, the vertexes correspond to accounts. They can be scammers, accomplices, or honest users — this is, to use Markov network terms, their state. If the accounts have completed at least one trade, the corresponding vertexes will be linked by an arc.

The state of each vertex in a Markov network depends on its current state and the states of its neighbors. How exactly it depends is determined by the so-called propagation matrix. It contains the most likely following states for all combinations of the current state and the state of the neighboring vertex. The researchers selected plausible probabilities manually.

To determine the most likely status of each vertex, the confidence propagation algorithm was used. First, each vertex counts its state from the propagation matrix. Then the vertices inform each other about the changed state. When they receive new data about their neighbors, they check their status. This starts the next stage of computation, followed by a new message chain. This continues until the system reaches equilibrium.

In this illustration, vertices with an undefined state are marked in gray, fraudsters are marked in red, and their accomplices are marked in yellow

In this illustration, vertices with an undefined state are marked in gray, fraudsters are marked in red, and their accomplices are marked in yellow

To test the effectiveness of this method, the researchers set up a self-made robot on eBay that collected information about users and transactions between them. Based on the resulting data set, they constructed a graph consisting of 66,130 vertices and 795,320 arcs. Ten vertices in this graph belonged to scammers who had already been caught and reported in the news. The algorithm correctly identified each of them and marked possible accomplices. There is another sign that the idea is correct: the reputation of accounts that the algorithm suspected of fraud was several times worse than that of others.

Interestingly, in order for everything to work, the algorithm does not need to know in advance who is the accomplice and who is the fraudster. You don't even need a user's reputation. Only the relationships between them can be analyzed. Everything is determined by the graph topology.

Wrong friendship of Russian robots​

In 1881, the American mathematician Simon Newcomb noticed something very strange: for some reason, the first pages of books with logarithmic tables are always more frayed than the last. And it's not that no one reads them to the end. Logarithmic tables are not an ordinary book that is supposed to be read in order. This is a tool that significantly speeds up multiplication and division of large numbers.

Pre-calculated logarithms of a set of numbers are combined into logarithmic tables. To multiply two numbers, it is enough to find the corresponding logarithms in the table, add them up, and then determine from the same table which result corresponds to the sum. This is much easier and faster than column multiplication, which is taught in school.

At the beginning of the logarithmic table, the logarithms of numbers with one in the highest digit are listed, then the logarithms of numbers starting with two, and so on to nine. If a book is more worn out at the beginning than at the end, then people need multipliers that start with one more often than numbers that start with two, let alone nine.

Simon Newcomb

Simon Newcomb

Newcomb suggested that the lower the value of the highest digit of a number, the more often it occurs. According to the formula that the scientist deduced, the probability of encountering a number with one at the beginning is about 30%. The probability decreases with each digit until it reaches 4.6% — this value corresponds to nine.

Common sense protests against this idea, but you can't argue with the facts. In 1938, the physicist Frank Benford, who independently stumbled upon the same pattern, tested the validity of his conclusions on tens of thousands of measurements. He calculated the probability with which different digits occur in the highest order of dozens of physical constants. The results matched the formula's predictions. River basin areas? The molecular weight of hundreds of chemicals? Population of randomly selected localities? Stock prices on the stock exchange? Benford checked one data set after another, but couldn't find any errors. The distribution of digits in the highest order was subject to the law that today bears his name — Benford's law.

In the early seventies, economist Hal Varian proposed using Benford's law to distinguish falsified data from genuine data. Values taken from the ceiling may look very plausible, but they do not stand up to the test of Benford's law. By the end of the twentieth century, this method was adopted by forensic accounting. They check whether the figures in the financial statements fit into the correct distribution. If Benford's law is not followed, then someone has adjusted the financial indicators.

Benford's law easily finds traces of human interference in the natural order. Do I need to explain how valuable this quality is for finding anomalies in the data? The algorithm constructed in this way is simple and efficient. However, it is not suitable for analyzing data that is obviously unnatural. This is a limitation, but who doesn't have it?

A beautiful example of using Benford's law to detect deception is provided by the recent work of Jennifer Golbeck, a well-known expert in the field of social network analysis. She showed that it can be used to expose bots — fake accounts on Facebook or Twitter.

Jennifer Golbeck

Jennifer Golbeck

Golbeck began by studying data sets on subsets of users on five major social networks: Facebook, Twitter, Google+, Pinterest и LiveJournal. In most cases, user data was extracted using the software interface of the corresponding social network. The exceptions were Google+ and LiveJournal. Information about their users was borrowed from the Stanford Network Analysis Project.

First, the researcher checked the number of links between accounts in each social network. As expected, these values coincided with the indicators predicted by Benford's law. An exception is Pinterest: when creating an account, the service adds five links automatically, and this spoils all statistics.

Golbeck then began analyzing individual accounts. She selected those that have at least a hundred social connections. It turned out that the distribution of the first significant digits of the number of "friends" in the accounts to which these connections lead almost always fits into Benford's law. For example, in the Twitter data set, a significant deviation was observed only in 1% of cases.

And what is this percentage? Golbeck checked 170 Twitter accounts that do not comply with the Benford law, and found that only two of them are not suspicious. The vast majority of the rest turned out to be Russian bots. These accounts are very similar to each other: the user's photo is clearly borrowed from the photo bank, the tweets themselves are meaningless fragments of book quotes, and friends are other bots. They disguise themselves as ordinary people, but Benford's law easily reveals their artificiality.

Outro​

In one short article, it is impossible to list (and even more so explain) all the methods for detecting anomalies that are useful in hunting for online scammers. But such a goal is not worth it — this is not an " Anti-fraud for dummies "(such a book, by the way, exists). If you want to dive deeper into the topic, then the best way is to read academic publications.

Scholar.google.com it will help you find them. And then-himself.

(c) https://xakep.ru/2016/01/19/how-antifraud-works/
 
Top