Numbers leave a trail that the statistician can sniff and follow, says S. Ananthanarayanan.
Natural processes have characteristic patterns that get disturbed when someone intervenes with a motive. Numerical markers of what is normal can then signal anything unusual, in ways that those responsible may find hard to conceal, or faster than normal methods of detection, which may take more time or effort.
Professor Shankar Venkatagiri, mathematician and member of the Decision Sciences and Information Systems area at the Indian Institute of Management, Bengaluru, spoke at the annual meeting of the Indian Railway Accounts Service, held at the Rail Wheel Factory, Bengaluru. He described features of numbers and the way fraud detection agencies, as well as the world of business, use patterns in numbers to detect threats and opportunities.
A little-known property of numbers that arise in natural processes is that their first digit is not uniformly distributed: it tends to be low, like 1, 2 or 3, rather than high, like 8 or 9. For example, the heights of mountains in feet, or of buildings in millimeters, would typically be numbers from a few hundred to many thousands. The first digits of actual numbers such as 12,335, 8,322 or 6,345 are 1, 8 and 6. Is there a tendency for this first digit to lie preferentially in some range, rather than be uniformly distributed from 1 to 9?
One would normally expect all the digits from 1 to 9 to be equally likely as the first digit in long lists of numbers that cover many orders of magnitude (rather than stay within a limited range). But Professor Venkatagiri explained that a counter-intuitive law says this is not so. Benford's law, he said, states that 1 is the first digit as often as 30% of the time, while 9 appears in the first place only 4.6% of the time. The percentage of times each digit occurs, and a graph of how the percentages fall from 30.1% to 4.6%, is shown in the picture.
This rule, that the first digit is more often a low one, has been verified in a great many instances, like the areas of lakes in a district, population sizes, birth or death rates, electricity bills and commodity prices. These are numbers that arise naturally, without any design that affects the first digit, and they range widely: the area of a lake in square meters, or a population, could be anything from a few hundred to thousands or even hundreds of thousands. This would not be the case with, say, the height in inches of the average 12-year-old, which would lie between 50 and 60 inches, making 5 the most common first digit.
While it seems surprising at first that the first digit is more often low than high, a little analysis makes it understandable. The digit 1 occurs as the first digit in 1 itself, then in the numbers 10 to 19, then in 100 to 199, and so on. Similarly, 2 occurs in 2 itself, then in 20 to 29, then in 200 to 299, and 3 in 3, in 30 to 39 and in 300 to 399, and so on. Notice that 1 comes round again as first digit within nine numbers of its first appearance (at 10), and then after a wait of just 80 numbers (20 to 99). But 2 has to wait 18 numbers for its first repetition (at 20), and then 170 numbers (30 to 199) for its second. The wait keeps growing like this for the digits 3 to 9: before 9 returns as first digit for the second time, at 900, the 800 numbers from 100 to 899 have to pass, ten times the wait of only 80 numbers for the digit 1. Over each higher power of ten the gaps grow tenfold, so wherever a long list of numbers happens to stop, the low digits have usually had their turn at the front while the high digits often have not. For data spread evenly on a logarithmic scale across many orders of magnitude, this advantage settles into the percentages given above, from 30.1% for 1 down to 4.6% for 9.
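To see this at work on a list that spans many orders of magnitude, the Python sketch below, our own illustration rather than anything from the talk, counts the first digits of the powers of 2 from 2^1 to 2^1000. Each doubling moves the number a fixed step along a logarithmic scale, so the sequence keeps sweeping through every power of ten and lingers where the first digit is low. The function names are hypothetical.

```python
from collections import Counter

def leading_digit(n: int) -> int:
    """Return the first (most significant) digit of a positive integer."""
    return int(str(n)[0])

def first_digit_shares(values) -> dict[int, float]:
    """Fraction of the values whose first digit is 1, 2, ..., 9."""
    counts = Counter(leading_digit(v) for v in values)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}

# 2**1 up to 2**1000: the last of these has 302 digits, so the list
# sweeps through about three hundred powers of ten.
powers_of_two = [2 ** k for k in range(1, 1001)]

for digit, share in first_digit_shares(powers_of_two).items():
    print(f"{digit}: {share:.1%}")
```

In this list exactly 301 of the 1000 powers begin with 1: each time the doubling crosses a power of ten, from 10 up to 10^301, the first power of 2 beyond it must lie below twice that power of ten, and so starts with 1.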
Application in business
This is why, in a collection of numbers that covers a wide range, the first digits follow Benford's law. A direct application is to capture the numbers generated in a system and keep checking whether their first digits follow the law. One kind of bank fraud, for instance, involves the daily interest calculated on balances. The fraudster manipulates the system to add a small figure to the interest worked out on a thousand accounts and transfers the total to a separate account that the fraudster can access. A bank with a Benford's law check in place would regularly inspect the first digits, and some other features, of the numbers in its records. If all is well, the numbers follow Benford's law; a systematic change being made would show up in how the first digits appear and alert the bank's auditors.
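One way such a check might look, as a minimal Python sketch: it assumes the amounts (interest credits, bill totals and the like) are available as a list of finite numbers, and compares how often each first digit occurs with Benford's expected share using a chi-square statistic. The statistic, the 5% threshold and the names `leading_digit`, `benford_chi_square` and `needs_audit` are our illustrative choices, not a description of any bank's actual audit system.

```python
import math
from collections import Counter

# Expected share of each first digit d under Benford's law: log10(1 + 1/d)
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Chi-square critical value for 8 degrees of freedom at the 5% level
CHI2_CRITICAL_8DF = 15.507

def leading_digit(amount) -> int:
    """First significant digit of a nonzero amount, e.g. 8322.5 -> 8, 0.042 -> 4."""
    # Strip leading zeros and the decimal point: '0.042' -> '42', '1e-05' stays '1e-05'
    return int(str(abs(amount)).lstrip("0.")[0])

def benford_chi_square(amounts) -> float:
    """Chi-square distance between the observed first digits and Benford's law."""
    digits = [leading_digit(a) for a in amounts if a != 0]
    n = len(digits)
    if n == 0:
        raise ValueError("need at least one nonzero amount")
    counts = Counter(digits)  # missing digits count as 0
    return sum((counts[d] - n * p) ** 2 / (n * p) for d, p in BENFORD.items())

def needs_audit(amounts) -> bool:
    """Flag a batch whose first digits depart from Benford's law at the 5% level."""
    return benford_chi_square(amounts) > CHI2_CRITICAL_8DF
```

Here `needs_audit([2 ** k for k in range(1, 1001)])` returns False, while `needs_audit(range(100, 1000))`, in which every first digit occurs exactly 100 times, returns True. As the height example earlier warns, the check only makes sense for amounts that span several orders of magnitude, and a flag is a reason to look closer, not proof of fraud.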
Data collected through surveys offer a similar application. Figures from honest surveys show features that do not appear in fictitious data, or even in data with errors in sampling. Statistical checks on the numbers could then show where corrections need to be applied. Such checks could be vitally important in statistical quality control or in checks that ensure safety.
Prof Venkatagiri went on to describe other uses of capturing and analysing numbers, such as maintaining law and order, public health, and scheduling material movement or public transport. A major area of use was advertising and marketing: clicks on the pages of search engines like Google, or in the course of online purchases, are captured and used to send specially selected advertisements to individual users, based on their browsing behavior. Prof Venkatagiri also described how Google may be able to detect an epidemic before a state's health administration comes to know of it. Particularly in countries where medical help or medicines are expensive, the occurrence of symptoms shows up first in the searches Internet users carry out, rather than in the records of their visits to doctors or hospitals. Google could hence use its data to alert governments to an apparent rise in the incidence of, for instance, body pain and fever, and so set in motion a process of investigation and containment.
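The alerting idea can be sketched with a simple moving-baseline rule in Python. This is our own illustration under stated assumptions, not how Google's system worked: it assumes hypothetical weekly counts of searches for symptoms such as fever and body pain in one region, and flags any week whose count exceeds the average of the preceding eight weeks by more than three standard deviations. The function name `rising_weeks` and both thresholds are hypothetical.

```python
from statistics import mean, stdev

def rising_weeks(weekly_counts, baseline_weeks=8, k=3.0):
    """Return the indices of weeks whose search count exceeds the mean of the
    preceding `baseline_weeks` weeks by more than k standard deviations."""
    if baseline_weeks < 2:
        raise ValueError("need at least two baseline weeks to estimate spread")
    flagged = []
    for i in range(baseline_weeks, len(weekly_counts)):
        window = weekly_counts[i - baseline_weeks:i]
        mu, sigma = mean(window), stdev(window)
        if weekly_counts[i] > mu + k * sigma:
            flagged.append(i)
    return flagged

# Hypothetical weekly counts: steady for nine weeks, then a sharp rise
counts = [100, 104, 98, 101, 99, 103, 97, 102, 100, 180, 240]
print(rising_weeks(counts))  # [9, 10]
```

Once a spike enters the baseline window it raises the threshold for the following weeks, which is why a sustained rise needs to keep climbing to stay flagged; a real system would also correct for seasonal patterns and media-driven searches.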
The Indian Railway Accounts Service was created in 1929, soon after the decision in 1924 that the accounts of the Railways should be separated from the general budget. The anniversary is celebrated every year with a get-together of serving and retired officers all over the country.
The Indian Railways were among the first organisations in India to automate part of their work, using digital computers to account for goods and passenger earnings as early as 1970. Financial accounting was modernised and went digital by 1984, before the better-known initiatives in passenger reservation. A large part of goods and freight booking is now conducted in sidings, with all activities, including accounting, computerised.
Prof Venkatagiri said that the huge volume of data generated could be mined with tools now available, to reveal trends and prompt action on purchase, replacement or repair, or passenger amenities, leading to economy and value as well as greater safety.
Benford's law
The logarithm of a number (to base 10) is the power to which 10 must be raised to give that number. Logarithms are useful because, if we know the logs of numbers, we can multiply the numbers simply by adding their logarithms. The logarithms of numbers from 1 to 10, which lie between zero and one, have hence been worked out to many decimal places and tabulated; the logarithm of any other number differs from one of these only by a whole number. These tables were exceedingly useful for work in science and technology for centuries, before we had calculators and computers.
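How a log table turned multiplication into addition can be mimicked in a few lines of Python, with `math.log10` standing in for the printed table. This is a minimal sketch; the function names are our own and hypothetical.

```python
import math

def multiply_with_logs(a: float, b: float) -> float:
    """Multiply two positive numbers as a log-table user would:
    add their base-10 logarithms, then raise 10 to the sum (the antilogarithm)."""
    return 10 ** (math.log10(a) + math.log10(b))

def characteristic_and_mantissa(x: float) -> tuple[int, float]:
    """Split log10(x) into its whole-number part (the characteristic) and its
    fractional part (the mantissa). The mantissa is the logarithm of a number
    between 1 and 10, which is all a printed table needed to list."""
    lg = math.log10(x)
    whole = math.floor(lg)
    return whole, lg - whole
```

For instance, `multiply_with_logs(2.5, 40)` gives 100 up to floating-point rounding, and `characteristic_and_mantissa(250)` returns 2 and about 0.398, the logarithm of 2.5, since 250 is 2.5 times 10 squared.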
In 1881, the astronomer Simon Newcomb noticed that the early pages of a book of logarithms were more worn with use than the later pages. This suggested that the numbers arising in scientific work more often started with low digits like 1 and 2 than with high digits like 8 and 9. The physicist Frank Benford noticed the same phenomenon again in 1938 and tested numbers from different domains, such as the surface areas of 335 rivers, the sizes of 3259 US populations, 104 physical constants, 1800 molecular weights, 5000 entries from a mathematical handbook, 308 numbers contained in an issue of Reader's Digest, the street addresses of the first 342 persons listed in American Men of Science and 418 death rates (credit: Wikipedia).
Benford then established the rule, which has been named after him.
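The rule is usually written as P(d) = log10(1 + 1/d) for the first digit d. In terms of the logarithms described above, it says that a number begins with d when the fractional part of its logarithm lies between log10(d) and log10(d + 1). A short Python sketch, with a function name of our own choosing, reproduces the percentages quoted earlier:

```python
import math

def benford_share(d: int) -> float:
    """Expected share of numbers whose first digit is d (1 to 9) under Benford's law."""
    if not 1 <= d <= 9:
        raise ValueError("first digit must be between 1 and 9")
    return math.log10(1 + 1 / d)

for d in range(1, 10):
    print(f"{d}: {benford_share(d):.1%}")
# 1: 30.1%
# 2: 17.6%
# 3: 12.5%
# 4: 9.7%
# 5: 7.9%
# 6: 6.7%
# 7: 5.8%
# 8: 5.1%
# 9: 4.6%
```

The nine shares add up to exactly 1, because the sum of the logarithms telescopes to log10(10).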