Tales numbers tell
(appeared in Nov 2016)


Numbers leave a trail that the statistician can sniff and follow, says S. Ananthanarayanan.

Natural processes have characteristics that get disturbed when there is deliberate, motivated action. Numerical markers of ‘normality’ can then signal anything unusual, in ways that those responsible may find difficult to conceal, or that ordinary detection would take more time or effort to uncover.

Speaking at the annual meeting of the Indian Railway Accounts Service, held at the Rail Wheel Factory, Bengaluru, Professor Shankar Venkatagiri, a mathematician and member of the Decision Sciences and Information Systems area at the Indian Institute of Management, Bengaluru, described features of numbers and the ways in which fraud detection agencies, as well as the world of business, use patterns in numbers to detect threats and opportunities.

A little-known property of numbers that arise in natural processes is that their first digit is not uniformly distributed: it tends to be low, like ‘1’, ‘2’ or ‘3’, rather than high, like ‘8’ or ‘9’. For example, the heights of mountains in feet, or of buildings in millimeters, would typically range from a few hundred to many thousands. The first digits of such numbers, say 12,335, 8,322 and 6,345, are ‘1’, ‘8’ and ‘6’. Is there a tendency for this first digit to fall preferentially in some range, rather than be uniformly distributed from ‘1’ to ‘9’?

One would normally expect all the digits from 1 to 9 to be equally likely as the first digit in long lists of numbers that cover many orders of magnitude (rather than stay within a limited range). Professor Venkatagiri explained, however, that a counter-intuitive law says this is not so. According to Benford’s law, he said, ‘1’ is the first digit as often as 30% of the time, while ‘9’ comes first only 4.6% of the time. The percentages for all the digits, and a graph of how they fall from 30.1% to 4.6%, are shown in the picture.
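The percentages in the picture follow the standard textbook form of Benford’s law, in which the share of numbers with first digit d is log10(1 + 1/d). The article does not write this formula out, so treat the short Python sketch below as an illustration of that standard statement rather than as anything from the talk; the function name `benford_probability` is our own.

```python
import math

def benford_probability(d: int) -> float:
    """Expected share of values whose first digit is d, under Benford's law."""
    if not 1 <= d <= 9:
        raise ValueError("a first digit must lie between 1 and 9")
    return math.log10(1 + 1 / d)

# Print the expected percentage for each possible first digit.
for d in range(1, 10):
    print(f"{d}: {100 * benford_probability(d):.1f}%")
```

Run as written, this prints 30.1% for ‘1’, then 17.6%, 12.5%, 9.7%, 7.9%, 6.7%, 5.8% and 5.1%, down to 4.6% for ‘9’. The nine shares add up to 1, because the logarithms telescope to log10(10).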

This rule, that the first digit is more often a low number, has been verified in a great many instances, like the areas of lakes in a district, population sizes, birth or death rates, electricity bills and commodity prices. These are numbers that arise ‘naturally’, without any design that affects the first digit. The area of a lake in square meters, or a population, for instance, could be anything from a few hundred to thousands or even hundreds of thousands. This would not be the case, say, with the heights, in inches, of 12-year-olds, which would mostly lie between 50 and 60 inches, making ‘5’ the most common first digit.
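The contrast between wide-ranging and narrow-ranging numbers can be seen in a small simulation. This is a sketch under stated assumptions: the “wide” data are hypothetical values spread evenly on a logarithmic scale from 1 to 1,000,000 (a common idealised model of numbers spanning many orders of magnitude), and the “narrow” data are hypothetical heights of 12-year-olds drawn around 55 inches with a spread of 3 inches. Neither data set comes from the article; the helper names are our own.

```python
import random
from collections import Counter

def first_digit(x: float) -> int:
    """Leading (most significant) non-zero digit of a non-zero number."""
    if x == 0:
        raise ValueError("zero has no leading digit")
    # Scientific notation puts the leading digit first, e.g. 12335 -> '1.233500e+04'.
    return int(f"{abs(x):e}"[0])

def first_digit_shares(values):
    """Fraction of values whose first digit is 1, 2, ..., 9."""
    counts = Counter(first_digit(v) for v in values)
    n = len(values)
    return {d: counts[d] / n for d in range(1, 10)}

rng = random.Random(2016)  # fixed seed so the run is repeatable

# Hypothetical wide-range data: evenly spread on a log scale from 1 to 1,000,000.
wide = [10 ** rng.uniform(0, 6) for _ in range(100_000)]

# Hypothetical narrow-range data: heights in inches, centred on 55 with spread 3.
narrow = [rng.gauss(55, 3) for _ in range(100_000)]

print({d: round(s, 3) for d, s in first_digit_shares(wide).items()})
print({d: round(s, 3) for d, s in first_digit_shares(narrow).items()})
```

The first line should show roughly 0.30 for ‘1’ falling to about 0.05 for ‘9’, close to Benford’s law, while the second puts about 90% of the weight on ‘5’, as the paragraph above leads one to expect for a quantity confined to a narrow range.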

While this preference for low first digits seems surprising at first, it can be understood with a little analysis. The digit ‘1’ occurs as the first digit, first by itself, then in the numbers from ‘10 to 19’, then from ‘100 to 199’, and so on. The digit ‘2’ similarly occurs first by itself, then from ‘20 to 29’, then from ‘200 to 299’, and so on; the digit ‘3’ first by itself, then from ‘30 to 39’, from ‘300 to 399’, and so on. Notice that ‘1’ recurs within nine numbers of its first appearance, and then after a gap of just 80 numbers (20 to 99). But ‘2’ has to wait 18 numbers before its first recurrence, and then through a gap of 170 numbers (30 to 199) before its second. The wait keeps lengthening like this for the digits ‘3’ to ‘9’: before ‘9’ appears as the first digit for the third time, it must wait through the numbers from ‘100’ to ‘899’, a gap of 800 numbers, which is ten times the gap of only 80 numbers for ‘1’. At higher numbers these gaps grow exponentially, by a factor of ten with every extra digit, so the larger the numbers, the wider the separation between successive runs of each first digit.
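The gap counting above is easy to check mechanically. In the Python sketch below (function names are our own), the runs of integers with first digit d are d itself, then 10d to 10d+9, then 100d to 100d+99, and each gap is the number of integers lying strictly between one run and the next.

```python
def runs_with_leading_digit(d: int, max_digits: int):
    """Blocks of consecutive integers whose first digit is d: d..d, 10d..10d+9, 100d..100d+99, ..."""
    return [(d * 10**k, d * 10**k + 10**k - 1) for k in range(max_digits)]

def gaps_between_runs(d: int, max_digits: int = 3):
    """Count the integers lying strictly between one run of first digit d and the next."""
    runs = runs_with_leading_digit(d, max_digits)
    return [start - end - 1 for (_, end), (start, _) in zip(runs, runs[1:])]

for d in (1, 2, 9):
    print(d, gaps_between_runs(d))
```

This prints `1 [8, 80]`, `2 [17, 170]` and `9 [80, 800]`. The second gaps are 80 and 170 for ‘1’ and ‘2’, and 800 for ‘9’ (the numbers 100 to 899), ten times the gap for ‘1’. The first gaps come out one smaller than “nine” and “18” because here we count only the integers strictly in between, rather than the step from d up to 10d.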

Application in business

This is why, in a collection of numbers that covers a wide range, the distribution of first digits follows Benford’s law. A direct application is to capture the numbers generated in a system and keep checking whether their first digits follow Benford’s law. One kind of bank fraud, for instance, involves the daily interest calculated on balances. The fraudster manipulates the system to add a small sum to the interest worked out on a thousand accounts, and transfers the total to a separate account that the fraudster can access. If the bank had a ‘Benford’s law check system’ in place, it would regularly inspect the first digits, and some other features, of the numbers in the bank’s records. If all is well, the numbers follow Benford’s law. But if a systematic change is being made, it would show up in the pattern of first digits and alert the bank’s auditors.
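The ‘Benford’s law check system’ is described only in outline, so what follows is a minimal sketch of one common way to run such a check: a Pearson chi-square comparison of observed first-digit counts with the counts Benford’s law predicts. The threshold, the sample sizes and both data sets are hypothetical illustrations, not the bank’s or Professor Venkatagiri’s method. The cut-off 15.507 is the 5% critical value of the chi-square distribution with 8 degrees of freedom (nine digits, minus one).

```python
import math
import random
from collections import Counter

# 5% critical value of the chi-square distribution with 8 degrees of freedom.
CHI2_CRITICAL_8DF_5PCT = 15.507

def first_digit(x: float) -> int:
    """Leading non-zero digit of a non-zero number, via scientific notation."""
    return int(f"{abs(x):e}"[0])

def benford_chi_square(values) -> float:
    """Pearson chi-square statistic comparing observed first digits with Benford's law."""
    counts = Counter(first_digit(v) for v in values if v != 0)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("need at least one non-zero value")
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        stat += (counts[d] - expected) ** 2 / expected
    return stat

def looks_suspicious(values) -> bool:
    """Flag a batch whose first digits stray further from Benford than chance easily explains."""
    return benford_chi_square(values) > CHI2_CRITICAL_8DF_5PCT

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical honest records: amounts spread evenly on a log scale from 1 to 100,000.
honest = [10 ** rng.uniform(0, 5) for _ in range(10_000)]

# Hypothetical tampered records: about 30% of entries overwritten with invented
# figures drawn uniformly between 100 and 1,000, so their first digits are roughly even.
tampered = [rng.uniform(100, 1000) if rng.random() < 0.3 else v for v in honest]

for label, batch in (("honest", honest), ("tampered", tampered)):
    print(label, round(benford_chi_square(batch), 1), looks_suspicious(batch))
```

With these settings the tampered batch should score far above 15.5 (a rough calculation puts it in the hundreds), while the honest batch will usually fall below the threshold. Genuine data still trips a 5% test about one time in twenty, so in practice a flag would prompt an auditor’s closer look rather than prove fraud.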

A similar application could be to data collected through surveys. Figures from honest surveys show features that do not appear in fictitious data, or even in data with sampling errors. Applying statistical checks to the numbers could then show where corrections are needed. Checks of this kind could be vitally important in statistical quality control or in ensuring safety.

IRAS Day

The Indian Railway Accounts Service was created in 1929, soon after the decision in 1924 that the accounts of the Railways should be separated from the general budget. Its anniversary is celebrated every year with a get-together of serving and retired officers across the country.

The Indian Railways was a forerunner of computerization in India, and used digital computers to account for all goods and passenger earnings as early as 1970. Financial accounting was modernized and went digital by 1984, before the better-known initiatives in passenger reservation. A large part of goods and freight booking is now conducted at sidings, with all activities, including accounting, computerized.

Prof Venkatagiri said that the huge volume of data generated could be mined with the tools now available, to reveal trends and prompt timely action on purchase, replacement or repair, or passenger amenities, which would lead to economy, better value and increased safety.

Benford's law

The logarithm (to base 10) of a number is the power to which 10 must be raised to give that number. Logarithms are useful because, if we know the logs of two numbers, we can multiply the numbers simply by adding their logarithms. The logarithms of numbers from 1 to 10, which run from zero to one, have hence been worked out to many decimal places and tabulated. These tables were exceedingly useful for work in science and technology for centuries, before we had calculators and computers.
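The two ideas behind a log table can be sketched in a few lines of Python. The first function multiplies by adding base-10 logarithms and then taking the antilog, as a table user would. The second splits a logarithm into its integer part (the characteristic) and fractional part (the mantissa). Only the mantissa, a number between 0 and 1, had to be printed, since moving the decimal point changes only the characteristic. Both function names are our own, for illustration.

```python
import math

def multiply_with_logs(a: float, b: float) -> float:
    """Multiply two positive numbers the log-table way: add the logs, then take the antilog."""
    return 10 ** (math.log10(a) + math.log10(b))

def characteristic_and_mantissa(x: float):
    """Split log10(x) into an integer part and a fractional part in [0, 1)."""
    log_x = math.log10(x)
    characteristic = math.floor(log_x)
    return characteristic, log_x - characteristic

print(multiply_with_logs(12.5, 8))          # close to 100, up to floating-point rounding
print(characteristic_and_mantissa(345.0))   # characteristic 2, mantissa about 0.5378
print(characteristic_and_mantissa(3.45))    # characteristic 0, same mantissa
```

Because 345 and 3.45 share the same mantissa, a single table entry serves both, which is why tables covering numbers from 1 to 10 were enough.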

In 1881, the astronomer Simon Newcomb noticed that the early pages of a book of logarithms were the most worn with use, rather than the later pages. This suggested that the numbers that arose in the course of scientific work started more often with the lower digits, like ‘1’ and ‘2’, than with the larger digits, like ‘8’ and ‘9’. The physicist Frank Benford noticed the same phenomenon in 1938, and he tested numbers that arose in different domains, “like surface areas of 335 rivers, the sizes of 3259 US populations, 104 physical constants, 1800 molecular weights, 5000 entries from a mathematical handbook, 308 numbers contained in an issue of Reader's Digest, the street addresses of the first 342 persons listed in American Men of Science and 418 death rates.” (credit Wikipedia).

Benford then established the rule, which has been named after him.
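The article does not write the rule out. In its usual textbook form, Benford’s law says that the first digit d appears with probability log10(1 + 1/d). One common way to motivate it, which fits Newcomb’s worn log tables, assumes that for naturally arising numbers the mantissa (the fractional part of log10 x) is spread evenly between 0 and 1. The short derivation below rests entirely on that modelling assumption, which the article itself does not state.

```latex
\begin{aligned}
&x = m \times 10^{k}, \quad 1 \le m < 10,\ k \in \mathbb{Z}
  \;\Longrightarrow\; \log_{10} x = k + \log_{10} m, \quad 0 \le \log_{10} m < 1 \\
&\text{first digit of } x \text{ is } d
  \iff d \le m < d + 1
  \iff \log_{10} d \le \log_{10} m < \log_{10}(d + 1) \\
&\text{if } \log_{10} m \text{ is uniform on } [0, 1):\quad
  P(d) = \log_{10}(d + 1) - \log_{10} d = \log_{10}\!\left(1 + \frac{1}{d}\right), \quad d = 1, \dots, 9 \\
&P(1) = \log_{10} 2 \approx 0.301, \qquad P(9) = \log_{10}\tfrac{10}{9} \approx 0.046
\end{aligned}
```

These are the 30.1% and 4.6% quoted earlier, and the nine probabilities sum to log10 10 = 1.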

------------------------------------------------------------------------------------------

Do respond to: response@simplescience.in