This article demonstrates that market-capitalization (market-cap) breakpoints are usually based solely on market conventions and not empirical evidence. This is not to say the conventions are incorrect. Rather, we show there is very limited empirical evidence for the use of common factors (e.g., bid/ask spread and number of shares traded) to substantiate where breakpoints are set.
Methods Of Determining Market-Cap Breakpoints
As calculated by index providers, market-cap breakpoints delineate a prescribed percentage of the total market capitalization of the specified universe of stocks. A common set of breakpoints is 70, 20 and 10. With stocks ranked from largest to smallest, the 70 refers to all the stocks that make up the first 70 percent of the total market capitalization of the universe under examination. This first breakpoint typically determines which stocks are large-capitalization (large-cap) stocks, and the 70 percent cutoff also determines the so-called large-cap floor.¹ The next 20 percent of market capitalization covers midcap stocks, with the last value in that range being the midcap floor. The final 10 percent of market capitalization covers small-caps (and in some cases, micro-cap stocks as well²); the market capitalization of the first stock after the midcap floor can be called the small-cap ceiling. Not all index providers use the 70, 20, 10 breaks, but most of their breakpoints are within 5 percentage points either side of these values (e.g., 65, 20, 15).
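The 70/20/10 procedure can be sketched in Python as follows. This is a minimal illustration, not any index provider's published methodology: the function names are hypothetical, and the rule that a stock whose cumulative share crosses a cutoff falls into the next (smaller) group is our assumption, since providers may treat that boundary stock differently.

```python
from itertools import accumulate
from typing import Dict, List, Optional, Sequence, Tuple


def assign_size_buckets(
    market_caps: Sequence[float],
    cutoffs: Tuple[float, ...] = (0.70, 0.90, 1.00),  # cumulative 70 / 20 / 10 split
    labels: Tuple[str, ...] = ("large", "mid", "small"),
) -> Dict[str, List[float]]:
    """Rank stocks largest to smallest and bucket them by cumulative share of total cap.

    Hypothetical boundary rule: a stock joins the first bucket whose cumulative
    cutoff is not exceeded once the stock itself is added.
    """
    ranked = sorted(market_caps, reverse=True)
    total = sum(ranked)
    buckets: Dict[str, List[float]] = {label: [] for label in labels}
    for cap, running in zip(ranked, accumulate(ranked)):
        share = running / total
        for cutoff, label in zip(cutoffs, labels):
            if share <= cutoff + 1e-12:  # small tolerance for floating-point sums
                buckets[label].append(cap)
                break
    return buckets


def breakpoints(buckets: Dict[str, List[float]]) -> Dict[str, Optional[float]]:
    """Large-cap floor, midcap floor and small-cap ceiling (None if a bucket is empty)."""
    return {
        "large_floor": min(buckets["large"], default=None),
        "midcap_floor": min(buckets["mid"], default=None),
        "small_ceiling": max(buckets["small"], default=None),
    }


caps = [400, 200, 150, 100, 80, 40, 20, 10]  # toy market caps, e.g. in $ billions
print(breakpoints(assign_size_buckets(caps)))
# {'large_floor': 200, 'midcap_floor': 100, 'small_ceiling': 80}
```

In this toy universe the large-cap group ends up holding 60 percent of total capitalization rather than 70, which shows how much the boundary rule matters when only a few stocks straddle a cutoff.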
Mutual fund analytical firms such as Lipper and Morningstar also compute market-cap breakpoints. These breakpoints are used to classify mutual funds by market capitalization, i.e., large-, mid- or small-cap. These firms' breakpoints are more often than not similar to or the same as those calculated by index providers.
Finally, asset management firms also calculate market-cap breakpoints. These are used internally to set the market-capitalization guidelines for portfolio managers. The breakpoints are set so that both management and the portfolio manager can confirm that a particular manager's market-cap-weighted portfolio has stayed within its stated capitalization range. For example, the prospectus of a mutual fund may state that it is a large-cap fund. Using the internal capitalization guidelines, the fund manager knows the range to stay within so that the portfolio's market capitalization does not fall below the large-cap floor.
None of these calculators of market-cap breakpoints has shown the rules for determining them, i.e., why 70, 20, 10 is used. The author's discussions with asset management firms point to a combination of subjective judgment and implied empirical data to justify the breakpoints, so it seems market conventions and internal politics are the primary drivers of the market-cap decision process.
In more than 10 years of experience, the author has rarely seen the "70, 20, 10" rule (or slightly varying versions of it) violated in the U.S. As best the author can tell, the same is true for the G-7 countries.
However, as study after study over the last 13 years has shown, for developed markets there is no significant dependence between market cap and any of the following: trading volume; transaction volume per trade; transaction value per hour or tick; total number of orders; bid/ask spread; or volatility. The implication of these results is that most, if not all, of the commonly cited empirical reasons for setting market-cap breakpoints where they are do not hold for developed-market countries. This article confirms these results for developed-market countries and shows that for developing-market countries, market-cap breakpoints have no empirical basis either (except in certain cases where a micro-cap breakpoint or a micro-cap/small-cap breakpoint appears).
Universality And Scaling
The work of Plerou et al. established the existence of universality and scaling in financial data. Their work has been extended by several authors, including Zumbach, Li et al. and Kertesz and Eisler. We show that it is the existence of universality and, in particular, scaling that refutes the claim that market-cap breakpoints are, in general, empirically based.
To begin our discussion of scaling, we note that many empirical quantities cluster around a typical value: the speeds of cars on a highway, the weights of apples in a store, air pressure, sea level, the temperature in New York at noon on July 4. All of these things vary somewhat, but their distributions place a negligible amount of probability far from the typical value, making the typical value representative of most observations. For instance, it is a useful statement to say that an adult male American is about 71 inches tall, because few deviate very far from this size. Even the largest deviations, which are exceptionally rare, are still only about a factor of 2 from the mean in either direction; hence, the distribution can be well characterized by stating just its mean and standard deviation.
Not all distributions fit this pattern, however; while those that do not are often considered problematic or defective, they can be some of the most interesting observations. The fact that they cannot be characterized as simply as other measurements is often a sign of a complex underlying process that merits further study.
Among such distributions, the power law has attracted particular attention over the years for its mathematical properties, which sometimes lead to surprising physical consequences. The power law appears in a diverse range of natural and man-made phenomena. For example, the populations of cities, the intensities of earthquakes and the sizes of power outages are all thought to have power-law distributions. Quantities such as these are not well characterized by their typical or average values. For instance, according to the 2000 U.S. Census, the average population of a city, town or village in the United States is 8,226. This average is not a useful one for most purposes because a significant fraction of the total population lives in cities (New York, Los Angeles, etc.) whose population is larger by several orders of magnitude.
The main property of scaling (or power) laws is their scale invariance. Given a relation $f(x) = a x^k$, scaling the argument $x$ by a constant factor $c$ causes only a proportionate scaling of the function itself. That is,

$$f(cx) = a(cx)^k = c^k f(x) \propto f(x).$$

Scaling by a constant $c$ simply multiplies the original power-law relation by the constant $c^k$. It follows that all power laws with a particular scaling exponent are equivalent up to constant factors, since each is simply a scaled version of the others. This behavior produces a linear relationship when logarithms are taken of both $f(x)$ and $x$, since $\log f(x) = \log a + k \log x$; the straight line on the log-log plot, whose slope is the exponent $k$, is often called the "signature" of a power law. With real data, such straightness is a necessary but not sufficient condition for the data to follow a scaling relationship (see Stumpf and Porter for the necessary and sufficient conditions). In this article, we use Stumpf and Porter's guidelines, as well as those of others, to confirm whether or not our results exhibit scaling.
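A quick numerical check of the two properties above, scale invariance and the straight log-log line, is sketched below in Python. The parameter values `a`, `k` and `c` are arbitrary illustrations, not estimates from the article's data.

```python
import math


def power_law(x: float, a: float, k: float) -> float:
    """f(x) = a * x**k."""
    return a * x ** k


a, k, c = 2.0, 0.8, 10.0  # illustrative parameters, not fitted values
xs = [1.0, 10.0, 100.0, 1000.0, 10000.0]

# Scale invariance: f(c*x) / f(x) is the same constant c**k for every x.
ratios = [power_law(c * x, a, k) / power_law(x, a, k) for x in xs]

# Log-log signature: the slope between any two points on the log-log plot is k.
slopes = [
    (math.log10(power_law(x2, a, k)) - math.log10(power_law(x1, a, k)))
    / (math.log10(x2) - math.log10(x1))
    for x1, x2 in zip(xs, xs[1:])
]
print([round(r, 6) for r in ratios], [round(s, 6) for s in slopes])
```

Every ratio equals $c^k$ and every slope equals $k$; with real data, the scatter around that line is what the goodness-of-fit measures later in the article quantify.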
Figures 7-9 show the results for all the countries examined (U.S., Great Britain, Japan, Switzerland, Malaysia, Korea, Hong Kong, India, Brazil, Israel, Italy and South Africa). However, plots for only the U.S. and Malaysia are shown in Figures 1-6. The U.S. can serve as a reasonable proxy for most of the countries on the list, while Malaysia can stand in for the few countries where there are either two or more scaling exponents or no scaling exponent at all.
The possible existence of scaling does not in itself conflict with the existence of market-cap breakpoints. If the breakpoints do exist, there should be a different scaling exponent for each capitalization group. Within each capitalization group, one would expect the relationship between an "activity" (such as bid/ask spread) and market capitalization to be approximately the same from stock to stock. If it is, the group has a single scaling exponent that describes most, if not all, of the stock-"activity" relationship across that group's market-capitalization range. Conversely, the relationship between "activity" and capitalization should differ between groups; e.g., the scaling exponent relating an "activity" to midcap stocks should differ from the exponent relating the same "activity" to small-caps. If it does not, then the "activity" cannot distinguish small-caps from midcaps.
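The within-group comparison described above can be sketched by fitting a separate log-log slope inside each capitalization group. This sketch uses plain ordinary least squares for brevity, whereas the article follows Zumbach's errors-in-both-variables fit; the function names and group labels are hypothetical. If the breakpoints were real, the fitted exponents would differ materially from group to group.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def ols_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Ordinary least-squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def exponents_by_group(
    caps: Sequence[float], activity: Sequence[float], groups: Sequence[str]
) -> Dict[str, float]:
    """Fit log10(activity) = const + k * log10(cap) separately inside each cap group.

    Each group needs at least two distinct market caps.
    """
    pts: Dict[str, List[tuple]] = defaultdict(list)
    for cap, act, g in zip(caps, activity, groups):
        pts[g].append((math.log10(cap), math.log10(act)))
    return {g: ols_slope([p[0] for p in ps], [p[1] for p in ps]) for g, ps in pts.items()}
```

With a single power law across the whole range, every group returns the same exponent; only genuinely different per-group exponents would count as support for the breakpoints.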
To make this clearer, consider daily stock returns, which are generally accepted not to follow a normal distribution. Bouchaud and Potters, among others, have established that stock return series typically follow two if not three distributions, each with its own scaling exponent. For example, "extreme" negative returns typically follow a power law whose scaling exponent is approximately 1.5, extreme positive returns have a slightly larger scaling exponent (closer to 2), and the remainder of the return series follows a power law with a scaling exponent of approximately 2. Here, then, are three different scaling exponents in the same time series. We look for tripartite scaling exponents in our market-capitalization breakpoint tests, since three distinct exponents would indicate that the "activity" data validate the breakpoints.
Our tests take the form of plotting the logarithm of the market cap against the logarithm of an "activity," such as the bid/ask spread, that is thought to be related to market cap. A single scaling exponent (a single line through the data) indicates that the "activity" does not confirm the market-cap breakpoints. More than one scaling exponent could confirm the breakpoints, and we examine the situations where this occurs. We follow the guidelines established by Stumpf and Porter by using four or five orders of magnitude in our calculations (depending on the market, this can correspond to market capitalizations running from tens or hundreds of millions of U.S. dollars to hundreds of billions of U.S. dollars). We also follow Zumbach in calculating the linear fits, which lets us assess the goodness of fit.
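One way to ask whether a log-log scatter needs one line or two is to search over candidate break points and compare a single-line fit with the best two-segment fit. The sketch below does this with ordinary least squares and uses the Bayesian information criterion (BIC) to penalize the extra parameters. The BIC penalty, the parameter count and the function names are our choices for illustration, not the article's procedure, which relies on Stumpf and Porter's guidelines and Zumbach's fits.

```python
import math
from typing import Sequence, Tuple


def line_sse(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Residual sum of squares of an ordinary least-squares line y = b0 + b1 * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))


def best_split(xs: Sequence[float], ys: Sequence[float], min_pts: int = 3) -> Tuple[float, float]:
    """Try every split of the x-sorted points into two independently fitted lines.

    Returns (combined SSE, first x of the right-hand segment) for the best split.
    """
    pts = sorted(zip(xs, ys))
    sx = [p[0] for p in pts]
    sy = [p[1] for p in pts]
    best_sse, best_x = math.inf, math.nan
    for s in range(min_pts, len(pts) - min_pts + 1):
        sse = line_sse(sx[:s], sy[:s]) + line_sse(sx[s:], sy[s:])
        if sse < best_sse:
            best_sse, best_x = sse, sx[s]
    return best_sse, best_x


def bic(sse: float, n: int, n_params: int) -> float:
    """Gaussian BIC up to a constant; the floor avoids log(0) on noise-free data."""
    return n * math.log(max(sse, 1e-300) / n) + n_params * math.log(n)


def prefers_two_exponents(
    log_caps: Sequence[float], log_activity: Sequence[float], min_pts: int = 3
) -> Tuple[bool, float]:
    """True if two log-log lines beat one under BIC, plus the estimated break (log10 cap)."""
    n = len(log_caps)
    sse_one = line_sse(log_caps, log_activity)
    sse_two, break_x = best_split(log_caps, log_activity, min_pts)
    # One line: 2 parameters. Two lines plus a break location: 5 (our counting choice).
    return bic(sse_two, n, 5) < bic(sse_one, n, 2), break_x


# Synthetic example: slope 0.5 below log10(cap) = 10, slope 1.2 above it.
xs = [8 + 0.25 * i for i in range(17)]
ys = [0.5 * x + 0.7 * max(0.0, x - 10) + 0.02 * (-1) ** i for i, x in enumerate(xs)]
print(prefers_two_exponents(xs, ys))
```

On the synthetic kinked data the two-segment model wins and the break lands near log10(cap) = 10; on a single straight line with the same small wiggle, the one-line model wins.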
We show in the next section that for most of the countries we examined, there is a single scaling exponent across market capitalizations, not multiple ones. This lack of multiple exponents argues against the existence of empirical evidence for market-cap breakpoints, and this appears to be true regardless of the measures ("activities") used, whether in this article or in others. (As an aside, the work of Zumbach and Li et al. shows that multiple exponents do not appear at different time scales either.)
The "activities" we examine are the bid/ask spread, volume (number of shares traded) and price per share times volume traded (often referred to as "dollar volume traded"; since we are working with several countries, this activity could equally be "pounds sterling volume traded" or "ringgit volume traded"). Like Kertesz and Eisler and others, we look at the logarithm of the "activity" versus the logarithm of market capitalization. We plot individual data points, since we average each stock's data over a single month rather than over years. We look at the period from January 2000 to February 2012 on a month-by-month basis and fit a straight line through the data, following the method of Zumbach.³
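The monthly averaging step might look like the following sketch. The record layout, the field names and the choice of a spread measured relative to the bid/ask midpoint are all assumptions; the article does not specify its data schema or whether the spread is absolute or relative.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Tuple


@dataclass
class DailyRecord:
    """One stock-day; this schema is hypothetical, not a Thomson Reuters field layout."""
    ticker: str
    day: date
    close: float       # price per share, local currency
    volume: float      # number of shares traded
    bid: float
    ask: float
    shares_out: float  # shares outstanding


def monthly_log_activities(
    records: Iterable[DailyRecord],
) -> Dict[Tuple[str, int, int], Dict[str, float]]:
    """Average each stock's daily values over a calendar month, then take log10.

    Keys are (ticker, year, month). Months whose average volume (and hence value
    traded) or spread is zero are dropped, since their logarithm is undefined.
    """
    # Running sums: market cap, volume, price * volume, relative spread, day count.
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0.0, 0])
    for r in records:
        acc = sums[(r.ticker, r.day.year, r.day.month)]
        acc[0] += r.close * r.shares_out
        acc[1] += r.volume
        acc[2] += r.close * r.volume
        acc[3] += (r.ask - r.bid) / ((r.ask + r.bid) / 2)  # spread relative to midpoint (assumption)
        acc[4] += 1
    out = {}
    for key, (cap, vol, value, spread, n) in sums.items():
        means = {"cap": cap / n, "volume": vol / n, "value": value / n, "spread": spread / n}
        if all(v > 0 for v in means.values()):
            out[key] = {name: math.log10(v) for name, v in means.items()}
    return out
```

Each (ticker, month) entry then supplies one point per "activity" on the log-log plots, with `"cap"` on the x-axis.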
Scaling Exponents For Market-Caps And Various 'Activities' In Different Countries
In this article's tests, daily data are used; the stocks are those in the Thomson Reuters Country Stock Indexes. Most if not all of these indexes include some micro-cap stocks. Including or excluding micro-cap stocks does not affect our breakpoint examination, since we are looking for small-, mid- and large-cap breakpoints.
In Figures 1 and 2, it is clear that volume and dollar volume traded each have a single scaling exponent when plotted against market cap. The lack of multiple exponents means anyone pointing to either of these variables as an empirical justification for the existence of breakpoints in the U.S. would be mistaken. In Figure 3, there is a second scaling exponent, appearing at a market cap of approximately 10^3 million U.S. dollars, i.e., $1 billion. Given its size, this breakpoint signifies either a small-cap floor or a micro-cap ceiling. Either way, it does not separate small-, mid- and large-caps, since a single scaling exponent covers that range.
In Figure 4, there is evidence of two scaling exponents. One exponent's market-cap range starts between 1 million and 1.5 million ringgits and ends between 2.5 million and 3 million ringgits. The second exponent's range starts at approximately 3 million ringgits and covers the rest of the market-cap range. There is some evidence of two scaling exponents but none of three. And, as can be seen graphically, the spread of points about the fits is wide, suggesting the data may not follow a power law. We note this poor-to-middling fit in Figure 7 and discuss its implications later.
The same issues that appeared in the log-volume plot appear in the log-spread plot: there appear to be two scaling exponents, but there is also clear evidence of a poor-to-middling fit. Note that the end of one scaling exponent's range (and the start of the other's) again falls between 2.5 million and 3 million ringgits, just as in the log-volume plot.
The plot of price times volume has the best linear fit and shows no sign of multiple scaling exponents. Here there is clear evidence that price times volume does not support market-cap breakpoints.
Figures 7-9 detail the results for each "activity" for each country. Since the results were very consistent across the monthly periods (January 2000 to February 2012), we show one value per country per activity, along with the goodness of fit.
In the developed-market countries, both volume measures, especially price times volume, have single exponents and excellent fits. The two exceptions are Switzerland and Italy, where volume traded and bid/ask spread have only a moderate goodness of fit, while price times volume has an excellent fit. Keep in mind that both Switzerland and Italy have a substantially smaller number of stocks in their universes, which contributes to the lack of fit. However, it may also be that neither country's data follow a power law.
For developing-market countries, there is consistently a single scaling exponent and, in general, an excellent fit. As mentioned earlier, Malaysia is the country with the consistently poor fit. Although Malaysia is not plagued by a lack of stocks, its data clearly do not follow a scaling law. So, in Malaysia's case, we cannot say that the empirical evidence fails to support market-cap breakpoints.
We have shown that in most of the countries examined, there is good-to-excellent evidence that empirical data such as volume, bid/ask spread and price times volume do not support commonly used market-cap breakpoints. These results are in line with other work that looked at other "activities" on both a daily and intraday basis. Again, our conclusions do not invalidate current market conventions. What our research shows is that in most cases, empirical measures of stock activity do not support the common market-cap breakpoints.
1. Plerou, V. et al. "Universal and Nonuniversal Properties of Cross Correlations in Financial Time Series," Physical Review Letters. 1999, vol. 83, 7.
2. Plerou, V. et al. "Random Matrix Approach to Cross Correlation in Financial Data," Physical Review E. 2002, vol. 65.
3. Zumbach, G. "How the Trading Activity Scales with the Company Size in the FTSE 100," Quantitative Finance. 2004, vol. 4, 4.
4. Li, W. et al. "Financial Factor Influence on Scaling and Memory of Trading Volume in Stock Markets," Physical Review E. 2011, vol. 84.
5. Kertesz, J. and Eisler, Z. "Limits of Scaling and Universality in Stock Market Data," (Online) Dec. 21, 2005. (cited: Jan. 12, 2012) http://arxiv.org/abs/physics/0512193v1.
6. Stumpf, M.P.H. and Porter, M.A. "Critical Truths about Power Laws," Science. 2012, 335.
7. Bouchaud, J-P and Potters, M. "Theory of Financial Risk and Derivative Pricing," Cambridge University Press, 2000.
1 Mega-cap stocks are considered to be part of the group comprising the 70 percent. While this article does not specifically work with mega-cap stocks, the conclusions it draws about market-cap breakpoints apply to mega-cap stocks as well.
2 Most market-capitalization schemes that are tripartite in nature, such as the one above, do not tend to include micro-cap stocks. Market participants who are interested in micro-cap stocks typically make an estimate as to where small-caps end and micro-caps begin. This article shows that such estimates may suffer from the same problem that makes other market-cap breakpoints hard to justify empirically.
3 The usual least squares (LSQ) estimate assumes a variance in the y variable but none in the x variable, which is assumed to be known exactly. However, market capitalization is a time series and therefore has its own variance. To account for this, we use a more sophisticated LSQ estimate with errors in both variables. Zumbach used an LSQ estimator that is more complex to compute, since it involves a minimization problem (to find the best parameters) and finding the roots of a one-dimensional function (to compute the errors on the parameters). The values of the minimum (slope and intercept) and the errors on the parameters (standard deviations of the slope and intercept) are fairly insensitive to the choice of standard deviation (s1 vs. s2), but the goodness of fit depends directly on that choice. Since s1 is lower than s2, it produces a systematically worse goodness of fit. For this reason, we use the more conservative standard deviation, s1.
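The two-step estimator described in note 3 can be sketched in pure Python. This is a minimal illustration assuming the "effective variance" chi-square, χ²(a, b) = Σ (y_i − a − b·x_i)² / (σ_y,i² + b²·σ_x,i²), which follows the same general approach as the errors-in-both-coordinates fit in Numerical Recipes; we do not know the exact form of Zumbach's implementation. The function names, the slope search interval `[lo, hi]` and the use of reduced χ² (rather than an incomplete-gamma probability) as the goodness-of-fit measure are our assumptions.

```python
import math
from typing import Sequence, Tuple


def _profile(b: float, xs, ys, sx, sy) -> Tuple[float, float]:
    """For a fixed slope b: the chi-square-optimal intercept and the chi-square value."""
    w = [1.0 / (syi ** 2 + b ** 2 * sxi ** 2) for sxi, syi in zip(sx, sy)]
    a = sum(wi * (y - b * x) for wi, x, y in zip(w, xs, ys)) / sum(w)
    chi2 = sum(wi * (y - a - b * x) ** 2 for wi, x, y in zip(w, xs, ys))
    return a, chi2


def fit_with_xy_errors(
    xs: Sequence[float], ys: Sequence[float], sx: Sequence[float], sy: Sequence[float],
    lo: float = -5.0, hi: float = 5.0, tol: float = 1e-10,
) -> Tuple[float, float, float, float]:
    """Straight-line fit with errors in both x and y.

    Returns (slope, intercept, slope_error, reduced_chi2).
    """
    def chi2(b: float) -> float:
        return _profile(b, xs, ys, sx, sy)[1]

    # 1) Minimization: golden-section search for the slope on [lo, hi].
    g = (math.sqrt(5.0) - 1.0) / 2.0
    left, right = lo, hi
    while right - left > tol:
        c, d = right - g * (right - left), left + g * (right - left)
        if chi2(c) < chi2(d):
            right = d
        else:
            left = c
    slope = (left + right) / 2.0
    intercept, chi2_min = _profile(slope, xs, ys, sx, sy)

    # 2) Root finding: bisect for chi2(slope +/- delta) = chi2_min + 1 on each side.
    def half_width(direction: float) -> float:
        step = 1e-3
        while chi2(slope + direction * step) < chi2_min + 1.0:
            step *= 2.0
            if step > 1e6:  # chi-square never rises by 1: error is unbounded
                return math.inf
        inner, outer = 0.0, step
        while outer - inner > tol:
            mid = (inner + outer) / 2.0
            if chi2(slope + direction * mid) < chi2_min + 1.0:
                inner = mid
            else:
                outer = mid
        return (inner + outer) / 2.0

    slope_err = (half_width(1.0) + half_width(-1.0)) / 2.0
    return slope, intercept, slope_err, chi2_min / (len(xs) - 2)
```

Halving every σ leaves the fitted slope unchanged but quadruples the reduced χ², which mirrors the note's point: the parameters are insensitive to the choice of standard deviation, while the goodness of fit depends on it directly.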