Simpler Better Market Betas

Overview Information:
Sample Use:
$ make ; ./mkdistributable < crspsample.csv

The market-beta estimates are the result of a years-long academic study. These bswa32 market-beta estimates are known to be far better than those from Bloomberg-Merrill-Lynch (Capital IQ or Yahoo-Finance or Google-Finance), Vasicek, Dimson, industry, or any other market-beta estimate when it comes to forecasting future OLS betas (over the next 1 to 12 months, and beyond). Note that regardless of econometric estimator, it is this future not-yet-known to-be-realized OLS beta that most investors care about, because it measures the to-be-realized hedge against market-factor risk. (The lagged OLS beta is not as good a predictor of its own future self as the bswa32 estimator.)

To achieve this performance, the bswa market-beta estimator does three things:

  1. it uses daily stock returns, not monthly stock returns as inputs;
  2. it ages past returns in a smooth exponential fashion; and
  3. it removes outliers in a novel (slope-winsorized) manner that avoids biases.
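
The three ingredients above can be illustrated with a minimal Python sketch. It is not the code shipped in this directory. The function names, the winsorization band of -2 to +4 times the market return, and the decay rate of 2/252 per trading day are all assumptions made for illustration, so please check them against the paper and the actual programs before relying on them.

```python
import math


def slope_winsorize(r_stock, r_mkt, lo=-2.0, hi=4.0):
    """Clip one daily stock return to the band [lo * r_mkt, hi * r_mkt].

    The band limits lo and hi are illustrative assumptions.
    """
    a, b = lo * r_mkt, hi * r_mkt
    lower, upper = min(a, b), max(a, b)  # the band flips when r_mkt < 0
    return min(max(r_stock, lower), upper)


def aged_winsorized_beta(stock, market, decay=2.0 / 252.0, lo=-2.0, hi=4.0):
    """Weighted least-squares slope of winsorized stock returns on market returns.

    stock and market are equal-length lists of daily returns, oldest first.
    A return that is `age` trading days old gets weight exp(-decay * age),
    so older days count smoothly less. The decay rate is an assumption.
    """
    n = len(stock)
    if n != len(market) or n < 2:
        raise ValueError("need two equal-length series with at least 2 days")
    y = [slope_winsorize(s, m, lo, hi) for s, m in zip(stock, market)]
    w = [math.exp(-decay * (n - 1 - t)) for t in range(n)]
    w_sum = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, market)) / w_sum
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / w_sum
    cov = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, market, y))
    var = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, market))
    if var == 0.0:
        raise ValueError("market returns have zero variance")
    return cov / var
```

Winsorizing relative to the market return, rather than at fixed return levels, means that a stock moving 1.5 times the market is never clipped, while a single wild day cannot dominate the slope.
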
Although the inputs are daily, the files only report month-end statistics. If you need intra-month statistics, either run the code yourself, or just take a weighted average of the surrounding month-end measures.
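
The weighted average mentioned above can be sketched as a linear interpolation between the two surrounding month-end values. The function name is hypothetical, and weighting by calendar days is an assumption; weighting by trading days would be an equally defensible choice.

```python
from datetime import date


def interpolate_month_end(day, prev_end, prev_value, next_end, next_value):
    """Linearly interpolate a month-end statistic to an intra-month day.

    Weights are proportional to calendar-day distance (an assumption).
    """
    if prev_end >= next_end or not (prev_end <= day <= next_end):
        raise ValueError("day must lie between two distinct month-ends")
    frac = (day - prev_end).days / (next_end - prev_end).days
    return (1.0 - frac) * prev_value + frac * next_value


# Example with made-up beta values: 10 of the 31 days into December.
beta_dec10 = interpolate_month_end(
    date(2021, 12, 10), date(2021, 11, 30), 1.00, date(2021, 12, 31), 1.31
)
```
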

For more detail, please confer


  • Caveat: Do not believe that better betas make the CAPM work. No (ex-ante) market-beta has reliably predicted future average returns in the past, even though not only the CAPM but almost any sensible model suggests that it should. Recall what beta truly is: it is not a measure of expected returns, but a measure of the market-hedge provided by individual stocks.

    The leap to thinking that this risk should influence expected returns makes sense, but it is a leap that the data do not support. Beta is nevertheless useful for improving portfolio performance, but in a portfolio-optimization context it works through the second moment (risk), not through the first moment (expected return).
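
To make the second-moment point concrete, here is a minimal sketch of using beta as a hedge ratio. The function names are hypothetical, and the variance formula assumes that beta equals the true covariance-to-market-variance ratio over the hedging period, which a forecast can only approximate.

```python
def market_hedge_notional(position_value, beta):
    """Amount of the market portfolio to short to offset market exposure."""
    return beta * position_value


def hedged_variance(var_stock, var_market, beta):
    """Return variance left after the hedge.

    var(r_i - beta * r_m) = var_i - beta^2 * var_m holds when
    beta = cov(r_i, r_m) / var_m (assumed here).
    """
    return var_stock - beta ** 2 * var_market
```

A more accurate beta forecast lowers the variance of the hedged position; it says nothing about the position's expected return.
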

Standard Deviations

The occasionally provided standard-deviation estimates, sd0111, are very good estimates of the one-month-ahead plain standard deviation. If someone can find a simple predictor of the one-month-ahead plain standard deviation for the CRSP universe that is economically better, please let me know. (No intra-day data or implied-vol-based estimators, please, because such data neither cover enough securities nor are sufficiently widely available.)
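
For anyone taking up this challenge, a comparison could be set up along the following lines. This is only a sketch: it assumes that the "plain" standard deviation is the ordinary sample standard deviation of the daily returns in the following month, and it uses root-mean-squared error as the yardstick, which is not necessarily the economic criterion the author has in mind.

```python
import math
import statistics


def realized_sd(daily_returns):
    """Sample standard deviation of one month's daily returns (assumed target)."""
    return statistics.stdev(daily_returns)


def rmse(forecasts, realized):
    """Root-mean-squared error of forecasts against realized values."""
    if len(forecasts) != len(realized) or not forecasts:
        raise ValueError("need two non-empty series of equal length")
    squared = [(f - r) ** 2 for f, r in zip(forecasts, realized)]
    return math.sqrt(sum(squared) / len(squared))
```

A competing predictor would then be scored with `rmse` on the same stock-months as sd0111, using only information available at the end of the prior month.
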

Other Details

These files provide only estimates of prevailing in-time [a] market-betas and [b] daily rate-of-return standard deviations.

In-time means the estimates are calculated with data only up to this point in time. No future data has been used.

The estimates are forecasts of the plain OLS market-betas (and plain standard deviations) over the next 1 to 12 months. Equivalently, they are noisy estimates of the true but unknown prevailing market-betas and plain standard deviations at the end of the quoted month.

In 2020, there were 4,465,811 monthly market-beta observations by permno-month, ranging from 1926/07 to 2019/12, growing by about 4,500 x 12 every year. The compressed file was about 23MB. (There are fewer observations when stock identification is not by permno.) My intent is to update the data once a year.

Although the database contains market-beta estimates early on (i.e., in months with as-of-yet few daily return observations), it is advisable not to use market-betas when they are based on too few returns. A good filter is to use only months that also have standard-deviation observations in the database. The latter requires at least one year's worth of data in order not to be set missing. This helps with market-beta reliability.
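
The suggested filter can be sketched as follows. The column name sd0111 and the encoding of missing values (an empty field or NA) are assumptions about the file layout, and the sample rows are made up purely to exercise the function.

```python
import csv
import io


def rows_with_sd(rows, sd_column="sd0111"):
    """Keep only rows whose standard-deviation field is present.

    Missing values are assumed to appear as an empty field, NA, or NaN.
    """
    kept = []
    for row in rows:
        value = (row.get(sd_column) or "").strip()
        if value and value.upper() not in {"NA", "NAN"}:
            kept.append(row)
    return kept


# Made-up rows for illustration only; not taken from the real file.
SAMPLE = """permno,yyyymmdd,bswa32,sd0111
10001,20190131,0.912,
10001,20200131,0.954,0.021
10002,20200131,1.203,NA
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
reliable = rows_with_sd(rows)  # only the 10001 / 20200131 row survives
```
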

Utilized Source Inputs:
The only data source used to create the betasd-by-permno.csv.gz file is CRSP. Compustat information does not materially improve the estimates, despite some claims to the contrary in earlier papers.
Creating Programs: (The block sampler is in mkblock.R. Because its block-sampled [= moving-window] estimates are worse, you need to run it yourself if you insist on them.) (The programs rely on regression code (fregr), a basic (ideally pre-cleaned) CRSP daily database (with variable names explained in), and another R program calculating standard deviations [not used at the moment].)
The key output is the above gzip-compressed CSV file. On Linux and macOS, use `gunzip` to decompress it. On Windows, you need a third-party decompression program; most of them support .gz, because it is one of the oldest formats.
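
If no decompression tool is at hand, Python's standard library can do the same job on any platform. The sketch below uses a hypothetical helper name; the paths are whatever you downloaded the file as.

```python
import gzip
import shutil


def gunzip_file(src_path, dst_path):
    """Decompress a .gz file to dst_path, streaming so memory use stays small."""
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
```

For example, `gunzip_file("betasd-by-permno.csv.gz", "betasd-by-permno.csv")` would write the uncompressed CSV next to the download.
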

The betas.csv.gz file has too many lines to fit into Excel, but it reads fine into R.

Illustrated File Format (Content):
tic,	permno,	yyyymmdd,	n,	bswa32
AAC,	14944,	20211231,	1801,	0.654
AAMD,	85390,	20211231,	6111,	1.548
AAME,	15579,	20211231,	25227,	1.187
AAN,	20062,	20211231,	279,	1.127
AAP,	89216,	20211231,	5067,	1.192
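
A minimal sketch of parsing this layout in Python follows. The function name is hypothetical. Fields are stripped of surrounding whitespace, so the parser works whether or not the real file has tabs after the commas as displayed above. The two sample rows are copied from the illustration.

```python
import csv
import io


def parse_beta_rows(text_stream):
    """Parse rows of tic, permno, yyyymmdd, n, bswa32 into typed dicts."""
    reader = csv.reader(text_stream)
    header = [name.strip() for name in next(reader)]
    records = []
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        row = dict(zip(header, (f.strip() for f in fields)))
        records.append({
            "tic": row["tic"],
            "permno": int(row["permno"]),
            "yyyymmdd": row["yyyymmdd"],
            "n": int(row["n"]),  # presumably the number of daily returns used
            "bswa32": float(row["bswa32"]),
        })
    return records


SAMPLE = (
    "tic,\tpermno,\tyyyymmdd,\tn,\tbswa32\n"
    "AAC,\t14944,\t20211231,\t1801,\t0.654\n"
    "AAP,\t89216,\t20211231,\t5067,\t1.192\n"
)
records = parse_beta_rows(io.StringIO(SAMPLE))
```

To read the compressed file directly, the same function could be handed the stream returned by `gzip.open(path, "rt", newline="")`.
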
Directory check-tesla shows the calculations for Tesla in 2014-12 as an example. If you want to rewrite the code, please check first that we agree on Tesla. Note: Because older days are aged (downweighted), the bswa betas can use all data since inception, not just a few months or years.
Thanks to CRSP for providing the input data for these calculations and WRDS for making it easy to use their data.