2018-09-19

Collecting mutual fund information from N-SAR filing

Sean Shin 2018.10.30

Purpose: collect mutual fund information from N-SAR filings from 2002 through 2015 to obtain more detailed information, such as cash-like holdings, short sales, and lines of credit.

Note: the data are only available semi-annually and have to be collected directly from the SEC website.

SEC filings can be obtained either by collecting them by hand or by writing web-scraping code. Scraping is feasible because the filings follow a standard format. Either approach is fine. A program will be more efficient if you are able to write one, but please send me the code so I can double-check it.
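If you go the scraping route, one possible starting point is EDGAR's quarterly full index rather than the search page. The sketch below assumes the pipe-delimited `master.idx` files under `https://www.sec.gov/Archives/edgar/full-index/{year}/QTR{q}/` (columns CIK, Company Name, Form Type, Date Filed, Filename) and keeps every row whose form type starts with `NSAR`. The helper names are hypothetical, and when actually downloading you would need to send a User-Agent header that identifies you, as the SEC asks of automated clients.

```python
from typing import Iterator, NamedTuple

EDGAR_ARCHIVES = "https://www.sec.gov/Archives/"


class IndexRow(NamedTuple):
    cik: str
    company: str
    form_type: str
    date_filed: str
    url: str  # full URL of the complete submission text file


def master_index_urls(first_year: int = 2002, last_year: int = 2015) -> list[str]:
    """URLs of the quarterly master.idx files covering the sample period."""
    return [
        f"{EDGAR_ARCHIVES}edgar/full-index/{year}/QTR{q}/master.idx"
        for year in range(first_year, last_year + 1)
        for q in range(1, 5)
    ]


def nsar_rows(index_text: str) -> Iterator[IndexRow]:
    """Yield N-SAR rows (NSAR-A, NSAR-B, amendments, ...) from one master.idx file."""
    for line in index_text.splitlines():
        parts = line.split("|")
        # Data rows have five pipe-separated fields and a numeric CIK; the
        # description lines, the column header and the dashed separator do not.
        if len(parts) != 5 or not parts[0].strip().isdigit():
            continue
        cik, company, form_type, date_filed, filename = (p.strip() for p in parts)
        if form_type.upper().startswith("NSAR"):
            yield IndexRow(cik, company, form_type, date_filed, EDGAR_ARCHIVES + filename)
```

Each URL from `master_index_urls()` would be downloaded (for example with `urllib.request`), its text passed to `nsar_rows`, and only the rows whose CIK belongs to a sample fund kept.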

Let me first describe how to access N-SAR filings manually and specify the items we need to collect. This description is only meant to help in case you are not familiar with N-SAR filings; feel free to use your own approach if you know of, or can come up with, a more efficient one.

1. Go to the SEC website https://www.sec.gov/edgar/searchedgar/mutualsearch.html and search for mutual funds by name or ticker. This is the new search page; the older version is here: https://www.sec.gov/edgar/searchedgar/legacy/mutualsearch.htm. (For some reason, some of the tickers I tried could not be found with the new version but could be found with the old one.) Searching by ticker is the most direct way, but some sample funds may be missing a ticker. Please confirm that each ticker match is correct by checking the fund or share-class name. Names will not match exactly, so you will need to search with key words from the name.
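When matching by name, a rough keyword-overlap score can help flag which EDGAR candidates need a closer look. This is a swapped-in heuristic, not part of the procedure described above: `keywords`, `name_match_score` and the `GENERIC_WORDS` list are hypothetical and would need tuning on the actual sample names.

```python
import re

# Words too generic to confirm a match (hypothetical, hand-picked list).
GENERIC_WORDS = {
    "fund", "funds", "inc", "trust", "series", "class", "shares", "share",
    "the", "of", "and", "a", "b", "c", "i", "y", "z",
}


def keywords(name: str) -> set[str]:
    """Lower-case alphanumeric tokens of a name, minus generic words."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return {t for t in tokens if t not in GENERIC_WORDS}


def name_match_score(sample_name: str, edgar_name: str) -> float:
    """Share of the sample name's keywords that also appear in the EDGAR name."""
    wanted = keywords(sample_name)
    if not wanted:
        return 0.0
    return len(wanted & keywords(edgar_name)) / len(wanted)
```

For instance, `name_match_score("RiverSource Balanced Fund; Class A", "RIVERSOURCE BALANCED FUND")` is 1.0 because both distinctive words appear; a lower score is a cue to check the match by hand.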

2. Open the N-SAR filings. Two documents are useful: the answer file (often the first file, named answer.fil) and the complete submission text file. Both contain the answers to all items. The answer file is more condensed and machine-friendly, while the complete submission text file is more human-friendly but less condensed. For example:

[Image: documents of an N-SAR filing, showing the answer file and the complete submission text file]
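For a scripted approach, the answer file can also be pulled out of the complete submission text file, which wraps each document in SGML-style tags. This sketch assumes the usual EDGAR layout of `<DOCUMENT>` blocks carrying `<TYPE>`, `<FILENAME>` and `<TEXT>` tags; the function names are hypothetical, and it falls back to any document whose type starts with `NSAR` when no `answer.fil` is present.

```python
import re
from typing import Optional

_DOC_RE = re.compile(r"<DOCUMENT>(.*?)</DOCUMENT>", re.DOTALL | re.IGNORECASE)
_TEXT_RE = re.compile(r"<TEXT>(.*?)</TEXT>", re.DOTALL | re.IGNORECASE)


def _tag(doc: str, tag: str) -> str:
    """Value of a one-line header tag such as <TYPE> or <FILENAME>."""
    m = re.search(rf"<{tag}>([^\n<]*)", doc, re.IGNORECASE)
    return m.group(1).strip() if m else ""


def extract_answer_file(submission_text: str) -> Optional[str]:
    """Return the body of the N-SAR answer document, or None if none is found."""
    docs = _DOC_RE.findall(submission_text)
    # Prefer a document literally named answer.fil, then any NSAR-type document.
    answer = [d for d in docs if _tag(d, "FILENAME").lower() == "answer.fil"]
    nsar = [d for d in docs if _tag(d, "TYPE").upper().startswith("NSAR")]
    for doc in answer + nsar:
        body = _TEXT_RE.search(doc)
        if body:
            return body.group(1).strip("\n")
    return None
```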

3. Collect the information (items)

We need everything under items 70 and 74. The item definitions can be found here: https://www.sec.gov/about/forms/formn-sar.pdf.

For example, item 74:

[Image: definition of item 74 in Form N-SAR]

Here is an example of how each item is recorded. If you open the N-SAR file, item 74 looks like this:

[Image: item 74 records as they appear in an N-SAR file]

The first block is the item number (074), and the first letter of the second block is the sub-section of the item (A, B, C, etc.). The third block contains the numbers we want.
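Because every record follows this three-block layout, a regular expression can split it. A minimal sketch, assuming each record sits on its own line as item number, sub-section letter plus six digits, and value, and that other lines (page headers and the like) can simply be skipped. `Record` and `parse_answer_file` are hypothetical names; values are kept as text since some answers may be Y/N rather than numbers.

```python
import re
from typing import NamedTuple

# item (3 digits), whitespace, letter + 6 digits, whitespace, value
_RECORD_RE = re.compile(r"^(\d{3})\s+([A-Z])(\d{6})\s+(.*?)\s*$")


class Record(NamedTuple):
    item: str    # first block, e.g. "074"
    letter: str  # sub-section, e.g. "A"
    digits: str  # six digits after the letter, e.g. "000100"
    value: str   # third block, kept as text


def parse_answer_file(text: str, items: tuple[str, ...] = ("070", "074")) -> list[Record]:
    """Parse the records of the wanted items; non-matching lines are ignored."""
    records = []
    for line in text.splitlines():
        m = _RECORD_RE.match(line.strip())
        if m and m.group(1) in items:
            records.append(Record(*m.groups()))
    return records
```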

4. Tricky part

N-SAR filings are made at the company level rather than the individual fund level, which means that one filing may contain information for several funds. You therefore need to pick the right items for each fund. The final three digits of the second block identify the fund.¹

For example, see this filing:

https://www.sec.gov/Archives/edgar/data/52347/000095012908005777/0000950129-08-005777-index.htm

There are five funds within one file. See item 7C below.


¹ To be honest, I am not completely sure whether a fund is identified by the third digit from the right alone, by the final three digits, or even by the fourth and third digits from the right. Please check this and let me know if you find the correct rule.


[Image: item 7C of the example filing, listing the five funds]

Here, the RiverSource Balanced Fund is the first fund (final three digits 100), the Diversified Equity Income Fund is the second (final three digits 200), and so on. All items for the RiverSource Balanced Fund therefore end with 100, for example 074 A000100.
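Splitting a multi-fund filing by fund can be sketched like this. It takes the fund-identifying digits to be the final three, as described above, but exposes the width as `KEY_WIDTH` because of the uncertainty in the footnote. It also assumes the item 7C records carry each fund's name with the same trailing digits as that fund's other items; `fund_key` and `split_by_fund` are hypothetical names.

```python
import re
from collections import defaultdict

_RECORD_RE = re.compile(r"^(\d{3})\s+([A-Z])(\d{6})\s+(.*?)\s*$")
KEY_WIDTH = 3  # "final three digits" per the text; unverified, see footnote 1


def fund_key(digits: str, width: int = KEY_WIDTH) -> str:
    """Trailing digits assumed to identify the fund within a filing."""
    return digits[-width:]


def split_by_fund(text: str, width: int = KEY_WIDTH):
    """Return ({key: fund name from item 7C}, {key: [(item, letter, digits, value), ...]})."""
    names: dict[str, str] = {}
    records: dict[str, list] = defaultdict(list)
    for line in text.splitlines():
        m = _RECORD_RE.match(line.strip())
        if not m:
            continue
        item, letter, digits, value = m.groups()
        key = fund_key(digits, width)
        if item == "007" and letter == "C":
            names[key] = value  # item 7C holds the fund (series) names
        else:
            records[key].append((item, letter, digits, value))
    return names, dict(records)
```

Keys that never appear in item 7C end up in the record dictionary without a name, which is a quick way to spot digits that do not follow the assumed pattern.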

5. Final product

The final dataset should be a fund-semiannual panel with the following columns: fundid; fund name; filing date of the N-SAR form; and one column per item.

Each item variable should be named by combining the first block with the first three characters of the second block. For example, if an N-SAR filing contains the record 074 B000100 3000, there should be a variable named 074B00 with the value 3000.
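The naming rule and the panel layout can be sketched as follows. `variable_name`, `fund_row` and `write_panel` are hypothetical helpers, the column labels `fund_name` and `filing_date` are placeholders for the columns listed above, and the records passed to `fund_row` are assumed to belong to a single fund already separated from the other funds in the filing.

```python
import csv


def variable_name(item: str, second_block: str) -> str:
    """First block + first three characters of the second: 074 + B000100 -> 074B00."""
    return item + second_block[:3]


def fund_row(fundid: str, fund_name: str, filing_date: str, records) -> dict:
    """One panel row; `records` holds (item, second_block, value) tuples for ONE fund."""
    row = {"fundid": fundid, "fund_name": fund_name, "filing_date": filing_date}
    for item, second_block, value in records:
        name = variable_name(item, second_block)
        if name in row and row[name] != value:
            raise ValueError(f"conflicting values for {name}: {row[name]!r} vs {value!r}")
        row[name] = value
    return row


def write_panel(rows: list[dict], path: str) -> None:
    """Write all rows to CSV; items a fund does not report are left empty."""
    columns = ["fundid", "fund_name", "filing_date"]
    extra = sorted({k for r in rows for k in r} - set(columns))
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=columns + extra)
        writer.writeheader()
        writer.writerows(rows)
```

The `ValueError` is a deliberate choice: if two records of the same fund collapse to the same variable name, that is worth inspecting rather than silently overwriting.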

* Sample funds

I attached a list of our sample funds. It contains 624 unique funds, identified by fundid. The file has many more rows than that because I included all available tickers and names for every share class of each sample fund. N-SAR data, however, are reported per fund rather than per share class, so you can reduce the list to 624 funds by selecting one ticker (for example, the first) per fundid. I included all tickers and names in case an N-SAR filing cannot be found with one name or ticker but can be found with the name or ticker of another share class. In principle, an N-SAR filing can be found using any of a fund's tickers.

Other variables include the fund name from Morningstar (ms_fundname), the share-class name from CRSP (crsp_fundname), the CRSP fund identifier (crsp_fundno), and the share-class ticker (ticker_crsp).
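Reducing the list to one ticker per fund, while keeping the other tickers as fallbacks for the search, could look like this. It assumes the attached list is read as rows keyed by the column names given above (`fundid`, `ticker_crsp`); `tickers_by_fund` and `primary_tickers` are hypothetical helpers.

```python
from typing import Iterable, Mapping, Optional


def tickers_by_fund(rows: Iterable[Mapping[str, str]]) -> dict[str, list[str]]:
    """Map each fundid to its distinct share-class tickers, in file order."""
    by_fund: dict[str, list[str]] = {}
    for row in rows:
        tickers = by_fund.setdefault(row["fundid"], [])
        ticker = (row.get("ticker_crsp") or "").strip()
        if ticker and ticker not in tickers:
            tickers.append(ticker)
    return by_fund


def primary_tickers(by_fund: dict[str, list[str]]) -> dict[str, Optional[str]]:
    """First ticker per fund, or None when the fund has no ticker at all."""
    return {fid: (tickers[0] if tickers else None) for fid, tickers in by_fund.items()}
```

For example, open a CSV export of the list (the file name is up to you) and pass `csv.DictReader(f)` to `tickers_by_fund`; the resulting mapping should have 624 entries, and a fund whose primary ticker is `None` needs to be searched by name instead.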

