In Brazil, about 20 percent of adults are illiterate. Yet prior to the late ’90s, votes were cast via paper ballots with only written instructions, and citizens had to write down their vote. In 1998, the government rolled out electronic voting technology using visual aids that were much simpler to understand. Fujiwara (2015) studies the impact of this new technology on the 1998 elections when the devices were first rolled out. He exploits the fact that, due to a limited supply of devices, the technology was only used in municipalities with at least 40,500 registered voters, according to 1996 voter rolls.
To analyze the data, you will need to install a new package. In Stata, enter
search rdrobust, all
then click on
st0366 from http://www.stata-journal.com/software/sj14-4
to install it. Use describe to figure out which of the variables in the dataset are needed in the problems below.
1. Initially, our main outcome of interest will be the number of valid votes cast, with the theory being that with a more understandable voting technology, fewer flawed ballots will be submitted.
(a) Define the population model.
(b) Assess the plausibility of the RD identification conditions.
(c) Interpret the RD estimand, and discuss its relevance for the question being studied.
2. Let’s begin our analysis with the most basic RD graph, plotting the outcome against the running variable.
(a) Consider the subsample of observations for which the running variable is between 4,500 and 100,000. Using twoway, graph this subsample on a scatterplot along with a quadratic best fit line on each side of the discontinuity using qfit. Also display a vertical line at the discontinuity using the xline option. (Hint: you can use the option lc(blue) with qfit to make the best fit line blue. You can do the same with the scatterplot using the option mc(blue).)
(b) From the result in the previous question, it might not surprise you to learn that most papers don’t display the raw data like that. Instead, they divide the outcome into bins and plot the average outcome within each bin to get a cleaner graph. Let x be the running variable.
i. Input egen bin_x=cut(x), at(500(4000)200000) to create a new binned version of x.
ii. Use the egen command to generate the mean outcome variable by values of bin_x.
iii. Redo the graph from part (a), except replace the scatterplot with one plotting mean outcome against bin_x (keep the qfit commands the same).
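The three binning steps above can be sketched as follows. This is only an illustration on simulated data: the variables x and y, the simulated jump of 0.10, and the seed are all assumptions standing in for the real dataset, whose variable names you still need to look up with describe. The /// line continuations require running the code from a do-file.

```stata
* Simulated stand-in data; x (running variable) and y (outcome) are placeholders
clear
set seed 12345
set obs 2000
generate x = 500 + floor(runiform()*99500)
generate y = 0.70 + 0.10*(x >= 40500) + 0.000001*(x - 40500) + rnormal(0, 0.05)

* Step i: label each observation with the left endpoint of its bin
egen bin_x = cut(x), at(500(4000)200000)

* Step ii: mean outcome within each bin
egen mean_y = mean(y), by(bin_x)

* Step iii: binned means, with quadratic fits on the raw data on each side
twoway (scatter mean_y bin_x if inrange(x, 4500, 100000), mc(blue)) ///
       (qfit y x if inrange(x, 4500, 40499), lc(blue))               ///
       (qfit y x if inrange(x, 40500, 100000), lc(blue)),            ///
       xline(40500) legend(off)
```

Because the bin endpoints start at 500 and step by 4,000, the threshold of 40,500 falls exactly on a bin boundary, so no bin mixes treated and untreated municipalities.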
3. To estimate the treatment effect, implement RD using an OLS regression with a linear specification on both sides of the discontinuity. Remember to center your running variable at the threshold. Discuss your results. What’s the meaning of the magnitude of the treatment effect estimate?
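One possible way to set up the centered linear specification is sketched below. It runs on simulated data, so the names voters, valid, xc, and treat, as well as the simulated jump of 0.10, are assumptions made for illustration and not variables from the actual dataset.

```stata
* Simulated stand-in data; all variable names here are hypothetical
clear
set seed 12345
set obs 2000
generate voters = 500 + floor(runiform()*99500)
generate valid  = 0.70 + 0.10*(voters >= 40500) ///
                  + 0.000001*(voters - 40500) + rnormal(0, 0.05)

* Center the running variable so that the threshold sits at zero
generate xc    = voters - 40500
generate treat = xc >= 0

* treat##c.xc expands to treat, xc, and treat x xc,
* which allows a different intercept and slope on each side
regress valid i.treat##c.xc

* With xc centered, the coefficient on 1.treat is the estimated jump at the threshold
display "Estimated discontinuity: " _b[1.treat]
```

Centering matters because the coefficient on the treatment dummy measures the gap between the two fitted lines where xc equals zero. Without centering, that gap would be evaluated at zero registered voters, far from the threshold.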
4. Repeat the previous problem for two more specifications. The first should use a quadratic specification on both sides of the discontinuity. The second should use a cubic specification. Compare your results with the previous problem.
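The higher-order specifications can be written compactly with factor-variable notation, as in the sketch below. The data are simulated and the names x, y, xc, and treat are hypothetical. Measuring xc in thousands of voters is a convenience assumed here to keep the squared and cubed terms at a manageable scale; it is not something the problem requires.

```stata
* Simulated stand-in data; variable names are hypothetical
clear
set seed 12345
set obs 2000
generate x     = 500 + floor(runiform()*99500)
generate xc    = (x - 40500)/1000      // centered, in thousands of voters
generate treat = xc >= 0
generate y     = 0.70 + 0.10*treat + 0.001*xc + rnormal(0, 0.05)

* Linear: separate intercept and slope on each side
quietly regress y i.treat##c.xc
display "linear:    " _b[1.treat] "  (se " _se[1.treat] ")"

* Quadratic: adds xc^2 and its interaction with treat
quietly regress y i.treat##c.xc##c.xc
display "quadratic: " _b[1.treat] "  (se " _se[1.treat] ")"

* Cubic: adds xc^3 and its interaction with treat
quietly regress y i.treat##c.xc##c.xc##c.xc
display "cubic:     " _b[1.treat] "  (se " _se[1.treat] ")"
```

In each case the coefficient on 1.treat remains the estimated jump at the threshold, which makes the three specifications directly comparable.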
5. Next we’ll try nonparametric RD by restricting our sample to a neighborhood around the threshold.
(a) Rerun the linear specification using the subsample of observations whose running variable is within 5,000 of the threshold. Repeat with 10,000 in place of 5,000. Compare your point estimates, standard errors, and sample sizes to previous regressions.
(b) The command rdbwselect y x, kernel(uni) bwselect(IK) computes an “optimal” bandwidth for outcome y and running variable x. The bandwidth is saved in the stored result e(h_IK), which is a scalar and not a variable in the dataset. Because rdbwselect assumes a cutoff of 0 unless told otherwise, x here should be the centered running variable. Run this command first, and then rerun the linear specification using the subsample of observations whose running variable is within e(h_IK) of the threshold. Compare your point estimates, standard errors, and sample sizes to previous regressions.
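The windowed regressions in parts (a) and (b) might be organized as in the sketch below. The data are simulated and the names x, y, xc, and treat are hypothetical. The second half assumes the st0366 version of rdbwselect is installed; it is wrapped in capture so that the sketch still runs if the command is missing or does not accept bwselect(IK).

```stata
* Simulated stand-in data; variable names are hypothetical
clear
set seed 12345
set obs 2000
generate x     = 500 + floor(runiform()*99500)
generate xc    = x - 40500
generate treat = xc >= 0
generate y     = 0.70 + 0.10*treat + 0.000001*xc + rnormal(0, 0.05)

* Part (a): fixed windows around the threshold
foreach h in 5000 10000 {
    display "Window: +/- `h'"
    regress y i.treat##c.xc if abs(xc) <= `h'
}

* Part (b): data-driven window (skipped if rdbwselect is unavailable)
capture noisily rdbwselect y xc, kernel(uni) bwselect(IK)
if _rc == 0 {
    * Copy the bandwidth to a local before regress replaces the e() results
    local h = e(h_IK)
    regress y i.treat##c.xc if abs(xc) <= `h'
}
```

Saving e(h_IK) in a local macro first is a defensive choice: every estimation command overwrites the e() results, so the bandwidth would otherwise be lost after the first regression.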
6. For all future problems, we will use the linear specification with the optimal bandwidth. Make sure to rerun the rdbwselect command with each new outcome and running variable.
Let’s look at the effect of the policy on electoral outcomes. Replace the outcome with the share of the vote that went to right-wing parties, measured by the variable right.
(a) Run the regression, and replicate the binned mean scatterplot for this new outcome.
(b) Comment on the results. What’s a story for the sign of the point estimate?
7. Run balance tests using at least four other relevant variables included in your dataset. Comment on the results and what the tests mean.
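A balance test reuses the same RD regression with a predetermined covariate in place of the outcome, so a loop keeps the code short. The sketch below uses simulated data with two made-up covariates (cov1 and cov2) that have no true jump, and a placeholder bandwidth of 10,000; with the real data you would pick actual covariates and recompute the bandwidth for each one.

```stata
* Simulated stand-in data; cov1 and cov2 are made-up covariates
clear
set seed 12345
set obs 2000
generate x     = 500 + floor(runiform()*99500)
generate xc    = x - 40500
generate treat = xc >= 0
generate cov1  = 5 + 0.00001*xc + rnormal()
generate cov2  = 0.3 + rnormal(0, 0.1)

local h = 10000   // placeholder bandwidth, not an optimal one

* Same linear RD specification, one covariate at a time
foreach v of varlist cov1 cov2 {
    quietly regress `v' i.treat##c.xc if abs(xc) <= `h'
    display "`v': jump = " %9.4f _b[1.treat] "   se = " %9.4f _se[1.treat]
}
```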
8. We’ll do a visual placebo test. To the binned mean plot in question #2, add the same scatterplots and quadratic fits for two new outcomes: the share of valid votes in the 1994 elections (prior to the rollout of electronic voting) and the share of valid votes in the 2002 elections (when all municipalities used electronic voting). Comment on the shape of the plots.
9. Lastly, we’ll implement a visual version of McCrary’s density test. Create a new variable tabulating the number of observations within each bin of the bin_x variable. (Hint: generate a variable ones equal to 1 for all observations. Then use egen to create a new variable that sums ones within each bin.) Graph a scatterplot of the new variable against bin_x, restricted to values of the running variable between 15,500 and 100,000. Also include a vertical line at the threshold. What does the graph indicate?
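The counting step described in the hint could look like the sketch below. It runs on a simulated running variable named x (a placeholder) that is uniform by construction, so it shows only the mechanics and says nothing about what the real data will show. The name n_bin is likewise an assumption.

```stata
* Simulated stand-in running variable; x is a placeholder name
clear
set seed 12345
set obs 2000
generate x = 500 + floor(runiform()*99500)

* Same bins as in question 2
egen bin_x = cut(x), at(500(4000)200000)

* Count observations per bin by summing a column of ones
generate ones = 1
egen n_bin = total(ones), by(bin_x)

* Bin counts against bin location, with a line at the threshold
twoway scatter n_bin bin_x if inrange(x, 15500, 100000), xline(40500)
```

Every observation in a bin carries the same count, so the points within a bin overlap exactly and the graph shows one marker per bin.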