因特定公式,本编辑器不能很好兼容,故显示不完全,见谅!
AT H 11176 Project 2
Ioannis Papastathopoulos
November 21, 2017
• Subm ission: The submission details for the number of attempts is set to Single At t em pt . All answers related to calculation of posterior distributions must be written in a pdf or wor d
document called mat r i cul at i onnumber A2mat h. Your code must be written separately in
script scr i pt A2. R which is available on Lear n. When submitting, rename the script from scr i pt A2. R to mat r i cul at i onnumber A2. R where mat r i cul at i onnumber refers to your ma- triculation number. Failure to rename the script file will incur a 5% penalty.
You have t o subm it a single script file and a single p df/ word docum ent . Failure to comply with this will incur a 5% penalty. Guidance on what your answer must be is given in each question. Your answer for each question must be included in the corresponding section of scr i pt A2. R file or the corresponding section of mat r i cul at i onnumber A2mat h. For example, your answer/ code for question 1.1 must be included in the section below
## ; ;
##
## Q1: – – add your code bel ow
##
## ; ;
## 1. 1
code goes her e
##
• G uidance – Assessm ent crit eria.
– □ A marking scheme is given. Additionally to the marking scheme, your code will be
assessed according to the following criteria:
□ Style: follow ht t ps: / / googl e. gi t hub. i o/ st yl egui de/ Rgui de. xml with care;
□ Writ ing of funct ions: avoid common pitfalls of local vs global assignments; wrap your code in a coherent set of instructions and try to make it as generic as possible; Also, functions that are meant to be optimized with opt i mmust be written accordingly, see ?opt i m.
□ Execut ability: your code must be executable and should not require additional code in order to run. A common pitfall is failure to load Rpackages required by your code.
• D eadline: Sunday 3rd December 23:59.
• Individual feedback will be given.
1 Quest ion 1
Consider a set of data relating two score tests, LSAT and GPA, at a sample of 15 American law
schools. Of interest is the correlation = cor (l sat ; gpa) between these measurements and the
variance ratio = var(l sat )=var(gpa).
l i st ( l sat = c( 576, 635, 558, 578, 666, 580, 555,
661, 651, 605, 653, 575, 545, 572, 594) ,
gpa = c( 3. 39, 3. 30, 2. 81, 3. 03, 3. 55, 3. 07, 3. 00,
3. 43, 3. 36, 3. 13, 3. 12, 2. 74, 2. 76, 2. 88, 2. 96) , n = 15)
1. Write a function in R called CI . cor that returns 95% bootstrap confidence intervals for the
correlation parameter using the basic bootstrap interval method and the percentile interval
method.
Your answer should contain the R function CI . var . cor .
(12 m arks)
2. Write a function in R called CI . var . r at i o. that returns 95% bootstrap confidence intervals
for the variance ratio using the basic bootstrap interval method and the percentile interval
method.
Your answer should contain the R function CI . var . r at i o.
(13 m arks
2 Quest ion 2
Let X1; X2; : : : be a sequence of independent and identically distributed random variables. An asymptotically justified model for the distribution function of the excesses Xi j Xi > u ab ove a large t hreshold u is given by the generalized P aret o distribution function
FX i jX i > u (x) = 1
{
1 +
( x
u ) }
1=
x > u;
where
2 R, > 0 and x+ = max(x; 0).
In all subsequent exercises, you are allowed to use the following function called f i t . gpd which
fits the generalized Pareto distribution to exceedances of data x above the threshold u = t hr esh.
f i t . gpd <- f unct i on( x, t hr esh, t ol . xi . l i mi t =5e- 2, . . . )
{
l l i k. gpd <- f unct i on( par , x, t hr esh, t ol . xi . l i mi t =5e- 2)
{
y <- x[ x>t hr esh]
si gma <- exp( par [ 1] )
xi <- par [ 2]
n <- l engt h( y)
i f ( abs( xi ) <=t ol . xi . l i mi t )
{
l l i k <- n*l og( si gma) + sum( ( y– t hr esh) / si gma)
r et ur n( l l i k)
}
par . l og. max <- l og( pmax( 1+xi *( y– t hr esh) / si gma, 0 ) )
l l i k <- – n*l og( si gma ) – ( 1+( 1/ xi ) ) *sum( par . l og. max )
l l i k <- – i f el se( l l i k > – I nf , l l i k, – 1e40 )
r et ur n( l l i k)
}
f i t <- opt i m( par = c( 0, 0) , f n = l l i k. gpd,
x=x, t hr esh=t hr esh,
cont r ol =l i st ( maxi t =10000, . . . ) )
si gmahat <- exp( f i t $par [ 1] )
xi hat <- f i t $par [ 2]
r et ur n( c( si gmahat , xi hat ) )
}
##
> f i t . gpd( par =c( 0, 0) ,
x=bur l i ngt on$Pr eci pi t at i on,
t hr esh=quant i l e( bur l i ngt on$Pr eci pi t at i on, 0. 8) )
[ 1] 1. 4869389 0. 2784732
##
2.1 P aram et ric b oot st rap
If X1; : : : ; Xn is a sequence of independent and identically distributed random variables and pu = Pr (Xi > u), it then follows that the number of exceedances above u, say Nu = # f Xi > ug, has
probability mass function
Pr (Nu = k) =
( n)
k
pku (1 pu )n k ; k = 0; : : : ; n;
that is, Nu Binomial (n; pu ). This suggests the following
PARAMETRIC BOOTSTRAP ALGORITHM
1. Fit by maximum likelihood the generalized Pareto model to exceedances above a threshold u. Set counter I = 0, bootstrap sample size R;
2. Increment I to I + 1. Simulate Nu
I
from Binomial(n; p^u );
3. Simulate Nu
I
exceedances from the fitted generalized Pareto model;
4. Fit the generalized Pareto model to the simulated exceedances to get
I
= (^ I ; ^ I );
5. If I < R go back to step 2. Otherwise return
( 1; : : : ; R ):
Write a function called par boot . gpd that takes as inputs a vector of numeric values for the raw data, a threshold argument t hr esh and bootstrap size R. The function should return a named list with elements
• $ml e: a vector containing the maximum likelihood estimates of , and pu ;
• $bi as: a vector containing the estimated bias of ^, ^ and p^u ;
• $se: a vector containing the standard error of ^, ^ and p^u ;
• $di st n: a matrix of dimension R by 3 with the bootstrap distribution of ^, ^ and p^u .
Your answer should contain the R function par boot . gpd.
(10 m arks)
2.2 N on-param et ric b oot st rap
NON-PARAMETRIC BOOTSTRAP ALGORITHM
1. Fit by maximum likelihood the generalized Pareto model to exceedances above a threshold u. Set counter I = 0, bootstrap sample size R;
2. Increment I to I + 1. Simulate n variates from the empirical distribution function
^(x) =
1
n
∑n
i= 1
1(Xi x):
3. Fit the generalized Pareto model to the exceedances above u to get
I
= (^ I ; ^ I );
4. If I < R go back to step 2. Otherwise return
( 1; : : : ; R ):
Write a function called npboot . gpd that takes as inputs a vector of numeric values for the raw data, a threshold argument t hr esh and bootstrap size R. The function should return a named list with elements
1. $ml e: a vector containing the maximum likelihood estimates of , , and pu
2. $bi as: a vector containing the estimated bias of ^, ^ and