2020-01-15


BIA-656 – project 4


Problem 1

This problem involves the OJ data set which is part of the ISLR package.

a) Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations.
b) Fit a support vector classifier to the training data using cost=0.01, with Purchase as the response and the other variables as predictors. Use the summary() function to produce summary statistics, and describe the results.
c) What are the training and test error rates?
d) Use the tune() function to select an optimal cost. Consider values in the range 0.01 to 10.
e) Compute the training and test error rates using this new value for cost.
f) Repeat parts b) through e) using a support vector machine with a radial kernel. Use the default value for gamma.
g) Repeat parts b) through e) using a support vector machine with a polynomial kernel. Set degree=2.
h) Overall, which approach seems to give the best results on this data?
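
As a starting point, the first five parts for the linear kernel could be sketched as follows. This is a minimal sketch, not a reference solution: it assumes the ISLR and e1071 packages are installed, and the seed, the helper name err_rate and the ten-point cost grid are illustrative choices that do not come from the problem statement.

```r
library(ISLR)   # provides the OJ data set
library(e1071)  # provides svm() and tune()

set.seed(1)  # arbitrary seed, chosen only for reproducibility

# 800 random training rows; the remaining rows form the test set
train_idx <- sample(nrow(OJ), 800)
train <- OJ[train_idx, ]
test  <- OJ[-train_idx, ]

# Hypothetical helper: misclassification rate of a fitted model on a data frame
err_rate <- function(fit, data) mean(predict(fit, data) != data$Purchase)

# Support vector classifier (linear kernel) with cost = 0.01
fit_lin <- svm(Purchase ~ ., data = train, kernel = "linear", cost = 0.01)
print(summary(fit_lin))

# Training and test error rates at cost = 0.01
err_lin <- c(train = err_rate(fit_lin, train), test = err_rate(fit_lin, test))
print(err_lin)

# Cross-validated choice of cost on an illustrative grid spanning 0.01 to 10
costs <- 10^seq(-2, 1, length.out = 10)
tuned <- tune(svm, Purchase ~ ., data = train, kernel = "linear",
              ranges = list(cost = costs))
print(tuned$best.parameters)

# Training and test error rates at the selected cost
best_lin <- tuned$best.model
err_best <- c(train = err_rate(best_lin, train), test = err_rate(best_lin, test))
print(err_best)
```

For the radial and polynomial variants, the same calls can be reused with kernel = "radial", or with kernel = "polynomial" and degree = 2, leaving the remaining arguments at their defaults.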


Problem 2
Use a program to fit a single hidden layer neural network (ten hidden units) via back-propagation and weight decay.

Apply it to 100 observations from the model

$$Y = \sigma(a_1^T X) + (a_2^T X)^2 + 0.30 \cdot Z,$$

where $\sigma$ is the sigmoid function, $Z$ is standard normal, $X^T = (X_1, X_2)$, each $X_j$ being independent standard normal, and $a_1 = (3, 3)$, $a_2 = (3, -3)$. Generate a test sample of size 1000, and plot the training and test error curves as a function of the number of training epochs, for different values of the weight decay parameter. Discuss the overfitting behavior in each case.

Vary the number of hidden units in the network, from 1 up to 10, and determine the minimum number needed to perform well for this task.
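
One possible shape for such a program, in base R, is sketched below. It assumes the model Y = sigmoid(a1'X) + (a2'X)^2 + 0.30 Z with a1 = (3, 3) and a2 = (3, -3). Several simplifications are choices made for this sketch rather than requirements of the problem: full-batch gradient descent (one update per epoch), a response standardised with training statistics to keep the updates stable, a weight decay penalty applied to the biases as well, and illustrative values for the learning rate, the number of epochs and the decay grid. The names simulate, forward and fit_nn are hypothetical.

```r
set.seed(1)  # arbitrary seed for reproducibility

sigmoid <- function(z) 1 / (1 + exp(-z))

# Simulate n observations from the assumed model
simulate <- function(n) {
  X <- matrix(rnorm(2 * n), ncol = 2)
  a1 <- c(3, 3); a2 <- c(3, -3)
  y <- sigmoid(X %*% a1) + (X %*% a2)^2 + 0.30 * rnorm(n)
  list(X = X, y = as.vector(y))
}

train <- simulate(100)
test  <- simulate(1000)

# Standardise the response with training statistics (a choice of this sketch)
mu <- mean(train$y); s <- sd(train$y)
ytr <- (train$y - mu) / s
yte <- (test$y - mu) / s

# Forward pass: sigmoid hidden layer, linear output unit
forward <- function(X, p) {
  H <- sigmoid(cbind(1, X) %*% p$W1)             # n x hidden activations
  list(H = H, yhat = as.vector(cbind(1, H) %*% p$w2))
}

fit_nn <- function(X, y, Xte, yte, hidden = 10, lambda = 0,
                   epochs = 200, lr = 0.02) {
  # Small random starting weights; row 1 of W1 and w2[1] are biases
  p <- list(W1 = matrix(runif(3 * hidden, -0.7, 0.7), 3, hidden),
            w2 = runif(hidden + 1, -0.7, 0.7))
  n <- nrow(X)
  err <- matrix(NA_real_, epochs, 2, dimnames = list(NULL, c("train", "test")))
  for (e in seq_len(epochs)) {
    f <- forward(X, p)
    r <- f$yhat - y                               # residuals
    # Back-propagation of the half mean squared error
    g2 <- as.vector(t(cbind(1, f$H)) %*% r) / n   # output-layer gradient
    D  <- outer(r, p$w2[-1]) * f$H * (1 - f$H)    # n x hidden deltas
    G1 <- t(cbind(1, X)) %*% D / n                # hidden-layer gradient
    # Weight decay adds lambda * weight to every gradient component
    p$w2 <- p$w2 - lr * (g2 + lambda * p$w2)
    p$W1 <- p$W1 - lr * (G1 + lambda * p$W1)
    err[e, ] <- c(mean((forward(X, p)$yhat - y)^2),
                  mean((forward(Xte, p)$yhat - yte)^2))
  }
  err
}

lambdas <- c(0, 0.001, 0.01, 0.1)  # illustrative weight decay values
curves <- lapply(lambdas, function(l)
  fit_nn(train$X, ytr, test$X, yte, lambda = l))

# One panel per weight decay value: training and test error against epoch
op <- par(mfrow = c(2, 2))
for (i in seq_along(lambdas)) {
  matplot(curves[[i]], type = "l", lty = 1, col = c("blue", "red"),
          xlab = "epoch", ylab = "MSE (standardised y)",
          main = paste("weight decay =", lambdas[i]))
  legend("topright", c("train", "test"), lty = 1, col = c("blue", "red"))
}
par(op)
```

The number of hidden units can be varied by calling fit_nn with hidden set to each value from 1 to 10 and comparing the final test errors. With so few epochs the curves may show little overfitting, so a longer run or a different learning rate may be needed before the effect of weight decay becomes visible.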

Problem 3
The Bureau of Transportation Statistics maintains data on all aspects of air travel, including flight delays at departure and arrival (http://www.bts.gov). LaGuardia Airport (LGA) is one of three major airports that serve the New York City metropolitan area. United Airlines (UA) and American Airlines (AA) are two major airlines that schedule services at LGA. The zip file FlightDelays.zip contains information on all departures of these two airlines from LGA during November and December 2017. Each row of the data set is an observation and each column represents a variable.



Perform some exploratory data analysis on flight delay lengths for UA and AA.
Bootstrap the mean of flight delay lengths for each airline separately and describe the distribution.
Bootstrap the ratio of means. Provide plots of the bootstrap distribution and describe the distribution.
Find the 95% bootstrap percentile interval for the ratio of means. Interpret the interval.
What is the bootstrap estimate of the bias? What fraction of the bootstrap standard error does it represent?
For inference, we usually assume that the observations are independent. Is this condition met in this case?
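
The bootstrap steps above can be sketched in base R as follows. The contents of FlightDelays.zip are not shown here, so the sketch runs on simulated placeholder data; the column names Carrier and Delay, the sample sizes and the number of resamples B are all assumptions made for illustration and should be replaced once the real file has been read in.

```r
set.seed(1)  # arbitrary seed for reproducibility

# Placeholder data standing in for the contents of FlightDelays.zip.
# Column names and values are assumptions, not the real records.
flights <- data.frame(
  Carrier = rep(c("UA", "AA"), times = c(300, 500)),
  Delay   = c(rexp(300, 1 / 15), rexp(500, 1 / 10)) - 5
)

ua <- flights$Delay[flights$Carrier == "UA"]
aa <- flights$Delay[flights$Carrier == "AA"]

B <- 5000  # number of bootstrap resamples (illustrative)

# Resample each airline independently, with replacement, at its own sample size
boot_ua <- replicate(B, mean(sample(ua, replace = TRUE)))
boot_aa <- replicate(B, mean(sample(aa, replace = TRUE)))
boot_ratio <- boot_ua / boot_aa

obs_ratio <- mean(ua) / mean(aa)  # ratio of means in the observed sample

# Plots of the bootstrap distribution of the ratio
hist(boot_ratio, main = "Bootstrap distribution of mean(UA) / mean(AA)",
     xlab = "ratio of means")
qqnorm(boot_ratio); qqline(boot_ratio)

ci   <- quantile(boot_ratio, c(0.025, 0.975))  # 95% percentile interval
bias <- mean(boot_ratio) - obs_ratio            # bootstrap estimate of bias
se   <- sd(boot_ratio)                          # bootstrap standard error

print(ci)
print(c(bias = bias, se = se, bias_over_se = bias / se))
```

The vectors boot_ua and boot_aa also give the separate bootstrap distributions of the two means, so the same histogram and quantile calls apply to them.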
