Python基础编程:Use python to perform logistic reg
当前位置:以往案例 > >Python基础编程:Use python to perform logistic reg
2018-06-28

Workshop 9

Goal

Use python to perform logistic regression.

A trade is considered as success when it makes money. Success will be represented by 1 and failure by 0.

This strategy works with News and Sentiment. The column news intensity represents the number of news / minute and the column sentiment represents the news sentiment (1: good news…4: bad news).

Create a dataframe df by loading the data using read_csv

ts=

print (df.tail())

success news intensity price sentiment

395

0

620 60.00

2

396

0

560 45.60

3

397

0

460 39.45

2

398

0

700 54.75

2

399

0

600 58.35

3

Analyze the statistics of df

print df.XXX()

success news intensity

price sentiment

count 400.000000

400.000000 400.000000 400.00000

mean

0.317500

587.700000  50.848500

2.48500

std

0.466087

115.516536

5.708502  0.94446

min

0.000000

220.000000

33.900000

1.00000

25%

0.000000

520.000000

46.950000

2.00000

50%

0.000000

580.000000

50.925000

2.00000

75%

1.000000

660.000000

55.050000

3.00000

max

1.000000

800.000000

60.000000

4.00000

Display the mean of each column separately

print df.XXX()

success

0.3175

news intensity 587.7000

price

50.8485

sentiment

2.4850

Since the news sentiment has only 4 levels, draw the following table using crosstab

print(df.crosstab(….))

sentiment

1   2  3  4

success

0

28 97 93 55

1

33 54 28 12

Draw the histogram for each column

df.XXX()

Sentiment is a categorical variable. We are going to transform this variable into 4 dummy variables using the command get_dummies from pandas.

Example:

b=pd.DataFrame({'test' : pd.Series([1,2,3,1,2,3,1,1,1])})

print(pd.get_dummies(b['test'],prefix='test'))

test_1 test_2 test_3

0

1

0

0

1

0

1

0

2

0

0

1

3

0

1

0

4

1

0

0

5

0

1

0

6

0

0

1

7

0

0

1

8

0

0

1

Following the previous example create dummy variables for Sentiment using the function get_dummies. Store the result into data_dummy

Create a joint to keep success, news intensity, price, sentiment_2, sentiment_3 and sentiment_4

data = df[‘XXXXXXXX’].join(…..)

success

news intensity

price

sentiment_2

sentiment_3

sentiment_4

0

0

380

54.15

0

1

0

1

1

660

55.05

0

1

0

2

1

800

60.00

0

0

0

3

1

640

47.85

0

0

1

4

0

520

43.95

0

0

1

5

1

760

45.00

1

0

0

6

1

560

44.70

0

0

0

7

0

400

46.20

1

0

0

8

1

540

50.85

0

1

0

9

0

700

58.80

1

0

0

10

0

800

60.00

0

0

1

Add the intercept manually (a column named intersect will only 1)

Perform logistic regression.

Step 1: Remove the column name ‘success’

colnames=

Step 2: Create the logistic model

Logit_model= sm.Logit(data['success'], data[colnames])

Step 3: Fi the model

result = logit.fit()

Interpret the following result:

print (result.summary())

Logit Regression Results

==============================================================================

Dep. Variable:

success

No. Observations:

400

Model:

Logit

Df Residuals:

394

Method:

MLE

Df Model:

5

Date:

Sun, 20 Nov 2016

Pseudo R-squ.:

0.08292

Time:

02:54:02

Log-Likelihood:

-229.26

converged:

True

LL-Null:

-249.99

LLR p-value:

7.578e-08

==================================================================================

coef std err z P>|z| [95.0% Conf. Int.]

———————————————————————————-

news intensity

0.0023

0.001

2.070

0.038

0.000

0.004

price

0.0536

0.022

2.423

0.015

0.010

0.097

sentiment_2

-0.6754

0.316

-2.134

0.033

-1.296

-0.055

sentiment_3

-1.3402

0.345

-3.881

0.000

-2.017

-0.663

sentiment_4

-1.5515

0.418

-3.713

0.000

-2.370

-0.733

intersect


在线提交订单