Scikit Learn for Stock Investing: p19

Preface

This video uses scikit-learn to test the data we re-collected in p18.

Video

Video source

Video series: Scikit-learn Machine Learning with Python and SKlearn

This video: Scikit Learn Machine Learning for investing Tutorial with Python p. 19

Bilibili: Scikit Learn Machine Learning for investing Tutorial with Python p. 19

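Content

This tutorial builds on the code from p15 and only changes the CSV file being tested.
We just need to add the following line to the p15 code to replace the N/A values.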

data_df = data_df.replace('NaN',-9999).replace('N/A',-9999)
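
One caveat about the line above: pandas' CSV reader treats the strings 'NaN' and 'N/A' as missing values by default, so by the time the replace runs those cells are usually real NaN rather than text, and a string match leaves them untouched. The following is a minimal sketch of that behaviour on a tiny made-up CSV (the tickers and values are invented for illustration, not taken from the tutorial's data); chaining fillna is one way to cover both cases.

```python
import io
import pandas as pd

# Tiny made-up CSV standing in for the real key stats file (illustrative only)
csv_text = (
    "Ticker,Price/Book,Status\n"
    "AAA,1.5,outperform\n"
    "BBB,N/A,underperform\n"
    "CCC,NaN,outperform\n"
)
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# read_csv has already converted 'N/A' and 'NaN' into missing values
n_missing_after_read = int(df['Price/Book'].isna().sum())

# The string-based replace does not match real NaN cells
replaced = df.replace('NaN', -9999).replace('N/A', -9999)
n_missing_after_replace = int(replaced['Price/Book'].isna().sum())

# fillna catches whatever is still missing
filled = replaced.fillna(-9999)
n_missing_after_fill = int(filled['Price/Book'].isna().sum())

print(n_missing_after_read, n_missing_after_replace, n_missing_after_fill)
```

If the CSV was written with some other placeholder text that pandas does not recognise as missing, the string replace is still the part that handles it, which is why the sketch keeps both steps.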

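Finally, in the line that reads the CSV, we swap in each of the two files obtained earlier, one at a time, to test the accuracy:

# pd.DataFrame.from_csv was removed in pandas 1.0; pd.read_csv with index_col=0 is the equivalent call
#data_df = pd.read_csv("key_stats_acc_perf_WITH_NA.csv", index_col=0)
#data_df = pd.read_csv("key_stats_acc_perf_NO_NA.csv", index_col=0)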

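Conclusion

This tutorial is only a simple backtest, and the accuracy it reports does not mean much on its own, because accuracy ignores how much each correct or wrong call is worth. You might win 80% of the time and make ¥1 each time, yet lose 20% of the time and lose ¥100 each time.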
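
The arithmetic behind that example can be sketched in a few lines. The helper name expected_profit_per_trade is made up for illustration; it simply weights the average win and the average loss by how often each happens.

```python
def expected_profit_per_trade(win_rate, avg_win, avg_loss):
    """Expected profit per trade: wins and losses weighted by their probabilities."""
    return win_rate * avg_win - (1.0 - win_rate) * avg_loss


# 80% of trades make 1 yuan, 20% of trades lose 100 yuan
ev = expected_profit_per_trade(win_rate=0.8, avg_win=1.0, avg_loss=100.0)
print(round(ev, 2))  # -19.2
```

So a strategy that is right 80% of the time would still lose about ¥19.2 per trade on average under these numbers.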

Source code

import numpy as np
from sklearn import svm, preprocessing
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
from matplotlib import style
style.use("ggplot")

FEATURES =  [
  'DE Ratio',
  'Trailing P/E',
  'Price/Sales',
  'Price/Book',
  'Profit Margin',
  'Operating Margin',
  'Return on Assets',
  'Return on Equity',
  'Revenue Per Share',
  'Market Cap',
  'Enterprise Value',
  'Forward P/E',
  'PEG Ratio',
  'Enterprise Value/Revenue',
  'Enterprise Value/EBITDA',
  'Revenue',
  'Gross Profit',
  'EBITDA',
  'Net Income Avl to Common ',
  'Diluted EPS',
  'Earnings Growth',
  'Revenue Growth',
  'Total Cash',
  'Total Cash Per Share',
  'Total Debt',
  'Current Ratio',
  'Book Value Per Share',
  'Cash Flow',
  'Beta',
  'Held by Insiders',
  'Held by Institutions',
  'Shares Short (as of',
  'Short Ratio',
  'Short % of Float',
  'Shares Short (prior '
]


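def Build_Data_Set(features = FEATURES):
    # Load the CSV under test (swap in key_stats_acc_perf_WITH_NA.csv to compare).
    # pd.DataFrame.from_csv was removed in pandas 1.0; read_csv with index_col=0 replaces it.
    data_df = pd.read_csv("key_stats_acc_perf_NO_NA.csv", index_col=0)

    # Shuffle the rows
    data_df = data_df.reindex(np.random.permutation(data_df.index))
    # Replace N/A data; fillna covers cells that read_csv already parsed as missing
    data_df = data_df.replace('NaN',-9999).replace('N/A',-9999).fillna(-9999)

    # Convert the feature columns to an np.array
    X = np.array(data_df[features].values)

    # Convert 'underperform' and 'outperform' to 0 and 1,
    # because the model only works with numbers
    y = (data_df['Status']
         .map({'underperform': 0, 'outperform': 1})
         .values)

    # Standardize each feature to zero mean and unit variance
    X = preprocessing.scale(X)
    return X, y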


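def Analysis():
    # Number of samples held out for the backtest
    test_size = 1000
    X, y = Build_Data_Set()
    print(len(X))

    # Support vector classifier with a linear kernel (a classifier, not a regression model)
    clf = svm.SVC(kernel = 'linear', C=1.0)
    # Train the model on everything except the last test_size samples
    clf.fit(X[:-test_size],y[:-test_size])

    correct_count = 0

    for x in range(1, test_size+1):
        # predict expects a 2D array, so reshape the single sample to (1, n_features)
        if clf.predict(X[-x].reshape(1, -1))[0] == y[-x]:
            correct_count += 1
    print('accuracy:', (correct_count/test_size)*100)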

Analysis()
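
As a side note, the per-sample loop in Analysis can also be written as a single call. The sketch below uses randomly generated data as a stand-in for the real feature matrix (an assumption made only so the example runs without the CSV files) and checks that the loop and the classifier's score method report the same accuracy on the held-out slice.

```python
import numpy as np
from sklearn import svm, preprocessing

# Synthetic stand-in for the real features and labels (illustrative assumption)
rng = np.random.RandomState(0)
X = rng.randn(300, 5)
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X = preprocessing.scale(X)

test_size = 100
clf = svm.SVC(kernel='linear', C=1.0)
clf.fit(X[:-test_size], y[:-test_size])

# Loop version: predict one sample at a time, each reshaped to 2D
correct = sum(
    clf.predict(X[-i].reshape(1, -1))[0] == y[-i]
    for i in range(1, test_size + 1)
)
loop_accuracy = correct / test_size

# Vectorized version: score() predicts the whole slice and returns mean accuracy
score_accuracy = clf.score(X[-test_size:], y[-test_size:])

print('loop accuracy:', loop_accuracy * 100)
print('score accuracy:', score_accuracy * 100)
```

Both numbers describe the same held-out samples, so they should agree; the single call is just shorter and avoids a Python-level loop.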

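Finally

分c君_BingWong is only a porter of this material and hardly even counts as a programmer. Still, the code comments, the translation, and the reposting all took a lot of time, so please be kind and credit the source when you repost.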

柯西君_BingWong | 2017-09-05