Scikit Learn for Stock Investing: p25

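Preface

This video covers relatively little: it mainly backtests the new data obtained in p24. That data labels a stock as outperforming only when it beats the market by 5%.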

Video

Video source

Video series: Scikit-learn Machine Learning with Python and SKlearn

This video: Scikit Learn Machine Learning for investing Tutorial with Python p. 25

Bilibili: Scikit Learn Machine Learning for investing Tutorial with Python p. 25

Content

This tutorial reuses the code from p23. The only line we need to change is:

data_df = pd.DataFrame.from_csv("key_stats_acc_perf_WITH_NA.csv")
which becomes:
data_df = pd.DataFrame.from_csv("key_stats_acc_perf_WITH_NA_enhanced.csv")
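
Note that `pd.DataFrame.from_csv` was deprecated in pandas 0.21 and removed in pandas 1.0, so the line above fails on newer installations. A minimal sketch of the usual replacement, `pd.read_csv` with `index_col=0`, is shown here. The in-memory CSV and the helper name `load_key_stats` are made up for illustration and stand in for the real file:

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for key_stats_acc_perf_WITH_NA_enhanced.csv.
SAMPLE_CSV = """,Ticker,DE Ratio,Status
0,aapl,12.5,outperform
1,amd,N/A,underperform
"""


def load_key_stats(source):
    """Read a key-stats CSV, using the first column as the index."""
    return pd.read_csv(source, index_col=0)


data_df = load_key_stats(io.StringIO(SAMPLE_CSV))
# read_csv parses the string "N/A" as a real missing value (NaN),
# so fillna(0) is what actually replaces it with 0.
data_df = data_df.fillna(0)
print(data_df)
```

With the real data, the same call would take the file name instead of the `io.StringIO` object.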

Output

36
['aapl', 'amd', 'brk-b', 'c', 'chk', 'cmcsa', 'csco', 'cvx', 'dis', 'disca', 'fb', 'fti', 'ftr', 'ge', 'goog', 'intc', 'jci', 'jcp', 'jpm', 'ko', 'luk', 'mrk', 'myl', 'nok', 'pfe', 'pm', 's', 'shld', 'siri', 'sne', 't', 'ua', 'v', 'vale', 'wfc', 'xom']

As you can see, once the outperformance threshold is set to >5%, the number of selected stocks drops from 400+ to 36.
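
The labelling itself is done in p24, whose code is not reproduced here. As a rough sketch of the idea only, assume the label compares `stock_p_change` with `sp500_p_change` (both in percent) and treats the threshold as a difference in percentage points. The function name `label_status` and the sample numbers are hypothetical:

```python
def label_status(stock_p_change, sp500_p_change, threshold=5.0):
    """Label a stock "outperform" only if it beats the index by more than `threshold` points."""
    difference = stock_p_change - sp500_p_change
    return "outperform" if difference > threshold else "underperform"


# A stock that beats the market by 8 points clears the 5-point bar.
print(label_status(12.0, 4.0))
# A stock that beats the market by only 3 points does not.
print(label_status(7.0, 4.0))
```

Raising the threshold leaves fewer positive examples in the training data, which is consistent with the classifier selecting far fewer stocks.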

Source code

# back testing
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, preprocessing
import pandas as pd
from matplotlib import style
import statistics

style.use("ggplot")

FEATURES =  [
  'DE Ratio',
  'Trailing P/E',
  'Price/Sales',
  'Price/Book',
  'Profit Margin',
  'Operating Margin',
  'Return on Assets',
  'Return on Equity',
  'Revenue Per Share',
  'Market Cap',
  'Enterprise Value',
  'Forward P/E',
  'PEG Ratio',
  'Enterprise Value/Revenue',
  'Enterprise Value/EBITDA',
  'Revenue',
  'Gross Profit',
  'EBITDA',
  'Net Income Avl to Common ',
  'Diluted EPS',
  'Earnings Growth',
  'Revenue Growth',
  'Total Cash',
  'Total Cash Per Share',
  'Total Debt',
  'Current Ratio',
  'Book Value Per Share',
  'Cash Flow',
  'Beta',
  'Held by Insiders',
  'Held by Institutions',
  'Shares Short (as of',
  'Short Ratio',
  'Short % of Float',
  'Shares Short (prior '
]

def Build_Data_Set():
  # pd.DataFrame.from_csv was removed in pandas 1.0; read_csv with index_col=0 replaces it
  data_df = pd.read_csv("key_stats_acc_perf_WITH_NA_enhanced.csv", index_col=0)
  # data_df = pd.read_csv("key_stats_acc_perf_NO_NA.csv", index_col=0)

  # shuffle data:
  data_df = data_df.reindex(np.random.permutation(data_df.index))

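  # pandas parses "NaN" and "N/A" in the CSV as missing values on load, so fill real NaNs with 0 too
  data_df = data_df.replace("NaN",0).replace("N/A",0).fillna(0)
  # data_df = data_df.replace("NaN",-999).replace("N/A",-999).fillna(-999)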

  X = np.array(data_df[FEATURES].values)#.tolist())

  y = ( data_df["Status"]
        .replace("underperform",0)
        .replace("outperform",1)
        .values.tolist()
  )

  X = preprocessing.scale(X)

  Z = np.array( data_df[ ["stock_p_change", "sp500_p_change"] ] )

  return X,y,Z

def Analysis():
  test_size = 1
  invest_amount = 10000 # dollars
  total_invests = 0
  if_market = 0
  if_strat = 0

  X, y, Z = Build_Data_Set()
  print(len(X))

  clf = svm.SVC(kernel="linear", C=1.0)
  clf.fit(X[:-test_size],y[:-test_size]) # train data

  correct_count = 0
  for x in range(1, test_size+1):
    invest_return = 0
    market_return = 0
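    # predict() expects a 2D array, so reshape the single sample to (1, n_features)
    prediction = clf.predict(X[-x].reshape(1, -1))[0]
    if prediction == y[-x]: # test data
      correct_count += 1

    if prediction == 1: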
      invest_return = invest_amount + (invest_amount * (Z[-x][0] / 100.0))
      market_return = invest_amount + (invest_amount * (Z[-x][1] / 100.0))
      total_invests += 1
      if_market += market_return
      if_strat += invest_return


  # read the data
  # data_df = pd.read_csv("forward_sample_NO_NA.csv", index_col=0)
  data_df = pd.read_csv("forward_sample_WITH_NA.csv", index_col=0)
  data_df = data_df.replace("NaN",0).replace("N/A",0).fillna(0)
  # convert to array form
  X = np.array(data_df[FEATURES].values)
  X = preprocessing.scale(X)
  Z = data_df["Ticker"].values.tolist()
  invest_list = []
  for i in range(len(X)):
    # predict() expects a 2D array, so reshape the single sample to (1, n_features)
    p = clf.predict(X[i].reshape(1, -1))[0]
    if p == 1:
      print(Z[i])
      invest_list.append(Z[i])
  print(len(invest_list))
  print(invest_list)

Analysis()
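
Analysis() accumulates `total_invests`, `if_strat`, and `if_market`, but this version never prints them. Here is a minimal sketch of how those totals might be turned into a strategy-versus-market comparison. The function name `summarize_backtest` and the sample numbers are hypothetical and not taken from the video:

```python
def summarize_backtest(stock_changes, market_changes, predictions, invest_amount=10000):
    """Compare buying every predicted outperformer with buying the index each time.

    stock_changes and market_changes are percent changes; predictions are 0/1 labels.
    Returns None when the classifier never predicts an outperformer.
    """
    total_invests = 0
    if_strat = 0.0
    if_market = 0.0
    for stock_pct, market_pct, predicted in zip(stock_changes, market_changes, predictions):
        if predicted == 1:
            total_invests += 1
            if_strat += invest_amount * (1 + stock_pct / 100.0)
            if_market += invest_amount * (1 + market_pct / 100.0)

    if total_invests == 0:
        return None

    invested = total_invests * invest_amount
    return {
        "total_invests": total_invests,
        "strategy_return_pct": (if_strat - invested) / invested * 100.0,
        "market_return_pct": (if_market - invested) / invested * 100.0,
    }


# Made-up numbers: the classifier picks the first and third stock.
result = summarize_backtest([10.0, -5.0, 20.0], [5.0, 5.0, 5.0], [1, 0, 1])
print(result)
```

With `test_size = 1`, as in the code above, such a summary rests on a single held-out sample, so a larger test set would be needed before reading much into the numbers.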

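Final note

Although 分c君_BingWong is only a re-poster here and hardly counts as a programmer, writing the code comments, translating, and re-posting took a lot of time. If you repost this article, please credit the source.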

柯西君_BingWong | 2017-09-05