Crawling data on companies applying for an initial public offering, with pyecharts and table generation (Part 4)

Result page

First, the URL of the finished result: https://www.cnvar.cn/ipostatus/

Download

fundviz/IPOStatus

Steps

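Crawl the page data and Excel files -> read the Excel files, merge them and compute statistics -> convert the resulting table to Markdown (convenient for displaying on HEXO)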
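The last stage of that pipeline, turning a table into Markdown, can be sketched in plain Python. This is only an illustration under assumptions: `rows_to_markdown` and the sample columns are hypothetical and are not taken from the project's datatomd.py, which may well do this differently.

```python
def rows_to_markdown(header, rows):
    """Render a header and a list of rows as a pipe-style Markdown table."""
    def fmt(cells):
        # Escape pipes so cell text cannot break the table layout.
        return "| " + " | ".join(str(c).replace("|", "\\|") for c in cells) + " |"

    # Header row, then the separator row, then one line per data row.
    lines = [fmt(header), "| " + " | ".join("---" for _ in header) + " |"]
    lines.extend(fmt(row) for row in rows)
    return "\n".join(lines)


# Hypothetical sample data, not real figures from the crawled files.
table = rows_to_markdown(["date", "count"], [["20180727", 1], ["20180803", 2]])
print(table)
```

A string built this way can be written straight into a .md file that HEXO renders as a table.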

Directory structure

+--main.py
+--processing
|      +--data
|      |      +--graph.html
|      |      +--index.md
|      |      +--IPOstatus
|      |      |      +--data
|      |      |      |      +--20180727.xls
|      |      |      |      +--20180803.xls
|      |      |      |      +--20180810.xls
|      |      |      |      +--20180817.xls
|      |      |      |      +--20180824.xls
|      |      |      +--md
|      |      |      |      +--20180727.md
|      |      |      |      +--20180803.md
|      |      |      |      +--20180810.md
|      |      |      |      +--20180817.md
|      |      |      |      +--20180824.md
|      |      |      +--stat.csv
|      |      |      +--termination
|      |      |      |      +--20180803.xls
|      |      |      |      +--20180810.xls
|      |      |      |      +--20180817.xls
|      |      |      |      +--20180824.xls
|      +--datatomd.py
|      +--data_crawler.py
|      +--generator.py
|      +--__init__.py

Final step

At this point every individual step is done. What remains is main.py, which calls the functions defined in those files.

# -*- coding: utf-8 -*-
"""
@author: 柯西君_BingWong
"""

from processing.datatomd import data2md
from processing.data_crawler import parse
from processing.generator import create_charts

# Paths and source URL live at module level so main() also works when imported.
DIRTH_DATA = 'processing/data/IPOstatus/data/'
DIRTH_TERMINATION = 'processing/data/IPOstatus/termination/'
DIRTH_MD = 'processing/data/IPOstatus/md/'
url = "http://www.csrc.gov.cn/pub/zjhpublic/G00306202/201803/t20180324_335702.htm"


def main():
    # 1. Crawl the page data and the Excel files.
    parse(url, DIRTH_DATA, DIRTH_TERMINATION, DIRTH_MD)
    # 2. Build the pyecharts charts and render them to an HTML file.
    create_charts(DIRTH_DATA).render('processing/data/graph.html')
    # 3. Convert the tables to Markdown.
    data2md(DIRTH_DATA, DIRTH_TERMINATION, DIRTH_MD)


if __name__ == '__main__':
    main()
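On a fresh checkout the three output directories may not exist yet. Whether `parse` and `data2md` create them is not shown here, so as a precaution they could be created before `main()` runs. The sketch below is a minimal illustration; `ensure_dirs` is a hypothetical helper, not part of the project, and the demo runs inside a temporary directory.

```python
import os
import tempfile


def ensure_dirs(*paths):
    """Create each directory (and its parents) if missing; return the paths."""
    for path in paths:
        # exist_ok=True makes repeated runs harmless.
        os.makedirs(path, exist_ok=True)
    return paths


# Demonstrated under a temporary base so nothing in the project is touched.
base = tempfile.mkdtemp()
created = ensure_dirs(
    os.path.join(base, 'processing/data/IPOstatus/data/'),
    os.path.join(base, 'processing/data/IPOstatus/termination/'),
    os.path.join(base, 'processing/data/IPOstatus/md/'),
)
print(all(os.path.isdir(p) for p in created))
```

In main.py the equivalent call would be `ensure_dirs(DIRTH_DATA, DIRTH_TERMINATION, DIRTH_MD)` placed just before `main()`.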

柯西君_BingWong | 2018-08-26