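- Scraping data on companies applying for an initial public offering (IPO), with pyecharts charts and table generation (Part 1)
- Scraping data on companies applying for an initial public offering (IPO), with pyecharts charts and table generation (Part 2)
- Scraping data on companies applying for an initial public offering (IPO), with pyecharts charts and table generation (Part 3)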
Result page
First, here is the URL of the final result: https://www.cnvar.cn/ipostatus/
Download
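Steps
Scrape the page data and Excel files -> read the Excel files and merge them into summary statistics -> convert the resulting table to Markdown (convenient for displaying on HEXO)
Directory structure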
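The last step of this pipeline, turning a table into Markdown, can be illustrated with a minimal sketch. It uses only the standard library, and `rows_to_markdown` and the sample rows are hypothetical names and data for illustration; this is not the `data2md` function from this series, whose implementation is covered in the other parts.

```python
from typing import Iterable, List, Sequence


def rows_to_markdown(header: Sequence[str], rows: Iterable[Sequence[object]]) -> str:
    """Render a header and data rows as a pipe-delimited Markdown table."""

    def fmt(cells: Sequence[object]) -> str:
        # Escape pipes so a cell value cannot break the column layout.
        return "| " + " | ".join(str(c).replace("|", "\\|") for c in cells) + " |"

    # Header row, separator row, then one line per data row.
    lines: List[str] = [fmt(header), fmt(["---"] * len(header))]
    lines.extend(fmt(row) for row in rows)
    return "\n".join(lines)


# Hypothetical sample data, not taken from the real spreadsheets.
table = rows_to_markdown(["date", "count"], [["20180727", 3], ["20180803", 5]])
print(table)
```

A table in this form can be pasted directly into a HEXO post, since HEXO renders standard Markdown tables.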
+--main.py
+--processing
| +--data
| | +--graph.html
| | +--index.md
| | +--IPOstatus
| | | +--data
| | | | +--20180727.xls
| | | | +--20180803.xls
| | | | +--20180810.xls
| | | | +--20180817.xls
| | | | +--20180824.xls
| | | +--md
| | | | +--20180727.md
| | | | +--20180803.md
| | | | +--20180810.md
| | | | +--20180817.md
| | | | +--20180824.md
| | | +--stat.csv
| | | +--termination
| | | | +--20180803.xls
| | | | +--20180810.xls
| | | | +--20180817.xls
| | | | +--20180824.xls
| +--datatomd.py
| +--data_crawler.py
| +--generator.py
| +--__init__.py
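The snapshot files in the tree above appear to be named after their date in `YYYYMMDD` form. Assuming that convention holds, the sketch below shows one way to pair each file with its date and sort them chronologically. `dated_snapshots` is a hypothetical helper, not part of the project's code.

```python
from datetime import date, datetime
from pathlib import Path
from typing import List, Tuple


def dated_snapshots(filenames: List[str]) -> List[Tuple[date, str]]:
    """Pair each YYYYMMDD-named file with its parsed date, oldest first."""
    parsed = []
    for name in filenames:
        stem = Path(name).stem  # "20180727.xls" -> "20180727"
        try:
            parsed.append((datetime.strptime(stem, "%Y%m%d").date(), name))
        except ValueError:
            # Skip files whose names are not dates, such as stat.csv.
            continue
    return sorted(parsed)


# File names copied from the tree above, deliberately out of order.
names = ["20180810.xls", "stat.csv", "20180727.xls", "20180803.xls"]
snapshots = dated_snapshots(names)
print([name for _, name in snapshots])
```

Parsing the date instead of sorting the raw strings also rejects stray files and gives real `date` objects to use as chart labels.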
Final step
So far, all the steps are complete. Now main.py is used to call the functions in these files.
# -*- coding: utf-8 -*-
"""
@author: 柯西君_BingWong
"""
from processing.datatomd import data2md
from processing.data_crawler import parse
from processing.generator import create_charts

# Directories for the downloaded spreadsheets and the generated Markdown
DIRTH_DATA = 'processing/data/IPOstatus/data/'
DIRTH_TERMINATION = 'processing/data/IPOstatus/termination/'
DIRTH_MD = 'processing/data/IPOstatus/md/'
# CSRC page to crawl
url = "http://www.csrc.gov.cn/pub/zjhpublic/G00306202/201803/t20180324_335702.htm"


def main():
    # Crawl the page and download the Excel files
    parse(url, DIRTH_DATA, DIRTH_TERMINATION, DIRTH_MD)
    # Build the pyecharts charts and render them to HTML
    create_charts(DIRTH_DATA).render('processing/data/graph.html')
    # Convert the merged tables to Markdown
    data2md(DIRTH_DATA, DIRTH_TERMINATION, DIRTH_MD)


if __name__ == '__main__':
    main()
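main.py writes into three output directories. This post does not say whether `parse` creates them, so a fresh checkout may need them created first. The sketch below is one way to do that with `pathlib`. `ensure_dirs` and `OUTPUT_DIRS` are hypothetical names, and the demo runs inside a temporary directory so that it leaves nothing behind.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def ensure_dirs(paths: Iterable[str], base: str = ".") -> List[Path]:
    """Create each directory under base if it is missing and return the paths."""
    created = []
    for rel in paths:
        target = Path(base) / rel
        # parents=True builds intermediate folders; exist_ok=True makes reruns safe.
        target.mkdir(parents=True, exist_ok=True)
        created.append(target)
    return created


# The same relative paths that main.py passes to parse and data2md.
OUTPUT_DIRS = [
    "processing/data/IPOstatus/data/",
    "processing/data/IPOstatus/termination/",
    "processing/data/IPOstatus/md/",
]

# Demonstrate inside a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    made = ensure_dirs(OUTPUT_DIRS, base=tmp)
    print(all(p.is_dir() for p in made))
```

In the real project, calling `ensure_dirs(OUTPUT_DIRS)` just before `main()` would create the folders relative to the working directory.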