# SimpleSpider Instructions
SimpleSpider is a module that makes it easier to write and run web spiders (crawlers).
## How to install
```pip install SimpleSpider```
## Using from the command line
There are 9 arguments available on the command line.
| Argument | Type | Default | Description |
| --- | --- | --- | --- |
|url|str|None|The URL to fetch.|
|single|bool|True|Set to False to get content from a series of pages, then set the index.|
|re|str|None|The regular expression to apply. Remember to wrap it in double quotes, e.g. --re "ab*c"|
|xpath|str|None|The XPath expression to apply. Remember to wrap it in double quotes, e.g. --xpath "//div[1]/text()"|
|index|str|default|Comma-separated list of indexes, e.g. --index 1,2,3,4,5,6,7|
|print|bool|True|Set to False if you do not want the result printed to the console.|
|output|str|None|Path to export the result to, e.g. --output "D:/data.xlsx"|
|mode|str|None|One of "img", "xp" or "re": get image URLs, use XPath, or use a regular expression.|
|indexfile|str|None|Path to a file from which the indexes (links) are read directly.|
Example 1:
Get data with a regular expression from a single page.
```
SimpleSpider --mode re --url https://www.163.com --re "<title>(.*?)</title>"
```
Output:
```网易```
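
To see what a title pattern like the one in Example 1 extracts, here is a minimal offline sketch using Python's standard `re` module. It uses the lazy form `(.*?)` and a hypothetical inline HTML string in place of the downloaded page. It is not SimpleSpider's own implementation.

```python
import re

# Hypothetical sample page; the real command downloads the page given by --url.
html = "<html><head><title>网易</title></head><body></body></html>"

# (.*?) lazily captures the text between the opening and closing title tags.
# re.S lets "." match newlines in case the title spans several lines.
pattern = re.compile(r"<title>(.*?)</title>", re.S)

titles = pattern.findall(html)
print(titles)  # ['网易']
```

`findall` returns only the captured group, which is why the tags themselves do not appear in the result.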
Example 2:
Get data with XPath from a single page.
```SimpleSpider --mode xp --url https://www.163.com --xpath "//title/text()"```
Output:
```网易```
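
The effect of a path expression such as `//title/text()` can be approximated offline with the standard library. This is a rough sketch, not how SimpleSpider evaluates XPath: `xml.etree.ElementTree` supports only a subset of XPath and has no `text()` step, so the sketch selects the elements with `.//title` and reads their `.text` attribute. The sample page is hypothetical and must be well-formed XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (ElementTree requires valid XML).
page = "<html><head><title>网易</title></head><body><p>hello</p></body></html>"
root = ET.fromstring(page)

# ".//title" finds every <title> element below the root;
# reading .text stands in for the text() step of full XPath.
titles = [element.text for element in root.findall(".//title")]
print(titles)  # ['网易']
```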
Example 3:
Get data with XPath from multiple pages.
```SimpleSpider --mode xp --url https://ent.163.com/20/0323/ --xpath "//title/text()" --single False --index 08/F8D2BVI700038FO9.html,10/F8D8B35800038FO9.html```
Output:
```'疫情期间还出游?网友在巴厘岛偶遇霍建华林心如_网易娱乐'```
```'台湾女星刘真去世:上《康熙》走红 当郭台铭红娘_网易娱乐'```
Example 4:
Get data with XPath from multiple pages, reading the indexes (links) from a file.
```SimpleSpider --mode xp --url https://ent.163.com/20/0323/ --xpath "//title/text()" --single False --indexfile data.txt```
The index file should contain one index per line, like this:
1.html
2.html
3.html
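Each line is used as the index that follows the base URL. For example, with the base URL http://example.com/test/ the first page is http://example.com/test/1.html.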
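
How the indexes combine with the base URL can be sketched as follows. This assumes plain string concatenation of the base URL and each index, which is what Example 3 suggests but which is not confirmed here. The helper names `build_urls`, `parse_index_option` and `read_index_file` are hypothetical and are not part of SimpleSpider.

```python
from io import StringIO


def build_urls(base_url, indexes):
    """Append each index to the base URL (assumed: plain concatenation)."""
    return [base_url + index for index in indexes]


def parse_index_option(value):
    """Split a comma-separated --index value into a list of indexes."""
    return [part.strip() for part in value.split(",") if part.strip()]


def read_index_file(handle):
    """Read one index per line from an open file, skipping blank lines."""
    return [line.strip() for line in handle if line.strip()]


base = "http://example.com/test/"

# Indexes given the way --index takes them.
urls_from_option = build_urls(base, parse_index_option("1.html,2.html"))

# Indexes given the way --indexfile takes them; StringIO stands in for data.txt.
urls_from_file = build_urls(base, read_index_file(StringIO("1.html\n2.html\n3.html\n")))

print(urls_from_option)
print(urls_from_file)
```

Under this assumption, a base URL without a trailing slash produces joined paths such as `http://example.com/test1.html`, so the slash has to be part of either the base URL or each index.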
Example 5:
Get the image URLs from a single page.
```SimpleSpider --mode img --url https://www.baidu.com```
Output:
```//www.baidu.com/img/gs.gif```
## Using the functions in your code
To use the functions in this module from your own code, import it:
```from SimpleSpider import SimpleSpider```
The module provides several functions to simplify your code.
Example 1:
```result = SinglePageGetByRegEx(Url="http://www.163.com", Re="<title>(.*?)</title>")```
The value of result is ```['网易']```
Example 2:
```List = [53,54,55,56] ```
```result = MulityPageGetByRegEx(Url="http://www.oursteps.com.au/bbs/forum.php?mod=forumdisplay&fid=", IndexList=List,RegEx="<title>(.*?.)</title>")```
The value of result is ```[['生活其他 - 新足迹 - 新足迹澳洲华人生活大全'], ['证券外汇 - 新足迹澳洲华人生活大全'], ['个人理财 - 新足迹澳洲华人生活大全'], ['生意种种 - 新足迹澳洲华人生活大全']]```
Both XPath and regular expressions are available in these functions.
You can also directly get the string that lies between two markers in a page.
Example 3:
Suppose the HTML page is:
```
<html>
<title>网易</title>
</html>
```
```result = SinglePageGetMiddleStr("http://www.163.com", front="<title>", back="</title>")```
Output:
```['网易']```
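
The idea of taking the text between a front marker and a back marker can be sketched with plain string searching. The function `get_middle_strings` below is a hypothetical illustration working on a local string, not the library's implementation of `SinglePageGetMiddleStr`.

```python
def get_middle_strings(text, front, back):
    """Return every substring of text found between front and back."""
    if not front or not back:
        raise ValueError("front and back must be non-empty")
    results = []
    start = 0
    while True:
        begin = text.find(front, start)
        if begin == -1:
            break
        begin += len(front)          # skip past the front marker
        end = text.find(back, begin)
        if end == -1:
            break
        results.append(text[begin:end])
        start = end + len(back)      # continue after the back marker
    return results


page = "<html>\n<title>网易</title>\n</html>"
print(get_middle_strings(page, "<title>", "</title>"))  # ['网易']
```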
You can also directly get the image URLs in a page.
```result = SinglePageGetImgUrl("http://www.baidu.com")```
Output:
```//www.baidu.com/img/gs.gif```
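
Collecting image URLs from a page can be sketched with the standard library's `html.parser`. The class name `ImgSrcCollector` and the inline sample page are hypothetical; how SimpleSpider does this internally is not shown here. Because a result such as `//www.baidu.com/img/gs.gif` is protocol-relative, the sketch also resolves it against the page URL with `urllib.parse.urljoin`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


# Hypothetical sample page standing in for the downloaded HTML.
page = '<html><body><img src="//www.baidu.com/img/gs.gif" alt="logo"></body></html>'

collector = ImgSrcCollector()
collector.feed(page)
print(collector.sources)  # ['//www.baidu.com/img/gs.gif']

# Resolve protocol-relative URLs against the page URL to make them downloadable.
absolute = [urljoin("https://www.baidu.com", src) for src in collector.sources]
print(absolute)  # ['https://www.baidu.com/img/gs.gif']
```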
If you want to know more, please visit: https://github.com/shanzhengliu/SimpleSpider