A Roundup of Third-Party Go Libraries for Web Crawling and More

This list is kept up to date on GitHub.

Repository: https://github.com/ScrappyZhang/go_awesome_third_party_libraries


JavaScript-related

  • go-v8: V8 JavaScript engine bindings for Go.
  • github.com/robertkrimen/otto: A JavaScript interpreter in Go (golang). https://godoc.org/github.com/robertkrimen/otto
  • PuerkitoBio/goquery: goquery brings a syntax and a set of features similar to jQuery to the Go language. It is based on Go's net/html package and the CSS selector library cascadia. In short, it is an HTML parsing library written in Go that lets you manipulate DOM documents the way you would with jQuery.

Database-related

  • go-sql-driver: MySQL driver for create, read, update, and delete (CRUD) operations.
  • gocolly/redisstorage: A Redis-based storage backend for Colly collectors.
  • jinzhu/gorm: The fantastic ORM library for Golang.
  • mgo: A MongoDB driver for Go. It provides a simple API that follows Go idioms, implements a rich feature set, and is well tested.
  • LedisDB: LedisDB is a high-performance NoSQL database, similar to Redis, written in Go. It supports many data structures, including kv, list, hash, zset, and set.

Web page parsing

  • anaskhan96/soup: Web scraper in Go, similar to BeautifulSoup.
  • andybalholm/cascadia: CSS selector library in Go.
  • antchfx/htmlquery: htmlquery lets you extract data from HTML documents using XPath expressions.
  • antchfx/xmlquery: xmlquery is an XML parser that extracts data from XML documents using XPath expressions.
  • antchfx/xpath: XPath package for Golang.
  • antchfx/xquery: xquery extracts data or evaluates values from HTML/XML documents using XPath expressions in Go.
  • PuerkitoBio/urlesc: Proper URL escaping as per RFC3986.

Crawler-related

  • go_spider: A crawler for vertical communities, written in Go. An awesome concurrent crawler (spider) framework for Go. The crawler is flexible and modular: it can easily be extended into a customized crawler, or you can use only the default crawl components.

  • antchfx/antch: Antch, a fast, powerful and extensible web crawling & scraping framework for Go.

  • gocolly/colly: Elegant Scraper and Crawler Framework for Golang.

  • mozillazg/request: A developer-friendly HTTP request library for Gophers. Inspired by Python-Requests.

  • PuerkitoBio/fetchbot: A simple and flexible web crawler that follows the robots.txt policies and crawl delays.

  • PuerkitoBio/gocrawl: gocrawl is a polite, slim and concurrent web crawler written in Go.

    For a simpler yet more flexible web crawler written in a more idiomatic Go style, you may want to take a look at fetchbot, a package that builds on the experience of gocrawl.

  • temoto/robotstxt: The robots.txt exclusion protocol implementation for the Go language.

  • pholcus: Pholcus is a distributed, high-concurrency, and powerful web crawler.

  • surfer: Package surfer is a high-level concurrent HTTP client. It has surf and phantom download engines, closely simulates browser behavior, supports simulated login, and so on.

Web-related

  • ungerik/go-start: A high-level web framework for Go.
  • ungerik/go-rest: A small and evil REST framework for Go.

Blockchain-related

  • tendermint: Tendermint is a modular blockchain application framework that provides Byzantine fault tolerance (BFT).

Text processing

Character encoding

  • axgle/mahonia: Character-set conversion library implemented in Go.

String matching

  • gobwas/glob: This library is designed for compile-once patterns. Compiling a pattern may take some time, but matching strings against the compiled pattern is then faster than re-parsing the pattern on every match.
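
The compile-once idea can be illustrated with the standard library alone. The sketch below is not the gobwas/glob API; compileGlob is a hypothetical helper that translates a simplified glob (only '*' and '?' are special) into a regular expression, compiles it once, and then reuses the compiled matcher for every input.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// compileGlob translates a simplified glob into an anchored regular
// expression and compiles it once. Only '*' (any run of characters)
// and '?' (exactly one character) are treated specially.
// Hypothetical stand-in, not the gobwas/glob API.
func compileGlob(pattern string) (*regexp.Regexp, error) {
	var b strings.Builder
	b.WriteString("(?s)^") // (?s) lets '.' match newlines too
	for _, r := range pattern {
		switch r {
		case '*':
			b.WriteString(".*")
		case '?':
			b.WriteString(".")
		default:
			b.WriteString(regexp.QuoteMeta(string(r)))
		}
	}
	b.WriteString("$")
	return regexp.Compile(b.String())
}

func main() {
	// Pay the compilation cost once...
	matcher, err := compileGlob("*.example.com")
	if err != nil {
		panic(err)
	}
	// ...then reuse the compiled matcher for every candidate string.
	for _, host := range []string{"api.example.com", "example.org"} {
		fmt.Println(host, matcher.MatchString(host))
	}
}
```

The design point is the same one the library description makes: the up-front compilation step is worth it when a single pattern is matched against many strings.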

String processing

  • jinzhu/inflection: Inflection pluralizes and singularizes English nouns.
  • mozillazg/go-pinyin: Converts Chinese characters to pinyin.
  • saintfish/chardet: chardet is a library that automatically detects the charset of text, for the Go programming language. It is based on the algorithm and data in ICU's implementation.

Path handling

  • kennygrant/sanitize: Package sanitize provides functions to sanitize HTML and paths with Go (golang).

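Chinese word segmentation

  • sego: A Chinese word segmenter for Go. Its dictionary is implemented as a double-array trie, and the segmentation algorithm is a word-frequency-based shortest path computed with dynamic programming. It supports normal and search-engine segmentation modes, user dictionaries, and part-of-speech tagging.
  • Jiebago: The Go version of the Jieba word segmenter.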
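
To make "word-frequency-based shortest path with dynamic programming" more concrete, here is a minimal sketch of one common reading of that description. It is not sego's implementation or API: the segment function and the toy frequency table are hypothetical, and a plain Go map stands in for the double-array trie. Each dictionary word costs -log(freq/total), and the segmentation with the lowest total cost wins.

```go
package main

import (
	"fmt"
	"math"
)

// segment splits text by finding the cheapest path through it, where
// each dictionary word costs log(total/freq). freq is a hypothetical
// toy dictionary; a map stands in for the double-array trie.
func segment(text string, freq map[string]float64) []string {
	runes := []rune(text)
	n := len(runes)
	total, maxLen := 0.0, 1
	for w, f := range freq {
		total += f
		if l := len([]rune(w)); l > maxLen {
			maxLen = l
		}
	}
	// Penalty for a single character that is missing from the dictionary.
	unknownCost := math.Log(total+1) + 1

	best := make([]float64, n+1) // best[i]: minimum cost of segmenting runes[:i]
	prev := make([]int, n+1)     // prev[i]: start index of the last word on that path
	for i := 1; i <= n; i++ {
		best[i] = math.Inf(1)
		for j := i - 1; j >= 0 && i-j <= maxLen; j-- {
			cost := math.Inf(1)
			if f, ok := freq[string(runes[j:i])]; ok && f > 0 {
				cost = math.Log(total / f)
			} else if i-j == 1 {
				cost = unknownCost
			}
			if best[j]+cost < best[i] {
				best[i] = best[j] + cost
				prev[i] = j
			}
		}
	}

	// Walk the prev links backwards, then reverse into reading order.
	var words []string
	for i := n; i > 0; i = prev[i] {
		words = append(words, string(runes[prev[i]:i]))
	}
	for l, r := 0, len(words)-1; l < r; l, r = l+1, r-1 {
		words[l], words[r] = words[r], words[l]
	}
	return words
}

func main() {
	// Made-up frequencies, for illustration only.
	freq := map[string]float64{"研究": 50, "研究生": 20, "生命": 40, "命": 5, "起源": 30}
	fmt.Println(segment("研究生命起源", freq))
}
```

With these made-up frequencies, the path 研究 / 生命 / 起源 is cheaper than 研究生 / 命 / 起源 because the rare single-character word 命 carries a high cost.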

Images

  • captcha: A simple captcha generator for Go. https://github.com/afocus/captcha
  • go-qt5: Qt5 bindings for Go.

Package management

  • gpmgo/gopm: Gopm (Go Package Manager) is a package management and build tool for Go. Go development environment: >= go1.2

Command-line tools

  • jawher/mow.cli: Package cli provides a framework to build command-line applications in Go, with most of the burden of argument parsing and validation placed on the framework instead of the user.
