driller-cpp-web-crawler

Who am I:
My name is Meir Yanovich and I'm a C++/Java developer, mostly doing cross-platform (Unix/Linux/Windows) server infrastructure work in my day job,
but sometimes I like to experiment with stuff in my spare time.
If you are interested in the Facebook API and ways to interact with it, this project may interest you:
http://code.google.com/p/facebook-cpp-graph-api/
or, if you have young kids:
http://code.google.com/p/kidsbrowser/


You can find my online profile here:
http://il.linkedin.com/in/meiryanovich 
If you have any cool ideas on how to use this code and you need help, please email me.
Email: [email protected]

Implementation of a web crawler / spider in C++
------------------------------------------------------------------------------------

A web crawler / spider for web data mining or data aggregation.

  • Uses regular expression rules to collect data.
  • Written in pure C++ (STL) plus a bunch of open source libraries.
  • Web spider that can follow links within a single domain.
  • Outputs to an XML file with configurable tags.
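
To give a rough idea of what "follow links within a single domain" can look like, here is a minimal sketch. It is not the Driller code: it assumes the standard library's std::regex (C++11) instead of pcre, and the function names host_of and same_domain_links are made up for this example. It only looks at absolute http/https links in double-quoted href attributes, and the host comparison is an exact, case-sensitive match.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: return the host part of an absolute http/https URL,
// or an empty string if the URL is relative or does not parse.
std::string host_of(const std::string& url) {
    static const std::regex re(R"(^https?://([^/:?#]+))", std::regex::icase);
    std::smatch m;
    return std::regex_search(url, m, re) ? m[1].str() : std::string();
}

// Hypothetical helper: collect every double-quoted href value in the page
// and keep only the links whose host is exactly `domain`.
std::vector<std::string> same_domain_links(const std::string& html,
                                           const std::string& domain) {
    static const std::regex href_re(R"rx(href\s*=\s*"([^"]+)")rx",
                                    std::regex::icase);
    std::vector<std::string> out;
    for (std::sregex_iterator it(html.begin(), html.end(), href_re), end;
         it != end; ++it) {
        const std::string url = (*it)[1].str();
        if (host_of(url) == domain) {
            out.push_back(url);  // same domain: a candidate for the crawl queue
        }
    }
    return out;
}
```

Relative links (like "/about") are skipped here because they have no host; a real spider would first resolve them against the URL of the page they were found on.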

 

I tried to stick to the "keep it simple, keep it clean" rule, using ready-made open source C/C++ libraries as much as possible.

How to build it:
The application has only been tested on Windows XP 32-bit, although I took care to use only cross-platform libraries
and not to write OS-dependent code.
The libraries Driller depends on are:

  • pcre : for regular expressions.
  • Pthreads : cross-platform threads wrapper.
  • Curl + c-ares : for HTTP requests / responses.
The Driller source code includes Visual Studio Express 2008 solution and project files, and all the libraries are already built in debug mode. All you have to do is configure it and build it;
this saves you the time of configuring and compiling the libraries just to test the application.
For more information see *how_to_build_drill*.

How to configure it:
The Driller web spider doesn't come with a fancy configuration GUI or a configuration file.
All configuration must be done in code; then you compile it, run it and watch the results come in.
The reason is that I wrote it for my personal use, without much time on my hands, and didn't plan to open source it. Anyway, all those features will be added later.
A step-by-step guide can be found in how_to_configure_drill.
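
Just to illustrate the idea of "configuration in code" together with regex rules and configurable XML tags, here is a small sketch. It is not the real Driller configuration: the names ExtractRule, CrawlConfig, xml_escape and to_xml are invented for this example, the <page> wrapper element is an assumption, and it uses std::regex (C++11) instead of pcre.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical rule: a regex whose first capture group is the value to
// collect, plus the XML tag name to wrap that value in.
struct ExtractRule {
    std::string tag;
    std::string pattern;
};

// Hypothetical in-code configuration: edit it, recompile, run.
struct CrawlConfig {
    std::string start_url;
    std::vector<ExtractRule> rules;
};

// Escape the characters that are not allowed as-is in XML text content.
std::string xml_escape(const std::string& s) {
    std::string out;
    for (char c : s) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;";  break;
            case '>': out += "&gt;";  break;
            default:  out += c;       break;
        }
    }
    return out;
}

// Apply every rule to the page text and emit one element per match.
// A pattern without a capture group produces empty elements.
std::string to_xml(const CrawlConfig& cfg, const std::string& page) {
    std::string out = "<page>\n";
    for (const ExtractRule& rule : cfg.rules) {
        const std::regex re(rule.pattern);
        for (std::sregex_iterator it(page.begin(), page.end(), re), end;
             it != end; ++it) {
            out += "  <" + rule.tag + ">" + xml_escape((*it)[1].str()) +
                   "</" + rule.tag + ">\n";
        }
    }
    out += "</page>\n";
    return out;
}
```

With a rule like {"title", "<h1>([^<]*)</h1>"}, every <h1> heading on the page ends up in its own <title> element, and changing the tag name only means changing that one string and rebuilding.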


If you find this useful, please consider donating.
All donations will go to charity.
