Web crawlers and Google App Engine Hosted applications

Question

Is it impossible to run a web crawler on GAE alongside my app, considering that I am running the free startup version?

edited Mar 25 '09 at 12:26

myroslav

asked Mar 24 '09 at 7:44

Spikie

myroslav · Answer 1 · 2010-04-29 11:37:46Z

As long as Google has not exposed scheduling, queue, or background
task APIs, any processing can happen only in response to an external
HTTP request. You would need a heartbeat service that processes one
item from the crawler's queue at a time (so as not to hit GAE limits).

To crawl from GAE, you have to split your application into three
parts: a queue (stored in the Datastore), a queue processor that reacts
to the external HTTP heartbeat, and your actual crawling logic.
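The three-part split described above can be sketched roughly as follows. This is only an illustration, not App Engine code: `CrawlQueue` is a hypothetical in-memory stand-in for a queue persisted in the Datastore, `handle_heartbeat` stands in for the request handler that the external heartbeat would hit, and `fetch` is an injected function (on GAE it would wrap whatever HTTP fetch call you use). All of these names are assumptions made for the example.

```python
import re
from collections import deque
from urllib.parse import urljoin

# Naive link extraction, good enough for a sketch (not a real HTML parser).
LINK_RE = re.compile(r'href="([^"]+)"')


class CrawlQueue:
    """Hypothetical in-memory stand-in for a queue stored in the Datastore."""

    def __init__(self, seeds):
        self.pending = deque(seeds)
        self.seen = set(seeds)

    def pop(self):
        """Return the next URL to crawl, or None if nothing is pending."""
        return self.pending.popleft() if self.pending else None

    def push(self, url):
        """Enqueue a URL unless it has already been queued before."""
        if url not in self.seen:
            self.seen.add(url)
            self.pending.append(url)


def crawl_page(url, fetch):
    """Crawling logic: fetch one page and return the absolute links found."""
    html = fetch(url)
    return [urljoin(url, href) for href in LINK_RE.findall(html)]


def handle_heartbeat(queue, fetch):
    """Queue processor: meant to be called once per external HTTP heartbeat.

    Processes at most one URL per call so each request stays small.
    Returns the URL that was processed, or None if the queue was empty.
    """
    url = queue.pop()
    if url is None:
        return None
    for link in crawl_page(url, fetch):
        queue.push(link)
    return url


if __name__ == "__main__":
    # Fake pages so the sketch runs without any network access.
    pages = {
        "http://example.com/": '<a href="/a">a</a> <a href="/b">b</a>',
        "http://example.com/a": '<a href="/">home</a>',
        "http://example.com/b": "",
    }
    queue = CrawlQueue(["http://example.com/"])
    while True:
        done = handle_heartbeat(queue, lambda u: pages.get(u, ""))
        if done is None:
            break
        print("crawled", done)
```

Because each heartbeat handles exactly one URL, pausing the heartbeat pauses the crawl, and the queue keeps the remaining work until it resumes.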

You would have to watch your quota usage manually, starting the heartbeat when you have spare quota and stopping it once the quota is used up.

When Google introduces the APIs I mentioned at the beginning, you will
have to rewrite the parts that can be implemented more effectively with them.

UPDATE: Google has since introduced the Task Queue API. See the Task Queue docs for Python and Java.

Nick Johnson · Answer 2 · 2009-03-24 12:17:12Z

App Engine code only runs in
response to HTTP requests, so you can't run a persistent crawler in the
background. With the upcoming release of scheduled tasks, you could
write a crawler that uses that functionality, but it would be less than
ideal.


SilentGhost · Answer 3 · 2009-03-24 10:26:50Z

I suppose you can (i.e., it's not impossible to) run it, but it will be slow and you'll run into limits quite quickly. As CPU quotas are going to be decreased at the end of May even further, I'd recommend against it.

edited Mar 24 '09 at 10:26

answered Mar 24 '09 at 10:14

SilentGhost

Vasil · Answer 4 · 2009-03-24 13:11:35Z

It's possible. But that's not really an application for App Engine,
just as Arachnid wrote. Even if you manage to get it working, I doubt
you'll stay within the quotas for free accounts.


gae crawler
