waaaaaay off topic!! grid computing project

Wed Aug 17 18:00:59 UTC 2005

hi.

i know this is way off topic, but i'm considering creating a SETI@home-like
grid project for testing purposes. the goal of the project would be to
extract book information from the amazon.com site/servers using Amazon Web
Services (AWS).

to reach that scale in a short enough timeframe, i might have to create some
sort of distributed/grid application. the key issue is that while Amazon
permits the extraction/use of the book data from their site/servers, Amazon
restricts how fast you can hit their servers from a given machine/IP: a
single machine may hit their site once per second. the obvious solution is
to create a distributed app that would be used to parse/extract the
information and build the database.
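
to make the per-machine limit concrete, here's a rough python sketch of how a
client might pace itself. everything in it is an assumption for illustration:
the `Throttle` class, the `fetch_all` helper, and the `fetch` callable are
hypothetical names, and `fetch` stands in for whatever call actually talks to
the amazon service (not shown here).

```python
import time


class Throttle:
    """Allow at most one call per `min_interval` seconds (hypothetical helper)."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the pacing can be tested
        self._sleep = sleep
        self._last = None        # time of the previous permitted call

    def wait(self):
        """Block until at least `min_interval` has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def fetch_all(book_ids, fetch, throttle):
    """Call `fetch(book_id)` for each id, pausing between calls as needed."""
    results = {}
    for book_id in book_ids:
        throttle.wait()
        results[book_id] = fetch(book_id)
    return results
```

a client would build `Throttle(1.0)` to stay at the once-per-second limit, or
a larger interval to leave some headroom, and pass its own request function
as `fetch`.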

while the initial app would be a test, to make sure everything works
correctly, the eventual goal would be to use the database to support a
possible business venture.

the client app for this project would be a perl/python app that hits the
amazon.com server and then returns the data to the test server. the goal is
to extract information for ~2-3 million books. i estimate that i'd need a
network of 200-500 machines to accomplish this over 2-3 days, with each
client machine hitting the amazon server once every 5 seconds.
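
as a sanity check on those numbers, here's a small python sketch of the
arithmetic. it assumes one request per book and no retries or downtime, which
is a simplification; `crawl_hours` and its parameters are names made up for
this example.

```python
def crawl_hours(books, machines, seconds_per_request, requests_per_book=1):
    """Estimate wall-clock hours to fetch `books` records across `machines`.

    Assumes every machine runs the whole time and no request fails.
    """
    total_requests = books * requests_per_book
    requests_per_second = machines / seconds_per_request  # whole network
    return total_requests / requests_per_second / 3600


# the two ends of the ranges mentioned above, at one request every 5 seconds
for books, machines in [(2_000_000, 500), (3_000_000, 200)]:
    hours = crawl_hours(books, machines, seconds_per_request=5)
    print(f"{books:,} books on {machines} machines: about {hours:.1f} hours")
```

under those assumptions even the slow end (3 million books on 200 machines)
comes out at roughly 21 hours, so the 2-3 day figure leaves room for retries,
more than one request per book, and machines dropping in and out.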

i've looked at some of the projects that have been set up for this kind of
process/app, and haven't found any site where one could more or less post a
potential project and talk to like-minded individuals.

so, i thought i'd turn to you guys to see what your thoughts might be. if
you'd like to help, or if you're interested and want further information, or
if you have other places that i might be able to turn to for possible help,
let me know.

thanks for whatever help (or pointers) you can give!!

-bruce
bedouglas at earthlink.net