Hello all, I previously posted about this project on the ILUG-D mailing list (<a href="http://www.mail-archive.com/ilugd@lists.linux-delhi.org/msg23879.html">http://www.mail-archive.com/ilugd@lists.linux-delhi.org/msg23879.html</a>), but at that time it had so many dependencies that it was hard (rather, impossible) to install, test, and give feedback on. I presented this project at <a href="http://freed.in">freed.in</a> 09 (slides and a sample are available at <a href="http://code.google.com/p/offline-wikipedia/downloads/list">http://code.google.com/p/offline-wikipedia/downloads/list</a>), and both times the feedback I got was exciting. It all started with an idea from Stian Haklev (<a href="http://reganmian.net/blog/">http://reganmian.net/blog/</a>), who was attempting the same thing and aiming to get the entire English Wikipedia onto a DVD that can be distributed and used freely. So here I am with <a href="http://92.243.5.147/offline-wiki">http://92.243.5.147/offline-wiki</a>, which includes two zipped files. The procedure is simple enough: <ul><li>Extract both of them. </li><li>Set the exact location of blocks/xml_block/ in the offline-wikipedia/page/class_con.py file.</li><li>Run the server with the command: $ ./manage.py runserver. 
</li><li>Now, in a browser, you can access any article by opening a link of the form <a href="http://localhost:8000/wiki/xyz/">http://localhost:8000/wiki/xyz/</a> (remember the trailing '/'; without it, URL resolution fails with an error).</li> </ul>That covers how to make it work. Now, about the contents of the setup: <ul><li>block.tgz contains the xml_blocks folder with some 20k-odd files, which are small chunks of the huge XML dump provided by Wikimedia (<a href="http://download.wikimedia.org/enwiki/">http://download.wikimedia.org/enwiki/</a>).</li> <li>offline-wikipedia.tgz has the Django setup and CSV files that list all the articles present in the XML dump.</li><li>Some other files, like segregate.py, index.py and index_file.py, are what I used to create the indexes, both the db and the CSV files. I have tried to document my steps, but in case of confusion let me know.</li> <li>Media content, such as CSS files, images and logos, I have taken from the MediaWiki site itself without making changes.</li></ul>Major concerns and reactions from the audience/users (future targets): <ul><li>How to keep it updated.</li> <li>How to make it editable.</li><li>How to manage different categories of articles, and segregate articles on that basis, to make a more refined and better education/learning tool (Rahul Sundram).</li></ul>Issues at hand: <ul> <li>On my side, apart from following <a href="http://users.softlab.ece.ntua.gr/~ttsiod/buildWikipediaOffline.html">http://users.softlab.ece.ntua.gr/~ttsiod/buildWikipediaOffline.html</a>, I tried to make this work via Django. That posed the new problem of writing a converter from wiki markup text to HTML, which as of now is not perfect and needs improvement to use all the content at hand. Here I am ready to switch to any parser, irrespective of language (Python/PHP), as long as it is better than this one. I avoided PHP because it would need the Apache web server, which would be overkill.</li> <li>I am using the Django server; it can be replaced by a simple Python 
web server that can handle CSS and other requests, as I am not using any of the other features provided by Django (like MVC).</li><li>To make access to individual articles fast, I am breaking the huge file into small ones, resulting in some 20k-odd files. If there is any way to skip that, a hint, idea, or help would be great.</li> <li>Last year's October dump was 4.1G and this March 09 dump is already 4.6G, making things more difficult.</li><li>Adding options for going live, searching for articles, and making updates readily available.</li></ul>That is all about the project. So far I am in conversation with Stian and Imran (from AU-KBC research lab) about the possible usage and options (I also talked to Rahul, Rakesh and Shirish at freed), but I am sure there are many options, comments and suggestions from other people that can help develop this and make it a highly useful project. I would really like to get some feedback, responses, help and guidance, so kindly reply with comments so that we can reach the best conclusion and result. The blocks.tgz file is huge, and I know it would be really difficult for many to download and try it, so there is sample.tar.bz2 on Google Code that you can try. I will upload one more sample by tomorrow at <a href="http://92.243.5.147/offline-wiki/">http://92.243.5.147/offline-wiki/</a>, which is small enough and at the same time handy for checking out the present condition. -- Regards Shantanu PS: Until now, Pranav, Nandeep and Emmanuel were the ones who tried it and got back with feedback, comments and suggestions; I hope to get more this time.
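Appendix: a rough sketch of the "simple Python web server" idea from the issues list, using only the standard library. This is an illustration under assumptions, not the project's code: the ARTICLES dict is a hypothetical stand-in for the real lookup through the CSV index and xml_blocks, the names render, WikiHandler and serve are made up for this example, and serving CSS or images is left out. It keeps the same /wiki/xyz/ URL shape, including the required trailing '/'.

```python
import html
import re
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import unquote

# Hypothetical stand-in for the real article lookup (CSV index + xml_blocks).
ARTICLES = {"xyz": "<h1>xyz</h1><p>Sample article body.</p>"}

# Same URL shape as the Django setup: /wiki/<title>/ with a trailing slash.
WIKI_URL = re.compile(r"^/wiki/([^/]+)/$")


def render(path):
    """Map a request path to a (status, page) pair."""
    match = WIKI_URL.match(path)
    if match is None:
        return 404, "<p>URL not recognised.</p>"
    title = unquote(match.group(1))
    page = ARTICLES.get(title)
    if page is None:
        # Escape the title so it cannot inject markup into the error page.
        return 404, "<p>No such article: %s</p>" % html.escape(title)
    return 200, page


class WikiHandler(BaseHTTPRequestHandler):
    """Answer GET requests with whatever render() returns."""

    def do_GET(self):
        status, page = render(self.path)
        payload = page.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(host="localhost", port=8000):
    """Run the server until interrupted."""
    HTTPServer((host, port), WikiHandler).serve_forever()
```

Calling serve() and opening http://localhost:8000/wiki/xyz/ should show the sample page. Keeping render() separate from the handler means the URL handling can be checked without starting a server.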