Bad Wolf, in progress
Link: Greenroom: a PHP / Mongo Framework
Creating a new PHP / Mongo Framework with the following guidelines:
1. Existing frameworks depend on SQL for much of their CRUD functionality. A fresh start will allow for the highest quality Mongo framework.
2. Code management is not the responsibility of the framework.
3. A high quality default CRUD interface is a must. The most important API is the Field and Model API.
During the Alpha stage, exploratory work implementing basic types will be performed. Once the types crystallize, final behavior will be documented, tested, and implemented.
The following products were reviewed to assess their capabilities:
Cassandra
Used by Facebook, Twitter, Digg
http://cassandra.apache.org/
CouchDB
http://couchdb.apache.org/
MongoDB
http://www.mongodb.org/display/DOCS/Home
We also looked at these products, but ruled them out for various qualitative reasons:
Project Voldemort
Used by LinkedIn
http://project-voldemort.com/
Redis
http://code.google.com/p/redis/
HBase
http://hadoop.apache.org/hbase/
Project Voldemort and HBase did not have sufficient Python drivers and were ruled out of the analysis. Redis was ruled out because sharding was implemented outside of the project in a non-standardized manner.
Methodology
We tested each technology using the login data for approximately 1.5 million users. The first test inserted all the rows into the database, and the second test queried 100,000 random usernames. Both tests were timed to provide a way of comparing relative performance.
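The two tests described above can be sketched roughly as follows. This is a minimal illustration, not the harness actually used: `DictStore` is a hypothetical in-memory stand-in for a database driver, the `insert`/`find` method names are assumptions, and sampling usernames with replacement is a guess at what "random usernames" meant.

```python
import random
import time


class DictStore:
    """Hypothetical in-memory stand-in for a real database driver."""

    def __init__(self):
        self._rows = {}

    def insert(self, username, record):
        self._rows[username] = record

    def find(self, username):
        return self._rows.get(username)


def timed(func, *args):
    """Call func(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def load_all(store, rows):
    """Test 1: insert every (username, record) pair; return the row count."""
    for username, record in rows:
        store.insert(username, record)
    return len(rows)


def query_random(store, usernames, sample_size, seed=0):
    """Test 2: look up sample_size randomly chosen usernames; return hits."""
    rng = random.Random(seed)  # fixed seed so every store sees the same queries
    hits = 0
    # Assumption: sampling is done with replacement.
    for username in rng.choices(usernames, k=sample_size):
        if store.find(username) is not None:
            hits += 1
    return hits


def run_benchmark(store, rows, sample_size):
    """Run both tests against one store and collect the timings."""
    usernames = [username for username, _ in rows]
    loaded, load_seconds = timed(load_all, store, rows)
    hits, query_seconds = timed(query_random, store, usernames, sample_size)
    return {
        "loaded": loaded,
        "hits": hits,
        "load_seconds": load_seconds,
        "query_seconds": query_seconds,
    }


if __name__ == "__main__":
    # Small synthetic data set; the real test used ~1.5 million login rows.
    rows = [(f"user{i}", {"password_hash": "x"}) for i in range(10_000)]
    print(run_benchmark(DictStore(), rows, sample_size=1_000))
```

To compare real systems, one would wrap each driver in a class exposing the same two methods and pass it to `run_benchmark` with the same rows and seed.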
Cassandra Results
Load: 52m9.835s
Query: 1m42.434s
Disk Usage: ~3.1G
CouchDB Results
Load: 198m3.774s
Query: 15m54.026s
Disk Usage: ~6.5G
MongoDB Results
Load: 15m38.976s
Query: 1m1.990s
Disk Usage: ~1.0G
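As a reading aid, the reported timings can be converted to seconds and expressed relative to the fastest system. The sketch below assumes the timings are in the `XmY.YYYs` format shown above; the names `REPORTED`, `to_seconds`, and `relative_to` are introduced here for illustration only.

```python
import re

# Timings copied from the results listed above.
REPORTED = {
    "Cassandra": {"load": "52m9.835s", "query": "1m42.434s"},
    "CouchDB": {"load": "198m3.774s", "query": "15m54.026s"},
    "MongoDB": {"load": "15m38.976s", "query": "1m1.990s"},
}


def to_seconds(stamp):
    """Convert a string such as '52m9.835s' to seconds as a float."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized timing: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def relative_to(baseline, results=REPORTED):
    """Express each system's timings as a multiple of the baseline's."""
    base = {test: to_seconds(value) for test, value in results[baseline].items()}
    return {
        name: {test: to_seconds(value) / base[test] for test, value in timings.items()}
        for name, timings in results.items()
    }


if __name__ == "__main__":
    for name, ratios in relative_to("MongoDB").items():
        print(f"{name}: load x{ratios['load']:.2f}, query x{ratios['query']:.2f}")
```

With MongoDB as the baseline, Cassandra comes out at roughly 3.3 times the load time and 1.7 times the query time, and CouchDB at roughly 12.7 times and 15.4 times respectively.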
Cassandra provided acceptable performance and had the best cluster management. Schema design was difficult and required much more foresight. Single-node performance was not as fast as Mongo’s, but would presumably improve at scale.
CouchDB had the most attractive development environment but was unacceptably slow and heavy.
MongoDB had the fastest single-node performance. MongoDB’s toolset was well rounded and easy to understand. MongoDB was also the most flexible system, making ad hoc queries much easier to write. Clustering support exists but is not as advanced as Cassandra’s.
Appendix: Project descriptions and links
Cassandra
The Apache Cassandra Project develops a highly scalable second-generation distributed database, bringing together Dynamo’s fully distributed design and Bigtable’s ColumnFamily-based data model.
http://cassandra.apache.org/
http://incubator.apache.org/thrift/about/
http://github.com/vomjom/pycassa/
http://github.com/digg/lazyboy
http://stackoverflow.com/questions/1502735/whats-the-best-practice-in-designing-a-cassandra-data-model
http://jetfar.com/installing-cassandra-and-thrift-on-snow-leopard-a-quick-start-guide/
http://www.mail-archive.com/cassandra-user@incubator.apache.org/msg02426.html
CouchDB
Apache CouchDB is a document-oriented database that can be queried and indexed in a MapReduce fashion using JavaScript. CouchDB also offers incremental replication with bi-directional conflict detection and resolution.
http://couchdb.apache.org/
http://code.google.com/p/couchdb-python/
http://davidwatson.org/2008/02/python-couchdb-rocks.html
MongoDB
MongoDB (from “humongous”) is a scalable, high-performance, open source, dynamic-schema, document-oriented database.
http://www.mongodb.org/