As there is no reasonable tool for this, I ended up writing the migration code myself. I also doubt there can be a good general-purpose tool, because the paradigm shift from schema-full to schema-less is too big. IMHO it's worth refining your data structures along the way, since with MongoDB it is often advisable to embed related data instead of linking it.
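To make the embed-versus-link distinction concrete, here is a minimal sketch using Groovy map literals (the field names are only illustrative and taken from the example below):

import org.bson.types.ObjectId

// Linked (relational-style): addresses live in their own collection,
// the person document only stores a reference to one of them.
def addressId = new ObjectId()
def linkedAddress = [_id: addressId, city: 'Großerlach', street: 'Sulzbacher Str. 50', zip: '71577']
def linkedPerson  = [firstname: 'John', lastname: 'Doe', addressId: addressId]

// Embedded: the address is nested directly inside the person document,
// so a single read returns the person together with his addresses.
def embeddedPerson = [
    firstname: 'John', lastname: 'Doe',
    address: [[city: 'Großerlach', street: 'Sulzbacher Str. 50', zip: '71577']]
]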
Anyway, I tried to migrate the person table with 400.000+ rows. I went with embedded addresses here, as they are usually not shared among several persons. The data should look like this:
{ "_id" : ObjectId("5044ab202b5e1ae2ffcd44a1"), "firstname" : "John", "lastname" : "Doe", "address" : [ { "city" : "Großerlach", "street" : "Sulzbacher Str. 50", "zip" : "71577", "loc" : { "lon" : 9.57669, "lat" : 49.03721 } } ] }
My first stupid attempt was to solve this completely with GORM. I installed the grails mongodb plugin and tried something like this:
Person.list().each() { it.mongo.save() }
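For context, this one-liner presupposes a Person domain class that is mapped to both datastores and embeds its address. The class isn't shown in this post, but it would look roughly like this (a sketch, with field names inferred from the SQL columns further down):

// Rough sketch of the assumed domain classes (not part of the original post).
// The mongodb plugin's dual-datastore support is what adds the mongo.save()
// method used in the one-liner above.
class Person {
    String firstname
    String lastname
    List<Address> address = []

    static embedded = ['address']   // nest the addresses inside the person document
}

class Address {
    String city
    String street
    String zip
    Double lon   // the actual documents nest lon/lat in a loc sub-document
    Double lat
}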
GORM seemed to be horribly slow when handling such a huge dataset at once. After some research I learned that I should process it in chunks and flush the Hibernate session between chunks. I tried a chunk size of 1000 rows.
def sessionFactory

def copyToMongo() {
    int chunk = 1000
    long offset = 0
    long max = Person.count()
    while (offset < max) {
        // load one chunk via GORM and save each person to MongoDB
        Person.list(max: chunk, offset: offset).each { it.mongo.save() }
        // flush the Hibernate session before moving on to the next chunk
        sessionFactory.currentSession.flush()
        offset += chunk
    }
}
But this was again very slow. After inserting about 60.000 rows the performance started to degrade, and after 180.000 rows I ran out of heap memory, presumably because flushing alone does not clear Hibernate's first-level cache, so every loaded Person stays referenced in the session. In the end I found it's best not to use GORM and Hibernate at all when handling massive amounts of data: no object mapping, just raw data, and it worked quite well. My solution uses plain Groovy SQL and the low-level MongoDB API with its bulk insert feature:
def mongo        // MongoDB connection (injected)
def dataSource   // MySQL DataSource (injected)

def copyToMongo() {
    int chunk = 1000
    long offset = 0
    def db = mongo.getDB("test")
    db.person.drop()
    Sql sql = new Sql(dataSource)
    long max = Person.count()
    while (offset < max) {
        // collect one chunk of rows as plain maps...
        def bulk = []
        sql.eachRow("SELECT * FROM person LIMIT $chunk OFFSET $offset") {
            bulk << [
                firstname: it.firstname,
                lastname : it.lastname,
                address  : [[
                    city  : it.city,
                    street: it.street,
                    zip   : it.zip,
                    loc   : [lon: it.longitude, lat: it.latitude]
                ]]
            ]
        }
        // ...and write the whole chunk with one bulk insert
        db.person.insert(bulk)
        offset += chunk
    }
    // geo index on the embedded coordinates
    db.person.ensureIndex(['address.loc': '2d'])
    render "Done. "
    render "MySQL: $max "
    render "Mongo: ${db.person.count()}"
}
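To sanity-check the import and the 2d index, a quick geo query does the trick. This is only a sketch assuming the same GMongo-style map syntax as above; the coordinates are just the sample values from the JSON document shown earlier:

// same database handle as in the import code above
def db = mongo.getDB("test")
// find the five persons closest to the sample coordinates
// ('$near' makes use of the 2d index on address.loc created during the import)
def nearby = db.person.find(['address.loc': ['$near': [9.57669, 49.03721]]]).limit(5)
nearby.each { println "${it.firstname} ${it.lastname} - ${it.address[0].city}" }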