mongodb - Indexing schema-less DBs having user-defined schemas?
One of the essential features of a database is query speed: we store data away and want quick access to whatever data matches our criteria. Of late, however, schema-less databases have become popular. It's one thing to have a schema-less database where there is an inferred (in-the-head/in-the-app) schema that simply hasn't been declared formally to the database.
On the other hand, suppose we need to open the database up to several users, each of whom has their own schema for their own individual problem area. Each user defines their own "domain". A domain (a database on an RDBMS server) has types (tables in an RDBMS), and those types have their own properties (columns in an RDBMS). How do we create compound indexes to pull up the specific objects/documents/records (what have you) in a given domain? A query should select one or more domains (an IN clause) and a single topic type (e.g. CalendarEvent), and filter against several columns (start_date >= today, start_date <= today + 1 week, open_for_registration = true, calendar_name = 'public'). In a database with a fixed schema (implied if not declared), this is simple: create a compound index against those columns.
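In the fixed-schema world that last sentence describes, the index and query would look something like this in the mongo shell (a minimal sketch; the "events" collection name is my own placeholder):

    // Compound index: the equality fields first, the range field last.
    db.events.createIndex({
        calendar_name: 1,
        open_for_registration: 1,
        start_date: 1
    });

    var today = new Date();
    var nextWeek = new Date(today.getTime() + 7 * 24 * 60 * 60 * 1000);

    // The whole filter is answered by the single compound index.
    db.events.find({
        calendar_name: "public",
        open_for_registration: true,
        start_date: { $gte: today, $lte: nextWeek }
    });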
The complexity we have introduced is in making a single instance of, say, MongoDB act like an RDBMS server hosting many databases, where each database and its related schema is one of our "domains".
After busting my brain on this problem for a week and looking at various databases (MongoDB, Neo4j, MySQL, PostgreSQL), I have found a few possible solutions:
- Index every property. Each property is represented as a record in a properties table, or as an embedded document in MongoDB. In an RDBMS the property values would have to be serialized to strings. Cons: (a) you can only search against one property at a time (no compound indexes); (b) every property gets an index, so we incur needless overhead.
- Index select properties. In PostgreSQL this could be done with a filtered (partial) index. Basically, each property record would have a bit called "indexed" that we would have to maintain, and that bit would drive whether or not the filtered index covers that particular property. Cons: (a) you can still only search against one property at a time, which eliminates the "compound indexes" use case; the only way I can imagine mimicking a compound index is to search against each individual indexed property and return the intersection of the PKs.
- Create/maintain database constructs that reflect working indexes. In MongoDB, create an "indexables" collection. A document in that collection might look like this: {domain_id: ObjectId(..), type_id: ObjectId(..), fields: {field1: "some int value", field2: "some date value", field3: "some bit value"}}. Index the "indexables" collection on {domain_id: 1, type_id: 1, "fields.field1": 1, "fields.field2": 1, "fields.field3": 1}. Every time we create/update a document in the "things" collection, we have to plug its values into the field1, field2, field3 slots of its indexable. (This works nicely in MongoDB because we can plug values of any datatype into the placeholders; in MySQL, using the same pattern, we would have to serialize the values to strings.) We also have to maintain domain_id and type_id. Basically, it's an index layer (that we manage ourselves) built on top of the indexes handled by the database; see the sketch after this list. Cons: there's additional overhead. Whereas the database would normally manage indexes on our behalf, we have to take care of this ourselves, and since MongoDB has no concept of transactions we can't guarantee that a document and its various indexables are committed in a single step. Pros: we have compound indexes back, and the indexes are maintained at the domain level.
- I have also considered allowing users to have their own instances of database X, or, in MongoDB, their own collections. I wondered whether this wouldn't create more issues by running up against practical limitations (the number of databases or collections allowed), and tossed the idea out after not much thought.
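To make the third option concrete, here is a minimal sketch in the mongo shell. The "things" and "indexables" collections and the generic field1/field2/field3 slots come from the description above; the helper function, the sample ids, and the way this particular domain maps its properties into the slots are illustrative assumptions:

    // Hypothetical ids for two domains and the CalendarEvent type.
    var domainA = ObjectId(), domainB = ObjectId();
    var calendarEventTypeId = ObjectId();

    // One compound index over the generic slots; each domain decides for
    // itself what its field1/field2/field3 slots mean.
    db.indexables.createIndex({
        domain_id: 1,
        type_id: 1,
        "fields.field1": 1,
        "fields.field2": 1,
        "fields.field3": 1
    });

    // Every write to "things" must also write the matching index entry.
    // MongoDB gives us no multi-document transaction here, so the two
    // steps are not atomic -- the con noted above.
    function saveThing(thing) {
        db.things.replaceOne({ _id: thing._id }, thing, { upsert: true });
        db.indexables.replaceOne(
            { _id: thing._id },  // reuse the thing's id for its index entry
            {
                domain_id: thing.domain_id,
                type_id: thing.type_id,
                fields: {
                    field1: thing.calendar_name,          // this domain maps its
                    field2: thing.start_date,             // indexed properties
                    field3: thing.open_for_registration   // into the slots
                }
            },
            { upsert: true }
        );
    }

    saveThing({
        _id: ObjectId(),
        domain_id: domainA,
        type_id: calendarEventTypeId,
        calendar_name: "public",
        start_date: new Date(Date.now() + 2 * 24 * 60 * 60 * 1000),
        open_for_registration: true
    });

    // The query from the question: several domains (the IN clause), one
    // type, and conditions on the slots -- all served by the compound index.
    var today = new Date();
    var nextWeek = new Date(today.getTime() + 7 * 24 * 60 * 60 * 1000);
    db.indexables.find({
        domain_id: { $in: [domainA, domainB] },
        type_id: calendarEventTypeId,
        "fields.field1": "public",
        "fields.field2": { $gte: today, $lte: nextWeek },
        "fields.field3": true
    });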
Other ideas? Are there other kinds of databases that might handle this problem better?
Again, the idea is this: different users manage their own domains. Within a domain a user can have items of some "type", and each typed item has properties. I want to allow users to run queries against their domains for items of a type having properties that match several conditions (thus the compound indexes).
One last thought: no single domain is intended to be humongous. A domain might have 10-20 "types". Within each type there might be as many as 5,000 records in most cases, and 20,000 in extreme cases.
Unfortunately, this is one of those cases where, despite Joel Spolsky's advice, I have attempted astronaut architecture.
"Are there other kinds of databases that might handle this problem better?"
Have you considered Excel? Maybe indexed flat files :)
Look, the basic problem you're going to have here is that there is no silver bullet. The idea is fine, but at some point you have to accept a set of trade-offs.
You can't index everything. At some point you'll have to identify the "commonly-used" queries and build indexes for those. Unless you're planning to keep everything in memory, you'll end up creating indexes at some point.
"Within each type there might be as many as 5,000 records in most cases, and 20,000 in extreme cases."
Hey, there's a true limitation. How much hardware can you throw at 5k records? What about 200k records? Is that going to be enough to keep it all in RAM? To keep part of it in RAM? To keep just the indexes in RAM?
If you want to let users put stuff into their own "dynamic" schemas, I feel MongoDB is a natural fit, especially for the small data sets you're indicating.
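As a quick illustration of that fit (the collection and field names here are hypothetical), documents with completely different user-defined shapes can coexist in one collection with nothing declared up front:

    // No schema declaration needed; each user's "type" brings its own fields.
    db.things.insertOne({ domain: "astronomy", type: "Observation",
                          magnitude: 4.2 });
    db.things.insertOne({ domain: "calendars", type: "CalendarEvent",
                          start_date: new Date(), open_for_registration: true });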
But it's not a silver bullet by any means. Each of these solutions has its own set of problems. If there were a database that actually handled all of the requirements you've put forth, well, let's face it, we'd all be using that database :)