Training yesterday at MarkLogic was good. We got an intensive course in XPath, which was useful. I've used XPath a fair bit before when working with XSLT, but it was good to get a more in-depth tutorial on it from an expert source. We also talked about search functionality. The search functionality is very advanced, but it seems to have been built more in the traditional model than in the Google model. There are functions to OR things together and AND things together, and to do case-sensitive, case-insensitive or diacritic-insensitive search etc., but there isn't a single global function that you can just pass a query string to and have it work the way the Google search box does. This makes it seem like implementing a really cool search app that can do everything from the main search box will be a bit tricky, as you are going to have to break down the query that the user typed into its component parts before figuring out which functions you need to execute. I know that XQuery is quite powerful based on what we've been seeing over the last couple of days, but this will stretch it to the limit. I think I would opt to implement this sort of thing in Java somehow. It's going to be interesting to see what the Mark Logic guys do for the PoC.
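A minimal sketch of what that query breakdown could look like in Java, under some loud assumptions: whitespace separates terms, double quotes group a phrase, an upper-case OR joins two neighbouring terms, and everything else is ANDed together. The class name `QuerySketch` and the labels `AND`, `OR` and `WORD` are placeholders made up for illustration; they are not Mark Logic function names, and the output is just a string showing which kind of function each piece of the query would need.

```java
import java.util.ArrayList;
import java.util.List;

public class QuerySketch {

    // Split the raw search-box text on whitespace, keeping "quoted phrases" together.
    // Limitation of this sketch: a quoted "OR" is still treated as the operator.
    static List<String> tokenize(String query) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : query.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (Character.isWhitespace(c) && !inQuotes) {
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Wrap several items in NAME(...); a single item is returned unwrapped.
    static String render(String name, List<String> items) {
        if (items.size() == 1) {
            return items.get(0);
        }
        return name + "(" + String.join(", ", items) + ")";
    }

    // Build a nested AND/OR expression: OR binds to its two neighbours,
    // and the resulting groups are ANDed together.
    static String parse(String query) {
        List<List<String>> groups = new ArrayList<>();
        boolean joinNext = false;
        for (String token : tokenize(query)) {
            if (token.equals("OR")) {
                // A leading OR has nothing to join to, so it is ignored.
                joinNext = !groups.isEmpty();
                continue;
            }
            String word = "WORD(\"" + token + "\")";
            if (joinNext) {
                groups.get(groups.size() - 1).add(word);
            } else {
                List<String> group = new ArrayList<>();
                group.add(word);
                groups.add(group);
            }
            joinNext = false;
        }
        List<String> parts = new ArrayList<>();
        for (List<String> group : groups) {
            parts.add(render("OR", group));
        }
        return render("AND", parts);
    }

    public static void main(String[] args) {
        System.out.println(parse("xquery stemming OR \"range index\""));
    }
}
```

Running `main` prints `AND(WORD("xquery"), OR(WORD("stemming"), WORD("range index")))`. A real search box would also need options such as case and diacritic sensitivity, negation and proper error handling, none of which this sketch attempts.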
One of the downsides, it seems, is that you can only use one language for stemming for a given database, and you have to build a stemmed index on your content. As I understood it, that is a range index, which means that it must fit in RAM. I'm not sure how we are going to handle content from multiple languages in this context. I can't imagine how the Solr folks are even going to begin to address this sort of stuff that's more or less built into Mark Logic. I can see this being an interesting competition. A stemming index in RAM, though: I hope I'm wrong on that point too.