MarkLogic and the New York Times
December 2, 2013
On Saturday, November 30, 2013, The New York Times published “Health Care Site Rushing to Make Fixes by Sunday.” As I now know, mission accomplished. But there was no aircraft carrier, brass band, or flag. (Here’s the link to the online story, but like so many “real” journalistic efforts, the link can go dead and you will have to hunt for a November 30, 2013 Times and look on page A1 with a jump to page A12. Penguin, there is nothing I care to do about the link. Sorry.)
I wanted to document this passage from the Times’ story about MarkLogic. What’s interesting is that the company gets little attention from other “real” journalists. I suppose if I were curious, I would attempt to answer the question, “Why?”
I am not curious. Here’s what snagged my attention on the 30th:
Gary C. Bloom, the chief executive officer of another vendor, MarkLogic, said his firm is also moving its software to differently configured servers.
The idea is from MarkLogic’s neighbor in Silicon Valley, Oracle. A few years ago, Oracle wrote a white paper banging on MarkLogic’s technology. You can find a copy of that analysis in “Mark Logic XML Server 4.1.” I wrote about the tempest in “A Coming Dust Up between Oracle and MarkLogic?”
The Times’ story continued:
MarkLogic provided the technology for the database that serves as the system’s internal filing cabinet and index.
The story does not make clear whether MarkLogic is an XML server that acts like a junction box among the moving parts of the HealthCare.gov site, a data management system interacting with Oracle’s technology, or a search engine for the Web site. MarkLogic positions its technology as doing each of these functions plus analytics, business intelligence, customer relationship management, publishing, and probably some other functions as well.
The Times quotes Mr. Bloom as having said:
“I am picking up my house and moving it to a better foundation next door,” he [Mr. Bloom] said in an interview. He said MarkLogic is performing up to standard, but “the network and the storage systems are not properly sized and not properly run.”
It is not clear to me which vendor is providing the storage systems. Is it MarkLogic, or is it another vendor such as Oracle, a company apparently unimpressed with some of MarkLogic’s technology if I understand the Oracle white paper?
The Times added:
“Another critical problem involved the specifications for a major computer switch that connects the computer services through a security firewall to the Internet. Mr. Bloom said it has been upgraded from four gigabytes a second to 60 [gigabytes a second]. He said the earlier speed was the equivalent of employing four security staffers to screen Heathrow Airport’s passengers. “The line to get through,” he said, “would go back to the city of London.”
I am not sure how these issues did not become known to the vendors pushing data through the system, but apparently, the 15X shortfall was not noticed. I wonder how many home builders move a completed house to a new foundation. Also, what if the security folks at Heathrow are more or maybe less efficient than those located where HealthCare.gov is?
I will keep my eye on this issue because MarkLogic has been emphasizing that it offers a search system. Where there is a search vendor, there seems to be some activity of interest. And where there are MarkLogic and Oracle, there may be some interesting discussion between the parties.
Stephen E Arnold, December 2, 2013
Vizit Announces SharePoint 2013 Enhancements
December 2, 2013
There is a growing landscape of SharePoint add-ons that provide increased functionality as well as ease of use. Vizit throws its hat into the ring with its new release of Vizit Essential. PR Web offers the latest in its article, “Vizit Announces Essential SharePoint 2013 Enhancements.”
The article begins:
“Vizit a leading provider of solutions that enhance SharePoint usability, search, and document reviews announces improved PDF, Email, and SharePoint 2013 Document Library support for its leading SharePoint add-on solution, Vizit Essential™. Vizit continues to build on its legacy of making SharePoint more usable through efficient file previewing and viewing by adding PDF bookmarking support to Vizit Essential.”
There is a definite market for add-ons such as the one offered by Vizit. As SharePoint increases its scope and breadth, customers have to look elsewhere for customization and depth. It is an old rule that you cannot be all things to all people. But SharePoint serves as a good base and customers are increasingly comfortable looking elsewhere for special needs. Stephen E. Arnold is a longtime leader in search and a SharePoint watcher. His Web service, Beyond Search, is a good way to track the latest in all things enterprise search, including new add-ons.
Emily Rae Aldridge, December 2, 2013
ThisPlusThat for Smarter Searches
December 2, 2013
Leave it to an astrophysicist to make search smarter. One of the fellows over at the Insight Data Science Fellows Program, Christopher Moody, describes how his search engine uses word vectors to produce more accurate search results in, “ThisPlusThat.me: a Search Engine that Lets You ‘Add’ Words as Vectors.” The scientist says he was inspired by the possibilities presented by Google’s new vectoring algorithm, word2vec. He explains:
“What [Google] doesn’t do is understand the relationships between words and understand the similarities or dissimilarities. That’s where ThisPlusThat.me comes in–a search site I built to experiment with the word2vec algorithm recently released by Google. word2vec allows you to add and subtract concepts as if they were vectors, and get out sensible, and interesting results. I applied it to the Wikipedia corpus, and in doing so, tried creating an interactive search site that would allow users to put word2vec through its paces.”
Moody supplies several examples of his project in action. The first and most elementary: querying “King – Man + Woman” leads to “Queen.” Since the algorithm was trained using Wikipedia’s vast collection of data, Moody explains, it has “a pretty good grasp of not only common words like ‘smart’ or ‘American’ but also loads of human concepts and real world objects, allowing us to manipulate proper nouns.” You can try ThisPlusThat.me for yourself here.
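For readers who want to see the arithmetic behind a query like “King – Man + Woman,” here is a minimal sketch in plain Python. It is not Moody’s code and it does not use word2vec itself. The three-dimensional vectors, the `TOY_VECTORS` table, and the `cosine` and `solve_analogy` helpers are all made up for illustration; real word2vec vectors are learned from a corpus and have hundreds of dimensions. The sketch only shows the general idea: add and subtract the vectors, then pick the remaining word whose vector points in the most similar direction.

```python
import math

# Toy three-dimensional "embeddings", invented purely for illustration.
# The axes loosely stand for (royalty, maleness, femaleness). Real word2vec
# vectors are learned from text and have hundreds of dimensions.
TOY_VECTORS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.3, 0.3],
}


def cosine(a, b):
    """Cosine similarity: 1.0 means the two vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def solve_analogy(positive, negative, vectors=TOY_VECTORS):
    """Add the `positive` word vectors, subtract the `negative` ones,
    and return the closest word that was not part of the query."""
    dims = len(next(iter(vectors.values())))
    target = [0.0] * dims
    for word in positive:
        target = [t + v for t, v in zip(target, vectors[word])]
    for word in negative:
        target = [t - v for t, v in zip(target, vectors[word])]

    # Leave the query words out, otherwise "king" could answer its own query.
    query_words = set(positive) | set(negative)
    candidates = [w for w in vectors if w not in query_words]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


if __name__ == "__main__":
    # King - Man + Woman
    print(solve_analogy(["king", "woman"], ["man"]))
```

With these hand-picked numbers the script prints queen. That result says nothing about how well the approach works on real data; the toy vectors were chosen so the example comes out cleanly.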
Moody explains how he approached word2vec’s huge dimensional vector table using Hadoop’s Map functions. To speed computation, he tried a number of tools: NumPy, Cython, Numba, and Numexpr. Near the end of the article, Moody shares links to his code and notebook experiments. The write-up is worth a look for anyone interested in the development of natural language algorithms.
Cynthia Murrell, December 02, 2013
Sponsored by ArnoldIT.com, developer of Augmentext
How Chicago State Handles Faculty Dissent
December 2, 2013
Here is an interesting approach to academic freedom. The Chicago Tribune informs us that “Chicago State University Wants Faculty Blog Shut Down.” The blog in question, the Faculty Voice Blog, has dared to be critical of the University administration, so the school and its lawyers have sent an official “cease and desist” notice. Rather than engage the unhappy professors in civil debate, it seems the school has suddenly decided it has a problem with the blog’s use of its trademarks and trade names. (The blog has been active, and using these “trade names and marks,” since 2009.) The notice also characterizes the posts as unprofessional and uncivil, thereby violating University policy. No word on why they feel their policy trumps the First Amendment to the U.S. Constitution.
Reporter Juan Perez Jr. cites Phillip Beverly, the associate political science professor who founded the blog. The article relates:
“Roughly eight faculty members contributed to the site, Beverly said, under their own names or pseudonyms. The website used a picture of an on-campus Chicago State University sign and ‘CSU’ hedge sculpture. But Monday evening, after receiving the letter, Beverly changed the site’s name to ‘Crony State University’ and replaced its main image with a building from another campus.”
That’s one way to deal with specious charges (don’t worry, he is also consulting a lawyer). Beverly started the website specifically to provide a forum for discussing problems at the University, like disappointing graduation rates, poor money management, and inadequate leadership. For its part, the school seems to feel it has the right to dictate what information makes it into the public sphere.
Perez writes:
“Last year, Chicago State officials instructed faculty and staff that only authorized university representatives could share information with the media—and that everything from opinion pieces to social media communications could require prior approval.”
This is not the first time Chicago State has run afoul of the First Amendment. Just last year, a federal judge decided the school had violated rights granted by that hallowed document when it fired another outspoken professor, Steven Moore. Perhaps University administrators should audit a few classes on constitutional law.
Cynthia Murrell, December 02, 2013
Turn to Circa for News Tidbits
December 2, 2013
Is the effort to know and understand what is happening in this huge, complex world just too much to squeeze into the day? Lifehacker points us to another concession to the modern reader in, “Circa News Dishes Out Bite-Sized News Bits to Keep You up to Date.” The app, available for both iPhones and Android-based devices, lets users quickly catch up with news stories they are interested in. Well, the broad strokes, anyway.
Writer Thorin Klosowski summarizes:
“Circa is a curated list of news on a variety of topics. It has its own editorial team that uses a wide variety of sources, and those sources are collated into very quick, short news bits with just the essential facts, quotes, or photos about each topic. If you find a story interesting, you can click the follow button and Circa will point you to new information when it’s available. It’s by no means enough information for serious news junkies, but if you’re looking to keep up with what’s happening in the world while waiting in line for a cup of coffee, Circa News is a worth a look.”
I see how this can have its uses. I just hope users will, when they have a little more time and attention, turn to the more comprehensive articles on their subjects of interest. After all, most folks realize that there is more to every story than can be absorbed while waiting for a latte to be prepared. Right?
Cynthia Murrell, December 02, 2013
Why Google is Letting People Down
December 1, 2013
Some people might say that Google abandons and starts projects on a whim. In the past, the search giant provided explanations for projects it could not complete and promises it was unable to keep. But have the abandonment mentality and prideful hot air stopped this habit? Marketing Land’s Danny Sullivan further explores this question in, “Google’s Broken Promises And Who’s Running The Search Engine?”
What promises has Google broken? Google Shopping was supposed to index prices of items across the Web, but it only displays results from paying vendors. Google once fought against shopping search engines that only included shopping results, but now the company claims that is the only way to get viable information.
Google also promised it would keep its searches banner free. Guess what it is doing now? Google stated that it is only conducting a US banner test to allow advertisers to add images to relevant search queries.
Why Google is doing this may be that the company has had to adapt, but it goes against Google’s original philosophy:
“You’d think they caused some internal debate. Was there anyone at Google saying that if giant graphical units at the top of search results are useful to searchers, then maybe Google should be offering those for free, to ensure a consistent experience for those searchers? Was there anyone at Google saying that maybe a shift to paid inclusion was a bad move for shopping and other search products, because it opens up every search product to that possibility?”
Google is not sharing explanations with the public, however. In my opinion, the root of the problem is that no one is officially assigned to run search products. The company is instead focusing on other areas and neglecting its star. What is even worse is that the fuzzy management holds no one accountable for the broken promises. Google’s main search focus is making money and not providing accurate results.
Since Google is the biggest search player, what does this mean for other search components like SEO? Will paid results dwarf SEO? It also raises the question of whether SEO focuses on search. Money makes the world go around, I guess.
Whitney Grace, December 01, 2013