July 4, 2014
The purported father of NoSQL, Norman T. Kutemperor, made an appearance at this year’s Enterprise Search & Discovery conference, we learn from “Scientel Presented Advanced Big Data Content Management & Search With NoSQL DB at Enterprise Search Summit in NY on May 13” at IT Business Net. The press release states:
“Norman T. Kutemperor, President/CEO of Scientel, presented on Scientel's Enterprise Content Management & Search System (ECMS) capabilities using Scientel's Gensonix NoSQL DB on May 13 at the Enterprise Search & Discovery 2014 conference in NY. Mr. Kutemperor, who has been termed the Father of NoSQL, was quoted as saying, 'When it comes to Big Data, advanced content management and extremely efficient searchability and discovery are key to gaining a competitive edge.' The presentation focused on: The Power of Content – More power in a NoSQL environment.”
According to the write-up, Kutemperor spoke about the growing need to manage multiple types of unstructured data within a scalable system, noting that users now expect drag-and-drop functionality. He also asserted that any NoSQL system should automatically extract text and build an index that can be searched by both keywords and sentences. Of course, no discussion of databases would be complete without a note about the importance of security, and Kutemperor emphasized that point as well.
The veteran info-tech company Scientel has been in business since 1977. These days, they focus on NoSQL database design; however, it should be noted that they also design and produce optimized, high-end servers to go with their enterprise Gensonix platform. The company makes its home in Bingham Farms, Michigan.
Cynthia Murrell, July 04, 2014
June 6, 2014
It is a situation we have all faced. We are watching our favorite program when a song starts to play in the background. As the song underscores the action on screen, we struggle to identify it. A smartphone with a song-recognition app might not be handy, and by the time one is downloaded, the song is over. What do you do then? Beyond the obvious option of rewinding (if you have it), be glad that the Internet has a solution. LifeHacker tells us that “TuneFind Tells You What Songs Are In TV Episodes And Movies.”
There is now an online entertainment database for nearly everything. TuneFind lets users browse and search to find that song stuck in their heads.
“TuneFind’s library is pretty extensive for both TV shows and movies. You can browse by shows, movies, and artists, but you can also browse by what’s popular. It’s pretty cool to see what other users have been searching for the most over the last week, month, and year. For TV shows, the selection goes back a ways, but nothing from the early 90s and earlier seems to be present. I’m probably wrong, but the earliest I could find was 1999’s excellent Freaks and Geeks. For movies the reach back is about the same.”
TuneFind works the same way as other online databases, and the content is extensive considering it reaches back to 1999. If you spot something an actor has worn on TV, you will also enjoy WornOnTV. Does anybody sense the next wave of advertising and a new MTV?
May 14, 2014
I read “Europe’s Top Court: People Have Right to Be Forgotten on Internet.” Fascinating. The real news article said, “People can ask Google to delete sensitive information from its Internet search results.” The source of the assertion was Europe’s top court. After I read the item, I wondered what was being “deleted” and “from where”? When it comes to removing content, the concept of deletion may need some of Mr. Bill Clinton’s “is” type thinking. Content can disappear. An example would be information from government servers. In some cases, the removal of content is intentional. In others, a system administrator performs an operation and – poof – content is history.
Digital information is like “dark matter.” It may be hard to detect, but some people know that it is very real. For example, poke around the Internet Archive Wayback Machine. There is some interesting information on that system that may be otherwise difficult, if not impossible, to access.
Then there is the problem of deleting content from data management systems. I am confident that Europe’s top court knows that removing an item from an index does not remove the item from the data management system, backups, or mirrors of content residing “out there” on the Internet or on a researcher’s personal computer.
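The distinction between de-indexing and deleting can be sketched in a few lines. This is a toy illustration, not any real search engine: the `ToySearchSystem` class, its document store, and its backup list are all hypothetical names invented for this example.

```python
# Toy sketch: removing an entry from a search index does not remove
# the underlying document from the data store or its backups.

class ToySearchSystem:
    def __init__(self):
        self.store = {}    # doc_id -> text (the data management system)
        self.index = {}    # term -> set of doc_ids (the search index)
        self.backups = []  # periodic copies of the store

    def add(self, doc_id, text):
        self.store[doc_id] = text
        for term in text.lower().split():
            self.index.setdefault(term, set()).add(doc_id)

    def backup(self):
        self.backups.append(dict(self.store))

    def deindex(self, doc_id):
        # "Forget" the document: it stops appearing in search results...
        for ids in self.index.values():
            ids.discard(doc_id)

    def search(self, term):
        return sorted(self.index.get(term.lower(), set()))


system = ToySearchSystem()
system.add(1, "sensitive personal history")
system.backup()
system.deindex(1)

print(system.search("sensitive"))  # [] - the item is "forgotten"
print(1 in system.store)           # True - the document itself remains
print(1 in system.backups[0])      # True - and so do the backups
```

A query against the index comes back empty, yet the document survives untouched in the store and in every backup, which is the gap between what a court can order and what actually disappears.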
The notion of deleting is fuzzy to me.
Almost as fascinating is the question of who gets to “remove” what? What are the procedures for getting content deleted from Google or any other system? How does one know that the information is gone? Run a query on a free Web search engine? A commercial system?
Like many ideas in the category “barn burned and horses gone”, deleting content from the “Internet” may be a challenging issue to resolve. In the case of removing content from some of the major online search systems, a Costco has already been erected on the site where the barn once stood, horses grazed, and sun touched information farmers once raised their data crops.
Stephen E Arnold, May 15, 2014
May 13, 2014
It is time for people to understand that relational databases were not made to handle big data. There is just too much data jogging around in servers and mainframes and the terabytes run circles around relational database frameworks. It is sort of like a smart fox toying with a dim hunter. It is time that more robust and reliable software was used, like Hadoop. GCN says that there are “5 Ways Agencies Can Use Hadoop.”
Hadoop is an open source programming framework that spreads data across server clusters. It is faster and less expensive than proprietary software. The federal government is always searching for ways to slash spending, and if agencies turn to Hadoop, they might save a bit in tech costs.
“It is estimated that half the world’s data will be processed by Hadoop within five years. Hadoop-based solutions are already successfully being used to serve citizens with critical information faster than ever before in areas such as scientific research, law enforcement, defense and intelligence, fraud detection and computer security. This is a step in the right direction, but the framework can be better leveraged.”
The five ways agencies can use Hadoop are: storing and analyzing unstructured and semi-structured data, improving initial discovery and exploration, making all data available for analysis, serving as a staging area for data warehouses and analytic data stores, and lowering the cost of data storage.
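For readers who have not seen Hadoop's programming model, here is a minimal word-count sketch in the style of Hadoop Streaming, written as pure Python with no cluster required. It illustrates the map/shuffle/reduce pattern only; it is not Hadoop itself, and the `mapper`/`reducer` functions and sample log lines are invented for this example.

```python
# A minimal word-count sketch in the style of Hadoop Streaming, where
# mappers emit (key, value) pairs and reducers aggregate them per key.
# This is an illustration of the programming model, not Hadoop itself.

from itertools import groupby
from operator import itemgetter

def mapper(lines):
    # Emit (word, 1) for every word in the unstructured input.
    for line in lines:
        for word in line.lower().split():
            yield word, 1

def reducer(pairs):
    # Hadoop sorts mapper output by key before the reduce phase;
    # here we sort explicitly, then sum the counts per word.
    for word, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield word, sum(count for _, count in group)

logs = ["error disk full", "error network down", "warning disk slow"]
print(dict(reducer(mapper(logs))))
```

In a real deployment, the framework distributes the mapper and reducer across the cluster and handles the sort between them, which is what makes the pattern scale to the unstructured data volumes the article describes.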
So can someone explain why this has not been done yet?
May 12, 2014
InfoWorld reports that Oracle may be trying to slow NoSQL's progress in the article, “Beware Of NoSQL Standards In Oracle’s Clothing.” Industry standards help regulate and control information technology, and they can even push IT forward. According to some anonymous sources, however, Oracle is trying to get NoSQL startups to sign on to a standards body in order to slow down change.
Just the very idea of this happening is sickening for the open source community:
“In reality, big vendors use standards to halt their larger customers from adopting new technology or create weird new-old hybrids to keep the old ways alive. There are many companies that, once they see a standardization effort, will wait for the BigCo-supported standard to be adopted before they upgrade their tech stack. Since such adoption tends to be slow anyhow, this is an effective delaying tactic. Meanwhile, the big vendor works to control the standards body.”
Oracle wants to slow down progress because it eats into the company's profit margin. Oracle wants the future to come at a pace it chooses, where it will control the market, get patented technology adopted under FRAND terms, and buy up NoSQL vendors.
Standardization is a good thing, but Oracle needs to realize that relational databases were not built to handle big data at today's scale. It is a call to arms for the open source community to fight reliance on outdated technology. There are echoes here of keeping video rental stores alive in the age of streaming services.
May 9, 2014
In what the company is calling its “biggest release ever,” the updated open source MongoDB 2.6 boasts even more features than before. Application Development Trends describes the improvements in “MongoDB Releases Major Upgrade to NoSQL Database.” MongoDB Inc. has done the math and says MongoDB is now the leading NoSQL database. The company also has high hopes for the future.
The article describes one concept key to the new version:
“The improved query engine features a new index intersection that will fulfill queries that are supported by more than one index. Also, index filters will limit the indexes that can ‘become the winning plan for a query.’ Developers using the database can now use the count method in conjunction with the hint method. You can learn more about that here.”
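The index-intersection idea mentioned in the quote can be illustrated with a toy sketch: a query on two fields can be answered by intersecting the candidate document ids found in two single-field indexes. This shows the concept only; it is not MongoDB's query engine, and the `status_index`, `region_index`, and `query` names are invented for the example.

```python
# A toy sketch of index intersection: answer a two-field query by
# intersecting the document ids found in two single-field indexes.
# Conceptual illustration only; not MongoDB's actual query engine.

status_index = {"active": {1, 2, 5}, "closed": {3, 4}}
region_index = {"eu": {2, 3, 5}, "us": {1, 4}}

def query(status, region):
    # Each index alone over-selects; the set intersection yields
    # exactly the documents that satisfy both predicates.
    return sorted(status_index.get(status, set()) &
                  region_index.get(region, set()))

print(query("active", "eu"))  # [2, 5]
```

The payoff is that neither field needs a dedicated compound index: two existing single-field indexes can jointly serve the query, which is what makes a plan built from more than one index "the winning plan."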
Writer David Ramel turns his attention to security:
“Security improvements include better SSL support, x.509 authentication, an enhanced authorization system that features more granular controls, centralized storage of credentials and better tools for user management. The new version also features TLS encryption, along with user-defined roles, auditing functionality and field-level redaction, which Horowitz described as ‘a critical building block for trusted systems.’ The database auditing feature is extended by the new capability to integrate with IBM InfoSphere Guardium.”
MongoDB CTO and co-founder Eliot Horowitz reports that his team has re-written the query execution engine for better scalability. The upgrade also includes an easier-to-maintain codebase, the ability to return result sets of any size, and improved support for bulk operations. Horowitz notes that this version includes the groundwork for improvements planned for version 2.8, like document-level locking. See the articles for more improvements and details.
The company behind the open source MongoDB database, MongoDB Inc., makes its money on related management services. Launched in 2007, the company has offices throughout North America, Europe, and the Asia-Pacific region.
Cynthia Murrell, May 09, 2014
March 31, 2014
Microsoft recently announced changes to SharePoint, some well received and others less so. For instance, the next SharePoint server update is planned for 2015. In other news, however, SQL Server 2014 will be supported within SharePoint 2013. Read more in the Redmond article, “Microsoft Adding SQL Server 2014 Support to SharePoint 2013.”
The article says:
“SharePoint Server 2013 will be capable of supporting SQL Server 2014 when Microsoft releases the next SharePoint cumulative update next month, according to an announcement on Friday. SQL Server 2014 is currently in the release-to-manufacturing (RTM) stage, and is expected to hit general availability on April 1.”
SharePoint is continuing its quest to be all things to all people, incorporating more and more outside components. However, it is becoming more difficult and more complicated for users to manage such complex implementations. Stephen E. Arnold is a longtime leader in search and gives a lot of coverage to SharePoint on his Web site ArnoldIT.com.
Emily Rae Aldridge, March 31, 2014
March 27, 2014
I believe that MarkLogic opened for business in 2001. One of the founders was involved with Ultraseek, a search engine that eventually ended up in the hands of HP Autonomy. In case you do not recall Ultraseek, that product dates from the mid-1990s.
Why is this relevant to MarkLogic, a company offering an XML database?
I read “MarkLogic Poised for Continued Growth as the Industry Leader in NoSQL Marketplace.” The write up states:
“[G]rowth in new markets including Japan and Europe, steady customer acquisition, strategic partner relationships and industry recognition, has further propelled the company into the leadership position within the NoSQL database market.”
The company points to the release of MarkLogic, Version 7, which works out to one release every two years. The company “introduced new pricing and packaging, a free developer license, and cloud ready hourly pricing for Amazon Web Services.” No details on the pricing were in the story. No information about MarkLogic’s revenues was included. After the last shift in senior management, MarkLogic seemed to be nosing toward $60 million in revenues in 2011, based on our estimates. Now three years later, the company is showing renewed press release activity, but I would have preferred some hard numbers. In those three years, MarkLogic has suggested that its XML database can work as an information retrieval system, a platform for conducting intelligence, and a useful content processing system for print publishers. In this 36 month period, open source solutions, JSON, and competitors have been moving in similar directions. Choice, at least in data management, abounds.
MarkLogic, since 2001, according to Crunchbase, has ingested $73.6 million in funding with the last cash infusion coming in 2013 from Sequoia Capital, Tenaya Capital, Northgate Capital, and Gary Bloom, who is, according to Businessweek, the Chief Executive Officer, President, and Director of MarkLogic.
The news release points out:
MarkLogic received many industry accolades during the last year. The company was favorably positioned in Gartner’s “Magic Quadrant for Operational Database Management Systems,” published in October 2013. In addition, MarkLogic was the only enterprise NoSQL database vendor featured in the report that integrates search and application services. The company was also recognized in the April 2013 “Gartner Magic Quadrant for Enterprise Search,” the only company to have the same product featured on both reports. Other accolades include the 2013 Computerworld Honors Laureate, by IDG’s Computerworld Honors Program. The annual award program honors visionary applications of information technology promoting positive social, economic, and educational change. Furthermore, MarkLogic was selected as one of the 2013 Red Herring 100 Global Winners – recognized as a leading global private company and an innovator in the technology industry.
These types of awards are not identified as “content marketing” or pay-to-play studies. I assume these accolades are objective and based on the cited firms’ deep experience with Extensible Markup Language and its applications. Anything less would be suspect in my way of looking at the world of databases, semantics, search systems, and business intelligence solutions.
With fast moving deals for outfits like Oculus Rift, the surging growth of Elasticsearch among developers, and almost frantic efforts of some MarkLogic competitors to find a way to generate revenue growth and profits—MarkLogic appears in the news release to be showing signs of revivification.
My view is that investors may be looking for some return on the money pumped into MarkLogic. Assuming that patience is a virtue, I wonder if this 2001 start up is ready to deliver a big pay day to its stakeholders. WhatsApp, founded in 2009, was a home run for its stakeholders. Cloudera seems to be on a similar trajectory.
MarkLogic is 13 years old and proving to be like a teen in a fancy private school. Money is needed periodically. Do teens repay their parents? My teens did not. Investors may not have the appetite for underwriting without a return that I did as a happy parent.
Stephen E Arnold, March 27, 2014
March 3, 2014
The legacy of TeraText is long, but many in the information field have never heard of the pioneering database. Our own Stephen E. Arnold shares his extensive knowledge on the subject in a free 30-page analysis, “TeraText: Decades in the Making, Still Performing Mission Critical Functions.” The report is number 11 in Mr. Arnold’s valuable Vendor Profiles series. Why should we learn about a veteran like TeraText? He explains:
“TeraText provides a robust, scalable information processing system to government entities in the U.S., Australia, and elsewhere. TeraText is the forerunner of such systems as Recorded Future (funded by In-Q-Tel and Google) and IBM i2 Analyst Notebook. Yet most vendors marketing search and content processing systems are unaware of this important system. My report fills an important gap in the literature describing advanced information retrieval systems.”
Originally funded by university research grants, TeraText became a core system for governmental entities in law-making, defense, and intelligence. Perhaps the system’s low profile stems from the company’s sales approach; they prefer to capture a few large-scale contracts on their product’s merits, rather than capture widespread attention with flashy marketing.
If you are not familiar with the Vendor Profile series, you owe it to yourself to check out this free resource. Arnold brings his formidable expertise to bear on analyses of search and content processing vendors like Convera, Entopia, Fulcrum, and Verity. These papers are no thin giveaways; they rival reports from firms that charge as much as $3,500. Arnold shares this work for free because he believes knowledge about foundational search systems can help companies make better decisions about vendor claims. He also hopes that spreading basic information about important search and content processing systems will speed up innovation in this typically sluggish field.
Cynthia Murrell, March 03, 2014
January 24, 2014
Who knew LinkedIn could be so useful? The site’s Engineering blog supplies a thorough look at logs in, “The Log: What Every Software Engineer Should Know About Real-Time Data’s Unifying Abstraction.” Writer and LinkedIn Engineer Jay Kreps aims to fill what he sees as a large gap in the education of most software engineers. The site’s transition last year from a centralized database to a distributed, Hadoop-based system opened his eyes.
“One of the most useful things I learned in all this was that many of the things we were building had a very simple concept at their heart: the log. Sometimes called write-ahead logs or commit logs or transaction logs, logs have been around almost as long as computers and are at the heart of many distributed data systems and real-time application architectures. You can’t fully understand databases, NoSQL stores, key value stores, replication, paxos, hadoop, version control, or almost any software system without understanding logs; and yet, most software engineers are not familiar with them. I’d like to change that. In this post, I’ll walk you through everything you need to know about logs, including what a log is and how to use logs for data integration, real time processing, and system building.”
He isn’t kidding. The extensive article is really a mini-course that any programmer who hasn’t already mastered logs should look into. Part one, titled “What is a log?”, covers logs in general as well as their place in both databases and distributed systems. Part two discusses data integration, including potential complications, the relationship to a data warehouse, log files, and building a scalable log. Real-time stream processing is discussed in part three, as well as data flow graphs, real-time processing, and log compaction. Part four covers system building, delving into the prospect of unbundling and where logs fit into system architecture. At the end, Kreps supplies an extensive list of resources for further study.
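The core abstraction Kreps describes can be sketched in a few lines: an append-only sequence of records, each addressed by its offset, which independent readers replay at their own pace. This is a toy illustration of the idea, not Kafka or any LinkedIn system, and the `Log` and `Reader` class names are invented for the example.

```python
# A minimal sketch of the log abstraction: an append-only sequence of
# records addressed by offset, replayed independently by each reader.
# Toy illustration only; not Kafka or any production system.

class Log:
    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1  # the new record's offset

    def read_from(self, offset):
        return self.records[offset:]

class Reader:
    # Each consumer tracks only its own position in the shared log,
    # so many readers can consume the same log at different speeds.
    def __init__(self, log):
        self.log, self.position = log, 0

    def poll(self):
        batch = self.log.read_from(self.position)
        self.position += len(batch)
        return batch

log = Log()
log.append({"user": "alice", "action": "login"})
log.append({"user": "bob", "action": "logout"})

replica = Reader(log)
print(replica.poll())  # replays everything appended so far
print(replica.poll())  # empty: the reader has caught up
```

Because every reader remembers only an offset, the same log can feed a database replica, a search indexer, and a stream processor at once, which is the "unifying abstraction" of the article's title.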
Cynthia Murrell, January 24, 2014