Video Search: An Open Opportunity for GreenButton

April 9, 2012

New Zealand is known for its beautiful countryside, the popular movies filmed there, sheep, and Dot Com. Business Insider reports there is another item to add to the island nation’s “list of reasons to be famous” in “Tiny New Zealand Company Brings Cool Microsoft Video Tech to the World.” The small startup GreenButton used search technology from Microsoft Research to create InCus, a service that transcribes audio and video files to make them searchable. It is aimed at enterprises that want to make their digital media libraries searchable. We learned:

“InCus is based on Microsoft’s Audio Video Indexing Service (MAVIS), which was previously only being tested by a few government agencies. That makes this the first commercially available use of MAVIS, GreenButton CEO Scott Huston told Business Insider. Naturally, inCus is running on Windows Azure.”

GreenButton also sells an Amazon-like cloud and other cloud applications; the company specializes in 3-D rendering apps. Other companies like Cisco and Autonomy have similar services for video and audio, but GreenButton’s InCus is the only one for the cloud. GreenButton has cornered the market for now, but it won’t be long before a bigger company develops its own video indexing service. Things are heating up in this part of the cloud market.

Whitney Grace, April 9, 2012

Sponsored by Pandia.com

Iowa Government Gets a Digital Dictionary Provided By Access

April 7, 2012

How did we get by without the invention of the quick search to look up information?  We used to use dictionaries, encyclopedias, and a place called the library.  Access Innovations, Inc. has brought the Iowa Legislature General Assembly into the twenty-first century.

The write-up “Access Innovations, Inc. Creates Taxonomy for Iowa Code, Administrative Code and Acts” tells us the data management industry leader has built a thesaurus that allows the Legislature to search its library of proposed laws, bills, acts, and regulations.  Users can also add their unstructured data to the thesaurus.  Access used their Data Harmony software to provide subscription-based delivery and they built the thesaurus on MAIstro.

“The project differed from typical index and thesaurus creation because the Iowa Legislative Services Agency needed to maintain its existing codes from each back-of-the-book index, rather than starting from scratch and creating new codes.  One reference alone, the Blue Index, included 2,300 index terms.  To create the thesaurus, Access looked at different methods to apply to each term according to the existing references, tied preferred terms to the existing codes, and added related terms to the preferred terms.   The codes covered previous legislation dating as far back as 1953 to legislation through 2010.  Also, the custom taxonomy was built with only four levels in order to meet Iowa Legislative Services’ navigation requirements.  Typically, thesauri are not limited by a specified number of levels.”

The new legal thesaurus makes it much easier to find new laws and their changes instead of having to browse through the pages of a book.  Access Innovations hopes its project for the Iowa Legislature General Assembly will encourage other government bodies to turn their libraries over to the company for indexing.  Not only would that make it easier for politicians and their staff to conduct research, maybe it could improve the political situation in the US.  Making part of a job easier tends to make people happy.

Whitney Grace, April 7, 2012


Michael Moody Joins Lucid Imagination

March 30, 2012

Market Watch recently reported on Lucid Imagination, the commercial company for Apache Lucene and Solr search technology, in the article “Lucid Imagination Names Software Development Luminary Michael Moody Senior Vice President of Engineering.”

According to the article, Michael Moody brings more than 30 years of software engineering experience to the search technology company.  He has held senior positions at several companies, including Spigit, Jaspersoft, and Portal Software.

Mr. Moody said:

“Thanks to Lucid Imagination, companies will be able to meet the challenge of analyzing their big data before the rapid adoption leads to operational chaos, lost opportunities, and reduced competitiveness,” said Moody. “We have the technology, business model and people in place to help drive a complete transformation of enterprise search and retrieval that will lead to phenomenally better and faster decision making.”

My colleagues and I are very excited to see Michael Moody’s addition to the Lucid Imagination team.

I speak for the ArnoldIT team when I assert that we are confident that his expertise will help the company come up with even better ways to overcome the challenges of enterprise search and big data access.

We have noticed that a number of open source search vendors are touting performance enhancements, failover methods, and value added indexing advantages which Lucid’s search system allegedly does not provide. Assertions are easy. Real world deployments are different from talking about delivering cost savings and improved efficiencies to a customer.

We have just completed a flyover of open source search vendors. In our view, Lucid’s search system outdistanced the other Lucene-based search systems we examined.

We try to avoid Mac vs. PC type hassles, but the key difference among open source search vendors boils down to who can deliver efficiencies to the licensee, offer financial stability, and provide 24×7 engineering support and services. When measured against our “real world” yardstick, trust Lucid Imagination. There is more to the company than a single entrepreneur working nights and weekends to compete. Just our view. Maybe our Overflight report will become publicly available. Who knows?

In the meantime, navigate to www.lucidimagination.com and learn more about the company.

Jasmine Ashton, March 30, 2012


How to Create Your Own Oracle Text Index

March 22, 2012

The Swiss-Army Development blog recently released some useful information about key word search with Oracle Text in the post “Keyword Search via Oracle Text.”

The post attempts to create a foundation for using Oracle Text to implement full text search in a table. It takes readers step-by-step through the process of building the back end of an Oracle Text Index and then leveraging that index to include full text search.

The writer states the reasoning behind this project:

“Oracle text is a feature available in the Oracle Database and is used to provide keyword search indexing to large blocks of text and even binary formatted files like Word and PDF files. As part of a project I am working on, I need to create a keyword search index that spans multiple columns. This will allow my users to search for keywords in the title, abstract and content of a note entered into the system. The note could be in the form of an uploaded file, or it could be manually entered through the interface.”
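To make the multi-column idea concrete, here is a minimal Python sketch of a keyword index that spans the title, abstract, and content of a note. It is not Oracle Text and uses no Oracle API; the `Note` class and the `build_index` and `search` functions are hypothetical names chosen for illustration, and the tokenizer is a naive lowercase word split.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Note:
    """Hypothetical note record with the three searchable columns."""
    note_id: int
    title: str
    abstract: str
    content: str


def tokenize(text):
    """Naive tokenizer: lowercase runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(notes):
    """Map each keyword to the ids of notes containing it in any column."""
    index = defaultdict(set)
    for note in notes:
        for column in (note.title, note.abstract, note.content):
            for token in tokenize(column):
                index[token].add(note.note_id)
    return index


def search(index, query):
    """Return ids of notes that contain every keyword in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Made-up sample data for the sketch.
notes = [
    Note(1, "Quarterly report", "Sales summary", "Revenue grew in the third quarter."),
    Note(2, "Meeting minutes", "Quarterly planning", "Discussed the hiring budget."),
]
index = build_index(notes)
print(search(index, "quarterly"))
print(search(index, "quarterly budget"))
```

A single index over all three columns means one lookup answers the query regardless of which field holds the keyword, which is the behavior the writer describes wanting for notes.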

The Swiss wash their cows, a useful activity if not germane to milk, cheese, and beef.

Similar to Oracle Text perhaps?

Stephen E. Arnold, March 22, 2012

Google and the Enterprise: The Point? Money

March 19, 2012

You must read “Google Enterprise chief Girouard Heads to Startup Upstart.com.” I wondered if a simple executive shuffle many months after a de facto demotion was news. Apparently the poobahs and “real” journalists find a Xoogler worthy of a headline. I have a different view about Google and the enterprise. I write about Google’s latest adventures in my Enterprise Technology Management column, published in the UK, each month.

Google pumped quite a bit of time, effort, money, and Google mouse pads into its enterprise initiative. In the salad days, Google could not learn enough about the companies dominating the enterprise search space. As I researched my Google monographs, I was picking up from interview subjects anecdotal information about the paucity of knowledge Googlers had about what enterprise procurement teams required.

In one memorable, yet still confidential interaction, Google allegedly informed a procurement manager that Google disagreed with a requirement. Now, if that were true, that is something one hears about a kindergarten teacher scolding a recalcitrant five year old. Well, that may have been a fantasy, but there were enough rumblings about a lack of customer support, a “fluid” approach to partners, and a belief that whatever Google professionals did was the “one true path.” I never confused Google and Buddha, but for some pundits, Google was going to revolutionize the enterprise. Search was just the pointy end of the spear. The problem, of course, is that organizations are not Googley. In fact, Googley-type actions make some top dogs uncomfortable.

What happened?

Based on my research, which I shifted to the back burner, I learned:

  • Google was unable to put on an IBM type suit. The Googley stuff opened doors, but the old Wendy’s hamburger ad sums up what happened after the mouse pads and sparkle pins were distributed: “Where’s the beef?”
  • The products and services were not industrial strength and ready for prime time. The notion of an endless beta and taxi meter pricing, no matter how “interesting”, communicated a lack of commitment.
  • The enterprise market likes the idea of paying money to be able to talk to a person who in most cases semi-cares about a problem. AT&T makes tons of dough making clients pay four times an engineer’s salary to get a human on the phone any time. Google delegated support down to partners. Won’t work. A Fortune 100 company wants to call Google, not send an email.
  • Pricing. If you are not sure what the ballpark cost is for indexing 100 million documents using a search appliance, ensuring 24×7 uptime, and backing up, navigate to www.gsaadvantage.com and look up the price of a Google Search Appliance. Now figure out how much it will cost to process an additional one million documents. How’s that price grab you?

When Larry Page assumed control of the company, I wrote about the wizards who were reporting directly to him. The head of the enterprise unit was not one of those folks. My conclusion: game over.

Like AOL, the notion of having a Google person on staff is darned appealing to some, but as the AOL experience makes clear, a Xoogler is not a sure fire money maker.

Here’s the quote I jotted down from the GigaOM story:

Still, market share and revenue may never have been Google’s goal. By offering a lower-cost option to the Office/Exchange tandem, Google forced the market leader to respond, and that may have been the point all along.

Baloney. Google expected to have big outfits roll over and wag their tail. The US government did not roll over. Most big IBM, Microsoft, and Oracle customers did not roll over. More important, the new wave of enterprise service and solutions providers did not roll over. Why? A lack of focus and a dependence on online advertising, legal hassles, privacy chatter, and a failure to deliver competitive products and services made the enterprise initiative a tough sell. Betas may be great for market tests. For the enterprise, a beta may be a hindrance.

Stephen E. Arnold, March 19, 2012


Has Bing Caught Up to Rival Google?

March 15, 2012

Radical idea, right?

A Bing insider claims that Microsoft’s search engine has finally caught up to Google, technology-wise at least. Wired Enterprise reports, “Microsoft Says Decaffeinated Bing Tastes as Good as Google.” The Caffeine alluded to in the title refers to Google’s 2010 platform of that name, the purpose of which is to produce fresher search results. Microsoft’s Harry Shum, in charge of Bing’s research and development, says his team’s product is at least as good. Writer Cade Metz reports:

Harry Shum joined the Bing team in 2007, after eleven years with Microsoft’s research arm. The task at hand was enormous: catch up to Google. Five years on, Google is still the world’s dominant search engine — some estimates put its market share as high as 85 or 90 percent — but Shum believes that Bing has finally reached a point where it can compete with Google on a technical level.

The differences between Caffeine and its predecessor, MapReduce, are significant: the new platform allows sections of the search index to be updated continuously, rather than indexing the entire thing in huge batches. Shum hints that the current Bing approach is similar, but is guarded with the details. That’s understandable.
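The contrast between rebuilding an index in batches and updating sections of it continuously can be sketched in a few lines of Python. This is a toy illustration only, not Google’s or Microsoft’s implementation; `rebuild_index` and `update_document` are hypothetical names, and the “documents” are made-up strings.

```python
from collections import defaultdict


def rebuild_index(documents):
    """Batch approach: reprocess every document to produce a fresh index."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def update_document(index, documents, doc_id, new_text):
    """Incremental approach: touch only the postings of one changed document."""
    old_words = set(documents.get(doc_id, "").lower().split())
    new_words = set(new_text.lower().split())
    # Remove postings for words the document no longer contains.
    for word in old_words - new_words:
        index[word].discard(doc_id)
        if not index[word]:
            del index[word]
    # Add postings for words that are new to the document.
    for word in new_words - old_words:
        index[word].add(doc_id)
    documents[doc_id] = new_text


documents = {"a": "fresh search results", "b": "stale search index"}
index = rebuild_index(documents)

# One page changes; only its postings are adjusted.
update_document(index, documents, "b", "fresh search index")
print(sorted(index["fresh"]))
print("stale" in index)
```

The incremental path does work proportional to the changed document, while the batch path does work proportional to the whole collection, which is why continuous updating yields fresher results.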

If Shum is right, this is a surprising development. We still believe that Blekko and Yandex are better than other Web search systems, however. Google has bet on social. We think Google should have put more money on search. Hedging bets is often a good idea.

Cynthia Murrell, March 15, 2012


PDF Search from Dieselpoint

March 14, 2012

We heard Dieselpoint offers a PDF search engine, so we decided to check it out. This company keeps a very low profile, but we find it is worth looking into.

Dieselpoint’s PDF Search is an enterprise product that can navigate large collections of PDFs, extracting both metadata and text for indexing. Metadata can be searched and used to build more sophisticated interfaces in conjunction with Dieselpoint’s Search platform.

Often, titles are left out of a document’s metadata, making searches more challenging; Dieselpoint has an innovative solution for that. The product overview states:

Quite often, authors of PDFs neglect to enter titles into the document’s metadata. This makes it difficult to display a good, descriptive title when a PDF appears on a search results page. Dieselpoint Search eliminates this problem by providing ‘Smart Titles’. The system analyzes each PDF looking for clues as what the title might be, and employs advanced heuristics to select one. Studies show that Dieselpoint’s algorithm selects a title which is the same as the one that a human would have selected over 90% of the time.

This tool also takes advantage of XMP data, which resides in an XML file embedded within a PDF file. This data can contain information on subjects such as authors, digital rights, categories, and keywords.
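As a rough illustration of what reading XMP data involves, the Python sketch below parses a small hand-written XMP packet with the standard library and pulls out the title, authors, and keywords. It is not Dieselpoint’s code; the sample packet and the `read_xmp` and `items` helpers are assumptions made for the example, and locating the packet inside an actual PDF file is left out.

```python
import xml.etree.ElementTree as ET

# Standard namespaces used in XMP packets (RDF and Dublin Core).
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written sample packet for illustration only.
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title><rdf:Alt><rdf:li xml:lang="x-default">Annual Report</rdf:li></rdf:Alt></dc:title>
      <dc:creator><rdf:Seq><rdf:li>Jane Doe</rdf:li></rdf:Seq></dc:creator>
      <dc:subject><rdf:Bag><rdf:li>finance</rdf:li><rdf:li>2011</rdf:li></rdf:Bag></dc:subject>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""


def items(root, tag):
    """Collect the rdf:li values under a Dublin Core property such as dc:creator."""
    return [li.text for li in root.findall(f".//{tag}//rdf:li", NS)]


def read_xmp(packet):
    """Return title, authors, and keywords from an XMP packet string."""
    root = ET.fromstring(packet)
    titles = items(root, "dc:title")
    return {
        "title": titles[0] if titles else None,
        "authors": items(root, "dc:creator"),
        "keywords": items(root, "dc:subject"),
    }


print(read_xmp(SAMPLE_XMP))
```

Fields recovered this way could feed the metadata side of a search index, with a fallback such as a title heuristic when `title` comes back empty.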

Dieselpoint began developing the core indexing algorithms behind its search engine in 1999, and released version 1.0 the next year. The product was originally meant for use with engineered industrial goods, and the product (and company) name reflects these origins.

Cynthia Murrell, March 14, 2012


Reference Resource for Big Data Vendors

March 13, 2012

SoftArtisans and Riparian Data have been publishing a series that examines some of the key players in Boston’s emerging big data scene.

The recent article, “Boston’s Big Datascape, Part 2: Nasuni, VoltDB, Lexalytics, Tokutek, Cloudant,” is the second in the series and examines five companies that may differ in their growth stages and approach but are similar in their ideology that “big data is the castle, and their tools [are] the keys.”

The article breaks each company down by product, founder, technologies used, target industries, and location.

Tokutek’s mission is to transform the way data is stored and retrieved and deliver a quantum leap in the performance of databases and file systems. The company breakdown was:

“Product: TokuDB brings massive data processing and analysis capabilities to heretofore neglected MySQL. It’s a drop-in replacement for InnoDB that extends the capacity of MySQL databases from GBs to TBs.

Founders: Michael A. Bender, Martín Farach-Colton, Bradley C. Kuszmaul

Technologies used: MySQL, MVCC, ACID, Fractal Tree™ indexing

Target industries: Online Advertising, eCommerce, Social Networking, Mobile Solutions, ePublishing.”

We’re interested to see how this series develops and the innovative new companies that come about from it.

Jasmine Ashton, March 13, 2012


Open Source Projects at GitHub

March 5, 2012

We’ve run across a couple of interesting open source components on the collaboration site GitHub. Founded in 2008, the site hosts over two million code repositories and provides tools with which subscribers can manage their projects. The two we’d like to highlight are the Nutch-Elasticsearch-Indexer and MongoDB.

The Nutch-Elasticsearch-Indexer allows for the indexing of crawl data from the Nutch search system into the ElasticSearch system.  The project’s Readme explains:

“This is similar in nature to that of the SolrIndexer that comes with Nutch which let you index directly into Solr.  This provides a way directly index data into elasticsearch coming directly from Nutch.

This is just the code necessary to create the solution.  You must start by having the Nutch codebase and have it setup in your development environment (Eclipse) see http://wiki.apache.org/nutch/RunNutchInEclipse for how do this.  Once you are set up and is working well.  You are ready to get started.  The following files below are necessary to integrate into the Nutch base and then re-build Nutch.”

Participating developers must have access to the Nutch source and an ElasticSearch environment. See the GitHub project page for further details.
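For readers curious what “indexing crawl data into ElasticSearch” looks like at the wire level, the Python sketch below builds a request body in the newline-delimited JSON format that Elasticsearch’s bulk API accepts. It is not the project’s code, which plugs into the Nutch codebase; the `to_bulk_payload` helper, the `crawl` index name, and the fields in the sample records are all assumptions for illustration, and nothing is sent over the network.

```python
import json


def to_bulk_payload(index_name, pages):
    """Build a newline-delimited JSON body in the Elasticsearch bulk format."""
    lines = []
    for page in pages:
        # Action line: index this document, using the URL as its id.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": page["url"]}}))
        # Source line: the document itself.
        lines.append(json.dumps(page))
    # The bulk body must end with a trailing newline.
    return "\n".join(lines) + "\n"


# Hypothetical crawl records; a real Nutch crawl stores far more fields.
crawled_pages = [
    {"url": "http://example.com/", "title": "Example", "content": "Hello world"},
    {"url": "http://example.com/about", "title": "About", "content": "About us"},
]

payload = to_bulk_payload("crawl", crawled_pages)
print(payload)
```

Using the page URL as the document id means a recrawl of the same page overwrites the earlier entry instead of creating a duplicate.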

The page on MongoDB is the other project here that sparked our interest. It includes developer details such as related utilities; where to go for info on building a database; how to run Mongo; client drivers; documentation; build notes; and licensing information. We are pleased to see that this page sees a fair amount of activity.

Maybe the search mantra should be, “Go, open source”?

Cynthia Murrell, March 5, 2012


Ontoprise GmbH: Multiple Issues Says Wikipedia

March 3, 2012

Now Wikipedia is a go-to resource for Google. I heard from one of my colleagues that Wikipedia turns up as the top hit on a surprising number of queries. I don’t trust Wikipedia, but then I don’t trust any encyclopedia produced by volunteers. Volunteers often participate in a spoofing fiesta.

[Image: SEO danger symbol]

Note: I will be using this symbol when I write about subjects which trigger associations in my mind about use of words, bound phrases, and links to affect how results may be returned from Exalead.com, Jike.com, and Yandex.ru, among other modern Web indexing services either supported by government entities or commercial organizations.

I was updating my list of Overflight companies. We have added six companies to a new Overflight service called, quite imaginatively, Taxonomy Overflight, and we are going through the process of figuring out if the outfits are in business or putting on a vaudeville act for paying customers.

The first six companies are:

  1. Millenium
  2. Mondeca
  3. Nuance
  4. Synaptica
  5. Visual Mining
  6. Wand

We will be adding to the Taxonomy Overflight another group of companies on March 4, 2012. I have not yet decided how to “score” each vendor. For enterprise search Overflight, I use a goose method. Click here for an example: Overflight about Autonomy. Three ducks. Darned good.

I wanted to mention one quite interesting finding. We came across a company doing business as Ontoprise. The firm’s Web site is www.ontoprise.de. We are checking to see which companies have legitimate Web sites, no matter how sparse.

We noted that the Wikipedia entry for Ontoprise carried this somewhat interesting “warning”:

[Image: Wikipedia “multiple issues” warning banner on the Ontoprise entry]

The gist of this warning is to give me a sense of caution, if not wariness, with regard to this company, which offers products that deliver “ontologies.” The company’s research is called “Ontorule”, which has a faintly ominous sound to me. If I look at the naming of products from such firms as Convera before it experienced financial stress, Convera’s product naming was like science fiction but less dogmatic than Ontoprise’s language choice. So I cannot correlate Convera and Ontoprise on other than my personal “semantic” baloney detector. But Convera went south in a rather unexpected business action.
