Quixey Snags $20 Million in B Round Funding
June 19, 2012
Quixey, which says its cross-platform app search engine is one-of-a-kind, has just raised $20 million in Series B funding, bringing total capital raised to $24.2 million, according to TNW Insider’s “Smart App Search Engine Quixey Raises $20m from Eric Schmidt’s VC Firm and Others.” Google Executive Chairman Schmidt’s investment firm Innovation Endeavors was joined by Chinese firm WI Harper Group, US Venture Partners, Atlantic Bridge, SK Planet, and TransLink Capital in backing the young company.
Quixey has found a potentially profitable search niche: the problem of finding, out of millions of existing apps across numerous platforms, the app you need when you need it. Writer Robin Wauters describes the company:
“Quixey says it has invented a ‘new type of search’ that allows users to find mobile, desktop and Web apps ‘that do what they want’ based on natural language-based technology. The Palo Alto, California-based company teams up with phone makers, carriers, browser and online search companies to power app search for them, and encourages app publishers and developers to ‘claim’ their apps.”
Wauters points out that several companies do offer similar services: AppsFire, Apple’s acquisition Chomp, Mimvi and Appolicious, to name a few. Does Quixey offer something special?
Founded in 2009 specifically to fill this niche, Quixey has trademarked the term “Functional Search” to describe their app-finding engine. The company is located in Palo Alto, CA.
Cynthia Murrell, June 19, 2012
Sponsored by PolySpot
Connecting CAD and CAE with Data Management
June 19, 2012
One of the biggest challenges facing designers and analysts across industries is the lack of fluidity between CAD, CAE, and other programs. The difficulties arising from this disconnect cause roadblocks that are both costly and time consuming, according to the article “Fixing the CAD/CAE Disconnect” on Design World.
The article explains the problem in detail:
“The disconnect has to do with the data flows between design and analysis tools. CAD and CAE models are distinct and different things. Only in the simplest of cases can a CAE model be derived automatically, or with minimal work, from a CAD model. In most cases, you start with the product definition geometry, massage it to create the geometry for building analytical models, then tune and optimize that data for analysis…Too often, even minor design changes can break the linkages used to synchronize analysis models to design geometry.”
There isn’t a perfect answer to this problem of moving data between departments, but the article suggests that if one is to be found, it will come through product lifecycle management (PLM). PLM providers like Siemens PLM and Inforbix specialize in helping companies manage data, including CAD files. The solution for many disconnect problems, as the article defines it, lies in finding, reusing, and sharing data efficiently.
Catherine Lamsfuss, June 19, 2012
Google Partner Interprets Knowledge Graph
June 19, 2012
The Big G is about to reaffirm its position as the most popular search engine, according to “Google Knowledge Graph – What it Really Does” by AppsCare. Google’s partners rave about the Knowledge Graph’s functionality when searching.
This might be the largest thinking virtual library ever created. As the article explains:
“Knowledge graph is the result of a 15-year revelation by John Giannandrea, who imagined a virtual catalogue of ‘everything in the world’. ‘Trying to understand the entire world’s information, catalogue all the human knowledge, the challenge was making it rich and intelligent. By tying our concept into [Google] search, we did it.’ The Knowledge Graph uses approximately 3.5 billion different attributes to organize results. Other websites will have to move further up the value chain in order to survive.”
The Graph is only available in the US, but some interesting features are:
- Discover live or cultural events
- Find recommendations for music, books, TV and movies
- Locate the closest service provider
- Know the must-see attractions when travelling
It is predicted that Google will gain a monopolistic position in online search as the company attempts to recapture the intellectual high ground in an area where it remains strong.
If Knowledge Graph truly understands the relationships between real-world objects and people, utilizing that technology may be the key to keeping Google on top. This was an interesting partner’s view of Google’s Knowledge Graph and the possibilities it holds.
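To make the phrase “relationships between real-world objects” a bit more concrete, here is a toy sketch of an entity graph. The entities, facts, and the little related() helper are purely illustrative assumptions on my part, not Google’s actual data model or API; the only point is that a question gets answered by following edges rather than matching keywords.

```python
# Toy entity graph: nodes are real-world things, edges are typed relationships.
# Entities and facts below are illustrative only, not Google's actual data.
knowledge_graph = {
    "Leonardo da Vinci": {"type": "Person", "painted": ["Mona Lisa"], "born_in": ["Vinci, Italy"]},
    "Mona Lisa": {"type": "Painting", "located_in": ["Louvre"]},
    "Louvre": {"type": "Museum", "located_in": ["Paris"]},
}

def related(entity: str, relation: str) -> list:
    """Follow one relationship edge from an entity; empty list if unknown."""
    value = knowledge_graph.get(entity, {}).get(relation, [])
    return value if isinstance(value, list) else [value]

# "Where can I see something Leonardo painted?" -- answered by traversing
# painted -> located_in rather than by keyword matching.
for work in related("Leonardo da Vinci", "painted"):
    for place in related(work, "located_in"):
        print(f"{work} is in the {place}")  # Mona Lisa is in the Louvre
```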
Jennifer Shockley, June 19, 2012
To Turn Back the Tweets of Time
June 19, 2012
You just can’t turn back the tweets of time, at least not too far back. Twitter has been going for the top spot in spontaneous microblogging, but to the dismay of its 140 million users, it has issues. For a social site that was designed for speedy updates, it works. Your nightmare begins if you dare to look into the past beyond a few days, according to “Topsy Knows What You Did on Twitter Last Year.”
Ironically, Topsy Labs has a database of tweets, including links, going back to 2008. Twitter itself does not provide this service. As the article notes:
“Why doesn’t Twitter, which has all this data in the first place, already offer its own archive search? The San Francisco-based company’s answer each time people have asked, including my most recent query on Wednesday, has been “we’re working on it.”
“I would suggest that before it embarks on yet another site redesign, it tackle this issue. We’re not far from it being impossible to write a memoir or biography of the average public figure without looking over what they tweeted.”
Twitter currently has over 400 million tweets hitting the internet daily. The company is donating its archive to the Library of Congress in the name of scientific research. Perhaps the researchers there can work out the problems that Twitter’s countless updates could not. For now there will be no turning back the tweets of time on Twitter itself, but Topsy has them.
Jennifer Shockley, June 19, 2012
Sink or Swim, Indies Plunging Into Open Source
June 19, 2012
Some independent software developers are taking the big plunge into open source according to the article, “Independent Software Developers Go Full Time with Open Source.” It’s time to sink or swim, but if they succeed, it may start a new trend in the indie scene.
Lunduke, a Washington state-based developer, is getting $4,000 a month in gaming subscriptions for his site. He feels those subscriptions, along with community contributions, should provide him with enough income to continue developing independently. He plans on open sourcing all of the applications and games on his site under the GPL.
Patrick Verner in Wisconsin developed Parted Magic as an open source, multi-platform partitioning tool. In contrast to Lunduke’s financial needs, Verner requested just $1,200 a month to pay bills and cover living expenses.
“While many software developers get paid to work on open source by their employer or volunteer their time for free to various FOSS projects, some end up deciding to quit their day jobs to work on free and open source software (FOSS) full time. That’s exactly what Bryan Lunduke and Patrick Verner are now doing and they’re both hoping that the respective communities for their projects will help to support their efforts financially. “
Despite added expenses and challenges, these entrepreneurs are going for their dreams. This is a big step forward for young developers. By diving right in, these two show the world that it is possible for a closed source indie developer to go open source successfully with help and communal support. Move over big IT, the Indies are plunging straight into open source, no life jackets.
Jennifer Shockley, June 19, 2012
Deduplication: Flawed Method or Just More Woes for HP?
June 19, 2012
Is HP duping consumers, or is SEPATON waging a product war? It’s sales versus sales in the article “SEPATON: HP Offers ‘Least Capable’ Dedupe in the Industry,” with SEPATON full of fire and not pulling any punches. HP did a little dodge and duck, but mainly stayed straightforward.
Linda Mentzer and Peter Quirk from SEPATON stated:
“The HP B6200 offers the least capable de-dupe in the industry. Each tape device on a B6200 is effectively a distinct de-dupe domain. Backups sent to one drive don’t de-dupe against backups sent to another drive on the same node! What happens when your backup exceeds the capability of one drive?”
Beauty is in the eye of the beholder, and apparently HP finds the B6200 far more appealing than SEPATON does. Sean Kenney from HP Storage challenged SEPATON’s comments, stating:
“It is important to note that the B6200 is a single logical system. Any emulated tape drive within a VTL completely de-duplicates with all other tape drives in that VTL. The B6200 is a terrific solution for database and multiplexed workloads.”
Ah, hardware performance and software issues, it seems. HP doesn’t claim the system to be perfect, but instead presents realistic options to prevent issues. Though SEPATON argues that the B6200 lacks functionality, it also makes a point of promoting its own products. Could this be a coincidence or just more woes for HP?
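To illustrate the point SEPATON is making about de-dupe domains, here is a minimal sketch of content-hash deduplication. It is a toy model, not HP’s or SEPATON’s actual implementation, and the store() helper and chunk sizes are assumptions for illustration; the takeaway is simply that identical chunks dedupe only against chunks already indexed within the same domain, so splitting one backup stream across separate domains stores redundant copies.

```python
import hashlib

def store(chunks, domains, domain_id):
    """Write chunks into one dedupe domain; return bytes actually stored.
    Toy model: a 'domain' is just a set of chunk fingerprints."""
    index = domains.setdefault(domain_id, set())
    written = 0
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in index:          # chunk is new to *this* domain only
            index.add(digest)
            written += len(chunk)
    return written

backup = [b"A" * 4096, b"B" * 4096, b"A" * 4096]  # one repeated chunk

# Single shared domain: the second copy of the backup dedupes completely.
shared = {}
shared_bytes = store(backup, shared, "vtl") + store(backup, shared, "vtl")

# Two separate domains (think: one per tape drive): the same data lands twice.
split = {}
split_bytes = store(backup, split, "drive1") + store(backup, split, "drive2")

print(shared_bytes, split_bytes)  # 8192 vs. 16384 in this toy case
```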
Jennifer Shockley, June 19, 2012
The Alleged Received Wisdom about Predictive Coding
June 19, 2012
Let’s start off with a recommendation. Snag a copy of the Wall Street Journal and read the hard copy front page story in the Marketplace section, “Computers Carry Water of Pretrial Legal Work.” In theory, you can read the story online if you don’t have Sections A-1, A-10 of the June 18, 2012, newspaper. Check out the variant of the story that appears as “Why Hire a Lawyer? Computers Are Cheaper.”
Now let me offer a possibly shocking observation: The costs of litigation are not going down for certain legal matters. Neither bargain basement human attorneys nor Fancy Dan content processing systems make the legal bills smaller. Your mileage may vary, but for those snared in some legal traffic jams, costs are tough to control. In fact, search and content processing can impact costs, just not in the way some of the licensees of next generation systems expect. That is one of the mysteries of online that few can penetrate.
The main idea of the Wall Street Journal story is that “predictive coding” can do work that human lawyers do at a higher cost and sometimes with much less precision. That’s the hint about costs in my opinion. But the article is traditional journalistic gold. Coming from the Murdoch organization, what did I expect? i2 Group has been chugging along with relationship maps for case analyses of important matters since 1990. Big alert: i2 Ltd. was a client of mine. Let’s see, that was more than a couple of weeks ago that basic discovery functions became available.
The write up quotes published analyses which indicate that when humans review documents, those humans get tired and do a lousy job. The article cites “experts” from Thomson Reuters, a firm steeped in legal and digital expertise, who point out that predictive coding is going to be an even bigger business. Here’s the passage I underlined: “Greg McPolin, an executive at the legal outsourcing firm Pangea3 which is owned by Thomson Reuters Corp., says about one third of the company’s clients are considering using predictive coding in their matters.” This factoid is likely to spawn a swarm of azure chip consultants who will explain how big the market for predictive coding will be. Good news for the firms engaged in this content processing activity.
What grows faster: the costs of a legal matter, or the costs of a legal matter that requires automation and trained attorneys? Why do companies embrace automation plus human attorneys? Risk certainly is a turbo charger.
The article also explains how predictive coding works, offers some cost estimates for various actions related to a document, and adds some cautionary points about predictive coding proving itself in court. In short, we have a touchstone document about this niche in search and content processing.
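For readers who want a feel for the general workflow, here is a minimal sketch of the idea behind predictive coding: attorneys label a small seed set of documents, a statistical classifier learns from it, and the unreviewed corpus is ranked by predicted relevance. The document texts, labels, and the choice of TF-IDF plus logistic regression are my own illustrative assumptions, not the Journal’s description or any vendor’s actual product.

```python
# A minimal, illustrative sketch of the predictive coding idea; real e-discovery
# systems use proprietary pipelines, sampling protocols, and validation steps.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Seed set reviewed by attorneys: 1 = responsive, 0 = not responsive.
seed_docs = [
    "revised contract pricing terms attached for signature",
    "lunch plans for friday anyone",
    "indemnification clause changes need legal review",
    "fantasy football trade offer",
]
seed_labels = [1, 0, 1, 0]

vectorizer = TfidfVectorizer()
classifier = LogisticRegression()
classifier.fit(vectorizer.fit_transform(seed_docs), seed_labels)

# Rank the unreviewed corpus so attorney time goes to likely-responsive documents.
corpus = [
    "please review the attached licensing contract",
    "who is bringing donuts tomorrow",
]
scores = classifier.predict_proba(vectorizer.transform(corpus))[:, 1]
for score, doc in sorted(zip(scores, corpus), reverse=True):
    print(f"{score:.2f}  {doc}")
```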
My thoughts about predictive coding are related to the broader trends in the use of systems and methods to figure out what is in a corpus and what a document is about.
First, the driver for most content processing is related to two quite human needs: the costs of coping with large volumes of information are high and going up fast, and there is a need to reduce risk. Most professionals find quips about orange jump suits, sharing a cell with Mr. Madoff, and the iconic “perp walk” downright depressing. When a legal matter surfaces, the need to know what’s in a collection of content like corporate email is high. The need for speed is driven by executive urgency. The cost factor kicks in when the chief financial officer has to figure out the costs of determining what’s in those documents. Predictive coding to the rescue. One firm used the phrase “rocket docket” to communicate speed. Other firms promise optimized statistical routines. The big idea is that automation is faster and cheaper than having lots of attorneys sifting through documents in printed or digital form. The Wall Street Journal is right. Automated content processing is going to be a big business. I just hit the two key drivers. Why dance around what is fueling this sector?
Sophisticated Online Searchers? Nope. Fewer.
June 18, 2012
TechCrunch published “Hitwise: Google US Search Share Down 5% In The Last Year; Bing, Yahoo Gained.” Check it out. We are less interested in the Google market share than the average goose. However, within the article was a table with some hefty data freight.
Here at the goose pond we hear from many folks, “I am a really good online researcher.” We find this amusing because about two thirds of the ArnoldIT team have degrees in library and information science. We have a handful of people with excellent research skills honed after years of wandering through the stacks of the Harrod’s Creek library with its collection of 37 volumes.
Here’s the table with a happy quack to TechCrunch and the ever reliable Experian Hitwise:
The key datum is the percentage of alleged Web search users who bang in a single term and take what the objective, relevance centric Web search vendors shovel out: 29.93 percent use single term queries. Call it 30 percent. How many of these expert searchers can identify disinformation? How many know how to determine the provenance of a source? How many spend time to double check the “facts” like the ones in this ever reliable Experian Hitwise table?
Another interesting point is that about 15 percent of the May 2012 users employ five or more terms. I am somewhat encouraged, but that percentage has decreased from the May 2011 figure. Bummer.
My view:
- As education erodes, the ability to figure out or even know how to sort out the goose feathers from the giblets will not be a growing asset.
- Based on my limited and skewed sample, MBAs and their ilk have little appetite to dig for information and check facts. The talk about data outweighs the actual value of meaningful factoids. I wish I could get paid by the fluff, an official unit of baloney.
- Has anyone thought about the political power of a Web search system which filters, shapes, and outputs information to one third of its customers? I do, which is one reason why I am delighted to be an old, addled goose.
Yep, great data. I wonder if they are accurate. Hope not.
Stephen E Arnold, June 18, 2012
Sponsored by Ikanow
More Predictive Silliness: Coding, Decisioning, Baloneying
June 18, 2012
It must be the summer vacation warm and fuzzies. I received another wild analytics news release today. This one comes from 5WPR, “a top 25 PR agency.” Wow. I learned from the spam: PeekAnalytics “delivers enterprise class Twitter analytics and help marketers understand their social consumers.”
What?
Then I read:
By identifying where Twitter users exist elsewhere on the Web, PeekAnalytics offers unparalleled audience metrics from consumer data aggregated not just from Twitter, but from over sixty social sites and every major blog platform.
The notion of algorithms explaining anything is interesting. But the problem with numerical recipes is that those who use the outputs may not know what’s going on under the hood. Widespread knowledge of the specific algorithms, the thresholds built into the system, and the assumptions underlying the selection of a particular method is in short supply.
Analytics is the realm of the one percent of the population trained to understand the strengths and weaknesses of specific mathematical systems and methods. The 99 percent are destined to accept analytics system outputs without knowing how the data were selected, shaped, formed, and presented given the constraints of the inputs. Who cares? Well, obviously not some marketers of predictive analytics, automated indexing, and some trigger trading systems. Too bad for me. I do care.
When I read about analytics and understanding, I shudder. As an old goose, each body shake costs me some feathers, and I don’t have many more to lose at age 67. The reality of fancy math is that those selling its benefits do not understand its limitations.
Consider the notion of using a group of analytic methods to figure out the meaning of a document. Then consider the numerical recipes required to identify a particular document as important from thousands or millions of other documents.
When companies describe the benefits of a mathematical system, the details are lost in the dust. In fact, bringing up a detail results in a wrinkled brow. Consider the Kolmogorov-Smirnov test. Has this nonparametric test been applied to the analytics system which marketers have presented to you in the last “death by PowerPoint” session? The response from 99.5 percent of the people in the world is, “Kolmo who?” or “Isn’t Smirnov a vodka?” Bzzzz. Wrong.
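For anyone who answered “Kolmo who?”, here is a minimal sketch of the two-sample Kolmogorov-Smirnov test on synthetic data; the samples and the SciPy call are my own illustration, not anything from a vendor’s system. The test asks whether two samples plausibly come from the same distribution, and even its tidy output still has to be interpreted with the method’s assumptions in mind.

```python
# Minimal two-sample Kolmogorov-Smirnov test on synthetic data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
baseline = rng.normal(loc=0.0, scale=1.0, size=500)   # e.g. last quarter's metric
candidate = rng.normal(loc=0.3, scale=1.0, size=500)  # e.g. this quarter's metric

statistic, p_value = stats.ks_2samp(baseline, candidate)
print(f"KS statistic = {statistic:.3f}, p-value = {p_value:.4f}")
# A small p-value suggests the distributions differ; it does not say why,
# by how much in business terms, or whether the difference matters.
```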
Mathematical methods which generate probabilities are essential to many business sectors. When one moves fuel rods at a nuclear reactor, the decision about which rod to put where is informed by a range of mathematical methods. Specially trained experts, often with degrees in nuclear engineering plus postgraduate work, handle the fuel rod manipulation. Take it from me. Direct observation is not the optimal way to figure out fuel pool rod distribution. Get the math “wrong” and some pretty exciting events transpire. Monte Carlo anyone? John Gray? Julian Steyn? If these names mean nothing to you, you would not want to sign up for work in a nuclear facility.
Why then would a person with zero knowledge of numerical recipes, of the oddball outputs particular types of algorithms can produce, and with little or no experience with probability methods use the outputs of a system as “truth”? The outputs of analytical systems require expertise to interpret. Looking at a nifty graphic generated by Spotfire or Palantir is NOT the same as understanding what decisions have been made, what limitations exist within the data display, and what blind spots are generated by the particular method or suite of methods. (Firms which do focus on explaining and delivering systems which make it clear to users about methods, constraints, and considerations include Digital Reasoning, Ikanow, and Content Analyst. Others? You are on your own, folks.)
Today I have yet another conference call with 30 somethings who are into analytics. Analytics is the “next big thing.” Just as people assume coding up a Web site is easy, people assume that mathematical methods are now the mental equivalent of clicking a mouse to get a document. Wrong.
The likelihood of misinterpreting the outputs of modern analytic systems is higher than it was when I entered the workforce after graduate school. The reasons include:
- A rise in the “something for nothing” approach to information. A few clicks, a phone call, and chit chat with colleagues makes many people expert in quite difficult systems and methods. In the mid 1960s, there was limited access to systems which could do clever stuff with tricks from my relative Vladimir Ivanovich Arnold. Today, the majority of the people with whom I interact assume their ability to generate a graph and interpret a scatter diagram equips them as analytic mavens. Math is and will remain hard. Nothing worthwhile comes easy. That truism is not too popular with the 30 somethings who explain the advantages of analytics products they sell.
- Sizzle over content. Most of the wild and crazy decisions I have learned about come from managers who accept analytic system outputs as a page from old Torah scrolls from Yitzchok Riesman’s collection. High ranking government officials want eye candy, so modern analytic systems generate snazzy graphics. Does the government official know what the methods were and the data’s limitations? Nope. Bring this up and the comment is, “Don’t get into the weeds with me, sir.” No problem. I am an old advisor in rural Kentucky.
- Entrepreneurs, failing search system vendors, and open source repackagers are painting the bandwagon and polishing the tubas and trombones. The analytics parade is on. From automated and predictive indexing to surfacing nuggets in social media, the music is loud and getting louder. With so many firms jumping on the bandwagon or joining the parade, the reality of analytics is essentially irrelevant.
The bottom line for me is that the social boom is at or near its crest. Marketers—particularly those in content processing and search—are desperate for a hook which will generate revenues. Analytics seems to be as good as any other idea which is converted by azure chip consultants and carpetbaggers into a “real business.”
The problem is that analytics is math. Math is as easy as 1-2-3; math is also as complex as MIT’s advanced courses. With each advance in computing power, more fancy math becomes possible. As math advances, the number of folks who can figure out what a method yields decreases. The result is a growing “cloud of unknowing” with regard to analytics. Putting this into a visualization makes the challenge clear.
Stephen E Arnold, June 18, 2012
Inteltrax: Top Stories, June 11 to June 15
June 18, 2012
Inteltrax, the data fusion and business intelligence information service, captured three key stories germane to search this week, specifically, how governments and the voting public are utilizing big data.
In “Government Leads Way in Big Data Training” we discovered the private sector lagging behind the government in terms of user education.
Our story, “U.S. Agencies Analytics Underused” showed that even though we have all that training, some agencies still need more to fully utilize this digital power.
“Cultural Opinion Predicted by Analytics” used the Eurovision song contest to show us the power of people using analytics and gives the nugget of thought as to how this could be used in government elections.
While sometimes the outcomes contradict one another, there’s no denying that big data analytics is a huge part of governments around the world. Expect this trend to only grow as the popularity of analytics catches fire.
Follow the Inteltrax news stream by visiting www.inteltrax.com
Patrick Roland, Editor, Inteltrax.
June 18, 2012