July 11, 2014
Over at OS News, Thom Holwerda disagrees with a recent, positive review of the search engine DuckDuckGo in “Review: DuckDuckGo Compared to Google, Bing, Yandex.” A user going by “sb56637” at LibreTechTips.com had found that:
“In many respects the tiny DuckDuckGo holds its own against the giant that is Google, and even more so if the user is willing to slightly manipulate the search query to work around DuckDuckGo’s temperamental intelligence layer. So it is heartening to see that DuckDuckGo is a viable alternative to Google by its own merits.”
As our readers may know, usage of DuckDuckGo has grown heartily as people have become more interested in not being tracked. That’s why sb56637 was so happy to call the site a “top-notch search engine.” Holwerda, however, did not have similar success when he tried to substitute the Duck for Google. He writes:
“I tried the ‘new’ DDG as well since it came out, setting it as my default search engine. Sadly, my experience wasn’t as positive – it simply didn’t find the things I was looking for about 80% of the time. Within a few days, I got into the habit of simply adding !g to every search query to go straight to Google anyway since that gave me the results I was looking for.”
Perhaps that is because DuckDuckGo is a metasearch engine, while the rest are not. (Metasearch engines mix results from several search engines.) Recall that reviewer sb56637 noted that having to adjust to DuckDuckGo’s “temperamental intelligence layer” is kind of a pain. It seems those willing to do some research and make the adjustments, though, can have both (comparatively better) privacy and good results.
Cynthia Murrell, July 11, 2014
July 10, 2014
(Editor’s Note: This is information that did not make Stephen E Arnold’s bylined article in Information Today. That forthcoming Information Today story is about French search and content processing companies entering the US market. Spoiler alert: The revenue opportunities and taxes appear to be better in the US than in France. Maybe a French company will be the Next Big Thing in search and content processing. Few French companies have gained significant search and retrieval traction in the US in the last few years. Arguably, the most successful firm is the image recognition outfit called A2iA. It seems that for French information retrieval companies, entering the US market has been lengthy, expensive, and difficult. One French company is trying a different approach, and that’s the core of the Information Today story.)
In 1999, I learned about a Swiss enterprise search system. The working name, according to my Overflight archive, was AMI Albert. The “AMI” did not mean friend. AMI is shorthand for Automatic Message Interpreter.
Flash forward to 2014. Note that a Google query for “AMI” may return hits for AMI International, a defense oriented company, as well as hits to American Megatrends, Advanced Metering Infrastructure, ambient intelligence, the Association Montessori International, and dozens of other organizations sharing the acronym. In an age of Google, finding a specific company can be a challenge and may inhibit some potential customers’ ability to locate a specific vendor. (This is a problem shared by Thunderstone, for example. The game company makes it tough to locate information about the search appliance vendor.)
Basic search interface as of 2011.
Every time I update my files, I struggle to get specific information. Invariably I get an email from an AMI Software sales person telling me, “Yes, we are growing. We are very much a dynamic force in market intelligence.”
The UK Web site for the firm is www.amisw.co.uk. The French language Web site for the company is http://www.amisw.com/fr/. And the English language version of the French Web site is at http://www.amisw.com/fr/. The company’s blog is at http://www.amisw.com/fr/blog/, but the content is stale. The most recent update as of July 7, 2014, is from December 2013. The company seems to have shifted its dissemination of news to LinkedIn, where more than 30 AMI employees have a LinkedIn presence. The blog is in French. The LinkedIn postings are in English. Most of the AMI videos are in French as well.
Advanced Search Interface as of 2011.
The Managing Director, according to www.amisw.com/fr, is Alain Beauvieux. The person in charge of products is Eric Fourboul. The UK sales manager is Mike Alderton.
Mr. Beauvieux is a former IBMer and worked at LexiQuest, which was formerly Erli, S.A. LexiQuest (Clementine) was acquired by SPSS. SPSS was, in turn, acquired by IBM, joining other long-in-the-tooth technologies marketed today by IBM.
Eric Fourboul is a former Dassault professional, and he has some Microsoft DNA in his background.
July 7, 2014
I don’t have much information about the “right to be forgotten” process at the GOOG. I have been watching the streams my Overflight system tracks. I did find one Web page that I found interesting. Navigate to Forgotten Results.
You can explore the links and the source for each entry. I clicked on a few and found the information suggestive, not definitive. I did a couple of quick checks and the content for which I looked was available via other indexes or from other Google domains when I used a Web proxy.
For most users, information not in the Google index does not exist. The approach is, I think, “Hey, Google indexes all the world’s information, right?”
You can ponder the value of being able to delete certain information from online indexes used to satisfy a Web query. My hunch is that some outfits who continue to grouse about Google (maybe, Foundem), certain types of content (information not deemed to be high priority), and other digital information can be deleted. Most folks won’t know the difference.
Keep in mind that among the people who are online searchers, almost everyone is an expert in their own mind. There are professionals like Marydee Ojala, Barbara Quint, Anne Mintz, and Ruth Pagell who are significantly more “expert” than the over confident MBAs, mobile phone search wizards, search engine optimization gurus, and the majority of short cut focused college students chasing a library or information science degree.
What’s important to me is that it is now possible to be confident that locating information on mind becomes much harder. Multiple queries and different search systems must be used. Will Bing maps show you the location of a certain facility in Scotland? Why are some government servers not in the USA.gov service? Why is Yahoo’s presentation of the “news” focused squarely on the inconsequential and stale?
The question about Google is a pretty good one. In our tests, identical queries across different search systems generate anywhere from 60 to 75 percent overlap. Flip this around and you will have to work really hard to find the other 25 to 40 percent.
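The overlap figure can be checked for any pair of result lists. The post does not say how the tests measured overlap, so the sketch below assumes one simple definition: the share of one system's result URLs that also turn up in the other's. The function name and the sample lists are made up for illustration.

```python
def overlap_percent(results_a, results_b):
    """Percent of the URLs in results_a that also appear in results_b."""
    set_a, set_b = set(results_a), set(results_b)
    if not set_a:
        return 0.0
    return 100.0 * len(set_a & set_b) / len(set_a)


# Hypothetical result lists for the same query on two search systems.
engine_one = ["a.example", "b.example", "c.example", "d.example"]
engine_two = ["a.example", "b.example", "c.example", "z.example"]

shared = overlap_percent(engine_one, engine_two)
unique = 100.0 - shared  # the part a second system would be needed to find
print(f"shared: {shared:.0f}%, unique to the first system: {unique:.0f}%")
```

Other definitions (for example, dividing by the union of both lists) would give different numbers, so any comparison needs to state which one it uses.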
Research is hard work. The right to be forgotten just ups the ante for specialists in open source online research. I suppose that’s one reason my intel conference briefings on alternatives to Google.com search continue to pack ‘em in.
Stephen E Arnold, July 7, 2014
July 5, 2014
In my lectures for members of the intelligence community, I talk about how to move “Beyond Google.” I rely on several online search services that are not embraced by the unwashed millions who perceive Google as the Alpha and the Omega of search and retrieval. Google is not objective. The more quickly online users accept the pervasiveness of subjectivity in search results, the more likely a mobile user will be able to locate the Cuba Libre Restaurant in Washington, DC, near the Google offices and pin down the whereabouts of a person like eBay’s chief technical officer. The MillionShort.com system allows me to jump over the irrelevant baloney generated by heavily SEO’ed sites. Man, I hate SEO. Does Matt Cutts’ leave of absence suggest that he too cannot cope with the rigors of eroding objective, relevant results to a Google query?
I noticed a few days ago that the MillionShort.com search system was returning no results, a blank screen, or a message saying that the service was down. I was worried. MillionShort uses a combination of Bing application programming interface calls, some proprietary scripts, and its own index to chop out the Web sites that I love to hate. The name “million short” means that I can NOT out the offensive entertainment sites that pump Justin Bieber information to me, the wildly distorted search engine optimized sites that display useless, off point content, and sites that I really want not to see ever again. Do you too feel this way about www.about.com or www.wikipedia.com or Google.com?
Here’s what MillionShort.com lets me do. I can run a query and narrow the results with a single click to sites that are not in the Top 1000 most popular Web sites. Try the service and run a query. Instead of showing me the drivel that passes for news from Yahoo.com or CNN.com, I can pinpoint gems like YouTube videos that provide specific information about certain illicit activities, identify blogs that contain information about moderators (if you don’t know what this is, then you won’t appreciate the value of the links), and similar topics that often cannot be found using Blekko, Exalead search, Google, or Yandex in .com and .ru flavors.
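The idea of chopping out the most popular sites can be sketched in a few lines. This is not MillionShort's code, which is proprietary; it assumes a popularity-ranked list of domains is available from somewhere, and the function name, the ranking, and the sample URLs are all hypothetical.

```python
from urllib.parse import urlparse


def drop_popular(result_urls, ranked_domains, cutoff):
    """Keep only results whose host is not among the top `cutoff` domains."""
    blocked = set(ranked_domains[:cutoff])
    kept = []
    for url in result_urls:
        host = urlparse(url).netloc.lower()
        if host.startswith("www."):
            host = host[4:]  # treat www.cnn.com and cnn.com as the same site
        if host not in blocked:
            kept.append(url)
    return kept


# Hypothetical popularity ranking, most popular first.
ranked = ["google.com", "yahoo.com", "cnn.com", "about.com"]
hits = [
    "http://www.cnn.com/story",
    "http://smallblog.example/post",
    "http://yahoo.com/news",
]

filtered = drop_popular(hits, ranked, cutoff=3)
print(filtered)
```

Raising or lowering `cutoff` plays the role of the single click that widens or narrows the set of excluded sites.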
MillionShort.com is operated by an entrepreneur whom I am chasing for more details about the system. If I uncover something useful via MillionShort or one of the other “off the radar” services I profile in my intel lectures, I may share some information nuggets in this blog. In the meantime, check out the service. If you get a “not available” message, check back every hour or so. The service comes back up, which is a very good thing for intrepid researchers. For those who want their pizza from the microwave, MillionShort.com may not fit your info life style. Your loss, I fear.
Stephen E Arnold, July 5, 2014
July 5, 2014
I read an article with what I think is the original title: “What does the Facebook Experiment Teach us? Growing Anxiety About Data Manipulation.” I noted that the title presented on Techmeme was “We Need to Hold All Companies Accountable, Not Just Facebook, for How They Manipulate People.” In my view, this mismatch of titles is a great illustration of information manipulation. I doubt that the writer of the improved headline is aware of the irony.
The ubiquity of information manipulation is far broader than Facebook twirling the dials of its often breathless users. Navigate to Google and run this query:
cloud word processing
Note anything interesting in the results list displayed for me on my desktop computer:
The number one ad is for Google. In the first page of results, Google’s cloud word processing system is listed three more times. I did not spot Microsoft Office in the cloud except in item eight: Is Google Docs Making Microsoft Word Redundant.
For most Google search users, the results are objective. No distortion evident.
Here’s what Yandex displays for the same query:
No Google word processing and no Microsoft word processing whether in the cloud or elsewhere.
When it comes to searching for information, the notion that a Web indexing outfit is displaying objective results is silly. The Web indexing companies are in the forefront of distorting information and manipulating users.
Flash back to the first year of the Bush administration when Richard Cheney was vice president. I was in a meeting where the request was considered to make sure that the vice president’s office Web site would appear in FirstGov.gov hits in a prominent position. This, gentle reader, is a request that calls for hit boosting. The idea is to write a script or configure the indexing plumbing to make darned sure a specific url or series of documents appears when and where they are required. No problem, of course. We created a stored query for the Fast Search & Transfer search system and delivered what the vice president wanted.
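Hit boosting of this kind takes very little code. The sketch below is not the Fast Search & Transfer configuration described above. It is a generic illustration that assumes a table of stored queries mapped to the URLs that must lead the results, and every name and URL in it is hypothetical.

```python
def boost_hits(query, ranked_urls, pinned):
    """Put the URLs pinned to this query first, then the normal ranking."""
    forced = pinned.get(query.strip().lower(), [])
    # Drop pinned URLs from the organic list so they are not shown twice.
    rest = [url for url in ranked_urls if url not in forced]
    return list(forced) + rest


# Hypothetical stored query and the page that must appear at the top.
pinned = {"vice president": ["http://vp.example.gov/"]}

organic = [
    "http://news.example/a",
    "http://vp.example.gov/",
    "http://blog.example/b",
]

print(boost_hits("Vice President", organic, pinned))
```

Note that a pinned URL is shown even when the engine's own ranking never returned it, which is the whole point of the request.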
This type of results manipulation is more common than most people accept. Fiddling Web search, like shaping the flow of content on a particular semantic vector, is trivial. Search engine optimization is a fools’ game compared with the tried and true methods of weighting or just buying real estate on a search results page, a Web site from a “real” company.
The notion that disinformation, reformation, and misinformation will be identifiable, rectified, and used to hold companies accountable is not just impossible. The notion itself reveals how little awareness there is of how the actual methods of digital content injection work.
How much of the content on Facebook, Twitter, and other widely used social networks is generated by intelligence professionals, public relations “professionals,” and folks who want to be perceived as intellectual luminaries? Whatever your answer, what data do you have to back up your number? At a recent intelligence conference in Dubai, one specialist estimated that half of the traffic on social networks is shaped or generated by law enforcement and intelligence entities. Do you believe that? Probably not. So good for you.
Amusing, but as someone once told me, “Ignorance is bliss.” So, hello, happy idealists. The job is identifying, interpreting, and filtering. Tough, time consuming work. Most of the experts prefer to follow the path of least resistance and express shock that Facebook would toy with its users. Be outraged. Call for action. Invent an algorithm to detect information manipulation. Let me know how that works out when you look for a restaurant and it is not findable from your mobile device.
Stephen E Arnold, July 5, 2014
July 4, 2014
One of the two or three readers of this blog reported a new and revolutionary search and Big Data vendor called ThoughtSpot. I navigated to the site and enjoyed the wolf / dog. The headline is:
Your business is fast and data hungry.
I really liked the wolf / dog. I found the various links kept pointing to the wolf / dog. I am no longer fast or data hungry. I am outta here. Maybe a reader will let me know when the Web site is working again. The company has captured $30 million in funding according to Venture Beat. I assume the Web site will be fattened in the days ahead. This should be easy. According to Google Maps, ThoughtSpot is very near the In and Out Burger in Redwood City. Presumably the Google-like search for Big Data will be the next double double cheeseburger. My dogs like In and Out Burgers. Neither is fast nor data hungry.
Stephen E Arnold, July 4, 2014
July 3, 2014
Search engines are seeing a drop in ad revenue, because instead of opening a Web browser and hitting a search engine to find information, users are turning to search apps. TechCrunch states that in the article: “We Search More On Apps, Less On Google Now.” Google dropped from its 82.8 percent dominance of the search engine ad revenue to a mere 65.7 percent. The US mobile ad market, however, spiked to over $17.73 billion, way more than Google brought in the past two years for search.
Users are sticking to niche apps to find the information they need. It makes sense given that the aggregated results are more in tune to do what we want than having to sift through irrelevant search results. Nielsen ran a consumer report that found users are spending 34 hours a month on mobile phones over 27 at their desktop. Their search wants have also shifted:
“According to the eMarketer report, we’re really big on local search. Yelp is leading the pack here in terms of ad-revenue growth. Predictions for the local business search company are at 136 percent, or $119 million in mobile ad revenue this year. While that’s a drop in the bucket compared to the spend for Google, Yahoo or Bing, it’s a telling shift in consumer behavior. Revenues are expected to triple by 2016 for Yelp. Meanwhile, Google revenue is expected to drop to 64.2 percent of the overall market by then.”
Google is not going bankrupt. The company is still making money and is still growing; it is just not dominating the entire search market. Users are getting smarter about the way they search as well as finding different ways to get their information. The old search browser might be a thing of the past soon.
July 2, 2014
I read “The Rise (and Fall?) of NoSQL.” The write up seems to take a stance somewhat different from that adopted by enterprise search vendors. With search getting more difficult to sell for big bucks, findability folks are reinventing themselves as Big Data mavens. Examples range from the Fast Search clones to tagging outfits. (Sorry, no names this morning. Search and content processing vendors with chunks of venture firm cash do not need any more fireworks today.)
Is Big Data the white knight that will allow those venture funded companies to deliver a huge payday? I don’t know, but I keep my nest egg is less risky
Here’s the segment I noted:
It’s quite simple: analytics tooling for NoSQL databases is almost non-existent. Apps stuff a lot of data into these databases, but legacy analytics tooling based on relational technology can’t make any sense of it (because it’s not uniform, tabular data). So what usually happens is that companies extract, transform, normalize, and flatten their NoSQL data into an RDBMS, where they can slice and dice data and build reports. The cost and pain of this process, together with the fact that NoSQL databases aren’t fully self-contained (using them requires using their “competition” for analytics!) is the biggest threat to the possible dominance of NoSQL databases.
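The extract, flatten, and load step the passage describes can be shown in miniature. The sketch below assumes the NoSQL data arrives as nested, non-uniform Python dictionaries and uses an in-memory SQLite database as a stand-in for the RDBMS. The `flatten` helper, the `events` table, and the sample documents are all invented for illustration.

```python
import sqlite3


def flatten(doc, prefix=""):
    """Turn a nested document into one flat row keyed by column name."""
    row = {}
    for key, value in doc.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, prefix=column + "_"))
        else:
            row[column] = value
    return row


# Hypothetical documents; note that the fields are not uniform.
docs = [
    {"user": {"name": "ann", "city": "Austin"}, "clicks": 3},
    {"user": {"name": "bob"}, "clicks": 5, "plan": "pro"},
]

rows = [flatten(d) for d in docs]
columns = sorted({c for r in rows for c in r})  # union of all fields seen

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events ({})".format(", ".join(f'"{c}"' for c in columns)))
for r in rows:
    conn.execute(
        "INSERT INTO events VALUES ({})".format(", ".join("?" for _ in columns)),
        [r.get(c) for c in columns],  # fields a document lacks become NULL
    )

# Once the data is tabular, ordinary SQL reporting works.
total_clicks = conn.execute("SELECT SUM(clicks) FROM events").fetchone()[0]
print(columns, total_clicks)
```

Even this toy version has to invent a column naming rule and tolerate missing fields, which hints at the cost and pain the quoted passage mentions.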
My take on this searchification of Big Data boils down to one word: Scrambling for revenues. Perhaps some of the money pumped into crazy marketing schemes might be directed at creating something that works. Systems that dip into a barrel of trail mix return a snack that cannot replace a square meal.
Stephen E Arnold, July 2, 2014
July 2, 2014
Making money from search and content processing is difficult. One company has made a breakthrough. You can learn how Mark Brandon, one of the founders of QBox, is using the darling of the open source search world to craft a robust findability business.
I interviewed Mr. Brandon, a graduate of the University of Texas at Austin, shortly after my return from a short trip to Europe. Compared with the state of European search businesses, Elasticsearch and QBox are on to what diamond miners call a “pipe.”
In the interview, which is part of the Search Wizards Speak series, Mr. Brandon said:
We offer solutions that work and deliver the benefits of open source technology in a cost-effective way. Customers are looking for search solutions that actually work.
Simple enough, but I have ample evidence that dozens and dozens of search and content processing vendors are unable to generate sufficient revenue to stay in business. Many well known firms would go belly up without continual infusions of cash from addled folks with little knowledge of search’s history and a severe case of spreadsheet fever.
Qbox’s approach pivots on Elasticsearch. Mr. Brandon said:
When our previous search product proved to be too cumbersome, we looked for an alternative to our initial system. We tested Elasticsearch and built a cluster of Elasticsearch servers. We could tell immediately that the Elasticsearch system was fast, stable, and customizable. But we love the technology because of its built-in distributed nature, and we felt like there was room for a hosted provider, just as Cloudant is for CouchDB, Mongolab and MongoHQ are for MongoDB, Redis Labs is for Redis, and so on. Qbox is a strong advocate for Elasticsearch because we can tailor the system to customer requirements, confident the system makes information more findable for users.
When I asked where Mr. Brandon’s vision for functional findability came from, he told me about an experience he had at Oracle. Oracle owns numerous search systems, ranging from the late 1980s Artificial Linguistics’ system to somewhat newer systems like the late 1990s Endeca system, and the newer technologies from Triple Hop. Combine these with the SES technology and the hybrid InQuira formed from two faltering NLP systems, and Oracle has some hefty investments.
Here’s Mr. Brandon’s moment of insight:
During my first week at Oracle, I asked one of my colleagues if they could share with me the names of the middleware buyer contacts at my 50 or so named accounts. One colleague said, “certainly”, and moments later an Excel spreadsheet popped into my inbox. I was stunned. I asked him if he was aware that “Excel is a Microsoft technology and we are Oracle.” He said, “Yes, of course.” I responded, “Why don’t you just share it with me in the CRM System?” (the CRM was, of course, Siebel, an Oracle product). He chortled and said, “Nobody uses the CRM here.” My head exploded. I gathered my wits to reply back, “Let me get this straight. We make the CRM software and we sell it to others. Are you telling me we don’t use it in-house?” He shot back, “It’s slow and unusable, so nobody uses it.” As it turned out, with around 10 million corporate clients and about 50 million individual names, if I had to filter for “just middleware buyers”, “just at my accounts”, “in the Northeast”, I could literally go get a cup of coffee and come back before the query was finished. If I added a fourth facet, forget it. The CRM system would crash. If it is that bad at one of the world’s biggest software companies, how bad is it throughout the enterprise?
Stephen E Arnold, July 2, 2014
July 2, 2014
Environment protection organizations are always asking for support and most of the time that translates into money. Paying a few dollars might make you feel good for a short time, but what if you could donate hundreds of dollars instead of your pocket change? How? Simply by clicking your mouse. The Ecosia team have taken the economics of search and applied it to a new search engine type. The Ecosia search engine generates income whenever people use it. Eighty percent of the income from the searches is used to plant trees in Brazil.
Technology and nature are pitted against each other in the collective consciousness, but Ecosia pairs them together. Ecosia recently updated its search experience, according to the blog post “Search And You Shall Receive.” The update includes images, maps, videos, and latest news. Ecosia pulls its results from many places, including Google, so you can still search through Google results and plant a tree at the same time.
There are also other cool updates:
There are other search engines that use a similar model such as GoodSearch.com. Startups with a charitable goal never get enough attention. We encourage you to spread the word about Ecosia and plant a tree.