
The Clever Folks at Yale Remind Us We Are Not Clever

April 1, 2015

Years ago I gave a lecture at Yale University. Very interesting experience. Everyone in the audience knew what was in my monographs about Google. Incredible. I thought I had gathered original information. Well, did I learn how dumb I was. Invigorating.

I read with an eye on the April Fool’s notation on my calendar “Yale Study: You’re Not as Clever As Your Googling Suggests.” I must admit that, having learned at my lecture that I was hopelessly stupid, I knew this.

According to the write up:

Yale psychology professor Frank Keil argues that having the internet’s vast resources at your fingertips causes people to confuse their internal knowledge base (what they personally know) with their external knowledge base (knowing where to find the information they need). In short, it acts as a sort of cognitive opiate, convincing people they know more than they do even when the search results come up empty.

Yes. Proof. Not just the attitude of my audience or their somewhat snarky questions at the meet and mingle; now there is proof.

Isn’t it wonderful to have confirmation that you, like me, are stupider than we knew?

Stephen E Arnold, April 1, 2015

HP: Caveat Venditor Becomes the Company Slogan

March 31, 2015

[I was going to post this on April Fool’s Day. But I thought that some of my very small audience would think I was posting a joke. This is no joke, I fear.]

I am not sure my high school Latin is working, but I think I am close. You know the phrase, Caveat emptor. My view is that Hewlett Packard’s new slogan is, “Seller beware” or caveat venditor in my version of the dead language.

Navigate to “HP Sues Autonomy Co-Founder Lynch in U.K. for $5.1 Billion.” The write up reports:

Hewlett-Packard Co. escalated its more than two-year-old battle with Michael Lynch, suing the Autonomy Corp. co-founder, as well as a former chief financial officer, for $5.1 billion. Hewlett-Packard has maintained that before it agreed to buy the Cambridge, England-based software company for $10 billion in 2011, Lynch and other managers gave an overly optimistic representation of its financial health.

There you go. Let me get this straight. HP decided to buy something. That something triggered much work by HP executives and its consultants. The something became Autonomy. More analyses and conversations ensued.

HP believes that the sellers (Dr. Mike Lynch and his senior managers) did the Norman Vincent Peale thing to sway the $100 billion corporation. You remember. The Power of Positive Thinking. I assume Dr. Lynch and his team did the normal sales pitch complete with diagrams, buzzwords, and lots of upbeat comments about the market opportunity, the IDOL and DRE technology, and the future for smart software. Most of the pitches I have heard in my 50 year business career are more marketing than verifiable facts. Buyers want to buy. Sellers want to sell. Sellers usually have a tough time forcing a buyer to buy unless the situation takes place in a Netflix entertainment experience.

[Image: false advertising]

A happy quack to http://www.owned.com/search/advertising-fail/

The article points out:

The U.K.’s Serious Fraud Office in January dropped its probe into the takeover after finding “insufficient evidence for a realistic prospect of conviction,” the agency said at the time. The U.S. Department of Justice is still investigating, and the SFO said it gave its files to the U.S. authorities. The U.K. accounting regulator, the Financial Reporting Council, is still looking into the matter. The fight has been played out in the open on both sides of the Atlantic, with Lynch posting comments and documents on his blog and Hewlett-Packard aligning with shareholders to pursue Lynch and Hussain in court.

Okay. The SFO seems to be okay with the deal. The FRC is still analyzing.

The winner is going to be the law firms working on this matter. From my point of view, HP bought Autonomy. Dr. Lynch sold Autonomy. As far as I know, Dr. Lynch did not use direct or implied threats to cause the deal to occur. HP, managed by adults, made a decision.

Now, years and billions later, HP is going to “prove” that a known technology wizard with a strong marketing sense fooled a multi-billion dollar company, its handpicked team of managers and analysts, and legions of brains for hire folks.

I know Dr. Lynch is good. I did not know he was a magician and hypnotist.

Fascinating but HP has to do something in addition to splitting its company in two, ignoring the threat posed by Amazon and its ilk, the absence of management wisdom, and the uncertain market into which HP knowingly jumped.

I wonder if HP will take a look in the mirror and wonder what business message the company is sending. Auto dealers in Palo Alto are probably wondering if they are next to be sued. Every auto salesperson with whom I have interacted stresses the positive. I, when the buyer, have to do my homework and understand the facts about a purchase BEFORE signing the deal and forking over hard cash.

Stephen E Arnold, March 31, 2015

HP Vertica and IDOL: Just Three Short Plus Years in the Making

March 31, 2015

I read an article from the outfit that relies on folks like Dave Schubmehl for expertise. The write up is “HP Links Vertica and IDOL Seeking Better Unstructured Data Analysis.” But I quite like the subtitle because it provides a timeline; to wit:

The company built a connector server for the products, which it acquired separately in 2011.

Let’s see that is just about three years plus a few months. The story reminded me of Rip Van Winkle who woke to a different world when he emerged from his slumber. The Sleepy Hollow could be a large technology company in the act of performing mitosis in order to generate [a] excitement, [b] money, and [c] the appearance of progress. I wonder if the digital Sleepy Hollow is located near Hanover Street? I will have to investigate that parallel.

What’s a few years of intellectual effort in a research “cave” when you are integrating software that is expected to generate billions of dollars in sales? Existing Vertica and Autonomy licensees are probably dancing in the streets.

The write up states:

Promising more thorough and timelier data analysis, Hewlett-Packard has released a software package that combines the company’s Vertica database with its IDOL data analysis platform. The HP Haven Connector Framework Server may allow organizations to study data sets that were too large or unwieldy to analyze before. The package provides “a mixture of statistical and contextual understanding,” of data, said Jeff Veis, HP vice president of marketing for big data. “You can pull in any form of data, and then do real-time high performance analysis.”

Hmm. “Promising” and “may allow” are interesting words and phrases. It seems as if the employer of Mr. Schubmehl is hedging on the HP assertions. I wonder, “Why?”


Remotely Search Your Files

March 28, 2015

While it is a pain having to switch between apps to complete tasks, it is an even bigger pain trying to securely search your laptop or desktop computer for files using your mobile device. Sure, there are cloud storage services and the ability to log into your computer via remote Web apps. The problem still remains that you have to log on and connect with your computer. X1 Mobile Search takes care of that problem, and TechWorld has an oldie but goodie review of the app: “X1 Mobile Search Review.”

For a mere fifteen dollars, you download the X1 Mobile Search app on your computer and mobile device and then you can not only search for your files, but also edit them from within the app. It sounds too good to be true, but the X1 works. The application must be downloaded on both devices and connected to the Internet.

TechWorld says the mobile app is a worthy investment:

“Unlike some other programs that allow you to share files between mobile devices and PC and Macs, this one is designed for searching the whole computer, rather than just sharing specific files or pieces of information. You’ll find it a great complement to other programs such as Evernote and SugarSync.”

Give it a whirl.

Whitney Grace, March 28, 2015
Sponsored by ArnoldIT.com, publisher of CyberOSINT

Watson Goes Blekko

March 28, 2015

I read “Goodbye Blekko: Search Engine Joins IBM’s Watson Team.” According to the write up, “Blekko’s home page says its team and technology are now part of IBM’s Watson technology.” I would not know this. I do not use the service. I wrestled with the implementation of Blekko on a news service and then wondered if Yandex was serious about the company. Bottom line: Blekko is not one of my go to search systems, and I don’t cover it in my Alternatives to Google lectures for law enforcement and intelligence professionals.

The write up asserts:

Blekko came out of stealth in 2008 with Skrenta promising to create a search engine with “algorithmic editorial differentiation” compared to Google. Its public search engine finally opened in 2010, launching with what the site called “slashtags” — a personalization and filtering tool that gave users control over the sites they saw in Blekko’s search results.

Another search system becomes part of the puzzling Watson service. How many information access systems does IBM require to make Watson the billion dollar revenue generator or at least robust enough to pay the rent for the Union Square offices?

IBM “owns” the Clementine system which arrived with the SPSS purchase. IBM owns Vivisimo, which morphed into a Big Data system in the acquisition news release, iPhrase, and the wonky search functions in DB2. Somewhere along the line, IBM snagged the Illustra system. From its own labs, IBM has Web Fountain. There is the decades old STAIRS system which may still be available as Service Master. And, of course, there is the Lucene system which provides the dray animals for Watson. Whew. That is a wealth of information access technology, and I am not sure it is comprehensive.

My point is that Blekko and its razzle dazzle assertions now have to provide something that delivers a payoff for IBM. On the other hand, maybe IBM Watson executives are buying technology in the hopes that one of the people “acquihired” or the newly bought zeros and ones will generate massive cash flows.

Watson has morphed from a question answering game show winner into all manner of fantastic information processing capabilities. For me, Watson is an example of what happens when a lack of focus blends with money, executive compensation schemes, and a struggling $100 billion outfit.

Lots of smoke. Not much revenue fire. Stakeholders hope it will change. I am looking forward to a semantically enriched recipe for barbeque sauce that includes tamarind and other spices not available in Harrod’s Creek, Kentucky. Yummy. A tasty addition to the quarterly review menu: Blekko with revenue and a piquant profit sauce.

Perhaps IBM next will acquire Pertimm and the Qwant search system which terrifies Eric Schmidt? Surprises ahead. I prefer profitable, sustainable revenues however.

Stephen E Arnold, March 28, 2015

Semantic Search Becomes Search Engine Optimization: That Is Going to Improve Relevance

March 27, 2015

I read “The Rapid Evolution of Semantic Search.” It must be my age or the fact that it is cold in Harrod’s Creek, Kentucky, this morning. The write up purports to deliver “an overview of the history of semantic search and what this means for marketers moving forward.” I like that moving forward stuff. It reminds me of Project Runway’s “fashion forward.”

The write up includes a wonky graphic that equates via an arrow Big Data and metadata, volume, smart content, petabytes, data analysis, vast, structured, and framework. Big Data is a cloud with five little arrows pointing down. Does this mean Big Data is pouring from the sky like yesterday’s chilling rain?

The history of the Semantic Web begins in 1998. Let’s see that is 17 years ago. The milestone, in the context of the article, is the report “Semantic Web Road Map.” I learned that Google was less than a month old. I thought that Google was Backrub and the work on what was named Google began a couple, maybe three years, earlier. Who cares?

The Big Idea is that the Web is an information space. That sounds good.

Well, in 2012, something Big happened. According to the write up, Google figured out that 20 percent of its searches were “new.” Aren’t those pesky humans annoying? The article reports:

long tail keywords made up approximately 70 percent of all searches. What this told Google was that users were becoming interested in using their search engine as a tool for answering questions and solving problems, not just looking up facts and finding individual websites. Instead of typing “Los Angeles weather,” people started searching “Los Angeles hourly weather for March 1.” While that’s an extremely simplified explanation, the fact is that Google, Bing, Facebook, and other internet leaders have been working on what Colin Jeavons calls “the silent semantic revolution” for years now. Bing launched Satori, a knowledge storehouse that’s capable of understanding complex relationships between people, things, and entities. Facebook built Knowledge Graph, which reveals additional information about things you search, based on Google’s complex semantic algorithm called Hummingbird.

Yep, a new age dawned. The message in the article is that marketers have a great new opportunity to push their message in front of users. In my book, this is one reason why running a query on any of the ad supported Web search engines returns so much irrelevant information. In my just submitted Information Today column, I report how a query for the phrase “concept searching” returned results littered with a vendor’s marketing hoo-hah.

I did not want information about a vendor. I wanted information about a concept. But, alas, Google knows what I want. I don’t know what I want in the brave new world of search. The article ignores the lack of relevance in results, the dust binning of precision and recall, and the bogus information many search queries generate. Try to find current information about Dark Web onion sites and let me know how helpful the search systems are. In fact, name the top TOR search engines. See how far you get with Bing, Google, and Yandex. (DuckDuckGo and Ixquick seem to be aware of TOR content by the way.)

So semantic in the context of this article boils down to four points:

  1. Think like an end user. I suppose one should not try to locate an explanation of “concept searching.” I guess Google knows I care about a company with a quite narrow set of technology focused on SharePoint.
  2. Invest in semantic markup. Okay, that will make sense to the content marketers. What if the system used to generate the content does not support the nifty features of the Semantic Web? OWL, who? RDF what?
  3. Do social. Okay, that’s useful. Facebook and Twitter are the go to systems for marketing products I assume. Who on Facebook cares about cyber OSINT or GE’s cratering petrochemical business?
  4. And the keeper, “Don’t forget about standard techniques.” This means search engine optimization. That SEO stuff is designed to make relevance irrelevant. Great idea.

Net net: The write up underscores some of the issues associated with generating buzz for a small business like the ones INC Magazine tries to serve. With write ups like this one about Semantic Search, INC may be confusing their core constituency. Can confused executives close deals and make sense of INC articles? I assume so. I know I cannot.

Stephen E Arnold, March 27, 2015

Need to Remove SharePoint Results?

March 26, 2015

I read “SharePoint 2013 Items Removed with Search Result Removal Return from the Dead!” The article explains how to remove results from a user’s search results. If a user cannot locate specific information, that is a benefit, right? The write up includes links to two Microsoft documents that provide more detail. Are your search results comprehensive? Heh, heh, heh.

Stephen E Arnold, March 26, 2015

FTC and Google: Never Complain, Never Explain Usually

March 26, 2015

I read “FTC Addresses Its Choice Not to Sue Google.” The write up reports that the FTC is explaining its decision not to chase Google around the conference table. Heck, would that tire out the Googlers, making it tough to stay awake in a White House meeting?

According to the write up:

“All five Commissioners (three Democrats and two Republicans) agreed that there was no legal basis for action with respect to the main focus of the investigation — search,” the statement released on Wednesday read. “The Commission’s decision on the search allegations was in accord with the recommendations of the F.T.C.’s Bureau of Competition, Bureau of Economics, and Office of General Counsel.”

I think this means, “No problemo.”

I also found this statement about the FTC’s expertise in information governance interesting:

In the final paragraph of the commissioners’ statement, the agency once more expressed regret at the inadvertent release of its internal document. “We are taking additional steps to ensure that such a disclosure does not occur in the future,” it said.

That’s good. The future. Many search vendors point out that the functions their marketers say are available today really mean in the “future.” Is this a characteristic of our digital era?

Stephen E Arnold, March 26, 2015

Big Data and Their Interesting Processes

March 25, 2015

I love it when mid tier consultants wax enthusiastically about Big Data. Search your data lake, enjoins one clueless marketer. Big Data is the future, sings a self appointed expert. Yikes.

To get a glimpse of exactly what has to be done to process certain types of Big Data in an economical yet timely manner, I suggest you read “Analytics on the Cheap.” The author is 0X74696D. Get it?

The write up explains the procedures required to crunch data and manage the budget. The work flow process I found interesting is:

  • Incoming message passes through our CDN to pick up geolocation headers
  • Message has its session authenticated (this happens at our routing layer in Nginx/OpenResty)
  • Message is routed to an ingest server
  • Ingest server transforms message and headers into a single character-delimited querystring value
  • Ingest server makes a HTTP GET to a 0-byte file on S3 with that querystring
  • The bucket on S3 has S3 logging turned on.
  • We ingest the S3 logs directly into Redshift on a daily basis.
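The ingest step in the list above can be sketched in a few lines of Python. This is my own illustration, not the author's code: the delimiter, the field names, the `d` query parameter, and the bucket URL are all assumptions, and the sketch only builds the URL rather than calling S3. The trick is that the GET itself carries the data; S3 access logging writes the query string into a log line, and the logs are the data set.

```python
from urllib.parse import quote, unquote, urlsplit

DELIMITER = "|"  # assumed; the write up only says "character-delimited"
FIELDS = ("session_id", "event", "country")  # hypothetical field order


def encode_event(event: dict) -> str:
    """Flatten an event into one delimited string, in a fixed field order."""
    values = [str(event.get(name, "")) for name in FIELDS]
    for value in values:
        if DELIMITER in value:
            raise ValueError(f"field value contains the delimiter: {value!r}")
    return DELIMITER.join(values)


def build_tracking_url(base_url: str, event: dict) -> str:
    """Build the URL an ingest server would GET; the request log keeps the query string."""
    return f"{base_url}?d={quote(encode_event(event), safe='')}"


def decode_query(url: str) -> dict:
    """Reverse the encoding, as a downstream log parser might."""
    payload = unquote(urlsplit(url).query.split("=", 1)[1])
    return dict(zip(FIELDS, payload.split(DELIMITER)))


event = {"session_id": "abc123", "event": "page_view", "country": "US"}
# Placeholder bucket and object name, not a real endpoint.
url = build_tracking_url("https://example-bucket.s3.amazonaws.com/t.gif", event)
print(url)
print(decode_query(url))
```

A fixed field order keeps each log line compact, but the producer and the log parser must agree on that order. That is one of the hidden costs of doing this on the cheap.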

The write up then provides code snippets and some business commentary. The author also identifies the upside of the approach used.

Why is this important? It is easy to talk about Big Data. Looking at what is required to make use of Big Data reveals the complexity of the task.

Keep this hype versus real world split in mind the next time you listen to a search vendor yak about Big Data.

Stephen E Arnold, March 25, 2015

Relaxing a Query: PostgreSQL Style

March 22, 2015

If you are a user of PostgreSQL and want to implement fuzzy, relaxed, or “show ‘em something sort of close to the user’s query” search, you will want to read “Super Fuzzy Searching on PostgreSQL.” Fuzzy search makes it possible to show useful results to a user who is not quite sure how terms appear in an index. Fuzzy is not exactly like “close” in horseshoes. More algorithmic magic is at play in information retrieval systems.

The article explains PostgreSQL fuzzy capabilities and launches into the notion of trigrams. Keep in mind that Manning & Napier (creators of DR LINK) possess some n-gram patents. The old Brainware (which may have once been SER) also possesses some n-gram type patents. I recall hearing years ago that Brainware developed a trigram search system which worked reasonably well when looking for similar patent claims. Brainware is now part of a printer company, and I have lost track of the search technology. I suppose I could investigate the Brainware/Lexmark status, but I have other tasks beckoning my attention.

The write up explains how to implement trigrams for PostgreSQL. The code examples are useful and the tips for dealing with large datasets are quite helpful. The author does not mention the n-gram related patents. I assume that the author assumes that the patent holders assume no one is infringing. That is a triple assumption set. int ere sti ngt rig ram coi nci den ce_
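For readers who have not met trigrams, here is a minimal Python sketch of the idea. It is my illustration, not the article's code and not PostgreSQL's implementation. The padding rule (two spaces in front of each word, one behind) and the shared-over-total scoring follow my understanding of how the pg_trgm extension describes its behavior. The function names and the 0.3 default threshold are assumptions for the example.

```python
import re


def trigrams(text: str) -> set[str]:
    """Return the set of three-character sequences for each word in text."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "  # assumed padding: two leading spaces, one trailing
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def fuzzy_search(query: str, candidates: list[str], threshold: float = 0.3) -> list[str]:
    """Return candidates at or above the threshold, best match first."""
    scored = [(similarity(query, c), c) for c in candidates]
    return [c for score, c in sorted(scored, reverse=True) if score >= threshold]


print(sorted(trigrams("cat")))
print(fuzzy_search("postgers", ["postgres", "postman", "mysql"]))
```

A database does not score every row this way. It indexes the trigrams so that only rows sharing some of them get compared, and that is where the tips for large datasets come in.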

Stephen E Arnold, March 22, 2015
