How Does a Xoogler Address Search?

July 2, 2015

There are two ways to answer this question.

At Verizon AOL, the approach is to use Bing and the Microsoft ad platform. See “AOL Takes Over Majority of Microsoft’s Ad Business, Swaps Google Search For Bing.” You may have to pay with something other than Greek coded euros to view this article.

At Yahoo, the approach may be to use Google search results, not Microsoft Bing’s. Will Yahoo embrace the GOOG? According to “Yahoo Search Testing Google Search Results: Search PandaMonium”, this may be happening.

The write up states:

I am uncertain to what degree they [sic; the author seems to be referring to Yahoo] are testing search results from Google, but on some web browsers I am seeing Yahoo! organics and ads powered by Bing & in other browsers I am seeing Yahoo! organics and ads powered by Google. Here are a couple screenshots.

Will the change have an impact on the relevance of Yahoo search results? Jury is out.

Stephen E Arnold, July 2, 2015

Software Market Begs for Integration Issue Relief

July 2, 2015

A recent report proves what many users already know: integrating an existing CMS with new and emerging software solutions is difficult. As quickly as software emerges and changes, users are finding that hulking overgrown CMS solutions are lagging behind in terms of agility. SharePoint is no stranger to this criticism. Business Solutions offers more details in their article, “ISVs: Study Shows Microsoft SharePoint Is Open To Disruption.”

“A report from Software Advice surveyed employees that use content management systems (CMS) on a daily basis and found 48 percent had considerable problems integrating their CMS with their other software solutions. The findings mirror a recent AIIM report that found only 11 percent of companies experienced successful Microsoft SharePoint implementation . . . The results of this report indicate that the CMS market is ripe for disruption if a software vendor could solve the integration issues typically associated with SharePoint.”

No doubt, Microsoft understands the concerns and perceived threats, and will attempt to solve some of the issues with the upcoming release of SharePoint Server 2016. However, the fact remains that SharePoint is a big ship to turn, and change will not be dramatic or happen overnight. In the meantime, stay on top of the latest news for tips, tricks, and third-party solutions that may ease some of the pain. Look to Stephen E. Arnold and his SharePoint feed to stay in touch without a huge investment in time.

Emily Rae Aldridge, July 2, 2015

Sponsored by, publisher of the CyberOSINT monograph

Compound Search Processing Repositioned at ConceptSearching

July 2, 2015

The article titled Metadata Matters; What’s The One Piece of Technology Microsoft Doesn’t Provide On-Premises Or in the Cloud? on ConceptSearching re-introduces Compound Search Processing, ConceptSearching’s main offering. Compound Search Processing is a technology developed in 2003 that can identify multi-word concepts and the relationships between words. Compound Search Processing is being repositioned, with Concept Searching apparently chasing SharePoint sales. The article states,

“The missing piece of technology that Microsoft and every other vendor doesn’t provide is compound term processing, auto-classification, and taxonomy that can be natively integrated with the Term Store. Take advantage of our technologies and gain business advantages and a quantifiable ROI…

Microsoft is offering free content migration for customers moving to Office 365…If your content is mismanaged, unorganized, has no value now, contains security information, or is an undeclared record, it all gets moved to your brand new shiny Office 365.”

The angle for Concept Searching is metadata and indexing, and they are quick to remind potential customers that “search is driven by metadata.” The offerings of ConceptSearching come with the promise that it is the only platform that will work with all versions of SharePoint while delivering their enterprise metadata repository. For more information on the technology, see the new white paper on Compound Term Processing.

Chelsea Kerwin, July 2, 2015

Google, Search, and Swizzled Results

July 1, 2015

I am tired of answering questions about the alleged blockbuster revelations from a sponsored study and an academic Internet legal eagle wizard. To catch up on the swizzled search results “news”, I direct your attention, gentle reader, to these articles:

I don’t have a dog in this fight. I prefer the biases of, the wonkiness of Qwant, the mish mash of iSeek, and the mixed outputs of

I don’t look for information using my mobile devices. I use my trusty MacBook and various software tools. I don’t pay much, if any, attention to the first page of results. I prefer to labor through the deeper results. I am retired, out of the game, and ready to charge up my electric wheel chair one final time.

Let me provide you with three basic truths about search. I will illustrate each with a story drawn from my 40 year career in online, information access, and various types of software.

Every Search Engine Provides Tuning Controls

Yep, every search system with which I have worked offers tuning controls. Here’s the real life story. My colleagues and I got a call in our tiny cubicle in an office near the White House. The caller told us to make sure that the then vice president’s Web site came up for specific queries. We created for the Fast Search & Transfer system a series of queries which we hard wired into the results display subsystem. Bingo. When the magic words and phrases were searched, the vice president’s Web page with content on that subject came up. Why did we do this? Well, we knew the reputation of the vice president and I had the experience of sitting in a meeting he chaired. I strongly suggested we just do the hit boosting and stop wasting time. That VP was a firecracker. That’s how life goes in the big world of search.

Key takeaway: Every search engine provides easy or hard ways to present results. These controls are used for a range of purposes. The index just does not present must see benefits information when an employee runs an HR query or someone decides that content is not providing a “good user experience.”
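To make the hard wiring concrete, here is a minimal sketch in Python of a pinned-results table. It is not the Fast Search & Transfer configuration described above; the `PINNED` table, the `search_with_pins` function, the stand-in `fake_engine`, and the example URLs are all hypothetical names invented for illustration.

```python
# Hypothetical hard-wired boosts: normalized query -> URL forced to the top.
PINNED = {
    "benefits enrollment": "https://intranet.example/hr/benefits",
}


def normalize(query):
    """Lowercase the query and collapse runs of whitespace."""
    return " ".join(query.lower().split())


def search_with_pins(query, organic_search):
    """Return the engine's results with any pinned URL moved to first place."""
    results = list(organic_search(query))
    pinned = PINNED.get(normalize(query))
    if pinned is None:
        return results  # no hard-wired entry, so leave the ranking alone
    # Put the pinned URL first and drop its duplicate from the organic list.
    return [pinned] + [url for url in results if url != pinned]


def fake_engine(query):
    """Stand-in for a real search engine (an assumption for this example)."""
    return [
        "https://intranet.example/news/1",
        "https://intranet.example/hr/benefits",
        "https://intranet.example/wiki/faq",
    ]


top_hit = search_with_pins("Benefits  Enrollment", fake_engine)[0]
```

The point of the sketch is that the relevance function is never touched. The pin sits in the display layer, which is why this kind of fix is quick to make and easy to forget.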

Engineers Tailor Results Frequently

The engineers who have to deal with the weirdness of content indexing, the stuff that ends up in the exception file, a broken relevance function when an external synonym list is created, whatever. These issues have to be fixed one by one. No one talking about the search system knows or cares about this type of grunt work. The right fix is the one that works with the least hassle. If one tries to explain why certain content is not in the index, a broken conversion filter is not germane to the complainer’s conversation. When the exclusions are finally processed, these may be boosted in some way. Hey, people were complaining so weight these content objects so they show up. This works with grumpy advertisers, cranky Board members, and clueless new hires. Here’s the story. We were trying to figure out why a search system at a major trade association did not display more than half of the available content. The reason was that the hardware and memory were inadequate for the job. We fiddled. We got the content in the index. We flagged it so that it would appear at the top of a results list. The complaining stopped. No one asked how we did this. I got paid and hit the road.

Key takeaway: In real world search, there are decisions made to deal with problems that Ivory Tower types and disaffected online ecommerce sites cannot and will not understand. The folks working on the system put in a fix and move on. There are dozens and dozens of problems with every search system we have encountered since my first exposure to STAIRS III and BRS. Search sucked in the late 1960s and early 1970s, and it sucks today. To get relevant information, one has to be a very, very skilled researcher, just like it was in the 16th century.

New Hires Just Do Stuff

Okay, here’s a fact of life that will grate on the nerves of the Ivy League MBAs. Search engineering is grueling, difficult, and thankless work. Managers want precision and recall. MBAs often don’t understand that which they demand. So why not hard wire every darned query from this ivy bedecked whiz kid? Ask Jeeves took this route and it worked until the money for humans ran out. Today new hires come in to replace the experienced people like my ArnoldIT team who say, “Been there done that. Time for cyberOSINT.” The new lads and lasses grab a problem and solve it. Maybe a really friendly marketer wants Aunt Sally’s home made jam to be top ranked. The new person just sets the controls and makes an offer of “Let’s do lunch.” Maybe the newcomer gets tired of manual hit boosting, writes a script to automate boosting via a form which any marketer can complete. Maybe the script kiddie posts the script on the in-house system. Bingo. Hit boosting is the new black because it works around perceived relevance issues. Real story: At a giant drug company, researchers could not find their content. The fix was to create a separate search system, indexed and scored to meet the needs of the researchers, and then redirect every person from the research department to the swizzled search system. Magic.

Key takeaway: Over time, functions, procedures, and fixes get made, and managers, like prison guards, no longer perform serious monitoring. Managers are too busy dealing with automated meeting calendars or working on their own start up. When companies in the search business have been around for seven, ten, or fifteen years, I am not sure anyone “in charge” knows what is going on with the newcomers’ fixes and workarounds. Continuity is not high on the priority list in my experience.
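For readers who have not had to compute them, precision and recall are simple ratios. The sketch below uses the standard textbook definitions; the function name and the sample document IDs are invented for illustration and do not come from any system discussed here.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for one query.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: the engine returns four documents, and three
# documents in the collection are actually relevant to the query.
p, r = precision_recall(["a", "b", "c", "d"], ["a", "c", "e"])
```

Here two of the four returned documents are relevant, so precision is 0.5, and two of the three relevant documents were found, so recall is about 0.67. Pushing one number up usually costs something on the other, which is the part that tends to get lost when both are demanded at once.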

What’s My View of the Wu-velations?

I have three observations:

  1. Search results boosting is a core system function; it is not something special. If a search system does not include a boosting function, programmers will find a way to deliver boosting even if it means running two queries and posting results to a form with the boosted content smack in the top spot.
  2. Google’s wildly complex and essentially unmanageable relevance ranking algorithms do stuff that is perplexing because they are tied into inputs from “semantic servers” and heaven knows what else. I can see a company’s Web site disappearing or appearing because no one understands the interactions among the inputs in Google’s wild and crazy system. Couple that with hit boosting and you have a massive demonstration of irrelevant results.
  3. Humans at a search company can reach for a search engineer, make a case for a hit boosting function, and move on. The person doing the asking could be a charming marketer or an errant input system. No one has much, if any, knowledge of actions of a single person or a small team as long as the overall system does not crash and burn.
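The two-query workaround in the first observation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any vendor's code: `boosted_merge`, the two stand-in search functions, and the document IDs are hypothetical.

```python
def boosted_merge(query, boosted_search, organic_search, limit=10):
    """Run two queries and put the boosted hits ahead of the organic ones."""
    seen = set()
    merged = []
    # Boosted results come first; duplicates in the organic list are skipped.
    for doc_id in list(boosted_search(query)) + list(organic_search(query)):
        if doc_id not in seen:
            seen.add(doc_id)
            merged.append(doc_id)
    return merged[:limit]


def boosted_stub(query):
    """Stand-in for a query against a small, hand-picked index."""
    return ["doc-42"]


def organic_stub(query):
    """Stand-in for the normal relevance-ranked query."""
    return ["doc-7", "doc-42", "doc-13"]


page = boosted_merge("aunt sally jam", boosted_stub, organic_stub)
```

With these stand-ins, `page` comes back as `["doc-42", "doc-7", "doc-13"]`: the boosted document sits in the top spot and the organic ranking is otherwise untouched, so nothing about the result list reveals that a second query was involved.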

I am far more concerned about the predictive personalization methods in use for the display of content on mobile devices. That’s why I use

It is the responsibility of the person looking for information to understand bias in results and then exert actual human effort, time, and brain power to figure out what’s relevant and what’s not.

Fine, beat up on the Google. But there are other folks who deserve a whack or two. Why not ask yourself, “Why are results from Bing and Google so darned similar?” There’s a reason for that too, gentle reader. But that’s another topic for another time.

Stephen E Arnold, July 1, 2015

CSC Attracts Buyer And Fraud Penalties

July 1, 2015

According to the Reuters article “Exclusive: CACI, Booz Allen, Leidos Eyes CSC’s Government Unit-Sources,” CACI International, Leidos Holdings, and Booz Allen Hamilton Holdings have expressed interest in Computer Sciences Corp’s public sector division. There are not a lot of details about the possible transaction as it is still in the early stages, so everything is still hush-hush.

The possible acquisition came after the news that CSC will split into two divisions: one that serves US public sector clients and the other dedicated to global commercial and non-government clients. CSC has an estimated $4.1 billion in revenues and is worth $9.6 billion, but CACI International, Leidos Holdings, and Booz Allen Hamilton might reconsider the purchase or push for a lower price after hearing this news: “Computer Sciences (CSC) To Pay $190M Penalty; SEC Charges Company And Former Executives With Accounting Fraud” from Street Insider. The Securities and Exchange Commission is charging CSC and former executives with a $190 million penalty for hiding financial information and problems resulting from the contract they had with their biggest client. CSC and the executives, of course, are contesting the charges.

“The SEC alleges that CSC’s accounting and disclosure fraud began after the company learned it would lose money on the NHS contract because it was unable to meet certain deadlines. To avoid the large hit to its earnings that CSC was required to record, Sutcliffe allegedly added items to CSC’s accounting models that artificially increased its profits but had no basis in reality. CSC, with Laphen’s approval, then continued to avoid the financial impact of its delays by basing its models on contract amendments it was proposing to the NHS rather than the actual contract. In reality, NHS officials repeatedly rejected CSC’s requests that the NHS pay the company higher prices for less work. By basing its models on the flailing proposals, CSC artificially avoided recording significant reductions in its earnings in 2010 and 2011.”

Oh boy!  Is it a wise decision to buy a company that has a history of stealing money and hiding information?  If the company’s root products and services are decent, the buyers might get it for a cheap price and recondition the company.  Or it could lead to another disaster like HP and Autonomy.

Whitney Grace, July 1, 2015

ClearStory Is On the Move

July 1, 2015

The article on Virtual-Strategy Magazine titled ClearStory Data Appoints Dr. Timothy Howes as Chief Technology Officer; Former Vice President of Yahoo, CTO of HP Software, Opsware, and Netscape discusses Howes’s reputation as an innovative thinker who helped invent LDAP. His company Rockmelt Inc. was acquired by Yahoo and he also co-founded Loudcloud, which is now known as Opsware, with the founders of VC firm Andreessen Horowitz, who are current backers of ClearStory Data. Needless to say, obtaining his services is quite a coup for ClearStory. Howes discusses his excitement to join the team in the article,

“There’s a major technology shift happening in the data market right now as businesses want to see and explore more data faster. ClearStory is at the forefront of delivering the next-generation data analysis platform that brings Spark-powered, fast-cycle analysis to the front lines of business in a beautiful, innovative user experience that companies are in dire need of today,” said Howes. “The ClearStory architectural choices made early on, coupled with the focus on an elegant, collaborative user model is impressive.”

The article also mentions that Ali Tore, formerly of Model N, has been named the new Chief Product Officer. Soumitro Tagore of the startup Clari will become the VP of Engineering and Development Operations. ClearStory Data is intent on the acceleration of the movement of data for businesses. Their Intelligent Data Harmonization platform allows data from different sources to be quickly and insightfully explored.

Chelsea Kerwin, July 1, 2015

Keyword Search Is Not Productive. Who Says?

June 30, 2015

I noticed a flurry of tweets pointing to a diagram which maps out the Future of Search. You can view the diagram at or Direct your attention to this assertion:

As amount of data grows, keyword search is becoming less productive.

Now look at what will replace keyword search:

  • Social tagging
  • Automatic semantic tagging
  • Natural language search
  • Intelligent agents
  • Web scale reasoning.

The idea is that we will experience a progression through these “operations” or “functions.” The end point is “The Intelligent Web” and the Web scale reasoning approach to information access.

Interesting. But I am not completely comfortable with this analysis.

Let me highlight four observations and then leave you to your own sense of what the Web will become as the amount of data increases.

First, keyword search is a utility function, and it will become ubiquitous. It will not go away or be forgotten. Keyword search will just appear in more and more human machine interactions. Telling your automobile to call John is keyword search. Finding an email is often a matter of plugging a couple of words into the Gmail search box.

Second, more data does translate to programmers lacing together algorithms to deliver information to users. The idea is that a mobile device user will just “get” information. This is a practical response to the form factor, methods to reduce computational loads imposed by routine query processing, and the human desire for good enough information. The information just needs to be good enough which will work for most people. Do you want your child’s doctor to take automatic outputs if your child has cancer?

Third, for certain types of information access, the focus is shifting, as it should, from huge flows of data to chopping flows down into useful chunks. Governments archive intercepts because the computational demands of processing information in real time for large numbers of users who need real time access are an issue. As data volume grows, computing horsepower is laboring to keep pace. Short cuts are, therefore, important. But most of the short cuts rely on having a question to answer. Guess what? Those short cuts are often keyword queries. The human may not be doing keyword searching, but the algorithms are.

Fourth, some types of information require both old fashioned Boolean keyword search and retrieval AND the manual, time consuming work of human specialists. In my experience, algorithms are useful, but there are subjects which require the old fashioned methods of querying, reading, researching, analyzing, and discussing. Most of these functions are keyword centric.
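As a reminder of how little machinery old fashioned Boolean keyword retrieval needs, here is a minimal sketch of an inverted index with AND and OR lookups. It is a toy, not any production engine: the whitespace tokenizer, the function names, and the three sample documents are assumptions made for illustration.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def boolean_and(index, *terms):
    """Documents that contain every one of the terms."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()


def boolean_or(index, *terms):
    """Documents that contain at least one of the terms."""
    return set().union(*(index.get(term.lower(), set()) for term in terms))


# Hypothetical three-document collection.
docs = {
    1: "call john mobile",
    2: "email from john about budget",
    3: "budget report",
}
index = build_index(docs)
john_and_budget = boolean_and(index, "john", "budget")
call_or_report = boolean_or(index, "call", "report")
```

In this collection only document 2 mentions both "john" and "budget", while documents 1 and 3 match "call" OR "report". Voice commands, mail search, and many of the automated short cuts mentioned above reduce to lookups of roughly this shape.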

In short, keyword queries can be dismissed or dressed up in fancy jargon. I don’t think the method is going away too quickly. Charts and subjective curves are one thing. Real world information interaction is another.

Stephen E Arnold, June 30, 2015

Webinar from BrightFunnel Ties Marketing to Revenue

June 30, 2015

The webinar on BrightFunnel Blog titled Campaign Attribution: Start Measuring True Marketing Impact (How-To Video) adds value to marketing efforts. BrightFunnel defines itself as a platform for marketing analytics that works to join marketing more closely to revenue. The webinar is focused on the attribution application. The video poses three major questions that the application can answer about how pipeline and revenue are affected by marketing channels and specific campaigns, as well as how to gain better insight on the customer. The article overviews the webinar,

“Marketers care. We care a lot about what happens to all those leads we generate for sales. It can be hard to get a complete view of marketing impact when you’re limited to trusting that the right contacts, if any, are being added to opportunities! In this recording from our recent webinar, see how BrightFunnel solves key attribution problems by providing seamless visibility into multi-touch campaign attribution so you can accurately measure the impact you have on pipeline and revenue.”

BrightFunnel believes in an intuitive approach, claiming that three to four weeks has been plenty of time for their users to get set up and get to work with their product. They host a series of webinars that allows interested parties to ask direct questions and be answered live.

Chelsea Kerwin, June 30, 2015

Alternative Search Engines: The Gray Lady Way

June 29, 2015

I read “Alternative Search Engines.” (Note: If you have to pay to read the article, visit a library and look for the story in the New York Times Magazine.) The process was painful. Distinctions which I find important were not part of the write up. One example: the notion that some outfits actually index Web sites, while other outfits use Bing and Google search results without telling the user or the New York Times about this cost cutting half measure. Well, who cares? I don’t.

The write up asserts:

I was investigating the more practical, or just more traditional, alternatives to Google: Bing (owned by Microsoft), Yahoo (operated by Google back then and by Bing now), (an aggregator of Yahoo/Bing, Google and others) and newer sites like DuckDuckGo and IxQuick (which don’t track your search history), Gibiru and Unbubble (which don’t censor results) and Wolfram Alpha (which curates results). They were all too organized, too logical — the results were all the same, with only slight differences in the order of their presentation. It seemed to me that the Search Engine of Tomorrow couldn’t be concerned with the best way to find what users were searching for, but with the best way to find what users didn’t even know they were searching for.

In case the Gray Lady has not figured out the real world, tomorrow means mobile devices. Mobile devices deliver filtered, personalized, swizzled for advertisers results. If you expect to run key word queries on the next iPhone or Android device, give that a whirl and let me know how that works out for you.

The crisis in search is that content is not available. Obtaining primary and certain secondary information is time consuming, difficult, and tedious. The reality of alternative search engines is that these are few and far between.

Do you trust or Do you know what the size of the Exalead search index is? What’s included and what’s omitted from Qwant, the search engine based on Pertimm (who?) which allegedly causes Eric Schmidt to suffer Qwant induced insomnia?

Nah. In Beyond Search, our view has been that the old fashioned, library type of research is a gone goose. The even older fashioned “talk to humans” and “do original research which conforms to the minimal guidelines reviewed in Statistics 101 classes” is just too Baby Boomerish.

With the Gray Lady explaining search, concern about precision and recall, relevancy, editorial policies for inclusion in an index, and latency between information being available and inclusion in an index is history.

Stephen E Arnold, June 29, 2015

Oracle Data Integrator Extension

June 29, 2015

The article titled Oracle Launches ODI in April with the Aim to Revolutionize Big Data on Market Realist makes it clear that Oracle sees big money in NoSQL. Oracle Data Integrator, or ODI, enables developers and analysts to simplify their lives and training. It removes the need for them to learn multiple programming languages and allows them to use Hadoop and the like without much coding expertise. The article states,

“According to a report from PCWorld, Jeff Pollock, Oracle vice president of product management, said, “The Oracle Data Integrator for Big Data makes a non-Hadoop developer instantly productive on Hadoop…” Databases like Hadoop and Spark are targeted towards programmers who have the coding knowledge expertise required to manipulate these databases with knowledge of the coding needed to manage them. On the other hand, analysts usually use software for data analytics.”

The article also relates some of Oracle’s claims about itself, including that it holds a larger revenue than IBM, Microsoft, SAP AG, and Teradata combined. Those are also Oracle’s four major competitors. With the release of ODI, Oracle intends to filter data arriving from a myriad of different places. Clustering data into groups related by their format or framework is part of this process. The end result is a more streamlined version without assumptions about the level of coding knowledge held by an analyst.

Chelsea Kerwin, June 29, 2015
