Palantir Explains Palantir

July 1, 2015

I read a question about Palantir on Quora. You may be able to access it, but no promises. A person named Abhiram and then Kevin Simler provided some information. Here are three items I found interesting:

  1. At Palantir we specialize in analysis.
  2. The first important thing to note is that we don’t actually do the analysis ourselves.
  3. You could say that we help summarize large data sets, in the sense that we have to provide the analyst with a rich library of techniques and algorithms.

I think I understand.

Stephen E Arnold, July 1, 2015

Google, Search, and Swizzled Results

July 1, 2015

I am tired of answering questions about the alleged blockbuster revelations from a sponsored study and an academic Internet legal eagle wizard. To catch up on the swizzled search results “news”, I direct your attention, gentle reader, to these articles:

I don’t have a dog in this fight. I prefer the biases of, the wonkiness of Qwant, the mish mash of iSeek, and the mixed outputs of

I don’t look for information using my mobile devices. I use my trusty MacBook and various software tools. I don’t pay much, if any, attention to the first page of results. I prefer to labor through the deeper results. I am retired, out of the game, and ready to charge up my electric wheel chair one final time.

Let me provide you with three basic truths about search. I will illustrate each with a story drawn from my 40 year career in online, information access, and various types of software.

Every Search Engine Provides Tuning Controls

Yep, every search system with which I have worked offers tuning controls. Here’s the real life story. My colleagues and I got a call in our tiny cubicle in an office near the White House. The caller told us to make sure that the then vice president’s Web site came up for specific queries. We created for the Fast Search & Transfer system a series of queries which we hard wired into the results display subsystem. Bingo. When the magic words and phrases were searched, the vice president’s Web page with content on that subject came up. Why did we do this? Well, we knew the reputation of the vice president and I had the experience of sitting in a meeting he chaired. I strongly suggested we just do the hit boosting and stop wasting time. That VP was a firecracker. That’s how life goes in the big world of search.

Key takeaway: Every search engine provides easy or hard ways to control how results are presented. These controls are used for a range of purposes: the index alone does not surface must see benefits information when an employee runs an HR query, or someone decides that content is not providing a “good user experience.”
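
The hard wiring described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `PINNED` table, the `search_with_pins` function, and the example URL are all hypothetical, and nothing here reflects how the Fast Search & Transfer system actually implemented its controls.

```python
# Hypothetical sketch of "hard wired" hit boosting: a lookup table maps
# specific query strings to a page that must appear first.
# The table contents and the URL are invented for illustration.
PINNED = {
    "energy policy": "https://example.gov/vp/energy",
}


def normalize(query: str) -> str:
    """Lowercase the query and collapse runs of whitespace."""
    return " ".join(query.lower().split())


def search_with_pins(query, organic_search):
    """Run the normal search, then force any pinned page to the top.

    `organic_search` is any callable that takes a query string and
    returns an ordered list of result identifiers.
    """
    results = list(organic_search(query))
    pinned = PINNED.get(normalize(query))
    if pinned is None:
        return results
    # Put the pinned page first and drop its duplicate from the organic list.
    return [pinned] + [r for r in results if r != pinned]
```

The point of the sketch is that the relevance function is never touched. The override sits in the display layer, which is why such a fix is quick to make and hard to notice later.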

Engineers Tailor Results Frequently

The engineers who have to deal with the weirdness of content indexing, the stuff that ends up in the exception file, a broken relevance function when an external synonym list is created, whatever: these issues have to be fixed one by one. No one talking about the search system knows or cares about this type of grunt work. The right fix is the one that works with the least hassle. If one tries to explain why certain content is not in the index, a broken conversion filter is not germane to the complainer’s conversation. When the exclusions are finally processed, these may be boosted in some way. Hey, people were complaining so weight these content objects so they show up. This works with grumpy advertisers, cranky Board members, and clueless new hires. Here’s the story. We were trying to figure out why a search system at a major trade association did not display more than half of the available content. The reason was that the hardware and memory were inadequate for the job. We fiddled. We got the content in the index. We flagged it so that it would appear at the top of a results list. The complaining stopped. No one asked how we did this. I got paid and hit the road.

Key takeaway: In real world search, there are decisions made to deal with problems that Ivory Tower types and disaffected online ecommerce sites cannot and will not understand. The folks working on the system put in a fix and move on. There are dozens and dozens of problems with every search system we have encountered since my first exposure to STAIRS III and BRS. Search sucked in the late 1960s and early 1970s, and it sucks today. To get relevant information, one has to be a very, very skilled researcher, just like it was in the 16th century.

New Hires Just Do Stuff

Okay, here’s a fact of life that will grate on the nerves of the Ivy League MBAs. Search engineering is grueling, difficult, and thankless work. Managers want precision and recall. MBAs often don’t understand that which they demand. So why not hard wire every darned query from this ivy bedecked whiz kid? Ask Jeeves took this route and it worked until the money for humans ran out. Today new hires come in to replace the experienced people like my ArnoldIT team who say, “Been there done that. Time for cyberOSINT.” The new lads and lasses grab a problem and solve it. Maybe a really friendly marketer wants Aunt Sally’s home made jam to be top ranked. The new person just sets the controls and makes an offer of “Let’s do lunch.”  Maybe the newcomer gets tired of manual hit boosting and writes a script to automate boosting via a form which any marketer can complete. Maybe the script kiddie posts the script on the in-house system. Bingo. Hit boosting is the new black because it works around perceived relevance issues. Real story: At a giant drug company, researchers could not find their content. The fix was to create a separate search system, indexed and scored to meet the needs of the researchers, and then redirect every person from the research department to the swizzled search system. Magic.

Key takeaway: Over time functions, procedures, and fixes get made and managers, like prison guards, no longer perform serious monitoring. Managers are too busy dealing with automated meeting calendars or working on their own start up. When companies in the search business have been around for seven, ten, or fifteen years, I am not sure anyone “in charge” knows what is going on with the newcomers’ fixes and workarounds. Continuity is not high on the priority list in my experience.

What’s My View of the Wu-velations?

I have three observations:

  1. Search results boosting is a core system function; it is not something special. If a search system does not include a boosting function, programmers will find a way to deliver boosting even if it means running two queries and posting results to a form with the boosted content smack in the top spot.
  2. Google’s wildly complex and essentially unmanageable relevance ranking algorithms do stuff that is perplexing because they are tied into inputs from “semantic servers” and heaven knows what else. I can see a company’s Web site disappearing or appearing because no one understands the interactions among the inputs in Google’s wild and crazy system. Couple that with hit boosting and you have a massive demonstration of irrelevant results.
  3. Humans at a search company can reach for a search engineer, make a case for a hit boosting function, and move on. The person doing the asking could be a charming marketer or an errant input system. No one has much, if any, knowledge of actions of a single person or a small team as long as the overall system does not crash and burn.
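
The two query workaround mentioned in the first observation can be sketched as follows. This is a minimal illustration, assuming two hypothetical callables, `boosted_search` and `organic_search`, that each return an ordered list of document identifiers. It is not a description of any particular vendor’s system.

```python
def boosted_merge(query, organic_search, boosted_search, limit=10):
    """Run two queries and place the boosted hits ahead of the organic ones.

    Both search arguments are hypothetical callables that accept a query
    string and return an ordered list of document identifiers.
    """
    boosted = list(boosted_search(query))
    organic = list(organic_search(query))

    seen = set()
    merged = []
    # Boosted results go first; anything already shown is skipped so a
    # document never appears twice in the final list.
    for doc_id in boosted + organic:
        if doc_id not in seen:
            seen.add(doc_id)
            merged.append(doc_id)
    return merged[:limit]
```

Nothing in the merged list tells the user which hits were boosted, which is the crux of the complaint about swizzled results.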

I am far more concerned about the predictive personalization methods in use for the display of content on mobile devices. That’s why I use

It is the responsibility of the person looking for information to understand bias in results and then exert actual human effort, time, and brain power to figure out what’s relevant and what’s not.

Fine, beat up on the Google. But there are other folks who deserve a whack or two. Why not ask yourself, “Why are results from Bing and Google so darned similar?” There’s a reason for that too, gentle reader. But that’s another topic for another time.

Stephen E Arnold, July 1, 2015

CSC Attracts Buyer And Fraud Penalties

July 1, 2015

According to the Reuters article “Exclusive: CACI, Booz Allen, Leidos Eyes CSC’s Government Unit-Sources,” CACI International, Leidos Holdings, and Booz Allen Hamilton Holdings have expressed interest in Computer Sciences Corp’s public sector division.  There are not a lot of details about the possible transaction as it is still in the early stages, so everything is still hush-hush.

The possible acquisition came after the news that CSC will split into two divisions: one that serves US public sector clients and the other dedicated to global commercial and non-government clients.  CSC has an estimated $4.1 billion in revenues and is worth $9.6 billion, but CACI International, Leidos Holdings, and Booz Allen Hamilton might reconsider the deal or try to get the price lowered after hearing this news: “Computer Sciences (CSC) To Pay $190M Penalty; SEC Charges Company And Former Executives With Accounting Fraud” from Street Insider.  The Securities and Exchange Commission is charging CSC and former executives with a $190 million penalty for hiding financial information and problems resulting from the contract they had with their biggest client.  CSC and the executives, of course, are contesting the charges.

“The SEC alleges that CSC’s accounting and disclosure fraud began after the company learned it would lose money on the NHS contract because it was unable to meet certain deadlines. To avoid the large hit to its earnings that CSC was required to record, Sutcliffe allegedly added items to CSC’s accounting models that artificially increased its profits but had no basis in reality. CSC, with Laphen’s approval, then continued to avoid the financial impact of its delays by basing its models on contract amendments it was proposing to the NHS rather than the actual contract. In reality, NHS officials repeatedly rejected CSC’s requests that the NHS pay the company higher prices for less work. By basing its models on the flailing proposals, CSC artificially avoided recording significant reductions in its earnings in 2010 and 2011.”

Oh boy!  Is it a wise decision to buy a company that has a history of stealing money and hiding information?  If the company’s root products and services are decent, the buyers might get it for a cheap price and recondition the company.  Or it could lead to another disaster like HP and Autonomy.

Whitney Grace, July 1, 2015

Sponsored by, publisher of the CyberOSINT monograph

ClearStory Is On the Move

July 1, 2015

The article on Virtual-Strategy Magazine titled ClearStory Data Appoints Dr. Timothy Howes as Chief Technology Officer; Former Vice President of Yahoo, CTO of HP Software, Opsware, and Netscape discusses Howes’s reputation as an innovative thinker who helped invent LDAP. His company Rockmelt Inc. was acquired by Yahoo and he also co-founded Loudcloud, which is now known as Opsware, with the founders of VC firm Andreessen Horowitz, who are current backers of ClearStory Data. Needless to say, obtaining his services is quite a coup for ClearStory. Howes discusses his excitement to join the team in the article,

“There’s a major technology shift happening in the data market right now as businesses want to see and explore more data faster. ClearStory is at the forefront of delivering the next-generation data analysis platform that brings Spark-powered, fast-cycle analysis to the front lines of business in a beautiful, innovative user experience that companies are in dire need of today,” said Howes. “The ClearStory architectural choices made early on, coupled with the focus on an elegant, collaborative user model is impressive.”

The article also mentions that Ali Tore, formerly of Model N, has been named the new Chief Product Officer. Soumitro Tagore of the startup Clari will become the VP of Engineering and Development Operations. ClearStory Data is intent on the acceleration of the movement of data for businesses. Their Intelligent Data Harmonization platform allows data from different sources to be quickly and insightfully explored.

Chelsea Kerwin, July 1, 2015

Watson: Will It Be Able to Make Major Government Applications More Intelligent?

June 30, 2015

I recently commented on the 25 percent problem rate in government software. You can find that Beyond Search item at this link. I can relate to companies who want to improve US government software. Go for it.

IBM has a plan which apparently ignores IBM Federal Systems (an outfit which creates, upgrades, and maintains some US government software). The approach focuses on a group of students from the University of Texas at Austin.

The write up states that Lauri Saft, director of the IBM Watson Ecosystem, has this view:

“You don’t program Watson, you teach it. We gave them [the students] the empty shell of Watson and said, ‘Go and come up with ideas you feel would be valuable.’”

The training method was one of Autonomy IDOL’s most important functions. The challenge which IDOL licensees faced, as I understand the system, is that the system can drift unless training and calibration are part of the routine maintenance cycle. For some licensees, the time and cost of the training and calibration were hurdles. Has Watson moved beyond Autonomy’s approach?

I assume the answer is, “Yes.” Therefore, the use of Watson to improve government software by making that software more intelligent should be a home run.

The students in Austin created a Watson app named CallScout. The focus is not on government software as I think of market opportunities. The students are working on a social service program in Texas. The application is a customer support solution. That’s good.

My thought was that IBM Federal Systems would be using Watson’s remarkably broad spectrum of capabilities to address issues at the national level, maybe the regional level for Homeland Security or the EPA. I did not expect a local app for social services in Texas.

Perhaps the IBM Federal Systems Watson home run will be announced soon. The incubator with student thing is not likely to boost IBM top line revenues in the way a major US government Watson deal would. But PR is PR, big or small. IBM needs its Federal Systems unit to get the Watson revenue flowing. Time is a wastin’ because there are other outfits nosing into this potentially lucrative territory.

Stephen E Arnold, June 30, 2015

Keyword Search Is Not Productive. Who Says?

June 30, 2015

I noticed a flurry of tweets pointing to a diagram which maps out the Future of Search. You can view the diagram at or Direct your attention to this assertion:

As amount of data grows, keyword search is becoming less productive.

Now look at what will replace keyword search:

  • Social tagging
  • Automatic semantic tagging
  • Natural language search
  • Intelligent agents
  • Web scale reasoning.

The idea is that we will experience a progression through these “operations” or “functions.” The end point is “The Intelligent Web” and the Web scale reasoning approach to information access.

Interesting. But I am not completely comfortable with this analysis.

Let me highlight four observations and then leave you to your own sense of what the Web will become as the amount of data increases.

First, keyword search is a utility function, and it will become ubiquitous. It will not go away or be forgotten. Keyword search will just appear in more and more human machine interactions. Telling your automobile to call John is keyword search. Finding an email is often a matter of plugging a couple of words into the Gmail search box.

Second, more data does translate to programmers lacing together algorithms to deliver information to users. The idea is that a mobile device user will just “get” information. This is a practical response to the form factor, methods to reduce computational loads imposed by routine query processing, and the human desire for good enough information. The information just needs to be good enough which will work for most people. Do you want your child’s doctor to take automatic outputs if your child has cancer?

Third, for certain types of information access, the focus is shifting, as it should, from huge flows of data to chopping flows down into useful chunks. Governments archive intercepts because the computational demands of processing information in real time for large numbers of users who need real time access are an issue. As data volume grows, computing horsepower is laboring to keep pace. Short cuts are, therefore, important. But most of the short cuts require having a question to answer. Guess what? Those short cuts are often keyword queries. The human may not be doing keyword searching, but the algorithms are.

Fourth, some types of information require both old fashioned Boolean keyword search and retrieval AND the manual, time consuming work of human specialists. In my experience, algorithms are useful, but there are subjects which require the old fashioned methods of querying, reading, researching, analyzing, and discussing. Most of these functions are keyword centric.
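
For readers who have not seen it spelled out, old fashioned Boolean keyword retrieval can be sketched with an inverted index. This is a toy illustration: the function names and the sample documents are invented, and real systems add tokenization, stemming, and ranking on top.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercased term to the set of document ids containing it.

    `docs` is a dict of {doc_id: text}; terms are split on whitespace only.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def boolean_and(index, *terms):
    """Return the ids of documents that contain every one of the terms."""
    postings = [index.get(term.lower(), set()) for term in terms]
    if not postings:
        return set()
    return set.intersection(*postings)


# Invented sample documents, echoing the "call John" example above.
docs = {1: "call John now", 2: "email John", 3: "call home"}
index = build_index(docs)
print(boolean_and(index, "call", "John"))  # documents containing both terms
```

Whatever sits on top, be it voice input, an agent, or a predictive feed, a lookup of this kind is frequently what runs underneath.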

In short, keyword queries can be dismissed or dressed up in fancy jargon. I don’t think the method is going away too quickly. Charts and subjective curves are one thing. Real world information interaction is another.

Stephen E Arnold, June 30, 2015

The Google Cloud: Low Ceiling, Visibility Limited

June 30, 2015

I read “Google Cloud Platform: Google Execs Speak.” I highlighted one passage. In response to a question about recent Google cloud service price cuts, the Googler Brian Stevens said:

Our [pricing] is, to be honest, completely driven by measurable infrastructure improvements. So the numbers that you’re seeing aren’t even looking at the competition. They’re looking at the efficiencies. We actually can cost out all of our ongoing infrastructure for our platform, which we actually charge back to the group… We actually modeled those [costs]. We built our plans for next year. We have a set of goals around infrastructure efficiencies that we’re going to drive next year as well. Those [costs] are mapped right back into further and further discounts. So the model, for us, will continue.

I assume that Amazon will remain competitive with Google as both companies try to create value adding services. How low will Google cloud prices go? The suggestion that Google pays little attention to the actions of its competitors strikes me as interesting. I am sensitive to the words “honest” and “actually.”

Stephen E Arnold, June 30, 2015

Microsoft Puts the Cloud First with SharePoint Server 2016

June 30, 2015

Discussion of the cloud seems to push users into two camps: for and against. While hybrid is probably truly the way of the future, folks are still currently either of the “love it” or “hate it” variety. Redmond Magazine has provided good ongoing coverage of the upcoming SharePoint Server 2016 release, and their article, “Microsoft Taking a ‘Cloud First’ Approach with SharePoint 2016,” gives more details about what can be expected.

The article says:

“SharePoint Server 2016 will be a very cloud-inspired product when commercially released next year . . . Microsoft’s cloud services have been looming in the background of prior SharePoint Server releases . . . Office 365 cloud services have played a role since SharePoint Server 2013, and they will do so going forward with SharePoint Server 2016.”

One of the main promotional points of the new release is a promised “unified experience” for SharePoint users. While cloud skeptics still have reason to be cautious, the promised improvements may win them over. To stay up-to-date with the latest news regarding SharePoint, stay tuned in to and the dedicated SharePoint feed. Stephen E. Arnold is a longtime leader in search and his expertise comes in handy when trying to stay current without spending a lot of time doing independent research.

Emily Rae Aldridge, June 30, 2015


Tumblr Has a GIF For You

June 30, 2015

Facebook recently enabled users to post GIF images on the social media platform.  Reddit was in an uproar over the new GIF feature and celebrated by posting random moving images from celebrities making weird faces to the quintessential cute kitten.  GIFs are an Internet phenomenon and are used by people to express their moods, opinions, or share their fandom.  Another popular social media platform, Tumblr, the microblogging site used to share photos, videos, quotes, and more, has added a GIF search, says PCMag in “Tumblr Adds New GIF Search Capabilities.”

The main point of Tumblr is the ability to share content either a user creates or someone else creates.  A user’s Tumblr page is a personal reflection of themselves and GIFs are one of the ultimate content pieces to share.  Tumblr’s new search option for GIFs is very simple: a user picks the + button, clicks the GIF button, and then searches for the GIF that suits their mood.  A big thing on Tumblr is citing who created a piece and the new search option has that covered:

“Pick the GIF you want and it slinks right in, properly credited and everything,” the company said. “Whoever originally posted the GIF will be notified accordingly. On their dashboard, on their phone, all the regular places notifications go.”

GIFs are random bits of fun that litter the Internet and quickly achieve meme status.  They are also easy to make, which appeals to people with very little graphic background.  They can make something creative and fun without much effort and now the GIFs can be easily found and shared on Tumblr.

Whitney Grace, June 30, 2015


Webinar from BrightFunnel Ties Marketing to Revenue

June 30, 2015

The webinar on BrightFunnel Blog titled Campaign Attribution: Start Measuring True Marketing Impact (How-To Video) adds value to marketing efforts. BrightFunnel defines itself as a platform for marketing analytics that works to join marketing more closely to revenue. The webinar is focused on the attribution application. The video poses three major questions that the application can answer about how pipeline and revenue are affected by marketing channels and specific campaigns, as well as how to gain better insight on the customer. The article overviews the webinar,

“Marketers care. We care a lot about what happens to all those leads we generate for sales. It can be hard to get a complete view of marketing impact when you’re limited to trusting that the right contacts, if any, are being added to opportunities! In this recording from our recent webinar, see how BrightFunnel solves key attribution problems by providing seamless visibility into multi-touch campaign attribution so you can accurately measure the impact you have on pipeline and revenue.”

BrightFunnel believes in an intuitive approach, claiming that three to four weeks has been plenty of time for their users to get set up and get to work with their product. They host a series of webinars that allows interested parties to ask direct questions and be answered live.

Chelsea Kerwin, June 30, 2015
