
GAO DCGS Letter B-412746

June 1, 2016

A few days ago, I stumbled upon a copy of a letter from the GAO concerning Palantir Technologies dated May 18, 2016. The letter became available to me a few days after the 18th, and the US holiday probably limited circulation of the document. The letter is from the US Government Accountability Office and signed by Susan A. Poling, general counsel. There are eight recipients, some from Palantir, some from the US Army, and two in the GAO.


Has the US Army put Palantir in an untenable spot? Is there a deus ex machina about to resolve the apparent checkmate?

The letter tells Palantir Technologies that its protest of the DCGS Increment 2 award to another contractor is denied. I don’t want to revisit the history or the details of the DCGS project as I understand them. (DCGS, pronounced “dsigs”, is a US government information fusion project associated with the US Army but seemingly applicable to other Department of Defense entities like the Air Force and the Navy.)

The passage in the letter I found interesting was:

While the market research revealed that commercial items were available to meet some of the DCGS-A2 requirements, the agency concluded that there was no commercial solution that could meet all the requirements of DCGS-A2. As the agency explained in its report, the DCGS-A2 contractor will need to do a great deal of development and integration work, which will include importing capabilities from DCGS-A1 and designing mature interfaces for them. Because the agency concluded that significant portions of the anticipated DCSG-A2 scope of work were not available as a commercial product, the agency determined that the DCGS-A2 development effort could not be procured as a commercial product under FAR part 12 procedures. The protester has failed to show that the agency’s determination in this regard was unreasonable.

The “importing” point is a big deal. I find it difficult to imagine that IBM i2 engineers will be eager to permit the Palantir Gotham system to work like one happy family. In my experience, importing and manipulating i2 data in a third-party system is more difficult than opening an RTF file in Word. My recollection is that the unfortunate i2-Palantir legal matter was, in part, related to figuring out how to deal with ANB files. (ANB is i2 shorthand for the Analyst’s Notebook file format, a somewhat complex and closely held construct.)

Net net: Palantir Technologies will not be the dog wagging the tail of IBM i2 and a number of other major US government integrators. The good news is that there will be quite a bit of work available for firms able to support the prime contractors and the vendors eligible and selected to provide for-fee products and services.

Was this a shoot-from-the-hip decision to deny Palantir’s objection to the award? No. I believe the FAR procurement guidelines and the content of the statement of work provided the framework for the decision. However, context is important as are past experiences and perceptions of vendors in the running for substantive US government programs.


Next-Generation Business Intelligence Already Used by Risk Analysis Teams

June 1, 2016

Ideas about business intelligence have certainly evolved with emerging technologies. An article from CIO, “Why machine learning is the new BI,” speaks to this transformation of the concept. The author describes how reactive analytics based on historical data do not optimally assist business decisions. Questions about customer satisfaction are best oriented toward proactive future-proofing, according to the article. The author writes,

“Advanced, predictive analytics are about calculating trends and future possibilities, predicting potential outcomes and making recommendations. That goes beyond the queries and reports in familiar BI tools like SQL Server Reporting Services, Business Objects and Tableau, to more sophisticated methods like statistics, descriptive and predictive data mining, machine learning, simulation and optimization that look for trends and patterns in the data, which is often a mix of structured and unstructured. They’re the kind of tools that are currently used by marketing or risk analysis teams for understanding churn, customer lifetimes, cross-selling opportunities, likelihood of buying, credit scoring and fraud detection.”
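To make the distinction concrete, here is a minimal, hypothetical sketch of the sort of churn model the passage describes. The customer data, features, and numbers are invented for illustration, and the snippet assumes NumPy and scikit-learn are installed; it is not drawn from the CIO article.

```python
# Hypothetical churn-prediction sketch: predictive analytics versus a plain report.
# The "customer" data is synthetic; assumes numpy and scikit-learn are installed.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 1_000

# Invented features: months as a customer, support tickets filed, monthly spend.
tenure = rng.integers(1, 60, n)
tickets = rng.poisson(2, n)
spend = rng.normal(50, 15, n)

# Synthetic ground truth: short-tenure, high-ticket customers churn more often.
churn_prob = 1 / (1 + np.exp(0.05 * tenure - 0.6 * tickets + 0.01 * spend))
churned = rng.random(n) < churn_prob

X = np.column_stack([tenure, tickets, spend])
X_train, X_test, y_train, y_test = train_test_split(X, churned, random_state=0)

# A reporting query only counts past churners; a predictive model scores
# current customers by their likelihood of churning next.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Holdout accuracy:", round(model.score(X_test, y_test), 3))
print("Churn risk for a 3-month customer with 6 tickets, $40 spend:",
      round(model.predict_proba([[3, 6, 40]])[0, 1], 2))
```

Nothing in the sketch is exotic; the shift the article describes is mostly a matter of asking a forward-looking question of the same data a reporting tool already holds.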

Does this mean that traditional business intelligence, after much hype and millions in funding, is a flop? Or will predictive analytics be a case of polishing up existing technology and presenting it in new packaging? After time, and for some after much money has been spent, we should have a better idea of the true value.

 

Megan Feil, June 1, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Financial Institutions Finally Realize Big Data Is Important

May 30, 2016

One of the fears of automation is that human workers will be replaced and there will no longer be jobs for humanity. Blue-collar jobs are believed to be the first that will be automated, but bankers, financial advisors, and other workers in the financial industry also have cause to worry. Algorithms might replace them, because apparently people are getting faster and better responses from automated bank “workers”.

Perhaps one of the reasons bankers and financial advisors are being replaced is the industry’s belated understanding that “Big Data And Predictive Analytics: A Big Deal, Indeed,” as ABA Banking Journal puts it. One would think that the financial sector would be the first to embrace big data and analytics in order to keep an upper hand on the competition, earn more money, and maintain its relevance in an ever-changing world. It has, however, been slow to adapt, slower than retail, search, and insurance.

One of the main reasons the financial sector has been holding back is:

“There’s a host of reasons why banks have held back spending on analytics, including privacy concerns and the cost for systems and past merger integrations. Analytics also competes with other areas in tech spending; banks rank digital banking channel development and omnichannel delivery as greater technology priorities, according to Celent.”

After the above quote, the article makes a statement about how customers are moving to online banking rather than visiting branches, but it is a very insipid observation. Big data and analytics offer banks the opportunity to build better relationships with their customers and even to offer more individualized services as a way to one-up the Silicon Valley competition. Big data also helps financial institutions comply with banking laws and standards and avoid violations.

Banks do need to play catch-up, but this is probably a lot of moan and groan for nothing. The financial industry will adapt, especially when it is at risk of losing more money. The same goes for every industry: adapt or get left behind. The further we move from the twentieth century, and from generations unaccustomed to digital environments, the more technology integration we will see.

Whitney Grace, May 30, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

eBay Struggles with Cluttered, Unstructured Data, Deploys Artificial Intelligence Strategy

May 24, 2016

The article on Forbes titled eBay’s Next Move: Artificial Intelligence To Refine Product Searches predicts a strong future for eBay as the company moves further into machine learning. For roughly six years eBay has been working with Expertmaker, a Swedish AI and analytics company. Forbes believes that eBay may have recently purchased Expertmaker. The article explains the logic behind the move,

“One of the key turnaround goals of eBay is to encourage sellers to define their products using structured data, making it easier for the marketplace to show relevant search results to buyers. The acquisition of Expertmaker should help the company in this initiative, given its expertise in artificial intelligence, machine learning and big data.”

The acquisition of Expertmaker should allow for a more comprehensive integration of eBay’s “noisy data.” Expertmaker’s AI strategy is based in genetics research, and has made great strides in extracting concealed value from data. For eBay, a company with hundreds of millions of listings clogging up the platform, Expertmaker’s approach might be the ticket to achieving a more streamlined, categorized search. If we take anything away from this, it is that eBay search currently does not work very well. At any rate, they are taking steps to improve their platform.

 
Chelsea Kerwin, May 24, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Search Sink Hole Identified and Allegedly Paved and Converted to a Data Convenience Store

May 20, 2016

I try to avoid reading more than one write up a day about alleged revolutions in content processing and information analytics. My addled goose brain cannot cope with the endlessly recycled algorithms dressed up in Project Runway finery.

I read “Ryft: Bringing High Performance Analytics to Every Enterprise,” and I was pleased to see a couple of statements which resonated with my dim view of information access systems. There is an accompanying video in the write up. I, as you may know, gentle reader, am not into video. I prefer reading, which is the old fashioned way to suck up useful factoids.

Here’s the first passage I highlighted:

Any search tool can match an exact query to structured data—but only after all of the data is indexed. What happens when there are variations? What if the data is unstructured and there’s no time for indexing? [Emphasis added]

The answer to the question is increasing costs for sales and marketing. The early warnings for amped-up baloney are the presentations given at conferences and the pitches pumped out via public relations firms. (No, Buffy, no, Trent, I am not interested in speaking with the visionary CEO who hired you.)

I also highlighted:

With the power to complete fuzzy search 600X faster at scale, Ryft has opened up tremendous new possibilities for data-driven advances in every industry.

I circled the 600X. Gentle reader, I struggle to comprehend a 600X increase in content processing. Dear Mother Google has invested to create a new chip to get around the limitations of our friend Von Neumann’s approach to executing instructions. I am not sure Mother Google has this nailed because Mother Google, like IBM, announces innovations without too much real world demonstration of the nifty “new” things.

I noted this statement too:

For the first time, you can conduct the most accurate fuzzy search and matching at the same speed as exact search without spending days or weeks indexing data.
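For what it is worth, here is a bare-bones, hypothetical sketch of what index-free fuzzy matching involves: a brute-force scan that scores every record with an edit distance. The names and the query are invented. The point is that every query touches every record, which is precisely the work an index normally avoids and the work Ryft claims to have accelerated.

```python
# Minimal sketch of index-free fuzzy matching: scan everything, score everything.
# The records and query are invented; real collections are orders of magnitude larger.

def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

records = ["Jon Smith", "John Smyth", "Joan Smithe", "Ivan Schmidt", "John Smith"]
query = "John Smith"

# Exact match against an index is a hash lookup; fuzzy match without an index
# is a full scan, so cost grows linearly with the size of the collection.
hits = [(r, edit_distance(query.lower(), r.lower())) for r in records]
for record, distance in sorted(hits, key=lambda pair: pair[1]):
    if distance <= 2:
        print(f"{record!r} matches within edit distance {distance}")
```

Hardware acceleration can shrink the cost of each comparison, but the comparisons still have to happen.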

Okay, this strikes me as a capability I would embrace if I could get over or around my skepticism. I was able to take a look at the “solution” which delivers the astounding performance and information access capability. Here’s an image from Ryft’s engineering professionals:

[Image: Ryft solution diagram]

Notice that we have Spark and pre-built components. I assume there are myriad other innovations at work.

The hitch in the get-along is that, in order to deal with certain real-world information processing challenges, the inputs come from disparate systems, each generating substantial data flows in real time.

Here’s an example of a real world information access and understanding challenge, which, as far as I know, has not been solved in a cost effective, reliable, or usable manner.


Image source: Plugfest 2016 Unclassified.

This unclassified illustration makes clear that the little things in the sky pump out lots of data into operational theaters. Each stream of data must be normalized and then converted to actionable intelligence.
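As a thought experiment, here is a tiny, hypothetical sketch of the normalization step alone: two invented sensor feeds arrive in different shapes and units and must be coerced into one record format before any analytics can run. The feeds, field names, and units are assumptions for illustration, not anything taken from the Plugfest material.

```python
# Hypothetical sketch of normalizing two dissimilar sensor feeds into one schema.
# Feed formats, field names, and units are invented for illustration.
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Track:
    source: str
    timestamp: datetime     # always UTC
    lat: float
    lon: float
    altitude_m: float       # always meters

def from_feed_a(msg: dict) -> Track:
    """Feed A reports epoch seconds and altitude in feet."""
    return Track(
        source="feed_a",
        timestamp=datetime.fromtimestamp(msg["epoch"], tz=timezone.utc),
        lat=msg["lat"],
        lon=msg["lon"],
        altitude_m=msg["alt_ft"] * 0.3048,
    )

def from_feed_b(msg: dict) -> Track:
    """Feed B reports ISO-8601 timestamps and altitude in meters."""
    return Track(
        source="feed_b",
        timestamp=datetime.fromisoformat(msg["time"]),
        lat=msg["position"]["latitude"],
        lon=msg["position"]["longitude"],
        altitude_m=msg["altitude"],
    )

tracks = [
    from_feed_a({"epoch": 1463700000, "lat": 38.9, "lon": -77.0, "alt_ft": 12000}),
    from_feed_b({"time": "2016-05-20T00:00:30+00:00",
                 "position": {"latitude": 38.91, "longitude": -77.02},
                 "altitude": 3700.0}),
]
for t in tracks:
    print(t)
```

Every additional feed needs its own adapter, and every adapter adds latency and another place for errors to creep in.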

The assertion about 600X sounds tempting, but my hunch is that the latency in normalizing, transferring, and processing will not meet the need for real time, actionable, accurate outputs when someone is shooting at a person with a hardened laptop in a threat environment.

In short, perhaps the spark will ignite a fire of performance. But I have my doubts. Hey, that’s why I spend my time in rural Kentucky where reasonable people shoot squirrels with high power surplus military equipment.

Stephen E Arnold, May 20, 2016

The Kardashians Rank Higher Than Yahoo

May 20, 2016

I avoid the Kardashians and other fame chasers, because I have better things to do with my time. I never figured that I would actually write about the Kardashians, but the phrase “never say never” comes into play. Vanity Fair’s “Marissa Mayer Vs. ‘Kim Kardashian’s Ass’: What Sunk Yahoo’s Media Ambitions?” tells a bleak story about the current happenings at Yahoo.

Yahoo has ended many of its services and let go fifteen percent of its staff, and there are very few journalists left on the team. The remaining journalists are not worried about producing golden content; they are worried about competing with everything already on the Web, especially “Kim Kardashian’s ass,” as they say.

When Marissa Mayer took over as Yahoo’s CEO in 2012, she was determined to carve out Yahoo’s identity as a tech company. Mayer, however, also wanted Yahoo to be a media powerhouse, so she hired many well-known journalists to run niche projects in popular areas from finance to beauty to politics. It was not a successful move, and now Yahoo is tightening its belt one more time. The Yahoo news algorithm did not mesh with the big-name journalists; the hope was that their names would soar above popular content such as Kim Kardashian’s ass. They did not.

Much of Yahoo’s current value comes from its Alibaba holdings. The result is:

“But the irony is that Mayer, a self-professed geek from Silicon Valley, threw so much of her reputation behind high-profile media figures and went with her gut, just like a 1980s magazine editor—when even magazine editors, including those who don’t profess to “get” technology, have long abandoned that practice themselves, in favor of what the geeks in Silicon Valley are doing.”

Mayer was trying to create a premier media company, but lower-quality content is more popular than top-of-the-line journalists. The masses prefer junk food in their news.

 

Whitney Grace, May 20, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Big Data and Value

May 19, 2016

I read “The Real Lesson for Data Science That is Demonstrated by Palantir’s Struggles · Simply Statistics.” I love write ups that plunk the word statistics near simple.

Here’s the passage I highlighted in money green:

… What is the value of data analysis?, and secondarily, how do you communicate that value?

I want to step away from the Palantir Technologies’ example and consider a broader spectrum of outfits tossing around the jargon “big data,” “analytics,” and synonyms for smart software. One doesn’t communicate value. One finds a person who needs a solution and crafts the message to close the deal.

When a company and its perceived technology catch the attention of allegedly informed buyers, a bandwagon effect kicks in. Talk inside an organization leads to mentions in internal meetings. The vendor whose products and services are the subject of these comments begins to hint at bigger and better things at conferences. Then a real journalist may catch a scent of “something happening” and write an article. Technical talks at niche conferences generate wonky articles, usually without dates or footnotes, which make sense to someone without access to commercial databases. If a social media breeze whips up the smoldering interest, then a fire breaks out.

A start-up has to be quite clever, lucky, or tactically gifted to pull off this type of wildfire. But when it happens, big money chases the outfit. Once money flows, the company and its products and services become real.

The problem with companies processing a range of data is that there are some friction-inducing processes that are tough to coat with Teflon. These include:

  1. Taking different types of data, normalizing them, indexing them in a meaningful manner, and creating metadata which is accurate and timely.
  2. Converting numerical recipes, many with built-in threshold settings and chains of calculations, into marching-band order able to produce recognizable outputs (a sketch of this step follows the list).
  3. Figuring out how to provide an infrastructure that can sort of keep pace with the flows of new data and the updates and corrections to the already processed data.
  4. Generating outputs that people in a hurry or in a hot zone can use to positive effect; for example, in a war zone, not getting killed when the visualization is not spot on.
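Here is a deliberately small, hypothetical sketch of what the second item looks like in practice: a chain of scoring steps, each with its own built-in threshold, that must march in order before anything recognizable comes out the other end. The weights and cut-offs are invented.

```python
# Hypothetical sketch of a chain of numerical recipes with built-in thresholds.
# The scores, weights, and cut-offs are invented for illustration.

def geo_score(distance_km: float, threshold_km: float = 50.0) -> float:
    """Closer to a watch point scores higher; beyond the threshold scores zero."""
    return max(0.0, 1.0 - distance_km / threshold_km)

def recency_score(age_hours: float, threshold_hours: float = 24.0) -> float:
    """Fresher observations score higher; stale ones score zero."""
    return max(0.0, 1.0 - age_hours / threshold_hours)

def combined_alert(distance_km: float, age_hours: float,
                   alert_threshold: float = 0.6) -> str:
    # Each recipe feeds the next in a fixed order; change one threshold and
    # the downstream output can flip without any change in the input data.
    score = 0.7 * geo_score(distance_km) + 0.3 * recency_score(age_hours)
    return "ALERT" if score >= alert_threshold else "routine"

print(combined_alert(distance_km=10.0, age_hours=2.0))   # ALERT
print(combined_alert(distance_km=45.0, age_hours=20.0))  # routine
```

Multiply this toy by dozens of recipes and data sources, and the tuning burden becomes clear.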

The write up focuses on a single company and its alleged problems. That’s okay, but it understates the problem. Most content processing companies run out of revenue steam. The reason is that the licensees or customers want the systems to work better, faster, and more cheaply than predecessor or incumbent systems.

The vast majority of search and content processing systems are flawed, expensive to set up and maintain, and really difficult to use in a way that produces high reliability outputs over time. I would suggest that the problem bedevils a number of companies.

Some of those struggling with these issues are big names. Others are much smaller firms. What’s interesting to me is that the trajectory content processing companies follow is a well worn path. One can read about Autonomy, Convera, Endeca, Fast Search & Transfer, Verity, and dozens of other outfits and discern what’s going to happen. Here’s a summary for those who don’t want to work through the case studies on my Xenky intel site:

Stage 1: Early struggles and wild and crazy efforts to get big name clients

Stage 2: Making promises that are difficult to implement but which are essential to capture customers looking actively for a silver bullet

Stage 3: Frantic building and deployment accompanied with heroic exertions to keep the customers happy

Stage 4: Closing as many deals as possible either for additional financing or for licensing/consulting deals

Stage 5: The early customers start grousing and the momentum slows

Stage 6: Sell off the company or shut down like Delphes, Entopia, Siderean Software and dozens of others.

The problem is not technology, math, or Big Data. The force which undermines these types of outfits is the difficulty of making sense out of words and numbers. In my experience, the task is a very difficult one for humans and for software. Humans want to golf, cruise Facebook, emulate Amazon Echo, or like water find the path of least resistance.

Making sense out of information when someone is lobbing mortars at one is a problem which technology can only solve in a haphazard manner. Hope springs eternal and managers are known to buy or license a solution in the hopes that my view of the content processing world is dead wrong.

So far I am on the beam. Content processing requires time, humans, and a range of flawed tools which must be used by a person with old fashioned human thought processes and procedures.

Value is in the eye of the beholder, not in zeros and ones.

Stephen E Arnold, May 19, 2016

Signs of Life from Funnelback

May 19, 2016

Funnelback has been silent as of late, according to our research, but the search company has emerged from the tomb with eyes wide open and a heartbeat. The Funnelback blog has shared some new updates with us. The first bit of news, “Searchless In Seattle? (AKA We’ve Just Opened A New Office!)”, explains that Funnelback has opened a new office in Seattle, Washington. The search company already has offices in Poland, the United Kingdom, and New Zealand, but now it wants to establish a branch in the United States. Given its successful track record with the finance, higher education, and government sectors in those countries, it stands a chance of offering more competition in the US. Seattle also has a reputable technology scene, and Funnelback will not have to deal with the Silicon Valley crowd.

The second piece of Funnelback news deals with “Driving Channel Shift With Site Search.” Channel shift is the process of finding the most efficient and cost-effective way to deliver information access to users. It can be difficult to implement a channel shift, but increasing the effectiveness of a Web site’s search can have a huge impact.

Being able to quickly and effectively locate information on a Web site not only saves visitors time; it can also drive sales, build reputation, and more.

“You can go further still, using your search solution to provide targeted experiences; outputting results on maps, searching by postcode, allowing for short-listing and comparison baskets and even dynamically serving content related to what you know of a visitor, up-weighting content that is most relevant to them based on their browsing history or registered account.

Couple any of the features above with some intelligent search analytics, that highlight the content your users are finding and importantly what they aren’t finding (allowing you to make the relevant connections through promoted results, metadata tweaking or synonyms), and your online experience is starting to become a lot more appealing to users than that queue on hold at your call centre.”
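As a rough illustration of the “what they aren’t finding” point, here is a hypothetical sketch of the kind of query log analysis a site search team might run: count the queries that returned nothing and surface the worst offenders as candidates for synonyms or promoted results. The log format and queries are invented.

```python
# Hypothetical sketch of mining a site-search log for zero-result queries,
# the raw material for synonym lists and promoted results. Log format is invented.
from collections import Counter

# Each entry: (query string, number of results returned)
search_log = [
    ("opening hours", 12), ("opening times", 0), ("refund", 7),
    ("money back", 0), ("opening times", 0), ("contact", 25),
    ("money back", 0), ("returns policy", 4),
]

zero_result = Counter(q for q, hits in search_log if hits == 0)

print("Top zero-result queries (candidates for synonyms or promoted results):")
for query, count in zero_result.most_common(5):
    print(f"  {count:>3}x  {query!r}")
```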

I have written about it many times: a decent Web site search function can make or break a site. A poor one not only makes the Web site look unprofessional, it also undermines confidence in the business. Skimping on search is a very big rookie mistake.

 

Whitney Grace, May 19, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

IBM Uses Watson Analytics Freebie Academic Program to Lure in Student Data Scientists

May 6, 2016

The article on eWeek titled IBM Expands Watson Analytics Program, Creates Citizen Data Scientists zooms in on the expansion of the IBM Watson Analytics academic program, which began last year at 400 global universities. The next phase, according to Watson Analytics public sector manager Randy Messina, is to get Watson Analytics into the hands of students beyond computer science or other technical courses. The article explains,

“Other examples of universities using Watson Analytics include the University of Connecticut, which is incorporating Watson Analytics into several of its MBA courses. Northwestern University is building Watson Analytics into the curriculum of its Predictive Analytics, Marketing Mix Models and Entertainment Marketing classes. And at the University of Memphis Fogelman College of Business and Economics, undergraduate students are using Watson Analytics as part of their initial introduction to business analytics.”

Urban planning, marketing, and health care disciplines have also ushered in Watson Analytics for classroom use. Great, so students and professors get to use and learn through this advanced and intuitive platform. But that is where it gets a little shady. IBM is also interested in winning over these students and leading them into the data analytics field. Nothing wrong with that given the shortage of data scientists, but considering the free program and the creepy language IBM uses like “capturing mindshare among young people,” one gets the urge to warn these students to run away from the strange Watson guy, or at least proceed with caution into his lair.

Chelsea Kerwin, May 6, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

 

Mouse Movements Are the New Fingerprints

May 6, 2016

A martial artist once told me that an individual’s fighting style, if defined enough, is like a set of fingerprints. The same can be said for painting style, book preferences, and even Netflix selections, but what about something as anonymous as a computer mouse’s movements? Here is a new scary thought from PC & Tech Authority: “Researcher Can Identify Tor Users By Their Mouse Movements.”

Juan Carlos Norte is a researcher in Barcelona, Spain, and he claims to have developed a series of fingerprinting methods using JavaScript that measure timing, mouse wheel movements, movement speed, CPU benchmarks, and getClientRects. Combining all of this data allowed Norte to identify Tor users based on how they used a computer mouse.

It seems far-fetched, especially when one considers how random this data is, but

“’Every user moves the mouse in a unique way,’ Norte told Vice’s Motherboard in an online chat. ‘If you can observe those movements in enough pages the user visits outside of Tor, you can create a unique fingerprint for that user,’ he said. Norte recommended users disable JavaScript to avoid being fingerprinted.  Security researcher Lukasz Olejnik told Motherboard he doubted Norte’s findings and said a threat actor would need much more information, such as acceleration, angle of curvature, curvature distance, and other data, to uniquely fingerprint a user.”
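To picture what Norte is describing, here is a toy, hypothetical sketch: reduce a stream of mouse events to a small feature vector and compare vectors from two browsing sessions. The event data, features, and threshold are invented, and a real fingerprinting attempt would need far richer measurements, which is exactly Olejnik’s objection.

```python
# Toy sketch of mouse-movement fingerprinting: turn raw events into a feature
# vector, then compare sessions. Events, features, and threshold are invented.
import math

def features(events):
    """events: list of (dt_ms, dx, dy) deltas between successive mouse samples."""
    speeds = [math.hypot(dx, dy) / dt for dt, dx, dy in events if dt > 0]
    mean_speed = sum(speeds) / len(speeds)
    variance = sum((s - mean_speed) ** 2 for s in speeds) / len(speeds)
    mean_interval = sum(dt for dt, _, _ in events) / len(events)
    return [mean_speed, variance, mean_interval]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

session_outside_tor = [(16, 4, 2), (16, 5, 3), (17, 6, 2), (15, 4, 1)]
session_inside_tor = [(16, 4, 3), (17, 5, 2), (16, 6, 2), (16, 4, 2)]

similarity = cosine_similarity(features(session_outside_tor), features(session_inside_tor))
print(f"Similarity between sessions: {similarity:.3f}")
print("Same user?", "maybe" if similarity > 0.95 else "probably not")
```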

This is the age of big data, but looking at Norte’s claim from a logical standpoint, one needs to consider that not all computer mice are made the same: some use lasers, others are trackballs, and what about a laptop’s track pad? As diverse as computer users are, there are similarities within the population, and random mouse movement is not individualistic enough to ID a person. Fear not, Tor users; move and click away in peace.

 

Whitney Grace, May 6, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
