Leak or Plant: The New Ecology of Information

August 6, 2010

Where is the line between freedom of online speech and national security? One Web site is testing this border and creating quite a storm. The Washington Post recently ran a scathing editorial, “WikiLeaks Must be Stopped,” discussing the legality of WikiLeaks (www.wikileaks.org), which claims to have leaked over 70,000 classified documents. The article pulls no punches, beginning with: “Let’s be clear: WikiLeaks is not a news organization; it is a criminal enterprise.” The article basically calls the site terroristic, though it is not affiliated with terror organizations. The Post actually encourages the United States to use military force, if necessary, to close down the site. Now, there’s no question this is a worrying site, but the Internet is a place where voices can be heard. Maybe the government should work harder on preventing leaks instead of crushing Web sites.

Beyond Search has some different thoughts. First, much of the information is recycled from open sources. Convenient. Second, is the information disinformation? The “value” of the content may not be the information itself but the notional impact of having these data floating around. Who loses? Who wins? Is this a new form of publishing?

Pat Roland, August 6, 2010

Content Risks and Rewards

July 28, 2010

My field is open source intelligence. I can’t reveal my sources, but I have heard that an intelligence unit can duplicate anywhere from 80 to 90 percent of its classified information from open sources. The trick of course is to know what is important. Most people can look at an open source document, dismiss it, and go about their day unaware of the key item of information that was right in front of them.

For that reason, this blog and my other blogs are open source. I use my Overflight system to suck in publicly accessible content. I look at what the system spits out and I highlight the important stuff. The magic in the system is not the software nor the writers whom I pay to create most of the content in Beyond Search and my other writings. I am sufficiently confident in my method that when I talk with a so called expert or an executive from a company, I am skeptical about what that person asserts. In most cases, experts lack the ability to put their information in context. Without context, even good information is useless.

When I read about Wikileaks publishing allegedly classified information, I wondered about the approach. Point your browser at “Next Step for Wikileaks: Crowdsourcing Classified Data” and learn what is ahead for information dissemination. The idea is that lots of people will contribute secrets.

Baloney.

The more stuff that is described as secret and sensitive, the more difficult it will be to figure out what is on the money and what is not. I have some nifty software, but I know from my tests that when information is weaponized, neither humans nor software can pinpoint where the train went off the tracks.

In my view, folks publishing allegedly classified information are looking for some rough sledding. Furthermore, the more baloney that gets pumped into the system, the greater the likelihood for disinformation.

If these documents had become known to me, I would have kept the puppies to myself. I would have used my Overflight system to verify points that my method identified as important. I would not accept any assertion, fact, or argument as valid until some more work was done.

Wikileaks is now famous, and sometimes fame can be tough. Just ask John Belushi if you can find him. People ask me why I don’t provide some color for some of my remarks. Well, that is because some information is not appropriate for a free blog. This is a lesson that I think some folks are going to learn in the School of Hard Knocks.

Stephen E Arnold, July 29, 2010

Freebie and open source

Summer Search Rumor Round Up

July 26, 2010

The addled goose has been preoccupied with some new projects. In the course of running around and honking, he has heard some rumors. The goose wants to be clear. He is not sure if these rumors are 100 percent rock solid. He does want to capture them before the mushy information slips away:

Image source: http://oneyearbibleimages.com/rumors.gif

First, the goose heard that there will be some turnover at Microsoft Fast. The author of some of the posts in the Microsoft Enterprise Search Blog may be leaving for greener pastures. You can check out the blog at this link. What does this tell the goose? More flip flopping at Microsoft? Not sure. Any outfit that pays $1.2 billion for software that comes with its own police investigation is probably an outfit that would scare the addled goose to death. The blog is updated irregularly with such write ups as “Crawling Case Sensitive Repositories Using SharePoint Server 2010” and “SharePoint 2010 Search ‘Dogfood’ Part 3 – Query Performance Optimization.” Ah, the new problem of upper and lower case and the ever present dog food regarding performance. I thought Windows’ most recent software ran as fast as a jack rabbit. Guess not.

Second, a number of traditional search vendors are poking around for semantic technology. The notion that key words don’t work particularly well seems to be gaining traction. The problem is that some of the high profile outfits have been snapped up. For example, Powerset fell into the Microsoft maw and Radar Networks was gobbled by Paul Allen’s love child, Evri. Now the stampede is on. The problem is that the pickings seem to be slim, a bit like the t shirts after a sale at the Wal-Mart up the road from the goose pond here in Harrod’s Creek. For some lucky semantic startups, Christmas could come early this year. Anyone hear a sound like “hack, hack”? Oh, that must be short for Hakia. You never know.

Third, performance may have forced a change at HMV.co.uk in merrie olde England. Dieselpoint was the incumbent. I heard that Dieselpoint is on the look out for partners and investors. The addled goose tried to interview the founder of the company but a clever PR person sidelined the goose and shunted him to the drainage ditch that runs through Blue Island, Illinois. Will Dieselpoint land the big bucks as Palantir did?

Fourth, the goose heard that a trio of Microsoft certified partners with snap in SharePoint search components were looking for greener pastures. What seems to be happening is that the easy sales have dried up since Microsoft started its current round of partner cheerleading. The words are there, but the sales are not. Microsoft seems to want the money to flow to itself and not its partners. Who is affected? The goose cannot name names without invoking the wrath of Redmond and a pride of PR people who insist that their clients are knocking the socks off the competition. However, does the enterprise need a half dozen companies pitching metatagging to SharePoint licensees? I think not. If sales don’t pick up, the search engine death watch list will pick up a few new entries before the leaves fall. Vendors in the US, Denmark, Germany, Austria, and Canada are likely to be watching Beyond Search’s death watch list. Remember Convera? It spawned Search Technologies. Remember the pre Microsoft Fast? It spawned Comperio. When a search engine goes away, the azurini flower.

Fifth, what’s happened to the Oracle killers? I lost track of Speed of Mind years ago. There was a start up with a whiz bang method of indexing databases. I haven’t heard much about killing Oracle lately. In fact, stodgy old Oracle is once again poking around for search and content processing technology according to one highly unreliable source. With SES11g now available to Oracle database administrators, perhaps the time is right to put some wood behind a 21st century search solution.

If you want to complain about one of these rumors, use the comments section of this blog. Alternatively, contact one of the azurini outfits and get “real” verification. Some of their consultants use this blog as training material for the consultants whom you compensate. No rumor this. Fact.

Stephen E Arnold, July 26, 2010

Freebie

Endeca and Agile Business Intelligence

July 26, 2010

If you have not read the interview about Fetch Technologies, you might want to take a look. Fetch is a company that sucks in content and makes it available for analysis. Among its features is an innovative programming method. The idea is that the old style business intelligence approach is too slow for today’s business and operational environment. Fetch is an information platform, and it has a number of advocates. Also, in the same sector are equally accomplished outfits such as Kapow Tech and JackBe, among others. Vendors like Exalead have made significant headway in business intelligence, challenging some of the old line outfits to up their game. IBM bought SPSS but I am waiting for significant innovation. SAS acquired Teragram and Memex, so I expect big things from these firms. Autonomy has a Hummer filled with business intelligence clients, and that firm continues to chew into the old line firms cut off from the fast moving client herd. In short, business intelligence is a big deal.

Endeca has been in the business intelligence business for many years. I did a report that pegged the date in the 2002 to 2003 period, maybe even earlier. I was, therefore, not surprised by the information revealed in “New Study Details Top Questions Effective CIOs Must Ask to Determine Agile BI Readiness.” With the stampede to business intelligence, it is obvious even to a first year business school student at an academic backwater like the one I attended that something is causing the corporate antelope to take off.

The cause of the shift, in my experience, boils down to four factors:

First, traditional business intelligence is complicated and requires dedicated headcount to get up and running and create the reports managers require.

Second, the managers are usually clueless about what constitutes “good data.” In fact, with lousy data, the business intelligence systems produce outputs that may mislead the clueless MBA from my alma mater. The reports often baffle me, but I am an addled goose and don’t really have corporate grade bloodlines.

Third, the time windows in which decisions must be taken continue to get smaller. Whether real or induced by iPhone attention deficit disorder is irrelevant. Crunching data from the dinosaur systems takes too long. Most of the azure chip consultants seem okay with the idea that systems their firms recommend make guessing a standard business practice.

Fourth, disparate data are very expensive to normalize. That’s why outfits like Fetch and Palantir are doing pretty well. Palantir, as you may know, is now valued at $1.0 billion and sucks in disparate data, outputs reports, and pretty much leapfrogs the more traditional outfits.

What did the Endeca study reveal?

Here are the points that jumped out at me:

  1. Analysts have to create reports for more than 70 percent of those in the survey sample.
  2. Time is short, and deadlines vary.
  3. About half of those in the sample found business intelligence systems too hard.

No surprises for me.

What interested me is that a company with a strong foundation in eCommerce is pushing business intelligence. My view is that Endeca, like other vendors in search and content processing, wants to generate more revenues from its technology, content connectors, and partnerships with value added resellers.

The challenge for Endeca will be to deal with the inertia within many large companies. Endeca is not alone in chasing business intelligence. In fact, Endeca has been pushing business intelligence to some degree for a number of years. So far the high performers have been companies with a combination of technology, content processing capabilities, and the ability to solve specific business problems.

My hunch is that general purpose business intelligence systems are going to face a long slog uphill. The newer players have to contend with one another, price cuts, the economy, and the marketing challenge.

Perhaps a bespoke survey will do the trick? My view is that Endeca like other search vendors is looking for a way to generate revenue. Like eDiscovery, the realities of the marketplace are going to make it tough for the many business intelligence vendors to find a pot of gold at the end of the BI rainbow.

Furthermore, I think that the business intelligence push is one more indication that “pure search” has been disrupted significantly. The open question: can some search vendors deliver actionable business intelligence, or just results lists with lipstick and mascara?

Stephen E Arnold, July 26, 2010

Freebie

Exclusive Interview: Mike Horowitz, Fetch Technologies

July 20, 2010

Savvy content processing vendors have found business opportunities where others did not. One example is Fetch Technologies, based in El Segundo, California. The company was founded by professors at the University of Southern California’s Information Sciences Institute. Since the firm’s doors opened in the late 1990s, Fetch has developed a solid clientele and a reputation for cracking some of the most challenging problems in information processing. You can read an in-depth explanation of the Fetch system in the Search Wizards Speak’s interview with Mike Horowitz.

The Fetch solution uses artificial intelligence and machine learning to intelligently navigate and extract specific data from user specified Web sites. Users create “Web agents” that accurately and precisely extract specific data from Web pages. Fetch agents are unique in that they can navigate through form fields on Web sites, allowing access to data in the Deep Web, which search engines generally miss.

You can learn more about the company and its capabilities in an exclusive interview with Mike Horowitz, Fetch’s chief product officer. Mr. Horowitz joined Fetch after a stint at Google.

In the lengthy discussion with Mr. Horowitz, he told me about the firm’s product line up:

Fetch currently offers Fetch Live Access as an enterprise software solution or as a fully hosted SaaS option. All of our clients have one thing in common, and that is their awareness of data opportunities on the Web. The Internet is a growing source of business-critical information, with data embedded in millions of different Web sites – product information and prices, people data, news, blogs, events, and more – being published each minute. Fetch technology allows organizations to access this dynamic data source by connecting directly to Web sites and extracting the precise data they need, turning Web sites into data sources.

The company’s systems and methods make use of proprietary numerical recipes. Licensees, however, can program the Fetch system using the firm’s innovative drag-and-drop programming tools. One of the interesting insights Mr. Horowitz gave me is that Fetch’s technology can be configured and deployed quickly. This agility is one reason why the firm has such a strong following in the business and military intelligence markets.

He said:

Fetch allows users to access the data they need for reports, mashups, competitive insight, whatever. The exponential growth of the Internet has produced a near-limitless set of raw and constantly changing data, on almost any subject, but the lack of consistent markup and data access has limited its availability and effectiveness. The rise of data APIs and the success of Google Maps has shown that there is an insatiable appetite for the recombination and usage of this data, but we are only at the early stages of this trend.

The interview provides useful insights into Fetch and includes Mr. Horowitz’s views about the major trends in information retrieval for the last half of 2010 and early 2011.

Now, go Fetch.

Stephen E Arnold, July 20, 2010

Freebie. I wanted money, but Mr. Horowitz provided exclusive screen shots for my lecture at the Special Libraries Association in June and then my briefings in Madrid for the Department of State. Sigh. No dough, but I learned a lot.

Amateur Sleuthing: Looking behind an Email Address?

July 12, 2010

Short honk: I am not endorsing the method disclosed in “How To: Find the Person Behind an Email Address.” You may find the techniques useful. Enjoy being Dick Tracy. Don’t forget your wrist radio.

Stephen E Arnold, July 12, 2010

Freebie

Clarabridge API Available

July 5, 2010

When we first learned about Clarabridge, our initial impression was that it was a system developed primarily for MicroStrategy customers wanting more beef in their business intelligence capabilities. Over the years, the company has diversified and expanded its market reach and capabilities. Now Clarabridge is aiming to improve its customer feedback searchability by adding an API. Yahoo! Finance recently reported this upgrade and its many benefits in an eye-opening article. Currently, Clarabridge provides sentiment and text analytic software for improving customer experience. By adding SOAP-based application program interfaces (APIs), Clarabridge will let customers better review feedback. The API allows users to submit a single document for processing and supports real-time extraction of language content and sentiment, as well as customized starter packs for various customer needs. These new API options are an exciting addition in the elusive world of customer satisfaction. If used properly, this software could one day replace former marketing tools like focus groups and customer surveys. Note: If the Yahoo! Finance link goes dead, you can get the information from www.clarabridge.com.

Jessica West Bratcher, July 5, 2010

Freebie

Humans Not Replaceable Yet

July 5, 2010

The secret to national security is in searches, or so a recent Federal Times article tries to convince us. Citing the botched Christmas Day terror attempt, it claims that Homeland Security is deluged with so much data that agents could never be expected to stop a suspect in time. “Without better information systems,” the article says, “the intelligence community will be hamstrung in its efforts to transform information into intelligence.” The answer, it claims, is semantic search that draws preliminary conclusions on its own. So much faith in smart and semantic search capabilities is exciting, but it overlooks the human element. High-powered search tools are great, but the technology still cannot surpass human instincts and knowledge, no matter how sensitive the equipment.

Jessica West Bratcher, July 5, 2010

Freebie

More Efficient Social Graph and Semantic Analysis

June 30, 2010

Short honk: My hunch is that the University of Maryland has come up with a nifty method to deal with some cumbersome and computationally intensive calculations. Navigate to “Scientists Develop World’s Fastest Program to Find Patterns in Social Networks” and read about fancy math and chopping big data into chunks. With the technique, figuring out patterns gets easier. I will resist a pun about cozying up to big data. Here’s the passage that caught my attention in the write up:

In a paper that has been accepted for presentation at the 2010 Advances in Social Network Analysis and Mining conference to be held in Denmark in August, Broecheler, Pugliese and Subrahmanian [University of Maryland wizards] leveraged a key insight – it is possible to split the social network into a set of almost independent, relatively small sub-networks, each of which is stored on a computer in a cloud computing cluster in such a way that the probability that a query pattern will need to access two nodes is kept as small as possible. Using knowledge of past queries and a complex set of calculations to compute these probabilities, their paper reports algorithms and experiments to answer social network subgraph pattern matching queries on real-world social network data with 778 million edges (which may denote relationships or connections between individuals) in less than one second. More recent results not contained in the paper are able to efficiently answer queries to social network databases containing over a billion edges.

Strikes me as important, particularly for outfits gunning their PT boats toward Fort Google.
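For readers who want to see why the split matters, here is a toy sketch in Python. It is not the Maryland team's algorithm, which the write up says relies on past queries and a complex set of probability calculations. The graph, the names, and the `cross_partition_fraction` helper are all made up for illustration. The sketch only measures how many edges straddle two machines under two different splits, which is a rough stand-in for how often a pattern query would have to reach across the cluster.

```python
def cross_partition_fraction(edges, assignment):
    """Return the fraction of edges whose endpoints sit on different machines."""
    if not edges:
        return 0.0
    crossing = sum(1 for u, v in edges if assignment[u] != assignment[v])
    return crossing / len(edges)


# Hypothetical toy graph: two tight friend groups joined by one bridge edge.
edges = [
    ("ann", "bob"), ("bob", "cat"), ("ann", "cat"),  # group 1
    ("dan", "eve"), ("eve", "fay"), ("dan", "fay"),  # group 2
    ("cat", "dan"),                                  # bridge between groups
]

# Split A: each friend group is stored whole on one machine.
by_group = {"ann": 0, "bob": 0, "cat": 0, "dan": 1, "eve": 1, "fay": 1}

# Split B: nodes alternate between machines with no regard for the edges.
alternating = {"ann": 0, "bob": 1, "cat": 0, "dan": 1, "eve": 0, "fay": 1}

group_fraction = cross_partition_fraction(edges, by_group)
alternating_fraction = cross_partition_fraction(edges, alternating)

print(f"group-aware split: {group_fraction:.2f} of edges cross machines")
print(f"alternating split: {alternating_fraction:.2f} of edges cross machines")
```

In this toy case the group-aware split leaves one of seven edges crossing machines, while the alternating split leaves five of seven. Fewer crossing edges means fewer queries that need two nodes, which is the property the quoted passage says the researchers try to keep as small as possible.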

Stephen E Arnold, June 30, 2010

Freebie

Cyber Warriors and Search

June 29, 2010

Booz, Allen – the outfit where I worked after my years at Halliburton NUS (Nuclear Utility Services) – has been booking business big time in Washington, DC. I have heard that Booz, Allen has been explaining the challenges of cyber warfare. Now this is not a new topic. A number of analysts have pointed out that systems connected to a public network can be compromised by a range of methods. I recall hearing a lecture by Winn Schwartau a number of years ago. Now the blue chip crowd has caught up with Mr. Schwartau, the author of Information Warfare, and some of his ideas which date from the late 1990s.

One azure chip consulting firm advocated slashing security budgets. I wrote about that odd approach at a time of risk in “Cut That Security Budget, Says Azure Chip Consultancy.” I know about marching out of step, but it is a good idea to be on the same parade ground.

I received an email from one of my two or three readers pointing me to the online defense magazine, Defense Update. The April story “Hackers, Terrorists or Cyber Warriors?” is an interesting one. The key idea is that “cyber warfare is here and now.” In that write up are some useful ideas and facts. For me, the key passage was:

Shai Blitzbau, technical director at Magelan information defense and intelligence services describes typical attacks simulated by his company, providing threat assessment audit for government, security and commercial organizations. In recent exercises Magelan performed a threat simulation, that targeted an essential national infrastructure network responsible for the production and distribution of a vital product, considered as basic necessity for the entire population. The simulation demonstrated how, after 96 hour preparation, the team could bring a network, producing and distributing critical goods to a standstill, and keep it idle for at least two weeks. The aggressor team that started with zero access to, or knowledge of the target, managed to study the target, write malicious code, penetrate the network and execute his attack in less than four days.

I wanted to point out that there are extremely fast, effective search systems that can index and make searchable content “sucked” out of a secure system. You can learn about the Gaviri pocket search technology at www.gaviri.com.

Search is one component in the warrior’s arsenal. Booz, Allen is right in forcing governmental entities to be aware of risks. Within the last 14 days, I have been in a facility. I had in my back pocket a small USB drive equipped with a “pocket search” technology. The screening did not flag this device. I did not realize I had the USB in my pocket until I emptied my pockets at the hotel after the meeting.

The blue chip crowd is correct in focusing attention on cyber warfare. Slashing security budgets is ill considered in my opinion.

Stephen E Arnold, June 29, 2010

Freebie
