Information Does Leak
December 10, 2010
In spite of repeated attempts in the last few days to shut Wikileaks down, the bane of the establishment’s existence presses on. Forced to relocate its Web address to a Swiss site, wikileaks.ch, after its original host, Amazon.com, abandoned it, Wikileaks has found refuge in hundreds of mirror sites, other Web sites that duplicate its content and make removal of the information from the Internet nearly impossible.
At last count, Wikileaks was mirrored on 1368 sites.
As documented by several news services, including the BBC, founder Julian Assange was arrested by British police on Monday on sexual-assault charges. In both cases, the women acknowledge that sex with Assange was consensual, but they claim that at some point it became nonconsensual.
Meanwhile, in the United States, the anti-Wikileaks rhetoric has amped itself up several notches. The Hill quoted former speaker of the House Newt Gingrich as saying, “Information warfare is warfare, and Julian Assange is engaged in warfare. Information terrorism, which leads to people getting killed, is terrorism, and Julian Assange is engaged in terrorism. He should be treated as an enemy combatant.”
As CNN reported, one “statesman” even went so far as to call for his execution.
The standard buzz-phrase defenses, such as “threat to national security” and “placing our troops in harm’s way,” are also being trotted out, as aptly demonstrated by Sen. Dianne Feinstein’s Wall Street Journal op-ed. The keen-minded among us should note that these are the same buzz-phrase defenses that have been used to keep the United States mired in at least two wars for the last decade.
Is there evidence that Assange is endangering the United States? Some pundits see only powerbrokers being embarrassed and exposed. How is that a bad thing for the powerless? We understand how it is a bad thing for those in power, however.
After all, as Time Magazine reported, it was the United States government that trumpeted the interagency sharing of intelligence post-9/11. Naturally, it is the United States government that is appalled when such a wide distribution of “sensitive” information finds its way into the datasphere.
Whatever you may think of the interesting Julian Assange, some see him as doing a service to Americans and to the world by leaking this information. The less we know about our government’s activities, the stronger they become. The more we know, the stronger we become.
No matter how many attempts are made to silence Wikileaks and its supporters or to censor the Web, Assange and those like him are creatures of the Net. It is pervasive in its scope. Thus, Wikileaks or a comparable service will persist.
Can a government bureaucracy control information that leaks to the Internet? We don’t know.
Pete Fernbaugh, December 10, 2010
Freebie
Google: Buy Time
December 10, 2010
I marvel at how giant news outfits get scoops. I mean day after day I read articles from the New York Times that are revolutionary. Revolutionary I say. Here’s an example: “Google Executive: No Time to Build, So We Buy.”
The article explains that Google is in a hurry. Google has lots of employees but it can’t form teams quickly enough to respond to opportunities. Despite having lots of Math Club guys and gals, Google can’t pinpoint the exact person needed to hop on a bandwagon and play the tuba.
Goodness. Who would have guessed? Google has watched Amazon implement most of the cloud ideas that Google spelled out in its voluminous patent applications in the last 10 years. Google watched Facebook catch fire, Microsoft buy a piece of the company, and Facebook explain that it wants to become the Web. Already some Facebook users stay within Facebook. No bigger Web wanted. Period.
Here’s what the New York Times did not ask Susan Wojcicki, a Googler as close to the throbbing hearts of the Google triumvirate as anyone in Mountain View:
- Why can’t you assemble a team?
- Why can’t you identify internal talent?
- Why can’t you create products that capitalize on trends?
- Why can’t Google deploy a local service that makes sense to advertisers?
Did the Gray Lady ask these questions? Nope. What we get is revolutionary information that warrants a Google T shirt.
Stephen E Arnold, December 10, 2010
Freebie
Oracle SES10g and Stellent
December 10, 2010
I have a stack of “read later” documents. I was waiting for a Skype call, and I read the October 2010 Oracle white paper “Searching Oracle Content Server (Stellent) with Oracle Secure Enterprise Search 10g.” The title caught my attention because SES is now at version 11g, but the October 2010 white paper focuses on Oracle SES10g. Interesting.
If you are a Stellent customer, you may want to check out the white paper. If you compete with Oracle, the white paper is a must read. Several points jumped out when I worked my way through the 25 pages of text and diagrams.
First, the good stuff begins on page 10 of the white paper. Here’s the passage that signals the work ahead for the never-to-be-fired Oracle database administrator:
“The first challenge is to retrieve the data and the metadata from the Content Server repository into SES.”
So much for seamless integration of the Stellent and the SES10g software.
Second, the method for mapping metadata does not specify who does the mapping. A bit of noodling leads me to the hypothesis that the “smart” software does mapping, then the lucky Oracle database administrator gets to dive in. In short, mapping fields and document attributes is likely to be a bit of work.
Third, the security steps are no big surprise. What did catch my attention was the need to install an additional security component into Content Server and then take a snapshot of the repository. My hunch is that one wants to have sufficient system and storage resources to handle beefy repositories. Running out of space is likely to mandate going back to square one. Not a hint of graceful recovery or even a checklist of “Do This Before You Start” items.
Fourth, a number of components and functions have to be configured by the system administrator. Once again: hands on work.
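The mapping chore in the second point can be pictured with a few lines of Python. This is a rough sketch, not Oracle's method: the field names, the attribute names, and the `map_metadata` helper are all hypothetical, invented here to show why unmapped fields land on a human's desk.

```python
# Hypothetical mapping from repository metadata fields to search attributes.
# None of these names come from the Oracle white paper.
FIELD_MAP = {
    "doc_title": "Title",
    "doc_author": "Author",
    "release_date": "LastModifiedDate",
}


def map_metadata(record, field_map=FIELD_MAP):
    """Translate one document's metadata; report fields with no mapping."""
    mapped = {}
    unmapped = []
    for field, value in record.items():
        target = field_map.get(field)
        if target is None:
            # No rule exists, so someone has to decide what this field means.
            unmapped.append(field)
        else:
            mapped[target] = value
    return mapped, sorted(unmapped)


# Made-up sample record with one custom field nobody mapped.
SAMPLE_RECORD = {
    "doc_title": "Q3 Report",
    "doc_author": "jsmith",
    "custom_region": "EMEA",
}
```

Calling `map_metadata(SAMPLE_RECORD)` hands back the two translated attributes plus a list holding `custom_region`. Every entry in that second list is a decision the administrator has to make by hand.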
Oracle DBAs have job security by design, I opine.
Stephen E Arnold, December 10, 2010
Freebie
CopperEye: Speedy Stuff
December 10, 2010
I came across CopperEye several years ago. I was looking for a solution that would cope with large volumes of data, mainframe and client server hardware, and specific performance requirements. CopperEye met the specs. In London last week, I engaged in a conversation and learned that CopperEye was not widely known in the more traditional search and retrieval field. The purpose of this write up is to provide some basic information about the company. In a nutshell, the firm offers a system that can discover, parse and index data in a relational database or flat file output. The method can handle “big data”. (A video demo is available on YouTube.)
In 2007, In-Q-Tel, the investment arm of the Central Intelligence Agency, signed a deal for a strategic investment in CopperEye. In that 2007 announcement, an In-Q-Tel spokesperson said:
“We selected CopperEye because it offers superior technology in the area of the retention and retrieval of structured, historical data,” said Troy M. Pearsall, Executive Vice President of Technology Transfer at In-Q-Tel. “Given the volume of information gathered by organizations within the public and private sectors, it made perfect sense to invest in an innovative data access technology that will potentially meet the critical needs of the U.S. Intelligence Community. We look forward to working with CopperEye in the coming months and years.”
Based on the information in my Overflight system, CopperEye is privately held. Now about 10 years old, the company provides enterprise class archiving solutions, including compliance archiving. The firm’s search product is called CopperEye Search. The Greenwich product uses standard SQL to retrieve records from log files. The Secure Data Retrieval Server is an appliance that complies with data retention regulations. The CopperEye Indexing function is optimized for high speed.
The current version of Retrieval Server features improved compression. The compression introduces no latency while yielding more efficient storage and fewer disc accesses. The system has been engineered for high availability. When deployed as a distributed system, queries operate as though the data set were a single environment. One interesting feature is that the system can be configured to process queries using parallel, failover, or round robin methods.
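For readers who have not bumped into the three query methods, here is a minimal Python sketch of the general ideas. It is not CopperEye's code or API. The node functions are hypothetical stand-ins, and I assume a node signals failure by raising `ConnectionError`.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle


def run_parallel(nodes, query):
    """Send the query to every node at once and merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        # pool.map returns results in the same order as the nodes list.
        partials = list(pool.map(lambda node: node(query), nodes))
    return [row for partial in partials for row in partial]


def run_failover(nodes, query):
    """Try nodes in order; move to the next one only when a node fails."""
    last_error = None
    for node in nodes:
        try:
            return node(query)
        except ConnectionError as error:
            last_error = error
    raise RuntimeError("all nodes failed") from last_error


def make_round_robin(nodes):
    """Return a function that hands each new query to the next node in turn."""
    rotation = cycle(nodes)

    def run(query):
        return next(rotation)(query)

    return run


# Hypothetical stand-ins for retrieval nodes.
def node_a(query):
    return [f"a:{query}"]


def node_b(query):
    return [f"b:{query}"]


def broken_node(query):
    raise ConnectionError("node down")
```

Parallel spreads one query across all nodes to cut response time. Failover keeps a query alive when a node dies. Round robin spreads many queries across nodes to balance load.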
The CEO of the firm is Carmen Carey. The founders are Paul McCafferty (COO) and Duncan Pauly. You can get more information about the company at www.coppereye.com.
Stephen E Arnold, December 10, 2010
Freebie
Open Source Push Back or Around
December 9, 2010
I found interesting “The Dark Side of Open Source Conferences.” If you are anti open source, you may want to print out this article and send it to your friends. If you are pro open source, keep an eye on your colleagues. Methinks appropriate behavior is a plus. Now if you can’t make the link work, don’t honk at me. I had trouble tracking down the write up myself.
Here’s a taste of the article:
Deb Nicholson says the days of “eye candy” are far from over. She says of an event held within the last two years, “When strippers were hired to mix with people at the Saturday night event everyone attended, that made everyone uncomfortable.” Three of the ten women reported being physically assaulted at a conference. Mackenzie says, “I was grabbed from behind and kissed by a stranger without permission.” Later she found out that this person assaulted another woman at the conference. Noirin Shirley says after a man grabbed and kissed her at a conference after-party, she told him she wasn’t interested, and “He responded by jamming his hand into my underwear and fumbling.” At the Linux Storage and Filesystems Workshop 2007, I organized a group outing to a pub, only to have one of the invited workshop attendees grab my ass while I was having a completely normal conversation. (I told him to never touch me again, warned my friends about him, and refused to speak to him again.)
The gist is that some open source types make other open source types feel uncomfortable. I know how I feel when Federal authorities at airports paw me. I can imagine how those referenced in the article react.
Stephen E Arnold, December 9, 2010
Freebie
Real Time Conversation with a Mid Tier Wizard
December 9, 2010
I am not making this conversation up. I gave a talk to 43 twenty-somethings at Skinker’s, a delightful place near London Bridge tube stop. No, I did not buy a Skinker’s T shirt, but it did look smart. My topic was real time search. More accurately, I was explaining the engineering considerations in delivering low latency indexing and querying which most vendors and second string consultants happily tell you is “real time search”.
The most interesting part of my evening was a short conversation I had with a mid tier consultant, what I call an azure chip consultant or generally the azurini. To be a blue chip consultant is easy. Just get hired by one of the two, three, or four management consulting firms, do some notable work, and not die of a heart attack from the pressure. Thousands of Type A’s who crave constant stroking take a toll, believe you me. The mid tier lad introduced himself. He reminded me that I had met him before. In the dim light of Skinker’s I would not have been able to recognize Tess, my deaf white boxer. No matter. A big grin and warm handshake were what the azure chip lad thought would jog my memory.
The basic idea is that real time is not achievable. There are gating factors at three main points in any content processing system. The first is the green box, which is the catch all for the service providers, ISPs, and others in the network chain. The pink boxes represent the vendors providing services to the client who wants low latency service. The yellow boxes represent the different “friction points” behind the firewall or within the organization’s hybrid infrastructure. Resolving these points of “friction” boils down to brains and money. If an organization lacks either, the latency of the system will be high and increase over time. Users, of course, don’t know this. The problems latency produces range from financial losses to field operations personnel being killed due to stale intelligence.
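The arithmetic behind the gating factors is simple enough to sketch in Python. The three segment labels follow the three groups described above. The hop names and millisecond figures are made up for illustration and are not measurements of any real system.

```python
from dataclasses import dataclass


@dataclass
class Hop:
    name: str
    segment: str       # "network", "vendor", or "internal"
    latency_ms: float


def latency_by_segment(hops):
    """Add up the delay contributed by each of the three segments."""
    totals = {}
    for hop in hops:
        totals[hop.segment] = totals.get(hop.segment, 0.0) + hop.latency_ms
    return totals


def end_to_end_ms(hops):
    """Total delay between content creation and a user seeing it."""
    return sum(hop.latency_ms for hop in hops)


def worst_segment(hops):
    """Name the segment where brains and money should go first."""
    totals = latency_by_segment(hops)
    return max(totals, key=totals.get)


# Made-up figures, for illustration only.
SAMPLE_HOPS = [
    Hop("ISP transit", "network", 40.0),
    Hop("vendor feed", "vendor", 120.0),
    Hop("vendor indexing", "vendor", 300.0),
    Hop("firewall inspection", "internal", 15.0),
    Hop("internal index refresh", "internal", 500.0),
]
```

With these invented numbers the chain adds up to 975 milliseconds, and the segment behind the firewall is the biggest contributor. Delays accumulate, so a fast vendor cannot rescue a slow internal pipeline.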
It didn’t.
Anyway, three observations.
Oracle and Open Source Agitation
December 9, 2010
At a conference in London on December 3, 2010, I posed a question to a panel of content processing CEOs. I asked, “Isn’t open source going to put more pressure on firms with proprietary software and systems?” The answers were cautiously phrased. The advantage of open source is that it appears to be “free”. But proprietary software comes with the backing of a big company, not a rag tag crowd of volunteer developers.
I was a well behaved moderator, and I did not fire back with the advantages of open source software, nor did I take issue with the measured comments of the panelists. I had read “Oracle’s Open Source Missteps Continue with Hudson Project”. I knew that Oracle was wheeling more armored vehicles into its battle with open source software. If my panelists were not worried, those folks did not see the world exactly as Larry Ellison and his senior executives perceive it. Like I said, I was a good, chubby moderator. However, I will bet one Estonian kroon that Mark Hurd gets with the Oracle open source program.
What’s the latest weaponry look like? Here’s the write up’s take:
…an Oracle email notifying Hudson users and developers of this migration was not received as the sender was not subscribed to the mailing lists in question. As a result, Hudson developers have been locked out of the mailing lists and unable to access or update the source code for more than a week.
Is this shaped information? Is this an attempt to wrest control of the Hudson Project from the open source community?
I don’t know, but if open source is no big deal as the content processing execs told me, does Oracle know something other software company presidents don’t know?
Oracle seems to be taking increasingly contentious positions with regard to open source. My money is on Oracle’s perception of open source. After all, if MySQL, Java, and the Hudson Project were irrelevant, why waste the time and PR brownie points?
This open source software warrants continued observation. Perhaps the folks reassuring me were putting on their game faces, keeping the inner concerns under wraps?
Stephen E Arnold, December 9, 2010
Freebie
Online London Endnote
December 9, 2010
The weather made life exciting at the International Online Show in London this year. With weather crippling transport and causing power failures at my local Tesco, I was fearful that no one would turn up for the closing session of the three-day event, held November 30, December 1, and December 2, 2010.
I was wrong. More than 190 people braved the weather to hear comments from a stellar panel of search and content processing executives. On the panel were:
- Autonomy, Andrew Kanter, chief operating officer
- Funnelback, David Hawking, chief scientist
- SAS, Saratendu Sethi, director text mining and research
- Smartlogic, Jeremy Bentley, managing director.
The format of the endnote is a wide ranging discussion of likely trends in 2011. I am not able to summarize the 90 minutes of commentary by the distinguished endnote participants.
In this short post, I want to highlight the trends ArnoldIT.com’s research has identified as among the most interesting for 2011. The format of the endnote is a “debate”; that is, I summarize a trend and the endnote panelists explain their views on each trend. Judging from feedback after these “debates”, the audience is pleased with the information and the format.
London in the snow. Life as it was in the 1750s. Source: Francis Sedgemore
Here are the trends that were posed to the panelists:
Open Source
Like it or not, open source is going to be an issue in 2011. Among the disruptive aspects of community-supported, free software will be large companies’ efforts to make open source proprietary. In parallel, the open source adherents will work overtime to prevent big, commercial enterprises from choking off an alternative to lock-in, high prices, and vendor-determined upgrade paths. People can laugh off open source, but the Oracle Google lawsuit and the forking underway in important open source projects signal a pot that could boil over without warning. Mid tier consulting firms are busy blowing off open source search. And why not? The former English majors and Art History majors now masquerading as search experts make their money from commercial outfits who slurp down these firms’ thin porridge.
Search Is Marginalized
Look at the state of Microsoft SharePoint search. Look at the marketing messages from leading search vendors. Look at the number of low cost search solutions from companies like Sowsoft to dtSearch, among others. Lucene/Solr are free to download and are now in use at Cisco Systems, LinkedIn, and Twitter. This means that key word vendors are going to be pressured. And it means that search and content processing vendors are going to have to find a way to make sales in niche markets. I anticipate some mergers and failures in 2011. Big changes afoot.
Big Data
Big data are a problem. Most search and content processing systems cannot be tweaked to handle petascale data flows. As a result, new vendors are entering the market and are going to exert more pressure on incumbents. Companies that can morph into a search system range from Aster Data to Cloudera to Digital Reasoning. Yet none of these companies is at this show or on the radar of most procurement teams. In 2011, big data will cripple some incumbent search installations, hastening mergers or outright business failures.
Laundry (Dirty and Lists)
Objective search results are disappearing. Hit boosting is the name of the game, whether for political, administrative, or advertising purposes. The laundry list of search results just makes work for those looking for information. In 2011, the quest for user experience will further erode the user’s ability to get unbiased information. In 2011, analytic engines operating automatically will produce information. The problem with “training wheels” is that the consumers of information will be easily duped. Disinformation and what I call weaponized information like Wikileaks will become a major focus of search and content processing systems. Vendors will deliver tools; licensees will use them to shape the information experience.
Mobile Access
At most trade shows, the focus is on a style of computing that is going to lose ground in 2011. In some countries, mobile device access is the primary method of obtaining information. Traditional search and content processing vendors will be under severe pressure to shoehorn ageing systems into the new form factors. The results will be mixed. New providers will emerge, adding pressure to an already charged business environment.
You, like the panelists, don’t have to agree with these trends our research highlighted. That’s the hook for this type of endnote.
If you want this type of wrap up for one of your upcoming conferences or sales meetings, write us at seaky2000@yahoo.com. Fees do apply.
Stephen E Arnold, December 9, 2010
Freebie but the gig in London was sponsored
Arnold Guest on Land SDS Podcast
December 9, 2010
Stephen E Arnold, ArnoldIT.com, was a guest on Dr. Tyra Oldham’s Land SDS podcast. The interview covered the contribution of computer technology to “green” initiatives. In the interview, Arnold said, “Significant gains in energy efficiency can be obtained using real time micro adjustments. Computers and smart software can dynamically adjust systems to reduce energy costs and minimize ecological impact.”
You can listen to the complete interview at www.landsds.com
Ken Toth, December 9, 2010
Microsoft, the US Treasury, and Search
December 9, 2010
The new Microsoft-based Treasury.gov Web site works pretty well. Pictures flash, the links work, and the lay out is reasonably clear. There is the normal challenge of government jargon. So “Help, I am going to lose my home” becomes “Homeowner’s HOPE Hotline”.
I am interested in search and retrieval. I wanted to run through my preliminary impressions of the search interface, system responsiveness, and the relevance of the queries. I look at public facing search services differently from most people’s angle of attack. Spare me direct complaints via email. Just put your criticisms, cautions, and comments in the form provided at the foot of this Web page.
Search Interface
The basic search box is in the top right hand corner of the splash page. No problem, and when I navigate to other pages in the Web site, the search box stays put. However, when I click on some links I am whisked outside of the Treasury.gov site and the shift is problematic. No search box on some pages. Here’s an example: http://www.makinghomeaffordable.gov/index.html. Remember my example from the HOPE Hotline reference? Well, that query did not surface content gold on Treasury.gov. I went somewhere else, and I was confused. This probably is a problem peculiar to me, but I found it disconcerting.
Other Queries
I ran a query for “Treasury Hunt,” a service that allows me to determine if a former Arnold left money or “issues” for me. Here’s the result screen for the query “Treasury Hunt”:
The first hit in the result list points to this page:
The problem is that the hot link from this page points to this Web site, which I could not locate in the results list.
Several observations:
First, the response time for the system was sluggish, probably two seconds, which was longer than Google’s response time. No big deal, just saying “slower.”
Second, the results list did not return the expected hit. For most people, this makes zero difference. For me, I found the lack of matching hits to explicit links interesting. In fact, I assumed that the results list would have the TreasuryDirect hit at the top of the results list. Not wrong, just not what I expected.
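If you want to repeat this sort of spot check on your own site search, here is a minimal Python sketch. The `fake_search` function and the example.gov URLs are hypothetical placeholders. I did not use this code on Treasury.gov; swap in whatever call returns an ordered list of result URLs from your system.

```python
import time


def timed_search(search, query):
    """Run search(query) and return (results, elapsed_seconds)."""
    start = time.perf_counter()
    results = search(query)
    elapsed = time.perf_counter() - start
    return results, elapsed


def rank_of(results, expected_url):
    """Return the 1-based position of expected_url in results, or None."""
    for position, url in enumerate(results, start=1):
        if url == expected_url:
            return position
    return None


# Hypothetical stand-in for a real site search call.
def fake_search(query):
    return [
        "https://example.gov/page-one",
        "https://example.gov/page-two",
    ]
```

A `None` from `rank_of` corresponds to my second observation: the page I expected was not in the results list at all. The elapsed time gives a number to set beside a gut feeling of “sluggish.”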




