Cell Phone Early Warning System

November 9, 2009

A happy quack to my colleague in the Near East for pointing me to “Cellphone Alert System Expected in 2 Yrs.” The point of the story is that Israel’s home front command “will be able to calculate the precise location of an impact zone, and alert residents in an affected neighborhood via their cellphones.” I also noted this passage:

Soffer [Israeli official] said that 90 percent of the civilian casualties sustained by Israel during the Second Lebanon War and Operation Cast Lead in Gaza involved people who were struck by projectiles while they were in open areas away from buildings.  Civilians who seek cover in designated safe zones during rocket attacks are not likely to be wounded or killed…

Interesting use of “push,” real-time mobile technology, in my opinion.

Stephen Arnold, November 9, 2009

I was at the Jewish Community Center last night but I had to pay to get in. I don’t think that counts as payment for this write up. To be safe, I will alert the Jefferson County Animal Control Office.

Paranoia Blossoms

October 14, 2009

The article “5 Ways You’re Secretly Being Monitored” is interesting and may reflect a growing concern among Internet users. The idea is that when you run a query or ride a bus, you may be monitored. The data feed into repositories where everyday activities get crunched, transformed, and analyzed. The article is a trifle heavy handed, which disturbs the addled goose. Worth reading, particularly if you live outside the US and plan a trip to popular destinations like the big cities, where you will use the Internet and ride mass transit.

Stephen Arnold, October 14, 2009

Google and User Tracking Again

October 6, 2009

I found “Google View Thru Tracking: Has Big Brother Been Watching” quite interesting. In my Google monographs I describe briefly some of Google’s mechanisms for capturing user behavior data. This article is one of the first to reference Google’s method for “knowing” what happens whether the user clicks or not. That statement is a simplification, but the idea is that Google’s systems and methods maintain an awareness of user sessions. Chrome is one step toward a richer data collection and management method, but that technology is not referenced in the article. If you are interested in this type of monitoring, read the Search Engine Watch story.
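To make the click-or-no-click idea concrete, here is a minimal sketch of view-through tracking in general: an impression is logged when an ad is merely displayed, and a later conversion is matched against logged impressions even if the user never clicked. This illustrates the general technique only; it is not Google’s implementation, and the names and the 30 day window here are my assumptions.

```python
# Minimal sketch of view-through tracking (illustrative; not Google's code).
# An impression is logged when an ad is rendered; a later conversion is
# attributed to campaigns the user saw, even if the user never clicked.
import time

IMPRESSIONS = {}                       # user_id -> list of (campaign_id, timestamp)
VIEW_THROUGH_WINDOW = 30 * 24 * 3600   # assumed 30 day lookback window

def log_impression(user_id, campaign_id):
    """Called by the tracking pixel when the ad is displayed (no click needed)."""
    IMPRESSIONS.setdefault(user_id, []).append((campaign_id, time.time()))

def attribute_conversion(user_id):
    """On conversion, return campaigns whose ads this user saw within the window."""
    now = time.time()
    return [cid for cid, ts in IMPRESSIONS.get(user_id, [])
            if now - ts <= VIEW_THROUGH_WINDOW]

log_impression("user-123", "campaign-42")    # ad shown; user never clicks
print(attribute_conversion("user-123"))      # ['campaign-42']
```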

Stephen Arnold, October 6, 2009, No dough

Oracle SES11 in Beta

October 3, 2009

Oracle has put some wood behind its Secure Enterprise Search product. The current version is SES10.1.8 G. You can download this system at this Oracle link. I learned from one of my two or three readers that Oracle has moved SES11 into beta mode. The product manager of the beta is Stefan Buchta. If you want to test the system, you can obtain his email address and more information at this Oracle link.

As I was getting up to speed, I noticed that Oracle had available a new white paper, dated January 2009. The addled goose was ashamed of himself. He missed this document in his routine scan of the Oracle Overflight reports.

After downloading the white paper “Secure Enterprise Search Version 10.1.8.4”, the addled goose noticed some interesting items; to wit:

First, the white paper reports “sub second query performance”. My question was, “What’s the index size, refresh and query load, and system infrastructure?” Throwing hardware at a problem is often a way to resolve performance issues, and my hunch is that SES10g can be a computational glutton.
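A claim of sub second performance means little without the workload behind it. Here is a minimal sketch of the sort of measurement I would want to see behind such a claim, assuming a generic HTTP search front end; the URL and query parameter are hypothetical, not Oracle’s actual interface.

```python
# Minimal query latency harness (illustrative; endpoint and parameter are hypothetical).
import statistics
import time
import urllib.parse
import urllib.request

SEARCH_URL = "http://ses-host:7777/search"   # hypothetical search front end

def time_query(term):
    """Issue one query and return wall-clock seconds to a full response."""
    url = SEARCH_URL + "?" + urllib.parse.urlencode({"q": term})
    start = time.perf_counter()
    urllib.request.urlopen(url).read()
    return time.perf_counter() - start

terms = ["oracle", "security", "connector"] * 50
latencies = sorted(time_query(t) for t in terms)
print("median: %.3fs  p95: %.3fs" % (
    statistics.median(latencies),
    latencies[int(len(latencies) * 0.95)]))
```

Median and 95th percentile tell you more than a single average, and the numbers only mean something alongside the index size and the concurrent query load.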

Second, among the enhancements listed was “security”. Hmm. Security has been a feature since version 9i, as I recall. I wonder how security has been improved. In the past, “full security” for search required licensing the Oracle security server. Perhaps that separate license is no longer required, but somehow I doubt that Oracle has bundled this component with the plain vanilla SES10g product.

Third, SES10g seems to use the word “repository” in concert with the phrase “Oracle 10g database”. My recollection is that the “database” can be prone to certain Oracle bottlenecks related to intensive disc reads. Performance, therefore, is easy to talk about but may be expensive to deliver. But since we have not tested this most recent build, maybe the bottlenecks have been resolved. I have heard that Oracle is a Google partner and that some of the applications folks at Oracle are using the Google Search Appliance, not SES10g. Maybe this is an aberration?

Fourth, the crawler can handle structured and unstructured data. I know that SES10g can deal with structured data. That is a core competency of Oracle. I am not 100 percent certain that the unstructured data challenge has been fully met. Customers want hybrid content systems, and the market is heating up. Autonomy’s SPE is a challenger because the Oracle solution may not be the home run that the white paper suggests. Autonomy is quite savvy when it comes to exploiting opportunities created by large players who don’t deliver fully on the marketing collateral’s assertions.

Fifth, connectors get more attention. The list of connectors on page 25 of the white paper seems to lag what’s offered by the Lucid Imagination open source search system and is even farther behind connectors available from Coveo, Exalead, and others in the search and content processing sector. Surprisingly, connectors for MicroStrategy (close to Clarabridge), Business Objects (SAP and Inxight), and Cognos (IBM) have been removed. Well, that’s one way to get Oracle shops to adopt Oracle’s in house and acquired business intelligence components.

The white paper concludes with a snapshot of the AT Kearney knowledge portal. EDS bought AT Kearney and then the partners of AT Kearney bought the firm from EDS in 2005. Since that time, AT Kearney has been chugging along. It ranks among the blue chip consulting firms and is still working to meet the lofty goals set forth by Andrew Thomas Kearney in 1929. I wonder if Oracle is an AT Kearney client. I will have to check.

The knowledge portal interface reminded me of the Clearwell Systems, Coveo, and Exalead interfaces by the way.

In short, the white paper struck me as a modest update to the previous Oracle white papers. I did not see a reference to the vertical vocabularies that were once available for Oracle content processing systems. The architecture did not strike me as significantly different. Performance gains probably come from using Intel’s multi-core processors and the larger memory space enabled with 64-bit support.

Take a look. I have no pricing data at this time.

Stephen Arnold, October 3, 2009

Security Poker: Google Calls Microsoft

September 26, 2009

Software and security are like one of those combinations from chemistry lab. Get calcium carbide and hydrochloric acid. Mix. Ignite. Interesting. With Google marginalizing Microsoft’s Internet Explorer, Microsoft responded with an assertion about security. Wow. Microsoft’s Internet Explorer, for me at least, has been one of the software applications that gives me headaches. My father gathers malware the way I gather news stories in my RSS reader.

Microsoft’s response to Google’s marginalization play is summarized in “Microsoft believes Google Chrome Frame lowers security of IE”. Google’s response is described in “Google Barks Back at Microsoft over Chrome Frame Security.”

I have to tell you that I think this is quite exciting. My knowledge about Microsoft’s security in its browsers and related software comes from Steve Gibson’s Security Now podcast. My recollection is that Mr. Gibson is quite conservative when it comes to security. For that reason, I have switched to Firefox. I don’t know if this is the optimal path for me, but I changed my father over to Firefox, and I had fewer nasties to kill when he used Firefox.

My hunch is that the war of words will escalate and quickly. Security is not Microsoft’s strong suit in my opinion. Google may continue to probe this decayed tooth.

Stephen Arnold, September 26, 2009

Scaling SharePoint Could Be Easy

September 24, 2009

Back in the wonderful city of Washington, DC, I participated in a news briefing at the National Press Club today (September 23, 2009). The video summary of the presentations will be online next week. During the post briefing discussion, the topic of scaling SharePoint came up. The person with whom I was speaking sent me a link when she returned to her office. I read “Plan for Software Boundaries (Office SharePoint Server)” and realized that this Microsoft Certified Professional was jumping through hoops created by careless system design. I don’t think the Google enterprise applications are perfect, but Google has eliminated the egregious engineering calisthenics that Microsoft SharePoint delivers as part of the standard software.

I can deal with procedures. What made me uncomfortable right off the bat was this segment in the TechNet document:

    • In most circumstances, to enhance the performance of Office SharePoint Server 2007, we discourage the use of content databases larger than 100 GB. If your design requires a database larger than 100 GB, follow the guidance below:
      • Use a single site collection for the data.
      • Use a differential backup solution, such as SQL Server 2005 or Microsoft System Center Data Protection Manager, rather than the built-in backup and recovery tools.
      • Test the server running SQL Server 2005 and the I/O subsystem before moving to a solution that depends on a 100 GB content database.
    • Whenever possible, we strongly advise that you split content from a site collection that is approaching 100 GB into a new site collection in a separate content database to avoid performance or manageability issues.

Why did I react strongly to these dot points? Easy. Most of the datasets with which we wrestle are big, orders of magnitude larger than 100 GB. Heck, this cheap netbook I am using to write this essay has a 120 GB solid state drive. My test corpus on my desktop computer weighs in at 500 GB. Creating 100 GB subsets is not hard, but in today’s petascale data environment, these chunks seem to reflect what I would call architectural limitations.
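To be fair, carving a corpus into sub-100 GB chunks is easy to script; the cost is operational, not computational. A minimal sketch, assuming files on disk and the 100 GB ceiling from the TechNet guidance (the greedy binning here is illustrative, not Microsoft’s procedure):

```python
# Minimal sketch: greedily bin files into chunks under a size ceiling.
# Illustrative only; the ceiling mirrors the 100 GB TechNet guidance.
import os

CEILING = 100 * 1024**3  # 100 GB, per the SharePoint guidance

def chunk_corpus(paths):
    """Group file paths into lists whose total size stays under CEILING."""
    chunks, current, current_size = [], [], 0
    for path in paths:
        size = os.path.getsize(path)
        if current and current_size + size > CEILING:
            chunks.append(current)          # close out this content database
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        chunks.append(current)
    return chunks
```

The script is the easy part. Provisioning, securing, and backing up a separate site collection for every chunk is where the real work lands on the administrator.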

As I worked my way through the write up, I found numerous references to hard limits. One example was this statement from a table:

Office SharePoint Server 2007 supports 50 million documents per index server. This could be divided up into multiple content indexes based on the number of SSPs associated with an index server.

I like the “could be.” That type of guidance is useful, but my question is, “Why not address the problem instead of giving me the old ‘could be’?” We have found limits in the Google Search Appliance, but the fix is pretty easy and does not require any “could be” engineering. Just license another GSA and the system has been scaled. No caveats.

I hope that the Fast ESP enterprise search system tackles engineering issues, not interface (what Microsoft calls user experience). In order to provide information access, the system has to be able to process the data the organization needs to index. Asking my team to work around what seem to be low ceilings is extra work for us. The search system needs to make it easy to deliver what the users require. This document makes clear that the burden of making SharePoint search scale falls on me and my team. Wrong. I want the system to lighten my load, not increase it with “could be” solutions.

Stephen Arnold, September 24, 2009

Twitter Trends: A Glimpse of the Future of Content Monitoring

September 23, 2009

A happy quack to the reader who sent me information about “Trendsmap Maps Twitter Trends in Real-Time.” The Cnet write up points out that this Web site shows “trending Twitter topics by geographical location by combining data from Twitter’s API and What The Trend.” Very interesting publicly accessible service. Similar types of monitoring systems are in use in certain government organizations. The importance of this implementation is that the blend of disparate systems provides new ways to look at people, topics, and relationships. With this system another point becomes clear. If you want to drop off the grid, move to a small town where data flows are modest. Little data won’t show up, so more traditional monitoring methods have to be used. On the other hand, for those in big metro areas, interesting observations may be made. Fascinating. The site has some interface issues, but a few minutes of fiddling will make most of the strengths and weaknesses clear. The free service is at http://www.trendsmap.com/.
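The mashup pattern behind a service like this is straightforward: pull trending terms from one feed, pull geotagged mentions from another, and join on the term. A minimal sketch of the idea, assuming generic JSON endpoints; the URLs and field names are hypothetical, not Trendsmap’s actual plumbing.

```python
# Minimal trends-by-location mashup (hypothetical endpoints and field names).
import json
import urllib.parse
import urllib.request

TRENDS_URL = "http://api.example.com/trends.json"      # hypothetical trends feed
SEARCH_URL = "http://api.example.com/search.json?q="   # hypothetical geo search

def fetch_json(url):
    with urllib.request.urlopen(url) as response:
        return json.load(response)

def trends_by_city():
    """Map each trending term to the locations of recent tweets mentioning it."""
    result = {}
    for trend in fetch_json(TRENDS_URL)["trends"]:
        term = trend["name"]
        tweets = fetch_json(SEARCH_URL + urllib.parse.quote(term))["results"]
        result[term] = [t["location"] for t in tweets if t.get("location")]
    return result
```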

Stephen Arnold, September 22, 2009

A Modest Facebook Hack

September 13, 2009

For you lovers of Facebook, swing on over to Pjf.id.au and read “Dark Stalking on Facebook”. This is search with some jaw power. The key segment, in my opinion, was this:

If a large number of my friends are attending an event, there’s a good chance I’ll find it interesting, and I’d like to know about it. FQL makes this sort of thing really easy; in fact, finding all your friends’ events is on their Sample FQL Queries page. Using the example provided by Facebook, I dropped the query into my sandbox, and looked at the results which came back. The results were disturbing. I didn’t just get back future events my friends were attending. I got everything they had been invited to: past and present, attending or not.
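For the curious, the query the author describes looks roughly like the sketch below, wrapped in Python and sent to the old fql.query REST method. This is a paraphrase of Facebook’s sample query as I recall it, not the author’s exact code, and the authentication parameters are elided.

```python
# Sketch of the over-sharing FQL query (paraphrased; not the author's code).
import json
import urllib.parse
import urllib.request

MY_UID = 12345  # hypothetical user id

# Every event any of my friends is a member of -- past or future,
# attending or not, which is the surprise the article describes.
FQL = f"""
SELECT eid, name, start_time
FROM event
WHERE eid IN (
    SELECT eid FROM event_member
    WHERE uid IN (SELECT uid2 FROM friend WHERE uid1 = {MY_UID})
)
""".strip()

def run_fql(query, session_params):
    # session_params would carry the api_key/session_key/signature of the old
    # REST scheme; the auth details are elided because they are beside the point.
    params = dict(session_params, method="fql.query", query=query, format="json")
    url = ("https://api.facebook.com/method/fql.query?"
           + urllib.parse.urlencode(params))
    with urllib.request.urlopen(url) as response:
        return json.load(response)
```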

The write up includes links and some how-to tips. Have fun before the former Googlers and Facebookers hop to it.

Stephen Arnold, September 13, 2009

Open Source Metadata Tool

September 12, 2009

I received an interesting question yesterday (September 11, 2009). The writer wanted to know if there was a repository of open source software which serves the intelligence community. I have heard of an informal list maintained by some specialized outfits, but I could not locate my notes about these sources. I suggested running a Google query. Then I received a link to a Network World story with the title “Powerful Tool to Scour Document Metadata Updated.” Although not exactly the type of software my correspondent was seeking, I found the tool interesting. The idea is that some word processing and desktop software embed user information in documents. The article asserted:

The application, called FOCA (Fingerprinting Organizations with Collected Archives), will download all documents that have been posted on a Web site and extract the metadata, or the information generated about the document itself. It often reveals who created the document, e-mail address, internal IP (Internet Protocol) addresses and much more….FOCA can also identify OS versions and application versions, making it possible to see if a particular computer or user has up-to-date patches. That information is of particular use to hackers, who could then do a spear phishing attack, where a specific user is targeted over e-mail with an attachment that contains malicious software.

Some of the information that is “code behind” what the document shows in the Word edit menu is exciting.
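You do not need FOCA to see this class of metadata. An Office 2007 file is a zip archive, and docProps/core.xml inside it holds the core properties. A minimal sketch (this is not FOCA’s code) that pulls the author and dates from a .docx; the file name is hypothetical.

```python
# Minimal sketch: read creator and dates from a .docx (not FOCA's code).
import xml.etree.ElementTree as ET
import zipfile

DC = "{http://purl.org/dc/elements/1.1/}"
DCTERMS = "{http://purl.org/dc/terms/}"
CP = "{http://schemas.openxmlformats.org/package/2006/metadata/core-properties}"

def docx_metadata(path):
    """Office 2007+ files are zip archives; docProps/core.xml holds core metadata."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("docProps/core.xml"))
    return {
        "creator": root.findtext(DC + "creator"),
        "last_modified_by": root.findtext(CP + "lastModifiedBy"),
        "created": root.findtext(DCTERMS + "created"),
        "modified": root.findtext(DCTERMS + "modified"),
    }

print(docx_metadata("example.docx"))   # hypothetical file name
```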

Stephen Arnold, September 12, 2009

Google Ordered to Provide Email Info

September 12, 2009

Short honk: The Canadian publication National Post’s “Google Ordered to ID Authors of Emails to York University” caught my attention. If true, privacy watchers may want to note this passage from the news story:

York University has won court orders requiring Google Inc. and Canada’s two largest telecommunications companies to reveal the identities of the anonymous authors of contentious emails that accused the school’s president of academic fraud.

The article suggests that this is an “extraordinary” action. Is it? When the extraordinary becomes ordinary, the meaning of a word and the event to which it applies can confuse me. Would Voltaire or Swift have obtained tenure at York were each alive today? I don’t know what “academic fraud” means either. That is why I am an addled goose, I know.

Stephen Arnold, September 12, 2009
