Scaling SharePoint Could Be Easy
September 24, 2009
Back in the wonderful city of Washington, DC, I participated in a news briefing at the National Press Club today (September 23, 2009). The video summary of the presentations will be online next week. During the post-briefing discussion, the topic of scaling SharePoint came up. The person with whom I was speaking sent me a link when she returned to her office. I read “Plan for Software Boundaries (Office SharePoint Server)” and realized that this Microsoft Certified Professional was jumping through hoops created by careless system design. I don’t think the Google enterprise applications are perfect, but Google has eliminated the egregious engineering calisthenics that Microsoft SharePoint delivers as part of the standard software.
I can deal with procedures. What made me uncomfortable right off the bat was this segment in the TechNet document:
- In most circumstances, to enhance the performance of Office SharePoint Server 2007, we discourage the use of content databases larger than 100 GB. If your design requires a database larger than 100 GB, follow the guidance below:
- Use a single site collection for the data.
- Use a differential backup solution, such as SQL Server 2005 or Microsoft System Center Data Protection Manager, rather than the built-in backup and recovery tools.
- Test the server running SQL Server 2005 and the I/O subsystem before moving to a solution that depends on a 100 GB content database.
- Whenever possible, we strongly advise that you split content from a site collection that is approaching 100 GB into a new site collection in a separate content database to avoid performance or manageability issues.
Why did I react strongly to these dot points? Easy. Most of the datasets with which we wrestle are big, orders of magnitude larger than 100 GB. Heck, this cheap netbook I am using to write this essay has a 120 GB solid state drive. My test corpus on my desktop computer weighs in at 500 GB. Creating 100 GB subsets is not hard, but in today’s petascale data environment, these chunks seem to reflect what I would call architectural limitations.
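If you want to see how close your own content databases are to that ceiling, a quick audit is straightforward. Here is a minimal sketch, assuming a SQL Server instance reachable via pyodbc and the usual WSS_Content naming convention; the server name, driver string, and the 80 percent warning threshold are my own placeholders, not Microsoft guidance.

```python
# Minimal sketch: flag SharePoint content databases creeping toward the
# 100 GB guidance ceiling. Connection details are placeholders.
import pyodbc

CEILING_GB = 100

conn = pyodbc.connect(
    "DRIVER={SQL Server};SERVER=sharepoint-sql;Trusted_Connection=yes"
)
cursor = conn.cursor()
cursor.execute(
    """
    SELECT DB_NAME(database_id) AS db_name,
           SUM(size) * 8.0 / 1024 / 1024 AS size_gb  -- size is in 8 KB pages
    FROM sys.master_files
    GROUP BY database_id
    """
)
for db_name, size_gb in cursor.fetchall():
    if db_name and db_name.startswith("WSS_Content") and size_gb >= CEILING_GB * 0.8:
        print(f"{db_name}: {size_gb:.1f} GB, within 20 percent of the 100 GB guidance")
```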
As I worked my way through the write up, I found numerous references to hard limits. One example was this statement from a table:
Office SharePoint Server 2007 supports 50 million documents per index server. This could be divided up into multiple content indexes based on the number of SSPs associated with an index server.
I like the “could be.” That type of guidance is useful, but my question is, “Why not address the problem instead of giving me the old ‘could be’?” We have found limits in the Google Search Appliance, but the fix is pretty easy and does not require any “could be” engineering. Just license another GSA, and the system has been scaled. No caveats.
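To put the arithmetic in plain view, here is a back-of-the-envelope sketch. The 50 million figure comes straight from the TechNet table quoted above; per-appliance GSA capacity varies by model and license, so the number used below is strictly an assumption for illustration.

```python
# How many index servers (or appliances) does a corpus need under a fixed
# per-unit document ceiling? Pure arithmetic, no vendor API involved.
import math

def units_needed(total_docs: int, docs_per_unit: int) -> int:
    """Smallest number of index servers or appliances that covers the corpus."""
    return math.ceil(total_docs / docs_per_unit)

corpus = 180_000_000                     # hypothetical document count
print(units_needed(corpus, 50_000_000))  # SharePoint index servers -> 4
print(units_needed(corpus, 30_000_000))  # assumed per-GSA ceiling  -> 6
```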
I hope that the Fast ESP enterprise search system tackles engineering issues, not interface (what Microsoft calls user experience). In order to provide information access, the system has to be able to process the data the organization needs to index. Asking my team to work around what seem to be low ceilings is extra work for us. The search system needs to make it easy to deliver what the users require. This document makes clear that the burden of making SharePoint search work falls on me and my team. Wrong. I want the system to lighten my load, not increase it with “could be” solutions.
Stephen Arnold, September 24, 2009
Twitter Trends: A Glimpse of the Future of Content Monitoring
September 23, 2009
A happy quack to the reader who sent me information about “Trendsmap Maps Twitter Trends in Real-Time.” The Cnet write up points out that this Web site uses “trending Twitter topics by geographical location by combining data from Twitter’s API and What The Trend.” Very interesting publicly accessible service. Similar types of monitoring systems are in use in certain government organizations. The importance of this implementation is that the blend of disparate systems provides new ways to look at people, topics, and relationships. With this system another point becomes clear. If you want to drop off the grid, move to a small town where data flows are modest. Little data won’t show up, so more traditional monitoring methods have to be used. On the other hand, for those in big metro areas, interesting observations may be made. Fascinating. The site has some interface issues, but a few minutes of fiddling will make most of the strengths and weaknesses clear. The free service is at http://www.trendsmap.com/.
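To make the blending concrete, here is a toy sketch of the roll-up Trendsmap appears to perform: geotagged posts grouped by location, with topic counts per city. The canned records stand in for live Twitter API data, and the field names are my assumptions, not Trendsmap’s schema.

```python
# Group geotagged posts by city and surface the top topic in each location.
from collections import Counter, defaultdict

posts = [
    {"city": "Washington DC", "topic": "#gov20"},
    {"city": "Washington DC", "topic": "#gov20"},
    {"city": "Washington DC", "topic": "#pressclub"},
    {"city": "Harrods Creek", "topic": "#geese"},
]

trends_by_city = defaultdict(Counter)
for post in posts:
    trends_by_city[post["city"]][post["topic"]] += 1

for city, counts in trends_by_city.items():
    top_topic, hits = counts.most_common(1)[0]
    print(f"{city}: {top_topic} ({hits} mentions)")
```

The interesting part is not the counting; it is that a small town produces so few records that nothing bubbles up, which is exactly the drop-off-the-grid point above.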
Stephen Arnold, September 22, 2009
A Modest Facebook Hack
September 13, 2009
For you lovers of Facebook, swing on over to Pjf.id.au and read “Dark Stalking on Facebook”. This is search with some jaw power. The key segment, in my opinion, was:
If a large number of my friends are attending an event, there’s a good chance I’ll find it interesting, and I’d like to know about it. FQL makes this sort of thing really easy; in fact, finding all your friends’ events is on their Sample FQL Queries page. Using the example provided by Facebook, I dropped the query into my sandbox, and looked at the results which came back. The results were disturbing. I didn’t just get back future events my friends were attending. I got everything they had been invited to: past and present, attending or not.
Links and some how-to tips. Have fun before the former Googlers and Facebookers hop to it.
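For the curious, the query pattern the author describes looks roughly like the sketch below. The FQL table and column names follow Facebook’s published sample queries of the period; the user id is a placeholder, and you would hand the string to whatever FQL sandbox or client you use.

```python
# Rough reconstruction of the "all my friends' events" FQL pattern. Table and
# column names (friend.uid1/uid2, event_member.uid/eid, event.eid/name/
# start_time) come from Facebook's sample queries; USER_ID is hypothetical.
USER_ID = "1234567890"

friends_events_fql = (
    "SELECT eid, name, start_time "
    "FROM event "
    "WHERE eid IN ("
    "  SELECT eid FROM event_member WHERE uid IN ("
    f"    SELECT uid2 FROM friend WHERE uid1 = {USER_ID}))"
)

# As the write up warns, the results are not limited to future events the
# friends are actually attending.
print(friends_events_fql)
```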
Stephen Arnold, September 13, 2009
Open Source Metadata Tool
September 12, 2009
I received an interesting question yesterday (September 11, 2009). The writer wanted to know if there was a repository of open source software which served the intelligence community. I have heard of an informal list maintained by some specialized outfits, but I could not locate my information about these sources. I suggested running a Google query. Then I received a link to a Network World story with the title “Powerful Tool to Scour Document Metadata Updated.” Although not exactly the type of software my correspondent was seeking, I found the tool interesting. The idea is that some word processing and desktop software embed user information in documents. The article asserted:
The application, called FOCA (Fingerprinting Organizations with Collected Archives), will download all documents that have been posted on a Web site and extract the metadata, or the information generated about the document itself. It often reveals who created the document, e-mail address, internal IP (Internet Protocol) addresses and much more….FOCA can also identify OS versions and application versions, making it possible to see if a particular computer or user has up-to-date patches. That information is of particular use to hackers, who could then do a spear phishing attack, where a specific user is targeted over e-mail with an attachment that contains malicious software.
Some of the information that is “code behind” what the document shows in the Word edit menu is exciting.
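This is not FOCA, but a minimal sketch of the kind of metadata the article describes. An Office Open XML file such as a .docx is a zip archive, and docProps/core.xml carries author and last-modified-by values most users never see; point the function at any .docx you have handy.

```python
# Peek at the author metadata buried inside a .docx file.
import sys
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def core_properties(path: str) -> dict:
    with zipfile.ZipFile(path) as docx:
        root = ET.fromstring(docx.read("docProps/core.xml"))
    return {
        "creator": root.findtext("dc:creator", default="", namespaces=NS),
        "last_modified_by": root.findtext("cp:lastModifiedBy", default="", namespaces=NS),
    }

if __name__ == "__main__":
    print(core_properties(sys.argv[1]))  # e.g. python peek.py report.docx
```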
Stephen Arnold, September 12, 2009
Google Ordered to Provide Email Info
September 12, 2009
Short honk: The Canadian publication National Post’s “Google Ordered to ID Authors of Emails to York University” caught my attention. If true, privacy watchers may want to note this passage from the news story:
York University has won court orders requiring Google Inc. and Canada’s two largest telecommunications companies to reveal the identities of the anonymous authors of contentious emails that accused the school’s president of academic fraud.
The article suggests that this is an “extraordinary” action. Is it? When the extraordinary becomes ordinary, the meaning of a word and the event to which it applies can confuse me. Would Voltaire or Swift have obtained tenure at York were they alive today? I don’t know what “academic fraud” means either. That is why I am an addled goose, I know.
Stephen Arnold, September 12, 2009
Social Networks and Security
August 25, 2009
I got roasted at a conference last year when I pointed out that controlling security and privacy in social networks was a challenge. One 20-something told me that I was an addled goose. No push back from me. I stuck to my assertion and endured the smarmy remarks and head shaking. I thought of this young person when I read “Social Networks Leak Personal Information”. Sure, it is one write up in a trade magazine, but it contains a statement I find instructive:
The researchers say that social networks leak information through a combination of HTTP header information — the Referrer header and the Request-URI — and cookies sent to third-party aggregators such as Google (NSDQ: GOOG)’s DoubleClick, Google Analytics, and Omniture, among others. As a consequence of this leakage, third-party aggregators can potentially link social network identifiers to past and future Web site visits, thereby identifying a person and his or her online activities.
Right? Wrong? With the young-at-heart going social, old geese like me want to move forward with some caution.
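For readers who want to see the mechanism rather than take the researchers’ word for it, here is a toy illustration. The URL is invented; the point is that an identifier embedded in a Referer header, combined with the aggregator’s own cookie, is enough to tie a profile to later browsing.

```python
# Toy demonstration of the Referer-header leakage described above.
from urllib.parse import urlparse, parse_qs

referer = "http://www.example-socialnetwork.com/profile.php?id=100000123456789"

parsed = urlparse(referer)
user_id = parse_qs(parsed.query).get("id", [None])[0]

# The aggregator now holds a durable social network identifier it can join
# against its own tracking cookie on every later request.
print(f"Aggregator sees site={parsed.netloc}, user identifier={user_id}")
```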
Stephen Arnold, August 25, 2009
The Microsoft Plan to Beat Google in Search
August 22, 2009
That’s a much better title than a “how to” about Microsoft’s plan to beat Google. You can find a by-the-book description of Microsoft’s “kill Google” strategy in the August 16, 2009, Mashable story “Search Showdown: Microsoft’s Plan to Win the Search War.” I want to be upfront. I think Google has won the Web search war and it is now threatening in other “wars” as well. Nevertheless, Ben Parr runs through the Microsoft game plan. My observation about this write up is that it omits two key points of Microsoft’s approach:
First, Microsoft is in “pay for traffic” mode. Cash back is one of the ways Bing is getting traffic. Not a problem for me, but it is an important point when thinking about how Google operates—getting users addicted to the Google search service.
Second, Microsoft is focusing on “user experience” or UX. The idea is that Google is often a plain if not dowdy looking system. Microsoft wants to deliver eye candy. This “look” is what sets Bing apart. Unfortunately, the results have yet to match Google’s for my test queries. I think this UX push is a big part of the Fast ESP system as well. In my opinion, Google is about plumbing; Microsoft is about decoration.
Read the Mashable story. Make up your own mind.
Stephen Arnold, August 22, 2009
Security Gaps Permit Intercepts
August 12, 2009
Short honk: If you are not up to speed on ways to intercept information, navigate to “10 Ways Your Voice and Data Can Be Spied Upon”. Useful list.
Stephen Arnold, August 12, 2009
Google Marketing to the Enterprise
August 10, 2009
I usually find Larry Dignan’s view of information technology spot on. I was not too surprised by the argument in his “Google’s Campaign for Apps Doesn’t Address the IT Data Elephant in the Room.” The key passage in the article for me was:
In fact, nothing in Google’s marketing toolbox—the viral emails, the YouTube videos and the posters you can plaster near the water cooler—are going to change the fact that your corporate data is hosted by Google. If Google really wants to entice the enterprise it should have skipped the YouTube videos and allowed companies to store some of their own data.
I agree that Google has not done a good job of addressing the “Google has your data” argument.
Google has some patent documents that describe clever ways to have some data processed by Google’s systems and other data on a client’s servers with the client retaining control over the data. I am sitting in an airport at 4:50 am Eastern and don’t have access to my Google files. My recollection is that Google has been beavering away with systems and methods to provide different control methods.
The problem is the loose coupling between engineering and marketing at Google. The push to the enterprise strikes me as a way to capitalize on several market trends for which Google has data. Keep in mind that Google does not take actions in a cavalier way. Data drive most decisions, based on my research. Then groupthink takes over. In that process, the result is a way to harvest low-hanging fruit.
After some time passes, engineering methods follow along that add features, functions, and some robustness. A good example is the Google Search Appliance. In its first version, security was lax. The present version provides a number of security features. Microsoft uses the same approach, which has caused me to wait until version 3 of a Microsoft product before I jump on board. For Google, the process of change is incremental and much less visible.
My hunch is that once Google’s “go Google” program responds to the pent-up demand for more hands-on support for the appliance, Apps, and Maps, the Google will add additional features. The timeline may well be measured in years.
If a company wants to use Google technology to reduce costs now and reduce to some degree the hurdles that traditional information technology approaches put in the way of senior management, the “go Google” program will do its job.
Over time, Google will baby step forward. Those looking for traditional approaches to enterprise software will have a field day criticizing Google and its approach. My thought is that Google seems to be moving forward with serious intent.
I think there will be even louder and more aggressive criticism of Google’s new enterprise push. In my opinion, that criticism will not have much of an impact on the Google. The company seems to be making money, growing, and finding traction despite its unorthodox methods.
Will Google “win” in the enterprise sector? I don’t know. I do know that Google is disruptive, and that the effects of the disruption create opportunities. Traditional enterprise software companies may want to look at those opportunities, not argue that the ways of the past are the ways of the future. The future will be different from what most of us have spent years learning to love. Google’s approach is based on the fact that customers *want* Google solutions, particularly applications that require search and access to information. That is not what traditional information technology professionals want.
Stephen Arnold, August 10, 2009
Email that Deletes Itself
August 8, 2009
Short honk: Want to make your email self-destruct? Navigate to the Vanish page. A unit of i2 in the UK was exploring this function, but the company moved resources elsewhere. Useful for some; not so useful for others.
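The underlying idea is simple enough to sketch. The toy below is not how Vanish actually works (Vanish scatters key shares across a distributed hash table so they age out on their own); it only illustrates the expiry principle, using the third-party cryptography package and an in-memory key store of my own invention.

```python
# Self-destructing message, reduced to a toy: the ciphertext persists, the
# key lives only until a deadline passes.
import time
from cryptography.fernet import Fernet

key = Fernet.generate_key()
ciphertext = Fernet(key).encrypt(b"Burn after reading.")

# Ephemeral key store: once the deadline passes, refuse to decrypt.
key_store = {"key": key, "expires_at": time.time() + 3600}

def read_message(store: dict, token: bytes) -> str:
    if time.time() > store["expires_at"]:
        raise ValueError("Key has expired; the message is unrecoverable.")
    return Fernet(store["key"]).decrypt(token).decode()

print(read_message(key_store, ciphertext))
```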
Stephen Arnold, August 8, 2009