
Limitations of MSFT Exchange 2010

March 16, 2010

I am not sure how one of my goslings came across this spreadsheet tucked away on the Microsoft Exchange Web log. When I tried to access the file, the system did not recognize my “official” Microsoft MSDN user ID nor my Windows Live credentials. So you may have to register to access the blog. Once there, you need to look for the download section and visually inspect the file names for the one that points to the Exchange Performance Excel spreadsheet. Running a query in the blog’s search box produced zero hits for me. But with some persistence and patience I was able to get a copy of the spreadsheet. Latency was a problem when I was fiddling with this download. (Note: if the link is dead, write one of the goslings at benkent2020 at yahoo dot com, and maybe he will email you a copy of this document.)

Once you get the document “Scalability Limitations”, you will see some pretty interesting information. One quick example is that the spreadsheet includes three columns of specifics about scaling amidst the more marketing oriented data on the spreadsheet. These three juicy columns are:

  • Limitation
  • Issue
  • Mitigation.

Here’s the information for the row Database Size:

  • Limitation–Exchange 2007 – 200GB; Exchange 2010 – 2TB or 1 disk, whichever is less
  • Issue–The DB size guidance changed from 200GB (if you are in CCR) to 2TB or 1 disk, whichever is greater (if you have 2+ copies of the DB in question)
  • Mitigation—Blank. No information.

Okay.

I hope you are able to locate this document. For those of you eager to install Exchange 2010, SharePoint 2010, and Fast Search 2010, you will want to make sure you have these types of spreadsheets at your fingertips * before * you jump on the Microsoft Enterprise steam engine. The information in the spreadsheet makes clear why some types of email content processing may be expensive to implement.

Stephen E Arnold, March 16, 2010

This is the equivalent of the free newspaper Velocity in Louisville. Read it for nothing. I will report working for no dough to the Jefferson County agency that thinks I work in Louisville when I spend most of my time in the warm embrace of airlines.

Real Time Search: Poor Layout or Lousy Content?

March 9, 2010

In my Information World Review column which I submitted last week, I talked about the “marshmallow wars” being waged among Google, Microsoft, and Yahoo. The idea is that these three “big boys” are not doing a particularly meaty job with real time content. I was fascinated to read “Why Do We Ignore Real Time Results from Google Search” in the media-savvy Guardian. I focused on the substance of the real time results, the latency, and the method of displaying these results. Each company rows its real time boat differently, and that makes life difficult for geese like me.

The Guardian’s approach, which was quite interesting to me, focused on eye tracking. You can read the write up and decide whether user experience or the content itself is the problem. I am very skeptical of the razzle dazzle about eye candy and how eyes move. My recollection from my grade school and high school days is that some people are not very adept readers. In my class which underwent a speed reading test in Illinois in the 1950s, few students were able to absorb blocks of text at one glance. Obviously, if there are some slow readers, there may be some difficulty with certain types of layouts. On the other hand, if you are like me and can swallow paragraphs or even pages at a glance, then the eye movement stuff may not be as significant as the value of the information.

My column for Information World Review focuses on substance. I leave the wandering eyeballs of those who read a word or two at a time and may sub vocalize when they grind through information to the arts and crafts approach to information. My opinion is that Google, Microsoft, and Yahoo are chasing real time content because it has marketing value. There are useful data in real time results but not in the presentations of the big dogs of Web search. I identify some go-to services for real time search but you will have to wait until the IWR publishing cycle outputs the column.

Stephen E Arnold, March 10, 2010

No one paid me to write this. I wonder if those reading the article glance, move their lips, or follow with their fingers. I suppose this type of non compensated writing and the attendant question means I must report to the FBI, an outfit skilled in dealing with impressions of fingers.

Automated Reports May Squeeze Azure Chip Consultants

March 3, 2010

On March 1, 2010, I heard an interesting presentation by Alacra. This company aggregates business information and packages it in an easy-to-read way. The talk focused on Alacra Pulse. The basics of the service as I recorded the speaker’s comments are:

The Alacra Pulse Platform monitors thousands of carefully selected news feeds and blogs and extracts actionable intelligence in near-real-time.  Alacra’s Applied Knowledge Extraction turns Web content into an idea generation and current awareness service for financial professionals and corporate executives.

The Pulse service offers a free version and a premium version. I thought the demonstration of the free version was quite good, and it certainly shows how far automated content assembly has come in the last two or three years. You can explore the service yourself. Navigate to http://pulse.alacra.com/analyst-comments to get started.

Right after listening to the Alacra talk, I read a news release about “New Market Report Now Available: Hon Hai Precision Industry Company Limited – SWOT Analysis” from an azure chip outfit called Datamonitor. Now Hon Hai Precision, if I recall correctly, is a unit of Foxconn. Foxconn makes stuff for “hollow manufacturers.” The idea is that a company like Apple or Dell does not have the expertise or cost structure to make things. Foxconn and Hon Hai do. The buzzword for lacking a core competency in manufacturing is outsourcing in my opinion. Foxconn is interesting because of a glitch in Apple related security and an employee death.

I think I prefer the autogenerated reports from Alacra, thank you. Image source: http://4.bp.blogspot.com/_Hs_cKZPkvaU/Sxshr9-foYI/AAAAAAAAAG4/ZtHNfWYGEIE/s400/sleazy-salesman-thumb.jpg

The azure chip crowd assembles reports, presumably like the Hon Hai “SWOT” analysis, using basic business school methods, which may or may not be germane to the present financial climate. Here’s what the news release said:

Datamonitor’s Hon Hai Precision Industry Company Limited – SWOT Analysis company profile is the essential source for top-level company data and information. Hon Hai Precision Industry Company Limited – SWOT Analysis examines the company’s key business structure and operations, history and products, and provides summary analysis of its key revenue lines and strategy.

Would you trust your retirement savings to anyone relying upon an MBA’s off the shelf SWOT analysis in today’s real time world? I would not.

But several more important questions crossed my mind.

First, why would I pay for a profile when I could get the basic information from Alacra? Furthermore, when I get the Alacra report, it is (as I recall the speaker’s saying) generated in real time. No delay between my request and getting the freshest data. I am reasonably capable and can formulate my own views of a company from fresh data if what I saw from Alacra is indicative of what the company provides in its Pulse service.

Second, assume I don’t know about Alacra. Why not use a federating search system such as Devilfinder.com, Ixquick.com, or Vivisimo.com? There are even more interesting federating systems to tap, including Bright Planet and Deep Web Technologies. I understand laziness, but these services can deliver the basic information that provides a manager with direct links to some publicly available, quite useful reports about the company; for example, the Google Finance or the AOL company report.

Third, if I * really * want to do a good job, perhaps my firm should look into industrial strength solutions from Kapow Technologies or Kroll Ontrack? Let’s face it. Buying an off-the-shelf report chock full of business school jargon may not be what’s needed to deal with a business decision in today’s economic climate.

My opinion is that as outfits like Alacra and Google roll out more reports generated by “smart” software, the pressure will mount on the azure chip consulting crowd. Already pressured from below by the likes of Gerson Lehrman Group and from above by the blue chip outfits like McKinsey & Co, the azure chip outfits will be like the cheese and salami in a panini, feeling heat and pressure from two places at once.

My thought is that canny business executives may want to check out services like Pulse, familiarize themselves with federating search systems, consider an industrial strength solution that operates in near real time, or just sign up for a pay as you go consulting service from Gerson Lehrman. In short, do something other than buying the Cliff’s Notes for executives who are lazy like this addled goose. The middle may be an uncomfortable place for self appointed experts, poobahs, and mavens writing reports about Chinese businesses using English language sources. Yep, Cliff’s Notes for Harried Executives?

Stephen E Arnold, March 3, 2010

No one paid me to write this. Since I mentioned myself as a lazy goose, I will report non paid writing to Fish & Wildlife, an outfit that does a stellar job scheduling rooms at national parks and monitoring the health of geese in the US.

Buzz Search: Defaults Do Not Fly

February 22, 2010

Editor’s Note: Constance Ard, the Answer Maven, is one of the goslings. She wrote an overview of Google Buzz search functionality. Ms. Ard is active in the Special Libraries Association, heads up the legal interest group, and has an MLS with an emphasis on online search, taxonomies, and content processing.

With the release of Buzz flapping everyone’s wings over the last Internet half-life, it’s time to consider some practical application for Buzz. Danny Sullivan at Search Engine Land has laid the groundwork for searching Buzz.

For the record, the “type it in the box and trust the search results” approach isn’t enough with this service from Google. You can see below that Buzz, a social media tool that gets food from Twitter, Google Reader, Friend Feed, and SMS, displays results from a typical box search that are surprisingly old in the real-time scheme of things.

These results are for a search done at approximately 8 p.m. EST on February 17, 2010, through the Buzz search box with the term: Olympics. The first result is time-stamped 4:50 p.m. The last result was stamped 9:41 a.m. and the second was stamped 8:23 a.m. These are not exactly real-time results and not even reverse chronological in display.

[Images: Buzz search results for “Olympics”]
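The ordering complaint can be checked with a few lines of Python. This is only a sketch: it uses the three time stamps reported above, in the order they were displayed, and assumes the 8 p.m. search time was exact. The helper names `parse` and `is_reverse_chronological` are made up for the illustration and have nothing to do with how Buzz itself works.

```python
from datetime import datetime

# Assumed search time: 8 p.m. EST on February 17, 2010 (the post says "approximately").
SEARCH_TIME = datetime(2010, 2, 17, 20, 0)

# Time stamps reported in the post, in display order: first, second, last.
displayed = ["4:50 p.m.", "8:23 a.m.", "9:41 a.m."]


def parse(stamp):
    """Turn a stamp such as '4:50 p.m.' into a datetime on the search day."""
    cleaned = stamp.replace(".", "").upper()  # '4:50 p.m.' -> '4:50 PM'
    t = datetime.strptime(cleaned, "%I:%M %p")
    return SEARCH_TIME.replace(hour=t.hour, minute=t.minute)


def is_reverse_chronological(times):
    """True when every result is at least as new as the one after it."""
    return all(a >= b for a, b in zip(times, times[1:]))


times = parse_all = [parse(s) for s in displayed]

# Age of each displayed result, in whole minutes, at the time of the search.
ages_in_minutes = [int((SEARCH_TIME - t).total_seconds() // 60) for t in times]

# What a newest-first display of the same three results would look like.
newest_first = sorted(times, reverse=True)

print(is_reverse_chronological(times))
print(ages_in_minutes)
print([t.strftime("%H:%M") for t in newest_first])
```

Under those assumptions the freshest displayed result is more than three hours old, and the second and last results are out of order relative to each other.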

The same search on Buzzzy.com (selected results shown below) done at the same approximate time provides even more irritating displays. Has anyone heard of time and date stamps? I understand that in real-time search hours count but in search, pinpointing an accurate date and time is essential.


Twitter and Mining Tweets

February 21, 2010

I must admit. I get confused. There is Twitter, TWIT (a podcast network), TWIST (a podcast from another me-too outfit), and “tweets”. If I am confused, imagine the challenge for text processing and then analyzing short messages.

Without context, a brief text message can be opaque to someone my age; for example, “r u thr”. Other messages say one thing, “at the place, 5” and mean to an insider “Mary’s parents are out of town. The party is at Mary’s house at 5 pm.”

When I read “Twitter’s Plan to Analyze 100 Billion Tweets”, several thoughts struck me:

  1. What took so long?
  2. Twitter is venturing into some tricky computational thickets. Analyzing tweets (the word given to 140 character messages sent via Twitter and not to be confused with “twits”, members of the TWIT podcast network) is not easy.
  3. Non US law enforcement and intelligence professionals will be paying a bit more attention to the Twitter analyses because Twitter’s own outputs may be better, faster, and cheaper than setting up exotic tweet subsystems.
  4. Twitter makes clear that it has not analyzed its own data stream, which surprises me. I thought these young wizards were on top of data flows, not sitting back and just reacting to whatever happens.

According to the article, “Twitter is the nervous system of the Web.” This is a hypothetical, and I am not sure I buy that assertion. My view is that Google’s more diverse data flows are more useful. In fact, the metadata generated by observing flows within Buzz and Wave are potentially a leapfrog. Twitter is a bit like one of those Faith Popcorn-type of projects. Sniffing is different from getting the rare sirloin in a three star eatery in Lyon.

The write up points out that Twitter will use open source tools for the job. There are some juicy details of how Twitter will process the traffic.

A useful write up.

Stephen E Arnold, February 22, 2010

No one paid me to write this article. I will report non payment to the Department of Labor, where many are paid for every lick of work.

PointCast Version 2?

February 19, 2010

I read “Did Google Reader Just Turn on the Firehose?” I don’t use the Google Reader. The addled goose does not read. But when he scanned with one eye the story in Stay N’ Alive, he had one thought, “Is PointCast back?” Different coat of paint maybe but possibly the same squeaking wheel?

Stephen E Arnold, February 19, 2010

No one paid me to write this. Non payment means that I must report this to the IMF, an outfit aware of such sad situations.

Exegy Delivers Ultra High Performance Hosted Service

February 3, 2010

With the buzz about real time content processing and outfits like Thomson Reuters delivering really fast throughput, I was not surprised to read in Wall Street & Technology that Exegy has gunned its engine and driven into the low latency hosted content processing service business. “Exegy Deploys Ultra Low Latency Ticker Plant on Options PIPE Platform” reports that Exegy has teamed with Options IT to make its Ticker Plant available on the Options IT platform. If you are not familiar with these firms, both support customers who require low latency access to information. The article said:

The Option PIPE platform is a fully optimized and managed, software-vendor-neutral, global technology infrastructure, providing clients with the efficiencies of a hosted technology service delivered with the scalability, strength and security of an enterprise solution. The hosted Exegy Ticker Plant is the first hardware-accelerated market data appliance built from the ground up to ensure high-frequency traders continuously have the best view of the electronic markets.

Exegy has engineered its hardware, firmware, and software to chop latency from content processing. For more information about Exegy navigate to http://www.exegy.com. For information about Options IT, point your browser to http://www.options-it.com/.

In a drag race, which vendor would win? I would lean toward the Exegy team. Serious invention from that crowd in my opinion. I described Exegy in a couple of my studies of next generation content processing vendors because the company distinguished itself with low latency crunching for the Wall Street crowd that has been thinned along with me in the economic melt down.

Stephen E Arnold, February 3, 2010

No one paid me to write this short article. I will report non payment to the IRS who cares about me. No, it really cares. For me. For you. For everyone.

Exclusive Interview: Digital Reasoning

February 2, 2010

Tim Estes, the youthful founder and chief technologist of Digital Reasoning, a search and content processing company based in Tennessee, reveals the technology that is driving the company’s growth. Mr. Estes, a graduate of the University of Virginia, tackled the problem of information overload with a fresh approach. You can learn about Digital Reasoning’s approach that delivers a system that “deeply, conceptually searches within unstructured data, analyzes it and presents dynamic visual results with minimal human intervention. It reads everything, forgets nothing and gets smarter as you use it.”

Mr. Estes explained:

Digital Reasoning’s core product offering is called “Synthesys.” It is designed to take an enterprise from disparate data silos (both structured and unstructured), ingest and understand the data at an entity level (down to the “who, what, and wheres” that are mentioned inside of documents), make it searchable, linkable, and provide back key statistics (BI type functionality). It can work in an online/real-time type fashion given its performance capabilities. Synthesys is unique because it does a really good job at entity resolution directly from unstructured data. Having the name “Umar Farouk Abdul Mutallab” misspelled somewhere in the data is not a big deal for us – because we create concepts based on the patterns of usage in the data and that’s pretty hard to hide. It is necessarily true that a word grounds its meaning to the things in the data that are of the same pattern of usage. If it wasn’t the case no receiving agent could understand it. We’ve figured out how to reverse engineer that mental process of “grounding” a word. So you can have Abdulmutallab ten different ways and it doesn’t matter. If the evidence links in any statistically significant way – we pull it together.
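To give a feel for what pulling together “ten different ways” of writing one name involves, here is a deliberately simple Python sketch. It is not Digital Reasoning’s method: Mr. Estes describes concepts built from patterns of usage in the data, while this stand-in only compares spellings with the standard library’s `difflib`. The function names, the 0.85 threshold, and the sample names are assumptions chosen for the illustration.

```python
import re
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase the name and drop everything except letters."""
    return re.sub(r"[^a-z]", "", name.lower())


def same_entity(a, b, threshold=0.85):
    """Crude test: are two spellings similar enough to be one name?"""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold


def group_names(names, threshold=0.85):
    """Greedy grouping: attach each name to the first group whose first member it resembles."""
    groups = []
    for name in names:
        for group in groups:
            if same_entity(group[0], name, threshold):
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


# Hypothetical mentions, including a spacing variant and a misspelling.
mentions = [
    "Umar Farouk Abdul Mutallab",
    "Umar Farouk Abdulmutallab",
    "Umar Farouk Abdulmutalab",
    "Tim Estes",
]

groups = group_names(mentions)
print(groups)
```

Spelling similarity alone breaks down quickly, for example when two different people have nearly identical names, which is one reason a usage-based approach like the one described in the interview is attractive.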

You can read the full text of this exclusive interview with Tim Estes on the ArnoldIT.com site in the Search Wizards Speak series. You can get more information about Digital Reasoning from the company’s Web site.

The Search Wizards Speak series provides the largest collection of free, detailed information about major enterprise search systems. Why pay the azure-chip consultants for sponsored listings, write ups prepared by consultants with little or no hands on experience, and services that “sell” advertorials? You hear in the developers’, founders’, and CEOs’ own words what a system does and how it solves content-related problems.

Stephen E Arnold, February 2, 2010

No one paid me to write about my own Web site. I will report this charitable act to the head of the Red Cross.

Thomson Reuters Redefines Real Time

January 29, 2010

“Real time” is one of those phrases that is so easy to say but so, so difficult to deliver. Exalead has demonstrated to me a latency of 12 to 15 minutes. This means that when a change is made to the location of a package, that datum becomes available to a user of the client’s search enabled application within 12 to 15 minutes. In my experience, that’s fast. The old Excite.com (Architex) indexing system would grind for hours to update Adverworld pages. A mainstream search system labored for hours to update several million Web pages. But real time means no latency. Zero. Zip. Nada.
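For readers who like to see the arithmetic, here is a small Python sketch of latency as used above: the gap between a change in the source data and the moment that change can be found by a user. The timestamps are invented for the illustration and `indexing_latency` is a hypothetical helper, not part of any vendor’s system.

```python
from datetime import datetime, timedelta


def indexing_latency(changed_at, searchable_at):
    """Time between a change in the source and its availability in search."""
    return searchable_at - changed_at


# Hypothetical event: a package location changes at 09:00:00 and the new
# value becomes visible in the search enabled application at 09:13:30.
changed_at = datetime(2010, 1, 29, 9, 0, 0)
searchable_at = datetime(2010, 1, 29, 9, 13, 30)

latency = indexing_latency(changed_at, searchable_at)

# Does the delay fall inside the 12 to 15 minute window mentioned above?
within_window = timedelta(minutes=12) <= latency <= timedelta(minutes=15)

# The same delay expressed in microseconds, for a sense of scale.
latency_us = latency // timedelta(microseconds=1)

print(latency, within_window, latency_us)
```

A delay of thirteen and a half minutes works out to 810 million microseconds, which shows how far apart “fast” indexing and zero latency really are.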

Thomson Reuters’ approach is explained in “Thomson Reuters Delivers Microsecond Access To News In London And Chicago.” Real time means that in London and Chicago, certain content is available in microseconds. The write up said:

Rich Brown, Global Business Manager, Machine Readable News, Thomson Reuters, said: “Being first to act on this information can dramatically affect a firm’s profit and loss. The launch of NewsScope Direct, the market’s fastest machine readable news service, into London and Chicago reflects our commitment to delivering the market moving information our clients need at the speed required by their high performance trading strategies.”

Some questions:

  1. Are the data numeric or text?
  2. What is the latency for the information prior to its being received at a Thomson Reuters’ data center?
  3. What does “microsecond” mean?
  4. What part of the system delivers “microsecond” access?

Until I know more, I think this is a marketing and PR play to differentiate Thomson Reuters from other financial trading data vendors. I wonder if Thomson Reuters is able to beat the pants off Exegy, another outfit with speedy systems for the financial services industry?

Stephen E Arnold, January 29, 2010

A post I wrote whilst watching Tyson shiver in front of the fire. I will report his chill and my lack of compensation to the sharp eyed folks at the SEC.

Financeoid Pushes Business News Aggregation Forward

January 27, 2010

Business information is important but difficult to index. On the surface, business information appears to be a less-than-demanding type of content. Some publishers embed ticker symbols. Others stuff in metatags. The problem, however, boils down to language. Marketing mavens are quick to invent new words, spell names in a weird way, and cook up bizarre coinages to create consulting buzz. (A good example of this appeared in the Wall Street Journal on January 25, 2010, page B7 in the article “Strategic Plans Lose Favor.” That was a buzzword fest in my opinion.)

Now there is Financeoid.com. I admit that I am not crazy about the name, but I can see that the service makes certain business information easily accessible. I have already dropped the hard copy subscription to the Financial Times, and I think my local newspaper subscription is next. If Financeoid shows some muscle, maybe I will drop the Wall Street Journal hard copy subscription. I find its information increasingly stale and feature oriented. Not what I want with my McVittie’s biscuit in the morning.

I suggest you take a look at a financial news aggregation service that pulls “financial news, tips, and advises [sic] from 15,000 financial/ business blogs.” Although still in shake down cruise mode, the service makes pretty clear that the traditional financial media may have to shift their hate gaze from Google to other online innovators. The service Financeoid.com is at http://www.financeoid.com.

The site calculates “karma” via a proprietary algorithm. The goslings and I think this is a quite interesting aggregation service. The company promises that it will offer additional aggregations in the future. Worth a look.

Stephen E Arnold, January 27, 2010

A free write up. I will report this to the Bureau of Labor. I am a slave to this blog. If I were younger, I could turn myself in for employee overwork.
