Google Interview Worth Reading
March 25, 2009
The interview with Alfred Spector in ComputerWorld is interesting for what it says and what it omits. You can find the article “The Grill: Google’s Alfred Spector on the Hot Seat” here. This is a three part interview. Mr. Spector is billed as Google’s vice president of research. For me, the most interesting comment was:
Do you have plans to go after that huge body of information on the Internet that is not currently searched? There is stuff on the Web, the so-called Deep Web, that is only “materialized” when a particular query is given by filling fields in a form. Since crawlers only follow HTML links, they cannot get to that “hidden” content. We have developed technologies to enable the Google crawler to get content behind forms and therefore expose it to our users. In general, this kind of Deep Web tends to be tabular in nature. It covers a very broad set of topics. It’s a challenge, but we’ve made progress.
I would hope so. Google has Drs. Guha and Halevy chugging away or had them chugging away on this problem. Furthermore, Google bought Transformics, a company that most of the Google pundits have paid scant attention to. Yep, Googzilla is making progress. Just plonking along with the fellow who worked on the semantic Web standards and the chap who invented the information manifold. I enjoy Google understatement.
Stephen Arnold, March 24, 2009
Google Copies from Ask.com
March 25, 2009
Newsvine.com ran a story bylined by Michael Liedtke, a journalist working for the Associated Press. I am fearful of quoting anything from an AP story, but I think I can convey the gist of the story “Google Draws upon Rival Ideas with Search Changes” here. The idea is that Google’s suggested queries were inspired–Mr. Liedtke uses the word “popularized”–by Ask.com, the search engine of NASCAR. When I read this, I laughed. Suggested searches are not exactly a new innovation. I looked in my files and found references to clustering dating back a decade or more. I recall a clustering effort coded by the Information Industry Award recipient Howard Flank in 1981. The difference between the early attempts at clustering and what Google introduced boils down to one word–scale. Mr. Flank’s effort would not run on the machines available to us at the proprietary Lockheed Dialog company. My thought is that the Google has a practice of working through innovation from computer labs and research papers, learning, and using its clever methods to implement useful functions on Google’s scale. Was Ask.com the inspiration for Google’s engineers? A more likely influence was Dr. Salton’s 1978 paper “Generation and Search of Clustered Files”. Need a copy? Click here. I have zero relationship with Googzilla, an outfit wishing I was a roasted goose. But I was taken aback by the suggestion that the GOOG turned to the search engine of NASCAR for inspiration.
Stephen Arnold, March 25, 2009
EntropySoft: Exclusive Interview with Nicolas Maquaire, CEO
March 25, 2009
A search engine or content processing system is deaf and dumb without a connector to a content source. Most text processing systems include these software connectors (sometimes called “filters” or “adaptors”) to process flat text such as the ASCII generated by a simple text editor. But plain text makes up a small part of the content stored on an organization’s file servers, workstations, and computers. In order to index content from a legacy AS/400 system running the Ironsides enterprise resource planning system, a specialized software connector is required. Writing these connectors is tricky. EntropySoft is a content integration company. The firm has a strong competency in creating software to perform a range of content manipulations; for example, content transformation of an XML file into a file type required by another business process or enterprise system. Mr. Maquaire spoke with Stephen E. Arnold, ArnoldIT.com on March 24, 2009, about EntropySoft’s software and services.
Nicolas Maquaire, the chief executive officer of EntropySoft, described his company this way:
EntropySoft is a connector factory. We have more than 30 read/write connectors for unstructured data, possibly the biggest portfolio on the market. Our connectors enable most of the features of popular content-centric applications such as Alfresco, IBM FileNet P8, Hummingbird DM, Interwoven TeamSite, IBM Lotus Quickplace, Microsoft SharePoint etc… The extensive support of features and the size of the connector portfolio make this technology perfect OEM material for many software industries. On top of the read / write connectors, EntropySoft has two technological layers (Content ETL and Content Federation) that are also available as OEM components.
A number of the world’s leading search and content processing companies use EntropySoft’s connectors. Examples include Coveo, Exalead, and Image Integration Systems.
Mr. Maquaire, in an exclusive interview with ArnoldIT.com’s Search Wizards Speak series, said:
The market for content integration is complex. Building a single connector for a specific use case seems nonsensical to us. If you develop many connectors, interoperability then becomes reality. Thanks to its more than 30 (and growing!) connectors, EntropySoft is becoming a one-stop-shopping point for connectivity and interoperability. For the past four years, EntropySoft has acquired valuable knowledge on all popular content-centric systems. EntropySoft connectors have been market-tested for years. EntropySoft connectors are put to work daily in critical business conditions, and EntropySoft unique in-house developed testing system allows fast implementation of customer-driven connectors improvements.
You can read the full text of the Maquaire interview on the ArnoldIT.com Web site here. The interview is number 37 in this series. The interviews provide one of the most useful bodies of information about enterprise search and content processing available at this time. The Search Wizards Speak series is available as a service to organizations and information professionals worldwide. Knowledge about search and content processing increases the payoff from an investment in information retrieval.
ATT on Social Networking Impacts
March 25, 2009
AT&T teamed with an azure chip consultancy, Early Strategies Consulting. The white paper is eight pages of information about “The Business Impacts of Social Networking”. You can find a copy of this document by clicking the link here. The newly and partially reassembled Ma Bell has a tendency to move content around. If this link doesn’t work, just buzz AT&T customer service. The operators will be delighted to help you.
What’s the point of the white paper? According to the executive summary:
Social networking fosters collective intelligence, collaborative work and support communities. Tools and behaviors from the consumer world are now making the transition to the corporate world, with diverse implications for changing the way businesses operate. This paper explores 10 opportunities presented by social networking, along with 10 associated challenges.
My hunch is that the paper is designed to generate revenue.
For me, the most interesting part was the diagram of the organization chart of the future. The idea is that the traditional top down organization will have social networks embedded within them.
A close second was the vocabulary of the document. I enjoyed this blend of Ma Bell and MBA speak. Give it a read then send the document around and up the organization chart of the future.
Stephen Arnold, March 25, 2009
High Speed Database Export
March 24, 2009
I have received several comments and emails about search and content processing system performance. I have been aware of the disconnect between the vendors’ assurances about performance and real-world behavior. With some organizations using traditional relational databases to house XML content, some operations can be sluggish due to the amount of data stuffed into a series of tables. If export speed, flexibility, and automation are important to you, you may want to take a look at Active Database Software here. The company produces the FlySpeed Data Export component. The software component makes it possible to save data in Excel, comma separated values, HTML, and other formats. You can automate most data export operations. You can download a trial from the company’s Web site here. A single user license for the universal build of the tool is about $100. Give it a try.
Stephen Arnold, March 24, 2009
Palantir: Data Analysis
March 24, 2009
In the last month, three people have asked me about Palantir Technologies. I have had several people mention the work environment and the high caliber of the team at the company. The company has about 170 employees and is privately held. I have heard that the firm is profitable, but I have that from two sources now hunting for work after their financial institutions went south. The company is one of the leaders in finance and intelligence analytics. The specialities of the company include global macro research and trading; quantitative trading; knowledge discovery and knowledge management.
If you are not familiar with the company, you may want to navigate to www.palantirtech.com and take a look at the company’s offerings. Located in Palo Alto, the company focuses on making software that facilitates information analysis. With interest in business intelligence waxing and waning, Palantir has captured a very solid reputation for sophisticated analytics. Law enforcement and intelligence agencies “snap in” Palantir’s software to perform analysis and generate visualizations of the data. The company has been influenced by Apple in terms of the value placed upon sophisticated design and presentation. Palantir’s system makes highly complex tasks somewhat easier because of the firm’s interfaces. If you want to generate a visualization of a large, complex analytic method, Palantir can produce visually arresting graphics. If you navigate to the company’s “operation tradestop” page here, you can access demonstrations and white papers.
When I last checked the company’s demos, a number of them provided examples drawn from military and intelligence simulations. These examples provide a useful window into the sophistication of the Palantir technology. The company’s tools can manipulate data from any domain where large datasets and complex analyses must be run. The screenshot below comes from the firm’s demonstration of an entity extraction, text processing, and relationship analysis:
A Palantir relationship diagram. Each object is a link making it easy to drill down into the underlying data or documents.
Each object on the display is “live” so you can drill down or run other analyses about that object. The idea is to make data analysis interactive. Most of the vendors of high-end business intelligence systems offer some interactivity, but Palantir has gone further than most firms.
The company has a Web log, and it seems to be updated with reasonable frequency. The Web log does a good job of pointing out some of the features of the firm’s software. For example, I found this discussion of the Palantir monitoring server quite useful. The Web site emphasizes the visualization capabilities of the software. The Web log digs deeper into the innovations upon which the graphics rest.
Be careful when you run a Google query for Palantir. There are several firms with similar names. You will want to navigate to www.palantirtech.com. You may find yourself at another Palantir when you want the business intelligence firm.
Stephen Arnold, March 24, 2009
Google: Economy Slowing Juggernaut’s Tie Ups
March 24, 2009
Reuters’ Alexei Oreskovic’s “Google Deal Machine Adjusts to Slow Times” caused several readers to send me links to the story. You can read it here if you haven’t already seen the article. The main idea is that the lousy economy is having an adverse impact on Google. The examples in the write up focused on Google deals. For me, the most interesting comment in the article was this sentence:
Rumors have recently swirled about Google having its eye on social-networking upstart Twitter and online travel agency Expedia (EXPE.O).
Reuters’ sources run counter to what I have read elsewhere and commented upon in this Web log. Google has seemed indifferent to Twitter, the real time search upstart. Now, Twitter has an ad experiment of sorts underway with its news pals. The Wall Street Journal reported that Twitter has a deal with Microsoft and Federated Media. You may or may not be able to reach the WSJ story here. Those dead tree folks are quite protective of their content as part of their ossification and petrifaction processes, which is definitely less useful than indifference in my opinion.
So, who is correct? Reuters? The Wall Street Journal? My hunch is that Google is looking for bargains. I can’t recall what the MBAs in New York call this type of deal hunt. Bottom something?
Stephen Arnold, March 25, 2009
Google Slowing Down, Sitting on the Sidelines
March 24, 2009
IDC has been showing some zip. Two articles caught my attention because both point out vulnerabilities in this formidable company. You must read both of these articles. They were:
- The ComputerWorld story “Pentaho and Amazon.com Deliver BI to the Cloud” here. The story reported that Amazon, the cloud computing retailer, hooked up with Pentaho. The goal is to deliver business intelligence. How is this germane to Google? In my opinion, Google is not in this game. The company’s failure to respond to Amazon’s cloud computing challenge underscores the fact that Google is not as nimble as Amazon. I was hoping that Eric Lai would have pointed out that Google is simply not at this dance.
- The IDG news service story “Google Apps Missing Enterprise Social-Networking Revolution” here. This story was distributed by Reuters, and it pointed out that Google’s Orkut is not hooked into Google Apps.
Is Google falling behind? In my view, Google is the cat’s meow. To some Google watchers, I think one can make a case that the GOOG is not able to keep pace with some of its more nimble rivals. IDC seems to be on top of this issue.
Stephen Arnold, March 24, 2009
Social Media: Money Flowing
March 24, 2009
A happy quack to the reader who sent me a link to this Mashable story, “Social Media Marketing Budgets on the Rise.” At a time when budgets are under significant pressure, Mashable presented some data from a consulting firm that had some good news about social media; that is, wikis, blogs, video, etc. The most interesting comment in the Mashable write up was this segment:
According to a new study released by Aberdeen Group…, 63 percent of companies plan to increase their social media marketing budgets in 2009, despite the current weakness in the economy. Digging deeper into the numbers, 21 percent of those surveyed plan to increase social media spending by 25 percent or more, while a mere 3 percent plan to shrink their budgets (34 percent responded “no change”).
Mashable includes a nifty and useful chart as well.
Stephen Arnold, March 24, 2009
ISYS Search Software: Google Patent Collection
March 24, 2009
You will want to take a look at the ISYS Search Software demonstration here. The company took my collection of Google patent documents from 1998 to December 2008 and processed them. You can run a key word query, click on the names of people, and explore this window into Google’s technology hot house via the ISYS Search Version 9. When you locate a patent document that interests you, a single click will display the PDF of the patent document. You can browse the drawings and claims with the versatile ISYS system at your beck and call.
I have used the ISYS Search Software since Version 3.0. The system delivers high speed document processing, high speed query processing, and a raft of features. For more information about ISYS Version 9, click here. I have been critical of search systems for more than two decades. ISYS Search Software’s engineers have listened to me, and I know from experience that the teams in Crow’s Nest and in Denver have a long term commitment to their customers and to implementing useful features with each release.
Highly recommended. More information about ISYS Search Software is at http://www.isys-search.com/
Stephen Arnold, March 24, 2009