Longtop Pumps Up Metadata

April 26, 2011

“Longtop Announces Launch of Upgraded Metadata Management Platform,” reports CNBC. China’s highly successful financial services developer/solutions provider Longtop Financial Technologies Limited is jumping on the metadata bandwagon with its BI.MetaManager V2.0.

Actually, this is an upgrade and expansion, not a brand new product. The company did some custom work in this realm in ’07 and ’08, and deployed version one of BI.MetaManager in 2009 to many of its customers. The article describes the new version:

BI.MetaManager V2.0 offers extended scalability and flexibility for development, improved reliability and user interface, as well as new features such as visualized enterprise data map and cross-platform support of Structured Query Language (SQL) script parsing.

Sounds good. The use of metadata, information about data that is often embedded in the data itself, can be extremely useful when properly managed. Lately, though, many players have been working to capitalize on it; suddenly metadata indexing is the new black. And metadata continues to roil the legal eagles. Is indexing discoverable? Is indexing not discoverable? Who owns metadata? Lawyers will figure this out. In the meantime, indexing helps users; I am not so sure it helps attorneys.

Cynthia Murrell April 26, 2011

Freebie

Google, Traffic, English 101, and an Annoying Panda

April 21, 2011

I read a snippet on my iPad and then the full story in the hard copy of the Wall Street Journal, “Sites Retool for Google Effect.” You can find this story on hard copy page B4 in the version that gets tossed in the wet grass in Harrod’s Creek, Kentucky. Online, not too sure anymore. This link may work. But, then again, maybe not.

The point of the story is that Google has changed its method of determining relevance. A number of sites mostly unfamiliar to me made the point that Google’s rankings are important to businesses. One example was One Way Furniture, an outfit that operates in Melville, New York. Another was M2commerce LLC, an office supply retailer in Atlanta, Georgia. My takeaway from the story is that these sites’ owners are going to find a way to deliver content that Google perceives as being relevant.

image

A panda attack. Some Web site owners suffer serious wounds. Who are these Web site owners trying to please? Google or their customers? Image source: http://tomdoerr.wordpress.com/2011/03/25/whos-in-the-house-panda-in-da-house/

I don’t want to be too much like my auto mechanic here in Harrod’s Creek, but what about the customer? My thought is that if one posts information, these outfits should ask, “What does our customer need to make an informed decision?” The Wall Street Journal story left me with the impression, which is probably incorrect, that the question should be, “What do I need to create so Google will reward me with a high Google rank?”

For many years I have been avoiding search engine optimization. When I explained how some of Google’s indexing “worked” on lecture tours for my 2004-2005 Google monograph, The Google Legacy, pesky SEO questions kept popping up. Google has done a reasonable job of explaining how its basic voting mechanism worked. For those of you who were fans of Jon Kleinberg, you know that Google was influenced to some extent by Clever. There are other touch points in the Backrub/Google PageRank methods disclosed in the now famous PageRank patent. Not familiar with that document? You can find a reasonable summary on Wikipedia or in my The Google Legacy.
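The core “voting” idea behind PageRank can be sketched in a few lines. The toy link graph and iteration count below are invented for illustration, and the 0.85 damping factor is the value discussed in the original PageRank paper; none of this is Google’s production configuration:

```python
# Minimal PageRank sketch over a toy three-page link graph.
# Each page "votes" for the pages it links to; a page's rank is the
# share of votes it collects, plus a small teleportation term.
links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
}
pages = list(links)
n = len(pages)
damping = 0.85  # probability the "random surfer" follows a link

rank = {p: 1.0 / n for p in pages}  # start everyone equal

for _ in range(50):  # power iteration; ranks settle quickly
    new_rank = {p: (1 - damping) / n for p in pages}
    for page, outlinks in links.items():
        share = rank[page] / len(outlinks)  # split rank across outbound links
        for target in outlinks:
            new_rank[target] += damping * share
    rank = new_rank

print(sorted(rank.items(), key=lambda kv: -kv[1]))
```

Pages that collect “votes” from other well-ranked pages float to the top. Everything Google has layered on since modifies the weights around this skeleton rather than replacing it.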

If we flash forward from 1996, 1997, and 1998 to the present, quite a bit has happened to relevance ranking in the intervening 13 to 15 years. First, note that we are talking more than a decade. The guts of PageRank remain but the method has been handled the way my mother reacted to a cold day. She used to put on a sweater. Then she put on a light jacket. After adding a scarf, she donned her heavy wool coat. Underneath, it was my mom, but she added layers of “stuff” to keep her warm.

image

All wrapped up, just slow moving with reduced vision. Layers have an operational downside.

That’s what has happened, in part, to Google. The problem with technology is that if you build a giant facility, it becomes difficult, time consuming, and expensive to tear big chunks of that facility apart and rebuild it. The method of change in MBA class is to draw a couple of boxes, babble a few buzzwords, get a quick touch of Excel fever, and then head to the squash court. The engineering reality is that the MBA diagrams get implemented incrementally. Eventually the desired rebuild is accomplished, but at any point, there is a lot of the original facility still around. If you took an archaeology class for something other than the field trips, you know that humans leave foundations, walls, and even gutters in place. The discarded material is then recycled in the “new” building.

How does this apply to Google? It works the same way.

How significant are the changes that Google has made in the last few months? The answer is, “It depends.”

Google has to serve a number of different constituencies. Each constituency has to be kept happy and the “gravity” of each constituency carefully balanced. Algorithms, even Google algorithms, are still software. Software, even smart software that scurries to a look up table to get a red hot value or weight, is chock full of bugs, unknown dependencies, and weird actions that trigger volleyball games or some other mind clearing activity.

image

Google has to make progress and keep its different information “packages” in balance and hooked up.

The first constituency is the advertiser. I know you think that giant companies care about “you” and “your Web site”, but that is just not true. I don’t care about individuals who have trouble using the comments section of this blog. If a user can’t figure something out, what am I supposed to do? Call WordPress and tell them to fix its comments function because one user does not know how to fill in a Web form? I won’t do that. WordPress won’t do that. I am not confident you, gentle reader, would do that. Google has to fiddle with its relevance method because there are some very BIG reasons to take a step as risky and uncharted as slapping another layer of functionality on top of the aging PageRank method. My view is that Google is concerned enough to fool with plumbing because of its awareness that the golden goose of AdWords and AdSense is honking in a manner that signals distress. No advertisers, no Google. Pretty simple equation, but that’s one benefit from living in rural Kentucky. I can only discern the obvious.


Asia Technical Services

April 20, 2011

An Interview with Patrick and Jean Garez

In Hong Kong in late March 2011, I met with one of the senior officers of Asia Tech. The company’s official name is “Asia Technical Services Pte Ltd.” I learned about the company from Dassault Exalead. For eight years Asia Tech has been the partner for Exalead in Asia and, after the acquisition, has become the “go-to” resource for the Dassault Systèmes team covering South Asia regarding Exalead. Based in Singapore, Asia Tech is hours away from Dassault clients in Thailand, China, and Vietnam, among other countries whose thirst for Dassault technology continues to increase. In my initial conversation with Jean Garez, the person who appears to be the heir apparent to the firm his father founded, I learned that Asia Tech is now responding to a surge of inquiries about Exalead’s search based applications.

image

Patrick (founder) and Jean Garez (senior manager), Asia Technology Services Pte Ltd.

Upon my return to the US, I followed up with Mr. Garez via Skype for a lengthier discussion. Patrick Garez joined the interview on the call. For convenience, I have merged the comments from both Garezes into one stream. The full text of that interview appears below:

What’s the history of Asia Tech?

Asia Technical Services Pte Ltd was first conceived in Hong Kong in 1974 by our founder, and my father, Patrick Garez. The original business was the marketing and after-sales support of products, engineering services and asset management solutions to the commercial aviation industry. My father was a pioneer because he was among the first to predict the growth potential of commercial aviation in the Asia Pacific region and to identify Singapore as the future hub for South East Asia and beyond.

Along the way ATS tackled some industry-specific software solutions supporting various maintenance data management, engineering processes and workflows, but it wasn’t until 2003 that ATS officially began distributing software solutions as a dedicated part of our business.

What triggered the shift?

Client demand. ATS has prided itself on responding to the needs of its clients across this region. Once we started doing work in a different area, word of mouth sent additional projects our way.

ATS focuses on finding leading edge innovative and cost effective ISV solutions from Europe and the US and offering them a platform to enter into the Asia Pacific market with a limited investment.

And your activity in search?

Same path.

In the mid-2000s up until probably 2009, the search market in Singapore and the region was dominated by legacy platforms built with a 1980s approach to key word indexing and information retrieval. There was some interest in the SPSS and SAS approach to structured data, of course.

However, in response to a client project, we came across a technologically advanced company in Paris, France. The founder was a member of the original Digital Equipment AltaVista.com search team, and the company was making significant progress with technology that was scalable and very, very speedy. In addition, Exalead was deploying a lighter, automated semantic engine that did the thinking for the user by automatically categorizing and providing structure to unstructured data. We tapped them for our client project. From then on, we knew we were going to see great things from them. We continued to follow and participate in the growth of this company from its incubation phase until its acquisition in 2010 by Dassault Systèmes. ATS remains its partner for the region.


Euro Lecture: Domains and Boundaries in Digital Information

April 15, 2011

I keep getting letters from various government officials asking for my write up of a public lecture I gave in Spain about a year ago. I email and I then get snail mail letters explaining that my document did not arrive. I think the lecture will be accessible worldwide if I reproduce the text with only some redaction and updating in this blog. Herewith is another version of my formal presentation and analysis of information domains which collide, morph, and evolve. The key point is that by “jumping up a level” even established leaders find that the boundaries have changed. In math, one goes from a simple 1 + 1 problem to an n-space problem. If you disagree with me as much as some conference organizers, use the comments section of this Web log. Don’t send me snail mail or an email. Like some government entities, I don’t receive this type of communication. Brave new world and all that.

When Domains Collide, Boundaries Shift

In the ancient world, crossing a frontier triggered mixed emotions. Fear of the unknown or the threat of brigands defined some voyagers’ experience. There was excitement, evoked by real or imagined adventures in crossing boundaries. Leaving the familiar world of one’s home for a vacation in another country can, for some, heighten one’s senses and stimulate the appetite for adventure. The question becomes, “Where are you?” Look at the figure in the box. No room to move. Look at the figure in a hypercube. Movement is possible. Which is the reality of digital information?

image

In a box and trapped? Or, room to move?

image

Perception and defining boundaries become more important than ever before.

Boundaries: Real or Imagined?

Those engaged in the information industries today are also trying to cope with boundaries. Few of these feature hard lines of demarcation. When Caesar crossed the Rubicon, the symbolic action committed him to a course of action that rippled through the ruling elite of Rome. Entrepreneurs like Richard Rosenblatt, certainly no Julius Caesar, crossed from the land of MySpace.com into content production. His approach tapped individuals, often with little or no formal journalistic training, to create content. Thousands upon thousands of articles flowed from Demand Media into the firm’s Web sites and on to his clients’ Web sites. Though ignored by the “real” publishing community, Demand Media is poised for an initial public offering, introducing consulting services delivered by individuals who are not “real” consultants, and generating millions of clicks from Web sites like eHow.com and Cracked.com. Demand Media now is contemplating additional services which are similar to those offered by professional publishing companies and consulting firms. When I briefed a publishing company earlier this year, I mentioned Demand Media. I asked who was familiar with the firm. No one in attendance knew much about the company.

The issue of crossing a border, more specifically, the space between something well-known and something not-so-well-known is the focus of this essay. Of particular interest is the intersection of two different domains. Thinking broader than a college student taking her first trip to Paris, I want to explore what happens when digital spaces bump together. The boundaries of these intersections are in my opinion ripe with opportunities.

To give the inquiry some handholds, I will discuss the domains of traditional information and non-traditional production. In some ways, there is a significant financial stake in the boundary between these two domains. Each has its leaders and foot soldiers. Each has a method of working. Each has a mission. Each has a business model or models. What makes the intersection worthy of comment is that the collision of the traditional and non-traditional information worlds is an important pivot point.

In the traditional versus non-traditional confrontations, I am not certain which “side” will win. Maybe neither will triumph? The costs of the collision may be so high that both sides fall, spent from the battle. Let’s look at an example of domains in collision.

Before World War One, transportation was expensive. For most Americans and Europeans, horses and mules were the Chevrolets and Hondas of the era. By the end of World War One, automobiles captured the fancy of the public. With that shift, MBAs learned that buggy whip manufacturers should have been able to manufacture seat covers for the horseless carriage. According to business school lore, the bright and agile would thrive. The proprietors who did not adapt had to find their future elsewhere. Sounds good, doesn’t it? Much of the US MBA cant has a similar lilt. The financial improprieties and the gasping economy make many aware of the shortcomings of MBA thinking. The domain of traditional financial conservatism died under the Hummers driven by the top man at Bear Stearns or by the daredevil Bernie Madoff.

The point is that when domains collide—whether horses and automobiles or business methods based on trust with more facile and fluid approaches—unexpected consequences occur. The boundary at the intersection of domains that collide is one of uncertainty, opportunity, and risk. Winners and losers often look at their fate and wonder, “What happened?”

image
Demand Media has been a winner. Let me use financial payout as a yardstick. Business Week magazine was the American version of the highly regarded Economist. Bloomberg purchased Business Week for about $5 million. Associated Content, an information factory similar to Demand Media, sold to Yahoo for 15, maybe 18, times more than Business Week. That works out to $90 million versus $5 million. Associated Content and Demand Media produce bulk content for online consumption. If I measure quality in terms of dollars, is Business Week a lower-value product when viewed in economic terms? Is the reasoned and sonorous writing of Business Week less successful than the crunchy, semi-professional outputs from hundreds of anonymous writers? The lesson from this transaction does not require a sleek, sharp-pencil MBA to explain.


Why SEO Is in a Bind

April 4, 2011

In New York City, I gave a breezy 15 minute lecture about “content with intent.” The main point was that traditional search engine optimization methods are now under attack. On one hand, the Web indexing systems have had to admit that SEO distorts results lists. Examples range from links to auto generated placeholder pages such as the one at www.usseek.com or links to sites not related to the user’s query.

Google has made significant changes to its method of relevance ranking. You can read about the infamous Panda update to the PageRank algorithm in these articles:

Blekko.com’s approach has been more direct. The company introduced filtering of sites. For more information about the Blekko method, read “Blekko Banning Some Content Farm Sites.”

The larger problem can be seen by running a free comparison on www.compete.com. Enter the urls for Bing, Facebook, Google, Twitter, and Yahoo in the search box on this page. If the traffic from Facebook and Twitter is combined, the future traffic winner will not be a traditional Web search engine. Keep in mind that Compete.com’s data may be different from the data your Web analytics system uses.

image

SEO experts and service providers may find themselves hemmed in by changes such as Google’s Panda algorithm tweak.

The real problem for traditional search engine optimization service providers comes from a combination of factors, not a single factor. While Google’s Panda update has disrupted some Web sites’ traffic, a number of other forces are altering the shape of SEO. These include:

  • A social system which allows a user to recommend a good source of information is performing a spontaneous and, in most cases, no-cost editorial function or curation activity. A human has filtered sources of information and is flagging a particular source with a value judgment. The individual judgment may be flawed, but over time the social method will provide a useful pool of information sources. Delicious.com was an early example of how this type of system worked.
  • The demographics of users are changing. As younger people enter the datasphere, these individuals are likely to embrace different types of information retrieval methods. Traditional Web search is similar to running a query on a library’s online public access catalog. The new demographic uses mobile devices and often has a different view of the utility of a search box.
  • The SEO methods have interacted with outfits that generate content tailored to what users look for. When Lady Gaga is hot, content farms produce information about Lady Gaga. Over the last five years, producing content that tracks what people are searching for has altered search results. The content may be fine, but the search engines’ relevance ranking methods often skew the results, making a user spend more time digging through a results list.
  • Google, as well as other online search systems, is essentially addicted to online advertising revenue. Despite the robust growth of online advertising, Google has to find a way to generate the revenue it needs to support its brute force Web indexing and search system AND keep its stakeholders happy. With search results getting less relevant, advertisers may think twice about betting on Google’s PageRank and Oingo-based AdWords system.


The FTC, Google and the Buzz

March 30, 2011

I read “Google Will Face Privacy Audits For The Next 20 Long Years.” The Federal Trade Commission has under its umbrella the mechanism to trigger privacy audits of Google’s practices for the next 20 years. Okay. Two decades. The matter fired off the launch pad in February 2010 and, if the story is spot on, landed with a ruling in March 2011. Here’s the passage that caught my attention:

As the FTC put it, “Although Google led Gmail users to believe that they could choose whether or not they wanted to join the network, the options for declining or leaving the social network were ineffective.”

I think this means that Google’s judgment was found lacking. The notion of just doing something and apologizing if that something goes wrong works in some sectors. The method did not seem to work in this particular situation, however.

I noted this passage in the article:

Google has formally apologized for the whole mess, saying “The launch of Google Buzz fell short of our usual standards for transparency and user control—letting our users and Google down.”

Yep. Apologies. More about those at the Google blog. Here’s the passage of Google speak I found fascinating:

User trust really matters to Google.

For sure. No, really. You know. Really. Absolutely.

I am not sure I have an opinion about this type of “decision”. What strikes me is that if a company cannot do exactly what it wants, that company may be hampered to some degree. On the other hand, a government agency which requires a year to make a decision seems to be operating at an interesting level of efficiency.

What about the users? Well, does either of the parties to this legal matter think about the user? My hunch is that Google wants to get back to the business of selling ads. The FTC wants to move on to weightier matters. The user will continue with behaviors that fascinate economists and social scientists.

In a larger frame, other players move forward creating value. Web indexing, ads, and US government intervention may ultimately have minimal impact at a 12 month remove. Would faster, more stringent action have made a more significant difference? Probably, but not now.

Maybe Google and the FTC will take Britney Spears’s advice:

“My mum said that when you have a bad day, eat ice-cream. That’s the best advice,”

A modern day Li Kui for sure. For sure. No, really.

Stephen E Arnold, March 30, 2011

Freebie unlike some of life’s activities

OpenText Joins Semantic Web Race

March 25, 2011

Nstein, the Quebec-based content management vendor recently acquired by Open Text, announced the release of a new version of its popular Semantic Navigation software in a notice on the company’s blog, “Open Text Semantic Navigation Now Available.” The write up presents a lengthy laundry list of features and functions.

Boiling the article down to a sentence or two proved difficult. We believe that OpenText now offers a crawling and indexing system that supports faceted navigation. But there is an important twist. The semantic tool has a search engine optimization and sentiment analysis component as well. The article asserts:

[A licensee can] enrich content–including huge volumes of uncategorized content–by automatically analyzing and tagging it with metadata to help discern relevant and insightful keywords, topics, summaries, and sentiments.
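The enrichment the quote describes can be sketched in miniature: pull candidate keywords and a crude sentiment label out of raw text. This is emphatically not OpenText’s actual pipeline; the stop word and sentiment lists below are invented for illustration:

```python
# Toy sketch of metadata enrichment: derive keywords and a crude
# sentiment label from raw text. The word lists are invented examples,
# not any vendor's actual lexicons.
from collections import Counter
import re

STOPWORDS = {"the", "a", "and", "of", "to", "is", "in", "it", "this"}
POSITIVE = {"great", "reliable", "fast"}
NEGATIVE = {"slow", "broken", "buggy"}

def enrich(text):
    words = re.findall(r"[a-z]+", text.lower())
    content = [w for w in words if w not in STOPWORDS]
    # Top terms by frequency stand in for "relevant and insightful keywords".
    keywords = [w for w, _ in Counter(content).most_common(3)]
    score = sum(w in POSITIVE for w in content) - sum(w in NEGATIVE for w in content)
    sentiment = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return {"keywords": keywords, "sentiment": sentiment}

doc = "The new release is fast and reliable. Search is fast, indexing is fast."
print(enrich(doc))
```

A commercial system replaces the word lists with trained models and controlled vocabularies, but the shape of the output, a record of tags attached to uncategorized content, is the same.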

The list of features and functions is lengthy. Additional public information is available at this link, but you will need an OpenText user name and password to access the content.

If the product performs according to the descriptions in the source article, a number of OpenText’s competitors will be faced with significant competition.

Stephen E Arnold, March 25, 2011

Freebie

Access Innovations and IEEE Team Up

March 20, 2011

Access Innovations has cultivated a solid relationship with the Institute of Electrical and Electronics Engineers, the foundation of which seems to be their Data Harmony software series.

Access Innovations is one of the leaders in indexing, controlled vocabulary development, and taxonomies. The company has a long, successful track record in helping organizations such as IEEE develop thesauri and controlled vocabularies, and it offers proprietary software that can perform automatic content tagging.

IEEE, which is responsible for close to a third of the technical publications circulated around the globe, has now sought the firm’s help in revamping how its Xplore library catalogues the massive amounts of data stored within.

Access Innovations said:

To complete the latest project, Access Innovations used an implementation of Data Harmony Metadata Extractor to determine the article’s content type and then built an improved rules base to identify content types in order for each type to be indexed in a specific way using the IEEE Thesaurus.

Access Innovations’ system gives users the ability to extract information from the source, compiling a fresh record in the process. This marks yet another lucrative venture for the 33-year-old company, which services a variety of academic institutions and government agencies.
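The workflow quoted above, a rules base that identifies a record’s content type so each type can be indexed in its own way, can be sketched as follows. The rules, content types, and indexing depths are hypothetical stand-ins, not Data Harmony’s actual rules:

```python
# Hypothetical sketch of a rules base: classify a record's content type,
# then route each type to its own indexing treatment. All rules and
# labels here are invented for illustration.
import re

TYPE_RULES = [
    ("standard", re.compile(r"\bIEEE Std\b")),
    ("conference_paper", re.compile(r"\bProceedings\b", re.IGNORECASE)),
    ("journal_article", re.compile(r"\bTransactions on\b")),
]

def classify(record_text):
    # First matching rule wins; unmatched records fall through to "unknown".
    for content_type, pattern in TYPE_RULES:
        if pattern.search(record_text):
            return content_type
    return "unknown"

def index_record(record_text):
    content_type = classify(record_text)
    # Stand-in for "each type to be indexed in a specific way".
    depth = {
        "standard": "clause-level",
        "conference_paper": "abstract-only",
        "journal_article": "full-text",
    }.get(content_type, "title-only")
    return {"type": content_type, "indexing": depth}

print(index_record("IEEE Transactions on Software Engineering, vol. 12"))
```

The value of the approach is that the rules base is inspectable and tunable, which is what distinguishes an editorially controlled system from a black box.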

Micheal Cory, March 20, 2011

Freebie

Facebook, Semantic Search, and Bad News for the Key Word Crowd

March 16, 2011

You can wade through the baloney from the pundits, satraps, and poobahs. I will cut to the chase. Facebook can deliver a useful search service without too many cartwheels. There are three reasons. (If you want to complain, that’s what the comments section of the blog permits. Spare me personal email and LinkedIn comments.)

First, there are upwards of 500 million users who spend more time in Facebook doing Facebook things than I would have ever believed. I don’t do “social”, but 500 million or more people see me as a dinosaur watching the snowflakes. Fine.

Second, the Facebook users stuff links in their posts, pages, wall crannies, and everywhere else in the Facebook universe they can. This bunch of urls is a selection filter that is of enormous value to Facebook users. Facebook gets real people stuffing in links without begging, paying, or advertising. The member-screened and identified links just arrive.

Third, indexing the content on the pages to which the links refer produces an index that is different from and for some types of content more useful to Facebook members than laundry lists, decision engine outputs, or faceted results from any other system. Yep, “any other”. That situation has not existed since the GOOG took the learnings of the key word crowd, bought Oingo, and racked up the world’s biggest online advertising and search engine optimization operation in the history of digital mankind.

Navigate to “New Facebook Patent: the Huge Implications of Curated Search” and learn Bnet’s view of a patent document. I am not as excited about the patent as the Bnet outfit is, but it is interesting. If one assumes that the patent contributes to the three points I identified above, Facebook gets a boost.

But my view is that Facebook does not need much in the way of a boost from semantics or any other hot trend technology. Facebook is sitting on a search gold mine. When Facebook does release its index of member-provided sources, four things will take place over a period of “Internet” time.

  1. The Google faces a competitor able to index at lower cost. Google, remember, is a brute force operation. Facebook is letting the members do the heavy lifting. A lower cost index of Facebook-member-vetted content is going to be a threat. The threat may fizzle, but a threat it will be to the Google.
  2. Users within Facebook can do “search” where Facebook members prefer to be. This means that Facebook advertising offers some interesting opportunities not lost on the Xooglers who now work at Facebook and want a gigantic payday for themselves. Money can inspire certain types of innovation.
  3. Facebook is closed. The “member” thing is important to keep in mind. The benefits of stateful actions are many, and you don’t need me to explain why knowing who a customer is, who the customer’s friends are, and what the customer does is important. But make the customer a member and you get some real juice.
  4. Facebook competitors will have to find a way to deal with the 500 million members and fast. Facebook may not be focused on search, but whatever the company does will leverage the membership, not the whizzy technology.
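The “members do the heavy lifting” argument running through these four points can be sketched as a simple vote count: every URL a member posts is an implicit endorsement, and tallying the posts yields a ranked, member-vetted link index with no crawler in sight. The posts, members, and URLs below are invented:

```python
# Sketch of a member-vetted link index: every URL a member posts is
# treated as a vote, and tallying the votes ranks the shared links.
# The data and field names are invented for illustration.
from collections import Counter

posts = [
    {"member": "alice", "urls": ["http://example.com/a", "http://example.com/b"]},
    {"member": "bob",   "urls": ["http://example.com/a"]},
    {"member": "carol", "urls": ["http://example.com/a", "http://example.com/c"]},
]

# No brute force crawl needed: members supply and implicitly vet the links.
votes = Counter(url for post in posts for url in post["urls"])
ranked = [url for url, _ in votes.most_common()]
print(ranked[0])  # the most-shared link rises to the top
```

That is the cost asymmetry in point one: Google pays to crawl and rank the Web, while Facebook’s members hand over a pre-filtered link stream for free.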

Bottom line: Facebook has an opportunity in search whether it does laundry lists, facets, semantics, or any combination of methods. My question is, “When will Facebook drop its other social shoe?”

Stephen E Arnold, March 16, 2011

Freebie unlike the ads big companies will want to slap into Facebook outputs for its members

Metadata Are Important. Good to Know.

March 16, 2011

I read “When it Comes to Securing and Managing Data, It’s all about the Metadata.” The goslings and I have no disagreement about the importance of metadata. We do prefer words and phrases like controlled term lists, controlled vocabularies, classification systems, indexing, and geotagging. But metadata is hot so metadata the term shall be.

There is a phrase that is useful when talking about indexing and the sorts of things in our preferred terms list. That phrase is “editorial policy.” Today’s pundits, former English majors, and unemployed Webmasters like the word “governance.” I find the word disconcerting because “governance” is unfamiliar to me. The word is fuzzy and, therefore, ideal for the poobahs who advise organizations unable to find content on the reasons for the lousy performance of one or more enterprise search systems.

The article gallops through these concepts. I learned about the growing issue of managing and securing structured and semi-structured data within the enterprise. (Isn’t this part of security?) I learned that collaborative content technologies are on the increase, which is an echo of locking a file that several people edit in an authoring system.

I did notice this factoid:

IDC forecasts that the total digital universe volume will increase by a factor of 44 in 2020. According to the report, unstructured data and metadata have an average annual growth rate of 62 percent. More importantly, high-value information is also skyrocketing. In 2008, IDC found that 22 to 33 percent of the digital universe was high-value information (data and content that are governed by security, compliance and preservation obligations). Today, IDC forecasts that high-value information will comprise close to 50 percent of the digital universe by the end of 2020.

There you go. According to the article, metadata framework technology is a large part of the answer to this problem: collecting user and group information, permissions information, access activity, and sensitive content indicators.
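As a back-of-the-envelope check on the IDC factoid, assuming the 44x growth of the digital universe spans roughly the eleven years from 2009 to 2020:

```python
# Back-of-the-envelope check on the IDC numbers. The eleven-year span
# (2009 to 2020) is my assumption, not stated in the report excerpt.
years = 11
factor = 44

# Implied compound annual growth rate for the whole digital universe.
cagr = factor ** (1 / years) - 1
print(f"implied CAGR: {cagr:.0%}")  # roughly 41% per year

# The 62 percent annual rate quoted for unstructured data and metadata
# compounds to a far larger multiple over the same span.
unstructured_factor = 1.62 ** years
print(f"62%/year over {years} years: about {unstructured_factor:.0f}x")
```

In other words, the unstructured-plus-metadata slice is growing much faster than the digital universe as a whole, which is consistent with the report’s claim that high-value information will approach half the total by 2020.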

My view is to implement an editorial policy for content. Skip the flowery and made-up language. Get back to basics. That would be what I call indexing, a component addressed in an editorial policy. Leave the governance to the government. The government is so darn good at everything it undertakes.
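A controlled vocabulary of the sort an editorial policy specifies can be illustrated in miniature: map free-text variants onto preferred terms and tag content only with terms the policy allows. The vocabulary and synonym mappings below are invented examples:

```python
# Tiny illustration of indexing against a controlled vocabulary, the
# "back to basics" approach the post argues for. Terms and synonym
# rings are invented examples, not any real thesaurus.
CONTROLLED_TERMS = {"metadata", "indexing", "taxonomy"}
SYNONYMS = {  # map free-text variants onto the preferred term
    "meta-data": "metadata",
    "tagging": "indexing",
    "classification": "taxonomy",
}

def assign_terms(text):
    tokens = text.lower().replace(",", " ").split()
    assigned = set()
    for token in tokens:
        token = SYNONYMS.get(token, token)  # normalize variants
        if token in CONTROLLED_TERMS:
            assigned.add(token)
    return sorted(assigned)

print(assign_terms("Meta-data tagging beats ad hoc classification"))
```

The point is editorial control: every tag that lands on a document is one a human editor sanctioned in advance, which is exactly what a fuzzy word like “governance” tends to obscure.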

Stephen E Arnold, March 16, 2011

Freebie
