Open Source: A Bad Fit for Corporations?

November 9, 2015

I read “Corporations and OSS Do Not Mix.” The write up fooled me. I thought the approach was going to be that proprietary software vendors and open source code may find themselves at odds.

I was wrong.

The article explains that open source software and commercial organizations bump into licensing issues and some real world hurdles. The article states:

the joy and enthusiasm that I had when I started working on open source has been flattened. My attitude was naïve at best – this is fun and maybe I’m helping some other people do good and have fun too. This is also how a lot of my friends presently view their projects.

The list of challenges ranges from the selfishness of the commercial enterprise to dumb requests.

I also noted this passage:

Open source software is full of toxic people. This certainly shouldn’t be a surprise at this point. I would guess that it is safe to say that pretty much every person (including myself, I’m certainly not exempt from this) has had bad days and reacted poorly when dealing with the community, contributors, colleagues, etc. These are not excuses and these events can (and often do) shape the behaviors of the community and those observing it.

The article includes a list of positive ideas.

My hunch is that search vendors with proprietary software will become aggressive disseminators of the anti-open source possibilities of this write up.

That’s what makes search and content processing such credible business sectors.

Stephen E Arnold, November 9, 2015

Weekly Watson: A Fast Food Winner May Be Coming

November 9, 2015

I read “Watson’s Melt in Your Mouth Moroccan Almond Curry.” Each time I learn about a new recipe, I realize that IBM has invested in a very promising technology. Instead of a human chef fiddling around in the kitchen or (heaven forbid) attending an inefficient cooking school, let IBM Watson develop new recipes.

Imagine how far the culinary arts would have advanced if, in 1895, Watson instead of non-digital chefs had set up the world famous and incredibly French outfit. Julia Child, when she was not working at her part-time government job, would not have wasted her time trying to generate a baguette in a home kitchen oven.

The write up reports with what seems to be a quite serious tone:

On paper the recipe looked to be leaning toward bland, but its clever combination of all the elements worked. Traditional Moroccan lamb curries have intense flavors highlighted by garlic, onion, sometimes ginger, cinnamon and then sweetened with honey and dried apricot to balance lamb’s strong taste. Here, though, Watson prescribes small amounts of cardamom, cumin, turmeric.

What, no tamarind, which plays a key role in a BBQ sauce, partner?

The article states:

I’ll make this again for sure, though I might consider augmenting the sweetness provided by the tomatoes with some kind of fruit or honey and perhaps throwing whole blanched almonds in during the long cook. If you’ve been waiting for a more “user friendly” Watson dish to serve up to friends, then this is one you should seriously consider.

I know that IBM actually generates revenue from its mainframe business. The recipes created by Watson? Probably not. IBM could open a fast food restaurant and blow the limping KFC and McDonald’s out of the water. I think the IBM goal of $1 billion in revenue may be well served with some piping hot almond curry. An instant fast food winner here in rural Kentucky. Watson can generate a business plan for this as quickly as IBM decided to support an updated version of OS/2.

Stephen E Arnold, November 9, 2015

Data Fusion: Not Yet, Not Cheap, Not Easy

November 9, 2015

I clipped an item to read on the fabulous flight from America to shouting distance of Antarctica. Yep, it’s getting smaller.

The write up was “So Far, Tepid Responses to Growing Cloud Integration Hairball.” I think the words “hair” and “ball” convinced me to add this gem to my in flight reading list.

The article is based on a survey (nope, I don’t have the utmost confidence in vendor surveys). Apparently the 300 IT “leaders” experience

pain around application and data integration between on premises and cloud based systems.

I had to take a couple of deep breaths to calm down. I thought the marketing voodoo from vendors embracing utility services (Lexmark/Kapow), metasearch (Vivisimo, et al), unified services (Attivio, Coveo, et al), and licensees of conversion routines from outfits ranging from Oracle to “search consulting” in the “search technology” business had this problem solved.

If the vendors can’t do it, why not just dump everything in a data lake and let an open source software system figure everything out. Failing that, why not convert the data into XML and use the magic of well formed XML objects to deal with these issues?
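The XML jab can be made concrete. Two perfectly well-formed documents can describe the same record and still require hand-written mapping logic, which is exactly the grunt work the article describes. A minimal sketch, with invented system names and fields:

```python
import xml.etree.ElementTree as ET

# Two well-formed XML fragments describing the same customer, from two
# hypothetical systems. Well-formedness alone does not integrate them;
# someone still has to write (and maintain) the field mappings.

system_a = ET.fromstring("<customer><name>Acme Corp</name><id>42</id></customer>")
system_b = ET.fromstring('<client clientId="42"><companyName>Acme Corp</companyName></client>')

def normalize_a(elem):
    # System A keeps the identifier in a child element.
    return {"id": elem.findtext("id"), "name": elem.findtext("name")}

def normalize_b(elem):
    # System B keeps the identifier in an attribute with a different name.
    return {"id": elem.get("clientId"), "name": elem.findtext("companyName")}

# The "magic" is not in the XML; it is in hand-written mappings like these.
print(normalize_a(system_a))   # {'id': '42', 'name': 'Acme Corp'}
print(normalize_a(system_a) == normalize_b(system_b))   # True
```

The point of the sketch: the moment a third system shows up with its own schema, a third mapping has to be written, and that custom coding is what the survey says shops actually do.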

It seems that the solutions don’t work with the slam dunk regularity of a 23-year-old Michael Jordan.

Surprise.

The write up explains:

The old methods may not cut it when it comes to pulling things together. Two in three respondents, 59%, indicate they are not satisfied with their ability to synch data between cloud and on-premise systems — a clear barrier for businesses that seek to move beyond integration fundamentals like enabling reporting and basic analytics. Still, and quite surprisingly, there isn’t a great deal of support for applying more resources to cloud application integration. Premise-to-cloud integration, cloud-to-cloud integration, and cloud data replication are top priorities for only 16%, 10% and 10% of enterprises, respectively. Instead, IT shops make do with custom coding, which remains the leading approach to integration, the survey finds.

My hunch is that the survey finds that hoo-hah is not the same as the grunt work required to take data from A, integrate it with data from B, and then do something productive with the data unless humans get involved.

Shocker.

I noted this point:

As the survey’s authors observe: “companies consistently under estimate the cost associated with custom code, as often there are hidden costs not readily visible to IT and business leaders.”

Reality just won’t go away when it comes to integrating disparate digital content. Neither will the costs.

Stephen E Arnold, November 9, 2015

Photo Farming in the Early Days

November 9, 2015

Have you ever wondered what your town looked like while it was still rural and used as farmland? Instead of having to visit your local historical society or library (although we do encourage you to do so), the United States Farm Security Administration and Office of War Information (known as FSA-OWI for short) developed Photogrammer. Photogrammer is a Web-based image platform for organizing, viewing, and searching farm photos from 1935-1945.

Photogrammer uses an interactive map of the United States, where users can click on a state and then a city or county within it to see the photos from the timeline. The archive contains over 170,000 photos, but only 90,000 have a geographic classification. They have also been grouped by the photographer who took them, although the list is limited to fifteen people. Other than city, photographer, year, and month, the collection can be sorted by collection tags and lot numbers (although these are not discussed in much detail).

While farm photographs from 1935-1945 do not appear to need their own photographic database, the collection’s history is interesting:

“In order to build support for and justify government programs, the Historical Section set out to document America, often at her most vulnerable, and the successful administration of relief service. The Farm Security Administration—Office of War Information (FSA-OWI) produced some of the most iconic images of the Great Depression and World War II and included photographers such as Dorothea Lange, Walker Evans, and Arthur Rothstein who shaped the visual culture of the era both in its moment and in American memory. Unit photographers were sent across the country. The negatives were sent to Washington, DC. The growing collection came to be known as “The File.” With the United State’s entry into WWII, the unit moved into the Office of War Information and the collection became known as the FSA-OWI File.”

While the photos do have historical importance, rather than maintaining a separate database with its small flaws, it would be more useful to incorporate the collection into a larger historical archive, like the Library of Congress, instead of keeping it a pet project.

Whitney Grace, November 9, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Banks Turn to Blockchain Technology

November 9, 2015

Cryptocurrency has come a long way, and now big banks are taking the technology behind Bitcoin very seriously, we learn in “Nine of the World’s Biggest Banks Form Blockchain Partnership” at Re/code. Led by financial technology firm R3, banks are signing on to apply blockchain tech to the financial markets. A few of the banks involved so far include Goldman Sachs, Barclays, JP Morgan, Royal Bank of Scotland, Credit Suisse, and Commonwealth Bank of Australia. The article notes:

“The blockchain works as a huge, decentralized ledger of every bitcoin transaction ever made that is verified and shared by a global network of computers and therefore is virtually tamper-proof. The Bank of England has a team dedicated to it and calls it a ‘key technological innovation.’ The data that can be secured using the technology is not restricted to bitcoin transactions. Two parties could use it to exchange any other information, within minutes and with no need for a third party to verify it. [R3 CEO David] Rutter said the initial focus would be to agree on an underlying architecture, but it had not yet been decided whether that would be underpinned by bitcoin’s blockchain or another one, such as one being built by Ethereum, which offers more features than the original bitcoin technology.”
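The “virtually tamper-proof” claim in the quoted passage rests on a simple mechanism: each block records the hash of the block before it. Here is a toy sketch of that idea; it is not Bitcoin’s or R3’s actual design (no mining, no network, no consensus), just the hash-linking that makes after-the-fact edits detectable:

```python
import hashlib
import json

def block_hash(block):
    # Hash a block's canonical JSON form so the same block always hashes the same.
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def append(chain, data):
    # Each new block records the hash of the previous block.
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"prev": prev, "data": data})

def verify(chain):
    # The chain is valid only if every stored link matches a recomputed hash.
    for i in range(1, len(chain)):
        if chain[i]["prev"] != block_hash(chain[i - 1]):
            return False
    return True

ledger = []
append(ledger, "Alice pays Bob 5")
append(ledger, "Bob pays Carol 2")
print(verify(ledger))                       # True
ledger[0]["data"] = "Alice pays Bob 500"    # tamper with history
print(verify(ledger))                       # False: block 1's link no longer matches
```

Changing any historical entry invalidates every later link, which is why the quoted passage says a shared, replicated copy of the ledger is so hard to falsify quietly.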

Rutter did mention he expects this tech to be used post-trade, not directly in exchange or OTC trading, at least not soon. It is hoped the use of blockchain technology will increase security while reducing costs and errors.

Cynthia Murrell, November 9, 2015

Big Data, Like Enterprise Search, Kicks the ROI Can Down the Road

November 8, 2015

I read “Experiment with Big Data Now, and Worry about ROI Later, Advises Pentaho ‘Guru’.” That’s the good thing about gurus. As long as the guru gets a donation, the ROI of advice is irrelevant.

I am okay with the notion of analyzing data, testing models, and generating scenarios based on probabilities. Good, useful work.

The bit that annoys me is the refusal to accept that certain types of information work is an investment. The idea that fiddling with zeros and ones has a return on investment is—may I be frank?—stupid.

Here’s a passage I noted as a statement from a wizard from Pentaho, a decent outfit:

“There are a couple of business cases you can make for data laking. One is warm storage [data accessed less often than “hot”, but more often than “cold”] – it’s much faster and cheaper to run than a high-end data warehouse. On the other hand, that’s not where the real value is – the real value is in exploring, so that’s why you do at least need to have a data scientist, to do some real research and development.”

The buzzwords, the silliness of “real value,” and “real” research devalue work essential to modern business.

Enterprise search vendors were the past champions of baloney. Now the analytics firms are trapped in the fear of valueless activity.

That’s not good for ROI, is it?

Stephen E Arnold, November 8, 2015

Search Froth: Coveo Swallows Another $35 Million

November 7, 2015

If you believe in search, you will not feel nervous that a 10 year old information access company has required $67.7 million in capital. I don’t believe in search as a million dollar business. I struggle to understand how Coveo will generate enough sales to pay back the $67 million and change.

I read “Coveo Grabs $35 Million Series D.” Here’s a passage I highlighted:

The funding will fuel the expansion of Coveo’s operations and its strategy to become the de facto technology that businesses and enterprise platforms use to recommend the most relevant information, people, products and services for customer and employee engagement. The funding will help speed the launch of additional intelligent search apps for large enterprise technology ecosystems where contextual insights from outside the platform are critical to delivering unified, engaging experiences. Coveo also plans to make further investments in sales and marketing, R&D and product groups, and will also grow its team in those strategic areas of the business.

Yep, intelligent. Search based apps. Customer support. Rah rah.

I also circled this passage:

Coveo is experiencing an extended period of hyper growth, recently announcing many consecutive quarters of record-breaking growth and recent industry recognition. Growth stats include quarterly records for new customers signed, and quarterly revenue bookings that have more than doubled over prior years. In Q3, and for the second consecutive year, Coveo was recognized as the most visionary leader in Gartner’s 2015 Magic Quadrant for Enterprise Search. The company was also named a leader in Big Data Search and Knowledge Discovery by Forrester Research in September.

The write up invokes the Coveo Cloud and uses Attivio’s phrase “unified search.” There are references to content management. There are nods to the mid tier consultants who opine about search technology often without context or hands on experience.

But the subtext is clear: Coveo is in the same game as IBM Watson. But Watson is hitting the same barriers as its search precursors. Making money from a utility function is tough. With open source options, the business proposition supports a consulting business. But cranking out hundreds of millions from search technology remains a very tough job.

The search challenge is meaningful because enterprise search continues to drag around several financial albatrosses. There is the implosion of Convera which not even Allen & Co. could push to supersonic speeds. There is also the small matter of Fast Search & Transfer, its financial missteps, and the $1.2 billion that Microsoft paid for a company which has the distinction of a founder found wanting in the eyes of the law. And I have to mention Autonomy. Yikes, Autonomy.

The point of these examples is to underscore several points:

  1. Investment in search and content processing seems to have a pattern of falling short of the revenue mark or the expectations of the purchasers of these gussied up outfits. Hey, it is history, folks.
  2. Open source search alternatives exist and are gaining wider acceptance. The business model of Elastic is to provide value added services, but the “customer” essentially sells himself or herself.
  3. Wild and crazy IBM wants to reinvent search as cognitive computing. The bet is in the billion dollar range, and I think it faces insurmountable odds because the IBM model is somewhat similar to HP’s plans for search. How many big dogs fit in this kennel?

So what does a company do with tens of millions in funding after 10 years in business?

My hunch is that the investors want Coveo spiffed up and sold to a larger company. Once that deal goes through, the investors will breathe a sigh of relief and move on to the next big thing—maybe investing in luxury safari resorts in Africa.

My hunch is that Coveo, like Attivio, BA Insight, Recommind, and X1, will have a very difficult time hitting Endeca’s revenue number of about $140 million when the company sold to Oracle. Talk about hundreds of millions in revenue is easy. Delivering sustainable, organic revenue requires more than buzzwords, pivots, and incantations from mid tier consulting firms.

In today’s market, selling Coveo and moving on may be the golden egg a decade old goose might be able to lay.

Stephen E Arnold, November 6, 2015

Another Semantic Search Play

November 6, 2015

The University of Washington has been search central for a number of years. Some interesting methods have emerged. From Jeff Dean to Alon Halevy, the UW crowd has been having an impact.

Now another search engine with ties to UW wants to make waves with a semantic search engine. Navigate to “Artificial-Intelligence Institute Launches Free Science Search Engine.” The wizard behind the system is Dr. Oren Etzioni. The money comes from Paul Allen, a co-founder of Microsoft.

Dr. Etzioni has been tending vines in the search vineyard for many years. His semantic approach is described this way:

But a search engine unveiled on 2 November by the non-profit Allen Institute for Artificial Intelligence (AI2) in Seattle, Washington, is working towards providing something different for its users: an understanding of a paper’s content. “We’re trying to get deep into the papers and be fast and clean and usable,” says Oren Etzioni, chief executive officer of AI2.

Sound familiar? Understanding what a sci-tech paper means.

According to the write up:

Semantic Scholar offers a few innovative features, including picking out the most important keywords and phrases from the text without relying on an author or publisher to key them in. “It’s surprisingly difficult for a system to do this,” says Etzioni. The search engine uses similar ‘machine reading’ techniques to determine which papers are overviews of a topic. The system can also identify which of a paper’s cited references were truly influential, rather than being included incidentally for background or as a comparison.

Does anyone remember Gene Garfield? I did not think so. There is a nod to Expert System, an outfit which has been flogging semantic technology in an often baffling suite of software since 1989. (Yep, that works out to more than a quarter of a century.) Hey, few doubt that semantic hoohah has been a go to buzzword for decades.

There are references to the Microsoft specialist search and some general hand waving. The fact that different search systems must be used for different types of content should raise some questions about the “tuning” required to deliver what the vendor can describe as relevant results. Does anyone remember what Gene Garfield said when he accepted the lifetime achievement award in online? Right, did not think so. The gist was that citation analysis worked. Additional bells and whistles could be helpful. But humans referencing substantive sci-tech antecedents was a very useful indicator of the importance of a paper.

I interpreted Dr. Garfield’s comment as suggesting that semantics could add value if the computational time and costs could be constrained. But in an era of proliferating sci-tech publications, bells and whistles were like chrome trim on a ’59 Oldsmobile 98. Lots of flash. Little substance.
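The Garfield point can be reduced to a toy computation: treat each citation a paper receives as a vote on its importance. The papers and links below are invented for illustration, and real citation indexing adds weighting and normalization this sketch omits:

```python
from collections import Counter

# A toy version of Garfield-style citation analysis: importance as a simple
# count of incoming citations. All paper names and links here are invented.

citations = {
    "paper_a": ["paper_c"],              # paper_a cites paper_c
    "paper_b": ["paper_c", "paper_d"],   # paper_b cites paper_c and paper_d
    "paper_d": ["paper_c"],
}

# Count how many times each paper is cited across the whole corpus.
counts = Counter(cited for refs in citations.values() for cited in refs)
print(counts.most_common())   # paper_c leads with three incoming citations
```

The simplicity is the point: no semantic machinery is needed to get a useful importance signal, which is what made citation analysis work long before semantic bells and whistles arrived.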

My view is that Paul Allen dabbled in semantics with Evri. How did that work out? Ask someone from the Washington Post who was involved with the system.

Worth testing the system in comparative searches against high value commercial databases like Compendex, ChemAbs, and similar offerings.

Stephen E Arnold, November 5, 2015

Data Lake and Semantics: Swimming in Waste Water?

November 6, 2015

I read a darned fascinating write up called “Use Semantics to Keep Your Data Lake Clear.” There is a touch of fantasy in the idea of importing heterogeneous “data” into a giant data lake. The result is, in my experience, more like waste water in a pre-treatment plant in Saranda, Albania. Trust me. Distasteful.

The write up invokes a mid tier consultant and then tosses in the fuzzy term governance. We are now on semi solid ground, right? I do like the image of a data swamp which contrasts nicely with the images from On Golden Pond.

I noted this passage:

Using a semantic data model, you represent the meaning of a data string as binary objects – typically in triplicates made up of two objects and an action. For example, to describe a dog that is playing with a ball, your objects are DOG and BALL, and their relationship is PLAY. In order for the data tool to understand what is happening between these three bits of information, the data model is organized in a linear fashion, with the active object first – in this case, DOG. If the data were structured as BALL, DOG, and PLAY, the assumption would be that the ball was playing with the dog. This simple structure can express very complex ideas and makes it easy to organize information in a data lake and then integrate additional large data stores.

Okay.
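To make the quoted subject-predicate-object idea concrete, here is a minimal sketch of triple storage and querying. The names and structure are illustrative only, not any vendor’s actual data lake engine:

```python
# Minimal illustration of the subject-predicate-object "triple" idea from
# the quoted passage. A hypothetical sketch, not a production triple store.

triples = [
    ("DOG", "PLAY", "BALL"),      # the dog plays with the ball
    ("DOG", "IS_A", "ANIMAL"),
    ("BALL", "IS_A", "TOY"),
]

def query(subject=None, predicate=None, obj=None):
    """Return every triple matching the fields that were supplied."""
    return [
        (s, p, o) for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]

# Because the active object comes first, (DOG, PLAY, BALL) and
# (BALL, PLAY, DOG) mean different things -- exactly the point in the quote.
print(query(subject="DOG"))       # everything asserted about DOG
print(query(predicate="IS_A"))    # all type assertions
```

Note what the sketch does not do: nothing here cleans, deduplicates, or reconciles the data before it becomes triples, which is where the garbage in, garbage out problem lives.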

Next I circled:

A semantic data lake is incredibly agile. The architecture quickly adapts to changing business needs, as well as to the frequent addition of new and continually changing data sets. No schemas, lengthy data preparation, or curating is required before analytics work can begin. Data is ingested once and is then usable by any and all analytic applications. Best of all, analysis isn’t impeded by the limitations of pre-selected data sets or pre-formulated questions, which frees users to follow the data trail wherever it may lead them.

Yep, makes perfect sense. But there is one tiny problem. Garbage in, garbage out. Not even modern jargon can solve this decades old computer challenge.

Fantasy is much better than reality.

Stephen E Arnold, November 6, 2015

Self Deception and Web Search

November 6, 2015

It never occurred to me that humans would fool themselves via Web search. I assumed falsely that an individual seeking information would obtain a knowledge pile by reading, conversation with others, and analysis. The idea of using a Web search to get smart never struck me as a good idea. Use of commercial databases to obtain information was a habit I formed at good old Booz, Allen & Hamilton. Ellen Shedlarz, the ace information professional, sort of tolerated my use of the then-expensive, tough-to-use online services. Favorite sources of information for me in the late 1970s were Compendex, ChemAbs, and my old favorite ABI/INFORM.

Imagine my surprise when I read “Googling Stuff Can Cause us to Overestimate our Own Knowledge.” The write up reported:

The main takeaway message of this research is that when we’re called on to provide information without the internet’s help, we need to be aware that we might possess a false sense of security. The most obvious example of how we should apply this is in the run up to a school or university examination. If we only ever prepare for examinations with the internet on hand and don’t take closed book mock tests without the internet’s help, we might not realize until it is too late that information that we think is in our heads actually isn’t.

There you go. False confidence or the Google effect.

From my point of view, the issue is not confined to a particular Web search system. The assumption that anyone can get smart via a query, reading some documents, and answer a question is just one manifestation of entitlement.

The person seeking information assumes that his or her skills are up to the task of figuring out what’s correct, what’s baloney, and what’s important. That assumption is a facet of the gold star mentality. Everyone gets a reward for going through the motions. Participate in a race. That’s the same as winning the race. Answer some multiple choice questions. That’s the same as working out a math problem in long hand.

Unfortunately it takes real work to learn something, understand it, and apply it to achieve a desired result.

Locating a restaurant via a voice search is nifty, but if the restaurant is a rat hole, one’s tummy may rebel.

Search and retrieval is work. Quick example.

In a casual conversation with a doctoral student, I mentioned the Dark Web.

The student told me, “Yes, I plan to dive into the Dark Web and maybe do a training program for executives.”

Good idea, but the person with whom I was speaking has some interesting characteristics:

  • No programming or technical expertise
  • No substantive background in security
  • No awareness of the risks associated with poking around in hidden Web sites.

However, the person has the entitlement quality: the assumption that an unfamiliar topic can be figured out quickly and easily. What could possibly go wrong?

One possibility: Accessing a Dark Web site operated by a law enforcement or intelligence entity.

As I asked, what could possibly go wrong?

Stephen E Arnold, November 6, 2015
