Facebook and Humans: Reality Is Not Marketing

May 16, 2016

I read “Facebook News Selection Is in Hands of Editors Not Algorithms, Documents Show.” The main point of the story is that Facebook uses humans to do the work; algorithms do not seem to play a big part in picking out what’s important.

The write up comes from a “real” journalism outfit. The article points out:

The boilerplate about its [Facebook’s]  news operations provided to customers by the company suggests that much of its news gathering is determined by machines: “The topics you see are based on a number of factors including engagement, timeliness, Pages you’ve liked and your location,” says a page devoted to the question “How does Facebook determine what topics are trending?”

After reading this, I thought of Google’s poetry created by its artificial intelligence system. Here’s the line which came to mind:

I started to cry. (Source: Quartz)

I vibrate with the annoyance bubbling under the surface of the newspaper article. Imagine. Facebook has great artificial intelligence. Facebook uses smart software. Facebook open sources its systems and methods. The company says it is at the cutting edge of replacing humans with objective procedures.

The article’s belief in baloney is fried and served cold on stale bread. Facebook uses humans. The folks at real journalism outfits may want to work through articles like “Different Loci of Semantic Interference in Picture Naming vs. Word-Picture Matching Tasks” to get a sense of why smart systems go wandering.

So what’s new? Palantir Technologies uses humans to index content. Without that human input, the “smart” software does some useful work, but humans are part of the work flow process.

Other companies use humans too. But the marketing collateral and the fizzy presentations at fancy conferences paint a picture of a world in which cognitive, artificially intelligent, smart systems do the work that subject matter experts used to do. Humans, like indexers and editors, are no longer needed.

Now reality pokes its rose tinted fingertips into the real world.

Let me be clear. One reason I am not happy with the verbiage generated about smart software is one simple fact.

Most of the smart software systems require humans to fiddle at the beginning when a system is set up, while the system operates to deal with exceptions, and after an output is produced to figure out what’s what. In short, smart software is not that smart yet.

There are many reasons, but the primary one is that the math and procedures underpinning many of the systems with which I am familiar are immature. Smart software works well when certain caveats are accepted. For example, the vaunted Watson must be trained. Watson, therefore, is not that much different from the training Autonomy baked into its IDOL system in the mid 1990s.

Palantir uses humans for one simple reason. Figuring out, with software, what’s important to a team under fire works much better if the humans with skin in the game provide indexing terms and identify important points, like local names for stretches of highway where bombs can be placed without too much hassle.

Dig into any of the search and content processing systems and you find expenditures for human work. Companies licensing smart systems which index automatically face significant budget overruns, operational problems because of lousy outputs, and piles of exceptions to either ignore or deal with. The result is that the systems marketers pitch with smoke and mirrors to people who want a silver bullet are not exactly able to perform like the carefully crafted demonstrations.

IBM i2 Analyst’s Notebook requires humans. Fast Search (now an earlobe in SharePoint) requires humans. Coveo’s system requires humans. Attivio’s system requires humans. OpenText’s suite of search and content processing requires humans. Even Maxxcat benefits from informed set up and deployment. Out of the box, dtSearch can index, but one needs to know how to set it up and make it work in a specific Microsoft environment. Every search and content processing system that asserts it is automatic is spackling flawed wallboard.

For years, I have given a lecture about the essential sameness of search and content processing systems. These systems use the same well known and widely taught mathematical procedures. The great breakthroughs at SRCH2 and similar firms amount to optimization of certain operations. But the whizziest system is pretty much like the other systems. As a result, these systems perform in a similar manner. These systems require humans to create term lists and look-up tables of aliases for persons of interest, hand-craft taxonomies to represent the chunk of reality the system is supposed to know about, and build other “libraries” and “knowledgebases.”
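
To make that human contribution concrete, here is a minimal sketch of the kind of hand-built alias table these systems lean on. This is my own illustration, not any vendor’s code; the place names and the lookup function are invented. The “smart” step only works because a person typed, and keeps typing, the table.

```python
# A human subject matter expert maintains this table; the software does not.
# Entries are invented for illustration only.
ALIASES = {
    "route irish": "Baghdad Airport Road",
    "bia road": "Baghdad Airport Road",
    "msr tampa": "Main Supply Route Tampa",
}

def normalize_place_name(raw: str) -> str:
    """Map a local or informal name to the canonical index term.

    Falls back to the raw text when no human has added an alias,
    which is exactly where automated indexing quietly degrades.
    """
    key = raw.strip().lower()
    return ALIASES.get(key, raw.strip())

if __name__ == "__main__":
    for phrase in ["Route Irish", "an unnamed stretch of highway"]:
        print(phrase, "->", normalize_place_name(phrase))
```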

Watson is a source of amusement to me precisely because the human effort required to make a smart system work is never converted to cost and time statements. People assume Watson won Jeopardy because it was smart. People assume Google knows what ads to present because Google’s software is so darned smart. People assume Facebook mines its data to select news for an individual. Sure, there is automation of certain processes, but humans are needed. Omit the human and you get the crazy Microsoft Tay system, which humans taught to be crazier than some US politicians.

For decades I have reminded those who listened to my lectures not to confuse what they see in science fiction films with reality. Progress in smart software is evident. But the progress is very slow, hampered by the computational limits of today’s hardware and infrastructure. Just like real time, the concept is easy to say but quite expensive and difficult to implement in a meaningful way. There’s a reason millisecond access to trading data costs so much that only certain financial operations can afford the bill. Smart software is the same.

How about less outrage from those covering smart software and more critical thinking about what’s required to get a system to produce a useful output? In short, more info and less puffery, more critical thinking and less sawdust. Maybe I imagined it, but both the Google and Tesla self-driving vehicles have crashed, right? Humans are essential because smart software is not as smart as those who believe in unicorns assume. Demos, like TV game shows, require pre- and post-production, gentle reader.

What happens when humans are involved? Isn’t bias part of the territory?

Stephen E Arnold, May 16, 2016

Watson Does Cyber Security

May 10, 2016

I heard a rumor that Palantir Technologies has turned down the volume on its cybersecurity initiative. I was interested to learn that IBM is jumping into this niche following the lead of its four star general Thomas “Weakly” Watson.

According to “IBM’s Watson Is Going to Cybersecurity School,” General Watson “announced a new year-long research project through which it will collaborate with eight universities to help train its Watson artificial-intelligence system to tackle cybercrime.”

A number of capable outfits are attacking this market sector. Instead of buying a high octane outfit, IBM is heading back to school. I learned:

This fall, it will begin working with students at universities including California State Polytechnic University at Pomona, Penn State, MIT, New York University and the University of Maryland at Baltimore County along with Canada’s universities of New Brunswick, Ottawa and Waterloo.

Never give up. Forward, march.

Stephen E Arnold, May 10, 2016

Artificial Intelligence Spreading to More Industries

May 10, 2016

According to MIT Technology Review, it has finally happened. No longer is artificial intelligence the purview of data wonks alone—“AI Hits the Mainstream,” they declare. Targeted AI software is now being created for fields from insurance to manufacturing to health care. Reporter Nanette Byrnes is curious to see how commercialization will affect artificial intelligence, as well as how this technology will change different industries.

What about the current state of the AI field? Byrnes writes:

“Today the industry selling AI software and services remains a small one. Dave Schubmehl, research director at IDC, calculates that sales for all companies selling cognitive software platforms —excluding companies like Google and Facebook, which do research for their own use—added up to $1 billion last year. He predicts that by 2020 that number will exceed $10 billion. Other than a few large players like IBM and Palantir Technologies, AI remains a market of startups: 2,600 companies, by Bloomberg’s count. That’s because despite rapid progress in the technologies collectively known as artificial intelligence—pattern recognition, natural language processing, image recognition, and hypothesis generation, among others—there still remains a long way to go.”

The article examines ways some companies are already using artificial intelligence. For example, insurance and financial firm USAA is investigating its use to prevent identity theft, while GE is now using it to detect damage to its aircraft engines’ blades. Byrnes also points to MyFitnessPal, Under Armour’s extremely successful diet and exercise tracking app. Through a deal with IBM, Under Armour is blending data from that site with outside research to help better target potential consumers.

The article wraps up by reassuring us that, despite science fiction assertions to the contrary, machine learning will always require human guidance. If you doubt, consider recent events—Google’s self-driving car’s errant lane change and Microsoft’s racist chatbot. It is clear the kids still need us, at least for now.

 

Cynthia Murrell, May 10, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Semantics Made Easier

May 9, 2016

For fans of semantic technology, Ontotext has a late spring delight for you. The semantic platform vendor Ontotext has released GraphDB 7. I read “Ontotext Releases New Version of Semantic Graph Database.” According to the announcement, set up and data access are easier. I learned:

The new release offers new tools to access and explore data, eliminating the need to know everything about the dataset before start working with it. GraphDB 7 enables users to navigate their way through third-party and any other dataset regardless of data volumes, which makes it a powerful Big Data analytics tool. Ver.7 offers visual exploration of the loaded data schema – ontology, interactive query builder for better entity retrieval, and full support for RDF 1.1 allowing smooth import of a huge number of public Open Data as well as proprietary Linked Datasets.
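
As a rough illustration of what “exploring a dataset you do not fully know” looks like in practice, here is a minimal sketch using Python and the SPARQLWrapper library against a GraphDB SPARQL endpoint. The endpoint URL and the repository name are assumptions for the example, not details from the announcement; the query itself is ordinary SPARQL asking the graph what types of things it holds.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Endpoint URL and repository name ("news") are assumptions for illustration;
# point this at whatever repository your GraphDB instance exposes.
sparql = SPARQLWrapper("http://localhost:7200/repositories/news")
sparql.setReturnFormat(JSON)

# Ask the graph what kinds of entities it contains before committing to a schema.
sparql.setQuery("""
    SELECT ?type (COUNT(?s) AS ?count)
    WHERE { ?s a ?type }
    GROUP BY ?type
    ORDER BY DESC(?count)
    LIMIT 10
""")

results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["type"]["value"], row["count"]["value"])
```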

If you want to have a Palantir-type system, check out Ontotext. The company is confident that semantic technology will yield benefits, a claim made by other semantic technology vendors. But the complexity challenges associated with conversion and normalization of content are likely to be a pebble in the semantic sneaker.

Stephen E Arnold, May 9, 2016

Old Pals Chatting: IDC Expert Chums Up Cognitive Marketing

May 4, 2016

I recall a fellow named Dave Schubmehl. You may recall that name. He was the IDC wizard who ingested my research about open source outfits and then marketed it via Amazon without my permission. Since that go round with my information used without a written agreement with me, I have taken a skeptical view of IDC and its “experts.” I won’t comment on its business practices, administrative acumen, and general ineptitude with regard to publishing a bit of my research as an eight page, $3,500 “analysis.” Yikes. Eight pages at $3,500 for work pumped out on Amazon, the WalMart of the digital world.

I read, therefore, with considerable skepticism “Interview with Rich Vancil: Group VP, Executive Advisory of IDC.” I was not disappointed. Perhaps I should say, my already low expectations were just about met.

The interviewer, according to the interview text, has been an acquaintance of the IDC wizard for decades. Furthermore, the interviewer (obviously an objective type of person) will “meet up to catch up on life outside business.” The article is “old pals chatting.”

What a chat!

I learned that:

The IDC 3rd Platform is a broad term for our present IT industry and economy. It is where 100% of WW IT revenue growth is coming from and it includes the product categories of Mobile; Social; Cloud, and Big Data. The 3rd Platform is eclipsing the 2nd Platform – described broadly as the “last 30 years” of IT, and this has been mainly enterprise computing: Lan / Internet; Client / Server; and premised based infrastructure such as servers, storage, and licensed software.

A third platform. “Platform” is an interesting word. I get the idea of a Palantir platform. I suppose I can get in sync with the Windows 10 platform. But an IDC platform? Well, that’s an idea which would never have floated from the pond filled with mine drainage here in Harrod’s Creek.

A consulting firm is in the business of selling information. A platform exists at outfits like Booz, Allen, McKinsey, and Bain. But the notion that a mid tier outfit has had three platforms intrigues me. When I looked at some of the 1917-1918 reports at Booz, Allen when Ellen Shedlarz ran the information center, the format, the tone, the approach, and the word choice were incorporated into the charm school into which new hires were herded. I could, in a moment of weakness, call Booz, Allen’s systems and methods a platform. But are the words “systems” and “methods” more appropriate?

The other interesting point in the write up was a nifty new diagram which purports to make clear the third platform confection. I know you won’t be able to read the diagram. Buy the report which hopefully is less than the $3,500 slapped on eight pages of my research.

[Image: IDC 3rd Platform diagram]

Source: IDC 2016 at this link. If you find the link dead, just buzz up IDC and order document 01517018. The reports based on my research were 236511, 236514, 236086, and 237410. Buy them all for a mere $14,000.

Notice the blobs. Like another mid tier outfit, IDC finds blobs better than numbers. The reason fuzziness is a convenient graphic device is that addled geese like me ask questions; for example:

  • What data are behind the blobs?
  • What was the sample size?
  • Where did categories like “cognitive marketing” come from?

I have a supposition about the “cognitive” thing. The IDC wizard Dave Schubmehl pumped out lots of tweets about IBM cognitive computing. One IDC executive, prior to seeking a future elsewhere, wrote a book about “cognitive” processes. Both of these IDC experts guzzled the IBM Watson lattes somewhere along the cafeteria line.

Back to the interview between two friends. I learned:

MarTech is a big deal. IDC is doing a very careful accounting of this area and we now account for 78 separate product / service categories and literally thousands of vendors. Like any other emerging and fast growth IT category, consolidation will be inevitable. But in the meantime, it makes for a daunting set of choices for the CMO and team.

I like the word daunting. There is nothing like a list of items which are not grouped in a useful manner to set IDC neural pathways abuzz. But the IDC mavens have cracked the problem. The company has produced a remarkable 2015 technology map. Check this out:

[Image: IDC 2015 marketing technology map]

Source: Expert Interview, 2016

I moved forward in the write up. The daunting problem has contributed to what the interviewer describes as “an awesome conference.” I like that “awesome” thing. How does the write up conclude? There is a reference to golf, the IDC professional’s medical history, and this statement:

The best analysts can simplify, simplify. Analysts who try to impress by using big words and complex frameworks…end up confusing their audience and so they become ineffective.

Remarkable content marketing.

Stephen E Arnold, May 4, 2016

Unicorn Land: Warm Hot Chocolate and a Nap May Not Help

April 25, 2016

In the heady world of the unicorn, there are not too many search and content processing companies. I do read open source information about Palantir Technologies. Heck, I might even wrap up my notes about Palantir Gotham and make them available to someone with a yen to know more about a company which embraces secrecy but has a YouTube channel explaining how its system works.

I was poking around for open source information about how Palantir ensures that a person with a secret clearance does not “see” information classified at a higher level of access. From what I have read, the magic is in time stamps, open source content management, and some middleware. I took a break from reading the revelations from a person in the UK who idled away commute time writing about Palantir and noted “On the Road to Recap: Why the Unicorn Financing Market Just Became Dangerous for All Involved.”

I enjoy “all” type write ups. As I worked through the 5,600 word write up, I decided not to poke fun at the logic of “all” and jotted down the points which struck me as new information and the comments which I thought might be germane to Palantir, a company which (as I document in my Palantir Notebook) successfully completed fast cycles of financing between 2003 and 2015, when the pace appears to have slowed.

There is no direct connection between the On the Road to Recap article and Palantir, and I certainly don’t want to draw explicit parallels. In this blog post, let me highlight some of the passages from the source article and emphasize that you might want to read the original article. If you are interested in search and content processing vendors like Attivio, Coveo, Sinequa, Smartlogic, and others of their ilk, some of the “pressures” identified in the source article are likely to apply. If the write up is on the money, I am certainly delighted to be in rural Kentucky thinking about what to have for lunch.

The first point I noted was new information to me. You, gentle reader, may be MBAized and conversant with the notion of understanding the lay of the land; to wit:

most participants in the ecosystem have exposure to and responsibility for specific company performance, which is exactly why the changing landscape is important to understand.

Ah, reality. I know that many search and content processing vendors operate without taking a big picture view. The focus is on what I call “what can we say to close a deal right now” type thinking. The write up roasts that business school chestnut of understanding life as it is, not as a marketer believes it to be.

I noted this statement in the source article:

Late 2015 also brought the arrival of “mutual fund markdowns.” Many Unicorns had taken private fundraising dollars from mutual funds. These mutual funds “mark-to-market” every day, and fund managers are compensated periodically on this performance. As a result, most firms have independent internal groups that periodically analyze valuations. With the public markets down, these groups began writing down Unicorn valuations. Once more, the fantasy began to come apart. The last round is not the permanent price, and being private does not mean you get a free pass on scrutiny.

Write downs, to me, mean one might lose one’s money.

I then learned a new term, dirty term sheets. Here’s the definition I highlighted in a bilious yellow marker hue:

“Dirty” or structured term sheets are proposed investments where the majority of the economic gains for the investor come not from the headline valuation, but rather through a series of dirty terms that are hidden deeper in the document. This allows the Shark to meet the valuation “ask” of the entrepreneur and VC board member, all the while knowing that they will make excellent returns, even at exits that are far below the cover valuation. Examples of dirty terms include guaranteed IPO returns, ratchets, PIK Dividends, series-based M&A vetoes, and superior preferences or liquidity rights. The typical Silicon Valley term sheet does not include such terms. The reason these terms can produce returns by themselves is that they set the stage for a rejiggering of the capitalization table at some point in the future. This is why the founder and their VC BOD member can still hold onto the illusion that everything is fine. The adjustment does not happen now, it will happen later.

I like rejiggering. I have experienced used car sales professionals rejiggering numbers for a person who once worked for me. Not a good experience as I recall.

I then circled this passage:

One of the shocking realities that is present in many of these “investment opportunities” is a relative absence of pertinent financial information. One would think that these opportunities which are often sold as “pre-IPO” rounds would have something close to the data you might see in an S-1. But often, the financial information is quite limited. And when it is included, it may be presented in a way that is inconsistent with GAAP standards. As an example, most Unicorn CEOs still have no idea that discounts, coupons, and subsidies are contra-revenue.

So what’s this have to do, in my addled brain, with Palantir? I had three thoughts, which are my opinion, and you may ignore them. In fact, why not stop reading now?

  1. Palantir is a unicorn and it may be experiencing increased pressure to generate a right now pay out to its stakeholders. One way Palantir can do this is to split its “secret” business from its Metropolitan business for banks. The “secret” business remains private, and the Metropolitan business becomes an IPO play. The idea is to get some money to keep those who pumped more than $700 million into the company since 2003 sort of happy.
  2. Palantir has to find a way to thwart those in its “secret” work from squeezing Palantir into a niche and then marginalizing the company. There are some outfits who would enjoy becoming the go-to solution for near real time operational intelligence analysis. Some outfits are big (Oracle and IBM), and others are much, much smaller (Digital Reasoning and Modus Operandi). If Palantir pulls off this play, then the government contract cash can be used to provide a sugar boost to those who want some fungible evidence of a big, big pay day.
  3. Palantir has to amp up its marketing, contain overhead, and expand its revenue from non government licenses and consulting.

Is Palantir’s management up to this task? The good news is that Palantir has not done the “let’s hire a Google wizard” to run the company. The bad news is that Palantir had an interesting run of management actions which resulted in a bit of a legal hassle with i2 Group before IBM bought it.

I will continue looking for information about Gotham’s security system and method. In the back of my mind will be the information and comments in On the Road to Recap.

Stephen E Arnold, April 25, 2016

Data Intake: Still a Hassle

April 21, 2016

I read “Big Data’s Biggest Problem: It’s Too Hard to Get the Data In.” Here’s a quote I noted:

According to a study by data integration specialist Xplenty, a third of business intelligence professionals spend 50% to 90% of their time cleaning up raw data and preparing to input it into the company’s data platforms. That probably has a lot to do with why only 28% of companies think they are generating strategic value from their data.

My hunch is that with the exciting hyperbole about Big Data, the problem of normalizing, cleaning, and importing data is ignored. The challenge of taking file A in a particular file format and converting it to another file type is indeed a hassle. A number of companies offer expensive filters to perform this task. The one I remember is Outside In, which sort of worked. I recall that when odd ball characters appeared in the file, there would be some issues. (Does anyone remember XyWrite?) Stellent purchased Outside In in order to move content into that firm’s content management system. Oracle purchased Stellent in 2006.

Then Kapow “popped” on the scene. The firm promoted lots of functionality, but I remember it as a vendor who offered software which could take a file in one format and convert it into another format. Kofax (yep, the scanner oriented outfit) bought Kapow to move content from one format into one that Kofax systems could process. Then Lexmark bought Kofax and ended up with Kapow. With that deal, Palantir and other users of the Kapow technology probably had a nervous moment or are now having a nervous moment as Lexmark marches toward a new owner. EntropySoft, a French outfit, was another file conversion player. It sold out to Salesforce. Once again, converting files from Type A to another desired format seems to have been the motivating factor.

Let us not forget the wonderful file conversion tools baked into software. I can save a Word file as an RTF file. I can import a comma separated file into Excel. I can even fire up Framemaker and save a .fm file as RTF. In fact, many programs offer these import and export options. The idea is to lessen the pain of having a file in one format which another system cannot handle. Hey, for fun, try opening a macro-filled XyWrite file in Framemaker or InDesign. Just change the file extension to one the system thinks it recognizes. This is indeed entertaining.

The write up is not interested in the companies which have sold for big bucks because their technology could make file conversion a walk in the Hounz Lane Park. (Watch out for the rats, gentle reader.) The write up points out three developments which will make the file intake issues go away:

  1. The software performing file conversion “gets better.” Okay, I have been waiting for decades for this happy time to arrive. No joy at the moment.
  2. “Data preparers become the paralegals of data science.” Now that’s a special idea. I am not clear on what a “data preparer” is, but it sounds like a task that will be outsourced pretty quickly to some country far from the home of NASCAR.
  3. “Artificial intelligence” will help cleanse data. Excuse me, but smart software has been operative in file conversion methods for quite a while. In my experience, the exception files keep on piling up.

What is the problem with file conversion? I don’t want to convert this free blog post into a lengthy explanation. I can highlight five issues which have plagued me and my work in file conversion for many years:

First, file types change over time. Some of the changes are not announced. Others, like the Microsoft Word XML thing, are the subject of months-long marketing. The problem is that unless the outfit responsible for the file conversion system creates a fix, the exception files can overrun a system’s capacity to keep track of problems. If someone is asleep at the switch, data in the exception folder can have an adverse impact on some production systems. Loss of data is interesting, but trashing the file structure is a carnival. Who does not pay attention? In my experience, vendors, licensees, third parties, and probably most of the people responsible for a routine file conversion task.

Second, the thrill of XML is that it is not particularly consistent. Somewhere along the line, creativity takes precedence over being well formed. How does one deal with a couple hundred thousand XML files in an exception folder? What do you think about deleting them?

Third, the file conversion software works as long as the person creating a document does not use Fancy Dan “inserts” in the source document. Problems arise from videos, certain links, macros, and odd ball formatting of the source document. Yep, some folks create text in Excel and wonder why the resulting text is a bit of a mess.

Fourth, workflows get screwed up. A file conversion system is semi smart. If a process creates a file with an unrecognized extension, the file conversion system fills the exception folder. But what if a valid extension is changed to a supported but incorrect extension? Yep, XML users, be aware that there are proprietary XML formats. The files converted and made available to a system are “sort of right.” Unfortunately, “sort of right” in mission critical applications can have some interesting consequences.

Fifth, attention to detail is often less popular than fiddling with one’s mobile phone or reading Facebook posts. Human inattention can make large scale data conversion fail. I have watched as a person of my acquaintance deleted the folder of exception files. Yo, it is time for lunch.
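
To see how thin the “automatic” part usually is, here is a minimal sketch of an intake pass that routes files by extension and shovels everything else into an exception folder. It is my own illustration; the supported extensions and directory names are invented. Note that a file whose extension lies about its contents sails straight through, which is the fourth point above.

```python
from pathlib import Path
import shutil

# Extensions this hypothetical pipeline claims to handle. Everything else
# lands in the exception folder that someone is supposed to watch.
SUPPORTED = {".txt", ".rtf", ".csv", ".xml", ".docx"}

def intake(source_dir: str, processed_dir: str, exceptions_dir: str) -> None:
    """Naive intake pass: route files by extension alone."""
    source = Path(source_dir)
    if not source.is_dir():
        print(f"No such intake directory: {source}")
        return

    processed = Path(processed_dir)
    exceptions = Path(exceptions_dir)
    processed.mkdir(parents=True, exist_ok=True)
    exceptions.mkdir(parents=True, exist_ok=True)

    for path in source.iterdir():
        if not path.is_file():
            continue
        # Extension check only: a renamed XyWrite file carrying a .xml
        # extension passes this test and becomes a downstream problem.
        target = processed if path.suffix.lower() in SUPPORTED else exceptions
        shutil.move(str(path), str(target / path.name))

if __name__ == "__main__":
    intake("incoming", "processed", "exceptions")
```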

So what? Smart software makes certain assumptions. At this time, file intake is perceived as a problem which has been solved. My view is that file intake is a core function which needs a little bit more attention. I do not need to be told that smart software will make file intake pain go away.

Stephen E Arnold, April 21, 2016

Digging for a Direction of Alphabet Google

April 21, 2016

Is Google trying to emulate BAE Systems’ NetReveal, IBM i2, and systems from Palantir? Looking back at an older article from Search Engine Watch, “How the Semantic Web Changes Everything for Search,” may provide insight. At the time, the Knowledge Graph had just launched, and along with it came a wave of communications generating buzz about a new era of search: moving from string-based queries to a semantic approach organized around “things.” The write-up explains,

“The cornerstone of any march to a semantic future is the organization of data and in recent years Google has worked hard in the acquisition space to help ensure that they have both the structure and the data in place to begin creating “entities”. In buying Wavii, a natural language processing business, and Waze, a business with reams of data on local traffic and by plugging into the CIA World Factbook, Freebase and Wikipedia and other information sources, Google has begun delivering in-search info on people, places and things.”

The article also noted the Knowledge Graph’s implications for Google’s ability to deliver stronger, more relevant advertising through this semantic approach. Even today, we see the Alphabet Google thing continuing to shift from search to other interesting information access functions in order to sell ads.

 

Megan Feil, April 21, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Interface Design: An Argument for the IBM i2 Approach

April 15, 2016

I read “Why I Love Ugly, Messy Interfaces — and You Probably Do Too.” I have been checking out information about interfaces for augmented intelligence or what I call “cyber OSINT.” The idea I am exploring is how different vendors present information functions to people who are working under pressure. Now the pressure in which I am interested involves law enforcement, intelligence, and staying alive. I am not too worried about how to check the weather on a mobile phone.

The write up points out that

…there is no single right way to do things. There’s no reason to assume that having a lot of links or text on a page, or a dense UI, or a sparse aesthetic is fundamentally bad — those might be fine choices for the problem at hand. Especially if it’s a big, hairy problem. Products that solve big, hairy problems are life savers. I love using these products because they work so damn well. Sure they’re kind of a sprawling mess. That’s exactly why they work!

Consider the IBM i2 Analyst’s Notebook interface. Here’s an example, courtesy of Google Images:

[Image: IBM i2 Analyst’s Notebook interface]

The interface has a menu bar across the top, display panels, and sidebar options. In order to use this application which is called Analyst’s Notebook, one attends classes. Years ago I did a little work for i2 before it became part of IBM. Without regular use of the application, I would forget how to perform certain tasks.

There is a competitor to i2’s Analysts Notebook: Palantir Gotham. Again, courtesy of Google Images, here’s an example of the Palantir Gotham interface:

[Image: Palantir Gotham interface]

The interface includes options in the form of a title bar with icons, a sidebar, and some right click features which display a circular context menu.

At first glance, the principal difference between the two interfaces boils down to color.

There are some significant differences, and these include:

  • Palantir provides more helper and wizard functions. These allow a user to perform many tasks without sitting through five or more days of classroom and hands on instruction.
  • The colors and presentation are more stylish, not exactly a mobile phone app approach but slicker than the Analyst’s Notebook design.
  • The interface automates more functions. Both applications require the user to perform some darned tedious work. But once that work is completed, Gotham allows software to perform some tasks with a mouse click.

My point is that interface choices and functionality have to work together. If the workflows are not assisted by the interface and smart software, simple or complex interfaces will be a barrier to quick, high value work.

When someone is shooting at the person operating the laptop with either of these applications in use, the ability to complete a task without confusion is paramount. Confusing pretty with staying alive is not particularly helpful.

Stephen E Arnold, April 15, 2016

The Database Divide: SQL or NoSQL

April 13, 2016

I enjoy reading about technical issues which depend on use cases. When I read “Big Data And RDBMS: Can They Coexist?”, I thought about the premise, not the article. Information Week is one of those once high-flying, dead tree outfits which have embraced digital. My hunch is that the juicy headline is designed less to speak to technical issues and more to the need to create some traffic.

In my case, it worked. I clicked. I read. I ignored because obviously specific methods exist because there are different problems to solve.

Here’s what I read after the lusted after click:

Peaceful coexistence is turning out to be the norm, as the two technologies prove to be complementary, not exclusive. As much as casual observers would like to see big data technologies win the future, RDBMS (the basis for SQL and database systems such as Microsoft SQL Server, IBM DB2, Oracle, and MySQL) is going to stick around for a bit longer.

So this is news? In an organization, some types of use cases are appropriate for the row and column approach. Think Excel. Others are better addressed with a whizzy system like Cassandra or a similar data management tool.

The write up reported that Codd based systems are pretty useful for transactions. Yep, that is accurate for most transactional applications. But there are some situations better suited to different approaches. My hunch is that is why Palantir Technologies developed its data management middleware AtlasDB, but let’s not get caught in a specific approach.
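
For the row and column side, the classic fit is still the small, consistency-critical transaction. Here is a minimal sketch using Python’s built-in sqlite3 module; the table and figures are invented and reflect no particular vendor’s product. Append-heavy logs, clickstreams, and loosely structured documents are the loads the NoSQL crowd typically soaks up, which is the coexistence point the article makes.

```python
import sqlite3

# Throwaway relational example: the transactional use case Codd-style
# systems still handle well. Values are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100.0), (2, 50.0)])
conn.commit()

try:
    # One atomic transaction: both updates land or neither does.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - 25 WHERE id = 1")
        conn.execute("UPDATE accounts SET balance = balance + 25 WHERE id = 2")
except sqlite3.Error as exc:
    print("transfer rolled back:", exc)

print(conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall())
```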

The write up points out that governance is a good idea. The context for governance is the SQL world, but my experience is that figuring out what to analyze and how to ensure “good enough” data quality is important for the NoSQL crowd as well.

I noted this statement from the wizard “Brown” who authored Data Mining for Dummies:

“Users are not always clear [RDBMS and big data] are different products,” Brown said. “The sales reps are steering them to whatever product they want [the users] to buy.”

Yep, sales. Writing about data can educate, entertain, or market.

In this case, the notion that the two technologies themselves contend for attention does little to help one determine which method to use and when. Marketing triumphs.

Stephen E Arnold, April 13, 2016
