How to Use Watson
August 7, 2015
While there are many possibilities for cognitive computing, what makes an idea a reality is its feasibility and real-life application. The Platform explores “The Real Trouble With Cognitive Computing” and the troubles IBM had (has) trying to figure out what they are going to do with the supercomputer they made. The article explains that before Watson became a Jeopardy celebrity, the IBM folks came up with 8,000 potential experiments for Watson to do, but only 20 percent of them.
The range is small due to many factors, including bug testing, gauging progress with fuzzy outputs, playing around with algorithmic interactions, testing in isolation, and more. This leads to the “messy” way to develop the experiments. Ideally, developers would have a big knowledge model and be able to query it, but that option does not exist. The messy way involves keeping data sources intact; applying natural language processing, machine learning, and knowledge representation; and then distributing the work across an infrastructure.
Here is another key point that makes clear sense:
“The big issue with the Watson development cycle too is that teams are not just solving problems for one particular area. Rather, they have to create generalizable applications, which means what might be good for healthcare, for instance, might not be a good fit—and in fact even be damaging to—an area like financial services. The push and pull and tradeoff of the development cycle is therefore always hindered by this—and is the key barrier for companies any smaller than an IBM, Google, Microsoft, and other giants.”
This is exactly correct! Engineering is not the same as healthcare, and not all computer algorithms transfer over to different industries. One thing to keep in mind is that you can apply different methods from other industries and come up with new methods or solutions.
Whitney Grace, August 7, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
IT Architecture Needs to Be More Seamless
August 7, 2015
IT architecture might appear to be the same across the board, but depending on the industry the standards change. Rupert Brown wrote “From BCBS To TOGAF: The Need For A Semantically Rigorous Business Architecture” for Bob’s Guide and he discusses how TOGAF is the de facto standard for global enterprise architecture. He explains that while TOGAF does have its strengths, its many weaknesses include its reliance on diagrams and the use of PowerPoint to make them.
Brown spends a large portion of the article stressing that information content and model are more important and that a diagram should only be rendered later. He goes on to say that as industries have advanced, the tools have become more complex, and it is very important for there to be a more universal approach to IT architecture.
What is Brown’s supposed solution? Semantics!
“The mechanism used to join the dots is Semantics: all the documents that are the key artifacts that capture how a business operates and evolves are nowadays stored by default in Microsoft or Open Office equivalents as XML and can have semantic linkages embedded within them. The result is that no business document can be considered an island any more – everything must have a reason to exist.”
The reason that TOGAF has not been standardized using semantics is the lack of something to connect various architecture models together. A standardized XBRL language for financial and regulatory reporting would help get the process started, but the biggest problem will be people who make a decent living using PowerPoint (so he claims).
Brown calls for a global reporting standard for all industries, but that is a pie-in-the-sky hope unless the government imposes regulations or all industries have a meeting of the minds. Why? The different industries do not always mesh (think engineering firms vs. a publishing house), and each has its own list of needs and concerns. Why not focus on getting industry standards for one industry rather than across the board?
Whitney Grace, August 7, 2015
Quality and Text Processing: An Old Couple Still at the Altar
August 6, 2015
I read “Why Quality Management Needs Text Analytics.” I learned:
To analyze customer quality complaints to find the most common complaints and steer the production or service process accordingly can be a very tedious job. It takes time and resources.
This idea is similar to the one expressed by Ronen Feldman in a presentation he gave in the early 2000s. My notes of the event record that he reviewed the application of ClearForest technology to reports from automobile service professionals which presented customer comments and data about repairs. ClearForest’s system was able to pinpoint that a particular mechanical issue was emerging. The client responded to the signals from the ClearForest system and took remediating action. The point was that sometime in the early 2000s, ClearForest had built and deployed a text analytics system with a quality-centric capability.
I mention this point because many companies are recycling ideas and concepts which are in some cases long beards. ClearForest was acquired by the estimable Thomson Reuters. Some of the technology is available as open source at Calais.
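The quality-centric idea described above, spotting which issue keeps recurring in free-text complaints or service reports, can be illustrated with a very small sketch. This is a toy keyword tally and not ClearForest's method; the issue names, keywords, and sample complaints are all invented for illustration.

```python
from collections import Counter

# Hypothetical issue keywords, invented for this sketch.
# A production text analytics system would use far richer linguistic rules.
ISSUE_KEYWORDS = {
    "brakes": ("brake", "braking"),
    "transmission": ("transmission", "gearbox"),
    "electrical": ("battery", "wiring"),
}


def tally_issues(complaints):
    """Count how many complaints mention each issue at least once."""
    counts = Counter()
    for text in complaints:
        lowered = text.lower()
        for issue, keywords in ISSUE_KEYWORDS.items():
            if any(keyword in lowered for keyword in keywords):
                counts[issue] += 1
    return counts


# Invented sample data standing in for service reports.
sample = [
    "Brake pedal feels soft after service",
    "Transmission slips in second gear",
    "Grinding noise when braking downhill",
]

print(tally_issues(sample).most_common(1))  # [('brakes', 2)]
```

Even a crude count like this surfaces an emerging issue; the hard part, then as now, is extracting the issue reliably from messy text.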
In search and content processing, the case examples, the lingo, and even the technology have entered what I call the “recycling” phase.
I learned about several new search systems this week. I looked at each. One was a portal, another a metasearch system, and a third a privacy centric system with a somewhat modest index. Each was presented as new, revolutionary, and innovative. The reality is that today’s information highways are manufactured from recycled plastic bottles.
Stephen E Arnold, August 6, 2015
Glossary Shrinks Data Science to 27 Concepts
August 6, 2015
I am not exactly sure how mathematics morphed into Big Data and then into data science. The evolutionary spark is powerful indeed. I came across a Glossary for Data Science. I am interested in term lists. For most fancy math I rely on one of my printed references or a Web site like Mathway.com. I was interested in the MapR Blog’s word list. The field of data science is boiled down to 27 terms (concepts). These are useful, but I will continue to use my copy of the Oxford Concise Dictionary of Mathematics. Brief lists of terms are not as useful as more comprehensive compilations. We are finalizing the Glossary for the forthcoming Dark Web Basics study. The term list has more than a couple dozen entries too.
Stephen E Arnold, August 6, 2015
Rocket AeroText Search: Stretching the Access Concept
August 6, 2015
I did a quick check on AeroText search. I assume that even the most jejune enterprise search expert is familiar with this system. What I noticed is that AeroText now moves beyond search into six separate functions. These reminded me of Fast Search & Transfer’s approach in the 2006-2007, pre-implosion period.
The six functions, which you can read about and request a demo of, are at this link. These are:
- Folio Views. The idea is that basic search and retrieval are provided by Rocket.
- Folio Builder. The idea is that information can be organized into folders for research purposes.
- Folio Publisher. A commercial publishing company can package its information and sell it in digital form.
- Folio Integrator. This is a software development kit.
- NXT Enterprise Server. This is the enterprise centric content processing and search system.
- NXT Professional Publishing Server. This is a “suite for storing, assembling, securing, and distributing content” which includes search.
If you have a copy of one of the first three editions of the Enterprise Search Report I wrote between 2003 and 2006, you will be able to check out the similarities. I present some of the Fast Search nomenclature in this 2012 article.
I find the marketing and positioning of Autonomy and Fast Search interesting. These companies’ themes are as fresh today as they were years ago.
Stephen E Arnold, August 6, 2015
Microsoft Top Execs Reaffirm SharePoint Commitment
August 6, 2015
Doubts still remain among users as to whether or not Microsoft is fully committed to the on-premise version of SharePoint. While on-premise has been a big talking point for the SharePoint Server 2016 release, recent news points to more of a hybrid focus, and more excitement from executives regarding the cloud functions. Redmond Magazine sets the story straight with their article, “Microsoft’s Top Office Exec Affirms Commitment to SharePoint.”
The article sums up Microsoft’s stance:
“Microsoft realizes and has acknowledged that many enterprises will want to use SharePoint Server to keep certain data on premises. At the same time, it appears Microsoft is emphasizing the hybrid nature of SharePoint Server 2016, tying the new on-premises server with much of what’s available via Office 365 services.”
No one can know for sure exactly how to prepare for the upcoming SharePoint Server 2016 release, or even future versions of SharePoint. However, staying up to date on the latest news, and the latest tips and tricks, is helpful. For users and managers alike, a SharePoint feed managed by Stephen E. Arnold can be a great resource. The Web site, ArnoldIT.com, is a one-stop-shop for all things search, and the SharePoint feed is particularly helpful for users who need an easy way to stay up to date.
Emily Rae Aldridge, August 6, 2015
Thunderstone Rumbles About Webinator
August 6, 2015
There is nothing more frustrating than being unable to locate a specific piece of information on a Web site when you use its search function. Search is supposed to be quick, accurate, and efficient. Even if Google search is employed as a Web site’s search feature, it does not always yield the best results. Thunderstone is a company that specializes in proprietary software applications developed specifically for information management, search, retrieval, and filtering.
Thunderstone has a client list that includes, but is not limited to, government agencies, Internet developers, corporations, and online service providers. The company’s goal is to deliver “product-oriented R&D within the area of advanced information management and retrieval,” which translates to them wanting to help their clients find information very, very fast and as accurately as possible. It is the premise of most information management companies. On the company blog it was announced that, “Thunderstone Releases Webinator Web Index And Retrieval System Version 13.” Webinator makes it easier to integrate high quality search into a Web site and it has several new appealing features:
- “Query Autocomplete, guides your users to the search they want
- HTML Highlighting, lets users see the results in the original HTML for better contextual information
- Expanded XML/SOAP API allows integration of administrative interface”
We like the HTML highlighting that offers users the ability to backtrack and see a page’s original information source. It is very similar to old-fashioned research: go back to the original source to check a fact’s veracity.
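For readers unfamiliar with query autocomplete, the first feature in the list above, the basic mechanism can be sketched as a prefix lookup over a sorted list of known queries. This is a generic illustration and says nothing about how Webinator implements the feature; the function name, the query list, and the `limit` parameter are assumptions made for the example.

```python
import bisect


def autocomplete(prefix, sorted_queries, limit=5):
    """Return up to `limit` queries that start with `prefix`.

    Assumes `sorted_queries` is a sorted list of lowercase strings.
    """
    prefix = prefix.lower()
    # Binary search for the first entry that is >= the prefix.
    start = bisect.bisect_left(sorted_queries, prefix)
    matches = []
    for query in sorted_queries[start:]:
        if not query.startswith(prefix):
            break  # Sorted order means no later entry can match.
        matches.append(query)
        if len(matches) == limit:
            break
    return matches


# Invented query log for illustration.
queries = sorted(["search api", "search index", "security", "site map"])

print(autocomplete("sea", queries))  # ['search api', 'search index']
```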
Whitney Grace, August 6, 2015
Hey Google Doubters, Burn This into Your Memory
August 6, 2015
It has been speculated that Google would lose its ad profits as mobile search begins to dominate the search market, but Quartz tells a different story in the article, “Mobile Isn’t Ruining Google’s Search Business After All.” Google’s revenue continues to grow, especially with YouTube, but search remains its main earner.
According to the second-quarter earnings, Google earned $12.4 billion from Google Web sites, a $1.5 billion increase from last year. Google continues to grow on average $1.6 billion per quarter. Being able to maintain continuous growth proves that Google is weathering the mobile search market. Here is some other news: the mobile search revolution is now, not in the future.
“That is, if mobile really was going to squeeze Google’s search advertising business, we probably would have already seen it start by now. Smartphone penetration keeps deepening—with 75% saturation in the US market, according to comScore. And for many top media properties, half of the total audience only visits on mobile, according to a recent comScore report on mobile media consumption.”
There are new actions that could either impede or help Google search, such as deep linking between apps and the Web and predictive information services, but these are still brand new and their full effect has not been determined.
Google refuses to be left behind in the mobile search market and stands to be a main competitor for years to come.
Whitney Grace, August 6, 2015
Social Media Litigation Is on the Rise
August 6, 2015
When you think about social media and litigation, it might seem it would only come up during a civil, domestic, criminal mischief, or even a thievery suit. Businesses, however, rely on social media outlets like Facebook, Twitter, and Instagram to advertise their services, connect with their clients, and increase their Web presence. It turns out that social media is also playing a bigger role not only for social cases, but for business ones as well. The X1 eDiscovery Law and Tech Blog posted about the “Gibson Dunn Report: Number of Cases Involving Social Media Evidence ‘Skyrocket’” and how social media litigation has increased in the first half of 2015.
The biggest issue the post discusses is the authenticity of the social media evidence. A person printing out a social media page or summarizing the content for court does not qualify as sufficient evidence. The big question right now is how to guarantee that social media passes an authenticity test and can withstand the court proceedings.
This is where eDiscovery software comes into play:
“These cases cited by Gibson Dunn illustrate why best practices software is needed to properly collect and preserve social media evidence. Ideally, a proponent of the evidence can rely on uncontroverted direct testimony from the creator of the web page in question. In many cases, such as in the Vayner case where incriminating social media evidence is at issue, that option is not available. In such situations, the testimony of the examiner who preserved the social media or other Internet evidence “in combination with circumstantial indicia of authenticity (such as the dates and web addresses), would support a finding” that the website documents are what the proponent asserts.”
The post then goes into a spiel about how the X1 Social Discovery software can make social media display all the “circumstantial indicia” or “additional confirming circumstances,” for solid evidence in court. What authenticates social media is the metadata and an MD5 checksum, aka “hash value.” What really makes the information sink in is that Facebook apparently has twenty unique metadata fields, which require eDiscovery software to determine authorship and the like. It is key to know that everything leaves a data trail on the Internet, but the average Google search is not going to dig it up.
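To make the “hash value” idea concrete, here is a minimal sketch of how an MD5 checksum recorded at collection time can later show whether captured content has changed. It uses Python's standard `hashlib` module and is not a description of how X1 Social Discovery works; the function names and the sample captured text are invented for illustration.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 checksum of `data` as a hexadecimal string."""
    return hashlib.md5(data).hexdigest()


def matches_recorded_hash(data: bytes, recorded_hash: str) -> bool:
    """Check whether `data` still produces the hash recorded at collection."""
    return md5_hex(data) == recorded_hash


# Invented stand-in for a preserved social media post.
captured = b"Example post text captured on 2015-08-06"
recorded = md5_hex(captured)  # Stored alongside the evidence at collection time.

print(matches_recorded_hash(captured, recorded))         # True
print(matches_recorded_hash(captured + b"!", recorded))  # False
```

Any change to the preserved bytes, even a single character, yields a different checksum, which is why the hash is recorded together with the metadata when the evidence is collected.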
Whitney Grace, August 6, 2015
Google: Technical Debt Has Implications for Some AI Cheerleaders
August 5, 2015
If you are interested in smart software, you may want to read “Machine Learning: the High Interest Credit Card of Technical Debt.” I like the credit card analogy. It combines big costs with what some folks see as a something-for-nothing feature of the modern world.
The write up is important because it makes clear the future cost of using certain machine learning methods. The paper helps explain why search and content processing companies often burn more cash than available.
The paper identifies specific cost points which most MBAs happily ignore or downplay in post mortems of failed search and content processing companies. The whiz kids, both boys and girls, rationalize their failure to deal with shifting boundaries, “dark dependencies,” expensive spaghetti, and the tendency of smart software to sort of drift off center.
There is a fix. It is just darned expensive, like credit card interest when the clueless consumer covers only the interest.
Applying the Google paper to search and content processing vendors, the only positive financial outcome is to sell the dog before it dies. Shift the search and content problem “credit card debt” to some other firm.
Perhaps that helps explain the Lexmark financial challenge and the dismay at Hewlett Packard as the reality of Autonomy dawned on those quick to spend billions.
Worth reading. Well done, Googlers.
Stephen E Arnold, August 5, 2015