Connexica (Formerly Ardentia NetSearch) Embraces Business Analytics
December 31, 2016
You may remember Ardentia NetSearch. The company’s original product was NetSearch, which was designed to be quick to deploy and built for the end user, not the information technology department. The company changed its name to Connexica in 2001. I checked the company’s Web site and noted that the company positions itself this way:
Our mission is to turn smart data discovery into actionable information for everyone.
What’s interesting is that Connexica asserts that
“search engine technology is the simplest and fastest way for users to service their own information needs.”
The idea is that if one can use Google, one can use Connexica’s systems. A brief description of the company states:
Connexica is the world’s pioneer of search based analytics.
The company offers Cxair. This is a Java based Web application. The application provides search engine based data discovery. The idea is that Cxair permits “fast, effective and agile business analytics.” What struck me was the assertion that Cxair is usable with “poor quality data.” The idea is to create reports without having to know the formal query syntax of SQL.
The company’s MetaVision product is a Java based Web application that “interrogates database metadata.” The idea, as I understand it, is to use MetaVision to help migrate data into Hadoop, Cxair, or ElasticSearch.
Connexica, partly funded by Midven, is a privately held company based in the UK. The firm has more than 200 customers and more than 30 employees. When updating my files, I noted that Zoominfo reports that the firm was founded in 2006, but that conflicts with my file data, which pegs the company as operating as early as 2001.
A quick review of the company’s information on its Web site and open sources suggests that the firm is focusing its sales and marketing efforts on health care, finance, and government customers.
Connexica is another search vendor which has performed a successful pivot. Search technology is secondary to the company’s other applications.
Stephen E Arnold, December 31, 2016
Study of Search: Weird Results Plus Bonus Errors
December 30, 2016
I was able to snag a copy of “Indexing and Search: A Peek into What Real Users Think.” The study appeared in October 2016, and it appears to be the work of IT Central Station, which is an outfit described as a source of “unbiased reviews from the tech community.” I thought, “Oh, oh, ‘real users.’” A survey. An IDC type or Gartner type sample which, although suspicious to me, seems to convey some useful information when the moon is huge. Nope. Nope. Unbiased. Nope.
Note that the report is free. One can argue that free does not translate to accurate, high value, somewhat useful information. I support this argument.
The report, like many of the “real” reports I have reviewed over the decades, is relatively harmless. In terms of today’s content payloads, the study fires blanks. Let’s take a look at some of the results, and you can work through the 16 pages to double check my critique.
First, who are the “top” vendors? This list reveals quite a bit about the basic flaw in the “peek.” The table below presents the list of “top” vendors along with my comment about each vendor. Companies with open source Lucene/Solr based systems are in dark red. Companies or brands which have retired from the playing field in professional search are in bold gray.
Vendor | Comment |
Apache | This is not a search system. It is an open source umbrella for projects of which Lucene and Solr are two projects among many. |
Attivio | Based on Lucene/Solr open source search software; positioned as a business intelligence vendor |
Copernic | A desktop search and research system based on proprietary technology from the outfit known as Coveo |
Coveo | A vendor of proprietary search technology now chasing Big Data and customer support |
Dassault Systèmes | Owns Exalead which is now downgraded to a utility with Dassault’s PLM software |
Data Design, now Ryft.com | Pitches search without indexing via proprietary “circuit module” method |
Data Gravity | Search is a utility in a storage centric system |
DieselPoint | Company has been “quiet” for a number of years |
Expert System | Publicly traded and revenue challenged vendor of a metadata utility, not a search system |
Fabasoft | Mindbreeze is a proprietary replacement for SharePoint search |
Google | Discontinued the Google Search Appliance and exited enterprise search |
Hewlett Packard Enterprise | Sold its search technology to Micro Focus; legal dispute in progress over alleged fraud |
IBM OmniFind | Lucene and proprietary scripts plus acquired technology |
IBM StoredIQ | Like DB2 search, a proprietary utility |
ISYS Search Software | Now owned by Lexmark and marginalized due to alleged revenue shortfalls |
Lookeen | Lucene based desktop and Outlook search |
Lucidworks | Solr add ons with floundering to be more than enterprise search |
MAANA | Proprietary search optimized for Big Data |
Microsoft | Offers multiple search solutions. The most notorious are Bing and Fast Search & Transfer proprietary solutions |
Oracle | Full text search is a utility for Oracle licenses; owns Artificial Linguistics, Triple Hop, Endeca, RightNow, InQuira, and the marginalized Secure Enterprise Search. Oh, don’t forget command line querying via PL/SQL |
Polyspot, now CustomerMatrix | Now a customer service vendor |
Siderean Software | Went out of business in 2008; a semantic search outfit |
Sinequa | Now a Big Data outfit with hopes of becoming the “next big thing” in whatever sells |
X1 Search | An eternal start up pitching eDiscovery and desktop search with a wild and crazy interface |
What’s the table tell us about “top” systems? First, the list includes vendors not directly in the search and retrieval business. There is no differentiation among the vendors repackaging and reselling open source Lucene/Solr solutions. The listing is a fruit cake of desktop, database, and unstructured search systems. In short, the word “top” does not do the trick for me. I prefer “a list of eclectic and mostly unknown systems which include a search function.”
The report presents 10 bar charts which tell me absolutely nothing about search and retrieval. The bars appear to be a popularity contest based on visits to the author’s Web site. Only two of the search systems listed in the bar chart have “reviews.” Autonomy IDOL garnered three reviews and Lookeen one review. The other eight vendors’ products were not reviewed. Autonomy and Lookeen could not be more different in purpose, design, and features.
The report then tackles the “top five” search systems in terms of clicks on the author’s Web site. Yep, clicks. That’s a heck of a yardstick because what percentage of the clicks came from humans and what percentage was bot driven? No answer, of course.
The most popular “solutions” illustrate the weirdness of the sample. The number one solution is DataGravity, which is a data management system with various features and utilities. The next four “top” solutions are:
- Oracle Endeca – eCommerce and business intelligence and whatever Oracle can use the ageing system for
- The Google Search Appliance – discontinued with a cloud solution coming down the pike, sort of
- Lucene – open source, the engine behind Elasticsearch, which is quite remarkably not on the list of vendors
- Microsoft Fast Search – included in SharePoint to the delight of the integrators who charge to make the dog heel once in a while.
I find it fascinating that DataGravity (1,273) garnered more than 3X the “votes” of Microsoft Fast Search (404). I think there are more than 200 million SharePoint licensees. Many of these outfits have many questions about Fast Search. I would hazard a guess that DataGravity has a tiny fraction of the SharePoint installed base and its brand identity and company name recognition are a fraction of Microsoft’s. Weird data or meaningless.
The bulk of the report is a set of comparisons of various search engines. I could not figure out the logic of the comparisons. What, for example, do Lookeen and IBM StoredIQ have in common? Answer: Zero.
The search report strikes me as a bit of silliness. The report may be an anti sales document. But your mileage will differ. If it does, good luck to you.
Stephen E Arnold, December 30, 2016
Google May Erase Line Between History and Real Time
December 30, 2016
Do you remember where you were or what you searched the first time you used Google? This investors.com author does and shares the story about that, in addition to the story about what may be the last time he used Google. The article entitled Google Makes An ‘Historic’ Mistake reports on the demise of a search feature on mobile. Users may no longer search published dates in a custom range. It was accessed by clicking “Search tools” followed by “Any time”. The article provides Google’s explanation for the elimination of this feature,
On a product forum page where it made this announcement, Google says:
After much thought and consideration, Google has decided to retire the Search Custom Date Range Tool on mobile. Today we are starting to gradually unlaunch this feature for all users, as we believe we can create a better experience by focusing on more highly-utilized search features that work seamlessly across both mobile and desktop. Please note that this will still be available on desktop, and all other date restriction tools (e.g., “Past hour,” “Past 24 hours,” “Past week,” “Past month,” “Past year”) will remain on mobile.
The author critiques Google, saying this move forces users back to the dying desktop for a feature no longer prioritized on mobile. The point appears to be missed in this critique. The feature was not heavily utilized. With the influx of real-time data, who needs history, and who needs time limits? Certainly not a Google mobile search user.
Megan Feil, December 30, 2016
Machine Intelligence Logo Collection Thing
December 29, 2016
The idea informing “The Current State of Machine Intelligence” is a good one. Take a sector of interest and identify the companies involved in that sector. Then cluster the companies by categories. The result, however, is another unreadable logo collection. Here’s an image from the write up:
The source of the image is Bloomberg Beta via O’Reilly and finally to Harrod’s Creek from the I Am Wire Web site. My hunch is that one is supposed to chase down Bloomberg Beta or one of the other intermediaries and enter into some type of relationship to get a readable version of the logo collection.
The natural language grouping under “stack” looks interesting, but I can’t read the names of the vendors. I thought Wired Magazine’s early experiments with low contrast, overly busy design would lead to legible information. Obviously I was once again off the mark. I found the inclusion of open source libraries interesting. Perhaps linking the libraries to the consulting firms specializing in these libraries would be helpful. But maybe the Bloomberg Beta wizards already did that, and I can’t figure out from the mosaic of logos and hippy dippy colors if the information is displayed.
I noted that “search” does not appear in the enterprise intelligence or the enterprise functions categories. I assume that the Beta team knows that one can’t have customer service without search. So much effort into a diagram which is impenetrable.
Stephen E Arnold, December 29, 2016
Internet Watch Foundation Teams with Blockchain Forensics Startup
December 29, 2016
A British charity is teaming up with an online intelligence startup specializing in Bitcoin. The Register reports on this in their piece called, Bitcoin child abuse image pervs will be hunted down by the IWF. The Internet Watch Foundation, with the help of a UK blockchain forensics start-up, Elliptic, aims to identify individuals who use Bitcoin to purchase child abuse images online. The IWF will provide Elliptic with a database of Bitcoin addresses and Elliptic takes care of the rest. We learned,
The IWF has identified more than 68,000 URLs containing child sexual abuse images. UNICEF Malaysia estimates two million children across the globe are affected by sexual exploitation every year. Susie Hargreaves, IWF CEO, said, “Over the past few years, we have seen an increasing amount of Bitcoin activity connected to purchasing child sexual abuse material online. Our new partnership with Elliptic is imperative to helping us tackle this criminal use of Bitcoin.” The collaboration means Elliptic’s clients will be able to automatically monitor transactions they handle for any connection to proceeds of child sex abuse.
Machine learning and data analytics technologies are used by Elliptic to collect actionable evidence for law enforcement and intelligence agencies. The interesting piece of this technology, and others like it, is that it runs perhaps as surreptitiously in the background as those who use the Dark Web and Bitcoin for criminal activity believe they do.
Megan Feil, December 29, 2016
Smarter Content for Contentier Intelligence
December 28, 2016
I spotted a tweet about making smart content smarter. It seems that if content is smarter, then intelligence becomes contentier. I loved my logic class in 1962.
Here’s the diagram from this tweet. Hey, if the link is wonky, just attend the conference and imbibe the intelligence directly, gentle reader.
The diagram carries the identifier Data Ninja, which echoes Palantir’s use of the word ninja for some of its Hobbits. Data Ninja’s diagram has three parts. I want to focus on the middle part:
What I found interesting is that instead of a single block labeled “content processing,” the content processing function is broken into several parts. These are:
A Data Ninja API
A Data Ninja “knowledgebase,” which I think is an iPhrase-type or TeraText type of method. Not familiar with iPhrase and TeraText? Feel free to browse the descriptions at the links.
A third component in the top box is the statement “analyze unstructured text.” This may refer to indexing and such goodies as entity extraction.
The second box performs “text analysis.” Obviously this process is different from the “analyze unstructured text” step; otherwise, why run the same analyses again? The second box performs what may be clustering of content into specific domains. This is important because a “terminal” in transportation may be different from a “terminal” in a cloud hosting facility. Disambiguation is important because the terminal may be part of a diversified transportation company’s computing infrastructure. I assume Data Ninja’s methods handle this parsing of “concepts” without many errors.
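To make the “terminal” problem concrete, here is a minimal sketch of domain disambiguation by counting cue words. This is a toy, not Data Ninja’s method, which the diagram does not disclose. The `DOMAIN_CUES` word lists and the `guess_domain` function are hypothetical names invented for this illustration.

```python
# Hypothetical illustration only: a toy cue-word classifier, NOT Data Ninja's method.
# The cue lists below are assumptions made up for this example.
DOMAIN_CUES = {
    "transportation": {"bus", "ferry", "freight", "passengers", "cargo", "port"},
    "computing": {"server", "cloud", "login", "ssh", "hosting", "rack"},
}


def guess_domain(text: str) -> str:
    """Return the domain whose cue words overlap most with the text."""
    # Lowercase each token and strip surrounding punctuation.
    words = {w.strip(".,;:!?\"'").lower() for w in text.split()}
    # Score each domain by the number of distinct cue words present.
    scores = {domain: len(words & cues) for domain, cues in DOMAIN_CUES.items()}
    best = max(scores, key=scores.get)
    # With no cue words at all, refuse to guess.
    return best if scores[best] > 0 else "unknown"


print(guess_domain("Freight and passengers wait at the ferry terminal."))
print(guess_domain("Open a terminal on the cloud hosting server."))
```

A sentence that mixes cues from both domains, such as a transportation company’s data center, is exactly where a simple count like this breaks down, which is why the disambiguation step matters.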
Once the selection of a domain area has been performed, the system appears to perform four specific types of operations as the Data Ninja practice their katas. These are the smart components:
- Smart sentiment; that is, is the content object weighted “positive” or “negative”, “happy” or “sad”, or green light or red light, etc.
- Smart data; that is, I am not sure what this means
- Smart content; that is, maybe a misclassification because the end result should be smart content, but the diagram shows smart content as a subcomponent within the collection of procedures/assertions in the middle part of the diagram
- Smart learning; that is, the Data Ninja system is infused with artificial intelligence, smart software, or machine learning (perhaps the three buzzwords are combined in practice, not just in diagram labeling?)
The end result is an iPhrase-type representation of data. (Note that this approach infuses TeraText, MarkLogic, and other systems which transform unstructured data to metadata tagged structured information.)
The diagram then shows a range of services “plugging” into the box performing the functions referenced in my description of the middle box.
If the system works as depicted, Data Ninjas may have the solution to the federation challenge which many organizations face. Smarter content should deliver contentier intelligence or something along that line.
Stephen E Arnold, December 28, 2016
Cybersecurity Technologies Fueled by Artificial Intelligence
December 28, 2016
With terms like virus being staples in the cybersecurity realm, it is no surprise the human immune system is the inspiration for the technology fueling one relatively new digital threat defense startup. In the Tech Republic article, Darktrace bolsters machine learning-based security tools to automatically attack threats, more details and context about Darktrace’s technology and positioning were revealed. Founded in 2013, Darktrace recently announced they raised $65 million to help fund their expansion globally. Four products, including their basic cyber threat defense solution called Darktrace, comprise their product suite. The article expands on their offerings:
Darktrace also offers its Darktrace Threat Visualizer, which provides analysts and CXOs with a high-level, global view of their enterprise. Darktrace Antigena complements the core Darktrace product by automatically defends against potential threats that have been detected, acting as digital “antibodies.” Finally, the Industrial Immune System is a version of Darktrace designed for Industrial Control Systems (ICS). The key value provided by Darktrace is the fact that it relies on unsupervised machine learning, and it is able to detect threats on its own without much human interaction.
We echo this article’s takeaway that machine learning and other artificial intelligence technologies continue to grow in the cybersecurity sector. The attention on AI is only building in this industry and others. Perhaps AI is particularly well suited to cybersecurity because its behind-the-scenes nature mirrors that of Dark Web related crimes.
Megan Feil, December 28, 2016
Friendly Advice from the Three Amigos: Facebook, Google, and LinkedIn
December 27, 2016
Has anyone noticed how friendly Facebook, Google, and LinkedIn are becoming? Committee work, chats at conferences, write ups in “real” journalist-type articles. Remarkable.
I read “Google, LinkedIn, and Facebook Suggest a Focus on Mobile before Looking into AI.” The title puzzled me. The idea, it seems, is that those who are NOT Facebook, Google, and LinkedIn (Microsoft) should not get too excited about artificial intelligence. I interpreted this to mean, “Let us go fast. You folks just do the mobile app thing.”
The write up gathers a number of alleged statements made before an audience in Australia. Here are five statements and my observation about each:
The Three Amigo Statements | Beyond Search Observation |
A Googler said that getting distracted by “shiny” is not good. | What do Google acquisitions and the X projects explore? Right. Shiny stuff. That’s for Google, not for the non Google people. |
Focus on mobile. | Yep, Facebook and Google have a big chunk of mobile. Stay away from AI |
Do the basics. | Again. Stick with bread and butter. Don’t think like Facebook, Google, and LinkedIn |
Learn about customer behavior. | Too bad most outfits don’t have the data to analyze as the Three Amigos do |
Mobile is important | Yep, just do mobile, people. |
Very helpful. Not a shred of arrogance or condescending thoughts.
Stephen E Arnold, December 27, 2016
Now Watson Wants to Be a Judge
December 27, 2016
IBM has deployed Watson in many fields, including the culinary arts, sports, and medicine. The big data supercomputer can be used in any field or industry that creates a lot of data. Watson, in turn, will digest the data and, depending on the algorithms, spit out results. Now IBM wants Watson to take on the daunting task of judging, says The Drum in “Can Watson Pick A Cannes Lion Winner? IBM’s Cognitive System Tries Its Arm At Judging Awards.”
According to the article, judging is a cognitive process and requires special algorithms, not to mention the bias of certain judges. In other words, it should be right up Watson’s alley (perhaps the results will be less subjective as well). The Drum decided to put Watson to the ultimate creative test and fed Watson thousands of previous Cannes films. Then Watson predicted who would win a Cannes Lion in the Outdoor category this year.
This could change the way contests are judged:
The Drum’s magazine editor Thomas O’Neill added: “This is an experiment that could massively disrupt the awards industry. We have the potential here of AI being able to identify an award winning ad from a loser before you’ve even bothered splashing out on the entry fee. We’re looking forward to seeing whether it proves as accurate in reality as it did in training.”
I would really like to see this applied to the Academy Awards, which are often criticized for their lack of diversity and for a voting body consisting of older, white men. It would be great to see if Watson would yield different results than what the Academy actually selects.
Whitney Grace, December 27, 2016
HonkinNews: Second Google Legacy Video Now Available
December 27, 2016
The seven minute video — Google: The Calculating Predator Legacy — presents findings from Stephen E Arnold’s monograph about the Google system from 2004 to 2007. The company changed from a friendly Web search system into an enterprise focused on revenues and profit as a publicly traded company.
Topics covered in the video include the Google computing platform, key acquisitions like Keyhole and Transformic, the two pivot points for Google’s cost and technology advantages, and the business strategy of the “new” Google, Version 2.0.
Look for Part 3: Google: The Digital Gutenberg on January 3, 2017.
Kenny Toth, December 27, 2016