September 23, 2016
Remember ShrinkyDinks. Kids decorate pieces of plastic. The plastic then gets smaller when heated. I believe the ShrinkyDink management process has been disclosed. The innovator? Marissa Mayer, the former Google search guru turned business management maven.
What’s the ShrinkyDink approach to running a business? Take a revenue stream, decorate it with slick talk, and then reduce revenues and reputation. The result is a nifty entity with less value. Bad news? No. The upside is that Vanity Fair puts a positive spin on how bad news just get worse. A purple paradox!
ShrinkyDink Management. Pop business thinking into a slightly warmed market and watch those products and revenues become tinier as you watch in real time. Small is beautiful, right? I can envision a new study from Harvard University’s business school on the topic. Then comes an HBR podcast interview with Marissa Mayer, the Xoogler behind the ShrinkyDink method. A collaboration with Clayton Christensen is on deck. A book. Maybe a movie deal with Oliver Stone? As a follow up to “Snowden,” Stone writes, produces, and directs “Marissa: Making Big Little.” The film stars Ms. Mayer herself as the true Yahoo.
I read “Yahoo Verizon Deal May Be Complicated by Historic Hack.” Yahoo was “hacked,” according to the write up. Okay, but I read “hack” as a synonym for “We did not have adequate security in place.”
The write up points out:
The biggest question is when Yahoo found out about the breach and how long it waited to disclose it publicly, said Keatron Evans, a partner at consulting firm Blink Digital Security. (Kara Swisher at Recode reported that Verizon isn’t happy about Yahoo’s disclosures about the hack.)
CNBC points out that fixing the “problem” will be expensive. The write up includes this statement from the Xoogler run Yahoo:
“Such events could result in large expenditures to investigate or remediate, to recover data, to repair or replace networks or information systems, including changes to security measures, to deploy additional personnel, to defend litigation or to protect against similar future events, and may cause damage to our reputation or loss of revenue,” Yahoo warned.
Of interest to me is the notion that information about 500 million users was lost. The date of the problem seems to be about two years ago. My thought is that information about the breach took a long time to be discovered and disclosed.
Along the timeline was the sale of Yahoo to Verizon. Verizon issued a statement about this little surprise:
Within the last two days, we were notified of Yahoo’s security incident. We understand that Yahoo is conducting an active investigation of this matter, but we otherwise have limited information and understanding of the impact. We will evaluate as the investigation continues through the lens of overall Verizon interests, including consumers, customers, shareholders and related communities. Until then, we are not in position to further comment.
I highlighted in bold the two points which snagged my attention:
First, Verizon went through its due diligence and did not discover that Yahoo’s security had managed to lose 500 million customers’ data. What’s this say about Yahoo’s ability to figure out what’s going on in its own system? What’s this say about Yahoo management’s attention to detail? What’s this say about Verizon’s due diligence processes?
Second, Verizon seems to suggest that if its “interests” are not served, the former Baby Bell may want to rethink its deal to buy Yahoo. That’s understandable, but it raises the question, “What was Verizon’s Plan B if Yahoo presented the company with a surprise?” It seems there was no contingency, which is complementary with its approach to due diligence.
The decision making process at Yahoo has been, for me, wonky for a long time. The decision to release the breach information after the deal process and before the Verizon deal closes strikes me as an interesting management decision.
September 12, 2016
Who knew? There have been suggestions that Alphabet Google manipulates search results. But the disclosure of a “clever plan to stop aspiring ISIS recruits” makes clear one thing: Alphabet Google can manipulate to some degree what a person thinks and how that person may then behave.
To get the details, navigate to Wired, the truth speaker for the technical aficionados. The article is “Google’s Clever Plan to Stop Aspiring ISIS Recruits.” Let’s visit some of the factoids in the article. I, of course, believe everything I read online.
Alphabet Google used to have an outfit called Google Ideas. Ideas, in my book, are a dime a dozen. The key is converting and idea to action and then shaping the idea to generate revenue. The Google Ideas group donned a new moniker, Jigsaw. According to the write up:
Jigsaw, the Google-owned tech incubator and think tank—until recently known as Google Ideas—has been working over the past year to develop a new program it hopes can use a combination of Google’s search advertising algorithms and YouTube’s video platform to target aspiring ISIS recruits and ultimately dissuade them from joining the group’s cult of apocalyptic violence. The program, which Jigsaw calls the Redirect Method and plans to launch in a new phase this month, places advertising alongside results for any keywords and phrases that Jigsaw has determined people attracted to ISIS commonly search for. Those ads link to Arabic- and English-language YouTube channels that pull together preexisting videos Jigsaw believes can effectively undo ISIS’s brainwashing—clips like testimonials from former extremists, imams denouncing ISIS’s corruption of Islam, and surreptitiously filmed clips inside the group’s dysfunctional caliphate in Northern Syria and Iraq.
This paragraph is mildly interesting and presents weaponized information in a matter of fact, what’s the big deal way. Consider these points:
- Search ad numerical recipes and videos. Quite a combination.
- Redirect. Send folks a different place from the place they really want to go.
- Undo brainwashing. Now that’s an interesting concept. Isn’t brainwashing a tough nut to crack. Cults, Jim Jones, etc.
- Shifting attention from a “dysfunctional caliphate” to something more acceptable. Okay for ISIS, but what if the GOOG substitutes other content to something else. Right, it will never happen. Mother Google is a really good person.
The article hits the high spots of censorship, including Twitter and the US Department of State’s Think Again, Turn Away, and everyone’s favorite cartoon Average Mohammed.
Click https://www.youtube.com/watch?v=7vJ-SlxjRrQ which may be offline after the Wired article hit the Internet.
August 12, 2016
I read “Google Isn’t Safe from Yahoo’s Fate.” The write up is a business school type analysis which reminded me of the inevitable decline of many businesses. Case studies pose MBAs to be to the thrills of success and the consequences of management missteps. I recall a book, published by a now lost and forgotten outfit, which talked about blind spots and management myopia. Humans have a tendency to make errors. That’s what makes life exciting. But I see a GooHoo trajectory.
I learned in this article:
Google is on the wrong side of major trends in the digital advertising industry: Google captures direct response dollars as digital ad spend shifts up the funnel, its focus is still on browsers and websites as engagement is moving into apps and feeds, Google is deeply dependent on search during a shift to serendipitous discovery and ads designed to interrupt the user’s attention are being replaced by advertising designed to engage them. Its competitor, Facebook, is on the right side of all these trends.
The Alphabet Google thing has not been able to hit home runs in social media in my opinion. The Google Facebook dust up exists, and it seems to me that Google is withdrawing from the field of social battle.
The write up informed me:
Google’s search advertising model is built on direct response in that it charges for search ads that people click on. In theory, this is an entirely transparent model: After all, advertisers only pay when the advertising works. What it conceals is that they are taking more credit (and charging more) for value that its ads didn’t deliver. By charging you for the click that follows a search, Google effectively takes credit for the entire funnel of purchase consideration that led you to type in the search and click on the link in the first place….But the ad itself didn’t create their purchase intent — it just takes credit for it. Google’s lower funnel ads are getting credit for upper-funnel effectiveness, in no small part because the latter is just too hard to measure.
August 4, 2016
A year ago I read “20+ Text Mining and Text Analysis Tools.” The sale of Recommind to OpenText and the lack of excitement about search gave me an idea. Where are the companies identified by a mid tier consulting firm today. Let’s take a quick look.
AlchemyAPI. The company now asserts that its powers the “AI economy.” The Web sites has been updated since I last looked. There is a demo and a “free API key.” The system is now a platform. Gartner found the company to be a “cool vendor” in 2014. The company offers a webinar called “Building with Watson.”
Angoss. The company allows a customer to “predict, act, perform.” The focus is now on “customer intelligence in a single analytics tool.” The firm offers “knowledge” products and an insight optimizer.
Attensity. The company has undergone some change. The www.attensity.com Web site 404s. Years ago a text analytics cheerleader professed to be a fan. I think portions of the company operate under a different name in Germany. Appears to be in quiet mode.
Basis Technology. The company provided language reacted tools to outfits like Fast Search & Transfer. Someone told me that Basis dabbled in enterprise search. One high profile executive jumped to a company in Madrid.
Brainspace. The company’s Web site tells me, “We build brains.” The company offers NLP technology. Gartner “recommends” Brainspace for “advanced text analytics for financial institutions.” That’s good. The company does not list too many financial institutions as customers on its home page, however.
Buzzlogix. This company’s focus appears to be squarely on social media. The idea is that the firm helps its customers “listen, learn, and act.” When I visited the Web site, the most recent “news” appeared in November 2015.
Clarabridge. The company focuses on understanding “customer needs, wants, and feelings.” The company provides the “world’s most comprehensive customer intelligence platform.”
Clustify. The company positions its text analytics tools for eDiscovery. The company’s most recent news release is dated January 2014 and addresses the Recommind championed predictive coding approach to figuring out what was what in text documents.
Connexor. The company offers “machinese” demonstrations of its capabilities. The most recent item on the company’s Web site is the April 2015 announcement of a free NLP Web service.
DatumBox. This company is a “machine learning framework” provider. It makes machine learning “simple.” The Web site offers a free API key, which knocks the local KFC manager out as a potential licensee. The company’s most recent blog post is dated March 16, 2016. The most recent release is 0.7.0.
Eaagle. This is a company focused on the “new frontier of effective customer relationship management, research, and marketing.” Customers include HermanMiller, Chubb, and Suncor Energy. Data sheets, white papers, and documentation are available and no registration is necessary. Eaagle maintains a low profile.
ExpertSystem. The company bought Temis, a firm based on some ideas in the mind of a former IBM wizard. ExpertSystem, a publicly traded company, is pursuing the pharmaceutical industry and performing independent text analyses of Melania Trump’s and Michelle Obama’s speeches. The two ladies exhibit strong linguistic differences. The company’s stock is trading at $1.81 a share, a bit below Alphabet Google, an outfit also in the text analytics game.
FICO (Fair Isaac Corporation). The company gives “you the power to make smarter decisions.” The company has tallied a number of acquisitions since 1992. Its most recent purchase was Quadmetrics, a predictive analytics company. FICO is publicly traded and the stock is trading at $115.60 a share.
Cognitum. The company asserts that one can “improve your business with the innovation leader in semantic technology.” The company’s main product is Fluent Editor and it offers flagship platform called Ontorion. The firm’s spelling of “scallable” on its home page caught my attention.
IBM. The focus was not on Watson in the listing. Instead, the write up identified IBM Content Analytics as the product to watch. IBM’s LanguageWare uses a range of techniques to process content. IBM is very much in the content processing game with Watson becoming the umbrella “brand.” IBM just tallied is 16th straight quarter of declining revenue.
Intellexer offers text analytics, information security, media content search, and reputation management. The company’s most recent news release, dated May 13, 2016, announces the new version of Conceptmeister “which analyzes text from a photo, cloud documents, and URL.” Essentially this software creates a summary of the source content.
KBSPortal. This company offers natural language processing as a software as a service or NLP as SAAS. A demonstration of the system processes Wikipedia content. A demo video is available. To view it, I was asked to sign in. I declined. The company provides its prices and explains what each component does. Kudos for that approach.
Keatext. The company focuses on “customer experience management.” The company offers a two week free trial of its system. The system incorporates natural language processing. The company’s explanation of what it does requires a bit of digging.
Lexalytics. Lexalytics is in the sentiment analysis business. The company’s capabilities include categorization and entity extraction. Social media monitoring can be displayed on dashboards. The company posts its prices. When I was involved in a procurement, Lexalytics prices, based on my recollection, were significantly higher than the fees quoted on this page. At one time, Lexalytics engaged in a merger or deal with Infonics. The company acquired Semantria a couple of years ago.
Leximancer. This Australian company’s software turns up in interesting places; for example, the US social security administration in Beltsville, Maryland. The firm’s “text in, insight out” technology emerged from research at the University of Queensland. The company was founded by UniQuest, a techohlogy commercialization company operated by the University of Queensland. The system is quite useful.
Linguamatics. This company has built a following in the pharmaceutical sector. The system does a good job processing academic and research information in ways which can influence certain lines of inquiry. The company now says that it offers the “world’s leading text mining platform.” the company was founded in 2001, and it has been moving along at a steady pace. Quite useful software and capabilities.
Linguasys. Surprised to see an installation profile. The outfit is maintaining a low profile.
Luminoso. The company provides “enterprise feedback and experience analytics.” The company has teamed with another Boston-area outfit, Basis Technologies, to form a marketing partnership. The angle the company seems to be promoting is that if you are using other systems, you can enhance them with text analytics.
MeaningCloud. Meaning cloud asserts that with its system one can “extract valuable information from any text source.” The company’s Text Classification API supports the Interactive Advertising Bureau’s “standard contextual taxonomy.” The focus seems to be on sentiment analysis like Lexalytics.
July 12, 2016
I participated in a telephone call before the US holiday break. The subject was the likelihood of a potential investment in an enterprise search technology would be a winner. I listened for most of the 60 minute call. I offered a brief example of the over promise and under deliver problems which plagued Convera and Fast Search & Transfer and several of the people on the call asked, “What’s a Convera?” I knew that today’s whiz kids are essentially reinventing the wheel.
I wanted to capture three ideas which I jotted down during that call. My thought is that at some future time, a person wanting to understand the incredible failures that enterprise search vendors have tallied will have three observations to consider.
Enterprise Search: Does a Couple of Things Well When Users Expect Much More
Enterprise search systems ship with filters or widgets which convert source text into a format that the content processing module can index. The problem is that images, videos, audio files, content from wonky legacy systems, or proprietary file formats like IBM i2’s ANB files do not lend themselves to indexing by a standard enterprise search system. The buyers or licensees of the enterprise search system do not understand this one trick pony nature of text retrieval. Therefore, when the system is deployed, consternation follows confusion when content is not “in” the enterprise search system and, therefore, cannot be found. There are systems which can deal with a wide range of content, but these systems are marketed in a different way, often cost millions of dollars a year to set up, maintain, and operate.
Net net: Vendors do not explain the limitations of text search. Licensees do not take the time or have the desire to understand what an enterprise search system can actually do. Marketers obfuscate in order to close the deal. Failure is a natural consequence.
Data Management Needed
The disconnect boils down to what digital information the licensee wants to search. Once the universe is defined, the system into which the data will be placed must be resolved. No data management, no enterprise search. The reason is that licensees and the users of an enterprise search system assume that “all” or “everything” – maps to web content, email to outputs from an AS/400 Ironside are available any time. Baloney. Few organizations have the expertise or the appetite to deal with figuring out what is where, how much, how frequently each type of data changes, and the formats used. I can hear you saying, “Hey, we know what we have and what we need. We don’t need a stupid, time consuming, expensive inventory.” There you go. Failure is a distinct possibility.
Net net: Hope springs eternal. When problems arise, few know what’s where, who’s on first, and why I don’t know is on third.
June 28, 2016
I scanned a number of write ups about Google’s embrace of machine learning and smart software. I supplement my Google queries with the results of other systems. Some of these have their own index; for example, Yandex.ru and Exalead. Others are metasearch engines will suck in results and do some post processing to help answer the users’ questions. Others are disappointing and I check them out when I have a client who is willing to pay for stone flipping; for example, DuckDuckGo, iSeek, or the estimable Qwant. (I love quirky spelling too.)
I read “RankBrain Third Most Important Factor Determining Google Search Results.” Here’s the quote I noted:
Google is characteristically fuzzy on exactly how it improves search (something to do with the long tail? Better interpretation of ambiguous requests?) but Jeff Dean [former AltaVista wizard] says that RankBrain is “involved in every query,” and affects the actual rankings “probably not in every query but in a lot of queries.” What’s more, it’s hugely effective. Of the hundreds of “signals” Google search uses when it calculates its rankings (a signal might be the user’s geographical location, or whether the headline on a page matches the text in the query), RankBrain is now rated as the third most useful. “It was significant to the company that we were successful in making search better with machine learning,” says John Giannandrea. “That caused a lot of people to pay attention.”Pedro Domingos, the University of Washington professor who wrote The Master Algorithm, puts it a different way: “There was always this battle between the retrievers and the machine learning people,” he says. “The machine learners have finally won the battle.”
I have noticed in the last year, that I am unable to locate certain documents when I use the words and phrases which had served me well before smart software became the cat’s pajamas.
One recent example was my need to locate a case example about a German policeman’s trials and tribulations with the Dark Web. When I first located this document, I was trying to verify an anecdote shared with me after one of my intelligence community lectures.
I had the document in my file and I pulled it up on my monitor. The document in question is the work of an outfit and person labeled “Lars Hilse.” The title of the write up is “Dark Web & Bitcoin: Global Terrorism “Threat Assessment. The document was published in April 2013 with an update issued in November 2013. (That document was the source or maybe confirmed the anecdote about the German policeman and his Dark Web research.)
For my amusement, I wondered if I could use the new and improved Google Web search to locate the document. I display section 4.8 on my screen. The heading of the section is “Extortion (of Law Enforcement Personnel).
I entered the phrase into Google without quotes. Here’s the first page of results:
None of the hits points to the document with the five word phrase.
June 22, 2016
I was a wee lad when I read Don Quixote. I know that students in Spain and some other countries study the text of the 17th century novel closely. I did not. I remember flipping through a Classics’ comic book, reading the chapter summaries in Cliff’s Notes, and looking at the pictures in the edition in my high school’s library. Close enough for horse shoes. (I got an A on the test. Heh heh heh.)
Here’s what I recall the Don and his sidekick. A cultured fellow read a lot of fantasy fiction, mixed it up the real world, and went off on adventures or sallies. The protagonist (see I remember something from Ms. Sperling’s literature class in 1960) rode a horse and charged into the countryside to kill windmills. I remember there were lots and lots of adventures, not too much sex – drugs – rock and roll, and many convoluted literary tropes.
I still like the windmills. A Google search showed me an image which is very similar to the one in the comic book I used as my definitive version of the great novel. Here it is:
What does a guy riding a horse with a lance toward a windmill have to do with search and content processing? Well, I read “Palantir Lambastes Army Over $206 Million Contract Bidding.” I assume the information in the write up is spot on.
Palantir Technologies, a unicorn which is the current fixation of a Buzzfeed journalist, is going to sue the US Army over a “to be” contract for work. The issue is an all source information system procurement known as DCGS or sometimes DI2E. The acronyms are irrelevant. What is important is that the US Army has been plugging away with a cadre of established government contractors for a decade. Depending on whom one asks, DCGS is the greatest thing since sliced bread or it is a flop.
However, Palantir believes that its augmented intelligence system is a better DCGS / DI2E. than the actual DCGS / DI2E.
The US Army may not agree and appears be on the path to awarding the contract for DCGS work to other vendors.
According to the write up:
Palantir claims the Army’s solicitation is “unlawful, irrational, arbitrary and capricious,” according to the letter of intent Palantir sent to the U.S. Army and the Department of Justice, which was obtained by Bloomberg. The letter is a legal courtesy, which states Palantir will file a formal protest in the U.S. Court of Federal Claims next week and requests the Army delay awarding the first phase of the contract until litigation is resolved. The contract is slated to be awarded by the end of 2016.
The contract is worth a couple of hundred million, but the follow on work is likely to hit nine figures. Palantir has some investors who want more growth. The best way to get it, if the write up is accurate, is on the backs of legal eagles.
I don’t know anything about the US Army and next to nothing about Palantir, but I have some experience watching vendors protest the US government’s procurement process. My thought is that when bidders sue the government:
- Costs go up. Lawyers are very busy, often for a year or more. In lawyer land, billing is really good.
- Delays occur. The government unit snagged in the contracting hassle have to juggle more balls; for example, tasks have to be completed. When the vendors are not able to begin work, delays occur. This may not be a problem in lawyer land, but in the real world, downstream dependencies can be a hitch in the git along.
- Old scores may be hummed. Palantir settled a legal dust up with IBM which owns i2 Analysts Notebook. The Analysts Notebook is the very same software system whose file structure Palantir wanted to understand. i2 was not too keen on making its details available. (Note: I was a consultant to i2 for a number of years, and this was input number one to me from one of the founders). IBM has a pretty good institutional memory without consulting Watson.)
And Don Quixote? I wonder if the Palantirians, some of whom fancy themselves Hobbits, are going to be able to shape the real world to their vision. The trajectory of this legal dust up will be interesting to watch as it flames across the sky toward Spain and Don Quixote’s fictional library. Flame out or direct hit? The US Army and US government procurement policies are able to absorb charging horses and possibly a lance poke or two.
Stephen E Arnold, June 22, 2016
June 1, 2016
A few days ago, I stumbled upon a copy of a letter from the GAO concerning Palantir Technologies dated May 18, 2016. The letter became available to me a few days after the 18th, and the US holiday probably limited circulation of the document. The letter is from the US Government Accountability Office and signed by Susan A. Poling, general counsel. There are eight recipients, some from Palantir, some from the US Army, and two in the GAO.
Has the US Army put Palantir in an untenable spot? Is there a deus ex machina about to resolve the apparent checkmate?
The letter tells Palantir Technologies that its protest of the DCGS Increment 2 award to another contractor is denied. I don’t want to revisit the history or the details as I understand them of the DCGS project. (DCGS, pronounced “dsigs”, is a US government information fusion project associated with the US Army but seemingly applicable to other Department of Defense entities like the Air Force and the Navy.)
The passage in the letter I found interesting was:
While the market research revealed that commercial items were available to meet some of the DCGS-A2 requirements, the agency concluded that there was no commercial solution that could meet all the requirements of DCGS-A2. As the agency explained in its report, the DCGS-A2 contractor will need to do a great deal of development and integration work, which will include importing capabilities from DCGS-A1 and designing mature interfaces for them. Because the agency concluded that significant portions of the anticipated DCSG-A2 scope of work were not available as a commercial product, the agency determined that the DCGS-A2 development effort could not be procured as a commercial product under FAR part 12 procedures. The protester has failed to show that the agency’s determination in this regard was unreasonable.
The “importing” point is a big deal. I find it difficult to imagine that IBM i2 engineers will be eager to permit the Palantir Gotham system to work like one happy family. The importation and manipulation of i2 data in a third party system is more difficult than opening an RTF file in Word in my experience. My recollection is that the unfortunate i2-Palantir legal matter was, in part, related to figuring out how to deal with ANB files. (ANB is i2 shorthand for Analysts Notebook’s file format, a somewhat complex and closely-held construct.)
Net net: Palantir Technologies will not be the dog wagging the tail of IBM i2 and a number of other major US government integrators. The good news is that there will be quite a bit of work available for firms able to support the prime contractors and the vendors eligible and selected to provide for-fee products and services.
Was this a shoot-from-the-hip decision to deny Palantir’s objection to the award? No. I believe the FAR procurement guidelines and the content of the statement of work provided the framework for the decision. However, context is important as are past experiences and perceptions of vendors in the running for substantive US government programs.
March 22, 2016
Nikola Danaylov of the Singularity Weblog interviewed technology and financial analyst Stephen E. Arnold on the latest episode of his podcast, Singularity 1 on 1. The interview, Stephen E. Arnold on Search Engines and Intelligence Gathering, offers thought-provoking ideas on important topics related to sectors — such as intelligence, enterprise search, and financial — which use indexing and content processing methods Arnold has worked with for over 50 years.
Arnold attributes the origins of his interest in technology to a programming challenge he sought and accepted from a computer science professor, outside of the realm of his college major of English. His focus on creating actionable software and his affinity for problem-solving of any nature led him to leave PhD work for a job with Halliburton Nuclear. His career includes employment at Booz, Allen & Hamilton, the Courier Journal & Louisville Times, and Ziff Communications, before starting ArnoldIT.com strategic information services in 1991. He co-founded and sold a search system to Lycos, Inc., worked with numerous organizations including several intelligence and enforcement organizations such as US Senate Police and General Services Administration, and authored seven books and monographs on search related topics.
With a continued emphasis on search technologies, Arnold began his blog, Beyond Search, in 2008 aiming to provide an independent source of “information about what I think are problems or misstatements related to online search and content processing.” Speaking to the relevance of the blog to his current interest in the intelligence sector of search, he asserts:
“Finding information is the core of the intelligence process. It’s absolutely essential to understand answering questions on point and so someone can do the job and that’s been the theme of Beyond Search.”
As Danaylov notes, the concept of search encompasses several areas where information discovery is key for one audience or another, whether counter-terrorism, commercial, or other purposes. Arnold agrees,
“It’s exactly the same as what the professor wanted to do in 1962. He had a collection of Latin sermons. The only way to find anything was to look at sermons on microfilm. Whether it is cell phone intercepts, geospatial data, processing YouTube videos uploaded from a specific IP address– exactly the same problem and process. The difficulty that exists is that today we need to process data in a range of file types and at much higher speeds than ever anticipated, but the processes remain the same.”
Arnold explains the iterative nature of his work:
“The proof of the value of the legacy is I don’t really do anything new, I just keep following these themes. The Dark Web Notebook is very logical. This is a new content domain. And if you’re an intelligence or information professional, you want to know, how do you make headway in that space.”
Describing his most recent book, Dark Web Notebook, Arnold calls it “a cookbook for an investigator to access information on the Dark Web.” This monograph includes profiles of little-known firms which perform high-value Dark Web indexing and follows a book he authored in 2015 called CYBEROSINT: Next Generation Information Access.
March 1, 2016
Years ago an outfit in Europe wanted me to look at claims made by search and content processing vendors about real time functions.
The goslings and I rounded up the systems, pumped our test corpus through, and tried to figure out what was real time.
The general buzzy Teddy Bear notion of real time is that when new data are available to the system, the system processes the data and makes them available to other software processes and users.
The Teddy Bear view is:
- Zero latency
- Works reliably
- No big deal for modern infrastructure
- No engineering required
- Any user connected to the system has immediate access to reports including the new or changed data.
Well, guess what, Pilgrim?
We learned quickly that real time, like love and truth, is a darned slippery concept. Here’s one view of what we learned:
Types of Real Time Operations. © Stephen E Arnold, 2009
The main point of the chart is that there are six types of real time search and content processing. When someone says, “Real time,” there are a number of questions to ask. The major finding of the study was that for near real time processing for a financial trading outfit, the cost soars into seven figures and may keep on rising as the volume of data to be processed goes up. The other big finding was that every real time system introduces latency. Seconds, minutes, hours, days, and weeks may pass before the update actually becomes available to other subsystems or to users. If you think you are looking at real time info, you may want to shoot us an email. We can help you figure out which type of “real time” your real time system is delivering. Write benkent2020 @ yahoo dot com and put Real Time in the subject line, gentle reader.
I thought about this research project when I read “Why the Search Console Reporting Is not real time: Explains Google!” As you work through the write up, you will see that the latency in the system is essentially part of the woodwork. The data one accesses is stale. Figuring out how stale is a fairly big job. The Alphabet Google thing is dealing with budgets, infrastructure costs, and a new chief financial officer.
Real time. Not now and not unless something magic happens to eliminate latencies, marketing baloney, and user misunderstanding of real time.
Excitement in non real time.
Stephen E Arnold, March 1, 2016