Twitch Incorporates ClipMine Discovery Tools
September 18, 2017
Gameplay-streaming site Twitch has adapted the platform of its acquisition ClipMine, originally developed for adding annotations to online videos, into a metadata generator for its users. (Twitch is owned by Amazon.) TechCrunch reports the development in "Twitch Acquired Video Indexing Platform ClipMine to Power New Discovery Features." Writer Sarah Perez tells us:
The startup’s technology is now being put to use to translate visual information in videos – like objects, text, logos and scenes – into metadata that can help people more easily find the streams they want to watch. Launched back in 2015, ClipMine had originally introduced a platform designed for crowdsourced tagging and annotations. The idea then was to offer a technology that could sit over top videos on the web – like those on YouTube, Vimeo or DailyMotion – that allowed users to add their own annotations. This, in turn, would help other viewers find the part of the video they wanted to watch, while also helping video publishers learn more about which sections were getting clicked on the most.
Based in Palo Alto, ClipMine went on to make indexing tools for the e-sports field and to incorporate computer vision and machine learning into its work. The platform's ability to identify content within videos caught Twitch's eye; Perez explains:
Traditionally, online video content is indexed much like the web – using metadata like titles, tags, descriptions, and captions. But Twitch’s streams are live, and don’t have as much metadata to index. That’s where a technology like ClipMine can help. Streamers don’t have to do anything differently than usual to have their videos indexed, instead, ClipMine will analyze and categorize the content in real-time.
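The indexing loop Perez describes can be pictured in a few lines. Below is a minimal, illustrative Python sketch, assuming a hypothetical classify_frame() function as a stand-in for the computer-vision model; it shows the general shape of turning per-frame tags into a searchable index, not ClipMine's actual code.

```python
# Illustrative only: map what a (hypothetical) vision model "sees" in live
# frames to a searchable inverted index of tag -> channels.
from collections import defaultdict

def classify_frame(frame_bytes):
    """Stand-in for a real object/text/logo/scene detector."""
    return {"hearthstone", "arena_mode"}          # tags detected in this frame

class LiveStreamIndex:
    def __init__(self):
        self.tag_to_channels = defaultdict(set)   # inverted index

    def ingest(self, channel_id, frame_bytes):
        for tag in classify_frame(frame_bytes):
            self.tag_to_channels[tag].add(channel_id)

    def channels_for(self, tag):
        return sorted(self.tag_to_channels.get(tag, set()))

index = LiveStreamIndex()
index.ingest("channel_42", b"<frame bytes>")
print(index.channels_for("hearthstone"))          # ['channel_42']
```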
ClipMine’s technology has already been incorporated into stream-discovery tools for two games from Blizzard Entertainment, “Overwatch” and “Hearthstone;” see the article for more specifics on how and why. Through its blog, Twitch indicates that more innovations are on the way.
Cynthia Murrell, September 18, 2017
A New and Improved Content Delivery System
September 7, 2017
Personalized content and delivery is the name of the game in PRWEB's "Flatirons Solutions Launches XML DITA Dynamic Content Delivery Solutions." Flatirons Solutions, a leading XML-based publishing and content management company, recently released its Dynamic Content Delivery Solution. The solution uses XML-based technology to deliver more personalized content to enterprises and is advertised as reducing publishing and support costs. The new solution is built with the Mark Logic Server.
By partnering with Mark Logic and incorporating their industry-leading XML content server, the solution conducts powerful queries, indexing, and personalization against large collections of DITA topics. For our clients, this provides immediate access to relevant information, while producing cost savings in technical support, and in content production, maintenance, review and publishing. So whether they are producing sales, marketing, technical, training or help documentation, clients can step up to a new level of content delivery while simultaneously improving their bottom line.
The Dynamic Content Delivery Solution is designed for government agencies and enterprises that publish XML content to various platforms and formats. Mark Logic is touted as a powerful tool to pool content from different sources, repurpose it, and deliver it to different channels.
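To make the personalization idea concrete, here is a minimal Python sketch that selects DITA-style topics by an audience profiling attribute. The XML, attribute names, and filtering logic are illustrative assumptions; a MarkLogic-backed solution would run comparable queries inside the content server rather than in application code.

```python
# Illustrative only: pick the DITA-style topics that match a reader profile.
import xml.etree.ElementTree as ET

TOPICS = """
<topics>
  <topic id="install" audience="administrator"><title>Installing the server</title></topic>
  <topic id="quickstart" audience="novice"><title>Getting started</title></topic>
  <topic id="tuning" audience="administrator"><title>Performance tuning</title></topic>
</topics>
"""

def topics_for(audience):
    root = ET.fromstring(TOPICS)
    return [t.findtext("title") for t in root.findall("topic")
            if t.get("audience") == audience]

print(topics_for("administrator"))
# ['Installing the server', 'Performance tuning']
```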
MarkLogic finds success in its core use case: slicing and dicing for publishing. It is back to the basics for them.
Whitney Grace, September 7, 2017
Factoids about Toutiao: Smart News Filtering Service
August 28, 2017
The filtering service Toutiao is operated by Bytedance. The company attracted attention because it is generating money (allegedly) and has lots of users, with "daily active users" in the 120 million range. (If you are acronym minded, the daily active user count is a DAU. Holy DAU!)
Forget Google's "translate this page" for Toutiao; the service is blind to Toutiao's content. A workaround is to cut and paste snippets into FreeTranslations.org or get someone who reads Chinese to explain what's on Toutiao's pages.
Other items of interest include the following. (Oh, the hyperlinks point to the source of each factoid.)
- $900 million in revenue, allegedly (Wall Street Journal, August 28, 2017, with a paywall for your delectation)
- Funding of $3 billion (Crunchbase)
- Valuation of $20 billion or more (Reuters)
- Toutiao means "headlines" (Wikipedia)
- What it does, from Wikipedia (a minimal recommendation sketch follows this list):
Toutiao uses algorithms to select different quality content for individual users. It has created algorithmic models that understand information (text, images, videos, comments, etc.) in depth, and developed large-scale machine learning systems for personalized recommendation that surface content users have not necessarily signaled preference for yet. Using Natural Language Processing and Computer Vision technologies in AI, Toutiao extracts hundreds of entities and keywords as features from each piece of content. When a user first opens the app, Toutiao makes a preliminary recommendation based on the operating system of his mobile device, his location and other factors. With users' interactions with the app, Toutiao fine-tunes its models and makes better recommendations.
- Founded by Zhang Yiming, age 34, in 2012 (Reuters)
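The cold-start-then-refine loop in the Wikipedia passage can be illustrated with a toy content-based recommender. Everything below is an assumption for illustration, not Toutiao's actual model: items are reduced to keyword features, a new user gets a coarse locale-based prior, and each click sharpens the profile.

```python
# Illustrative only: keyword features, a cold-start prior, and click feedback.
from collections import Counter

ARTICLES = {
    "a1": {"beijing", "transit", "policy"},
    "a2": {"esports", "tournament", "video"},
    "a3": {"beijing", "food", "video"},
}

COLD_START = {"zh-CN": Counter({"beijing": 1.0})}   # coarse prior by locale

def rank(profile, articles):
    score = lambda feats: sum(profile.get(f, 0.0) for f in feats)
    return sorted(articles, key=lambda a: score(articles[a]), reverse=True)

def record_click(profile, article_id):
    for feat in ARTICLES[article_id]:
        profile[feat] += 0.5                        # crude positive feedback

profile = Counter(COLD_START["zh-CN"])
print(rank(profile, ARTICLES))                      # ['a1', 'a3', 'a2']  (cold start)

record_click(profile, "a3")                         # the user reads a food video piece
print(rank(profile, ARTICLES))                      # ['a3', 'a1', 'a2']  (profile refined)
```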
Technode’s “Why Is Toutiao, a News App, Setting Off Alarm Bells for China’s Giants?” suggests that Toutiao may be the next big Chinese online success. The reason is that the service aggregates “news” from disparate content sources; for example, text, video, images, and data.
Toutiao may be the next big thing in algorithmic, mobile-centric information access solutions. The company generates revenues from online ads. The company's secret sauce includes smart software plus some extra ingredients:
- Social functions
- Search
- Video
- User generated “original” content
- Global plans.
Net net: Worth watching.
Stephen E Arnold, August 28, 2017
Smartlogic: A Buzzword Blizzard
August 2, 2017
I read "Semantic Enhancement Server." Interesting stuff. The technology struck me as a cross between indexing, good old enterprise search, and assorted technologies. Individuals who are shopping for an automatic indexing system (either with expensive, time-consuming, hand-coded rules or a more Autonomy-like automatic approach) will want to kick the tires of the Smartlogic system. In addition to the echoes of the SchemaLogic approach, I noted a Thompson submachine gun firing buzzwords; for example:
- best bets (I'm feeling lucky?)
- dynamic summaries (like Island Software's approach in the 1990s)
- faceted search (hello, Endeca?)
- model
- navigator (like the Siderean "navigator"?)
- real time
- related topics (clustering like Vivisimo's)
- semantic (of course)
- taxonomy
- topic maps
- topic pages (a Google report as described in US29970198481)
- topic path browser (aka breadcrumbs?)
- visualization
What struck me after I compiled this list about a system that "drives exceptional user search experiences" was that Smartlogic is repeating the marketing approach of traditional vendors of enterprise search. The marketing lingo and "one size fits all" positioning triggered thoughts of Convera, Delphes, Entopia, Fast Search & Transfer, and Siderean Software, among others.
I asked myself:
Is it possible for one company’s software to perform such a remarkable array of functions in a way that is easy to implement, affordable, and scalable? There are industrial strength systems which perform many of these functions. Examples range from BAE’s intelligence system to the Palantir Gotham platform.
My hypothesis is that Smartlogic might struggle to process a real-time flow of WhatsApp messages, YouTube content, and intercepted mobile phone voice calls. Toss in the multi-language content which is becoming increasingly important to enterprises, and the notional balloon I am floating says, "Generating buzzwords and associated over-inflated expectations is really easy. Delivering high-accuracy, affordable, and scalable content processing is a bit more difficult."
Perhaps Smartlogic has cracked the content processing equivalent of the Voynich manuscript.
Will buzzwords crack the Voynich manuscript’s inscrutable text? What if Voynich is a fake? How will modern content processing systems deal with this type of content? Running some content processing tests might provide some insight into systems which possess Watson-esque capabilities.
What happened to vendors like Convera, Delphes, Entopia, Fast Search & Transfer, and Siderean Software? (Free profiles of these companies are available at www.xenky.com/vendor-profiles.) Oh, that's right. The reality of the marketplace did not match the companies' assertions about technology. Investors and licensees of some of these systems were able to survive the buzzword blizzard. Some became the digital equivalent of Ötzi, the 5,300-year-old iceman.
Stephen E Arnold, August 2, 2017
Academic Publisher Retracts Record Number of Papers
June 20, 2017
To the scourge of fake news we add the problem of fake research. Retraction Watch announces "A New Record: Major Publisher Retracting More Than 100 Studies from Cancer Journal over Fake Peer Reviews." We learn that the academic publisher Springer has retracted 107 papers from a single journal after discovering their peer reviews had been falsified. Faking the integrity of cancer research? That's pretty low. The article specifies:
To submit a fake review, someone (often the author of a paper) either makes up an outside expert to review the paper, or suggests a real researcher — and in both cases, provides a fake email address that comes back to someone who will invariably give the paper a glowing review. In this case, Springer, the publisher of Tumor Biology through 2016, told us that an investigation produced “clear evidence” the reviews were submitted under the names of real researchers with faked emails. Some of the authors may have used a third-party editing service, which may have supplied the reviews. The journal is now published by SAGE. The retractions follow another sweep by the publisher last year, when Tumor Biology retracted 25 papers for compromised review and other issues, mostly authored by researchers based in Iran.
The article shares Springer’s response to the matter, some from their official statement and some from a spokesperson. For example, we learn the company cut ties with the “Tumor Biology” owners, and that the latest fake reviews were caught during a process put in place after that debacle. See the story for more details.
Cynthia Murrell, June 20, 2017
Algorithms Are Getting Smarter at Identifying Human Behavior
June 19, 2017
Algorithms deployed by large tech firms are getting better at understanding human behavior, reveals a former Google data scientist.
In an article published by Business Insider titled "A Former Google Data Scientist Explains Why Netflix Knows You Better Than You Know Yourself," Seth Stephens-Davidowitz says:
Many gyms have learned to harness the power of people's over-optimism. Specifically, he said, "they've figured out you can get people to buy monthly passes or annual passes, even though they're not going to use the gym nearly enough to warrant this purchase."
Companies like Netflix use this to their benefit. For instance, in its early years, Netflix encouraged users to create playlists. However, most users ended up watching the same run-of-the-mill content. Netflix thus made changes and started recommending content similar to users' viewing habits. It only proves one thing: algorithms are getting smarter at understanding and predicting human behavior, and that is both good and bad.
Vishal Ingole, June 19, 2017
U.S. Government Keeping Fewer New Secrets
February 24, 2017
We have good news and bad news for fans of government transparency. In their Secrecy News blog, the Federation of American Scientists reports, "Number of New Secrets in 2015 Near Historic Low." Writer Steven Aftergood explains:
The production of new national security secrets dropped precipitously in the last five years and remained at historically low levels last year, according to a new annual report released today by the Information Security Oversight Office.
There were 53,425 new secrets (‘original classification decisions’) created by executive branch agencies in FY 2015. Though this represents a 14% increase from the all-time low achieved in FY 2014, it is still the second lowest number of original classification actions ever reported. Ten years earlier (2005), by contrast, there were more than 258,000 new secrets.
The new data appear to confirm that the national security classification system is undergoing a slow-motion process of transformation, involving continuing incremental reductions in classification activity and gradually increased disclosure. …
Meanwhile, ‘derivative classification activity,’ or the incorporation of existing secrets into new forms or products, dropped by 32%. The number of pages declassified increased by 30% over the year before.
A marked decrease in government secrecy—that’s the good news. On the other hand, the report reveals some troubling findings. For one thing, costs are not going down alongside classifications; in fact, they rose by eight percent last year. Also, response times to mandatory declassification requests (MDRs) are growing, leaving over 14,000 such requests to languish for over a year each. Finally, fewer newly classified documents carry the “declassify in ten years or less” specification, which means fewer items will become declassified automatically down the line.
Such red-tape tangles notwithstanding, the reduction in secret classifications does look like a sign that the government is moving toward more transparency. Can we trust the trajectory?
Cynthia Murrell, February 24, 2017
Investment Group Acquires Lexmark
February 15, 2017
We read with some trepidation the Kansas City Business Journal's article, "Former Perceptive's Parent Gets Acquired for $3.6B in Cash." The parent company referred to here is Lexmark, which bought up one of our favorite search systems, ISYS Search, in 2012 and placed it under its Perceptive subsidiary, based in Lenexa, Kansas. We do hope this valuable tool is not lost in the shuffle.
Reporter Dora Grote specifies:
A few months after announcing that it was exploring ‘strategic alternatives,’ Lexmark International Inc. has agreed to be acquired by a consortium of investors led by Apex Technology Co. Ltd. and PAG Asia Capital for $3.6 billion cash, or $40.50 a share. Legend Capital Management Co. Ltd. is also a member of the consortium.
Lexmark Enterprise Software in Lenexa, formerly known as Perceptive Software, is expected to ‘continue unaffected and benefit strategically and financially from the transaction’ the company wrote in a release. The Lenexa operation — which makes enterprise content management software that helps digitize paper records — dropped the Perceptive Software name for the parent’s brand in 2014. Lexmark, which acquired Perceptive for $280 million in cash in 2010, is a $3.7 billion global technology company.
If the Lexmark Enterprise Software (formerly known as Perceptive) division will be unaffected, it seems they will be the lucky ones. Grote notes that Lexmark has announced that more than a thousand jobs are to be cut amid restructuring. She also observes that the company’s buildings in Lenexa have considerable space up for rent. Lexmark CEO Paul Rooke is expected to keep his job, and headquarters should remain in Lexington, Kentucky.
Cynthia Murrell, February 15, 2017
How to Quantify Culture? Counting the Bookstores and Libraries Is a Start
February 7, 2017
The article titled "The Best Cities in the World for Book Lovers" on Quartz conveys the data collected by the World Cities Culture Forum. That organization works to facilitate research and promote cultural endeavors around the world. And what could be a better measure of a city's culture than its books? The article explains how the data collection works:
Led by the London mayor’s office and organized by UK consulting company Bop, the forum asks its partner cities to self-report on cultural institutions and consumption, including where people can get books. Over the past two years, 18 cities have reported how many bookstores they have, and 20 have reported on their public libraries. Hong Kong leads the pack with 21 bookshops per 100,000 people, though last time Buenos Aires sent in its count, in 2013, it was the leader, with 25.
New York sits comfortably in sixth place, but London, surprisingly, is near the bottom of the ranking with roughly 360 bookstores. Another measure the WCCF uses is libraries per capita. Edinburgh of all places surges to the top without any competition. New York is the only US city to even make the cut with an embarrassing 2.5 libraries per 100K people. By contrast, Edinburgh has 60.5 per 100K people. What this analysis misses out on is the size and beauty of some of the bookstores and libraries of global cities. To bask in these images, visit Bookshelf Porn or this Mental Floss ranking of the top 7 gorgeous bookstores.
Chelsea Kerwin, February 7, 2017
JustOne: When a Pivot Is Not Possible
February 4, 2017
CopperEye hit my radar when I did a project for the now-forgotten Speed of Mind search system. CopperEye delivered high-speed search in a patented hierarchical data management system. The company snagged some In-Q-Tel interest in 2007, but by 2009, I lost track of the company. Several of the CopperEye senior managers teamed up to create the JustOne database, search, and analytics system. One of the new company's inventions is documented in "Apparatus, Systems, and Methods for Data Storage and/or Retrieval Based on a Database Model-agnostic, Schema-Agnostic, and Workload-Agnostic Data Storage and Access Models." If you are into patent documents about making sense of Big Data, you will find US20140317115 interesting. I will leave it to you to determine if there is any overlap between this system and method and those of the now low-profile CopperEye.
Why would In-Q-Tel get interested in another database? From my point of view, CopperEye was interesting because:
- The system and method was ideal for finding information in large collections of intercept information
- The tech whiz behind the JustOne system wanted to avoid “band-aid” architectures; that is, software shims, wrappers, and workarounds that other data management and information access systems generated like rabbits
- The method of finding information achieved or exceeded the performance of the very, very snappy Speed of Mind system
- The system sidestepped a number of the problems which plague Oracle-style databases trying to deal with floods of real-time information from telecommunication traffic, surveillance, and Internet of Things transmissions or "emissions."
How important is JustOne? I think the company is one of those outfits which has a better mousetrap. Unlike the champions of XML, JustOne uses JSON and other "open" technologies. In fact, a useful version of the JustOne system is available for download from the JustOne Web site. Be aware that the name "JustOne" is in use by other vendors.
The fragmented world of database and information access. Source: Duncan Pauly
A good, but older, write-up explains some of the strengths of the JustOne approach to search and retrieval, couched in the lingo of the database world. The key points from "The Evolution of Data Management" strike me as helpful in understanding why Jerry Yang and Scott McNealy invested in the CopperEye veterans' start-up. I highlighted these points:
- Databases have to be operational and analytical; that is, storing information is not enough
- Transaction rates are high; that is, real time flows from telecommunications activity
- Transaction size varies from the very small to hefty; that is, the opposite of the fixed, old-school records associated with the IBM IMS system
- High concurrency; that is, more than one “thing” at a time
- Dynamic schema and query definition (a minimal sketch of this point follows the list)
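Here is a minimal, purely illustrative Python sketch of that last point: JSON-style records of different shapes go into one store, and queries name whatever fields they like at query time, with no schema declared up front. It says nothing about JustOne's internal design.

```python
# Illustrative only: schema-less records plus ad hoc, query-time field selection.
records = [
    {"type": "call",   "msisdn": "447700900001", "duration_s": 42},
    {"type": "sms",    "msisdn": "447700900001", "chars": 120},
    {"type": "sensor", "device": "meter-17", "reading": {"kwh": 3.2}},
]

def query(store, **criteria):
    """Return records whose top-level fields match all given criteria."""
    return [r for r in store
            if all(r.get(field) == value for field, value in criteria.items())]

print(query(records, type="call"))            # one record shape matches
print(query(records, msisdn="447700900001"))  # two different shapes match
```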
I highlighted this statement as suggestive:
In scaled-out environments, transactions need to be able to choose what guarantees they require – rather than enforcing or relaxing ACID constraints across a whole database. Each transaction should be able to decide how synchronous, atomic or durable it needs to be and how it must interact with other transactions. For example, must a transaction be applied in chronological order or can it be allowed out of time order with other transactions providing the cumulative result remains the same? Not all transactions need be rigorously ACID and likewise not all transactions can afford to be non-atomic or potentially inconsistent.
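The passage suggests an interface in which each transaction names its own guarantees. The sketch below is a purely hypothetical Python illustration of that idea; the option names and the write-ahead-log stand-in are assumptions, not JustOne's actual API.

```python
# Illustrative only: per-transaction guarantees instead of one global ACID posture.
from dataclasses import dataclass

@dataclass
class TxnOptions:
    durable: bool = True    # must survive a crash before acknowledging?
    ordered: bool = True    # must apply in strict chronological order?
    atomic: bool = True     # all-or-nothing?

def apply_transaction(log, payload, opts):
    # A real engine would fsync a write-ahead log for durable transactions;
    # here we just label the entry so the difference is visible.
    log.append(("durable" if opts.durable else "buffered", payload))
    return payload

log = []
apply_transaction(log, {"meter-17": 3.2}, TxnOptions(durable=False, ordered=False))
apply_transaction(log, {"invoice": 991}, TxnOptions())   # defaults: full guarantees
print(log)   # [('buffered', {'meter-17': 3.2}), ('durable', {'invoice': 991})]
```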
My take on this CopperEye wind down and JustOne wind up is that CopperEye, for whatever management reason, was not able to pivot from where it was to where it had to be to grow. More information is available from the JustOne Database Web site at www.justonedb.com.
Is Duncan Pauly one of the most innovative engineers laboring in the database search sector? Could be.
Stephen E Arnold, February 4, 2017