SharePoint Revealed

September 23, 2015

Microsoft SharePoint. It brings smiles to the faces of the consultants and Certified Experts who can make the collection of disparate parts work like a refurbished John Harrison clock.

I read “Microsoft SharePoint ECM Suite for Content Management.” The write up explains that SharePoint became available in 2001. The write up does not reference the NCompass Labs’ acquisition or other bits and pieces of the SharePoint story. That’s okay. It is 2015. Ancient history in terms of Internet time. Also, what is content management? Does it include audio, video, and digital images? What about binaries? What about data happily stored on the state of Michigan’s mainframes?

Jack Benny’s Maxwell reminds me of Fast Search’s 1998 approach to information access. With Fast Search inside, SharePoint delivers performance that is up to the Maxwell’s standards for speed, reliability, and engineering excellence.

The write up reveals that SharePoint evolved “gradually.” The most recent incarnation of the system includes a number of functions; those specifically mentioned in the article are:

  • A cloud based service
  • A foundation for collaboration and document sharing
  • A server. I thought there were multiple servers. Guess not.
  • A designer component for creating nifty looking user experiences. Isn’t Visual Studio or other programming tool required as well?
  • Cloud storage. Isn’t this redundant?
  • Search

I prefer a more modern approach to information access. The search systems I use are like a Kia Soul. The code often includes hamsters too.

Here’s what the write up says about search:

Microsoft FAST Search, which provides indexing and efficient search of content of all types.

I like the indexing and “efficient” description. The content of “all types” is interesting as well.

How well does Fast Search in its present incarnation handle audio and video? What about real time streams of social media like the Twitter fire hose? You get the idea. “All” is shorthand for “some” content.

I am not captivated by the whizzy features in SharePoint and its content management capabilities. I am not thrilled with building profiles of employees within an organization. I am pretty relaxed when it comes to collaboration. Phones work pretty well. Email is okay too. I work on documents alone and provide a version for authorized individuals to review. I need no big gun system. Just a modern one.

What about Fast Search?

Let me highlight a few salient points:

  • The product originated in Norway. You know where Trondheim is, right? Oslo? Of course. Great in the winter too. The idea burst from academia prior to 1998, when the company was officially set up. That makes the architecture an agile, youthful 17 years old.
  • In 2008, Microsoft paid $1.2 billion for a company which was found wanting in its accountancy skills. After investigations and a legal proceeding, the company seems to have had revenues well below its reported $170 million in 2007. Until the HP Autonomy deal, this was a transaction that helped fuel the “search is a big payday” belief. At an estimated $60 million instead of $170 million, Microsoft paid about 20 times Fast Search’s 2007 revenues. After the legal eagles landed, the founder of Fast Search found himself on the wrong end of a court decision. Think lock up time.
  • Fast Search is memorable to me because its founder told me that he was abandoning Web search for the enterprise search market. Autonomy’s revenue seemed to be a number the founder thought was reachable. As time unspooled, the big pay day arrived for Google. Enterprise search did not work out in terms of Google scale numbers. Fast Search backed out of an ad model to pursue an academic vision of information access as the primary enterprise driver.
  • The Fast Search solution which is part of SharePoint has breathed life into dozens of SharePoint search add-ins. These range from taxonomy systems to clustering components to complete snap-in replacements for the Fast Search components. Hundreds upon hundreds of consultants make their living implementing, improving, and customizing search and retrieval for SharePoint.

Net net: SharePoint has more than 150 million licensees. SharePoint is the new DOS for the enterprise. SharePoint is a consultant’s dream come true.

For me, I prefer simpler and more recent technology. That 17 year old approach seems more like Jack Benny’s Maxwell than a modern search Kia Soul.

Stephen E Arnold, September 23, 2015

Cloud Excitement: What Is Up?

September 21, 2015

I noted two items about cloud services. The first is summarized in “Skype Is Down Worldwide for Many Users.” I used Skype once last week. The system was unable to let my Skype conversationalist hear me. We gave up fooling with the system, and the person who wanted to speak with me called me up. I wonder how much that 75 minute international call cost. Exciting.

I also noted that Amazon went offline for some of its customers on September 21, 2015. The information was in “Amazon Web Services Experiences Outages Sunday Morning, Causing Disruptions On Netflix, Tinder, Airbnb And More.”

Several observations are warranted:

  • What happened to automatic failover, redundancy, and distributed computing? I assumed that Google’s loss of data in its Belgium data center was a reminder that marketing chatter is different from actual data center reality. Guess not?
  • Whom or what will be blamed? Amazon will have a run at the Ashburn, Virginia nexus. Microsoft will probably blame a firmware or software update. The cause may be a diffusion of boots on the ground technical knowledge. Let’s face it. These cloud services are complicated puppies. As staff seek their future elsewhere and training is sidestepped, the potential for failure exists. The fix-it-and-move-on approach to engineering adds to the excitement. Failure, in a sense, is engineered into many of these systems.
  • What about the promise of having one’s data in the cloud so nothing is lost, no downtime haunts the mobile device user, and no break in a seamless user experience occurs? More baloney? Yep, probably.

Net net: I rely on old fashioned computing and software methods. I think I lost data once, about 25 years ago, and I have never gone offline. Redundancy, reliability, and failover take work, gentle reader, not marketing and advertising.

How old school. The reason my international call took place was a result of my having different mobile telephony accounts plus an old Bell head landline. Expensive? Sure, but none of this required me to issue a news release, publicize how wonderful my cloud system was, or face the egg-on-the-face reality of failure.

Stephen E Arnold, September 21, 2015

Google: Single Point of Failure Engineering

September 18, 2015

Do you recall the lightning strike at the Alphabet Google’s data center in Belgium? Sure you do. Four lightning strikes caused the data center to lose data. See “Lightning in Belgium Disrupts Google Cloud Services.” I asked myself, “How could a redundant system, tweaked by AltaVista wizards decades ago, lose data?”

When I was assembling information for the first study in my three part Google series, I waded through many technical papers and patent documents from the GOOG (now Alphabet). These made clear to me that the GOOG was into redundancy. There were nifty methods with clever names. Chubby, anyone?

Now the Belgium “act of God” must have been an anomaly. Since 2003, the GOOG should have been improving its systems and their robustness. Well, maybe Belgium is lower on the hardened engineering list?

I found this article quite interesting: “Google Is 2 Billion Lines of Code. And It Is All in One Place.” Presumably the knowledge embodied in ones and zeros is not in one place. Nope. The code is in 10 data centers, kept in check with Piper, a home-brew code management system.

But, I noted:

There are limitations [to] this system. Potvin [Google wizard] says certain highly sensitive code—stuff akin to the Google’s PageRank search algorithm—resides in separate repositories only available to specific employees. And because they don’t run on the ‘net and are very different things, Google stores code for its two device operating systems—Android and Chrome—on separate version control systems. But for the most part, Google code is a monolith that allows for the free flow of software building blocks, ideas, and solutions.

No lightning strikes are expected. What are the odds for simultaneous lightning strikes at multiple data centers? Not worth worrying about this unlikely disaster scenario. Earthquake? Nah. Sabotage? Nah.

No single point of failure for the Alphabet Google thingy. Cloud services just do not lose data most of the time. The key word is “most.”

Stephen E Arnold, September 18, 2015

US Government Outdoes Squarespace and Weebly

September 18, 2015

The ability of the US government to innovate is remarkable. I learned this in “18F’s Federalist Helps Agencies Build Websites Faster.” You, gentle reader, probably know that 18F refers to the street on which the ever efficient General Services Administration, part of the White House’s Executive Branch, works its wonders. In addition to a big courtyard, the 18F facility also has an auditorium which sometimes floods, thus becoming a convenient swimming pool for the tireless innovators laboring in the structure a short walk from the president’s Oval Office.

The write up explained to me:

Currently in its first phase of software testing, the Federalist [the US government’s Web site builder] “automates common tasks for integrating GitHub, a content editor and Amazon Web Services,” so that web developers can manage and create new static websites on one consolidated platform, 18F said in a post on GSA.gov. The toolkit is equipped with a collection of static-site templates and a web-based content editor, allowing agencies to easily add and create section 508-compliant content while cutting the cost of designing an entirely new site or standing up a content management system.

When I read this, I thought about Squarespace, Weebly, and other services which have been providing similar functions, often for free, for many years.

The write up pointed out:

The platform is intended to be a faster, less expensive and more efficient option for developers building static sites and agencies without the website expertise.  According to 18F, Federalist uses the same scalable content delivery strategy developed for healthcare.gov and the recently launched College Scorecard.

Obviously using one of the existing, free or low cost commercial services was inappropriate. The next project will be inventing the wheel and using vulcanized rubber, not polymers. The road map also calls for a multi year study of fire.

Stephen E Arnold, September 18, 2015

Svelte Python Web Crawler

September 17, 2015

Short honk: Looking for a compact, lean Web crawler? Navigate to “A Web Crawler With Asyncio Coroutines.” One of the code wizards is Guido van Rossum. You, gentle reader, are probably aware that Mr. Van Rossum was the author of Python. He is a former Xoogler. The presentation of the crawler is a bit like a box of Legos. You will be rewarded.
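The chapter walks through a more elaborate build, but the core pattern — a shared queue, a set of already-seen URLs, and a handful of coroutine workers — can be sketched in a few lines. The in-memory SITE map and fetch stub below are hypothetical stand-ins for real HTTP fetching, not code from the article:

```python
import asyncio

# Toy in-memory "web" (an illustrative assumption; the article's crawler
# fetches real pages over HTTP): page -> links found on that page.
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/b", "/c"],
    "/b": ["/"],
    "/c": [],
}

async def fetch(url):
    # Stand-in for a network fetch; awaiting yields control like real I/O.
    await asyncio.sleep(0)
    return SITE.get(url, [])

async def crawl(root, max_workers=3):
    seen = {root}
    queue = asyncio.Queue()
    queue.put_nowait(root)

    async def worker():
        while True:
            url = await queue.get()
            try:
                for link in await fetch(url):
                    if link not in seen:      # deduplicate before enqueueing
                        seen.add(link)
                        queue.put_nowait(link)
            finally:
                queue.task_done()

    workers = [asyncio.create_task(worker()) for _ in range(max_workers)]
    await queue.join()  # every queued URL has been processed
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return seen

if __name__ == "__main__":
    print(sorted(asyncio.run(crawl("/"))))  # ['/', '/a', '/b', '/c']
```

The point of the coroutine approach is that many fetches can be in flight on a single thread; the queue and the seen set are the whole coordination story.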

Stephen E Arnold, September 17, 2015

Recommind Hits $70 Million

September 16, 2015

A video from the Big Data Landscape, part of their Big Data TV series, brings us an interview with Recommind’s CEO, Bob Tennant. The 11-and-a-half minute video and its transcript appear under the headline, “How Recommind Grew to $70M in Big Data Revenue.”

The interview by Dave Feinleib explores Recommind’s right-moves-at-the-right-time origin story, what its intelligence and eDiscovery software does, and why Tennant is confident the company will continue to thrive. This successful CEO also offers advice for aspiring entrepreneurs in any field, so check out the video or transcript for those words of wisdom.

Interestingly, the technology Tennant describes reminds us of early Autonomy methods [pdf]. He discusses working with unstructured data:

“So what you have to do is try to understand at a deeper level what’s happening semantically. What Recommind does is marry up a very highly scalable system for dealing with unstructured information– and the kind of database you need for doing that is different than what you would utilize for online transaction processing. But it also marries that up with a very deep knowledge of machine learning, which is the root of the company and where our post-docs were doing their research, to help understand what the key pieces of information in the sea of textual stuff are. And once you understand the key pieces, then you can put that into applications for further use or you can provide it to business intelligence applications to make sense of, or you can feed it elsewhere. But that’s very different from dealing with very structured data that most people are familiar with.”

Launched in 2000 and headquartered in San Francisco, Recommind provides search-powered analysis and governance solutions to customers around the world. The company’s Malolo technology stack is built upon their CORE information management platform.

Cynthia Murrell, September 16, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Where’s the Finish Line, Enterprise Search?

September 16, 2015

What never ceases to amaze me is that people are perplexed when the goals for a technology change.  Change always arrives with a big hullabaloo, and rather than complaining about it, time would be better spent learning to adapt.  Enterprise search is one of those technology topics that sees slow growth, but when changes occur they are huge.  Digital Workplace Group tracks the newest changes in enterprise search, explains why they happened, and how to adapt: “7 Ways The Goal Posts On Enterprise Search Have Moved.”

After spending an inordinate amount of space explaining how the author arrived at the seven ways enterprise search has changed, the article finally gets to its bulk.  Among the seven are obvious insights that have been discussed in prior articles on Beyond Search, along with some new ideas to ruminate on.  The obvious: users want direct answers, they expect search to do more than find information, and search should understand a user’s intent.  While these insights are already implemented in Web search engines, enterprise search lags behind.

Enterprise search should work on a more personalized level because it is part of a closed network and people rely on it to fulfill immediate needs.  A social filter could be applied to surface a user’s personal data in search results, and users rely on search filters as quick shortcuts. Enterprise search is far behind in taking advantage of search analytics and of how users consume and manipulate data.

“To summarize everything above: Search isn’t about search; it’s about finding, connecting, answers, behaviors and productivity. Some of the above changes are already here within enterprises. Some are still just being tested in the consumer space. But all of them point to a new phase in the life of the Internet, intranets, computer technology and the experience of modern, digital work.”

As always, there is a lot of room for enterprise search improvement, but these changes need to be made for an updated and better work experience.

Whitney Grace, September 16, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Free InetSoft Data Tools for AWS Users

September 14, 2015

Users of AWS now have access to dashboard and analytics tools from data intelligence firm InetSoft, we learn from “InetSoft’s Style Scope Agile Edition Launched on Amazon Web Services for No Extra Cost Cloud-based Dashboards and Analytics” at PRWeb. The press release announces:

“Installable directly from the marketplace into an organization’s Amazon environment, the application can connect to Amazon RDS, Redshift, MySQL, and other data sources. Its primary limitation is a limit of two simultaneous users. In terms of functionality, the enterprise administration layer with granular security controls is omitted. The application gives fast access to powerful KPI reporting and multi-dimensional analysis, enabling the private sharing of dashboards and visualizations ideally suited for individual analysts, data scientists, and small teams in any departmental function. It also provides a self-service way of evaluating much of the same technology available in InetSoft’s commercial offerings, applications suitable for enterprise-wide deployment or embedding into other cloud-based solutions.”

So now AWS users can pick up free tools with this Style Scope Agile Edition, and InetSoft may pick up customers for its commercial version of Style Scope. The company emphasizes that their product does not require users to re-architect data warehouses, and their data access layer, based on MapReduce principles, boosts performance. Founded in 1996, InetSoft is based in New Jersey.

Cynthia Murrell, September 14, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Content Matching Helps Police Bust Dark Web Sex Trafficking Ring

September 4, 2015

The Dark Web is not only used to buy and sell illegal drugs, but it is also used to perpetuate sex trafficking, especially of children.  The work of law enforcement agencies working to prevent the abuse of sex trafficking victims is detailed in a report by the Australian Broadcasting Corporation called “Secret ‘Dark Net’ Operation Saves Scores Of Children From Abuse; Ringleader Shannon McCoole Behind Bars After Police Take Over Child Porn Site.”  For ten months, Argos, the Queensland police anti-pedophile taskforce, tracked usage on an Internet bulletin board with 45,000 members that viewed and uploaded child pornography.

The Dark Web is notorious for encrypting user information, and that is one of its main draws: users can conduct business or other illegal activities, such as viewing child pornography, without fear of retribution.  Even the Dark Web, however, leaves a digital trail, and Argos was able to track down the Web site’s administrator.  It turned out the administrator was an Australian childcare worker, who has since been sentenced to 35 years in jail for sexually abusing seven children in his care and sharing child pornography.

Argos was able to catch the perpetrator by noticing patterns in his language usage in posts he made to the bulletin board (he used the greeting “hiya”). Using advanced search techniques, the police sifted through results and narrowed them down to a Facebook page and a photograph.  From the Facebook page, they got the administrator’s name and made an arrest.

After arresting the ringleader, Argos took over the community and started to track down the rest of the users.

“Phase two was to take over the network, assume control of the network, try to identify as many of the key administrators as we could and remove them,” Detective Inspector Jon Rouse said.  “Ultimately, you had a child sex offender network that was being administered by police.”

When they took over the network, the police were required to work in real-time to interact with the users and gather information to make arrests.

Even though the Queensland police were able to end one Dark Web child pornography ring and save many children from abuse, there are still many Dark Web sites centered on child sex trafficking.

Whitney Grace, September 4, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Shades of CrossZ: Compress Data to Speed Search

September 3, 2015

I have mentioned in my lectures a start up called CrossZ. Before whipping out your smartphone and running a predictive query on the Alphabet GOOG thing, sit tight.

CrossZ hit my radar in 1997. The concept behind the company was to compress extracted chunks of data. The method, as I recall, made use of fractal compression, which was the rage at that time. The queries were converted to fractal tokens. The system then quickly pulled out the needed data and displayed them in human-readable form. The approach was called, as I recall, “QueryObject.” By 2002, the outfit dropped off my radar. The downside of the CrossZ approach was that the compression was asymmetric; that is, slow when preparing the fractal chunk but really fast when running a query and extracting the needed data.

Flash forward to Terbium Labs, which has a patent on a method of converting data to tokens or what the firm calls “digital fingerprints.” The system matches patterns and displays high probability matches. Terbium is a high potential outfit. The firm’s methods may be a short cut for some of the Big Data matching tasks some folks in the biology lab have.

For me, the concept of reducing the size of a content chunk and then querying it to achieve faster response time is a good idea.
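Neither CrossZ’s fractal tokens nor Terbium’s patented fingerprints are spelled out in public, so the following is only a generic sketch of the shared idea: hash overlapping word n-grams into compact integer tokens once, then compare token sets instead of raw text. The function names and parameters are hypothetical:

```python
import hashlib

def fingerprints(text, n=4):
    # Hash each overlapping word n-gram into a compact 64-bit token.
    words = text.lower().split()
    grams = [" ".join(words[i:i + n])
             for i in range(max(1, len(words) - n + 1))]
    return {
        int.from_bytes(
            hashlib.blake2b(g.encode(), digest_size=8).digest(), "big")
        for g in grams
    }

def similarity(a, b):
    # Jaccard overlap of the two token sets: 1.0 for identical text,
    # 0.0 when no n-gram is shared. Matching runs on the small token
    # sets, not the full documents -- the speed-up CrossZ was after.
    fa, fb = fingerprints(a), fingerprints(b)
    return len(fa & fb) / len(fa | fb)
```

Once the fingerprints are built, matching never touches the original text again; two documents sharing a phrase of four words or more score somewhere between 0 and 1.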

What do you think I thought when I read “Searching Big Data Faster”? Three notions flitted through my aged mind:

First, the idea is neither new nor revolutionary. Perhaps the MIT implementation is novel? Maybe not?

Second, the main point that “evolution is stingy with good designs” strikes me as a wild and crazy generalization. What about the genome of the octopus, gentle reader?

Third, MIT is darned eager to polish the MIT apple. This is okay as long as the whiz kids take a look at companies which used this method a couple of decades ago.

That is probably not important to anyone but me and to those who came up with the original idea, maybe before CrossZ popped out of Eastern Europe and closed a deal with a large financial services firm years ago.

Stephen E Arnold, September 3, 2015
