Indexing Teen Messages?

September 7, 2015

If you are reading teens’ SMS messages, you may need a lexicon of young speak. The UK Department for Education has applied tax dollars to help you decode PAW and GNOC. The problem is that http://parentinfo.org/ does not provide a link to the word list. What is available is a link to Netlingo’s $20 list of Internet terms.


Maybe I am missing something in “P999: What Teenage Messages Really Mean?”

For a list of terms teens and the eternally young use, check out these free links:

I love it when “real journalists” do not follow the links about which they write. Some of these folks probably find turning on their turn signal too much work as well.

Stephen E Arnold, September 7, 2015

dtSearch Chases Those Pesky PDFs

September 7, 2015

Predictive analytics and other litigation software are more important than ever for legal professionals who must sift through mounds of documents and discover patterns. Several companies have come to the rescue, especially dtSearch.  Inside Counsel explains how a “New dtSearch Release Offers More Support To Lawyers.”

The latest dtSearch release is not only able to search through terabytes of information in online and offline environments, but its document filters have been broadened to search encrypted PDFs, including those with a password.  While PDFs are a universally accepted document format, they are a pain to deal with if they ever have to be edited or are password protected.

Also included in dtSearch are other beneficial features:

“Additionally, dtSearch products can parse, index, search, display with highlighted hits, and extract content from full-text and metadata in several data types, including: Web-ready content; other databases; MS Office formats; other “Office” formats, PDF, compression formats; emails and attachments; Recursively embedded objects; Terabyte Indexer; and Concurrent, Multithreaded Searching.”

The new ability to delve into encrypted PDF files is a huge leap ahead of dtSearch’s rivals. Being able to explore PDFs without Adobe Acrobat or another PDF editor will make working through litigation documents much simpler.

Whitney Grace, September 7, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Bar Exam Brouhaha

September 7, 2015

We cannot resist sharing this article with you, though it is only tangentially related to search; perhaps it has implications for the field of eDiscovery. Bloomberg Business asks and answers: “Are Lawyers Getting Dumber? Yes, Says the Woman who Runs the Bar Exam.”

Apparently, scores from the 2014 bar exam dropped significantly across the country compared to those of the previous year. Officials at the National Conference of Bar Examiners (NCBE), which administers the test, insist they carefully checked their procedures and found no problems on their end. They insist the fault lies squarely with that year’s crop of law school graduates, not with testing methods. Erica Moeser, head of the NCBE, penned a letter to law school officials informing them of the poor results, and advising they take steps to improve their students’ outcomes. To put it mildly, this did not go over well with college administrators, who point out Moeser herself never passed the bar because she practices in Wisconsin, the only state in which the exam is not required to practice law.

So, who is right? Writer Natalie Kitroeff points out this salient information:

“Whether or not the profession is in crisis—a perennial lament—there’s no question that American legal education is in the midst of an unprecedented slump. In 2015 fewer people applied to law school than at any point in the last 30 years. Law schools are seeing enrollments plummet and have tried to keep their campuses alive by admitting students with worse credentials. That may force some law firms and consumers to rely on lawyers of a lower caliber, industry watchers say, but the fight will ultimately be most painful for the middling students, who are promised a shot at a legal career but in reality face long odds of becoming lawyers.”

The 2015 bar exam results could provide some clarification, but those won’t start coming out until sometime in September. See the article for much more information on Moeser, the NCBE, the bar exam itself, and the state of legal education today. Makers of eDiscovery software may want to beef up their idiot-proofing measures as much as possible, just to be safe.

Cynthia Murrell, September 7, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

 

IBM Enterprise Search without Watson

September 6, 2015

Where’s Watson?

The question was not answered at this IBM Enterprise Search page.

After the hyperbole added to the Watson search, smart software, analytics, cancer curing, recipe making thing—I expected more from Big Blue.

The verbiage and the list of sources do not allude to Watson, the future of information access.

What I learned was:

  1. OmniFind is still with us
  2. Search requires IBM Content Analytics
  3. If you want to index DB2, you have to license the IBM InfoSphere Federation Server
  4. Lots of hardware and storage are on the horizon either on premises or in the cloud or via a hybrid solution.

Confused? Not me.

I expected more than a collection of ageing systems which must be built from dozens of components.

In my opinion, enterprise search at IBM has been designed to create consulting and engineering services revenue.

Maybe Watson cannot answer this question for IBM, “What is the optimal way to facilitate information access with Watson?”

The void speaks loudly to me.

Stephen E Arnold, September 6, 2015

Publishers Display Their Online Pricing Acumen

September 6, 2015

I have returned from PEI (Prince Edward Island, gentle reader). The modest traffic and the weight of fresh mussels are behind me. I learned that mussels in PEI were one third the price of those available from the fish monger in Harrod’s Creek. There is something to this pricing thing.

Perched in a comfortable gray plastic zero coefficient of friction chair in Chicago’s wonderful airport, I read “E-Book Sales Fall After New Amazon Contracts.” The main idea is that some big boys and girls in upmarket publishing houses worked overtime to get pricing control of their eBooks on Amazon.

According to the write up:

“The new business model for e-books is having a significant impact on what [the big] publishers report,” said one publishing executive. “There’s no question that publishers’ net receipts have gone down.”

What does this suggest to me? Three items:

First, the business analyses of these large outfits did not deliver oodles of dough. No surprise. Amazon prices the Google way: Data with a frosting of what sure seems like distinctly subjective behavior.

Second, the Amazon reality is that eBooks have less value than the good, old fashioned, dead tree versions. Er, streaming music exists, right?

Third, the big boys and girls continue to demonstrate their deep understanding of the world of zeros and ones.

No surprise.

Stephen E Arnold, September 6, 2015

ZyLAB and Azure: A Cloud Marriage for a Cost Controlled Relationship

September 5, 2015

I read “ZyLAB eerste eDiscovery leverancier op Microsoft Azure Platform.” The Dutch company has been certified to process data stored in the Azure cloud computing platform. ZyLAB is one of the vendors serving the legal eDiscovery market.

The idea is that ZyLAB can be scaled more easily when using Azure. The objective is to reduce the cost of eDiscovery and related text processing.

Johannes Scholtes, Founder of ZyLAB, said:

As companies increasingly store content in the cloud, it is important that, in the case of eDiscovery or other legal investigations, attorneys can search these data. Migrating large amounts of content to and from the cloud creates all kinds of bandwidth problems, only for a small amount of data to be submitted to a third party after processing and evaluation. It is much more practical to carry out the whole process in the cloud, and to retrieve only the final data set for production from the cloud.

Some attorneys may be uncomfortable if information germane to a legal matter is not stored within the firm on the law firm’s servers. However, costs are a key concern in many law firms. Lower cost solutions are of interest. One assumes that security is not a concern. Will the Ashley Madison litigation make use of Microsoft Azure and cloud based eDiscovery? Interesting question in my opinion.

Stephen E Arnold, September 5, 2015

Smartlogic Chops at the Gordian Knot of the Semantic Web

September 5, 2015

This semantic Web thing just won’t take a nap. The cheerleaders for the Big Data and analytics revolutions are probably as annoyed as I am. Let’s face it. Semantic was a good buzzword years ago. The problem remains that anything to do with indexing, taxonomies, ontologies, and linguistics lacks sizzle.

If you want analytics, you definitely want predictive analytics. (I agree. Who wants those tired Statistics 101 methods when Kolmogorov-Arnold methods are available? Not me, that’s one of my relatives. I am the dumb Arnold.)

If you want data, you want Big Data. The notion of having large volumes of zeros and ones to process in real time is more exciting than extracting a subset which meets requirements for validity and then doing historical analyses. The real time thing is where it is at.

I read “The Promise of the Semantic Web, Truth or Fiction?” hoping for an epiphany. Failing that high water mark of intellectual insight, I would have been satisfied with a fresh spin on an old idea. No joy.

I read:

Semantic technologies have the capacity to extract meaning from unstructured information found within an enterprise and make them available for processing. Our new Semaphore 4 platform combines the power of semantic technologies with our ontology management, auto classification, and semantic enhancement server to help organizations identify, classify and tag their content in order to use the intelligence within it to manage their business.

The information strikes me as a bit of the old rah rah for a specific product. The system is proprietary. The licensee must perform some work to allow the “platform” to deliver optimum outputs.

What about the answer to the question of a promise as truth or fiction?

The answer is to license a proprietary product. I am okay with that, but when the title of the write up purports to tackle an issue of substance and deflects substantive analysis with a sales pitch, I realize that I am out of step with the modern methods.

Here’s my take on the question about the semantic Web.

Folks, the semantic Web thing is a reality. A number of outfits have been employing semantic methods for years. The semantics, however, are plumbing and out of sight. The companies pitching RDF, OWL, and other conventions are following a wave which built, formed, and crashed on the shore years ago.

At this time, next generation information access vendors incorporate linguistic and semantic methods in their plumbing. The particular pipe and joint are not elevated to be the solution. The subsystems and their components are well understood, readily available methods.

As a result, one gets semantics with systems from Diffeo, Recorded Future, and other innovators.

The danger with asking a tough question and then answering it with somewhat stale information is that someone may come along and say, “There are vendors who are advancing the state of the art with innovative solutions.”

That is the main reason that MIT and Google have funded these NGIA (next generation information access) outfits. Innovation is more than asking a question, not answering it, and delivering a sales pitch for a component. Not too useful to me, gentle reader.

I would suggest that the Gordian knot is in the mind of the semantic solution marketer, not the mind of a prospect with a real time content problem for which modern technology enables effective solutions.

Stephen E Arnold, September 5, 2015

An Alphabet Google Process May Spell Trouble

September 4, 2015

Google does not fiddle with search results, or that’s what I concluded by reading the blog post called “Improving Quality Isn’t Anti-Competitive.”

I also read “Google Has a Secret Interview Process… And It Landed Me a Job.” The point of this article is that Google monitored a user’s queries about programming. When certain terms appeared in the user’s query, the Google search system displayed this question:

You’re speaking our language. Up for a challenge?

When the user moved forward with the challenge, the result was a Google request for an interview.

Magic? Objective? Efficient?

My thought was, “Perhaps Google performs a similar on the fly monitoring and procedural function when users perform other queries?”

Beyond getting someone a job at Google, applications of this method could (note the “could,” gentle reader) be used to display results to achieve other user actions.

Will the whiz kids grousing about Google in the European Commission see this “how I got hired” article as a way to spell trouble for Alphabet Google?

Nah, impossible. I love Alphabet Google. Objectively, of course.

Stephen E Arnold, September 4, 2015

Subjective Big Data: Marginalized Hype from a Mid Tier Outfit

September 4, 2015

I read “Why Gartner Dropped Big Data Off the Hype Curve.” The article purports to explain why Gartner Group, a mid tier consulting firm, eliminated Big Data from its hype cycle. Let me ask, “Perhaps Big Data reports do not sell to executives who have zero clue what Big Data means to a struggling business?” The write up is an analytics and data clean room. Facts are tough to discern.

The article included a chart without numbers to help knowledge hungry folks figure out which technologies are innovation triggers, which are at the peak of inflated expectations, which have fallen (gasp!) into the trough of disillusionment, which are on the slope of enlightenment, and which have reached the plateau of productivity.

The write up fills the empty vessel of my mind with this insight from a mid tier wizard, Betsy Burton. She allegedly revealed:

“There’s a couple of really important changes,” Burton says. “We’ve retired the big data hype cycle. I know some clients may be really surprised by that because the big data hype cycle was a really important one for many years. “But what’s happening is that big data has quickly moved over the Peak of Inflated Expectations,” she continues, “…and has become prevalent in our lives across many hype cycles. So big data has become a part of many hype cycles.”

I like that observation about Big Data becoming part of many hype cycles.

That’s reassuring. I don’t know what Big Data is, but it is now part of many hype cycles.

I like subjective statements about what is moving through a hype cycle. When one hype cycle is not enough, then put the fuzzy wuzzy statement into many hype cycles. Neat.

The article explains that other “notable subtractions” took place; for example, drop outs include:

  • Prescriptive analytics, which I presume are numbers which are not used in this article’s graphics. Numbers are so annoying because one must explain where the numbers came from, figure out if the numbers are accurate, and then make decisions about how to extract valid outputs from numerical recipes. Who has time for that?
  • Data science. I am not sure what this means, but it’s off the hype cycle hit parade.
  • Complex event processing. Sounds great but it too is a victim of the delete button.

I view the listing as subjective. Subjectivity is useful, particularly when discussing which painting in the Wildenstein Collection is the best one or which of Mozart’s variations is the hot one.

Objective analyses, in my opinion, are needed to make a case that virtual reality is on the slope of enlightenment or that affective computing is lifting off like a hyperbole fueled rocket.

Am I the only one who finds these subjective lists silly? My hunch is that the reason concepts get added to the list is to create some demand for a forthcoming study. The reason stuff disappears is because reports about the notion do not sell.

I wonder if there are data available from mid tier consulting firms to back up my hypothesis. Well, we can argue whether pale ivory is more attractive than honey milk.

Interior design professionals will go to the mattresses tinted white wisp to defend their subjective color choice. Do mid tier consultants share this passion?

Stephen E Arnold, September 4, 2015

Content Matching Helps Police Bust Dark Web Sex Trafficking Ring

September 4, 2015

The Dark Web is not only used to buy and sell illegal drugs, but it is also used to perpetrate sex trafficking, especially of children.  The efforts of law enforcement agencies to prevent the abuse of sex trafficking victims are detailed in a report by the Australian Broadcasting Corporation called “Secret ‘Dark Net’ Operation Saves Scores Of Children From Abuse; Ringleader Shannon McCoole Behind Bars After Police Take Over Child Porn Site.”  For ten months, Argos, the Queensland police anti-pedophile taskforce, tracked usage on an Internet bulletin board with 45,000 members who viewed and uploaded child pornography.

The Dark Web is notorious for encrypting user information, and that is one of its main draws: users can conduct business or illegal activities, such as viewing child pornography, without fear of retribution.  Even the Dark Web, however, leaves a digital trail, and Argos was able to track down the Web site’s administrator.  It turned out the administrator was an Australian childcare worker, who has since been sentenced to 35 years in jail for sexually abusing seven children in his care and sharing child pornography.

Argos was able to catch the perpetrator by noticing patterns in his language usage in posts he made to the bulletin board (he used the greeting “hiya”). Using advanced search techniques, the police sifted through results and narrowed them down to a Facebook page and a photograph.  From the Facebook page, they got the administrator’s name and made an arrest.

After arresting the ringleader, Argos took over the community and started to track down the rest of the users.

“‘Phase two was to take over the network, assume control of the network, try to identify as many of the key administrators as we could and remove them,’ Detective Inspector Jon Rouse said.  ‘Ultimately, you had a child sex offender network that was being administered by police.’”

When they took over the network, the police were required to work in real-time to interact with the users and gather information to make arrests.

Even though the Queensland police were able to end one Dark Web child pornography ring and save many children from abuse, there are still many Dark Web sites centered on child sex trafficking.

 

Whitney Grace, September 4, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

 

 

 
