
Why Hadoop projects fail — and how to make yours a success

25 Jun

Without doubt, “big data” is the hottest topic in enterprise IT since cloud computing came to prominence five years ago. And the most concrete technology behind the big data trend is Hadoop.

Most enterprises are at least experimenting with Hadoop, and the potential for transformative business improvement is real. But just as real is the chance of what I call a “Hadoop hangover” if the project fails to meet expectations and instead results in costly failure.

To help you make the most of Hadoop, let’s look at the promise of big data analytics, and how to avoid expensive, disillusioning failure.

Getting from big data to smart algorithms

For most businesses, big data is an attempt to emulate the advanced data-driven business techniques that propelled Amazon and Google to the forefront of their respective industries.

This is not business intelligence as we have known it in the past: the primary aim is not to facilitate executive decision making through charts and reports, but to entwine data-driven algorithms directly into the business processes that drive customer experience.

Hadoop — essentially an open source implementation of core Google technologies — enables big data projects by providing an economical way to store and process masses of raw data. Hadoop has been proven at scale at Facebook and Yahoo, and was the basis of the most impressive artificial intelligence project to date: IBM’s Watson, the supercomputer that won Jeopardy! in 2011.

Most, if not all, Fortune 500 companies have at least a Hadoop pilot project in place. Many are still in the initial data capture stage: setting up the workflows to capture raw business data, demographics and the “data exhaust” flowing from websites and social media. These data capture projects entail significant risk in their own right.

Of course, collecting the data is only the beginning. There’s an old adage: “data isn’t information, and information isn’t knowledge” — and this remains true even if you have “big” data. Indeed, we might add a new clause for our big data world: “knowledge isn’t action”. In other words, determining the meaning of the data is no longer enough: we have to establish the mechanisms — implemented as complex adaptive algorithms — that drive a more effective business.

It’s a tenet of big data analytics that the more data you have, the less complex your algorithms need to be. It’s the difference between predicting the outcome of an election from a polling sample and counting the votes on election night. The election night count is always more accurate.
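The polling analogy can be made concrete with a small simulation. This is an illustrative sketch only (the electorate size, support level and sample size are invented for the example): a sample estimate of a proportion carries sampling error, while the full count is exact by construction.

```python
import random

def simulate_election(n_voters=100_000, p_support=0.52, seed=7):
    """Generate a full 'electorate': 1 = votes for candidate A, 0 = otherwise."""
    rng = random.Random(seed)
    return [1 if rng.random() < p_support else 0 for _ in range(n_voters)]

def estimate(votes, sample_size=None, seed=7):
    """Estimate candidate A's vote share from a sample, or exactly from all votes."""
    if sample_size is None:
        pool = votes                          # full count: no sampling error
    else:
        rng = random.Random(seed)
        pool = rng.sample(votes, sample_size)  # opinion poll: subject to error
    return sum(pool) / len(pool)

votes = simulate_election()
poll = estimate(votes, sample_size=500)   # the "polling sample"
count = estimate(votes)                   # the "election night count"
print(f"poll: {poll:.3f}  count: {count:.3f}")
```

With all of the data, the simplest possible "algorithm" (counting) is exact; with only a sample, a more sophisticated model is needed to quantify and correct for the error.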

Furthermore, machine learning techniques allow algorithms to be “trained” from the data itself. Essentially the data drives and refines the algorithms.
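A minimal sketch of what "the data drives the algorithm" means in practice, using a one-variable logistic regression fitted by gradient descent. The behavioural data here (visits per week vs. whether a customer purchased) is entirely hypothetical; the point is that the parameters are never hand-coded — they are refined from the observations.

```python
import math

def train(xs, ys, lr=0.1, epochs=2000):
    """Fit y ~ sigmoid(w*x + b) by gradient descent; the data refines w and b."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = 1 / (1 + math.exp(-(w * x + b)))  # current prediction
            w -= lr * (p - y) * x                 # nudge weights toward the data
            b -= lr * (p - y)
    return w, b

# Hypothetical training data: visits-per-week vs. purchased (1) or not (0).
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 0, 1, 1, 1]
w, b = train(xs, ys)

def predict(x):
    return 1 / (1 + math.exp(-(w * x + b)))

print(f"P(purchase | 6 visits) = {predict(6):.2f}")
```

Feed the same code different data and it yields a different model — no human rewrote the algorithm.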

So having lots of data is an advantage. But, at the end of the day, it still requires a lot of human intelligence to come up with the best answers. Indeed, sometimes it’s a matter of asking the right question. Collecting the data is necessary but not sufficient. Getting from big data to smart algorithms is a unique challenge in its own right.

With all that in mind, let’s look at the key challenges facing successful Big Data analytic projects:

Data scientists are critical, but in short supply

The Googles and Amazons of this world succeeded in their big data projects largely because they were able to attract and retain some of the world’s most gifted computer scientists. These were individuals who brought to the table not just programming skills, but also complex statistical analysis techniques, business insight, cognitive psychology and remarkable problem-solving ability.

We’ve come to call these types of people “data scientists”, and it’s well understood that the base skills — statistics, algorithms, parallel programming, and so on — are in short supply. Academia is only just responding with curricula to produce suitably qualified graduates. It will be years before we see a significant increase in qualified data scientists.

If and when we see the supply of data scientists increase, we will still be faced with a more fundamental issue. This stuff is hard. It requires the ability to think across at least three fundamentally complex specializations: competitive business strategy, machine learning algorithms, and massively parallel data programming. This unique combination of skills is likely to be the limiting factor for big data in the enterprise for the foreseeable future.

At the core of any big data project is the data scientist — acquiring or developing data science capability is a critical factor in a big data project.

The shortage of big data tools

Compounding the problem of the data science talent gap — but perhaps also offering a possible solution — is the lack of suitable tools for the data scientist.

Hadoop and other data stores supply a brute force engine for computation and data storage. Hadoop clusters can consist of potentially thousands of commodity servers — each with its own disk storage and CPUs. Data is stored redundantly across nodes in the cluster. The MapReduce algorithm allows processing to be distributed across all the nodes in the cluster. The result is an amazingly cost effective way of distributing processing across potentially thousands of CPUs and disks.
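The MapReduce pattern can be sketched in a few lines of single-process Python. This is only a conceptual model, not Hadoop itself: in a real cluster the map and reduce phases run in parallel across nodes, and the shuffle moves data over the network, but the three-phase contract is the same.

```python
from collections import defaultdict
from itertools import chain

def map_reduce(records, mapper, reducer):
    """Minimal single-process sketch of MapReduce's three phases."""
    # Map: each record independently emits (key, value) pairs.
    mapped = chain.from_iterable(mapper(r) for r in records)
    # Shuffle: group values by key (the framework does this across the network).
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    # Reduce: combine each key's values into a final result.
    return {key: reducer(key, values) for key, values in groups.items()}

# The canonical word-count example.
lines = ["big data is big", "data drives algorithms"]
counts = map_reduce(
    lines,
    mapper=lambda line: [(word, 1) for word in line.split()],
    reducer=lambda word, ones: sum(ones),
)
print(counts)  # {'big': 2, 'data': 2, 'is': 1, 'drives': 1, 'algorithms': 1}
```

Because each mapper call and each reducer call is independent, the framework is free to scatter them across thousands of machines — which is the source of Hadoop’s economics.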

But programming in MapReduce is akin to programming in assembly language — it’s not a practical way of creating big data algorithms. To turn big data into big value, the data scientist needs tools that can support statistical hypothesis testing, creating and training predictive models, as well as reporting and visualization. Open source projects such as Mahout, Weka and R provide a starting point, but none are easy to use, and often they are insufficiently scalable or otherwise unsuitable to be at the core of big data enterprise solutions.
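To illustrate the kind of statistical capability in question, here is a two-proportion z-test — the workhorse of A/B testing — in plain Python. The conversion numbers are invented for the example; the point is that data scientists need this layered on top of the storage engine, not hand-rolled for every question.

```python
import math

def two_proportion_z(success_a, n_a, success_b, n_b):
    """Two-sided z-test: do variants A and B have different conversion rates?"""
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)      # pooled rate under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the normal CDF, via the error function.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical test of a new recommendation algorithm against the old one.
z, p = two_proportion_z(success_a=1200, n_a=10_000, success_b=1050, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

At big data scale the hard part is not this arithmetic but running it over billions of records — which is exactly the gap between Hadoop’s raw engine and a usable analytics toolkit.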

Higher level toolkits — which might leverage Mahout, R and the like, but which make them accessible to a wider audience and allow them to be used as building blocks in more complex workflows — are the next stage of evolution for data science products. Without these big data analytic platforms, fully leveraging big data will only be possible in the largest enterprises, which have the budget and reputation to attract the limited supply of truly capable data scientists.

Data scientists need a more effective analysis framework and toolkit than is provided by Hadoop and its ecosystem. Producing these tools should be a priority for the software community.

The risk of reduced data quality

Hadoop succeeds as the basis for so many big data projects not just because it can economically store and process large quantities of data, but also because it can accept data in any form. In a traditional database, data must be converted to a pre-defined structure (a schema) before being loaded.

These ETL (Extract-Transform-Load) projects are typically expensive and time consuming. Furthermore, the economics of data warehousing typically required that the data be aggregated and pruned before loading, which sacrificed the granularity necessary for big data solutions.

Hadoop allows for “schema on read” — you need only define the structure of the data when you come to read it. This allows data to be loaded in its most raw form, without needing to analyze or define the data ahead of time. You load everything at low cost, and then only “pay” for the schemas you need.
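A toy sketch of schema on read, using an in-memory list to stand in for raw files in HDFS (the event fields — user, page, ms, referrer — are invented for the illustration). The raw bytes are stored untouched; each query imposes only the structure it needs, at read time.

```python
import json

# Raw events are stored exactly as they arrive -- no upfront schema.
raw_store = [
    '{"user": "u1", "page": "/home", "ms": 231}',
    '{"user": "u2", "page": "/cart", "ms": 187, "referrer": "ad-17"}',
]

def read_with_schema(raw_lines, fields):
    """Apply a schema only at read time: project just the fields this query needs."""
    for line in raw_lines:
        record = json.loads(line)
        yield tuple(record.get(f) for f in fields)  # absent fields become None

# Two different "schemas" over the same raw bytes, each defined by its reader.
latency_view = list(read_with_schema(raw_store, ["user", "ms"]))
referrer_view = list(read_with_schema(raw_store, ["user", "referrer"]))
print(latency_view)
print(referrer_view)
```

Note how the first record simply yields None for the referrer it never had — the flexibility, and the danger, of deferring structure to read time.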

However, this approach has some fairly obvious risks — machine-generated data in particular might be changing structure rapidly and by the time you come to mine the data it might be very hard to determine its structure. Furthermore, any errors in the generated data might not be picked up until it is too late.

So despite the promise of schema on read, success in a big data project may depend on careful vetting of incoming data — not to the extent of a full ETL process to be sure, but more than simply “load and hope”. After all, one of the first lessons of the computer age was GIGO: Garbage In, Garbage Out.

Pay attention to the quality and format of data streaming into Hadoop. Make sure you’ve identified the structure and assured the quality of that data.
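The middle ground between a full ETL pipeline and “load and hope” might look something like the following sketch: a lightweight vetting pass that parses each incoming record, checks a few required fields and types, and quarantines anything suspect rather than silently loading it. The field names and rules here are hypothetical.

```python
import json

def vet(line, required={"user": str, "ms": int}):
    """Lightweight gate on an incoming record: parse, then check required fields."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None, "unparseable"
    for field, ftype in required.items():
        if not isinstance(record.get(field), ftype):
            return None, f"bad field: {field}"
    return record, None

incoming = [
    '{"user": "u1", "ms": 231}',      # clean
    '{"user": "u2", "ms": "fast"}',   # wrong type for ms
    "not json",                        # corrupt line
]
clean, quarantine = [], []
for line in incoming:
    record, err = vet(line)
    if record:
        clean.append(record)
    else:
        quarantine.append((line, err))  # keep the raw line for later inspection
print(len(clean), "clean,", len(quarantine), "quarantined")
```

Quarantining (rather than dropping) bad records preserves the raw data, so a structural drift in an upstream feed shows up as a growing quarantine rather than as silent garbage in the analysis.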

Hadoop’s missing enterprise features

Hadoop has proven its scalability at places like Yahoo and Facebook, and its ability to power the most complex analytics as the basis for IBM’s Watson AI. However, it lacks some key features that enterprises regard as important:

Security in Hadoop is weak. Once authenticated to a Hadoop cluster, a user can typically access all the data in that cluster. Although it’s possible to limit a user’s access to specific files in a Hadoop cluster, it’s not possible to restrict access to individual records within a file. Furthermore, because of the cumbersome nature of Hadoop security and its interaction with external tools such as Hive (Hadoop’s native SQL interface), the most common practice is to allow everybody access to everything.

Backup is also difficult. Hadoop is inherently fault tolerant, but enterprises still want a disaster recovery plan, or the ability to restore to a point-in-time backup should some human error result in data corruption. Most distributions do not have these capabilities (the MapR distribution does provide a snapshot capability).

Integration with enterprise monitoring systems is lacking. Hadoop generates metrics, and each Hadoop vendor offers an “Enterprise” console, but these do not integrate properly with enterprise monitoring systems such as HP OpenView or Foglight.

Resource management is primitive. The ability to manage resources so that ad hoc requests cannot block mission-critical operations is only just emerging.

Real-time query is not a feature of Hadoop. While an emerging set of SQL-based languages and caching layers has been created, Hadoop is not a suitable basis for real-time computing.

None of these issues is a showstopper for Hadoop, but failure to acknowledge these limitations may saddle your Hadoop project with unrealistic expectations that cannot be fulfilled.

Make sure you understand the technical strengths and limitations of Hadoop. Avoid unrealistic expectations for your Hadoop solution.

Organizational challenges

Big data is a complex and potentially disruptive challenge to many organizations. Globalization and e-commerce have flattened the world so much that for many businesses simply competing on price or store locality is no longer an option. Competitive differentiation will derive increasingly from personalization, targeting, predictive recommendations and so on. For many businesses, achieving some form of data-driven operation will be survival itself.

History has shown that when faced with this sort of disruptive threat, many companies “freeze” – clinging ever tighter to outmoded business models and hoping for a return to the competitive landscape of the past.

Big data analytics is an over-hyped, poorly-defined and over-used term. Despite that, and despite the challenges outlined above, I believe that for many businesses, the opportunities presented by the big data revolution are as significant and fundamental as those presented by e-commerce 15 years ago. Companies (particularly retailers) should be bold and determined in reacting to these challenges.

Organizational resistance and scepticism to big data is understandable. But don’t let big data risks blind you to the benefits — and sometimes necessity — of a big data project. Indeed, drinking sensibly seems to be the best way to avoid the hangover without missing the party altogether.


Sony Xperia S Update Release Date: New Fix On July 8 To Patch Up NFC Issues From Jelly Bean

19 Jun

Sony Xperia S owners are in for another update, courtesy of leftover issues from the Android 4.1 Jelly Bean upgrade that was recently rolled out. If all goes well, the fix should be out over the air within a few weeks.

Thanks to Xperia Blog, which spotted a forum posting by a Sony representative, we now know that some kind of a software update is scheduled to roll out during week 28, which begins on July 8.

“A new software is planned to start rolling out during week 28. Please let me know if any of you still experience any problems after installing the upcoming update,” said Sony Xperia support team member Johan.

The rep’s comments were in response to a question about NFC connectivity, so one can only assume that this will be one of the fixes found in the patch. Not much else is known about the fix.

There is also the possibility the week 28 release time frame applies to update 6.2.B.0.211, which was just pushed out last week. Update 6.2.B.0.211 is only available in certain regions so far, so the rep may have simply been pointing to a date when it will become available in more regions around the world. Update 6.2.B.0.211 fixed a number of connectivity issues that were found after Android 4.1 Jelly Bean hit the Xperia S at the end of May.

When the new update does start rolling out, the story will be the same as with all OTA upgrades:

“As always when rolling out new software versions. It might not reach your device upon roll out start since it rolls out gradually. I suggest that you use PC Companion, Update Service or Bridge for mac to check for software updates from time to time,” Sony stated when referring to the 6.2.B.0.211 fix.

Let us know if you’ve experienced any issues with your Xperia S, even if you’ve already downloaded 6.2.B.0.211, in the comments section below.

Nicole Eggert HOSPITALIZED After Botched High-Dive

21 Apr

ANOTHER celebrity injury on the set of “Splash” — TMZ has learned Nicole Eggert was hospitalized yesterday after bungling a high-dive and brutally back-flopping into the pool during a taping of the ABC show.

Sources tell us … the “Baywatch” babe was trying to execute a dive that included multiple backflips … but something went wrong, and she went crashing into the water back first … hard.

We’re told the impact looked so painful, on-site EMTs rushed in to pull the 41-year-old out of the pool … and she went to a nearby hospital for treatment.

Sources tell us … Eggert didn’t break any bones, but docs wanted to make sure she didn’t suffer internal injuries from the impact — they’re monitoring her kidneys in particular. Eggert was released from the hospital after a few hours.

It ain’t the first time a star has fallen on the show — Chuy Bravo 86’d himself from the competition after fracturing his heel during diving practice … and Katherine Webb bowed out earlier this week thanks to a back injury.

Rory Bushfield had ruptured his eardrum during a dive gone wrong a few weeks ago … but refused to quit the show.

AT&T, Verizon Control of Airwaves Challenged

13 Apr

AT&T Inc. and Verizon Communications Inc. shouldn’t be allowed to box out smaller wireless carriers from picking up some of the nation’s prime airwaves, the Justice Department told federal regulators Friday.

As Americans become increasingly dependent on smartphones and the mobile Internet, the Federal Communications Commission is planning to auction some of the airspace used by broadcast television to wireless companies. Those particular airwaves are in the lower frequency range and are ideal for transmitting data over long distances in rural areas or through walls.

A key issue now is which companies will be permitted to buy such valuable property.

AT&T and Verizon currently own much of the beachfront airspace. The two smaller national wireless carriers, Sprint Nextel Corp. and T-Mobile USA, argue they should get a leg up in acquiring these airwaves or larger companies should at least be required to sell some of their holdings.

The Justice Department appeared to side with the smaller carriers Friday in its filing with the FCC. It suggested the FCC should craft rules so that Sprint and T-Mobile have an advantage in picking up the new airwaves.

Since the two smaller national carriers have “virtually” no low-frequency spectrum, “this results in the two smaller nationwide carriers having a somewhat diminished ability to compete, particularly in rural areas,” the Justice Department’s filing said. The FCC, it added, “can potentially improve the competitive landscape by preventing the leading carriers from foreclosing their rivals from access to low-frequency spectrum.”

“The Justice Department is absolutely right,” Sprint’s vice president of government affairs, Larry Krevor, said in a written statement. “Ensuring that all carriers, large and small, have access to low-band spectrum would improve competition and benefit consumers.”

A Verizon spokesman declined to comment. Representatives for AT&T and T-Mobile didn’t immediately respond to requests for comment Friday afternoon.

AT&T and some lawmakers have argued that blocking AT&T and Verizon from fully participating in the auction would bring in less money for the federal treasury. One goal of the auction is paying for a national public-safety broadband network.

The FCC expects to set rules for its spectrum auction within the next year. The agency also has so far been skittish about barring anyone from participating in the auction.

The Justice Department cautioned the commission against allowing larger carriers to use their bidding power to block rivals from winning spectrum. “In a highly concentrated industry with large margins between the price and incremental cost of existing wireless broadband services, the value of keeping spectrum out of competitors’ hands could be very high,” the Justice filing said.

The department showed its concern about over-concentration in 2011, when it filed suit to block AT&T’s proposed acquisition of T-Mobile USA. The two firms later abandoned the deal.

Meantime, the Justice Department appeared to caution against any further consolidation in the U.S. wireless market. Some speculate that after Sprint, the nation’s No. 3 carrier, wins expected approval of its acquisition by Japan’s Softbank Corp., it could attempt a future deal with T-Mobile, the nation’s No. 4 carrier.

“The Department has found that the four largest wireless carriers (AT&T, Verizon, Sprint and T-Mobile) compete across many dimensions,” the Justice Department filing said. It added later, “The Department believes it is essential to maintain vigilance against any lessening of the intensity of competitive forces.”

iPhone 5S to offer multiple screen sizes, analyst says

10 Apr

iPhone 5S buyers could have their choice of screen size, according to Topeka analyst Brian White.

Citing information from a meeting with a “tech-supply chain company,” White said today he believes Apple will unveil the iPhone 5S in at least two or possibly three different screen sizes.

“We believe Apple is coming around to the fact that one size per iPhone release does not work for everyone, and offering consumers an option has the potential to expand the company’s market share,” White said in an investors note released today.

The analyst didn’t specify or even speculate which screen sizes might be available. The iPhone 5 sports a display size of 4 inches, a boost from the 3.5-inch screen found in previous models.

This isn’t the first time White has pitched this prediction. In January, the analyst cited sources who claimed the next iPhone might be offered in different screen sizes as well as different colors.

Let’s play with the assumption that Apple considers three different screen sizes for the next iPhone.

One model would likely adopt a size larger than 4 inches. That could prove tempting to consumers who might otherwise gravitate to larger-screen Android phones. A second model would stick with the current 4-inch display for people who don’t want a change. And a third could go smaller than 4 inches and sell at a lower price.

However many screen sizes Apple offers, White believes the iPhone 5S will debut in July. That forecast echoes the opinion of other analysts eyeing a summer release for the next iPhone.

Piper Jaffray analyst Gene Munster believes the new iPhone will come out in late June, while KGI Securities analyst Ming-Chi Kuo thinks the iPhone 5S will be announced in June and available by July.

White also joins his fellow Apple analysts in anticipating a lower-priced iPhone this year, forecasting a summer launch along with the 5S. And just how low-priced will it be?

“Our research is now indicating that we should not expect the price to dip below $300 and those expecting a $150 to $200 iPhone will be disappointed,” White said. “We have previously discussed an [average selling price] of $250 to $300 for a lower priced iPhone; however, a price tag of $300 to $350 now makes more sense.”

The predicted price range would be for an unlocked, non-subsidized version of the low-cost iPhone targeted to developing markets such as China.

An iPhone in different sizes and colors? A low-cost iPhone? All of these notions sound atypical for a company such as Apple, which tends to move more slowly, surely, and traditionally.

But Apple is facing increasing pressure, both from Android rival Samsung and from investors. The company needs to apply more innovation and offer more choices across its traditional lineup to prove it’s still a competitive force.


One year after launch, Instagram’s Android app makes up nearly half of its users

7 Apr

After only one year, Instagram’s Android app has grown to account for almost half of the photo sharing app’s 100 million users, the company announced today.

To compare, Instagram had 30 million users a year ago, when it launched its Android app. And that was only a few days before Facebook ended up buying the company for a whopping $1 billion.

“Instagram for Android has helped make this community more global than ever,” wrote Philip McAllister, of the Instagram for Android team. “Major events such as Brazil’s Círio de Nazaré festival, the 85th birthday of Thailand’s King Bhumibol, and a streak of severe thunderstorms throughout Malaysia have been captured by Android Instagrammers and shared to global audiences like never before.”

Given Android’s massive market share lead over iOS, it’s only a matter of time until Instagram’s Android users outnumber their iOS siblings. I, for one, will enjoy the iPhone-owning hipster backlash when that happens.


Android to take 58 percent of smartphone apps

6 Apr

Microsoft’s Windows Phone will have a share of slightly less thawithin the year.

ABI Research senior analyst Aapo Markkanen said: “[This means the] most pressing issue for Google is how much of the handset momentum will trickle down to tablets, where Apple is holding the fort remarkably well.”

He noted there is an upside to the Android fragmentation issue since Google can actually benefit from Amazon’s tablet push. The Kindle Fire will add much “critical code mass” to positioning Android as a platform for tablet apps, he explained.

ABI Research’s estimates of smartphone app downloads corroborate another report, from IDC, which stated that for the first time more smartphones than feature phones will be shipped worldwide in 2013. Handset makers would ship about 918.6 million smartphones, making up 50.1 percent of total mobile shipments globally.

IDC attributed the shift largely to emerging economies. It said smartphone demand had been burgeoning in China, Brazil and India, as these economies had grown, creating a larger middle class prepared to buy smartphones.