feb 06

Big data is among the hottest trends in IT right now, and Hadoop stands front and center in the discussion of how to implement a big data strategy. There’s just one problem that keeps cropping up: many people don’t seem to know exactly what it means when somebody says “Hadoop.”

The problem surfaced again Monday in the form of complaints over Forrester’s new report titled “Enterprise Hadoop Solutions, Q1 2012.” InformationWeek spoke with a few vendors that didn’t like how their products were assessed, and database industry analyst Curt Monash says the report “compares apples, peaches, almonds, and peanuts.” I thought the same thing when I saw a copy of the report last week. They all focus on Hadoop, but Hortonworks is not Datameer is not HStreaming.

Allow me to explain. Hopefully, this provides a foundation for parsing what people talk about when they talk about Hadoop, and for differentiating one type of product from another.

What Hadoop is

I went into this in more detail in a GigaOM Pro report published last March (sub req’d), but the long and short is that Hadoop is, at its core, an Apache Software Foundation project consisting of two primary subprojects — Hadoop MapReduce and the Hadoop Distributed File System. MapReduce is the parallel-processing engine that allows Hadoop to churn through large data sets in relatively short order. HDFS is the distributed file system that lets Hadoop scale across commodity servers and, importantly, store data on the compute nodes in order to boost performance (and potentially save money). These are the two must-have components for any Hadoop distribution.
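To make the MapReduce half of that description concrete, here is a minimal, single-process sketch of the map, shuffle and reduce steps, using the classic word-count example. It is plain Python, not the Hadoop API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration. In a real cluster, the framework runs many map and reduce tasks in parallel across nodes and reads its input from HDFS.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key.

    In a real cluster the framework does this between the map and
    reduce tasks; here it is a simple in-memory dictionary.
    """
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    sample = ["big data big clusters", "hadoop stores big data"]
    print(word_count(sample))
```

Because each call to `map_phase` looks at only one line and each call to `reduce_phase` looks at only one key, both steps can be spread over many machines, which is the property that lets the real thing churn through large data sets quickly.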

There are also a number of Apache projects related to Hadoop, often built atop either Hadoop MapReduce or HDFS. These include, but are not limited to, Hive and Pig, two higher-level query layers (Hive’s language is SQL-like; Pig uses its own data-flow language, Pig Latin) that bring data-warehouse-like capabilities to a Hadoop cluster, and HBase, a NoSQL database that uses HDFS as its distributed storage layer.

Hadoop distributions

These are packaged software products that aim to ease deployment and management of Hadoop clusters compared with simply downloading the various Apache code bases and trying to cobble together a system. Presently, Cloudera, Hortonworks, MapR and EMC all offer their own Hadoop distributions. Although each is different (sometimes markedly so, as with MapR’s proprietary file system), they all package a set of Hadoop projects (MapReduce, Hive, Sqoop, Pig, etc.) in a way that, in theory, makes them integrate more naturally and run both smoothly and securely.

Many Hadoop distributions integrate with various data warehouses, databases and other data-management products, with the goal of moving data between Hadoop clusters and other environments so each might process or query data stored in the other.

Hadoop management software

Just as the wording implies, Hadoop management software is designed to make it easier to manage and troubleshoot a Hadoop cluster. Such products are usually sold or offered by companies peddling Hadoop distributions, because even when commercially packaged, Hadoop is still a complex architecture and somewhat foreign to most IT personnel and products. However, third parties such as Platform Computing (now part of IBM) and Zettaset also sell software for managing Hadoop clusters, and their products are typically agnostic as to what distributions they support.

But distributions and management software are all about the infrastructure and the platform. Anyone actually wanting to use Hadoop still needs to know how to write applications that leverage the underlying architecture.

Hadoop application software (or, products that use Hadoop)

The Hadoop ecosystem gets really complex when we start looking at products that exist to help developers write Hadoop applications or otherwise analyze data stored within Hadoop in a manner other than writing traditional MapReduce jobs. These range from abstraction layers such as Karmasphere Analyst or IBM Infosphere BigInsights, to Hadapt, which offers a single-platform product fusing a SQL data warehouse with a Hadoop cluster, to HStreaming, which promises real-time processing and analytics.

The one common thing among all these products, however, is that they are not Hadoop distributions, but sit atop platform software from Hortonworks, EMC or whomever. Some products that get thrown into the Hadoop fray, such as Outerthought Lily or Drawn to Scale Spire, are essentially scale-out databases built atop HBase (which itself is a separate project built atop HDFS). The image below, from Karmasphere, gives a particularly clear map of how a Hadoop environment might look.

The applications and analytics space is probably where we’ll see the biggest influx of new companies, as writing Hadoop applications is still tough, but it’s also how companies will actually start experiencing direct business benefits. In fact, it’s these types of higher-level products that are the focal point of Accel Partners’ new big data fund.

feb 06

Paying with phone credit is a long (and poorly) suppressed wish of many Android users, and on that front we saw some movement from the carrier Vodafone about two months ago. Now, however, we offer a complete guide to enabling these long-awaited payments. It should work on all devices, provided they have root permissions, although it has not yet been tested thoroughly enough to say so with certainty.

(Continues...)
Read the rest of How to pay with Vodafone phone credit (root only)


© Nicola Ligas for AndroidWorld.it, 2012.

feb 06
Lars Torben Kremer has announced the availability of the release candidate for Snowlinux 2, a desktop distribution based on Debian's stable branch and featuring the GNOME 2 desktop: "The team is proud to announce the release of Snowlinux 2, code name 'Ice'. New features: improved installer (keyboard variants,....


feb 06

CPU-G
It often happens that we can't remember the model of our motherboard or of our RAM. CPU-G comes to the rescue in these situations: it is a program that queries the system to retrieve information about the hardware.

Installing it on Ubuntu is very simple:

wget -O cpu-g-0.9.0.tar.gz http://sourceforge.net/projects/cpug/files/cpu-g-0.9.0.tar.gz/download

tar -xvzf cpu-g-0.9.0.tar.gz

cd cpu-g-0.9.0

python cpu-g

On Arch Linux, the package is available from the AUR.

The program is written in Python, so to run it on Ubuntu or other distros you will need the Python libraries it depends on. Make sure these packages are installed:

sudo apt-get install python-gtk2 python-glade2
