Hadoop Hardware Infrastructure and Big Data

With Hortonworks filing for an IPO and Cloudera reportedly doing more than $100m in sales, it’s tempting to think that Hadoop has already gone mainstream. Facebook has claimed to run the largest Hadoop cluster, and half of the Fortune 500 are already using Hadoop.

However, deciding on the hardware required to run a Hadoop cluster remains complex for most companies. As a Fujitsu Select Expert Partner, DEITG holds certifications in Server, Storage and Infrastructure Solutions. Our engineers have put together a number of guide specifications covering the processors, RAM and storage required to run your Hadoop solution.

Follow the links below to find the solution that fits your data processing requirements.

Remember, these are guide specs and quotes only. For an accurate quote for your specific Hadoop hardware solution, please answer as many questions as you can on the quote request form below, and we will calculate how much storage is required and the cost of the hardware needed to run your Hadoop cluster.

Selecting Hadoop Server Architecture

  • Aim to get your Master Nodes right from the start; adding them later can be disruptive.
  • Worker Nodes can be added easily as the data storage requirement grows.
  • Multiply your raw data by 3 to allow for HDFS’s default replication factor.
  • Plan for at least 110% of the raw data requirement, leaving headroom for growth and temporary files.
  • Put more memory in the Master Nodes.
  • Put more storage in the Worker Nodes.
  • Redundancy on the Masters is important because failures there are disruptive, so SAS drives, dual PSUs, a 3-year warranty etc. are recommended, budget permitting of course.
  • Lower redundancy is acceptable on Worker Nodes, as failed servers can be replaced easily.
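One plausible reading of the sizing rules above (this is our illustrative arithmetic, not a quote: raw data tripled for replication, then 110% of that figure provisioned as headroom) can be sketched as:

```python
def cluster_storage_tb(raw_tb, replication=3, headroom=1.10):
    """Total storage (TB) to provision: raw data x replication x headroom.

    replication=3 is the HDFS default; headroom=1.10 reflects the
    'plan for 110%' rule of thumb. Both are assumptions you should
    adjust for your own cluster.
    """
    return raw_tb * replication * headroom

# The four guide cluster sizes from this page:
for raw in (50, 100, 250, 800):
    print(f"{raw} TB raw -> {cluster_storage_tb(raw):.0f} TB provisioned")
```

On these assumptions, a 50TB raw data set works out to roughly 165TB of provisioned storage across the Worker Nodes, before accounting for OS disks and intermediate job output.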

  • 50TB Big Data Cluster
  • 100TB Big Data Cluster
  • 250TB Big Data Cluster
  • 800TB Big Data Cluster


About Hadoop Big Data

MapReduce was developed by Google to handle the vast amounts of data it was generating. The algorithm chops data into smaller chunks and distributes them through the cluster; the chunks are mapped across many computers for calculation, and the partial results are then brought back together to produce the final data set. Hadoop is an open source project built around this MapReduce model: it allows applications to process data in parallel rather than serially.
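The map/shuffle/reduce flow described above can be illustrated with a toy word count. This runs in a single process rather than across a cluster, and the function names are ours, not Hadoop’s, but the three phases mirror what the framework does at scale:

```python
from collections import defaultdict
from itertools import chain

def map_phase(chunk):
    # Map: for one chunk of input, emit a (word, 1) pair per word.
    return [(word, 1) for word in chunk.split()]

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework
    # does between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: combine each key's values into a final count.
    return {word: sum(counts) for word, counts in groups.items()}

# Two "chunks" standing in for blocks distributed across the cluster:
chunks = ["big data on hadoop", "hadoop runs mapreduce on big clusters"]
mapped = chain.from_iterable(map_phase(c) for c in chunks)
result = reduce_phase(shuffle(mapped))
```

In a real cluster each chunk lives on a different Worker Node, the map tasks run where the data sits, and only the grouped intermediate pairs move over the network, which is what makes the parallelism pay off.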