CURRENT FACILITIES
The ODISEA cluster is a joint project between CSIC, IMDEA Matemáticas and UAM, coordinated through the SIMUMAT (Mathematical Modelling and Numerical Simulation in Science and Technology) programme of the Madrid Government. It serves the researchers taking part in the network and their research environment.
ODISEA. Funding.
| Stage | Funding Institution | Purchased Hardware |
|---|---|---|
| 1st stage | SIMUMAT – Madrid Government | 8 dual nodes with Intel Xeon EM64T processors, one of them acting as server; 4 GB of RAM per node and 120 GB hard drives; InfiniBand and Gigabit Ethernet node interconnection; stage balance: 16 processors; installation and maintenance |
| 2nd stage | CSIC | 8 dual nodes with Intel Xeon EM64T processors; 4 GB of RAM per node and 120 GB hard drives; stage balance: 16 processors; installation and maintenance |
| 3rd stage | CSIC | 35,000 € budget approved; 50,000 € pending approval; stage balance: 16 dual nodes with dual-core processors; an additional extension of 8 dual nodes with dual-core processors is planned; 16 GB of RAM per node; installation and maintenance |
| Maintenance | UAM | Air conditioning; global maintenance, including power consumption |
ODISEA. Hardware.
The ODISEA cluster consists of:
- 16 dual nodes with single-core processors (32 processors).
- Intel Xeon EM64T 3.2 GHz processors, FSB 800.
- 4 GB of RAM per node.
- Hard drives:
  - Server: 146 GB Ultra320 SCSI hard drive.
  - Slave nodes: 250 GB ATA hard drives.
- Node interconnection network:
  - Low latency interconnection network: SilverStorm 9024 (InfiniBand), 24 ports at 10/20 Gbps.
  - Gigabit Ethernet interconnection network for cluster control.
ODISEA. Software.
Operating system: Red Hat Enterprise Server 4.0, kernel 2.6.9-11.
- GNU Fortran 77 and C compilers.
- Fortran 77/90/95, Java and C/C++ Intel compilers.
- Python, with the PythonMPI, PythonNumerics and Pythonf2py libraries, to enable Python code that uses MPI.
- Task parallelization through Scali Manage / Scali MPI Connect for InfiniBand.
- ScaTorque as the queue manager.
- Matlab 7.3.
- R statistical environment.
- Intel Math Kernel Library, including the LAPACK and BLAS libraries.
- SPARSEKIT and ARPACK libraries.
- CENTAUR.
ODISEA. Current state.
ODISEA is today a modest cluster, equipped with Intel Xeon EM64T processors at 3.2 GHz (mid-range; at the high end we find the Intel Itanium 2 and the new IBM POWER6 families). Each slave node has 2 GB of RAM per processor (4 GB per node), which is rather low for simulations in most fields. Even so, ODISEA is an excellent testing ground for different codes and applications and a useful introduction to high performance computing.
ODISEA. Enlargement.
One of the main deficiencies of ODISEA has been the small amount of RAM per node. New nodes are planned to have at least 16 GB. Single-core processors are currently disappearing from the market, so the new nodes will have dual-core processors. Quad-core processors are also available, but their performance does not yet justify their high price.
In any case, ODISEA is still far from the most powerful supercomputers in the world, as the next section shows.
ODISEA. Worldwide comparison with High Performance Computing Centres.
The website http://www.top500.org/ is the best reference for information about supercomputing trends, including data on the 500 most powerful machines in the world. Some relevant statistics are presented here.
| Number of Processors | Count | Share % | Rmax Sum (GF) | Rpeak Sum (GF) | Processor Sum |
|---|---|---|---|---|---|
| 257-512 | 36 | 7.20 % | 117710 | 162157 | 17836 |
| 513-1024 | 192 | 38.40 % | 716626 | 1144890 | 171117 |
| 1025-2048 | 185 | 37.00 % | 865405 | 1423050 | 262844 |
| 2049-4096 | 38 | 7.60 % | 372432 | 584622 | 98884 |
| 4000-8000 | 19 | 3.80 % | 357877 | 470662 | 95140 |
| 8000-16000 | 17 | 3.40 % | 542883 | 717847 | 159128 |
Vendors of the main computing clusters in the world:
| Vendors | Count | Share % | Rmax Sum (GF) | Rpeak Sum (GF) | Processor Sum |
|---|---|---|---|---|---|
| Cray Inc. | 15 | 3.00 % | 288171 | 357970 | 65415 |
| Dell | 17 | 3.40 % | 237620 | 341451 | 39788 |
| IBM | 236 | 47.20 % | 1747565 | 2633891 | 602658 |
| SGI | 20 | 4.00 % | 191687 | 218295 | 34992 |
| Sun Microsystems | 9 | 1.80 % | 44166 | 68484 | 14808 |
| Linux Networx | 7 | 1.40 % | 59127 | 84206 | 15820 |
| Hewlett-Packard | 158 | 31.60 % | 582026 | 978900 | 176002 |
Processor families in use:
| Processor Family | Count | Share % | Rmax Sum (GF) | Rpeak Sum (GF) | Processor Sum |
|---|---|---|---|---|---|
| Power | 91 | 18.20 % | 1204808 | 1611805 | 416492 |
| PA-RISC | 20 | 4.00 % | 63786 | 119950 | 30708 |
| Intel IA-32 | 120 | 24.00 % | 448066 | 802549 | 131962 |
| Intel IA-64 | 35 | 7.00 % | 316934 | 374798 | 60862 |
| Intel EM64T | 108 | 21.60 % | 602989 | 1021525 | 123242 |
| AMD x86_64 | 113 | 22.60 % | 766661 | 1118476 | 230061 |
Operating system family:
| Operating System Family | Count | Share % | Rmax Sum (GF) | Rpeak Sum (GF) | Processor Sum |
|---|---|---|---|---|---|
| Linux | 376 | 75.20 % | 2014910 | 3195766 | 516189 |
| Unix | 86 | 17.20 % | 559636 | 807423 | 142104 |
| BSD Based | 3 | 0.60 % | 47697 | 53248 | 5888 |
| Mixed | 32 | 6.40 % | 872226 | 1104103 | 350484 |
| Mac OS | 3 | 0.60 % | 32989 | 53008 | 6296 |
| Totals | 500 | 100 % | 3527458.35 | 5213548.18 | 1020961 |
From these data we can conclude that the average high performance cluster has 500-2000 64-bit Intel or AMD processors, in systems built by IBM, Hewlett-Packard or Silicon Graphics. IBM is releasing its new POWER6 processors, which reach 5 GHz and dissipate heat very efficiently, so they should be taken into account when planning a supercomputing centre.
As for the operating system, Linux and Unix together account for over 92 % of the systems in the list (75.20 % and 17.20 % respectively). Linux guarantees compatibility and code standardization, as well as scalability, which explains its dominance in HPC.
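The combined share of two operating system families, and the ratio between sustained and peak performance, can be read directly off the operating system table. The following is a minimal sketch, with the figures copied from that table; the names `OS_FAMILIES`, `share_percent` and `efficiency` are illustrative and not part of any tool used on ODISEA.

```python
# Rows copied from the operating system family table:
# family -> (count, Rmax sum in GF, Rpeak sum in GF)
OS_FAMILIES = {
    "Linux": (376, 2014910, 3195766),
    "Unix": (86, 559636, 807423),
    "BSD Based": (3, 47697, 53248),
    "Mixed": (32, 872226, 1104103),
    "Mac OS": (3, 32989, 53008),
}


def share_percent(families, names):
    """Percentage of listed systems that belong to the given families."""
    total = sum(count for count, _, _ in families.values())
    selected = sum(families[name][0] for name in names)
    return 100.0 * selected / total


def efficiency(families, name):
    """Ratio of sustained (Rmax) to peak (Rpeak) performance for one family."""
    _, rmax, rpeak = families[name]
    return rmax / rpeak


if __name__ == "__main__":
    print(f"Linux + Unix share: {share_percent(OS_FAMILIES, ['Linux', 'Unix']):.1f} %")
    print(f"Linux Rmax/Rpeak:   {efficiency(OS_FAMILIES, 'Linux'):.2f}")
```

The same two helpers could be pointed at the vendor or processor family tables by swapping in their rows.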
FUTURE PERSPECTIVE
General Considerations
The new IMDEA Institutes, supported by the Madrid Government, are considering the creation of an HPC Centre. This Centre should not be regarded as an extension of ODISEA, but as an entirely new centre with brand new computers, in which ODISEA will be hosted as one more cluster. The plan for the next four years is to start up a 1000 processor centre, reaching a middle position in the top500 ranking, with room to scale (size, power supply, safety conditions) up to 2000 processors in the future.
Concerning hardware, an in-depth study will be needed to buy suitable equipment, taking into account the rapid evolution of processors and RAM. At present, Itanium 2 processors are the most powerful on the market, but the evolution of the IBM POWER6 family is of great interest and represents the short-term competition for Intel. It is starting to be deployed, for example, in the enlargement of the RZG ("Rechenzentrum Garching", http://www.hpcwire.com/hpc/1236561.html), the HPC Centre of the Max Planck Society in Garching, whose IBM eServer pSeries p5 575 1.9 GHz cluster with 688 processors is in 159th position in the top500 list.
As for RAM, each node should have 32 GB, with some 64 GB nodes available for especially demanding simulations, such as those carried out in aeronautics.
As for the operating system, experience shows that the high degree of standardization reached by Linux in computing, together with its compatibility and scalability, justifies installing it on the computers whatever the vendor (HP, SGI, IBM, …).
In any case, the purchase of these computers is a critical investment and should be considered carefully, applying criteria of compatibility, price, and vendor support and experience, and studying research centres with experience in HPC to learn from their difficulties and solutions.
Technical and structural features for a new HPC Centre.
This section gathers the technical and structural features needed for an HPC Centre. In general, three different rooms will be required, each with its own particular features:
Computing room
Function
Designed to host the computing equipment (racks with computing nodes), back-up servers and switches. It is the core of the HPC Centre, where IT staff handle the administration, support and proper operation of the clusters.
Location
There are several choices to consider:
- Basement. Advantages: reduced vibrations; equipment weight is borne better than on any other floor. Disadvantages: flood risk, limited access, lifts needed, uncomfortable for staff working there.
- Ground floor. Advantages: ease of access, lower flood risk, comfortable to work in. Disadvantages: needs a resistant floor structure.
- Mid floor. Advantages: no flood risk, comfortable to work in. Disadvantages: limited access for equipment, critical need for a resistant floor structure. Not recommended.
To sum up, when choosing the location, accessibility for equipment and floor resistance must be regarded as the critical points. Lifts are necessary when the computing machines are not located on the ground floor, and ground floors present a lower flood risk than basements. For these reasons, ground floors with suitable access are preferred for housing HPC equipment. A further reason for choosing the ground floor over the basement is the greater comfort for the staff working there.
Size
Taking representative HPC centres as a reference for our future one, we estimate a minimum area of 150 m² for the room housing the computing equipment. This room must be thermally and acoustically insulated. In addition to this computing room, an administration room of about 50 m² is required, giving 200 m² of floor area in total. A possible layout would be an administration room with glass walls located in the middle of the computing room, so that staff can see the equipment and facilities directly. It should be accessed through fire-resistant doors.
The computing room should be at least 5 m high, to accommodate a 1 m raised floor and a 1.5 m false ceiling housing the electric wiring, air conditioning, and surveillance and fire control systems.

Each rack housing the computing nodes has the following approximate dimensions:
- Height: 2 m.
- Weight: 1,196 kg.
- Width: 1,524 mm.
- Depth: 1,220 mm.
- Footprint: about 2 m², although space between racks and room for corridors must also be allowed for.
- Each rack hosts 32-64 processors.

There are now also blade server platforms, which address the complexities of integration and the limits on space and power. Customers of all sizes are turning to blades to save space, increase density and decrease power consumption, while lowering total cost and improving infrastructure flexibility. They can host about 90 processors, whereas a rack is limited to 32-64 in the same area.
It must not be forgotten that the floor structure must be able to bear the weight of racks and blades (around 800 kg/m²).
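The rack figures above give a rough idea of how many racks a 1000 processor centre would need. The sketch below assumes the rack footprint of 1.524 m by 1.220 m and the densities of 32, 64 and about 90 processors quoted in this section; the function names are illustrative, and corridors and clearances are deliberately left out.

```python
import math

RACK_WIDTH_M = 1.524  # rack width quoted in this section, in metres
RACK_DEPTH_M = 1.220  # rack depth quoted in this section, in metres


def racks_needed(processors, processors_per_rack):
    """Smallest number of racks that can host the given processor count."""
    return math.ceil(processors / processors_per_rack)


def footprint_m2(racks):
    """Bare rack footprint in square metres, ignoring corridors and clearances."""
    return racks * RACK_WIDTH_M * RACK_DEPTH_M


if __name__ == "__main__":
    # 32 and 64 are the rack densities from the text; 90 is the blade figure.
    for per_rack in (32, 64, 90):
        n = racks_needed(1000, per_rack)
        print(f"{per_rack} processors per rack: {n} racks, "
              f"{footprint_m2(n):.1f} m2 of bare footprint")
```

Since the bare footprint excludes aisles, the 150 m² estimated for the computing room remains the relevant planning figure.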
Electric wiring and power, air conditioning, fire fighting and safety services.
Fire fighting: Facilities should be equipped with fire fighting systems as required by law. The chemicals used to extinguish fires are usually toxic, which is a further reason to isolate the computing room from the rest of the facilities. It is also recommended to install heat sensors that monitor the room temperature and warn staff when a dangerous threshold is reached.
Access to the facilities must be controlled, so that only authorised staff can enter, and surveillance cameras should be installed to monitor activity inside the computing room.
Air conditioning system: It is crucial to guarantee proper cooling of the computing room, through floor and ceiling, to keep the temperature in the facilities between 16 and 18 °C.
Power supply room
Function
It is designed to host the UPS and power supply equipment that feed the computing devices and the air conditioning system.
Basic electric configuration for an HPC Centre:

Power supply generators and UPS units must be redundant and correctly connected to the computing devices, to ensure power supply even in the event of an external power failure.
The most suitable location for such a room would be an auxiliary, isolated one-floor building of 150 m², paying special attention to protection against water and damp.
No staff rooms should be located here, as the special conditions of this room could harm workers' health.
Electric wiring: crucial when designing the room. A standard power line is needed for lighting, heating and staff offices, plus a three-phase line for computing devices and air conditioning. When sizing the line we must keep in mind that a 100 A current line can supply power for roughly 80 computing nodes.
If we plan a centre of 500 computing nodes (1000 processors) in four years' time, we need a current line of about 650 A (500 nodes at 100 A per 80 nodes gives 625 A, plus a small margin).
Taking into account that we would need about 10 suitable air conditioning systems (4500 W each), consuming about 100 A in total, we would reach 750 A, or roughly 800 A, for a 1000 processor HPC Centre. A 1000 A current line could be installed, and even a second 1000 A line for future enlargements of the centre.
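The sizing rule used above (a 100 A line for roughly 80 nodes, plus about 100 A for air conditioning) can be written down as a small calculation. This is a rough sketch under those two assumptions only; the constant and function names are illustrative. It yields slightly lower raw figures than the rounded ones quoted in the text, which include some margin.

```python
AMPS_PER_LINE = 100.0          # reference line from the text
NODES_PER_LINE = 80            # nodes that such a line can feed, per the text
AIR_CONDITIONING_AMPS = 100.0  # about 10 cooling units, as estimated in the text


def computing_current(nodes):
    """Current in amperes drawn by the computing nodes alone."""
    return nodes * AMPS_PER_LINE / NODES_PER_LINE


def total_current(nodes, cooling_amps=AIR_CONDITIONING_AMPS):
    """Computing current plus the air conditioning estimate, in amperes."""
    return computing_current(nodes) + cooling_amps


if __name__ == "__main__":
    nodes = 500  # 1000 processors in dual nodes
    print(f"Computing nodes: {computing_current(nodes):.0f} A")
    print(f"With cooling:    {total_current(nodes):.0f} A")
```

Rounding these raw values up, as the text does, leaves headroom within a 1000 A line.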
Power supply generator: Used to maintain power supply in case of external power failure. It is usually based on diesel engines. It must generate power with the same current as the external power distribution. While the external power supply is working properly, the generator is off and electricity flows through a bypass. Given the size of the planned centre, it could be a good idea to have two generators available.
UPS (Uninterruptible Power Supply): battery-backed unit that ensures supply for the computing devices. In case of external power failure, and until the generators start working, it supplies power from its batteries. It is highly recommended to implement a bypass system in case of UPS failure. We need roughly one UPS for each 100 A.
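The rule of roughly one UPS per 100 A translates directly into a unit count for a given line. A minimal sketch, assuming that rule and rounding up to whole units; redundant units are not counted here and the names are illustrative.

```python
import math

AMPS_PER_UPS = 100  # roughly one UPS per 100 A, as stated in the text


def ups_units(line_current_amps):
    """Number of UPS units needed to cover the given line current."""
    return math.ceil(line_current_amps / AMPS_PER_UPS)


if __name__ == "__main__":
    for amps in (750, 800, 1000):
        print(f"{amps} A line: {ups_units(amps)} UPS units")
```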
1000 A current line: electrical panels and residual-current devices (RCDs)

Finally, if a 1000 A current line (or even a 400 A one) is available, it is recommended to branch it from the main electrical panel so that the line can be scaled with each enlargement. For example,

Air conditioning: One 4500 W cooling system is required for every two racks, plus some extra units to cover possible failures. If enough current is available, the air conditioning system should be connected to the UPS. Otherwise, in case of external power failure, the UPS would keep the computing devices running without cooling, and heat could damage the clusters. The air conditioning can also be connected to the generator to avoid this kind of situation.
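The cooling rule above (one 4500 W unit per two racks, plus spares) can be sketched as follows. The number of spare units is not given in the text, so the default of two spares is purely an assumption for illustration, as are the function names.

```python
import math

UNIT_POWER_W = 4500  # cooling unit rating from the text
RACKS_PER_UNIT = 2   # one unit for every two racks, per the text


def cooling_units(racks, spares=2):
    """Cooling units needed for the racks, plus spares (spare count is assumed)."""
    return math.ceil(racks / RACKS_PER_UNIT) + spares


def cooling_power_w(units):
    """Total rated power of the cooling units, in watts."""
    return units * UNIT_POWER_W


if __name__ == "__main__":
    units = cooling_units(16)  # e.g. 16 racks at 64 processors per rack
    print(f"{units} units, {cooling_power_w(units)} W in total")
```

With 16 racks and two assumed spares this gives a count in line with the figure of about 10 air conditioning systems used in the power estimate.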
Back-up room
Function
It is designed to host back-up devices in fire-resistant facilities.
Size and location
It needs about 50 m² and, for safety reasons, should be located away from the computing room, ideally in a different building.