
CURRENT FACILITIES

 

Cluster ODISEA is a joint project between CSIC, IMDEA Matemáticas and UAM, coordinated through the SIMUMAT (Mathematical Modelling and Numerical Simulation in Science and Technology) programme of the Madrid Government. It serves the researchers taking part in the network and their research environment.

 

ODISEA. Funding.

Stage Funding Institution Purchased Hardware
1st stage SIMUMAT – Madrid Government
  • 8 dual-processor nodes with Intel Xeon EM64T processors, one of them configured as the server.
  • 4 GB of RAM per node and 120 GB hard drives.
  • Infiniband and Gigabit Ethernet node interconnection.
  • Stage balance: 16 processors.
  • Installation and maintenance.
2nd stage CSIC
  • 8 dual-processor nodes with Intel Xeon EM64T processors.
  • 4 GB of RAM per node and 120 GB hard drives.
  • Stage balance: 16 processors.
  • Installation and maintenance.
3rd stage CSIC
  • 35,000 € approved budget.
  • 50,000 € pending approval.
  • Stage balance: 16 dual-processor nodes with dual-core processors.
  • An additional extension of 8 dual-processor nodes with dual-core processors is planned.
  • 16 GB of RAM per node.
  • Installation and maintenance.
Maintenance UAM
  • Air conditioning
  • Global maintenance, including power consumption.

 

ODISEA. Hardware.

Cluster “Odisea” consists of:

  • 16 dual-processor nodes with single-core processors (32 processors).
    • Intel Xeon EM64T 3.2 GHz processors, 800 MHz FSB.
    • 4 GB of RAM per node.
  • Hard drives
    • Server: 146 GB Ultra320 SCSI hard drive.
    • Slave nodes: 250 GB ATA hard drives.
  • Node interconnection network:
    • Low-latency interconnection network: SilverStorm 9024 (InfiniBand), 24 ports at 10/20 Gbps.
    • Gigabit Ethernet network for cluster control.

ODISEA. Software.

Operating system: Red Hat Enterprise Server 4.0, kernel 2.6.9-11.

  • GNU Fortran 77 and C compilers.
  • Fortran 77/90/95, Java and C/C++ Intel compilers.
  • Python interpreter with the PythonMPI, PythonNumerics and Pythonf2py libraries, which allow Python code to use MPI.
  • Task parallelization through Scali Manage / Scali MPI Connect for InfiniBand.
  • ScaTorque as the queue manager.
  • Matlab 7.3.
  • R statistical environment.
  • Intel Math Kernel Library, including the LAPACK and BLAS libraries.
  • SPARSEKIT and ARPACK libraries.
  • CENTAUR.

 

ODISEA. Current state.

Today ODISEA is a modest cluster with 3.2 GHz Intel Xeon EM64T processors (mid-range; at the high end are the Intel Itanium 2 and the new IBM POWER6 families). Each slave node has 2 GB of RAM per processor (4 GB per node), which is too little for simulations in most fields. Even so, ODISEA is an excellent testing ground for different codes and applications and a useful introduction to high performance computing.

 

ODISEA. Enlargement.

One of the main deficiencies of ODISEA is the small amount of RAM per node, so the new nodes are planned to have at least 16 GB. They will also have dual-core processors, since single-core processors are disappearing from the market. Quad-core processors are already available, but their performance does not yet justify their high price.

Even so, ODISEA is still quite far from the most powerful supercomputers in the world, as the next section shows.

 

ODISEA. Worldwide comparison with high performance computing centres.

The http://www.top500.org/ website is the best reference for supercomputing trends, and includes data on the 500 most powerful machines in the world. Some relevant statistics are presented here.

Number of Processors | Count | Share % | Rmax Sum (GF) | Rpeak Sum (GF) | Processor Sum
257-512 | 36 | 7.20 % | 117710 | 162157 | 17836
513-1024 | 192 | 38.40 % | 716626 | 1144890 | 171117
1025-2048 | 185 | 37.00 % | 865405 | 1423050 | 262844
2049-4096 | 38 | 7.60 % | 372432 | 584622 | 98884
4000-8000 | 19 | 3.80 % | 357877 | 470662 | 95140
8000-16000 | 17 | 3.40 % | 542883 | 717847 | 159128
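
As a reading aid, the figures in the table above can be turned into a sustained-to-peak ratio (Rmax/Rpeak) and an average system size per class. The following Python sketch only re-uses the numbers from the table; the names `SIZE_CLASSES`, `summarize` and `mid_range` are illustrative and not part of any existing tool.

```python
# Rows copied from the table above:
# (size class, system count, Rmax sum in GF, Rpeak sum in GF, processor sum)
SIZE_CLASSES = [
    ("257-512", 36, 117710, 162157, 17836),
    ("513-1024", 192, 716626, 1144890, 171117),
    ("1025-2048", 185, 865405, 1423050, 262844),
    ("2049-4096", 38, 372432, 584622, 98884),
    ("4000-8000", 19, 357877, 470662, 95140),
    ("8000-16000", 17, 542883, 717847, 159128),
]
LISTED_SYSTEMS = 500  # size of the top500 list


def summarize(rows):
    """Return {size class: (Rmax/Rpeak, mean processors per system)}."""
    return {
        label: (rmax / rpeak, procs / count)
        for label, count, rmax, rpeak, procs in rows
    }


summary = summarize(SIZE_CLASSES)
for label, (efficiency, mean_procs) in summary.items():
    print(f"{label:>10}: Rmax/Rpeak = {efficiency:.2f}, "
          f"about {mean_procs:.0f} processors per system")

# Systems in the two classes that span 513 to 2048 processors
mid_range = sum(
    count for label, count, *_ in SIZE_CLASSES
    if label in ("513-1024", "1025-2048")
)
print(f"513-2048 processors: {mid_range} of {LISTED_SYSTEMS} systems "
      f"({mid_range / LISTED_SYSTEMS:.1%})")
```

The two classes between 513 and 2048 processors hold 377 of the 500 listed systems (75.4%), which is where the bulk of the list sits.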

 

Vendors of the main computing clusters in the world:

Vendors | Count | Share % | Rmax Sum (GF) | Rpeak Sum (GF) | Processor Sum
Cray Inc. | 15 | 3.00 % | 288171 | 357970 | 65415
Dell | 17 | 3.40 % | 237620 | 341451 | 39788
IBM | 236 | 47.20 % | 1747565 | 2633891 | 602658
SGI | 20 | 4.00 % | 191687 | 218295 | 34992
Sun Microsystems | 9 | 1.80 % | 44166 | 68484 | 14808
Linux Networx | 7 | 1.40 % | 59127 | 84206 | 15820
Hewlett-Packard | 158 | 31.60 % | 582026 | 978900 | 176002

 

Processor families in use:

Processor Family | Count | Share % | Rmax Sum (GF) | Rpeak Sum (GF) | Processor Sum
Power | 91 | 18.20 % | 1204808 | 1611805 | 416492
PA-RISC | 20 | 4.00 % | 63786 | 119950 | 30708
Intel IA-32 | 120 | 24.00 % | 448066 | 802549 | 131962
Intel IA-64 | 35 | 7.00 % | 316934 | 374798 | 60862
Intel EM64T | 108 | 21.60 % | 602989 | 1021525 | 123242
AMD x86_64 | 113 | 22.60 % | 766661 | 1118476 | 230061
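
To see how much of the list the Intel and AMD families cover, the counts in the table above can be grouped. This is a minimal Python sketch using only those counts; the grouping into `INTEL_AMD` and `X86_64` and the helper `share` are illustrative choices made here, not categories used by top500.

```python
# System counts copied from the table above
FAMILY_COUNTS = {
    "Power": 91,
    "PA-RISC": 20,
    "Intel IA-32": 120,
    "Intel IA-64": 35,
    "Intel EM64T": 108,
    "AMD x86_64": 113,
}
LISTED_SYSTEMS = 500  # size of the top500 list

# Illustrative groupings (an assumption of this sketch)
INTEL_AMD = ["Intel IA-32", "Intel IA-64", "Intel EM64T", "AMD x86_64"]
X86_64 = ["Intel EM64T", "AMD x86_64"]


def share(families):
    """Fraction of the listed systems that use any of the given families."""
    return sum(FAMILY_COUNTS[name] for name in families) / LISTED_SYSTEMS


print(f"Intel or AMD processors:   {share(INTEL_AMD):.1%}")
print(f"64-bit x86 (EM64T, x86_64): {share(X86_64):.1%}")
```

With these counts, Intel and AMD processors together account for 75.2% of the listed systems, and the two 64-bit x86 families for 44.2%.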

 

 

Operating system family:

Operating system Family | Count | Share % | Rmax Sum (GF) | Rpeak Sum (GF) | Processor Sum
Linux | 376 | 75.20 % | 2014910 | 3195766 | 516189
Unix | 86 | 17.20 % | 559636 | 807423 | 142104
BSD Based | 3 | 0.60 % | 47697 | 53248 | 5888
Mixed | 32 | 6.40 % | 872226 | 1104103 | 350484
Mac OS | 3 | 0.60 % | 32989 | 53008 | 6296
Totals | 500 | 100 % | 3527458.35 | 5213548.18 | 1020961
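
The combined weight of Linux and Unix can be checked directly against the table above, both by number of systems and by aggregate Rmax. The Python sketch below uses only the table's figures; `OS_FAMILIES` and `combined_share` are names chosen for this illustration.

```python
# (family, system count, Rmax sum in GF) copied from the table above
OS_FAMILIES = [
    ("Linux", 376, 2014910),
    ("Unix", 86, 559636),
    ("BSD Based", 3, 47697),
    ("Mixed", 32, 872226),
    ("Mac OS", 3, 32989),
]

total_systems = sum(count for _, count, _ in OS_FAMILIES)
total_rmax = sum(rmax for _, _, rmax in OS_FAMILIES)


def combined_share(names):
    """Return (share by system count, share by Rmax) for the given families."""
    count = sum(c for name, c, _ in OS_FAMILIES if name in names)
    rmax = sum(r for name, _, r in OS_FAMILIES if name in names)
    return count / total_systems, rmax / total_rmax


by_count, by_rmax = combined_share({"Linux", "Unix"})
print(f"Linux + Unix by system count: {by_count:.1%}")
print(f"Linux + Unix by Rmax:         {by_rmax:.1%}")
```

By system count Linux and Unix together reach 92.4%. Weighted by Rmax the figure is lower, about 73%, because the "Mixed" family accounts for a large part of the aggregate performance.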

 

From these data we can conclude that the typical high performance cluster has between 500 and 2000 Intel or AMD 64-bit processors and is built by IBM, Hewlett-Packard or Silicon Graphics. IBM is releasing its new POWER6 processors, which reach 5 GHz and dissipate heat very efficiently, so they should be taken into account when planning a supercomputing centre.

As for the operating system, Linux and Unix together run more than 92% of the listed systems. Linux offers compatibility and code standardization as well as scalability, which explains its dominance in HPC.

 

FUTURE PERSPECTIVE

 

General Considerations

 

The new IMDEA Institutes, supported by the Madrid Government, are considering the creation of an HPC Centre. This Centre should not be considered an extension of ODISEA, but an entirely new centre with brand new computers, in which ODISEA will be kept as one more cluster. The forecast for the next four years is to start up a 1000-processor centre, reaching a middle position in the top500 ranking, with the scalability (size, power supply, safety conditions) to grow to 2000 processors in the future.

Concerning hardware, an in-depth study will be needed to buy the proper equipment, taking into account the quick evolution of processors and RAM. Nowadays Itanium 2 processors are the most powerful on the market, but the evolution of the IBM POWER6 family is quite interesting and represents the short-term competition for Intel. POWER6 is starting to be deployed, for example, in the enlargement of the RZG ("Rechenzentrum Garching", http://www.hpcwire.com/hpc/1236561.html), the HPC Centre of the Max Planck Society in Garching, whose IBM eServer pSeries p5 575 1.9 GHz cluster with 688 processors holds the 159th position in the top500 list.

As for RAM, each node should have 32 GB, with some 64 GB nodes available for especially demanding simulations, such as those carried out in aeronautics.
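
To put the memory figures side by side, the sketch below totals the RAM of the current cluster (16 nodes with 4 GB each) and of the planned centre. It assumes two processors per node, that is 500 nodes for 1000 processors, and that every new node gets the baseline 32 GB; the function `memory_summary` is illustrative.

```python
def memory_summary(nodes, gb_per_node, processors_per_node=2):
    """Return (total GB, GB per processor) for a homogeneous cluster."""
    total_gb = nodes * gb_per_node
    return total_gb, gb_per_node / processors_per_node


# Current ODISEA cluster: 16 dual-processor nodes with 4 GB each
odisea = memory_summary(nodes=16, gb_per_node=4)

# Planned centre (assumption: 1000 processors at 2 per node, all nodes at 32 GB)
planned = memory_summary(nodes=500, gb_per_node=32)

for name, (total_gb, gb_per_proc) in (("ODISEA", odisea), ("Planned", planned)):
    print(f"{name}: {total_gb} GB in total, {gb_per_proc:g} GB per processor")
```

Under these assumptions the planned centre would offer eight times more memory per processor than ODISEA does today.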

As for the operating system, experience shows that the high degree of standardization reached by Linux, together with its compatibility and scalability, justifies installing it whatever the vendor (HP, SGI, IBM, …).

In any case, the purchase of these computers is a critical investment and should be considered carefully, following criteria of compatibility, price, and vendor assistance and experience, and studying research centres with experience in HPC to learn from their difficulties and solutions.

 

Technical and structural features for a new HPC Centre.

 

This section gathers the technical and structural features needed for an HPC Centre. In general, three different rooms will be required, each with its own particular features:

 

Computing room

Function

Designed to host the computing equipment (racks with computing nodes), back-up servers and switches. It is the core of the HPC Centre, where IT staff handle administration, support and proper operation of the clusters.

Location

There are several choices to consider:

  • Basement. Advantages: reduced vibrations; the equipment weight is borne better than on any other floor. Disadvantages: flood risk, limited access, lifts needed, uncomfortable for the staff working there.
  • Ground floor. Advantages: ease of access, lower flood risk, comfortable to work in. Disadvantages: needs a strong floor structure.
  • Mid floor. Advantages: no flood risk, comfortable to work in. Disadvantages: limited access for equipment, critical need for a strong floor structure. Not recommended.

To sum up, accessibility for equipment and floor strength are the critical points when choosing a location. Lifts are necessary when the computing machines are not located on the ground floor, and ground floors present a lower flood risk than basements. For these reasons, a ground floor with suitable access is the preferred location for HPC equipment. A further reason for choosing the ground floor over the basement is greater comfort for the staff working there.

Size

Taking representative HPC centres as a reference for our future one, we estimate a minimum area of 150 m2 for the room housing the computing equipment. This room must be thermally and acoustically insulated. In addition to the computing room, an administration room of about 50 m2 is required, giving 200 m2 of floor area in total. A possible layout would be a glass-walled administration room located in the middle of the computing room, so that staff can see the equipment and facilities directly. It should be accessed through fire-resistant doors.

The computing room should be at least 5 m high, to accommodate a 1 m raised floor and a 1.5 m suspended ceiling for electric wiring, air conditioning, and surveillance and fire control systems.

[Figure: Laboratorio1.gif]

 

Each rack housing the computing nodes has the following approximate dimensions:

  • 2 m high.
  • 1196 kg.
  • 1524 mm wide.
  • 1220 mm deep.
  • About 2 m2 of floor area, to which the required clearance between racks and room for the corridors must be added.
  • Each rack hosts 32-64 processors.

 

       

      Blade server platforms are now also available, designed to address the complexities of integration and the limits on space and power. Customers of all sizes are turning to blades to save space, increase density and decrease power consumption, while lowering total cost and improving infrastructure flexibility. They can host about 90 processors in the same area in which a rack is limited to 32-64.

      Note that the floor structure must be able to bear the weight of racks and blades (around 800 kg/m2).
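
The rack and blade figures above can be combined into a rough space and floor-load estimate for a 1000-processor centre. The Python sketch below reads the rack dimensions as 1524 mm by 1220 mm and the weight as 1196 kg, and takes the floor figure of 800 to be in kg/m2; both readings are assumptions, and all names are illustrative.

```python
import math

# Rack figures from the list above (assumed reading: 1524 mm x 1220 mm, 1196 kg)
RACK_WIDTH_M = 1.524
RACK_DEPTH_M = 1.220
RACK_MASS_KG = 1196
FLOOR_RATING_KG_M2 = 800   # assumption: the floor figure is in kg/m2
TARGET_PROCESSORS = 1000   # size of the planned centre


def units_needed(processors, processors_per_unit):
    """Whole units (racks or blade units) needed to host the processors."""
    return math.ceil(processors / processors_per_unit)


footprint_m2 = RACK_WIDTH_M * RACK_DEPTH_M
rack_load_kg_m2 = RACK_MASS_KG / footprint_m2

racks_at_32 = units_needed(TARGET_PROCESSORS, 32)   # low-density racks
racks_at_64 = units_needed(TARGET_PROCESSORS, 64)   # high-density racks
blade_units = units_needed(TARGET_PROCESSORS, 90)   # blades, same footprint

print(f"Rack footprint: {footprint_m2:.2f} m2")
print(f"Load on the footprint: {rack_load_kg_m2:.0f} kg/m2 "
      f"(floor figure: {FLOOR_RATING_KG_M2} kg/m2)")
print(f"Racks for {TARGET_PROCESSORS} processors: {racks_at_64} to {racks_at_32}")
print(f"Blade units for {TARGET_PROCESSORS} processors: {blade_units}")
```

With these inputs a rack occupies about 1.86 m2 and loads its footprint with roughly 640 kg/m2, below the 800 figure. The centre would need between 16 and 32 racks, or about 12 blade units of the same footprint, before adding clearance and corridors.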

      Electric wiring and power, air conditioning, fire fighting and safety services.

      Fire fighting: The facilities should be equipped with fire fighting systems as required by law. The chemicals used to extinguish fires are usually toxic, which is a further reason to isolate the computing room from the rest of the facilities. It is also recommended to install heat sensors that monitor the room temperature and warn staff if it reaches a dangerous threshold.

      Access to the facilities must be controlled so that only authorized staff can enter, and surveillance cameras should be installed to monitor activity inside the computing room.

      Air conditioning system: It is crucial to guarantee proper cooling of the computing room through the floor and ceiling, keeping the temperature in the facilities between 16 and 18 °C.

       

      Power supply room

      Function

      It is designed to host the UPS and the power supply equipment that feed the computing devices and the air conditioning system.

      [Figure: Basic electric configuration for an HPC Centre]

       

      Power supply generators and UPS units must be redundant and correctly connected to the computing devices, to ensure power supply even in the event of an external power failure.

      The most suitable location for such a room would be an auxiliary, isolated, single-storey building of 150 m2, paying special attention to protection against water and damp.

      No staff rooms should be located here, as the conditions in this room could harm workers’ health.

      Electric wiring: crucial when designing the room. A standard line is needed for lighting, heating and staff offices, plus a three-phase line for the computing devices and the air conditioning. When sizing the line, keep in mind that a 100 A line can supply power for roughly 80 computing nodes.

      If we plan a centre of 500 computing nodes (1000 processors) in four years' time, we need a line of about 650 A (500 nodes at 80 nodes per 100 A is 625 A, rounded up).

      Taking into account that we would need about 10 suitable air conditioning units (4500 W each), consuming about 100 A in total, we would reach 750 A, or roughly 800 A for a 1000-processor HPC Centre. A 1000 A line could be installed, and even a second 1000 A line for future enlargements of the centre.

      Power supply generator: Used to maintain the power supply in case of an external power failure. It is usually based on diesel engines and must deliver power with the same current as the external supply. While the external supply is working properly, the generator is off and electricity flows through a bypass. Given the size of the planned centre, it would be a good idea to have two generators available.

      UPS (Uninterruptible Power Supply): battery-backed unit that ensures supply to the computing devices. In case of an external power failure, it supplies power from its batteries until the generators start working. It is highly recommended to implement a bypass system in case of UPS failure. We need roughly one UPS for each 100 A.
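
The sizing rules quoted above (about 80 nodes per 100 A, about 100 A for air conditioning, one UPS per 100 A) can be chained into a small calculation. The Python sketch below is only illustrative: the rounding steps of 50 A and 100 A are assumptions chosen to reproduce the rounded figures in the text, and the UPS count does not include redundant units.

```python
import math

NODES_PER_100_A = 80       # from the text: a 100 A line feeds roughly 80 nodes
AIR_CONDITIONING_A = 100   # from the text: about 10 units drawing about 100 A
AMPS_PER_UPS = 100         # from the text: roughly one UPS per 100 A
LINE_CAPACITY_A = 1000     # proposed line


def compute_current_a(nodes):
    """Current drawn by the computing nodes, in amperes."""
    return nodes * 100 / NODES_PER_100_A


def round_up(value, step):
    """Round a value up to the next multiple of step."""
    return math.ceil(value / step) * step


nodes = 500                                       # 1000 processors, 2 per node
raw_a = compute_current_a(nodes)                  # exact figure for the nodes
nodes_line_a = round_up(raw_a, 50)                # assumed 50 A rounding step
total_a = nodes_line_a + AIR_CONDITIONING_A       # nodes plus air conditioning
design_a = round_up(total_a, 100)                 # assumed 100 A rounding step
ups_units = math.ceil(design_a / AMPS_PER_UPS)    # without redundant units
headroom_a = LINE_CAPACITY_A - design_a           # spare capacity on the line

print(f"Nodes: {raw_a:.0f} A, rounded to {nodes_line_a} A")
print(f"With air conditioning: {total_a} A, design figure {design_a} A")
print(f"UPS units: {ups_units}, headroom on a {LINE_CAPACITY_A} A line: {headroom_a} A")
```

This reproduces the 650 A, 750 A and 800 A figures of the text, gives 8 UPS units before redundancy, and leaves 200 A of headroom on a 1000 A line.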

      [Figure: 1000 A current line: electrical panels and residual-current devices (RCDs)]

       

      Finally, if a 1000 A line (or even a 400 A one) is available, it is recommended to split it at the main electrical panel so that the line can be scaled for the successive enlargements. For example:

      [Figure: Laboratorio5.gif]

       

      Air conditioning: One 4500 W cooling unit is required for every two racks, plus some extra units to cover possible failures. If enough current is available, the air conditioning system should be connected to the UPS: if, during an external power failure, the UPS supplies the computing devices but not the air conditioning, heat could damage the clusters. The air conditioning can also be connected to the generator to avoid this kind of situation.
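
The rule of one 4500 W unit per two racks can be turned into a unit count. The Python sketch below assumes 64 processors per rack for a 1000-processor centre and two standby units; the number of spares is not given in the text and is purely an assumption, as are the names used.

```python
import math

COOLING_UNIT_W = 4500   # from the text
RACKS_PER_UNIT = 2      # from the text: one unit for every two racks


def cooling_units(racks, spares=2):
    """Cooling units for the given racks plus standby units (spares is assumed)."""
    return math.ceil(racks / RACKS_PER_UNIT) + spares


racks = math.ceil(1000 / 64)            # racks at 64 processors per rack
units = cooling_units(racks)            # working units plus spares
total_cooling_w = units * COOLING_UNIT_W

print(f"{racks} racks -> {units} cooling units, {total_cooling_w} W in total")
```

With 16 racks and two spares this gives 10 units, in line with the figure of about 10 air conditioning systems used when sizing the current line.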

       

      Back-up room

      Function

      It is designed to host the back-up devices in fire-resistant facilities.

      Size and location

      It needs about 50 m2 and, for safety reasons, should be located away from the computing room, possibly even in a different building.

       


        Fractals are used with permission from their author Cory Ench | © 2006-2008 Fundación IMDEA. All rights reserved