Differences between revisions 2 and 3
Revision 2 as of 2010-04-27 21:19:14
Size: 2150
Editor: WernerScholz
Revision 3 as of 2010-05-05 04:32:25
Size: 2151
Editor: WernerScholz

Description

This is not strictly a magpar bug but a problem with MPICH2 (version 1.2 or later), which affects parallel runs on multiple machines.
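To decide quickly whether an installation falls into the affected range, a version string can be compared against 1.2. The following is a minimal sketch, not part of magpar or MPICH2: the helper names `parse_version` and `may_be_affected` are hypothetical, and the check assumes that every release from 1.2 onward is affected, since this page does not name the release that contains the fix.

```python
import re


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple.

    A trailing patch suffix such as "p1" in "1.0.8p1" is ignored.
    """
    match = re.match(r"(\d+(?:\.\d+)*)", text.strip())
    if match is None:
        raise ValueError(f"cannot parse version: {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def may_be_affected(version_text):
    """Return True if the version is 1.2 or later.

    Assumption: no upper bound is applied, because the fixed release
    is not named on this page.
    """
    return parse_version(version_text) >= (1, 2)
```

With this sketch, `may_be_affected("1.0.8p1")` is false and `may_be_affected("1.2.1")` is true.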

Steps to reproduce

  1. Prepare a parallel magpar run on a Linux cluster using MPICH2 >= 1.2
  2. Start an mpd ring spanning several machines using the "mpdboot" command
  3. The mpdboot command does not return (it hangs)
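Because the symptom is a command that never returns, it can help to run it under a time limit so that the hang is reported instead of blocking the shell. The following is a minimal sketch using Python's standard `subprocess` module; the helper name `returns_within`, the 60-second limit, the host count, and the host file name `mpd.hosts` are illustrative assumptions and should be adapted to the local setup.

```python
import subprocess
import sys


def returns_within(command, timeout_seconds):
    """Run command and report whether it exited before the time limit.

    Returns True if the process finished in time, False if it was
    killed after timeout_seconds (i.e. it appears to hang).
    """
    try:
        subprocess.run(
            command,
            timeout=timeout_seconds,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return False
    return True


# Hypothetical use against mpdboot (host count and host file are assumptions):
#   returns_within(["mpdboot", "-n", "4", "-f", "mpd.hosts"], 60)

if __name__ == "__main__":
    # Self-contained demonstration with a child process that exits at once.
    print(returns_within([sys.executable, "-c", "pass"], 30))
```

A result of `False` only shows that the command did not finish within the limit; a slow but healthy startup on a large ring would look the same, so the limit should be chosen generously.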

Example and Details

  • MPICH2 trouble ticket 963

  • related MPICH2 trouble ticket 974

Workaround

  • Follow the instructions at the bottom of this page in the MPICH2 trac system.

  • or edit/patch mpd.py directly according to this changeset

  • or download this version of mpd.py and use it instead of the mpd.py installed by MPICH2 >= 1.2

  • or use the older MPICH2 version 1.0.8p1

Plan

  • Priority: Medium
  • Assigned to: MPICH2 developers
  • Status: fixed in MPICH2 revision 5923

Wait for the next MPICH2 release, then update magpar's Makefile.libs to default to the new (fixed) MPICH2 version.


CategoryMagparBugConfirmed

MagparWiki: BugTracker/Mpich2MpdbootHangs (last edited 2010-05-11 16:47:46 by WernerScholz)


Copyright (C) Werner Scholz 2010