Differences between revisions 1 and 2
Revision 1 as of 2010-04-27 21:17:06
Size: 2151
Editor: WernerScholz
Comment:
Revision 2 as of 2010-04-27 21:19:14
Size: 2150
Editor: WernerScholz
Comment:
Line 41 (revision 1): Category MagparBugConfirmed
Line 41 (revision 2): CategoryMagparBugConfirmed

Description

This is not strictly a magpar bug but a problem with MPICH2 (version 1.2 or later), which affects parallel runs on multiple machines.

Steps to reproduce

  1. Prepare a parallel magpar run on a Linux cluster using MPICH2 >= 1.2
  2. Start an mpd ring spanning several machines using the "mpdboot" command
  3. The mpdboot command does not return (hangs)
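
When reproducing the hang from a script, it can help to put a time limit on mpdboot so the script itself does not block. The following is a minimal sketch, not part of magpar or MPICH2: the helper names run_with_timeout and mpdboot_command are hypothetical, and the host count, hosts file name, and 60 second limit mentioned below are assumptions to adapt to the cluster.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return its exit code, or None if it did not finish in time.

    Hypothetical helper for illustration only.
    """
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before re-raising,
        # so a hanging command does not stay attached to this script.
        return None


def mpdboot_command(n_hosts, hosts_file):
    """Build an mpdboot command line (values are assumptions, adjust as needed)."""
    return ["mpdboot", "-n", str(n_hosts), "-f", hosts_file]
```

For example, run_with_timeout(mpdboot_command(4, "mpd.hosts"), 60) returns None if mpdboot has not returned after 60 seconds, which would be consistent with the hang described here. Any mpd processes that were already started on remote machines may still need to be cleaned up by hand.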

Example and Details

  • MPICH2 trouble ticket 963

  • related MPICH2 trouble ticket 974

Workaround

  • Follow the instructions at the bottom of this page in the MPICH2 trac system.

  • or edit/patch mpd.py directly according to this changeset

  • or download this version of mpd.py and use it instead of the mpd.py installed by MPICH2 >= 1.2.

  • or use older MPICH2 version 1.0.8p
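
For the workaround that swaps in a fixed mpd.py, the replacement can be scripted so that the original file is kept as a backup. This is only a sketch under assumptions: the function name replace_mpd_py is hypothetical, and the location of the installed mpd.py depends on where MPICH2 was installed, so both paths must be supplied by the user.

```python
import shutil
from pathlib import Path


def replace_mpd_py(installed, fixed):
    """Back up the installed mpd.py as mpd.py.orig, then overwrite it with the fixed file.

    Hypothetical helper; both paths are supplied by the caller.
    Returns the path of the backup file.
    """
    installed = Path(installed)
    fixed = Path(fixed)
    if not installed.is_file():
        raise FileNotFoundError(f"no installed mpd.py at {installed}")
    if not fixed.is_file():
        raise FileNotFoundError(f"no fixed mpd.py at {fixed}")

    backup = installed.with_name(installed.name + ".orig")
    if not backup.exists():
        # Keep only the first backup so a second run cannot overwrite
        # the original file with an already patched one.
        shutil.copy2(installed, backup)

    # copyfile writes the contents only, so the installed file keeps
    # its existing permission bits.
    shutil.copyfile(fixed, installed)
    return backup
```

Restoring the original is then a matter of copying mpd.py.orig back over mpd.py. On a cluster without a shared installation, the same replacement would have to be made on every machine in the mpd ring.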

Plan

  • Priority: Medium
  • Assigned to: MPICH2 developers
  • Status: fixed in MPICH2 revision 5923

Wait for a new MPICH2 release that includes the fix, then update magpar's Makefile.libs to default to that (fixed) MPICH2 version.


CategoryMagparBugConfirmed

MagparWiki: BugTracker/Mpich2MpdbootHangs (last edited 2010-05-11 16:47:46 by WernerScholz)


Copyright (C) Werner Scholz 2010