Differences between revisions 1 and 4 (spanning 3 versions)
Revision 1 as of 2010-04-27 21:17:06
Size: 2151
Editor: WernerScholz
Comment:
Revision 4 as of 2010-05-11 16:47:46
Size: 2307
Editor: WernerScholz
Comment:
Deletions are prefixed with "-". Additions are prefixed with "+".
Line 10: Line 10:
- This is not strictly a magpar bug but a problem with MPICH2 (version 1.2 or later), which affects parallel runs on multiple machines.
+ This is not strictly a magpar bug but a problem with MPICH2 (version 1.2 and 1.2.1), which affects parallel runs on multiple machines.
Line 14: Line 14:
-  1. Prepare parallel magpar run on Linux cluster using MPICH2 >= 1.2
+  1. Prepare parallel magpar run on Linux cluster using MPICH2 1.2 or 1.2.1
Line 26: Line 26:
-  * Follow the instructions at the bottom of [[https://trac.mcs.anl.gov/projects/mpich2/ticket/963|this page]] in the MPICH2 trac system.
+  * Use MPICH2 version [[http://www.mcs.anl.gov/research/projects/mpich2/downloads/tarballs/1.2.1p1/mpich2-1.2.1p1.tar.gz|1.2.1p1]] or later
+  * or follow the instructions at the bottom of [[https://trac.mcs.anl.gov/projects/mpich2/ticket/963|this page]] in the MPICH2 trac system.
Line 29: Line 30:
-  * or use older MPICH2 version [[http://www.mcs.anl.gov/research/projects/mpich2/downloads/tarballs/1.0.8p1/mpich2-1.0.8p1.tar.gz|1.0.8p]]
+  * or use older MPICH2 version [[http://www.mcs.anl.gov/research/projects/mpich2/downloads/tarballs/1.0.8p1/mpich2-1.0.8p1.tar.gz|1.0.8p1]]
Line 36: Line 37:
-  * Status: fixed in MPICH2 revision [[https://trac.mcs.anl.gov/projects/mpich2/changeset/5923|5923]]
-
- Wait for new MPICH2 release, then update magpar's Makefile.libs to default to new (fixed) MPICH2 version.
+  * Status: fixed in MPICH2 version [[http://www.mcs.anl.gov/research/projects/mpich2/downloads/tarballs/1.2.1p1/mpich2-1.2.1p1.tar.gz|1.2.1p1]]: revision [[https://trac.mcs.anl.gov/projects/mpich2/changeset/5923|5923]]
Line 41: Line 40:
- CategoryMagparBugConfirmed
+ CategoryMagparBugFixed

Description

This is not strictly a magpar bug but a problem with MPICH2 (versions 1.2 and 1.2.1), which affects parallel runs on multiple machines.

Steps to reproduce

  1. Prepare a parallel magpar run on a Linux cluster using MPICH2 1.2 or 1.2.1
  2. Start an mpd ring spanning several machines using the "mpdboot" command
  3. The mpdboot command does not return (hangs)
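
A hang like the one in step 3 can be detected from a script by running the command under a timeout. The following is a minimal Python sketch, not part of magpar or MPICH2; the helper name `hangs` and the choice of timeout are assumptions made for illustration.

```python
import subprocess


def hangs(cmd, timeout_s):
    """Return True if cmd has not finished after timeout_s seconds.

    Hypothetical helper (not from magpar or MPICH2). On timeout,
    subprocess.run kills the child process (though not necessarily any
    processes the child itself started) and raises TimeoutExpired.
    """
    try:
        subprocess.run(cmd, timeout=timeout_s, check=False)
    except subprocess.TimeoutExpired:
        return True
    return False
```

A call such as `hangs(["mpdboot", "-n", "4", "-f", "mpd.hosts"], 60)` would then report whether mpdboot failed to return within a minute; the host count and the host file name `mpd.hosts` are placeholders for the actual cluster setup.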

Example and Details

  • MPICH2 trouble ticket 963

  • related MPICH2 trouble ticket 974

Workaround

  • Use MPICH2 version 1.2.1p1 or later

  • or follow the instructions at the bottom of MPICH2 trac ticket 963 (https://trac.mcs.anl.gov/projects/mpich2/ticket/963).

  • or edit/patch mpd.py directly according to this changeset

  • or download this version of mpd.py and use it instead of the mpd.py installed by MPICH2 1.2 or 1.2.1.

  • or use older MPICH2 version 1.0.8p1
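
For the workaround that swaps in a different mpd.py, it may be worth keeping a backup of the installed file. The sketch below shows one way to do that in Python; the function name `replace_mpd` and the `.orig` backup suffix are hypothetical, and both paths must be supplied by the caller because the location of mpd.py depends on the MPICH2 installation.

```python
import shutil
from pathlib import Path


def replace_mpd(installed, patched):
    """Replace the installed mpd.py with a patched copy, keeping a backup.

    Hypothetical helper, not part of magpar or MPICH2.
    Returns the path of the backup file.
    """
    installed = Path(installed)
    backup = installed.with_name(installed.name + ".orig")
    if not backup.exists():
        # Keep the first backup so repeated runs do not overwrite the original.
        shutil.copy2(installed, backup)
    shutil.copy2(patched, installed)
    return backup
```

Restoring the original file then amounts to copying the backup over the installed mpd.py.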

Plan

  • Priority: Medium
  • Assigned to: MPICH2 developers
  • Status: fixed in MPICH2 version 1.2.1p1 (revision 5923)
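
The version information on this page can be encoded in a small check, for example in a build or setup script. This is only a sketch: the names `AFFECTED` and `is_affected` are made up here, and treating every version not listed on this page as unaffected is an assumption, not something the MPICH2 developers have confirmed.

```python
# Versions this page names as affected by the mpdboot hang.
AFFECTED = {"1.2", "1.2.1"}


def is_affected(version):
    """Return True if the version string is one this page lists as affected.

    Assumption: any version not listed (e.g. 1.2.1p1 or 1.0.8p1) is
    treated as not affected.
    """
    return version.strip() in AFFECTED
```

For instance, `is_affected("1.2.1")` is true, while `is_affected("1.2.1p1")` and `is_affected("1.0.8p1")` are false.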


CategoryMagparBugFixed

MagparWiki: BugTracker/Mpich2MpdbootHangs (last edited 2010-05-11 16:47:46 by WernerScholz)


Copyright (C) Werner Scholz 2010