Standard

Communication lower bounds for distributed-memory matrix multiplication. / Irony, Dror; Toledo, Sivan; Tiskin, Alexander.

In: Journal of Parallel and Distributed Computing, Vol. 64, No. 9, 01.01.2004, p. 1017-1026.

Research output: Contribution to journal › Article › peer-review

Harvard

Irony, D, Toledo, S & Tiskin, A 2004, 'Communication lower bounds for distributed-memory matrix multiplication', Journal of Parallel and Distributed Computing, vol. 64, no. 9, pp. 1017-1026. https://doi.org/10.1016/j.jpdc.2004.03.021

APA

Irony, D., Toledo, S., & Tiskin, A. (2004). Communication lower bounds for distributed-memory matrix multiplication. Journal of Parallel and Distributed Computing, 64(9), 1017-1026. https://doi.org/10.1016/j.jpdc.2004.03.021

Vancouver

Irony D, Toledo S, Tiskin A. Communication lower bounds for distributed-memory matrix multiplication. Journal of Parallel and Distributed Computing. 2004 Jan 1;64(9):1017-1026. https://doi.org/10.1016/j.jpdc.2004.03.021

Author

Irony, Dror ; Toledo, Sivan ; Tiskin, Alexander. / Communication lower bounds for distributed-memory matrix multiplication. In: Journal of Parallel and Distributed Computing. 2004 ; Vol. 64, No. 9. pp. 1017-1026.

BibTeX

@article{1a43d88caa7e4b0e8cac18600486ad11,
title = "Communication lower bounds for distributed-memory matrix multiplication",
abstract = "We present lower bounds on the amount of communication that matrix multiplication algorithms must perform on a distributed-memory parallel computer. We denote the number of processors by $P$ and the dimension of square matrices by $n$. We show that the most widely used class of algorithms, the so-called two-dimensional (2D) algorithms, are optimal, in the sense that in any algorithm that only uses $O(n^2/P)$ words of memory per processor, at least one processor must send or receive $\Omega(n^2/P^{1/2})$ words. We also show that algorithms from another class, the so-called three-dimensional (3D) algorithms, are also optimal. These algorithms use replication to reduce communication. We show that in any algorithm that uses $O(n^2/P^{2/3})$ words of memory per processor, at least one processor must send or receive $\Omega(n^2/P^{2/3})$ words. Furthermore, we show a continuous tradeoff between the size of local memories and the amount of communication that must be performed. The 2D and 3D bounds are essentially instantiations of this tradeoff. We also show that if the input is distributed across the local memories of multiple nodes without replication, then $\Omega(n^2)$ words must cross any bisection cut of the machine. All our bounds apply only to conventional $O(n^3)$ algorithms. They do not apply to Strassen's algorithm or other $o(n^3)$ algorithms. {\textcopyright} 2004 Elsevier Inc. All rights reserved.",
keywords = "Communication, Distributed memory, Lower bounds, Matrix multiplication",
author = "Dror Irony and Sivan Toledo and Alexander Tiskin",
year = "2004",
month = jan,
day = "1",
doi = "10.1016/j.jpdc.2004.03.021",
language = "English",
volume = "64",
pages = "1017--1026",
journal = "Journal of Parallel and Distributed Computing",
issn = "0743-7315",
publisher = "Elsevier",
number = "9",
}
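The abstract states a continuous memory–communication tradeoff whose 2D and 3D bounds are instantiations. A minimal sketch of that arithmetic, assuming the tradeoff takes the commonly cited form of Ω(n³ / (P·√M)) words per processor when each processor holds M words (a form consistent with the 2D and 3D cases quoted above, but not spelled out in this record):

```python
import math

def comm_lower_bound(n: int, P: int, M: float) -> float:
    """Order-of-magnitude communication per processor, assuming the
    tradeoff has the form n^3 / (P * sqrt(M)); illustrative only."""
    return n**3 / (P * math.sqrt(M))

n, P = 4096, 64  # example problem size and processor count (not from the paper)

# 2D case: M = n^2 / P words per processor.
# Substituting gives n^3 / (P * n / sqrt(P)) = n^2 / sqrt(P),
# matching the abstract's Omega(n^2 / P^(1/2)) bound.
M_2d = n**2 / P
print(comm_lower_bound(n, P, M_2d))

# 3D case: M = n^2 / P^(2/3) words per processor (replicated storage).
# Substituting gives n^2 / P^(2/3), matching the abstract's 3D bound.
M_3d = n**2 / P**(2/3)
print(comm_lower_bound(n, P, M_3d))
```

For these sample sizes the 3D bound is half the 2D bound, illustrating how the extra replicated memory buys reduced communication along the tradeoff curve.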

RIS

TY - JOUR

T1 - Communication lower bounds for distributed-memory matrix multiplication

AU - Irony, Dror

AU - Toledo, Sivan

AU - Tiskin, Alexander

PY - 2004/1/1

Y1 - 2004/1/1

N2 - We present lower bounds on the amount of communication that matrix multiplication algorithms must perform on a distributed-memory parallel computer. We denote the number of processors by P and the dimension of square matrices by n. We show that the most widely used class of algorithms, the so-called two-dimensional (2D) algorithms, are optimal, in the sense that in any algorithm that only uses O(n^2/P) words of memory per processor, at least one processor must send or receive Ω(n^2/P^(1/2)) words. We also show that algorithms from another class, the so-called three-dimensional (3D) algorithms, are also optimal. These algorithms use replication to reduce communication. We show that in any algorithm that uses O(n^2/P^(2/3)) words of memory per processor, at least one processor must send or receive Ω(n^2/P^(2/3)) words. Furthermore, we show a continuous tradeoff between the size of local memories and the amount of communication that must be performed. The 2D and 3D bounds are essentially instantiations of this tradeoff. We also show that if the input is distributed across the local memories of multiple nodes without replication, then Ω(n^2) words must cross any bisection cut of the machine. All our bounds apply only to conventional O(n^3) algorithms. They do not apply to Strassen's algorithm or other o(n^3) algorithms. © 2004 Elsevier Inc. All rights reserved.

AB - We present lower bounds on the amount of communication that matrix multiplication algorithms must perform on a distributed-memory parallel computer. We denote the number of processors by P and the dimension of square matrices by n. We show that the most widely used class of algorithms, the so-called two-dimensional (2D) algorithms, are optimal, in the sense that in any algorithm that only uses O(n^2/P) words of memory per processor, at least one processor must send or receive Ω(n^2/P^(1/2)) words. We also show that algorithms from another class, the so-called three-dimensional (3D) algorithms, are also optimal. These algorithms use replication to reduce communication. We show that in any algorithm that uses O(n^2/P^(2/3)) words of memory per processor, at least one processor must send or receive Ω(n^2/P^(2/3)) words. Furthermore, we show a continuous tradeoff between the size of local memories and the amount of communication that must be performed. The 2D and 3D bounds are essentially instantiations of this tradeoff. We also show that if the input is distributed across the local memories of multiple nodes without replication, then Ω(n^2) words must cross any bisection cut of the machine. All our bounds apply only to conventional O(n^3) algorithms. They do not apply to Strassen's algorithm or other o(n^3) algorithms. © 2004 Elsevier Inc. All rights reserved.

KW - Communication

KW - Distributed memory

KW - Lower bounds

KW - Matrix multiplication

UR - http://www.scopus.com/inward/record.url?scp=10844258198&partnerID=8YFLogxK

U2 - 10.1016/j.jpdc.2004.03.021

DO - 10.1016/j.jpdc.2004.03.021

M3 - Article

AN - SCOPUS:10844258198

VL - 64

SP - 1017

EP - 1026

JO - Journal of Parallel and Distributed Computing

JF - Journal of Parallel and Distributed Computing

SN - 0743-7315

IS - 9

ER -
