This is a partial list of the publications by the Stanford Concurrent VLSI Architecture group, organized by project.
EIE: Efficient Inference Engine on Compressed Deep Neural Network
Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark Horowitz, William J. Dally
International Symposium on Computer Architecture (ISCA), June 2016; Hot Chips, August 2016.
Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
Song Han, Huizi Mao, William J. Dally
NIPS Deep Learning Symposium, December 2015.
International Conference on Learning Representations (ICLR), May 2016, Best Paper Award.
Learning both Weights and Connections for Efficient Neural Networks
Song Han, Jeff Pool, John Tran, William J. Dally
Advances in Neural Information Processing Systems (NIPS), December 2015.
DSD: Dense-Sparse-Dense Training for Deep Neural Networks
Song Han, Jeff Pool, Sharan Narang, Huizi Mao, Shijian Tang, Erich Elsen, Bryan Catanzaro, John Tran, William J. Dally
International Conference on Learning Representations (ICLR), April 2017.
SqueezeNet: AlexNet-Level Accuracy with 50x Fewer Parameters and < 0.5MB Model Size
Forrest Iandola, Song Han, Matthew Moskewicz, Khalid Ashraf, William J. Dally, Kurt Keutzer
arXiv preprint, 2016.