Ambit: In-Memory Accelerator for Bulk Bitwise Operations Using Commodity DRAM Technology

  • Vivek Seshadri,
  • Donghyuk Lee,
  • Thomas Mullins,
  • Hasan Hassan,
  • Amirali Boroumand,
  • Jeremie Kim,
  • Michael A. Kozuch,
  • Onur Mutlu,
  • Phillip B. Gibbons,
  • Todd C. Mowry

Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)

Many important applications trigger bulk bitwise operations,
i.e., bitwise operations on large bit vectors. In fact, recent
works design techniques that exploit fast bulk bitwise operations
to accelerate databases (bitmap indices, BitWeaving)
and web search (BitFunnel). Unfortunately, in existing architectures,
the throughput of bulk bitwise operations is limited
by the memory bandwidth available to the processing unit
(e.g., CPU, GPU, FPGA, processing-in-memory).
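To make the bottleneck concrete, here is a minimal sketch (ours, not from the paper) of what a processor-side bulk bitwise operation looks like in C: every 64-bit word of both operands must cross the memory bus before the processing unit can combine them, so throughput is capped by the available memory bandwidth.

```c
#include <stddef.h>
#include <stdint.h>

/* Bulk bitwise AND of two bit vectors, processed 64 bits at a time.
 * Each word of src1 and src2 must be read over the memory bus before
 * the CPU can combine it, so memory bandwidth bounds the throughput. */
void bulk_and(uint64_t *dst, const uint64_t *src1,
              const uint64_t *src2, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++)
        dst[i] = src1[i] & src2[i];
}
```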

To overcome this bottleneck, we propose Ambit, an
Accelerator-in-Memory for bulk bitwise operations. Unlike
prior works, Ambit exploits the analog operation of DRAM
technology to perform bitwise operations completely inside
DRAM, thereby exploiting the full internal DRAM bandwidth.
Ambit consists of two components. First, simultaneous activation
of three DRAM rows that share the same set of sense
amplifiers enables the system to perform bitwise AND and OR
operations. Second, with modest changes to the sense amplifier,
the system can use the inverters present inside the sense
amplifier to perform bitwise NOT operations. With these
two components, Ambit can perform any bulk bitwise operation
efficiently inside DRAM. Ambit largely exploits existing
DRAM structure, and hence incurs low cost on top of commodity
DRAM designs (1% of DRAM chip area). Importantly,
Ambit uses the modern DRAM interface without any changes,
and therefore it can be directly plugged onto the memory bus.
Our extensive circuit simulations show that Ambit works
as expected even in the presence of significant process variation.
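As a rough functional model (our illustration, not the paper's circuit), the simultaneous activation of three rows described above can be viewed as a per-bit majority function: presetting the third, control row to all 0s yields AND, presetting it to all 1s yields OR, and NOT is handled separately by the sense-amplifier inverters. The C sketch below captures only this logical behavior; the real mechanism operates on entire DRAM rows in the analog domain rather than on 64-bit words.

```c
#include <stdint.h>

/* Functional (not circuit-level) model of triple-row activation:
 * each bitline settles to the majority value of the three activated cells. */
static inline uint64_t maj3(uint64_t a, uint64_t b, uint64_t c)
{
    return (a & b) | (b & c) | (a & c);
}

/* Presetting the control row selects the operation:
 *   control = all 0s  ->  MAJ(a, b, 0) == a & b   (bulk AND)
 *   control = all 1s  ->  MAJ(a, b, 1) == a | b   (bulk OR)  */
uint64_t ambit_and_model(uint64_t a, uint64_t b) { return maj3(a, b, 0); }
uint64_t ambit_or_model(uint64_t a, uint64_t b)  { return maj3(a, b, ~0ULL); }
```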

Averaged across seven bulk bitwise operations, Ambit
improves performance by 32X and reduces energy consumption
by 35X compared to state-of-the-art systems. When
integrated with Hybrid Memory Cube (HMC), a 3D-stacked
DRAM with a logic layer, Ambit improves performance of
bulk bitwise operations by 9.7X compared to processing in
the logic layer of the HMC. Ambit improves the performance
of three real-world data-intensive applications by 3X-7X
compared to a state-of-the-art baseline that uses SIMD
optimizations: 1) database bitmap indices, 2) BitWeaving,
a technique to accelerate database scans, and 3) a
bit-vector-based implementation of sets. We describe four other applications that
can benefit from Ambit, including a recent technique proposed
to speed up web search. We believe that the large performance
and energy improvements provided by Ambit can
enable other applications to use bulk bitwise operations.
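For instance (our example, not code from the paper), a set over a bounded universe can be stored as a bit vector in which bit e indicates membership of element e; set intersection, union, and difference then reduce to exactly the bulk AND, OR, and AND-NOT operations that Ambit accelerates.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit-vector set over a 65,536-element universe:
 * element e is a member iff bit e of the vector is 1. */
typedef struct { uint64_t words[1024]; } bitset_t;

static bool bitset_contains(const bitset_t *s, unsigned e)
{
    return (s->words[e >> 6] >> (e & 63)) & 1u;
}

/* Intersection is a bulk bitwise AND over the whole vector
 * (union would use |, difference would use & ~). */
static void bitset_intersect(bitset_t *dst, const bitset_t *a, const bitset_t *b)
{
    for (unsigned i = 0; i < 1024; i++)
        dst->words[i] = a->words[i] & b->words[i];
}
```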