Modern large-scale heterogeneous computers incorporating GPUs offer impressive processing capabilities. It is desirable to fully utilize such systems to serve multiple users concurrently, allowing them to visualize large data at interactive rates. However, as the disparity between data transfer speed and compute speed continues to increase in heterogeneous systems, data locality becomes crucial for performance. We present a new job scheduling design that supports multi-user exploration of large data in a heterogeneous computing environment, achieving near-optimal data locality and minimizing I/O overhead. The targeted application is a parallel visualization system that allows multiple users to render large volumetric data sets in both interactive and batch modes. We present a cost model to assess the performance of parallel volume rendering and to quantify the efficiency of job scheduling. We have tested our job scheduling scheme on two heterogeneous systems with different configurations. The largest test volume used in our study has over two billion grid points. The timing results demonstrate that our design effectively improves data locality for complex multi-user job scheduling problems, leading to better overall performance of the service.