This paper presents an efficient VLSI architecture for the two-dimensional discrete wavelet transform (2-D DWT) that uses a small internal word length and simple scheduling. The proposed architecture combines the semi-recursive pyramid algorithm with non-interleaved operation and lapped block processing, and it computes the 2-D DWT in real time. The semi-recursive pyramid algorithm with non-interleaved operation, a modification of the Recursive Pyramid Algorithm, uses two filter modules: one for the 2-D DWT computation of the first octave and one for the remaining octaves. The internal word lengths of all functional units are optimized through an accuracy analysis so that reasonable accuracy is maintained. Functional simulations of the 2-D DWT with the proposed architecture are performed using fixed-point arithmetic. Compared with the classical architecture, the internal word lengths are reduced and the scheduling is simplified by the semi-recursive pyramid algorithm with non-interleaved operation. Consequently, the hardware required for multipliers, adders, and scheduling control is reduced in the proposed VLSI architecture. In addition, lapped block processing minimizes the storage size and the latency. For the 2-D DWT, the architecture requires only $2(T-2)\times\sum_{k=1}^{J}(N/2^{k-1})$ elements of overlapped storage and $N\times N$ elements of transposition storage, where $T$ is the length of the FIR filter.
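As a rough illustration of the storage figures quoted above, the following sketch evaluates both expressions for sample parameters. It assumes that $N$ is the side length of an $N\times N$ input image and that $J$ is the number of octaves; neither is defined in the abstract. The function names and sample values are hypothetical and chosen only for illustration.

```python
def overlapped_storage(n, j, t):
    """Overlapped storage elements: 2(T-2) * sum_{k=1}^{J} N / 2^(k-1).

    Assumptions (not stated in the abstract): n is the image side length,
    j is the number of octaves, and n is divisible by 2^(j-1) so that the
    integer division below is exact. t is the FIR filter length.
    """
    return 2 * (t - 2) * sum(n // 2 ** (k - 1) for k in range(1, j + 1))


def transposition_storage(n):
    """Transposition storage elements: N x N."""
    return n * n


if __name__ == "__main__":
    # Hypothetical example: 512x512 image, 3 octaves, 9-tap filter.
    n, j, t = 512, 3, 9
    print("overlapped storage elements:   ", overlapped_storage(n, j, t))
    print("transposition storage elements:", transposition_storage(n))
```

Under these assumptions, the sum is 512 + 256 + 128 = 896, so the overlapped storage comes to 2 × 7 × 896 = 12544 elements, against 262144 elements of transposition storage.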