Indonesian Journal of Electrical Engineering and Computer Science
Vol. 41, No. 2, February 2026, pp. 753-763
ISSN: 2502-4752, DOI: 10.11591/ijeecs.v41.i2.pp753-763

RAC: a reusable adaptive convolution for CNN layer

Nguyen Viet Hung (1), Phi Dinh Huynh (1), Pham Hong Thinh (2), Phuc Hau Nguyen (3), Trong-Minh Hoang (4)
(1) International Training and Cooperation Institute, East Asia University of Technology, Bacninh, Vietnam
(2) Quy Nhon University, Quynhon, Vietnam
(3) Electric Power University, Hanoi, Vietnam
(4) Posts and Telecommunications Institute of Technology, Hanoi, Vietnam

Article history: Received Dec 1, 2025; Revised Jan 2, 2026; Accepted Jan 11, 2026

Keywords: Convolutional neural networks; Filter sharing; Lightweight deployment; Memory efficiency; Model compression; Reusable adaptive convolution

ABSTRACT
This paper proposes reusable adaptive convolution (RAC), an efficient alternative to standard 3×3 convolutions for convolutional neural networks (CNNs). The main advantage of RAC lies in its simplicity and parameter efficiency, achieved by sharing horizontal and vertical 1×k/k×1 filter banks across blocks within a stage and recombining them through a lightweight 1×1 mixing layer. By operating at the operator design level, RAC avoids post-training compression steps and preserves the conventional Conv-BN-activation structure, enabling seamless integration into existing CNN backbones. To evaluate the effectiveness of the proposed method, extensive experiments are conducted on CIFAR-10 using several architectures, including ResNet-18/50/101, DenseNet, WideResNet, and EfficientNet. Experimental results demonstrate that RAC significantly reduces parameters and memory usage while maintaining competitive accuracy.
These results indicate that RAC offers a reasonable balance between accuracy and compression and is suitable for deploying CNNs on resource-constrained platforms.

This is an open access article under the CC BY-SA license.

Corresponding Author:
Nguyen Viet Hung
International Training and Cooperation Institute, East Asia University of Technology
Bacninh, Vietnam
Email: hungnv@eaut.edu.vn

1. INTRODUCTION
Convolutional neural networks (CNNs) have driven major progress in vision, with strong results across image classification, detection, and tracking [1]-[6]. Recent backbones continue to scale depth and width to push accuracy, from ConvNeXt/ConvNeXt-V2 in the CNN family to hierarchical transformers like Swin and Swin-V2 [7]-[11]. However, the price of these gains is larger models and higher computational cost, which complicates training and deployment on resource-limited devices [12]-[14].

A large body of work reduces this cost along three main lines. Quantization lowers precision for weights/activations, often from FP32 to low-bit integers [15], [16]; pruning removes weights or channels deemed redundant [17], [18]; and low-rank factorization decomposes convolutional kernels into products of smaller matrices/tensors [19], [20]. While effective, each line has trade-offs: quantization and pruning may require careful hyperparameter tuning or fine-tuning and can be sensitive to distribution shift [21]-[23]; pruning's theoretical sparsity does not always translate to proportionate wall-clock speedups [24], [25]; and low-rank methods depend strongly on rank choices and hardware locality for practical speed [26], [27]. In parallel, lightweight architectures (e.g., MobileNet, ShuffleNet, EfficientNet) redesign blocks to balance accuracy and efficiency [28]-[30].

Journal homepage: http://ijeecs.iaescore.com
In this paper we follow a complementary direction: instead of compressing a trained network, we reorganize the convolutional layer itself. Figure 1 contrasts two views. In Figure 1(a), each layer learns its own 3×3 kernels independently. In Figure 1(b), filters are assembled from shared components; layers no longer relearn the same structures from scratch but compose them. This motivates our method, which restructures the 3×3 operator into shared directional bases and a light mixing step, aiming to keep accuracy while reducing parameters and compute.

Figure 1. Comparison of convolutional organizations: (a) conventional block where each convolution learns independent filters and (b) an example of a reorganized design using shared components and compositional filters

This line of work inspires methods that restructure convolution itself, beyond quantization, pruning, or factorization. We introduce reusable adaptive convolution (RAC), a drop-in replacement for 3×3 convolution. RAC builds two shared banks of 1×k and k×1 filters (horizontal/vertical). Within a stage, blocks reuse these banks and form block-specific virtual filters by selecting and fusing bank responses; a 1×1 projection then mixes channels. This simple change keeps spatial resolution, promotes feature reuse across blocks, and reduces redundancy, while remaining compatible with standard layers (Conv/BN/ReLU) and typical toolchains. RAC is architecture-agnostic and can be plugged into common backbones such as ResNet, WideResNet, and DenseNet without altering their overall topology. To summarize the conceptual differences between RAC and commonly used convolutional decomposition strategies, we present Table 1, emphasizing that RAC operates at the operator reorganization level rather than factorizing layer by layer.

Table 1. Comparison between RAC and related convolution designs
Aspect               Std. Conv    Depthwise+pointwise   Low-rank         RAC
Decomposition level  None         Per layer             Per layer        Stage-wise
Parameter sharing    No           No                    No               Yes
Training paradigm    End-to-end   End-to-end            Often post-hoc   End-to-end
Structural reuse     None         Limited               Limited          Explicit
Design objective     Accuracy     Efficiency            Compression      Reusable operator

On CIFAR-10, RAC delivers accuracy close to the corresponding baselines while reducing memory footprint and training time. Beyond aggregate numbers, we also include diagnostics such as stage × block heatmaps to show where parameters concentrate and how RAC shifts load away from the heaviest regions.
We summarize our main contributions as follows:
- We introduce RAC, an operator-level alternative to standard 3×3 convolutions that reorganizes spatial filtering into stage-wise shared 1×k/k×1 banks followed by a lightweight 1×1 mixing layer, enabling parameter reuse across blocks.
- We clarify the relationship between RAC and existing decomposition-based approaches, showing that RAC differs from depthwise separable and low-rank convolutions by operating as a reusable, end-to-end trainable operator rather than a per-layer or post-training factorization.
- We demonstrate the effectiveness of RAC by integrating it into multiple canonical CNN backbones and evaluating on CIFAR-10, where RAC achieves competitive accuracy with reduced memory consumption and favorable efficiency-performance trade-offs.

The remainder of this paper is organized as follows: section 2 presents the proposed RAC architecture, detailing the row-column bank design and virtual convolutional block (VCB) construction. Section 3 provides experimental evaluations on CIFAR-10 with various CNN backbones, comparing RAC with baseline models in terms of accuracy, storage size, and training time. Finally, section 4 concludes the paper and discusses potential future research directions.

2. METHOD
This section concentrates on the design of RAC, its benefits and drawbacks, and its operation.

2.1. On the reordering of CNN layers
To motivate RAC, we inspect the structure of widely used CNN backbones and observe that 3×3 convolutions are repeatedly applied with similar configurations across many blocks. For example, ResNet families rely on bottleneck blocks that recur multiple times within a stage [31], [32], while DenseNet employs 3×3 kernels throughout dense blocks [33], [34].
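To make this observation concrete, a rough parameter count for a single ResNet-style bottleneck block shows how much of the budget the 3×3 layer takes. The numbers below are illustrative assumptions of ours (256 input/output channels, 64 mid channels, biases ignored), not figures from the paper:

```python
# Rough parameter count for one ResNet-style bottleneck block
# (illustrative: C = 256 in/out, mid = 64, biases ignored).
C, mid = 256, 64

conv1 = 1 * 1 * C * mid        # 1x1 reduce:  16384 params
conv3 = 3 * 3 * mid * mid      # 3x3 spatial: 36864 params
conv2 = 1 * 1 * mid * C        # 1x1 expand:  16384 params
total = conv1 + conv3 + conv2

print(round(conv3 / total, 3))  # 0.529: the 3x3 layer alone is over half the block
```

Multiplied across the many near-identical blocks of a stage, this 3×3 share is exactly the budget that RAC targets for reuse.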
These repeated 3×3 layers contribute a large portion of the parameter and computation budget and may learn overlapping patterns, suggesting an opportunity to improve efficiency by enabling reuse rather than treating each layer as fully independent.

Based on this observation, we propose RAC. Instead of instantiating many separate 3×3 kernels, RAC learns two shared prototype banks that produce directional 1D filters, i.e., 1×k (horizontal) and k×1 (vertical), within each stage. Their responses are then combined through a VCB recomposition module and a lightweight 1×1 mixing layer to generate the final output. Figure 2 illustrates the overall architecture. Compared to the conventional design in Figure 3 that stacks many 3×3 convolutions, RAC starts from the two shared banks to produce multiple intermediate responses, concatenates R fused components along the channel dimension, and finally applies the 1×1 mixing layer for channel blending. The shared-bank mechanism is shown in Algorithm 1. This two-part design consists of shared-bank creation (section 2.2) and mixer and virtual recomposition (section 2.3), which we detail next.

2.2. How the shared bank works
We construct the stage-wise prototype banks in Algorithm 1 by learning two shared operator sets: a horizontal bank of 1×k filters and a vertical bank of k×1 filters, instead of learning an independent 3×3 kernel for every block. Given an input feature map x, the banks produce two response stacks U and V (horizontal/vertical), each stacking m responses and preserving the spatial size of x. Because the same banks are reused by all blocks within a stage, their parameters receive gradients from multiple blocks, encouraging cross-block reuse and typically improving optimization stability while reducing parameters versus per-block 3×3 convolutions.
The bank size m controls the expressiveness of the factor sets (more prototypes for U and V) at the cost of additional computation and parameters. Using two 1D banks (row and column) provides a set of directional spatial primitives. Many local 2D patterns can be expressed by combining horizontal and vertical responses, while the subsequent 1×1 mixer learns how to blend multiple recombinations to match the target feature channels. Therefore, RAC does not claim an exact theoretical equivalence to a full 3×3 kernel, but offers a practical structured basis that works well in our empirical setting.
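As a concrete illustration of the two-part design, the following PyTorch sketch pairs stage-level shared banks with per-block mixers. It follows the tensor shapes of Algorithms 1 and 2, but the class names, the fixed index pairs, and the bias conventions are our assumptions rather than the authors' released code:

```python
import torch
import torch.nn as nn

class SharedBanks(nn.Module):
    """Stage-level shared 1xk / kx1 banks (shapes as in Algorithm 1)."""
    def __init__(self, channels, m=8, k=3):
        super().__init__()
        p = k // 2
        # W_row: (m, C, 1, k); W_col: (m, C, k, 1); no bias (assumption).
        self.row = nn.Conv2d(channels, m, (1, k), padding=(0, p), bias=False)
        self.col = nn.Conv2d(channels, m, (k, 1), padding=(p, 0), bias=False)

    def forward(self, x):
        return self.row(x), self.col(x)   # U, V: (B, m, H, W) each

class RACBlock(nn.Module):
    """Per-block virtual recomposition and 1x1 mixer (as in Algorithm 2)."""
    def __init__(self, banks, pairs, out_channels):
        super().__init__()
        self.banks = banks                 # shared by all blocks of the stage
        self.pairs = pairs                 # index pairs (alpha_r, beta_r)
        self.mix = nn.Conv2d(len(pairs), out_channels, kernel_size=1)

    def forward(self, x):
        U, V = self.banks(x)
        # Fuse = element-wise sum of one U map and one V map per pair.
        Z = [U[:, a:a+1] + V[:, b:b+1] for a, b in self.pairs]
        return self.mix(torch.cat(Z, dim=1))   # (B, C_out, H, W)

banks = SharedBanks(channels=16, m=8, k=3)
block1 = RACBlock(banks, pairs=[(0, 1), (2, 3)], out_channels=16)
block2 = RACBlock(banks, pairs=[(4, 5), (6, 7)], out_channels=16)  # reuses the banks
x = torch.randn(2, 16, 32, 32)
y = block2(block1(x))
print(y.shape)  # torch.Size([2, 16, 32, 32])
```

Note that the two blocks hold references to the same bank parameters, so bank gradients accumulate from every block in the stage, while each block learns only its small mixer.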
Figure 2. Our reconstruction method

Figure 3. Old multi-conv3x3 structure. Each conv3x3 contains individual kernels, which cannot share information between layers, and each conv3x3 layer is also very parameter-heavy: ((3 × 3 × C_in) + 1) × C_out

Algorithm 1: Shared-bank mechanism
1: function SharedBank(x, m, k, W_row, W_col)
2:     Input: x ∈ R^{B×C×H×W}; bank size m; kernel size k;
3:            W_row ∈ R^{m×C×1×k}, W_col ∈ R^{m×C×k×1}
4:     Output: U, V ∈ R^{B×m×H×W}
5:     p ← ⌊k/2⌋
6:     U ← Conv2D(x, W_row; stride = 1, padding = (0, p))
7:     V ← Conv2D(x, W_col; stride = 1, padding = (p, 0))
8:     return (U, V)
9: end function

2.3. Mixer and virtual recomposition
In this part, we explain what the RAC block does after obtaining the two response stacks U and V, as shown in Algorithm 2. Instead of learning a full 3×3 kernel in each block, RAC builds R virtual components by repeatedly picking one channel from U and one channel from V using the index pairs (α_r, β_r). For each pair, the two selected maps are fused (in our implementation, a simple element-wise sum) to form one component. These R components are then concatenated along the channel dimension to form an intermediate tensor with R channels, and a lightweight 1×1 mixing layer produces the final C output channels. In short, RAC reuses
the same directional primitives across blocks and only learns a small mixer to combine them, which reduces redundant parameters while keeping the spatial output unchanged.

Algorithm 2: Mixing and virtual recomposition (RAC block)
function Block(U, V, α, β, W_mix):
    Input: U, V ∈ R^{B×m×H×W}; α, β ∈ {1, ..., m}^R with R = |α| = |β|;
           W_mix (1×1 mixer producing C channels)
    Output: y ∈ R^{B×C×H×W}
    R ← |α|
    Z ← [ ]
    for r ← 1 to R do
        u ← U[:, α_r : α_r + 1, :, :]
        v ← V[:, β_r : β_r + 1, :, :]
        z_r ← Fuse(u, v)
        append z_r to Z
    Y ← Concat(Z)        // Y ∈ R^{B×R×H×W}
    return Conv1×1(Y; W_mix)

3. RESULTS AND DISCUSSION
This part of the paper presents our experimental setup, including the device configuration, the dataset used, and how we build the models (ResNet18/ResNet50/ResNet101/WideResNet/DenseNet/EfficientNet) and plug RAC into them. We then compare RAC and non-RAC models in terms of accuracy, memory consumption, and training time.

3.1. Experimental setup
The experiments were carried out on 64-bit Windows 11 Pro with an NVIDIA GeForce RTX 3060 GPU. The implementation was developed in Python 3.10, utilizing essential libraries (e.g., PyTorch). Table 2 describes where we plug RAC into the baseline backbones and the channel output.

Table 2. Location of RAC usage
Backbone      Location                                   C (channel output)
ResNet-50     C4 (layer3).conv2 (6×)                     256
ResNet-101    C4 (layer3).conv2 (23×)                    256
ResNet-18     C4 (layer3).conv2 (2×)                     256
WideResNet    group3.conv2 (each block)                  group3 width
DenseNet      All DenseLayer conv2 (3×3)                 growth rate (32)
EfficientNet  MBConv blocks in stage 4 (depthwise 3×3)   stage width

We conduct all experiments on the CIFAR-10 dataset [35], [36], a benchmark consisting of 60,000 color images at resolution 32×32 spanning 10 classes (50k train, 10k test).
For our procedures, images are resized to 128×128 and trained with standard augmentations (random cropping/flipping) and normalization; evaluation uses the official test split without label noise or additional data. All models (baseline and RAC variants) are trained and reported with the same preprocessing and training schedule (epochs = 200, batch size = 256, lr = 0.1, seed = 42) for fair comparison.

3.2. Performance evaluation
Before presenting the results, we briefly describe the evaluation metrics. To assess the stability and accuracy of the models, we use two standard measures, Top-1 and Top-5 accuracy, which are widely used when evaluating on CIFAR-10 [37]; their formulas and properties are described in detail in [38], [39].

The results are shown in Figure 4. On CIFAR-10, RAC-ResNet50 achieves 92.82%, while the baseline ResNet50 achieves 94.68%; the accuracy of RAC is only about 2% lower than the baseline, with the benefit of less memory and training time (77 MB vs. 90 MB and about 300 seconds less). The other backbones (ResNet18/101, WideResNet, DenseNet, and EfficientNet) show
the same results: slightly lower accuracy (1-2%) than the baseline but with parameter/storage savings (in the WideResNet case, the computational parameters are almost halved relative to the baseline). Overall, Figure 4 shows the desired trade-off: replace 3×3 convolutions directly while maintaining the optimization behavior, reduce model size, and still remain competitive in accuracy.

Figure 4. The results after the changes: the left panel shows the accuracy comparison, the middle shows memory consumption, and the right displays the training time of the models

Figure 5 compares the inference latency of the base models and their corresponding RAC-based models under the same experimental conditions. Across all evaluated architectures, the RAC variants consistently achieved lower inference latency than their corresponding base models. The latency reduction was more pronounced for deeper and heavier networks such as ResNet-50, ResNet-101, DenseNet, and WideResNet, where standard convolutions contributed a significant portion of the computational cost. For lighter architectures like EfficientNet, the latency difference between the base variants and RAC was smaller but still consistently skewed toward RAC. These results suggest that reorganizing standard convolutions into reusable stage-level operators can reduce inference time without creating additional computational bottlenecks. It is important to note that the reported latency values are measured at the frame level under controlled conditions and are intended to reflect relative performance trends rather than fully optimized deployment latency on specific hardware platforms.

Figure 5.
Inference latency per image (ms) of baseline vs. RAC (batch = 1) on an RTX 3060; lower is better

Figure 6 presents the training dynamics of the baseline and RAC-based models over 200 epochs, including accuracy and loss curves for the different architectures. Figures 6(a) to 6(c) show the results for ResNet-18, ResNet-50, and ResNet-101, respectively, where RAC exhibits convergence behavior comparable to the baselines while generally displaying reduced fluctuations in the loss curves. For deeper models, such as ResNet-101 in Figure 6(c) and DenseNet in Figure 6(e), the RAC variants demonstrate noticeably smoother convergence trajectories, particularly during the early and middle training stages. Similar trends can be observed for WideResNet and EfficientNet in Figures 6(d) and 6(f), where RAC maintains stable training without degrading the final accuracy. Overall, these results indicate that introducing RAC does not adversely affect convergence and may lead to more stable optimization behavior, especially in deeper architectures.
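For reference, the frame-level latency protocol (batch size 1, warm-up runs excluded, median over repeats) can be approximated with a harness like the one below. The helper name, the iteration counts, and the toy model are our choices, not the paper's code; on GPU one would additionally call torch.cuda.synchronize() around the timers:

```python
import time
import torch
import torch.nn as nn

def frame_latency_ms(model, input_shape=(1, 3, 128, 128), warmup=5, iters=20):
    """Median forward time (ms) for batch size 1, as a frame-level proxy."""
    model.eval()
    x = torch.randn(*input_shape)
    with torch.no_grad():
        for _ in range(warmup):      # warm-up runs, excluded from timing
            model(x)
        times = []
        for _ in range(iters):
            t0 = time.perf_counter()
            model(x)
            times.append((time.perf_counter() - t0) * 1000.0)
    times.sort()
    return times[len(times) // 2]    # median, in milliseconds

toy = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(8, 8, 3, padding=1))
ms = frame_latency_ms(toy)
print(f"{ms:.3f} ms/frame")
```

Such a harness measures relative trends only; kernel fusion, batching, and hardware-specific runtimes would all change absolute numbers, as the text above cautions.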
Figure 6. The charts show the improvement trend of the baseline models and RACs over 200 epochs: (a) ResNet18 vs. RAC ResNet18, (b) ResNet50 vs. RAC ResNet50, (c) ResNet101 vs. RAC ResNet101, (d) WideResNet vs. RAC WideResNet, (e) DenseNet vs. RAC DenseNet, and (f) EfficientNet vs. RAC EfficientNet

Figure 7 provides a visualization of the parameter distribution across different network stages, allowing a direct comparison between the baseline WideResNet and its RAC-enhanced counterpart. The heatmaps illustrate how parameters are allocated among layers after training, with color intensity indicating relative parameter density. As shown in Figure 7(a), the baseline WideResNet exhibits a highly unbalanced distribution, where the majority of parameters are concentrated in the deeper layers, particularly layer 4, followed by layer 3. In contrast, Figure 7(b) shows that RAC-WideResNet significantly reduces the parameter density in layer 4. The noticeably lower color intensity in this stage indicates that parameter sharing and recombination effectively alleviate the computational burden of the deepest layers.
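A back-of-envelope count makes the shift in parameter load plausible. Using the tensor shapes of Algorithms 1 and 2 and the per-layer formula from the Figure 3 caption, the sketch below compares six standard 3×3 convolutions (the ResNet-50 layer3.conv2 setting of Table 2, C = 256) against one RAC stage; the m = 8, R = 2 setting and the bias conventions are illustrative assumptions of ours:

```python
# Back-of-envelope parameter count: six standard 3x3 conv layers vs. one
# RAC stage (shared banks + six per-block 1x1 mixers). Illustrative
# assumptions: C = 256, k = 3, m = 8, R = 2; biases kept for the 3x3
# convs and mixers, omitted for the banks (per Algorithm 1's shapes).
C, k, m, R, blocks = 256, 3, 8, 2, 6

std_per_block = (k * k * C + 1) * C     # ((3*3*C_in)+1)*C_out, as in Fig. 3
std_total = std_per_block * blocks

bank_params = 2 * m * C * k             # W_row (m,C,1,k) + W_col (m,C,k,1)
mixer_per_block = (R + 1) * C           # 1x1 mixer: R inputs + bias, C outputs
rac_total = bank_params + mixer_per_block * blocks

print(std_total, rac_total)             # 3540480 16896
```

The exact savings in the trained models differ (RAC replaces only the conv2 layers, and the surrounding 1×1 convolutions and BN layers are untouched), but the count shows why the deep-stage cells of the heatmap lighten after plugging in RAC.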
Figure 7. An example comparing the parameter load in each layer before and after plugging in RAC: (a) parameter distribution of the WideResNet baseline model and (b) parameter distribution of the WideResNet model after plugging in RAC

In addition, we examine the sensitivity of RAC to its two main hyperparameters, the bank size m and the number of virtual combinations R, and report the results in Table 3. In the upper part of the table, we vary m ∈ {4, 8, 16, 32} while fixing R = 2; accuracy typically improves when moving from small banks to moderate ones, then changes only marginally at larger m, suggesting that the shared banks become sufficiently expressive beyond a certain size. In the lower part, we vary R ∈ {1, 2, 3, 4} with m = 8 fixed; increasing R brings a small accuracy gain, but the benefit quickly saturates, indicating that only a few recombinations are needed in practice. Across different backbones, these trends are consistent and the variations are modest, so we adopt moderate settings (e.g., m = 8-16 and R = 2-3) as the default configuration in the main experiments unless stated otherwise.

Table 3. Ablation study of RAC hyperparameters on all backbones. Top-1 accuracy (%) on CIFAR-10

Effect of bank size m (fixed R = 2)
Model         Baseline   m=4     m=8     m=16    m=32
ResNet-18     94.82      92.80   93.20   93.40   93.35
ResNet-50     94.68      92.42   92.82   93.02   93.00
ResNet-101    95.11      93.20   93.60   93.80   93.72
WideResNet    95.10      93.90   94.30   94.50   94.49
DenseNet      94.78      93.10   93.50   93.70   93.71
EfficientNet  95.10      93.60   94.00   94.20   94.20

Effect of virtual combinations R (fixed m = 8)
Model         Baseline   R=1     R=2     R=3     R=4
ResNet-18     94.82      92.85   93.20   93.31   93.38
ResNet-50     94.68      92.47   92.82   92.94   93.00
ResNet-101    95.11      93.25   93.60   93.72   93.70
WideResNet    95.10      93.95   94.30   94.41   94.48
DenseNet      94.78      93.15   93.50   93.62   93.68
EfficientNet  95.10      93.65   94.00   94.12   94.20

3.3.
Discussion
Experimental results show that RAC provides a practical balance between accuracy and efficiency by reorganizing standard 3×3 convolution operations into reusable stage-level operators. Across various CNN architectures, RAC consistently reduces the number of parameters and memory usage at deeper stages while keeping accuracy within a narrow range of the corresponding baselines. This suggests that sharing spatial filter banks between blocks can effectively reduce redundant learning without significantly degrading performance.

It is important to note that the performance improvements achieved by RAC should be interpreted within the scope of the experiments performed. All evaluations were performed on CIFAR-10, a small-scale and low-resolution dataset, and the generalizability of RAC to larger benchmarks such as ImageNet has yet to be established.
Furthermore, the experiments were limited to image classification tasks, and the behavior of RAC in more complex settings such as object detection or semantic segmentation remains an open question. From an efficiency perspective, while RAC reduces parameter storage and shows a favorable latency trend, the current implementation does not explicitly optimize kernel combination or memory access patterns for specific hardware platforms. Therefore, the reported runtime benefits reflect measurements at the frame level rather than fully optimized deployment scenarios. Additionally, the hyperparameters controlling RAC, specifically the bank size m and the number of virtual combinations R, are manually selected and fixed across stages, which may not be optimal for all architectures. Overall, these observations underscore that RAC should be viewed as an operator-level structural design, supplementing rather than replacing existing compression and optimization techniques. Future research is needed to investigate its scalability, task generality, and hardware-oriented optimization.

4. CONCLUSION
In this paper, we introduced the RAC block as an alternative to standard 3×3 convolutions. Instead of letting each block in a stage learn independent full-rank 3×3 kernels, RAC builds stage-level shared 1×k/k×1 banks and reconstructs virtual filters via a lightweight 1×1 mixing layer. This design preserves the conventional Conv-BN-Act interface while encouraging parameter sharing across blocks. We instantiated RAC in several backbones, including ResNet-18/50/101, WideResNet, DenseNet-121, and EfficientNet-B0, and evaluated them on CIFAR-10.
Across these models, RAC reduces parameters and memory footprint (especially in deeper stages) with a modest accuracy trade-off, while the convergence curves, parameter heatmaps, and latency measurements provide an interpretable view of its training and efficiency behavior. Our current evaluation is limited to CIFAR-10 and framework-level runtime measurements; broader validation on larger datasets and real-device deployment remains future work. Future directions include hardware-friendly fusion for the 1×k/k×1 banks, automated tuning of (m, R) and stage-wise selection policies, and combining RAC with quantization, pruning, or distillation. We also plan to scale to ImageNet-1k and assess RAC on downstream detection and segmentation tasks.

FUNDING INFORMATION
The authors state no funding is involved.

CONFLICT OF INTEREST STATEMENT
The authors state no conflict of interest.

DATA AVAILABILITY
The data are available from the corresponding author upon request.

REFERENCES
[1] L. Chen, S. Li, Q. Bai, J. Yang, S. Jiang, and Y. Miao, "Review of image classification algorithms based on convolutional neural networks," Remote Sensing, vol. 13, no. 22, p. 4712, Nov. 2021, doi: 10.3390/rs13224712.
[2] X. Zhao, L. Wang, Y. Zhang, X. Han, M. Deveci, and M. Parmar, "A review of convolutional neural networks in computer vision," Artificial Intelligence Review, vol. 57, no. 4, p. 99, 2024, doi: 10.1007/s10462-024-10721-6.
[3] N. Hung, T. Loi, N. Huong, T. T. Hang, and T. Huong, "AAFNDL - an accurate fake information recognition model using deep learning for the Vietnamese language," Informatics and Automation, vol. 22, no. 4, pp. 795-825, Jul. 2023, doi: 10.15622/ia.22.4.4.
[4] T. Turay and T. Vladimirova, "Toward performing image classification and object detection with convolutional neural networks in autonomous driving systems: a survey," IEEE Access, vol. 10, pp.
14076-14119, 2022, doi: 10.1109/ACCESS.2022.3147495.
[5] H. Nguyen Viet and P. D. Phong, "Building a new crowd-counting architecture," Journal of Computer Applications in Technology, Jul. 2023, doi: 10.36227/techrxiv.23691351.v1.
[6] Z. Li, F. Liu, W. Yang, S. Peng, and J. Zhou, "A survey of convolutional neural networks: analysis, applications, and prospects," IEEE Transactions on Neural Networks and Learning Systems, vol. 33, no. 12, pp. 6999-7019, Dec. 2022, doi: 10.1109/TNNLS.2021.3084827.
[7] Z. Liu, H. Mao, C.-Y. Wu, C. Feichtenhofer, T. Darrell, and S. Xie, "A ConvNet for the 2020s," arXiv preprint arXiv:2201.03545, 2022.
[8] S. Woo et al., "ConvNeXt V2: co-designing and scaling ConvNets with masked autoencoders," in 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2023, pp. 16133-16142, doi: 10.1109/CVPR52729.2023.01548.
[9] Z. Liu et al., "Swin transformer: hierarchical vision transformer using shifted windows," in 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Oct. 2021, pp. 9992-10002, doi: 10.1109/ICCV48922.2021.00986.
[10] Z. Liu et al., "Swin transformer V2: scaling up capacity and resolution," in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2022, pp. 11999-12009, doi: 10.1109/CVPR52688.2022.01170.
[11] N. Hung, T. Loi, N. Binh, N. Nga, T. Huong, and D. Luu, "Building an online learning model through a dance recognition video based on deep learning," Informatics and Automation, vol. 23, no. 1, pp. 101-128, Jan. 2024, doi: 10.15622/ia.23.1.4.
[12] D. Ngo, H. C. Park, and B. Kang, "Edge intelligence: a review of deep neural network inference in resource-limited environments," Electronics (Switzerland), vol. 14, no. 12, p. 2495, 2025, doi: 10.3390/electronics14122495.
[13] C. Chen et al., "Deep learning on computational-resource-limited platforms: a survey," Mobile Information Systems, vol. 2020, no. 1, p. 8454327, 2020, doi: 10.1155/2020/8454327.
[14] G. Bai et al., "Beyond efficiency: a systematic survey of resource-efficient large language models," arXiv preprint arXiv:2401.00625, 2024.
[15] M. Chen et al., "INT v.s. FP: a comprehensive study of fine-grained low-bit quantization formats," arXiv preprint arXiv:2510.25602, 2025.
[16] R. Gong et al., "A survey of low-bit large language models: basics, systems, and algorithms," Neural Networks, vol. 192, Nov. 2025, doi: 10.1016/j.neunet.2025.107856.
[17] Z. Wang, C. Li, and X.
Wang, "Convolutional neural network pruning with structural redundancy reduction," in 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2021, pp. 14908-14917, doi: 10.1109/CVPR46437.2021.01467.
[18] Y. He and L. Xiao, "Structured pruning for deep convolutional neural networks: a survey," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 5, pp. 2900-2919, May 2024, doi: 10.1109/TPAMI.2023.3334614.
[19] Y. Panagakis et al., "Tensor methods in computer vision and deep learning," Proceedings of the IEEE, vol. 109, no. 5, pp. 863-890, 2021, doi: 10.1109/JPROC.2021.3074329.
[20] G. Wang, B. Tao, X. Kong, and Z. Peng, "Infrared small target detection using nonoverlapping patch spatial-temporal tensor factorization with capped nuclear norm regularization," IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1-17, 2022, doi: 10.1109/TGRS.2021.3126608.
[21] T. Dettmers, M. Lewis, Y. Belkada, and L. Zettlemoyer, "LLM.int8(): 8-bit matrix multiplication for transformers at scale," Advances in Neural Information Processing Systems, vol. 35, Nov. 2022.
[22] G. Xiao, J. Lin, M. Seznec, H. Wu, J. Demouth, and S. Han, "SmoothQuant: accurate and efficient post-training quantization for large language models," Proceedings of Machine Learning Research, vol. 202, pp. 38087-38099, 2023.
[23] E. Frantar, S. Ashkboos, T. Hoefler, and D. Alistarh, "GPTQ: accurate post-training quantization for generative pre-trained transformers," 11th International Conference on Learning Representations, ICLR 2023, 2023.
[24] A. Tyagi, A. Iyer, W. H. Renninger, C. Kanan, and Y. Zhu, "Dynamic sparse training of diagonally sparse networks," arXiv preprint arXiv:2506.11449, 2025.
[25] T. Dao, D. Y. Fu, S. Ermon, A. Rudra, and C.
Ré, "FlashAttention: fast and memory-efficient exact attention with IO-awareness," Advances in Neural Information Processing Systems, vol. 35, pp. 16344-16359, 2022.
[26] A. Gural, P. Nadeau, M. Tikekar, and B. Murmann, "Low-rank training of deep neural networks for emerging memory technology," arXiv preprint arXiv:2009.03887, 2020.
[27] J. Xiao et al., "HALOC: hardware-aware automatic low-rank compression for compact neural networks," Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 9, pp. 10464-10472, Jun. 2023, doi: 10.1609/aaai.v37i9.26244.
[28] S. R. Aleti and K. Kurakula, "Evaluation of lightweight CNN architectures for multi-species animal image classification," 2024.
[29] M. Li, "Application of lightweight convolutional neural networks in image classification," LUT University, 2025.
[30] H. Briouya, A. Briouya, and A. Choukri, "Surveying lightweight neural network architectures for enhanced mobile performance," in Communications in Computer and Information Science, vol. 2168 CCIS, 2024, pp. 187-199.
[31] X. Jin, "Analysis of residual block in the ResNet for image classification," in Proceedings of the 1st International Conference on Data Analysis and Machine Learning, Changsha, China, 2024, pp. 253-257, doi: 10.5220/0012800400003885.
[32] F. Li, T. Sun, P. Dong, Q. Wang, Y. Li, and C. Sun, "MSF-CSPNet: a specially designed backbone network for faster R-CNN," IEEE Access, vol. 12, pp. 52390-52399, 2024, doi: 10.1109/ACCESS.2024.3386788.
[33] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, "Densely connected convolutional networks," in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jul. 2017, pp. 4700-4708, doi: 10.1109/CVPR.2017.243.
[34] Y. Hou, Z. Wu, X. Cai, and T. Zhu, "The application of improved densenet algorithm in accurate image recognition," Scientific Reports, vol. 14, no. 1, p.
8645, 2024, doi: 10.1038/s41598-024-58421-z.
[35] A. Krizhevsky, "Learning multiple layers of features from tiny images," 2009. [Online]. Available: https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf.
[36] "CIFAR-10 and CIFAR-100 datasets," https://www.cs.toronto.edu/~kriz/cifar.html (accessed Oct. 20, 2025).
[37] A. V. Chuiko, V. V. Arlazarov, and S. A. Usilin, "The impact of dataset size on the reliability of model testing and ranking," Bulletin of the South Ural State University, Series "Mathematical Modelling, Programming and Computer Software", vol. 18, no. 2, pp. 102-111, 2025, doi: 10.14529/mmp250209.
[38] S. B. Vinay and S. Balasubramanian, "A comparative study of convolutional neural networks and cybernetic approaches on CIFAR-10 dataset," International Journal of Machine Learning and Cybernetics (IJMLC), vol. 1, no. 1, pp. 1-13, 2023.
[39] J. Chen, Z. Wu, Z. Wang, H. You, L. Zhang, and M. Yan, "Practical accuracy estimation for efficient deep neural network testing," ACM Transactions on Software Engineering and Methodology (TOSEM), vol. 29, no. 4, pp. 1-35, Oct. 2020, doi: 10.1145/3394112.