Indonesian Journal of Electrical Engineering and Computer Science
Vol. 21, No. 3, March 2021, pp. 1622-1633
ISSN: 2502-4752, DOI: 10.11591/ijeecs.v21i3.pp1622-1633

What network simulator questions do users ask? A large-scale study of Stack Overflow posts

Syful Islam 1, Yusuf Sulistyo Nugroho 2, Md. Javed Hossain 3
1 Nara Institute of Science and Technology, Japan
2 Universitas Muhammadiyah Surakarta, Indonesia
1,3 Noakhali Science and Technology University, Bangladesh

Article Info
Article history: Received Oct 2, 2020; Revised Dec 2, 2020; Accepted Dec 23, 2020
Keywords: Network simulators, Discussion topics, Stack Overflow

ABSTRACT
The use of network simulators as a modern tool for analyzing and predicting the behaviour of computer networks has grown to reduce the complexity of accuracy measurement. This attracts researchers and practitioners to share problems and discuss them to improve the features. To communicate the related issues, users move to online question-answering platforms. Although recent studies have shown the popularity and benefits of adopting network simulation tools, the challenges users face in using network simulators remain unknown. In this paper, we examine 2,322 network simulator related Stack Overflow question posts to gain insights into the topics and challenges that users have discussed. We adopt the latent Dirichlet allocation model to understand the topics discussed on Stack Overflow. We then investigate the popularity and difficulty of each topic. The results show that users use Stack Overflow as an implementation guideline for the network simulation model. We determine 8 discussion topics that are merged into 5 major categories. Simulation model configuration is the most useful topic for users. We also observe that target network protocol modification and network simulator installation are the most popular topics.
Network simulator installation and target network protocol modification issues have been challenging for most users. The findings also highlight future research that suggests ways to help the network simulator community in the early stages to overcome the popular and difficult topics faced when using network simulation tools.

This is an open access article under the CC BY-SA license.

Corresponding Author:
Syful Islam
Laboratory of Software Engineering
Nara Institute of Science and Technology, Japan
Email: islam.syful.il4@is.naist.jp

Journal homepage: http://ijeecs.iaescore.com

1. INTRODUCTION

As communication networks have grown more complex, measuring the accuracy of system behavior with traditional analytical techniques has become harder [1]; network simulators (NS) are therefore used as a modern technique to analyze and predict the behavior of computer networks. In practice, an NS allows users to design, modify and test networking protocols in a simulation mode, and is modeled with devices, links and applications to report the performance of the tested networks. Today's application of simulation techniques has attracted researchers and practitioners to discuss and find ways to improve the features. To communicate NS-related issues and updates, users turn to online question-answering platforms, such as Stack Overflow (SO), to get help and advice from the community about the technical problems they face. Stack Overflow is an online
question-answering platform that accommodates the needs of various users and covers a wide range of topics and programming languages.

A number of studies on the quality and content of online question-answering platforms, such as Stack Overflow, have indicated its significance. Zagalsky et al. [2] reported that SO with the R tag has become a communication channel that has a relationship with the topics discussed in the R software development community forum. Squire [3] showed that the measurement of quality, users' participation, and the effectiveness of response time in the SO forum have been the key factors that caused developers to move from mailing lists. Calefato et al. [4] proposed a guideline for writing a successful question in the SO forum, while Wang et al. [5] analyzed the important factors that impact the time to receive an accepted answer on four Stack Exchange websites.

SO discussion topics were also investigated in several studies. Beyer et al. [6] performed a large-scale empirical study on SO to analyze the topics and current trends that developers discuss. Finally, they automatically classified the posts into seven question categories. Another study, on mobile-developer topics in SO, was conducted by Rosen and Shihab [7].

Extending prior works that analyze the quality of SO questions and their topics, in this paper we conduct a large-scale empirical study on the topics covered by NS questions posted on SO. We also investigate the types of issues faced by NS users by categorizing the keywords used in the questions. We further study the NS questions that are most difficult to answer by calculating the PD score [8].
Based on the analysis of 2,322 NS-related questions from SO, we find that NS-related thread seekers use the SO platform to discuss simulation model configuration, target network protocol modification, NS installation, simulation model performance measurement and NS build errors. Although simulation model configuration is the most useful topic amongst NS users, it is not the most popular. The most viewed NS-related discussion topics are target network protocol modification and NS installation. Most discussions on Stack Overflow are initially triggered by a how-to type of question. This indicates that NS users usually ask for instructions, guidance or a tutorial to solve their problems. However, although the number of responses is high, NS installation and target network protocol modification are the most difficult types of questions for which to get appropriate answers.

Our findings show that NS users need a specific discussion forum to reach the maturity level of similar project-specific discussion platforms, such as the Eclipse forum, so that the most common issues, such as plug-ins and documentation, can be identified and new users can be advised on how to address the issues that prevent their entry to the discussion forums [9]. In addition, our insights can be used as guidance for future research on NS-related discussions in other channels and artefacts in the complex software development environment.

Paper organization: The rest of this paper is organized as follows. Section 2 presents the research methodology. In detail, we describe the research questions, our procedures to collect data, and our online appendix. The study results are presented in Section 3 to answer the formulated research questions. The implications of the study are then explained in Section 4. Sections 5 and 6 describe the threats to validity and present the related work, respectively. Finally, we conclude the paper in Section 7.

2.
RESEARCH METHOD

This section presents our research questions, data collection process, and the replication package in an online appendix.

2.1. Research questions

This study aims to extract insights into the characteristics of NS-related discussions on the SO platform. To achieve this goal, we formulate three research questions to guide our research. We present these questions with their motivation.

RQ1: What kind of topics are presented in the network-simulator related discussions?
Network simulators (NS) are in high demand among network engineers and researchers [10]. Different types of users will have various NS-related problems that require different areas of expertise. For example, some users require specific expertise in Tcl scripting, but others could have problems with network protocols or design features. Thus, the difficulties faced by users are likely to differ. Since users benefit from question-answering platforms to communicate issues, the objective of this research question is to understand the most useful and popular NS topics that are frequently faced by the NS
community. In addition, identifying widely discussed NS topics is the first step in highlighting issues that are gaining more attention.

RQ2: What types of questions do users face?
Taking the results from RQ1, we then set out to empirically study the types of questions asked by users. A previous study [7] shows that users ask questions of different types (i.e., how, what, why). Similar to the approach of that study [7], this analysis is performed to identify the nature of the difficulties encountered while using NS.

RQ3: What topics are the most difficult to answer?
The key motivation of this RQ is to investigate the topics that are difficult to answer. Finding the topics that are hard to answer will help users get more attention from the NS community. Furthermore, it highlights the topics that require better support (tools, frameworks, official documentation) for addressing NS usage difficulties.

2.2. Data collection

Figure 1 outlines the methodology for collecting the data, which is described as follows. We initially downloaded the latest SO data dump (July 2008 to December 2019), which is publicly available on the SO Torrent [11]. The SO data contains all the Q&A posts with their metadata (creation date, favourite count, views, and score). The initial dataset contains 46,947,633 posts, of which 39.83% (18,699,426) are question posts and 60.17% (28,248,207) are answer posts.

Step 1: Filter using the #simulator tag. SO posts are typically labelled with relevant tags to improve their visibility. Following an approach similar to that of a previous study [7], we collect the initial NS dataset by filtering posts that contain simulator as a tag. The output of this step is 1,407 NS posts.

Step 2: Discover relevant tags.
In this step, we extract the tags co-occurring with simulator from the 1,407 posts to discover relevant tags. One major risk of discovering relevant tags is the possibility of introducing noise into the main dataset. For example, "JNS java network simulator (JNS) for implementing ns2" is a relevant post that carries the tag Java along with simulator and ns2. Therefore, to mitigate this problem, we group and aggregate the target tags through a semi-automatic process. In detail, the process includes a string search with manual verification of newly explored tags. In addition, we validate the tags by applying the tag relevance threshold (TRT) and the tag significance threshold (TST) as metrics:

\[ TRT_{tag} = \frac{\#\,\text{tag posts}}{\mathrm{Sum}(\#\,\text{tag posts})} \tag{1} \]

\[ TST_{tag} = \frac{\#\,\text{tag posts}}{\#\,\text{popular tag posts}} \tag{2} \]

where #tag posts is the number of NS posts for the tag, Sum(#tag posts) is the total number of posts for the tag, and #popular tag posts is the number of NS posts for the most popular tag. For instance, omnet++ is a tag that occurs only 11 times as a co-occurring tag in simulator-tagged posts, while the total number of omnet++ question posts on SO is 1,406. We therefore also included tags of this kind in the final tag set. Thus, the output of step 2 is 4 manually validated tags (see Table 1) that are representative of NS discussions.

Step 3: Collect tagged posts. After getting the NS-related tag set, we use those tags to identify and extract posts to create the final NS post dataset. The output of step 3 is 2,322 posts from SO, which are used as the final dataset in the subsequent sections.

Step 4: Extract post titles and preprocess for LDA. In this step, we apply a filter to remove irrelevant information. For topic modeling we focus only on the titles of the question posts, since the body of a post can introduce noise into our analysis. Using an approach similar to prior studies [7] to extract the titles of the posts, we performed pre-processing of the data.
This includes the removal of emails, newline characters, and stop words using regular expressions [12] and Python NLTK [13]. We subsequently build a bigram model using Gensim [14] and lemmatize the words to map them to their base forms. The output of step 4 is the NS post title corpus, which is used as the input of the LDA (latent Dirichlet allocation) model.

Step 5: LDA topic modeling. As illustrated in Algorithm 1, in this step we identify the NS topics using the SO post titles. To obtain the topic names, we use the LDA technique [15], which was also used by previous studies [7, 16-18].
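The title clean-up of Step 4 can be illustrated with a small sketch. It is not the authors' script: the study used the NLTK stop-word list and a Gensim bigram model, whereas the version below uses only the Python standard library, a tiny hand-written stop-word set, and a simple frequency rule for joining bigrams, all of which are assumptions made for illustration. Lemmatization is left out. The names `clean_title` and `merge_frequent_bigrams` and the two sample titles are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop-word set (assumption); the study used the NLTK list.
STOP_WORDS = {"a", "an", "the", "in", "on", "of", "to", "for", "is", "how", "i", "and", "with"}

EMAIL_RE = re.compile(r"\S+@\S+")          # anything that looks like an e-mail address
TOKEN_RE = re.compile(r"[a-z][a-z0-9+]*")  # words such as "ns2" or "omnet++"


def clean_title(title):
    """Lower-case a title, drop e-mails and newlines, and remove stop words."""
    text = EMAIL_RE.sub(" ", title.lower()).replace("\n", " ")
    return [tok for tok in TOKEN_RE.findall(text) if tok not in STOP_WORDS]


def merge_frequent_bigrams(docs, min_count=2):
    """Join adjacent token pairs seen at least min_count times across all docs."""
    pair_counts = Counter(pair for doc in docs for pair in zip(doc, doc[1:]))
    merged_docs = []
    for doc in docs:
        merged, i = [], 0
        while i < len(doc):
            if i + 1 < len(doc) and pair_counts[(doc[i], doc[i + 1])] >= min_count:
                merged.append(doc[i] + "_" + doc[i + 1])
                i += 2  # skip the second half of the joined pair
            else:
                merged.append(doc[i])
                i += 1
        merged_docs.append(merged)
    return merged_docs


titles = [
    "How to install the network simulator on Ubuntu",
    "Network simulator build error in OMNeT++\ncontact me@example.com",
]
corpus = merge_frequent_bigrams([clean_title(t) for t in titles])
```

In this toy run the pair "network simulator" appears in both titles and is therefore joined into a single token, which is the effect a bigram model is meant to have on frequent collocations.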
[Figure 1. Overview of the methodology of the NS study. Data collection: (1) filter SO Torrent posts using the #simulator tag, (2) discover relevant tags, (3) collect tagged posts into the final dataset; then extract post titles and pre-process for LDA. LDA topic modeling: (1) select the optimal topic number, (2) categorize posts into topics, (3) label topics based on keywords and post titles; the output is posts categorized into labeled topics.]

Table 1. The tag list used to identify and extract NS-related posts. The TRT and TST values are percentages.

Filtered tag   #Initial posts   #Final posts   TRT (%)   TST (%)
simulator      1,407            1,407          100.00    100.00
ns2            15               599            2.50      1.06
ns-3           11               316            3.48      0.78
omnet++        11               1,406          0.78      0.78

In this study, we apply the Mallet implementation of the LDA technique [19] to group the NS posts in our dataset based on the keywords that exist in the post titles. To obtain the optimal number of topics k, we perform the modeling process in several iterations. First, we run LDA over the range (0-50) with a step size of 3. Second, we choose the sub-optimal range (4-20) based on the coherence score [20]. Third, we run the model again over the sub-optimal range with a step size of 1 and thus arrive at 8 topics as the optimum. Finally, we run the model with topic number k=8 and obtain 8 NS topics with their associated keywords (20 keywords per topic).
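The coarse-then-fine search for k described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: `coherence` stands for any function that trains a topic model with k topics and returns its coherence score, `toy_coherence` is a made-up stand-in that peaks at 8, and the rule used to pick the fine range (one coarse step on either side of the best coarse k) is an assumption, since the text does not say how the range (4-20) was chosen. The sketch starts at k=1 because a model with zero topics is not meaningful.

```python
def coarse_to_fine_k(coherence, low, high, coarse_step=3):
    """Pick a topic number in two passes: a coarse grid, then a step-1 refinement."""
    # Pass 1: score every coarse_step-th candidate in [low, high].
    coarse = {k: coherence(k) for k in range(low, high + 1, coarse_step)}
    best_coarse = max(coarse, key=coarse.get)

    # Assumed rule: refine within one coarse step on either side of the best coarse k.
    fine_low = max(low, best_coarse - coarse_step)
    fine_high = min(high, best_coarse + coarse_step)

    # Pass 2: score every candidate in the narrowed range.
    fine = {k: coherence(k) for k in range(fine_low, fine_high + 1)}
    return max(fine, key=fine.get), (fine_low, fine_high)


def toy_coherence(k):
    """Hypothetical coherence curve with its maximum at k = 8."""
    return 0.5 - 0.01 * abs(k - 8)


k, fine_range = coarse_to_fine_k(toy_coherence, low=1, high=50)
```

With the toy curve, the coarse grid (1, 4, 7, 10, ...) selects 7, and the refinement over 4 to 10 then finds 8. The two-pass design trades a small risk of missing a narrow peak for far fewer model fits than a full step-1 sweep.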
Algorithm 1: NS topic modeling using LDA

Input: NS_q_posts = Stack Overflow NS question posts obtained in step 3
Result: Suggested NS topics (k) and keywords
Method:
  NS_all_post_title = extract titles from NS_q_posts;
  NS_post_title = preprocess NS_all_post_title to remove noise;
  for run LDA topic modeling on NS_post_title using a custom range (N1-N2) do
    iterate with step size 3;
    compute coherence score for each topic number;
  end
  select sub-optimal topic range (O1-O2) based on coherence score;
  for run LDA topic modeling on NS_post_title using sub-optimal range (O1-O2) do
    iterate with step size 1;
    compute coherence score for each topic number;
  end
  select optimal topic number k based on coherence score;
  run the LDA topic modeling on NS_post_title for the optimal topic number k;
  return suggested NS topics (k) and keywords

2.3. Online appendix

We publish the replication package. It contains (1) the NS dataset, (2) Python code, and (3) the results of the study. The package is available at https://github.com/syful-is/Network/simulator.
Table 2. Top 5 discussion topics that relate to network simulators

Topic name                                 Top 10 keywords                                                                          #Posts
Simulation model configuration             vein, node, packet, network, message, send, omnet, vehicle, create, node                 999
Target network protocol modification       file, implement, find, add, function, route, set, parameter, code, module                527
NS installation                            installation, error, omnet, omnetpp, variable, window, unable, read, problem, ubuntu     278
Simulation model performance measurement   simulation, run, time, calculate, work, delay, distance, end, throughput, result         276
NS build error                             omnet, inet, make, project, build, link, error, library, fail, command                   242
Sum                                                                                                                                 2,322

3. RESULT AND DISCUSSION

This section describes the analyses of SO posts and topics to answer the research questions. In detail, we present each research question alongside the approach and the results.

3.1. RQ1: What kind of topics are presented in the network-simulator related discussions?

Approach: To answer this RQ, we apply LDA topic modeling to identify topics based on the titles of NS posts, as described in Section 2. We label the topic names based on the keywords suggested by LDA and on a manual reading of the top 25 question posts for each topic. Manual analysis of topic keywords and question posts reveals that some topics have similar meanings and ask similar types of questions, such as keywords related to simulation model and network model configuration. While these are different topics, they both relate to simulation model configuration. Therefore, some topics are merged and grouped under the same topic name. Thus, we obtain 5 final topic names from the 8 topics suggested by LDA topic modeling. In addition to the results of this RQ, we also look at the most popular NS topics among users.
To identify the most useful and popular topics, we use three different metrics (i.e., score, favorite, and views) that were also used in previous studies [7, 8, 21, 22]. We used the SO tour [23] as the reference for the definition of the metrics.

The average score of the NS question posts. According to the SO tour, members are allowed to up-vote posts that they consider useful. These votes are summarized as the score. We use this score as one of the metrics to measure the usefulness of the post topics.

The average number of times the posts are marked as favourite by SO users. This metric is also used to measure the usefulness of the post topics.

The average number of views of the posts by both unregistered and registered users. According to the SO tour, if a question post is viewed by many users, it is considered popular among them. Therefore, this metric shows the popularity of the topic.

Results: LDA topic modelling on SO posts suggests that users mainly discuss 5 NS-related topics. Table 2 shows the 5 topic names, the number of posts for each topic, and the top 10 associated keywords obtained from LDA topic modeling. We find that simulation model configuration is the most common topic discussed by users, followed by target network protocol modification. The other three topics discussed by users are NS installation, simulation model performance measurement, and NS build error.

In the second part of the analysis, we examine the usefulness and popularity of the NS topics among users. Table 3 shows that simulation model configuration is the most useful topic based on the average score and favorite count of the posts. This indicates that users find the posts related to this topic the most useful. In addition, we find that, based on the view count of posts, target network protocol modification and NS installation are the top two most popular topics among the users.
This indicates that posts related to these topics are the ones most commonly searched for by NS users on SO.
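The usefulness and popularity metrics amount to per-topic averages over the labelled posts. A minimal sketch, assuming each post is a dictionary with the hypothetical fields `topic`, `score`, `favorites` and `views`; the four toy records are invented for illustration and are not taken from the dataset:

```python
from collections import defaultdict
from statistics import mean

METRICS = ("score", "favorites", "views")


def topic_averages(posts):
    """Group posts by topic and average each metric within the group."""
    grouped = defaultdict(list)
    for post in posts:
        grouped[post["topic"]].append(post)
    return {
        topic: {metric: mean(p[metric] for p in items) for metric in METRICS}
        for topic, items in grouped.items()
    }


# Toy records (made up); real values would come from the SO post metadata.
posts = [
    {"topic": "NS installation", "score": 0, "favorites": 0, "views": 700},
    {"topic": "NS installation", "score": 1, "favorites": 0, "views": 500},
    {"topic": "Simulation model configuration", "score": 1, "favorites": 1, "views": 400},
    {"topic": "Simulation model configuration", "score": 0, "favorites": 0, "views": 300},
]

averages = topic_averages(posts)
# Popularity is read from average views, usefulness from average favorites (or score).
most_popular = max(averages, key=lambda t: averages[t]["views"])
most_useful = max(averages, key=lambda t: averages[t]["favorites"])
```

Keeping usefulness and popularity as separate rankings mirrors the distinction drawn in the text: a topic can be viewed often without its posts being up-voted or marked as favourite.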
Table 3. Usefulness and popularity of the top 5 discussion topics. The score, favorite and view counts are averages.

Topic name                                 Score   Favorite   View
Simulation model configuration             0.37    0.28       420.52
NS build error                             0.32    0.15       477.67
Simulation model performance measurement   0.32    0.21       488.59
Target network protocol modification       0.31    0.14       634.86
NS installation                            0.18    0.09       602.08

RQ1: Summary
LDA topic modelling on SO posts suggests that users mainly discuss 5 NS-related topics. We find that simulation model configuration is the most useful topic to users. In addition, we observe that target network protocol modification and NS installation issues are the most popular topics among the users.

3.2. RQ2: What types of questions do users face?

Approach: To examine the questions faced by the users, we apply the same approach as previous studies to identify the types of posts in SO [7, 8, 24]. We proceed in two steps. First, we manually investigate 30 randomly sampled questions using keywords (i.e., how, what, and why). We observed that some questions asked for instructions without using the 'how-to' keyword. For example, 'Is there anyway to change coverage in wireless node?' is a 'how-to' type of NS question post. Therefore, after manual investigation, we also include 'Is there anyway' as a search string to identify 'how-to' questions. In the same way, we extend the keyword list used to classify the posts into question types (i.e., how, what, and why). Finally, we apply the keyword list to the post title and body to obtain the classification of the questions. The categories used in NS question post labelling are as follows:

How - a question type that asks for instructions to perform a task. For example, "how to draw xgraph in satcom in ns2 simulation". This question asks for instructions on the xgraph feature in ns2 simulation.
What - a question type that asks for information that is more abstract or conceptual in nature, asks for decision help, or asks about non-functional requirements. For example, "What network simulation model should I use for simulating the behavior of an ad-hoc network in OMNET++". This type of question asks about the network simulation model for predicting the behavior of an ad-hoc network in OMNET++.

Why - a question post that asks for a review, reason, or cause of something. For example, "Why is the following tcl script for NS2 gives error for procedure implementation?". This question asks for clarification of why an error has happened.

Others - a question post that cannot be classified by keyword search in the title and body of the post. For example, "#includes in OMNeT++ Unit Tests".

Results: Table 4 shows that most NS posts in each topic ask for instructions to perform specific tasks. This is indicated by the high percentage of how-to questions, ranging from 55.80% to 78.06%. The NS installation topic has the highest share of how-to questions (78.06%), showing a need for rich guidance resources on installing and managing NS tools. Simulation model configuration has the highest share of what-type posts (12.02%). This suggests a need for general information about the supported features for simulation model configuration in the NS. Finally, simulation model performance measurement has the highest share of why-type posts (2.17%). This suggests a need for discussion forums and improved documentation on simulation model performance measurement issues.

RQ2: Summary
Results show that users mainly ask how-to questions, followed by what and why questions, respectively. In addition, we find that NS installation is the most dominant topic in asking for instructions (i.e., how-to). This indicates the necessity of providing guidance to reliably install NS tools.
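The keyword-based labelling described in the approach for this RQ can be sketched as a small rule-based classifier. The full keyword list used in the study is not given in the text, so the patterns below cover only the keywords that are mentioned (how, what, why, and the 'Is there anyway' search string, here also accepting the spelling 'any way'); the order in which the rules are tried and the name `classify_question` are assumptions.

```python
import re

# Rules are tried in order; the first matching pattern decides the label (assumed order).
QUESTION_PATTERNS = [
    ("how-to", re.compile(r"\bhow\b|\bis there any ?way\b")),
    ("what", re.compile(r"\bwhat\b")),
    ("why", re.compile(r"\bwhy\b")),
]


def classify_question(text):
    """Label a post's text as how-to, what, why, or others by keyword search."""
    lowered = text.lower()
    for label, pattern in QUESTION_PATTERNS:
        if pattern.search(lowered):
            return label
    return "others"


examples = [
    "how to draw xgraph in satcom in ns2 simulation",
    "Is there anyway to change coverage in wireless node?",
    "What network simulation model should I use for simulating the behavior "
    "of an ad-hoc network in OMNET++",
    "Why is the following tcl script for NS2 gives error for procedure implementation?",
    "#includes in OMNeT++ Unit Tests",
]
labels = [classify_question(e) for e in examples]
```

The word-boundary anchors keep a word such as "should" from being mistaken for the keyword "how". To follow the study more closely, the function would be applied to the concatenated title and body of each post.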
Table 4. Different issues of the top 5 topics faced by developers. The values are percentages.

Topic name                                 how-to   what    why    others
NS installation                            78.06    4.68    1.43   15.83
NS build error                             70.25    7.85    0.00   21.90
Target network protocol modification       64.71    8.92    1.33   25.05
Simulation model configuration             56.66    12.02   0.90   30.43
Simulation model performance measurement   55.80    9.05    2.17   32.97

[Figure 2. NS topic categories evolution over time. Number of question posts (y-axis, 0 to 200) per year (x-axis, 2008 to 2018) for the five topics: simulation model configuration, target network protocol modification, NS installation, simulation model performance measure, and NS build error.]

3.3. RQ3: What topics are the most difficult to answer?

Approach: To answer this RQ, we investigate the difficulty of the NS topics by utilizing four metrics that were also used in previous studies [7, 8]. For the first three metrics, we collect the metadata of each topic, namely the answer count, accepted answer count, and comment count, to compute the average values. For the fourth metric, the Probability of Difficulty (PD) score, we extract the answer count and view count of each topic. We then calculate the averages (i.e., avg. answer count and avg. view count) to find the PD score, formulated as:

\[ PD\ score = \frac{\text{Average Answer Count}}{\text{Average View Count}} \times 100\% \tag{3} \]

In general, a high number of views on a topic but a small number of answers indicates that only a small number of people can answer the topic's questions. Therefore, we adopt the PD score to measure the difficulty of the topic. The lower the PD score, the harder it is to answer questions in NS-related discussion topics.

Results: Table 5 shows the difficulty measures of the NS topics. We find that the NS topics in this analysis do not differ significantly in terms of the avg. answer count (0.90-0.99), avg. accepted answer count (0.35-0.38) and avg. comment count (1.19-1.71).
Hence, we consider the PD score to determine the most difficult topic to answer. As shown in Table 5, NS installation is the most difficult topic faced by users, followed by target network protocol modification. Although NS build error and simulation model performance measurement are not as difficult as the top two topics, they have a similar level of difficulty to each other according to our analysis. Finally, simulation model configuration is the least difficult topic to answer, with a PD score of 0.095%.
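As a sanity check on Equation (3), the sketch below recomputes the PD score from the averages printed in Tables 3 and 5. The names `pd_score` and `TOPICS` and the tuple layout are illustrative and not taken from the replication package. One observation, offered with caution because the per-post data is not reproduced here: dividing the average accepted-answer count by the average view count lands within roughly 0.001 percentage points of the tabulated PD value for four of the five topics, whereas dividing the average answer count by the average view count gives values roughly two to three times larger. Which count was actually used is not stated in the text, so this should be read as an observation only.

```python
def pd_score(avg_count, avg_view_count):
    """PD score in percent, following Equation (3): average count per average view, times 100."""
    return avg_count / avg_view_count * 100


# topic: (avg answers, avg accepted answers, avg views, tabulated PD in %)
# Averages copied from Tables 3 and 5.
TOPICS = {
    "NS installation": (0.99, 0.35, 602.08, 0.057),
    "Target network protocol modification": (0.92, 0.38, 634.86, 0.060),
    "NS build error": (0.93, 0.35, 477.67, 0.073),
    "Simulation model performance measurement": (0.96, 0.36, 488.59, 0.074),
    "Simulation model configuration": (0.90, 0.35, 420.52, 0.095),
}

# PD computed from the average answer count, as Equation (3) is written.
from_answers = {t: pd_score(v[0], v[2]) for t, v in TOPICS.items()}
# PD computed from the average accepted-answer count, for comparison.
from_accepted = {t: pd_score(v[1], v[2]) for t, v in TOPICS.items()}

# Lower PD means harder to answer, so sort ascending by the tabulated value.
ranking = sorted(TOPICS, key=lambda t: TOPICS[t][3])
close_matches = sum(
    abs(from_accepted[t] - v[3]) <= 0.0015 for t, v in TOPICS.items()
)
```

Sorting by the tabulated PD puts NS installation first and simulation model configuration last, matching the ordering reported in the text.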
Table 5. Difficulty measures of NS topics. The numbers of answers, accepted answers and comments are averages, while the PD score is a percentage.

Topic name                                 Answers   Accepted answers   Comments   PD (%)
NS installation                            0.99      0.35               1.71       0.057
Target network protocol modification       0.92      0.38               1.19       0.060
NS build error                             0.93      0.35               1.27       0.073
Simulation model performance measurement   0.96      0.36               1.22       0.074
Simulation model configuration             0.90      0.35               1.26       0.095

RQ3: Summary
NS installation and target network protocol modification are the top two most difficult NS-related topics to answer, with PD scores of 0.057% and 0.060%, respectively.

4. IMPLICATIONS

The results of this study will help NS users better understand and focus on the most pressing NS issues. This section describes how our results can help practitioners, researchers, and software development projects.

4.1. Practitioners

According to Table 5, one of the most popular topics, how to install NS, has a lower probability of receiving an accepted answer. The NS community can benefit from these findings to devise better tutorials (e.g., video tutorials, documentation) to reduce the barrier to NS usage. Our findings can also help the NS community to prioritize tasks by considering the areas of the difficult NS topics while performing experiments. NS installation, build error, and simulation model configuration are the topics with the highest share of questions without accepted answers. NS developers can take these issues into account to improve the user experience. In addition, Figure 2 shows the trend of the evolution of NS topic categories over time. This trend hints at the need for a dedicated online blog for each NS that can help create a large community and improve the NS usage experience.

4.2. Researchers

Our large-scale empirical research provides an overall view of the NS-related topics being discussed on the SO platform.
We found five NS-related topics discussed in SO, namely simulation model configuration, target network protocol modification, NS installation, simulation model performance measurement, and NS build error. We also highlight the most popular and difficult NS topics. Therefore, we encourage researchers to develop techniques to help NS users answer these difficult questions.

4.3. Software development projects

The findings of this study recommend that software development projects consider preparing project-specific discussion forums. As observed in the study, the number of NS-related discussions in SO has increased over time, but the attention from the community is not high. As a result, most of the problems discussed remain unsolved. The results indicate that preparing project-specific discussion forums is important. Sharing project-specific problems in project-related community forums will increase the probability of getting solutions. This will also enable developers to provide community-related information or announcements. As reported by Tantisuwankul et al. [25], software projects tend to adopt communication channels for both capturing new knowledge and updating existing knowledge; since information or announcements that specifically relate to a software project are important to share amongst the community, providing an NS-specific discussion forum is necessary.

5. THREAT TO VALIDITY

There are several threats that may affect the validity of the NS study. This section describes the threats to validity in detail.
5.1. Construct validity

Threats to construct validity may emerge in our experiments. During the SO question extraction phase, we may miss some NS questions due to the tag-based extraction technique. Since the number of such cases is small, the impact of the missing questions is not significant.

5.2. External validity

Threats to external validity may appear in the data preparation phase. We conducted an empirical study of 2,322 NS questions from Stack Overflow, but cannot generalize the results to other online question-answering platforms.

5.3. Reliability

We mitigate the threats to reliability by preparing an online appendix containing the dataset and scripts. This online appendix is described in Section 2.3.

6. RELATED WORK

This section describes the NS-related work. First, we review prior research on NS and its implementation in the network-related research domain. Second, we discuss SO-based case studies and studies on topic modeling.

6.1. Network simulator

There are several studies on NS. Prior works compared NS tools to lower the barrier to selecting a suitable one that supports users' objectives [10] and examined their usage on wireless networks [26-29]. In another paper, Campanile et al. conducted a case study to demonstrate the effectiveness of network simulators in real applications and modeling studies [27]. Comparative studies on wireless network simulators were also conducted by Lessman et al. [30] and Korkalainen et al. [31] to help other users quickly identify which simulator is most suitable for their needs. Some previous research [32-36] has also utilized NS as a tool to perform simulation work for different wireless network scenarios and intrusion detection. As NS implementations increase in academia and industry, we investigate the problems that users are facing.
The results of this study provide the research community with insights to understand areas that require more attention.

6.2. Stack overflow
SO data have also been analyzed in several studies. In a study by Rosen and Shihab [7], the authors summarized mobile-related questions from SO to identify specific issues on various mobile platforms. The SO dataset was also used to understand the challenges faced by chatbot developers [22]. Mahajan et al. [37] proposed a recommendation system to fix run-time exceptions by utilizing the SO dataset. Riccardo et al. [38] utilized the SO dataset in the PostFinder system to support software developers with suitable code snippets. Cai et al. [39] proposed an API recommendation method that also depends on the SO dataset. Uddin et al. [40] proposed an automated system to mine API usage; this study also utilizes SO as a primary data source. As far as we know, no research has been conducted on NS-related SO posts. Our study complements previous work on SO by analyzing NS-related posts. We collected and categorized NS topics and investigated the popularity and difficulty of the topics. We believe that our research sheds light on the areas where NS users are facing challenges.

7. CONCLUSION
To understand the characteristics of NS issues discussed by users, we conducted a large-scale empirical study on 2,322 NS questions posted in SO. In our study, we analyzed (i) the types of discussion topics and their popularity, (ii) the types of questions that users frequently ask, and (iii) the difficulty of topics shared in SO. The results of our study show that simulation model configuration is the most commonly discussed and most useful topic amongst users, while target network protocol modification and NS installation are the most popular NS-related topics in SO. NS users frequently ask for instructions on NS installation by posting how-to questions.
This suggests the importance of providing improved NS installation documentation. Furthermore, the findings show that the most difficult NS-related questions posted in SO concern NS installation and target network protocol modification. This study also shows an increase in NS-related discussion on Stack Overflow. Therefore, there are many open issues for future work, such as a comprehensive understanding of the evolution of NS-related discussions and further study of NS topics on other online discussion platforms.
REFERENCES
[1] V. Stocker, G. Smaragdakis, W. Lehr, S. Bauer, “The Growing Complexity of Content Delivery Networks: Challenges and implications for the Internet Ecosystem,” Telecommunications Policy, vol. 41, no. 10, pp. 1003–1016, 2017.
[2] A. Zagalsky, D. M. German, M. A. Storey, C. G. Teshima, G. Poo-Caamano, “How the R community creates and curates knowledge: An extended study of stack overflow and mailing lists,” Empirical Software Engineering, vol. 23, no. 2, pp. 953–986, Apr. 2018.
[3] M. Squire, “Should we move to stack overflow?: Measuring the utility of social media for developer support,” in Proceedings of the 37th International Conference on Software Engineering - Volume 2, ser. ICSE ’15. Piscataway, NJ, USA: IEEE Press, 2015, pp. 219–228.
[4] F. Calefato, F. Lanubile, N. Novielli, “How to ask for technical help? evidence-based guidelines for writing questions on stack overflow,” Information and Software Technology, vol. 94, pp. 186–207, Feb. 2018.
[5] S. Wang, T.-H. Chen, A. E. Hassan, “Understanding the factors for fast answers in technical Q&A websites,” Empirical Software Engineering, vol. 23, no. 3, pp. 1552–1593, Jun. 2018.
[6] S. Beyer, C. Macho, M. Di Penta, M. Pinzger, “What kind of questions do developers ask on stack overflow? a comparison of automated approaches to classify posts into question categories,” Empirical Software Engineering, vol. 25, no. 3, pp. 2258–2301, 2020.
[7] C. Rosen, E. Shihab, “What are mobile developers asking about? a large scale study using stack overflow,” Empirical Software Engineering, vol. 21, no. 3, pp. 1192–1223, 2016.
[8] X. L. Yang, D. Lo, X. Xia, Z.-Y. Wan, J.-L. Sun, “What security questions do developers ask? a large-scale study of stack overflow posts,” Journal of Computer Science and Technology, vol. 31, no. 5, pp. 910–924, 2016.
[9] N. Kahani, M. Bagherzadeh, J. Dingel, J. R. Cordy, “The problems with eclipse modeling tools: A topic analysis of eclipse forums,” in Proceedings of the ACM/IEEE 19th International Conference on Model Driven Engineering Languages and Systems, ser. MODELS ’16, 2016, pp. 227–237.
[10] M. H. Kabir, S. Islam, M. J. Hossain, S. Hossain, “Detail comparison of network simulators,” International Journal of Scientific and Engineering Research, vol. 5, no. 10, pp. 203–218, 2014.
[11] S. Baltes, L. Dumani, C. Treude, S. Diehl, “Sotorrent: reconstructing and analyzing the evolution of stack overflow posts,” in Proceedings of the 15th International Conference on Mining Software Repositories, MSR 2018, Gothenburg, Sweden, May 28-29, 2018, A. Zaidman, Y. Kamei, and E. Hill, Eds. ACM, 2018, pp. 319–330. [Online]. Available: https://doi.org/10.1145/3196398.3196430.
[12] “Regular expression operations,” accessed 02-01-2020. [Online]. Available: https://docs.python.org/3/library/re.html.
[13] “Python NLTK,” accessed 02-01-2020. [Online]. Available: https://www.nltk.org/.
[14] “Gensim model,” accessed 02-01-2020. [Online]. Available: https://radimrehurek.com/gensim/
[15] D. M. Blei, A. Y. Ng, M. I. Jordan, “Latent dirichlet allocation,” Journal of Machine Learning Research, vol. 3, pp. 993–1022, Jan 2003.
[16] H. Zhang, S. Wang, T.-H. Chen, A. E. Hassan, “Reading answers on stack overflow: Not enough!” IEEE Transactions on Software Engineering, 2019.
[17] S. Liu, R.-Y. Zhang, T. Kishimoto, “Analysis and prospect of clinical psychology based on topic models: hot research topics and scientific trends in the latest decades,” Psychology, Health & Medicine, pp. 1–13, 2020.
[18] S. Choi, J. Seo, “An exploratory study of the research on caregiver depression: Using bibliometrics and lda topic modeling,” Issues in Mental Health Nursing, 2020, pp. 1–10.
[19] A. McCallum, A. K. McCallum, S. Thrun, T. Mitchell, Machine Learning, vol. 39, no. 2, pp. 103–134, 2020.
[20] S. Boussaadi, H. Aliane, A. Cerist, P. O. Abdeldjalil, “Modeling of scientists profiles based on lda.”
[21] M. Zahedi, R. N. Rajapakse, M. A. Babar, “Mining questions asked about continuous software engineering: A case study of stack overflow,” in Proceedings of the Evaluation and Assessment in Software Engineering, pp. 41–50, 2020.
[22] A. Abdellatif, D. Costa, K. Badran, R. Abdalkareem, E. Shihab, “Challenges in chatbot development: A study of stack overflow posts,” in Proceedings of the 17th International Conference on Mining Software Repositories, pp. 174–185, 2020.
[23] “SO Tour,” accessed 10-02-2020. [Online]. Available: https://stackoverflow.com/tour