Indonesian J our nal of Electrical Engineering and Computer Science V ol. 14, No. 3, June 2019, pp. 1356 1372 ISSN: 2502-4752, DOI: 10.11591/ijeecs.v14i3.pp1356-1372 r 1356 T o ward Semantic Similarity Measur e Between Concepts in an Ontology Suwan T ongphu, Boontawee Suntisri v arapor n, P akinee Aimmanee School of Information, Computer , and Communication T echnology , Sirindhorn International Institute of T echnology , Thammasat Uni v ersity , Thailand Article Inf o Article history: Recei v ed Oct 1, 2018 Re vised Dec 10, 2018 Accepted Jan 15, 2019 K eyw ords: Description logic Semantic Analysis Concept Similarity Reasoning ABSTRA CT A concept similarity measure is one classical problem in Description Logic which aims at identifying similarity between concepts in an ontology . Mea- suring a distance between concepts is an essential process. Most methods used for measuring, the y usually do not tak e semantic for consideration. This w ork introduces a ne w method for concept si milarity measure. The proposed method semantically analyzes structures of tw o concepts and then computes the similar - ity score based on the number of shared structures. The ef ficienc y of the pro- posed algorithm is measured by means of the satisf action of desirable properties and intensi v e e xperiments on the S N O M E D C T ontology . Copyright c 2019 Institute of Advanced Engineering and Science . All rights r eserved. Corresponding A uthor: Suw an T ongphu, School of Information, Computer , and Communication T echnology , Sirindhorn International Institute of T echnology , Thammasat Uni v ersity , Thailand. Email: stongphu@gmail.com 1. INTR ODUCTION W ith a rapid increasing of internet users and a massi v e online data, retrie ving a rele v ant information from a gi v en query is one of the most challenging topics. Semantic querying [1] i s one recent aspect of infor - mation retrie v al [2], which aims at representing kno wledge in a well-found w ay and incorporating intelligence into the system. W i th the help of Description Logics (DLs) [3, 4], the use of W eb Ontology language (O WL) [5, 6] to model the kno wledge has been introduced and is lately recommended as a ne w standard for kno wl- edge representation by W3C. A f amily of Description Logics (DLs) is a common tool to formally equip the kno wledge base and of fers se v eral decidable reasoning services which are suf ficient for se v eral scenari os. F or e xample, determining whether or not a concept is a subclass of another one can be done using a concept sub- sumption. Besides a usefulness of classical reasoning services [7], there are some cases in which the classical reasoners are inapplicable. An e xample includes a measuring similarity score between conce p t s. By using a classical DL reasoner , it is e vidently insuf ficient since subsumption reasoning service simply returns a boolean v alue so the y cannot pro vide a de gree of similarity between concepts. Se v eral methods ha v e been proposed for measuring similarity between concepts. The most well- kno wn techniques are the distance-based [8] and the patte rn-based analysis [9, 10]. These methods basically can be used for only learning a ne w pattern of concept. Ho we v er , due to the f act that the y ha v e a lack of semantic J ournal homepage: http://iaescor e .com/journals/inde x.php/ijeecs Evaluation Warning : The document was created with Spire.PDF for Python.
Indonesian J Elec Eng & Comp Sci ISSN: 2502-4752 r 1357 analysis. The y can only pro vide rough concept similarity outputs. T o address this problem, modern semantic- based techniques, which aim at quantitati v ely analyzing v alues of concepts by means of their definitions, are lately introduced. The techniques are normally equipped to w ork with a dif ferent f amily of DLs. Distel et al. [11] proposed a ne w method for concept dissimilarity measure. The dif ference between tw o concepts C and D is measured by means of the number of operations required for relaxing a concept D until subsumed by a concept C . If the tw o concepts are concluded to be totally similar , the method returns 0 as an output. In addition to the method the y proposed, the dissimilarity score is computed based on the number of relaxing operations. Jaccard [12] proposed a simple method for computing similarity between concepts. Ho we v er , the proposition merely supports the concept conjunction, which is mostly not practical in man y real life ontologies. F or e xample, it has been pro v ed that b uilding a lar ge-scaled ontol o gy requires at least a f amily of DL E LH (see e.g. S N O M E D C T [13] and gene ontology). Recently , a similarity measures for a less-e xpressi v e DL F L 0 w as proposed by Racharak and Suntisri v araporn [14]. Lehmann and T urhan [15] e xtended the w ork of Jaccard to support more constructors. The y proposed a ne w similarity frame w ork for DL E LH . The operators of the proposed formulas are described by means of desired properties and left for interested users to customize. In the w ork proposed by Jano wic [16], a more refined semantic measure w as proposed to emplo y high e xpressi v e DLs, e.g. ALN . The e xtension to support DL S H I is subsequently proposed in the later w ork [17]. d’Amato et al. [18] introduced a ne w method for ALE concept similarity measure. The method satisfies se v eral desirable properties including symmetric, equi v alent in v ariant, structural dependent, and re v erse subsumption preserving property . The adoption for DL ALC , which equally satisfies the same properties, is proposed in their later w ork [19]. In this w ork, we introduce a ne w algorithm for computing similari ty between concepts based on shared features. Unlik e an y other approaches which are tailored for a specific domain, this w ork proposes a ne w notion for a concept similarity measure for a general domain. The proposed method is designed to w ork with the kno wledge base modeled using at most the lightweight DL ALE H f amily . Comparing to more e xpressi v e DLs, modeling the kno wledge base using the f amily of ALE H is more practical since a computing time is polynomially bounded. Moreo v er , it is more con v enient to meet a lar ge-scale e xpansion. Examples include the modeling of kno wledge bases using the DL E L , e.g. the well-kno wn kno wledge bases for clinical terms ( S N O M E D C T ), le xical terms (W ordNet), and genes (Gene ontology). T o enable semantic measure, we first transform the concept descriptions to their equi v alent des crip- tion trees. The le v el of similarity from one concept to another is then measured based on ho w well the tw o description trees can be mapped. The o v erall similarity rate i s lastly reported as an a v erage of similarity . The ef- fecti v eness of the proposed method is measured by means of satisf actory of desirable properties and compared to state-of-the-art methods. In the ne xt section, we briefly introduce the notion of DLs, describe the e xpansion process for a concept description, describe the rules which we use to normalize e xpanded concept description, and also pro vide steps which we use to construct a so-called concept description tree. Later sections introduce notions of a homomorphism score which measures a similarity from one concept description tree to another . The notion of ALE H semantic similarity measure is introduced. The e xample of computation is e x emplified by means of a prototypical f amily ontology . More intensi v e e xperiments are performed on the well-kno wn S N O M E D C T ontology and reported in the e xperiment section. The last section gi v es a conclusion of this w ork. 2. B A CKGR OUND In DL ALE H , concepts are us ed to describe classes of objects and roles are used to describe their relations. In this w ork, we use CN to represent a set of concept names and RN to represent a set of role names. Comple x concept descriptions can be formulated based on CN , RN , and concept constructors such as a concept conjunction u (the upper section of T able 1 sho w all constructors for DL ALE H ). Con v entionally , we use the symbols r and s to represent role names ( r ; s 2 RN ), A and B to represent concept names ( A ; B 2 CN ), and C and D to represent comple x concept descriptions. F or e xample, let F emale ; Male ; P erson 2 CN and child 2 RN , we can define a concept of W oman by means of the follo wing concept description: F emale u P erson : Lik e wise, we can define a concept of Mother based on the e xisting concept W oman as follo w: T owar d Semantic Similarity Measur e Between Concepts in an Ontolo gy (Suwan T ongphu) Evaluation Warning : The document was created with Spire.PDF for Python.
1358 r ISSN: 2502-4752 W oman u 9 child : P erson : F ormally , we define the semantics of DL ALE H by means of an interpr etation I = ( I ; I ) , which is a pair of an interpretation domain I (i.e. a finite set of indi viduals of the domain of interest), and an interpretation function I (i.e. a function that maps A 2 CN to a subset A I of I and r 2 RN to a binary relation r I on I ). There are tw o f acilities to define a ne w concept: a) concept equi v alence ( ) and b) concept inclusion ( v ). See the syntax in the lo wer part of T able 1. F or e xample, we can define the concept Mother using the concept equi v alence as sho wn belo w: Mother W oman u 9 child : P erson : This i nfers that a mother is a w oman who has some child person and vice v ersa. Ho we v er , if a c on c ept is defined using the concept inclusion, it will be interpreted merely in a forw ard direction. F or e xample, if we define a concept F ather as follo ws: F ather v Man u 9 child : P erson this infers that a f ather is a man who has some child person. Ho we v er , it is still unkno wn whether a man who has some child person will be a f ather . Ne v erthel ess, for each concept inclusion B in which B v D , it can be equally transformed to a concept equi v alence B F u D where F is a fr esh concept name ( F is unkno wn). Therefore, the concept F ather can be transformed to the follo wing form: F ather F u Man u 9 child : P erson : In addition, assume that each defined concept has only one definition and does not contain an y c ycl ic depen- dencies, by recursi v ely repla cing defined concepts with their definitions, we ha v e a ne w equi v alent concept definition which contains only primiti v e concept names (concept names that appear only on the right-hand side of concept definitions). Symbolically , we denote by CN p ri a set of primiti v e concepts. W e call a set of concept definitions a kno wle d ge base or a terminology ( TBox ). F or instance, we can define the TBox for a f amily domain as a set of concepts sho wn in Figure 3. A TBox is unfoldable if all concept definitions are e xpandable. Gi v en, for e xample, the definition of MotherNoSon : MotherNoSon Mother u 8 child : W oman By replac ing Mother with W oman u 9 child : P erson and W oman with F emale u P erson , we then ha v e an equi v alent definition of MotherNoSon as follo ws: MotherNoSon F emale u P erson u 8 child : ( F emale u P erson ) u 9 child : P erson where P erson ; F emale 2 CN p r i . In symbol, for e v ery ALE H concept which defined in an unfoldable TBox, we assume without lost of generality in the follo wing form: l l i =1 P i u m l j =1 9 r j :C j u n l k =1 8 s k :D k (1) where P i 2 CN p ri , r j ; s k 2 RN , and C j ; D k 2 CN [ f> ; ?g . F or simplicit y , we assign P C := f P 1 ; : : : ; P l g , E C := f9 r 1 :C 1 ; : : : ; 9 r m :C m g , and A C := f8 s 1 :D 1 ; : : : ; 8 s n :D n g where l is the size of P C , m is the size of E C , and n is the size of A C . Additionally , gi v en that v be the transiti v e closure of v o v er the role names, we use the symbols R 9 r = f s 2 RN j r v s g to represent a set of super -roles of r and R 8 r = f t 2 RN j t v r g to represent a set of sub-roles of r . Indonesian J Elec Eng & Comp Sci, V ol. 14, No. 3, June 2019 : 1356 1372 Evaluation Warning : The document was created with Spire.PDF for Python.
Indonesian J Elec Eng & Comp Sci ISSN: 2502-4752 r 1359 T able 1. Syntax and semantics of the DL ALE H Name Syntax Semantics bottom ? ; top > I concept name A A I I atomic ne g ation : A I n A concept conjunction C u D C I \ D I e xistential restriction 9 r :C f x j 9 y 2 I : ( x; y ) 2 r I ^ y 2 C I g uni v ersal restriction 8 r :C f x j 8 y 2 I : ( x; y ) 2 r I ) y 2 C I g concept inclusion B v D A I D I concept equi v alent B D A I = D I role hierarch y r v s r I s I In addition to the e xpanded form of the ALE H concept description, there may e xists a case which mak es the description implicit. This can be eliminated by applying the follo wing rules o v er the e xpanded description: 8 r :C u 9 r :D ! 8 r :C u 9 r : ( C u D ) ; 8 r :C u 8 r :D ! 8 r : ( C u D ) ; 8 r : > ! > ; 9 r : ? ! ? ; C u ? ! ? : A u : A ! ? ; C u > ! C ; T o b e more illustrati v e, by applying the rules abo v e to the e xpanded form of MotherNoSon , we ha v e the follo wing normalized definition: F emale u P erson u 9 child : ( F emale u P erson ) u 8 child : ( F emale u P erson ) 3. RESEARCH METHOD In the w ork proposed by Baader and K usters [20], a characterization using homomorphism for an unfoldable ALE H TBox has been proposed. The authors pro v ed that if the concept C is subsumed by D , then there must e xist a homomorphism from a concept description tree of D to that of C . Our proposed concept similarity measure is directly deri v ed from a concept homomorphism, which is one important characterization of a concept subsumption. The measure is, ho we v er , e xtended for the case where the tw o concepts are out of a subsumption relation b ut there still e xist some shared structures. Definition 1. ( ALE H concept subsumption) Let C and D ar e ALE H concept descriptions whic h defined in the terminolo gy O , we say that C v D if C I D I . Mor eo ver , C D if C v D and D v C . T owar d Semantic Similarity Measur e Between Concepts in an Ontolo gy (Suwan T ongphu) Evaluation Warning : The document was created with Spire.PDF for Python.
1360 r ISSN: 2502-4752 C o n c e p t   E x p a n s i o n C o n c e p t   C C C o n c e p t   D D C o n c e p t   E x p a n s i o n C o n c e p t   N o r m a l i z a t i o n C o n c e p t   N o r m a l i z a t i o n D e s c r i p t i o n  T r e e   C o n s t r u c t i o n D e s c r i p t i o n  T r e e   C o n s t r u c t i o n S i m i l a r i t y  M e a s u r e   s i m ( C C , D D ) S i m i l a r i t y  S c o r e Figure 1. An o v ervie w of the similarity measure system Figure 1 depicts the o v ervie w of our similarity measure system. Starting with tw o input concept descriptions, we e xpand and transform them into the normal form s. A so-called ALE H description tree is then constructed. F or e xample, gi v en C an e xpanded and normalized concept description, we construct a concept description tree G C := ( V ; E ; v 0 ; `; ) where V is a set of nodes, E V V is a set of edges, v 0 is the root, ` : V ! 2 CN p ri is a function representing a set of node labels, and : E ! 2 RN is a function representing a set of edge labels. The follo wing sho ws the steps for constructing an ALE H description tree: i. Create a ne w node v 0 and assign P C to ` ( v 0 ) . ii. F or each 9 r :D j 2 E C , create a ne w node w and then introduce a ne w edge ( v 0 ; w ) with w an r -successor of v 0 and assign R 9 r to ( v 0 ; w ) . Repeat from step (i) by treating D j as C and w as v 0 . iii. F or each 8 s:D k 2 A C , create a ne w node w 0 and then introduce a ne w edge ( v 0 ; w 0 ) with w 0 an s -successor of v 0 and assign R 8 s to ( v 0 ; w 0 ) . Repeat from step (i) by treating D k as C and w 0 as v 0 . Theorem 1 sho ws that the concept subsumption can be characterized by means of a homomorphism mapping from an opposite direction. Theor em 1 ( Let C and D be ALE H concept descriptions, and G C and G D be the corresponding ALE H concept description trees. W e say that C v D if there is a homomorphism h : G D ! G C which maps all nodes and edges of G D to the corresponding nodes and edges of G C [21]) . . Figure 2. A homomorphism (dashed arro ws) mapping G Mother to G MotherNoSon and a f ailure of mapping (dotted arro ws) G Mother to G NonAdoptiveF athe r . T o be more visible, consider the normalized description of the concept MotherNoSon and the follo wing nor - malized description of the concept Mother and NonAdoptiveF ather : Mother F emale u P erson u 9 child : P erson ; NonAdoptiveF ather : F emale u P erson u 9 child : P erson u 8 achild : ? : (2) W e can construct the ALE H description trees G MotherNoSon , G Mother , and G NonAdoptiveF ather using the process pre viously described. Figure 2 sho ws a successful attempt of the homomorphism mapping from G Mother Indonesian J Elec Eng & Comp Sci, V ol. 14, No. 3, June 2019 : 1356 1372 Evaluation Warning : The document was created with Spire.PDF for Python.
Indonesian J Elec Eng & Comp Sci ISSN: 2502-4752 r 1361 to G MotherNoSon . It is ob vious that all nodes and edges G Mother can be mapped to G MotherNoSon . The figure also sho ws a f ailed attempt of a homomorphism mapping from G Mother to G NonAdoptiveF athe r . By Theorem 1, we can conclude that MotherNoSon v Mother b ut NonAdoptiveF ather 6v Mother . By emplo ying a classical subsumption reasoning service, it is ob vious that MotherNoSon is Mother and NonAdoptiveF ather is not Mother . Ho we v er , by analyzing the structure of G NonAdoptiveF ather and G Mother , there are some shared structure s (e.g. both are person and ha v e some child). Thus, there must e xist some similarity between these tw o concepts though out of subsumption relation. Our interest is to measure their de gree of similarity . 3.1. Homomor phism scor e From Theorem 1, it is ob vious that a subsumption relation can be characterized by means of a ho- momorphism mapping i n a re v erse direction. In this section, we consider a case where the homomorphism condition is not fully satisfied b ut there is some shared structure between tw o description trees. Symbolically , let C and D be ALE H concept descriptions, P C and P D be sets of primiti v e concepts, E C and E D be sets of e xistential restrictions, A C and A D be as sets of uni v ersal restrictions, and G C and G D be ALE H concept description trees. W e measure the similarity from C to D by means of the homomorphism score hd ( G D ; G C ) . The homomorphism scor e function hd : G ALE H G ALE H ! [0 ; 1] is mathematically defined as follo ws: hd ( G D ; G C ) := (1 e a ) p hd ( P D ; P C ) + e e set hd ( E D ; E C ) + a a set hd ( A D ; A C ) Where each component consti tuting this function is defined in the follo wing manners. The parameter e = jE D j jP D [ E D [ A D j and a = jA D j jP D [ E D [ A D j assign the weights indicating ho w important the e xistentially and uni v ersally quantified subconcepts are to be considered. Intuiti v ely , if the number of top-le v el primiti v e con- cepts P D is greater than the number top-le v el e xistential rest rictions E D and the number of top-le v el e xistential restriction A D , we consider that the similarity between nodes is more important than the similarity between edges, which results in an increasing of . Otherwise, the similarity between edges is more important than that of between nodes, which results a decreasing of . Additionally , the homom orphism score hd is a measure from G D to G C . It is defined as a weighted summati on of the similarity between nodes ( p hd ), e xistential restrictions ( e set hd ), and uni v ersal restrictions ( a set hd ). The function p hd determines the similarity score between nodes and is defined as follo ws: p hd ( P D ; P C ) := ( 1 if P D = ; or P C = f?g jP D \ P C j jP D j otherwise ; (3) where j j represents the set cardinality . T o identify the similarity among edges, we consi der the similarity from E D to E C , and also from A D to A C using the function e set hd ( E D ; E C ) and a set hd ( A D ; A C ) , respecti v ely . The function e set hd ( E D ; E C ) is defined as follo w: e set hd ( E D ; E C ) := 8 > < > : 1 if E D = ; 0 if E D 6 = ; ; E C = ; P i 2E D max f e hd ( i ; j ): j 2E C g jE D j otherwise ; (4) where i ; j are e xistential restrictions; Note that all 9 r : ? will be transformed to ? during the normalization process. Therefore, we need not to treat this case in Equation 4. F or each e xistential restriction i , we compute the similarity to each j using the function e hd . e hd ( 9 r :X ; 9 s:Y ) := e ( e ( r ) + (1 e ( r )) hd ( G X ; G Y )) (5) where e : RN ! [0 ; 1) is a role weight function. It assigns dif ferent weight to each role name. Moreo v er , we use e = jR 9 r \ R 9 s j jR 9 r j to indicate an inclusion score between labels of tw o edges. F or the case e = 0 , T owar d Semantic Similarity Measur e Between Concepts in an Ontolo gy (Suwan T ongphu) Evaluation Warning : The document was created with Spire.PDF for Python.
1362 r ISSN: 2502-4752 this infers that the tw o edges ha v e no feature in common. Therefore, the homomorphism score of a successor can be omitted. The function e hd returns the similarity score if there is an e xistential edge label in common; Ho we v er , the successors’ structures must be recursi v ely check ed using the function hd ( G X ; G Y ) . Similar to the e xistential restrictions, we apply the same operations for the uni v ersal restrictions as sho wn belo w: a set hd ( A D ; A C ) := 8 > < > : 1 if A D = ; ; 0 if A D 6 = ; ; A C = ; ; P i 2A D max f a hd ( i ; j ): j 2A C g jA D j otherwise (6) where i ; j are uni v ersal restrictions; and a hd ( 8 r :X ; 8 s:Y ) := a if P Y = f?g ; a ( a ( r ) + (1 a ( r )) hd ( G X ; G Y )) otherwise (7) where a = jR 8 r \ R 8 s j jR 8 r j and a : RN ! [0 ; 1) . ! 1 W o man F emale u P erson ! 2 Man : F e m a l e u P erson ! 3 P a rent P erson u 9 child : P erson ! 4 Mother W oman u P a rent ! 4 F ather Man u P a rent ! 5 MotherNoSon Mother u 8 child : W oman ! 5 MotherNoDaughter Mother u 8 child : Man ! 5 AdoptiveF a t h er Man u 9 achild : P erso n ! 5 NonAdoptiveF ather F ather u 8 achild : ? ! 5 achild v child Figure 3. An e xample ALE H terminology O family ; here child , achild are shorthands for hasChild and hasAdoptedChild , respecti v ely . T o demonstrate ho w the algorithm w orks, we consider the similarity measure between the concepts Mother and NonAdoptiveF ather depicted in Figure 2. By using e , a , e , and a as pre viously defined and fixing e ( r ) and a ( r ) to 0.4 for each r 2 RN , the follo wing sho w the computing steps. Note that, for simplicity , the abbre viations of concept names M and NAF for Mother and NonAdoptiveF ather are used, respecti v ely . hd ( G M ; G NAF ) := 2 3 p hd ( P M ; P NAF ) + 1 3 e set hd ( E M ; E NAF ) + (0) a set hd ( A M ; A NAF ) := 2 3 [ 1 2 ] + 1 3 e hd ( i ; j ) // with e = 1 3 , a = 0 , i = 9 child : P erson and j = 9 child : P erson := 2 3 [ 1 2 ] + 1 3 [ 1 1 ][ 2 5 + 3 5 hd ( G P er son ; G P erson )] := 2 3 [ 1 2 ] + 1 3 [ 2 5 + 3 5 [ 1 1 ]] := 0 : 67 The homomorphism score of the opposite direction is computed as follo ws: hd ( G NAF ; G M ) := 2 4 p hd ( P NAF ; P M ) + 1 4 e set hd ( E NAF ; E M ) + 1 4 a set hd ( A NAF ; A M ) := 2 4 [ 1 2 ] + 1 4 e hd ( i ; j ) + 1 4 a hd ( i ; j ) // with e = 1 4 , a = 1 4 , i = 9 child : P erson and j = 9 child : P erson i = 8 achild : ? and j = ; := 2 4 [ 1 2 ] + 1 4 [ 1 1 ][ 2 5 + 3 5 hd ( G P e r son ; G P erson )] + 1 4 [0] := 0 : 50 Indonesian J Elec Eng & Comp Sci, V ol. 14, No. 3, June 2019 : 1356 1372 Evaluation Warning : The document was created with Spire.PDF for Python.
Indonesian J Elec Eng & Comp Sci ISSN: 2502-4752 r 1363 By applying the abo v e computation steps, the homomorphism score from G Mother to G NonAdoptiveF ather is 0.67, and that from the G NonAdoptiveF ather to G Mother is 0.50. F or the other pairs of concepts defined in O family , we apply the same steps. T able 2 sho ws the homomorphism scores among concepts in O family . T able 2. Homomorphism scores among defined concepts in O f amily . hd ( # ; ! ) W oman Man P a rent Mother F ather MNS MND AF NAF W oman 1.00 0.50 0.50 0.67 0.33 0.50 0.50 0.33 0.25 Man 0.50 1.00 0.50 0.33 0.67 0.25 0.25 0.67 0.50 P a rent 0.50 0.50 1.00 0.67 0.67 0.43 0.43 0.50 0.50 Mother 1.00 0.5 1.00 1.00 0.67 0.68 0.68 0.50 0.50 F ather 0.50 1.00 1.00 0.67 1.00 0.43 0.43 0.83 0.75 MotherNoSon (MNS) 1.00 0.50 1.00 1.00 0.67 1.00 0.85 0.50 0.55 MotherNoDaughter (MND) 1.00 0.50 1.00 1.00 0.67 0.85 1.00 0.50 0.55 AdoptiveF ather (AF) 0.50 1.00 1.00 0.67 1.00 0.43 0.43 1.00 0.75 NonAdoptiveF ather (N AF) 0.50 1.00 1.00 0.67 1.00 0.68 0.68 0.83 1.00 By observing the v alues in T able 2 and by using Proposition 2, it is ob vi ou s that that the closer the hd ( G D ; G C ) is equal to 1, the more lik ely the subsumption may hold in a re v erse direction. Moreo v er , if C v D , this means that hd ( G D ; G C ) = 1 and vice v ersa. From Theorem 1 [20, 22], it implies Proposition 2 stated as follo ws; Pr oposition 2. Let C and D be ALE H concept descriptions, and G C and G D be concept description tr ees, the following ar e similar: 1. hd ( G D ; G C ) = 1 . 2. C v D , 3.2. ALE H Semantic Similarity The homomorphism score function returns a v alue that represents the si milarity of a concept com- paring to another concept. The v alue, ho we v er , measures the similarity only in one direction. F or e xample, hd ( G M ; G NAF ) = 0 : 67 , whereas hd ( G NAF ; G M ) = 0 : 50 . Since the homomorphism scores of both the forw ard and the backw ard direction indicates the similarity score of the tw o concepts, we therefore define the similarity for ALE H concept descriptions using the a v erage v alue. The follo wing Defintion 2 pro vides the definition of the ALE H similarity measure. The proposed measure is the a v erage of the homomorphism score in both directions, which ensures that sim ( C ; D ) = sim ( D ; C ) . T able 3 sho ws the similarity score among concepts in O f amily . Definition 2. Let C and D be ALE H concepts. A similarity scor e between C and D is calculated as follows: sim ( C ; D ) := hd ( G C ; G D ) + hd ( G D ; G C ) 2 ; (8) T able 3. Similarity score among defined concepts in O f amily . sim ( # ; ! ) W oman Man P a rent Mother F ather MNS MND AF NAF W oman 1.00 0.50 0.50 0.83 0.42 0.75 0.75 0.42 0.38 Man 1.00 0.50 0.42 0.83 0.38 0.38 0.83 0.75 P a rent 1.00 0.83 0.83 0.71 0.71 0.75 0.75 Mother 1.00 0.67 0.84 0.84 0.75 0.71 F ather 1.00 0.55 0.55 0.92 0.88 MotherNoSon (MNS) 1.00 0.85 0.46 0.61 MotherNoDaughter (MND) 1.00 0.46 0.59 AdoptiveF ather (AF) 1.00 0.79 NonAdoptiveF ather (N AF) 1.00 Cor ollary 3. Let C and D be concept descriptions, P C and P D be sets of primitive concept s, E C and E D be sets of e xistential r estrictions, and A C and A D be sets of univer sal r estrictions. W e say that C v D if T owar d Semantic Similarity Measur e Between Concepts in an Ontolo gy (Suwan T ongphu) Evaluation Warning : The document was created with Spire.PDF for Python.
1364 r ISSN: 2502-4752 (a) P D P C , (b) for eac h 9 r :D 0 2 E D ther e e xists 9 s:C 0 suc h that s v r and C 0 v D 0 , and (c) for eac h 8 r :D 0 2 A D ther e e xists 8 s:C 0 suc h that s v r and C 0 v D 0 . Cor ollary 4. Let C , C 0 , D , and D 0 be concept descriptions, we say that E D = E C if f for eac h 9 r :D 0 2 E D ther e e xists 9 s:C 0 2 E C suc h that s v r , r v s , C 0 v D 0 , and D 0 v C 0 . Cor ollary 5. Let C , C 0 , D , and D 0 be concept descriptions, we say that A D = A C if f for eac h 8 r :D 0 2 A D ther e e xists 8 s:C 0 2 A C suc h that s v r , r v s , C 0 v D 0 , and D 0 v C 0 . Cor ollary 6. Let C and D be concept descriptions, C D if f P D = P C , E D = E C , and A D = A C . 3.3. Desirable Pr operties f or Concept Similarity T o identify whether the proposed similarity measure has a good performance, it is important to check the satisf actory of desirable properties. This section describes all important similarity properties and gi v es mathematical proofs. Let C , D and E be ALE H concept descriptions, we say that the similarity measure is: i. symmetrical if sim ( C ; D ) = sim ( D ; C ) , ii. equivalence closed if sim ( C ; D ) = 1 if f C D , iii. equivalence in variant if C D then sim ( C ; E ) = sim ( D ; E ) , i v . subsumption pr eserving if C v D v E then sim ( C ; D ) sim ( C ; E ) , v . r e ver se subsumption pr eserving if C v D v E then sim ( C ; E ) sim ( D ; E ) , vi. structur ally dependent if lim n !1 sim ( D 0 ; E 0 ) = 1 where D 0 := d i n C i u D , E 0 := d i n C i u E , C i and C j are atom comcepts in C where C i 6v C j . vii. triangle inequality if 1 + sim ( D ; E ) sim ( D ; C ) + sim ( C ; E ) . The proposed similarity measure sim ( ; ) are symmetric, equi v alence closed, equi v alence in v ariant, subsump- tion preserving, structurally dependent, not re v erse subsumption preserving, and not satisfying triangle inequal- ity . The follo wing are the proofs i. From Definition 2, it is ob vious that sim ( C ; D ) = sim ( D ; C ) . ii. ( = ) ) By Equation 8, sim ( C ; D ) = 1 implies that hd ( G C ; G D ) = 1 and hd ( G D ; G C ) = 1 . From Propo- sition 2, we ha v e C v D and D v C . Therefore, C D . ( ( = ) Gi v en that C D , using the same proposition, we ha v e C v D and D v C . This implies that hd ( G C ; G D ) = 1 , and hd ( G D ; G C ) = 1 , therefore sim ( C ; D ) = 1 . iii. Gi v en that C D , from Corollary 6, we ha v e P C = P D , E C = E D , and A D = A C . Therefore, G C = G D and this implies hd ( G C ; G E ) = hd ( G D ; G E ) and hd ( G E ; G C ) = hd ( G E ; G D ) . Such that sim ( C ; E ) = sim ( D ; E ) . i v . From Definition 2, it is suf ficient to pro v e that hd ( G C ; G D )+ hd ( G D ; G C ) 2 hd ( G C ; G E )+ hd ( G E ; G C ) 2 Indonesian J Elec Eng & Comp Sci, V ol. 14, No. 3, June 2019 : 1356 1372 Evaluation Warning : The document was created with Spire.PDF for Python.
Indonesian J Elec Eng & Comp Sci ISSN: 2502-4752 r 1365 Gi v en that C v D v E , this implies that C v E . From Proposition 2, we ha v e hd ( G E ; G C ) = hd ( G D ; G C ) = 1 . Therefore, it is suf ficient to sho w that hd ( G C ; G D ) hd ( G C ; G E ) . On both sides of the inequality , if e xpanded, we ha v e the same e and a , where e = jE C j jP C [ E C [ A C j ; and a = jA C j jP C [ E C [ A C j ; Therefore, it is enough to pro v e that a) p hd ( P C ; P D ) p hd ( P C ; P E ) b) e set hd ( E C ; E D ) e set hd ( E C ; E E ) c) and a set hd ( A C ; A D ) a set hd ( A C ; A E ) In a), we need to sho w that jP C \ P D j jP C j jP C \ P E j jP C j . In short, we need to sho w that j P C \ P D j j P C \ P E j (9) By Corollary 3, C v D v E ensures that P E P D P C . Therefore j P D jj P E j and Equation 9 is true. T o pro v e that b) is true, we sho w that P i 2E C max f e hd ( i ; j ): j 2E D g jE C j P i 2E C max f e hd ( i ; j ): j 2E E g jE C j (10) P i 2E C max f e hd ( i ; j ) : j 2 E D g P i 2E C max f e hd ( i ; j ) : j 2 E E g : Let ^ i 2 E E such that e hd ( i ; ^ i ) = max f e hd ( i ; j ) : j 2 E E g , b ut since ^ i 2 E E E D , then max f e hd ( i ; j ) : j 2 E D g e hd ( i ; ^ i ) . Therefore, Equation 10 is true. By applying the same steps, it implies that c) is also true. v . Let D 0 := d i n C i u D , E 0 := d i n C i u E , and n = n P + n E + n A be the number of all atomic sequences in C where n P , n E , n A be the number of primiti v e concepts, the number of e xistential restrictions, and the number of uni v ersal restrictions, respecti v ely . T o pro v e this, we consider the follo wing case distinctions. (a) If n P ! 1 , and both n E and n A are finite, it suf fices to sho w i) lim n P !1 e = 0 , ii) lim n P !1 a = 0 and iii) lim n P !1 p hd ( P D 0 ; P E 0 ) = 1 . Therefore, hd ( G D 0 ; G E 0 ) = hd ( G E 0 ; G D 0 ) = 1 which implies sim ( D 0 ; E 0 ) = 1 . Starting from e = jE D 0 j jP D 0 [ E D 0 [ A D 0 j = jE D 0 j jP C j + jP D j + jE D 0 j + jA D 0 j = jE D 0 j n P + jP D j + jE D 0 j + jA D 0 j (11) since j P D j , j E D 0 j and j A D 0 j are constants, lim n P !1 e = lim n P !1 jE D 0 j n P + jP D j + jE D 0 j + jA D 0 j = 0 . T o sho w ii) lim n P !1 a = 0 , consider the formula defining a = jA D 0 j jP D 0 [ E D 0 [ A D 0 j = jA D 0 j n P + jP D j + jE D 0 j + jA D 0 j Therefore, lim n P !1 a = lim n P !1 jA D 0 j n P + jP D j + jE D 0 j + jA D 0 j = 0 : F or iii) lim n P !1 p hd ( P D 0 ; P E 0 ) = 1 , we consider the definition of p hd . T owar d Semantic Similarity Measur e Between Concepts in an Ontolo gy (Suwan T ongphu) Evaluation Warning : The document was created with Spire.PDF for Python.