A Formal Language to Describe and Animate Signs in Brazilian Sign Language

In this paper, we propose the definition of a formal, expressive and consistent language to describe signs in Brazilian Sign Language (LIBRAS). This language allows the definition of all parameters (phonemes) of a sign and from this definition an animation is generated based on a 3D humanoid avatar. The proposed language is also flexible in the sense that it has mechanisms to include new parameters (or phonemes) “on the fly”. In order to provide a case study for the proposed language, a human computation system for collaborative construction of a LIBRAS vocabulary was also developed. This system, called WikiLIBRAS, allows deaf users and LIBRAS interpreters to describe signs and generate animations for signs in LIBRAS. Some preliminary tests with Brazilian deaf users and LIBRAS interpreters were also performed to evaluate the proposal. Our preliminary evaluation indicates that the proposed language can represent a significant number of signs in LIBRAS and WikiLIBRAS users can generate signs in LIBRAS more productively than 3D designers.


I. INTRODUCTION
Sign languages are natural languages used by deaf people to communicate. They are considered natural languages because they emerged from interaction among deaf people and allow the expression of any descriptive, concrete, rational, literal, metaphorical, emotional or abstract concept [1]. In addition, they have their own grammars and vocabularies and are composed of lexical items called signs, which are in turn composed of a set of parameters called phonemes.
According to Quadros and Karnopp [2], the Brazilian Sign Language (LIBRAS), like other sign languages, is a visual language that uses hand movements and facial expressions to express concepts, which are perceived through the human visual system. Thus, it differs from Brazilian Portuguese, which uses audio as its communication channel. The two languages also differ in their grammatical structures.
In the scientific literature, there are several works on the development of sign language dictionaries based on humanoid avatars [1][3][4]. To build these dictionaries, such works generally define intermediate languages that describe the sign parameters and allow the automatic generation of signs.
Felice, Di Mascio and Gennari [3], for example, proposed a bilingual Italian/Italian Sign Language (LIS) web dictionary based on a humanoid avatar. Their intermediate language describes a sign as a composition of handshape, location, palm orientation and hand movements. It defines 56 handshapes, grouped by the number of fingers used in each configuration, and seven types of movements: none, straight, circular, bending the wrist, opening/closing the hand, sinuous and interaction. It does not define, however, body movements and facial expressions (non-manual features), which reduces the flexibility and naturalness of the generated animations.
Buttussi, Chittaro and Coppo [4] proposed a multilingual sign language dictionary. This dictionary allows signs to be described or searched by words (in spoken languages) or by configuration parameters in sign languages. The system uses the same parameters defined by Felice, Di Mascio and Gennari [3] (handshape, location, palm orientation and hand movements). Thus, as in Felice, Di Mascio and Gennari [3], it is not possible to express non-manual features (NMFs); their inclusion is proposed as future work. Hand movements are classified into four types: straight, arc, curved and others.
Fusco and Brega [1] proposed X-LIBRAS, an XML-based language to represent signs in Brazilian Sign Language (LIBRAS). To validate the language, a humanoid avatar based on the H-Anim specification [5] was developed to represent signs. The proposed language is based on the definitions of Brito [6], but it does not model or implement all types of movements. Furthermore, there are also some limitations on handshapes, since it is not possible to represent handshapes in which the fingers cross.
In this paper, we define a comprehensive and flexible language for describing signs in LIBRAS, called FleXLIBRAS. The proposed language specifies a broad class of phonemes and parameters and allows new parameters (or phonemes) to be specified and included in the language, which makes it flexible. Since LIBRAS is a natural and living language, new signs, parameters and phonemes can arise naturally. A descriptive language that allows these new parameters or phonemes to be included at runtime can therefore adapt to changes in LIBRAS. To this end, the language supports language descriptors, which define new parameters and phonemes and add them to the proposed language at runtime.
In addition, XML documents are generated from the combination of these phonemes (and parameters) and may be interpreted and transformed into animations based on a 3D humanoid avatar model. Thus, the proposed language can be used, among other applications, to develop virtual environments that represent signs in LIBRAS using 3D humanoid avatars, to create LIBRAS vocabularies and dictionaries, and to develop tools for the automatic generation of videos in LIBRAS.
The rest of the paper is organized as follows. In Section II, we present some important concepts related to the linguistics of LIBRAS. In Section III, we present the proposed language and the 3D humanoid avatar model. In Section IV, we present WikiLIBRAS, a case study for the proposed language. In Section V, we describe some tests with Brazilian deaf users to evaluate the solution. Finally, final remarks are presented in Section VI.

II. LIBRAS LINGUISTIC ISSUES
The Brazilian Sign Language (LIBRAS) is the sign language used by most deaf Brazilians and is recognized by Brazilian Law Nº 10436 of April 24, 2002, as the official sign language of Brazil. It has its own grammatical structure and lexical items called signs.
According to Brito and Langevin [7] and Quadros and Karnopp [2], spoken languages and sign languages differ in how their structure unfolds over time. Spoken languages have a sequential structure, with phonemes succeeding one another linearly in time, whereas sign languages have a parallel structure, i.e., each sign can use several parts of the body simultaneously.
Signs consist of parameters, called phonemes, which are considered the minimal distinctive units of sign languages. According to the Brazilian National Federation for the Education and Integration of the Deaf (FENEIS) [8], signs are composed of a combination of hand movements with a certain shape in a certain location. This location may be a body part or an area in front of the body. These parameters can be compared to phonemes or morphemes. Thus, phonemes can be combined to form signs, which can in turn be combined to form phrases or sentences when inserted within a given context.
There is no consensus on the number of parameters that compose a sign. Stokoe [9], for example, proposes that a sign is composed of three phonemes: handshape, location and movement. Battison [10] and Quadros and Karnopp [2] propose adding two more phonemes: palm orientation and non-manual features (NMFs, i.e., facial and body movements). Souza and Pinto [11] state that signs are represented by the hands and accompanied by body movements and non-manual features.
The handshape refers to the position of the fingers. A sign can be represented with the dominant hand (the right hand for a right-handed person) or with both hands. If a sign is represented only with the dominant hand, the other hand may serve as a support. A handshape can be distinguished from the others by extension (position and number of extended fingers), by contraction (open or closed hand), or by contact or divergence of the fingers. In this paper, we adopt the 60 handshapes defined by Felipe [12] (see Figure 2). According to Brito and Langevin [7], palm orientation refers to the direction of the palm during the sign, which can assume the following values: upward, downward, facing left, facing right, facing the body or facing forward.
The location refers to the place where the dominant hand articulates the sign. A sign can be articulated with the dominant hand touching a body part or in a region in front of the interpreter.
According to Brito and Langevin [7], hand movement is a complex parameter, which may involve several forms and directions. Movements can be directional in space, internal to the hand, made with the wrist, or a combination of these. Some signs, however, do not have any movement.
The last parameter is the non-manual features (NMFs), which refer to face, eye, head or torso movements. According to Quadros and Karnopp [2], they are used to differentiate lexical items or as syntactic markers (e.g., marking interrogative sentences, relative clauses and focus). Thus, they are a differentiating feature in many signs.
LIBRAS also has some phonological restrictions, which aid in sign composition. For example, signs can be represented using one or two hands. According to Quadros and Karnopp [2] and Battison [10], when two hands are used, either both hands are active or one hand works only as a location. Another restriction is related to symmetry. In this case, two hands are used and their movements can be simultaneous or alternating.
In the next section, we will present the sign description language proposed in this work.

III. FLEXLIBRAS: A SIGN DESCRIPTION LANGUAGE
In this section, we describe a formal language for describing signs in LIBRAS, called FleXLIBRAS. With FleXLIBRAS, it is possible to describe signs in LIBRAS and develop animations for these signs, which allows, among other applications, the design of virtual environments based on LIBRAS and the development of LIBRAS vocabularies and dictionaries.
In the proposed description language, a LIBRAS sign is defined as the combination of a set of phonemes, such as handshape, palm orientation, location, hand movements and NMFs. More specifically, a sign is defined as a set of movements, where each movement has, among other attributes, an initial and final handshape, location, orientation and facial expression, a type of trajectory (e.g., straight, circular, semi-circular), a direction (e.g., inside out, from right to left), and flags that indicate which hands are used in the movement (right, left or both).
Since the proposed language defines a sign as a set of movements with initial and final states, it can model posture changes during the movement of a sign. In this case, a sign with many posture changes may be defined as a combination of small movements, each with its own initial and final states (handshape, location, palm orientation, NMFs).
Formally, a sign S is defined in terms of <mov>+, a set of one or more movements, each of which can be classified as Contact, Interaction, Twisting the wrist, Bending the wrist, Inner hands or Geometric. Geometric movements can be further subclassified as Point, Straight, Circular, Arc, Sinuous, Spiral or Angular.
Param represents a set of common parameters, namely Cf_ini, Cf_fin, rep, hu and sync, which represent, respectively, the initial and final configurations of a movement, the number of repetitions of the movement, the hands used in the sign and the synchronization between the movements of the hands. The ct, rot, dir, t, s, cd, rs, mo and ty parameters represent, respectively, the contact type, the rotational direction of the hand, the rotational direction of the wrist, the type of inner hand movement, the simultaneity, the direction of movement, the size of the radius of circular and arc movements, the orientation of movement and the arc movement type. Simultaneity refers to the synchronization of the right hand over the left.
The hs, o, loc and fe fields represent, respectively, the handshape, palm orientation (e.g., facing up, facing down, facing out, facing in), location and facial expression phonemes of each configuration. The ho, hd and fd fields represent the palm orientation, palm direction (e.g., forward, inside, down) and finger direction (up, down, left, right). The possible values defined for the main fields are listed in the Appendix.
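The structure described above can be pictured as a small data model. The following Python sketch is only an illustration under stated assumptions: the field names hs, o, loc, fe, rep, hu and sync come from the text, while the class names, the default values and the concrete phoneme values ("hs-01", "facing-body", "mouth", "neutral") are hypothetical and do not come from FleXLIBRAS itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Configuration:
    """One posture: the hs, o, loc and fe fields of a configuration."""
    hs: str   # handshape
    o: str    # palm orientation
    loc: str  # location
    fe: str   # facial expression


@dataclass
class Movement:
    """One movement with the common parameters (Param)."""
    kind: str                   # e.g. "Straight", "Circular", "Contact"
    cf_ini: Configuration       # Cf_ini, initial configuration
    cf_fin: Configuration       # Cf_fin, final configuration
    rep: int = 1                # number of repetitions (default is an assumption)
    hu: str = "right"           # hands used: "right", "left" or "both"
    sync: Optional[str] = None  # synchronization between hands, if any


@dataclass
class Sign:
    name: str
    movements: List[Movement] = field(default_factory=list)

    def is_valid(self) -> bool:
        # <mov>+ requires at least one movement.
        return len(self.movements) >= 1


# Hypothetical values: a one-handed circular movement whose initial and
# final configurations are identical, as the text describes for LIPS.
posture = Configuration(hs="hs-01", o="facing-body", loc="mouth", fe="neutral")
lips = Sign("LIPS", [Movement("Circular", cf_ini=posture, cf_fin=posture)])
```

A sign with several posture changes would simply carry more than one Movement in its list, each with its own initial and final Configuration.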
Based on the above formalization, an XML-based representation was defined to describe all of these parameters. Figures 3 and 4 illustrate the XML documents that describe the LIPS and CORRECT signs in LIBRAS, respectively. The type attribute represents the type of movement. The hands-used and direction attributes represent the hands used in the sign (left, right or both) and the direction of movement (clockwise or counterclockwise), respectively. The orientation attribute represents a reference plane defined for the movement (parallel or perpendicular to the body). The repetition-flag attribute indicates whether the movement is repeated. The orientation, direction and finger-direction attributes of the palm tag refer, respectively, to the adopted reference of the hand to the body, the palm direction and the finger direction. The facial-expression field represents the NMF phoneme.
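To make the attribute list above concrete, the sketch below builds a small XML document with Python's standard xml.etree.ElementTree module. It is a hedged illustration, not a reproduction of Figures 3 or 4: the attribute names type, hands-used, direction, orientation, repetition-flag, finger-direction and facial-expression are taken from the text, whereas the element names (sign, movement, initial, final, handshape, location), the nesting and all values are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def build_configuration(parent, tag, handshape, location, facial_expression, palm):
    """Append one configuration (initial or final) to a movement element."""
    cfg = ET.SubElement(parent, tag)
    ET.SubElement(cfg, "handshape").text = handshape
    ET.SubElement(cfg, "location").text = location
    # The palm tag carries orientation, direction and finger-direction.
    ET.SubElement(cfg, "palm", palm)
    ET.SubElement(cfg, "facial-expression").text = facial_expression
    return cfg


def build_sign(name, movement_attrs, initial, final):
    """Build a sign with a single movement (hypothetical layout)."""
    sign = ET.Element("sign", {"name": name})
    movement = ET.SubElement(sign, "movement", movement_attrs)
    build_configuration(movement, "initial", **initial)
    build_configuration(movement, "final", **final)
    return sign


# Hypothetical values for a one-handed circular movement around the mouth.
posture = {
    "handshape": "hs-01",
    "location": "mouth",
    "facial_expression": "neutral",
    "palm": {"orientation": "facing-body", "direction": "inside",
             "finger-direction": "up"},
}
lips = build_sign(
    "LIPS",
    {"type": "circular", "hands-used": "right", "direction": "clockwise",
     "orientation": "parallel", "repetition-flag": "false"},
    initial=posture,
    final=posture,
)
xml_text = ET.tostring(lips, encoding="unicode")
```

Passing the same posture as both the initial and final configuration mirrors the case in which the handshape, palm orientation and location do not change during the movement.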
In Figure 3, the LIPS sign is defined with one hand and the movement is performed around the mouth. The initial and final configurations are the same, since the handshape, palm orientation and location do not change.
According to Figure 4, the CORRECT sign is defined with one hand and with distinct initial and final locations. The hand moves linearly from the initial location (the starting point) to the final location (the end point).
The proposed language also supports the phonological restrictions described in Section II. When the left hand works only as a location, it is represented inside the XML document as illustrated in Figure 5.

A. Expanding phonemes and parameters
Since LIBRAS is a natural and living language, new signs, parameters or phonemes may arise spontaneously. Consequently, a description language proposed to represent signs in LIBRAS must be flexible enough to incorporate natural changes in the language. To make this possible, a mechanism for including new phonemes was developed and incorporated into the proposed language. This mechanism uses a set of pose libraries, where each pose holds the location and rotation coordinates of the related bones according to a 3D avatar model (presented in Section III.B). Figure 6 illustrates the previously defined pose library for locations. Figures 7, 8 and 9 illustrate XML documents used to include new parameters in these pose libraries.
According to Figures 7, 8 and 9, the mechanism used to include new parameters in FleXLIBRAS is based on the definition of descriptors, which contain the location and rotation coordinates of the related bones. For example, to add a new handshape to the pose library (Figure 7), the descriptor must provide the location and rotation coordinates of each bone in the hand of the 3D avatar (described in Section III.B). To add a new location, only the coordinates of two bones (ik_FK.R and bnpolyV.R) need to be provided (Figure 8). The first bone controls the wrist and the second controls the elbow deformation. To include a new palm orientation, the bones related to the hand and forearm must be configured (Figure 9). Facial expressions, however, differ from the other phonemes (or parameters). To include a new facial expression, the user can control different bones located in the face of the 3D avatar. For example, the "wide eyes" facial expression manipulates the bones located in the upper face, but does not modify the ones that deform the chin.
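The descriptor mechanism can be sketched as a registry that maps a pose name to the location and rotation of each bone it touches. The Python sketch below is an assumption-laden illustration: the bone names ik_FK.R and bnpolyV.R come from the text, but the PoseLibrary class, the three-component rotation and all coordinates are hypothetical and are not the format used in Figures 7, 8 and 9.

```python
from typing import Dict, Optional, Set, Tuple

Vec3 = Tuple[float, float, float]
BonePose = Dict[str, Vec3]        # {"location": (x, y, z), "rotation": (rx, ry, rz)}
Descriptor = Dict[str, BonePose]  # bone name -> pose of that bone

# A location descriptor sets only the wrist and elbow control bones.
LOCATION_BONES = {"ik_FK.R", "bnpolyV.R"}


class PoseLibrary:
    """Hypothetical pose library that accepts new poses at runtime."""

    def __init__(self, required_bones: Optional[Set[str]] = None) -> None:
        # None means any subset of bones is accepted, which suits
        # facial expressions that touch only part of the face.
        self.required_bones = required_bones
        self.poses: Dict[str, Descriptor] = {}

    def add(self, name: str, descriptor: Descriptor) -> None:
        if self.required_bones is not None and set(descriptor) != self.required_bones:
            raise ValueError(f"{name}: expected bones {sorted(self.required_bones)}")
        for bone, pose in descriptor.items():
            if set(pose) != {"location", "rotation"}:
                raise ValueError(f"{name}/{bone}: need 'location' and 'rotation'")
        self.poses[name] = descriptor


# Hypothetical coordinates for a new location near the chin.
locations = PoseLibrary(LOCATION_BONES)
locations.add("chin", {
    "ik_FK.R": {"location": (0.10, -0.25, 1.45), "rotation": (0.0, 0.0, 0.0)},
    "bnpolyV.R": {"location": (0.30, -0.10, 1.20), "rotation": (0.0, 0.0, 0.0)},
})
```

A handshape library would be created the same way with the hand bones as the required set, while a facial expression library would pass no required set at all.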

B. 3D Humanoid Avatar Model
To represent the signs described by the proposed language and to allow the inclusion of new parameters in the pose libraries, a 3D humanoid avatar was developed using the Blender software [14]. It has an armature composed of 82 bones, distributed as follows:
• 15 bones in each hand to set up handshapes;
• 23 bones on the face to set up facial expressions and movements;
• 22 bones in the arms and body to set up arm and body movements; and
• 7 auxiliary bones (i.e., bones that do not deform the mesh directly).
Thus, to set up a handshape it is necessary to define the location and rotation of the 15 bones located in the hand of the 3D avatar. To set up facial expressions, it is necessary to configure the bones used in the face. Arm movement is performed by moving only two bones (ik_FK.R and bnpolyV.R): the first controls the wrist and the second controls the elbow deformation.
The deformation between related bones is performed using inverse kinematics (IK). Thus, whenever the wrist bone moves, for example, the movement propagates to the bones of the arm and forearm.
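The way a wrist target determines the arm and forearm pose can be illustrated with the textbook analytic solution for a planar two-bone chain. This is a simplified sketch and not the solver used by the avatar rig: it works in 2D, the function names and bone lengths are hypothetical, and it always picks one of the two possible elbow bends.

```python
import math


def _clamp(value, low, high):
    return max(low, min(high, value))


def two_bone_ik(target_x, target_y, upper_len, fore_len):
    """Return (shoulder, elbow) angles in radians for a planar arm.

    shoulder is the angle of the upper arm from the x axis; elbow is the
    bend of the forearm relative to the upper arm (0 means a straight arm).
    Unreachable targets are clamped to the nearest reachable distance.
    """
    dist = math.hypot(target_x, target_y)
    dist = _clamp(dist, abs(upper_len - fore_len), upper_len + fore_len)
    if dist < 1e-12:
        # Only possible when both bones have equal length: fold the arm back.
        return 0.0, math.pi
    # Law of cosines: interior angle at the elbow.
    cos_interior = (upper_len**2 + fore_len**2 - dist**2) / (2 * upper_len * fore_len)
    elbow = math.pi - math.acos(_clamp(cos_interior, -1.0, 1.0))
    # Angle between the upper arm and the shoulder-to-target line.
    cos_inner = (upper_len**2 + dist**2 - fore_len**2) / (2 * upper_len * dist)
    shoulder = math.atan2(target_y, target_x) - math.acos(_clamp(cos_inner, -1.0, 1.0))
    return shoulder, elbow


def wrist_position(shoulder, elbow, upper_len, fore_len):
    """Forward kinematics, used here to check the IK result."""
    elbow_x = upper_len * math.cos(shoulder)
    elbow_y = upper_len * math.sin(shoulder)
    return (elbow_x + fore_len * math.cos(shoulder + elbow),
            elbow_y + fore_len * math.sin(shoulder + elbow))


# Hypothetical bone lengths (metres) and wrist target.
shoulder, elbow = two_bone_ik(0.3, 0.4, upper_len=0.30, fore_len=0.25)
wrist = wrist_position(shoulder, elbow, 0.30, 0.25)
```

Feeding the returned angles back through wrist_position recovers the requested target, which is the property an animator relies on when placing only the wrist control bone.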
Figure 10 illustrates the proposed 3D humanoid avatar. Figures 11, 12 and 13 illustrate the bones of the face (Figure 11), hand (Figure 12) and body (Figure 13) of the avatar.

IV. CASE STUDY: WIKILIBRAS
To validate the proposed language, a case study was developed with WikiLIBRAS, a human computation Web system that allows the collaborative construction of a multimedia dictionary in LIBRAS. The generated dictionary can be used, among other applications, in the teaching or dissemination of LIBRAS, in machine translation systems to LIBRAS, and in applications that perform synthesis of signs in LIBRAS [15][16]. The architecture of the proposed system is illustrated in Figure 14.
According to Figure 14, users first access the collaborative Web system through a Web interface. In this interface, they can generate new signs or search for and display signs already created. When users want to generate a new sign, they configure the parameters (phonemes) of the sign in the Web interface. A LanguageGenerator component then takes these parameters and converts them to an XML document, according to the proposed FleXLIBRAS language (see Figures 3 and 4).
On the server side, the XML document is received by the Parser component and converted to an intermediate language to be rendered by the Render component. The Render component interprets these parameters and generates an animation for the sign. The animation, illustrated in Figure 15, is displayed in the Web interface to the user, who validates it.
One of the major challenges in developing WikiLIBRAS is the design of the Web user interface. Since it is aimed at Brazilian deaf users who, in general, have difficulty reading texts in spoken languages, the proposed interface has to use alternative strategies to interact with users and to promote the intelligibility of the service offered.
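The Parser step can be pictured as a function that turns the XML document into a flat list of instructions for a renderer. The sketch below is a guess at the general shape only: the source does not specify the intermediate language, so the instruction dictionaries, the frame numbering, the parse_sign function and the sample document (element names and values) are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical FleXLIBRAS-like document; element names and values are assumed.
SAMPLE = """
<sign name="CORRECT">
  <movement type="straight" hands-used="right" repetition-flag="false">
    <initial><handshape>hs-A</handshape><location>chest</location></initial>
    <final><handshape>hs-A</handshape><location>neutral-space</location></final>
  </movement>
</sign>
"""


def parse_sign(xml_text, frames_per_movement=24):
    """Convert a sign document into (name, list of renderer instructions)."""
    root = ET.fromstring(xml_text.strip())
    steps = []
    frame = 1
    for movement in root.findall("movement"):
        end = frame + frames_per_movement
        # One pose instruction per phoneme at the start and end of the movement.
        for tag, at in (("initial", frame), ("final", end)):
            for phoneme in movement.find(tag):
                steps.append({"frame": at,
                              "hand": movement.get("hands-used"),
                              "phoneme": phoneme.tag,
                              "pose": phoneme.text})
        # One trajectory instruction that interpolates between the two poses.
        steps.append({"frame": frame, "until": end,
                      "trajectory": movement.get("type")})
        frame = end
    return root.get("name"), steps


name, steps = parse_sign(SAMPLE)
```

A renderer would then look up each pose in the corresponding pose library and insert keyframes at the listed frames, following the named trajectory in between.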
To address this, the Web interface was designed and developed using Adobe® Flash® CS5.5 [17]. The interface focuses on graphic and animation elements and uses a minimum of textual elements. To register a new sign, users first choose the type of movement and define the number of repetitions of the movement (see Figure 16). According to Figure 16, the "preview" side of the main screen shows the user's configuration in real time. If the user chooses, for example, the circular movement type, an animation is presented with the right hand performing a circular movement, which helps users confirm their choice.
In the subsequent screens (Figures 17, 18, 19 and 20), a set of images (options) associated with each phoneme is presented to the users, who then choose the option that corresponds to the sign. In addition, a timeline at the bottom of the screen shows the options already configured by the user (see Figure 21). The user can change a phoneme that has already been configured simply by clicking its small frame and updating the configuration.
Finally, after all parameters are set, an animation is generated by running the other WikiLIBRAS components (see Figure 14). This animation is then presented to the user, who can validate it or not. In the next section, we describe some tests performed with Brazilian deaf users and LIBRAS interpreters to evaluate this prototype of the proposed solution.

V. TESTS AND RESULTS
To evaluate the proposed solution, tests were performed with LIBRAS interpreters and Brazilian deaf users on the prototype (i.e., WikiLIBRAS). In these tests, LIBRAS interpreters and Brazilian deaf users were invited to generate a set of signs in LIBRAS using WikiLIBRAS. Their effectiveness (signs generated correctly) and efficiency (average time to generate the signs) were then compared with the effectiveness and efficiency of 3D designers generating the same signs in animation tools.
The experiments were performed with eleven Brazilian deaf users and three LIBRAS interpreters at the Foundation Center for Integrated Support for People with Disabilities (Funad) in João Pessoa, a city in northeastern Brazil. The group of users consisted of seven men and seven women ranging in age from 12 to 42 years, with an average age of 25.43 years. We also recorded their level of education and their knowledge of LIBRAS and Brazilian Portuguese, which are shown in Table 1. The results are on a scale from 1 to 6.
Users were invited to generate five signs in LIBRAS using WikiLIBRAS and to complete a questionnaire about aspects of the solution such as its usability and the naturalness of the 3D humanoid avatar.
The questionnaire had three parts. In the first part, personal information was collected from users, such as gender, age, education level and degree of knowledge of LIBRAS and Brazilian Portuguese. In the second part, users indicated which signs were generated correctly (effectiveness) and why the other signs (if any) were generated incorrectly or were not generated. Finally, the third part measured the users' degree of satisfaction with the tool. This part had six questions, which rated WikiLIBRAS on a 1-to-6 scale for usability, naturalness of the generated animation, and the handshape, palm orientation, facial expression and location screens. The time required to generate each sign (efficiency) was also stored in WikiLIBRAS.
Since the signs in the proposed language, as well as the interaction in WikiLIBRAS, are defined according to the type of movement, the signs selected for the test have different types of movements. Table 2 presents these signs and their types of movements. According to Table 2, the PRESIDENT sign has a straight movement, the TEACHER sign has a semi-circular movement, the LIPS sign has a circular movement and the SHUT UP and UNCLE signs have a point movement. Thus, it is possible to evaluate the generation of signs with straight, circular, semi-circular and point movements, which are, in general, the most representative types of movements in sign languages. According to Gibet, Lebourque and Morteau [18], these types of movements are used in approximately 97% of the signs in French Sign Language.
Figures 22 and 23 show photos of the evaluation process and Tables 3, 4 and 5 present its main results. The effectiveness and efficiency of users in the generation of signs in WikiLIBRAS are presented in Table 3. Table 4 presents these measures per sign and Table 5 presents measures related to the users' degree of satisfaction with WikiLIBRAS.
According to Table 3, considering all users, 81.43% of the signs were generated correctly (81.82% for deaf users and 80.00% for LIBRAS interpreters). The average time to generate each sign was about 87.05 seconds. When we analyze these results per sign (see Table 4), we observe that users had more difficulty generating the TEACHER and PRESIDENT signs. The TEACHER sign was generated correctly by 57.14% of users, whereas the PRESIDENT sign was generated correctly by 75.86% of users. Each of the other signs (LIPS, SHUT UP and UNCLE) was generated correctly by more than 85% of users. These results are consistent with the average time spent by users to generate the signs. Users spent more time generating the TEACHER and PRESIDENT signs (118.72 and 126.33 seconds, respectively), and less time generating the LIPS, SHUT UP and UNCLE signs (87.88, 58.91 and 53.21 seconds, respectively).
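The overall effectiveness figures can be cross-checked with a few lines of arithmetic. The sketch below assumes that every user attempted all five signs, which gives 55 attempts for the eleven deaf users and 15 for the three interpreters. The counts of correct signs (45 and 12) are not reported in the text; they are back-calculated here as the only whole numbers consistent with the reported percentages.

```python
SIGNS_PER_USER = 5  # each user was invited to generate five signs


def success_rate(correct, users, signs_per_user=SIGNS_PER_USER):
    """Percentage of attempted signs that were generated correctly."""
    return 100.0 * correct / (users * signs_per_user)


# Back-calculated counts (assumption, see the lead-in paragraph).
deaf_correct, deaf_users = 45, 11
interp_correct, interp_users = 12, 3

deaf_rate = success_rate(deaf_correct, deaf_users)
interp_rate = success_rate(interp_correct, interp_users)
overall_rate = success_rate(deaf_correct + interp_correct,
                            deaf_users + interp_users)

print(f"deaf users:   {deaf_rate:.2f}%")
print(f"interpreters: {interp_rate:.2f}%")
print(f"all users:    {overall_rate:.2f}%")
```

Under these assumptions the three rates round to 81.82%, 80.00% and 81.43%, matching the values quoted from Table 3.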
According to the users, they had difficulty generating these signs because they did not understand the meaning of some parameters used in the Web interface. A proposal for future work is to include videos with LIBRAS interpreters to help users during navigation.
According to Table 5, with respect to users' satisfaction, the usability of WikiLIBRAS had one of the highest scores (4.77). This result is consistent with the evaluation scores of the handshape (4.69), facial expression (4.92) and location screens of WikiLIBRAS (see Figures 17, 19 and 20).
The palm orientation screen of WikiLIBRAS, however, caused some confusion among users, and had a lower score (3.92).
The naturalness of the generated 3D animations had the lowest score (3.62). This can be explained by the fact that the naturalness of avatar signing is not comparable to that of human signing. As mentioned in previous works [16][19][20], avatar-based approaches are not the first choice for the majority of deaf people, who prefer human signing. One of the reasons for this preference, according to Kipp et al. [19], is the difficulty virtual signing approaches have in representing emotions and less rigid movements. Thus, we believe that more effort must be invested in increasing the flexibility and naturalness of the 3D humanoid avatar.
Finally, the effectiveness and efficiency of users in WikiLIBRAS were compared with the effectiveness and efficiency of 3D designers using an animation tool (e.g., Blender). For this comparison, we invited three experienced 3D designers to animate the same signs (see Table 2) in Blender. All the designers animated all signs correctly, and the average time spent to animate the signs was recorded. Table 6 shows these measures. According to Table 6, although the 3D designers generated all the signs correctly (effectiveness), the average time required to generate each sign was much greater than the time spent by users on WikiLIBRAS. Moreover, the number of deaf people and LIBRAS interpreters is also much greater than the number of 3D designers, and thus it is possible to generate a LIBRAS dictionary using WikiLIBRAS more productively than by generating it manually with 3D designers.

VI. CONCLUSION
This work presents a language for the description of LIBRAS signs. This language allows the parameters that represent a sign to be described and an animation to be generated from these parameters using a 3D avatar. With the proposal, it is possible, among other applications, to develop virtual reality tools for teaching LIBRAS and to build systems for the automatic generation of LIBRAS windows.
To provide a case study for the proposed solution, a human computation system for the collaborative construction of a LIBRAS vocabulary was also developed. The idea is to allow users (e.g., deaf users) to help in the development of a LIBRAS dictionary. Tests with Brazilian deaf users and LIBRAS interpreters were performed to validate the system and to evaluate, among other aspects, its usability, acceptance and interaction time. These tests show that users generated more than 80% of the signs correctly and that the time spent to generate signs was much smaller than the time spent by 3D designers generating the same signs manually in an animation tool.
As future work, we plan to integrate motion capture equipment into WikiLIBRAS, such as Microsoft Kinect (www.xbox.com) and Cybergloves (www.cyberglovesystems.com), to improve its usability and animate the signs in a more natural way. Furthermore, we also plan to extend the proposed description language to allow the creation of virtual environments and multilingual vocabularies.

Figure 1 - Parameters of a LIBRAS sign

Figure 3 - XML Document of LIPS sign

Figure 4 - XML Document of CORRECT sign

Figure 5 -

Figure 6 - Example of Pose library for Location

Figure 7 - XML Document of Descriptor to include a new Handshape

Figure 8 - XML Document of Descriptor to include a new Location

Figure 15 - Excerpt from a Python Script

Figure 16 - Main Screen of WikiLIBRAS

Afterwards, the handshape, palm orientation, location and facial expression phonemes are configured. Figures 17, 18, 19 and 20 illustrate the screenshots of these steps.

Figure 21 - Confirmation Screen