scanit
Procedure

   SCANIT ( Scan a character string )

      SUBROUTINE SCANIT ( STRING, START,  ROOM,
     .                    NMARKS, MARKS,  MRKLEN, PNTERS,
     .                    NTOKNS, IDENT,  BEG,    END     )

Abstract

   This routine serves as an umbrella routine for routines that
   are used to scan a string for recognized and unrecognized
   substrings.

Required_Reading

   None.

Keywords

   PARSE
   SEARCH

Declarations

      IMPLICIT NONE

      CHARACTER*(*)         STRING
      INTEGER               ROOM
      INTEGER               NMARKS
      CHARACTER*(*)         MARKS  ( * )
      INTEGER               MRKLEN ( * )
      INTEGER               PNTERS ( * )
      INTEGER               START
      INTEGER               NTOKNS
      INTEGER               BEG    ( * )
      INTEGER               END    ( * )
      INTEGER               IDENT  ( * )

Brief_I/O

   VARIABLE  I/O  DESCRIPTION
   --------  ---  --------------------------------------------------
   STRING     I   a string to be scanned.
   ROOM       I   space available for located substrings.
   NMARKS    I-O  number of recognizable substrings.
   MARKS     I-O  recognizable substrings.
   MRKLEN    I-O  an auxiliary array describing MARKS.
   PNTERS    I-O  an auxiliary array describing MARKS.
   START     I-O  position from which to commence/resume scanning.
   NTOKNS     O   number of scanned substrings.
   BEG        O   beginnings of scanned substrings.
   END        O   endings of scanned substrings.
   IDENT      O   position of scanned substring within array MARKS.

Detailed_Input

   STRING   is any character string that is to be scanned
            to locate recognized and unrecognized substrings.

   ROOM     is the amount of space available for storing the
            results of scanning the string.

   NMARKS   is the number of marks that will be recognized
            substrings of STRING.

   MARKS    is an array of marks that will be recognized
            by the scanning routine. The array must be
            processed by a call to SCANPR before it can
            be used by SCAN. Further details are given
            in documentation for the individual entry points.

   MRKLEN   is an auxiliary array populated by SCANPR
            for use by SCAN. It should be declared with
            length equal to the length of MARKS.

   PNTERS   is an auxiliary array populated by SCANPR
            for use by SCAN. It should be declared in the
            calling program as

               INTEGER  PNTERS ( RCHARS )

            RCHARS is given by the expression

               MAX - MIN + 5

            where

               MAX is the maximum value of ICHAR(MARKS(I)(1:1))
                   over the range I = 1, NMARKS

               MIN is the minimum value of ICHAR(MARKS(I)(1:1))
                   over the range I = 1, NMARKS

            Further details are provided in the entry point
            SCANPR.

   START    is the position in the STRING from which scanning
            should commence.

Detailed_Output

   NMARKS   is the number of marks in the array MARKS after it
            has been prepared by SCANPR.

   MARKS    is an array of recognizable substrings that has
            been prepared for SCAN by SCANPR. Note that MARKS
            will be sorted in increasing order.

   MRKLEN   is an auxiliary array, populated by SCANPR
            for use by SCAN.

   PNTERS   is an auxiliary array, populated by a call to
            SCANPR and intended for use by SCAN.

   START    is the position from which scanning should continue
            in order to fully scan STRING (if sufficient memory
            was not provided in BEG, END, and IDENT on the
            current call to SCAN).

   NTOKNS   is the number of substrings identified in the current
            scan of STRING.

   BEG      beginnings of scanned substrings.
            This should be declared so that it is at least
            as large as ROOM.

   END      endings of scanned substrings.
            This should be declared so that it is at least
            as large as ROOM.

   IDENT    positions of scanned substrings within array MARKS.
            If the substring STRING(BEG(I):END(I)) is not in the
            list of MARKS then IDENT(I) will have the value 0.
            This should be declared so that it is at least
            as large as ROOM.

Parameters

   None.

Exceptions

   1)  If this routine is called directly, the error
       SPICE(BOGUSENTRY) is signaled.

Files

   None.

Particulars

   This routine serves as an umbrella routine for the two entry
   points SCANPR and SCAN. It can be used to locate keywords
   or delimited substrings within a string.

   The process of breaking a string into those substrings that
   have recognizable meaning is called "scanning." The substrings
   identified by the scanning process are called "tokens."
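
   The two ideas above, the PNTERS sizing rule (MAX - MIN + 5) and the
   token description returned in IDENT, BEG, and END, can be illustrated
   with a short stand-alone sketch. It is written in Python purely for
   illustration and is not the SPICELIB implementation. The names
   pnters_length and scan are hypothetical, and preferring the longest
   matching mark (so that '**' wins over '*') is an assumption of this
   sketch rather than documented behavior of SCAN.

```python
def pnters_length(marks):
    """Length for PNTERS by the documented rule MAX - MIN + 5, where
    MAX and MIN are taken over the character codes of the first
    character of each mark."""
    first_codes = [ord(mark[0]) for mark in marks]
    return max(first_codes) - min(first_codes) + 5


def scan(string, marks):
    """Split `string` into tokens (illustrative only, not SPICELIB).

    Returns a list of (ident, beg, end) triples using 1-based,
    inclusive positions as in Fortran. `ident` is the 1-based position
    of the token within `marks`, or 0 for an unrecognized substring.
    """
    # Assumption of this sketch: try longer marks first.
    order = sorted(range(len(marks)), key=lambda i: -len(marks[i]))
    tokens = []
    pos = 0
    unrecognized_start = None

    while pos < len(string):
        hit = next(
            (i for i in order if string.startswith(marks[i], pos)), None
        )
        if hit is None:
            # Extend (or open) the current unrecognized substring.
            if unrecognized_start is None:
                unrecognized_start = pos
            pos += 1
            continue
        # A mark begins here: close any pending unrecognized substring.
        if unrecognized_start is not None:
            tokens.append((0, unrecognized_start + 1, pos))
            unrecognized_start = None
        tokens.append((hit + 1, pos + 1, pos + len(marks[hit])))
        pos += len(marks[hit])

    if unrecognized_start is not None:
        tokens.append((0, unrecognized_start + 1, len(string)))
    return tokens


if __name__ == "__main__":
    # Marks listed in increasing order, as they would be after sorting.
    marks = [" ", "(", ")", "*", "**", "+", "-", "/"]
    print(pnters_length(marks))
    for ident, beg, end in scan("X**2 + Y", marks):
        print(ident, beg, end, repr("X**2 + Y"[beg - 1:end]))
```

   For marks whose first characters run from ' ' (code 32) to '/'
   (code 47), pnters_length returns 47 - 32 + 5 = 20. Tokens with
   ident equal to 0 are the unrecognized substrings.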

   Scanning has many applications including:

   -- the parsing of algebraic expressions

   -- parsing calendar dates

   -- processing text with embedded directions for displaying
      the text

   -- interpretation of command languages

   -- compilation of programming languages

   This routine simplifies the process of scanning a string for
   its tokens.

Examples

   Example 1.
   ----------

   Suppose you need to identify all of the words within a string
   and wish to ignore punctuation marks such as ',', ':', ';', ' ',
   '---'.

   The first step is to load the array of marks as shown here:

C
C     The minimum ASCII code for the first character of a marker
C     is 32 ( for ' ' ).
C
      INTEGER               FCHAR
      PARAMETER           ( FCHAR = 32 )

C
C     The maximum ASCII code for the first character of a marker
C     is 59 ( for ';' ).
C
      INTEGER               LCHAR
      PARAMETER           ( LCHAR = 59 )

      INTEGER               RCHAR
      PARAMETER           ( RCHAR = LCHAR - FCHAR + 5 )

      LOGICAL               FIRST
      CHARACTER*(3)         MARKS  ( 5 )
      INTEGER               NMARKS
      INTEGER               MRKLEN ( 5 )
      INTEGER               PNTERS ( RCHAR )

      INTEGER               ROOM
      PARAMETER           ( ROOM = 50 )

      INTEGER               BEG    ( ROOM )
      INTEGER               END    ( ROOM )
      INTEGER               IDENT  ( ROOM )

      SAVE                  FIRST
      SAVE                  MARKS
      SAVE                  MRKLEN
      SAVE                  PNTERS

      DATA                  FIRST / .TRUE. /

      IF ( FIRST ) THEN

         FIRST    = .FALSE.

         MARKS(1) = ' '
         MARKS(2) = '---'
         MARKS(3) = ':'
         MARKS(4) = ','
         MARKS(5) = ';'

         NMARKS   = 5

         CALL SCANPR ( NMARKS, MARKS, MRKLEN, PNTERS )

      END IF

   Notice that the call to SCANPR is nested inside an
   IF ( FIRST ) THEN ... END IF block. In this and many other
   applications the marks that will be used in the scan are fixed.
   Since the marks do not change, you need to process MARKS and set
   up the auxiliary arrays MRKLEN and PNTERS only once (assuming
   that you SAVE the appropriate variables as has been done above).
   In this way, if the code is executed many times, only a small
   overhead is required to prepare the data so that it can be used
   efficiently in scanning.

   To identify the substrings that represent words we scan the
   string, starting at its first character, using the prepared
   MARKS, MRKLEN and PNTERS.

      START = 1

      CALL SCAN ( STRING, MARKS,  MRKLEN, PNTERS, ROOM,
     .            START,  NTOKNS, IDENT,  BEG,    END   )

   To isolate only the words of the string, we examine the array
   IDENT and keep only those beginnings and endings for which the
   corresponding IDENT value is non-positive. (IDENT is 0 for a
   substring that is not one of the MARKS.)

      KEPT = 0

      DO I = 1, NTOKNS

         IF ( IDENT(I) .LE. 0 ) THEN

            KEPT      = KEPT + 1
            BEG(KEPT) = BEG(I)
            END(KEPT) = END(I)

         END IF

      END DO

   Example 2.
   ----------

   To parse an algebraic expression such as

      ( X + Y ) * ( 2*Z + SIN(W) ) ** 2

   you would select '**', '*', '/', '+', '-', '(', ')' and ' '
   to be the markers. Note that all of these begin with one of
   the characters in the string ' !"#$%&''()*+,-./'
   so that we can declare PNTERS to have length 20.

   Prepare the MARKS, MRKLEN, and PNTERS.

      LOGICAL               FIRST
      CHARACTER*(4)         MARKS  ( 8 )
      INTEGER               NMARKS
      INTEGER               MRKLEN ( 8 )
      INTEGER               PNTERS ( 20 )
      INTEGER               BLANK
      INTEGER               BSRCHC

      SAVE                  FIRST
      SAVE                  MARKS
      SAVE                  MRKLEN
      SAVE                  PNTERS
      SAVE                  BLANK

      DATA                  FIRST / .TRUE. /

      IF ( FIRST ) THEN

         FIRST    = .FALSE.

         MARKS(1) = '('
         MARKS(2) = ')'
         MARKS(3) = '+'
         MARKS(4) = '-'
         MARKS(5) = '*'
         MARKS(6) = '/'
         MARKS(7) = '**'
         MARKS(8) = ' '

         NMARKS   = 8

         CALL SCANPR ( NMARKS, MARKS, MRKLEN, PNTERS )

C
C        Locate the blank character in MARKS once it has
C        been prepared.
C
         BLANK = BSRCHC ( ' ', NMARKS, MARKS )

      END IF

   Once all of the initializations are out of the way,
   we can scan an input string from its first character.

      START = 1

      CALL SCAN ( STRING, MARKS,  MRKLEN, PNTERS, ROOM,
     .            START,  NTOKNS, IDENT,  BEG,    END   )

   Next eliminate any white space that was returned in the
   list of tokens.

      KEPT = 0

      DO I = 1, NTOKNS

         IF ( IDENT(I) .NE. BLANK ) THEN

            KEPT        = KEPT + 1
            BEG  (KEPT) = BEG  (I)
            END  (KEPT) = END  (I)
            IDENT(KEPT) = IDENT(I)

         END IF

      END DO

   Now all of the remaining substrings point to grouping symbols,
   operators, functions, or variables. Given that the individual
   "words" of the expression are now in hand, the meaning of the
   expression is much easier to determine.

   The rest of the routine is left as a non-trivial exercise
   for the reader.

Restrictions

   1)  The arrays MARKS, MRKLEN, and PNTERS must be properly
       formatted prior to calling SCAN. This is accomplished by
       calling SCANPR.

Literature_References

   None.

Author_and_Institution

   J. Diaz del Rio    (ODC Space)
   W.L. Taber         (JPL)

Version

   SPICELIB Version 1.1.0, 26-OCT-2021 (JDR)

      Added IMPLICIT NONE statement.

      Edited the header to comply with NAIF standard.

   SPICELIB Version 1.0.0, 26-JUL-1996 (WLT)

Fri Dec 31 18:36:45 2021