Let's say I want to tokenize a string of words (symbols) and numbers separated by whitespace. For example, tokenizing "aa 11" should give [tkSym("aa"), tkNum(11)].
My first attempt was the code below:
whitespace --> [Ws], { code_type(Ws, space) }, whitespace.
whitespace --> [].
letter(Let)     --> [Let], { code_type(Let, alpha) }.
symbol([Sym|T]) --> letter(Sym), symbol(T).
symbol(Sym)     --> letter(Sym).
digit(Dg)        --> [Dg], { code_type(Dg, digit) }.
digits([Dg|Dgs]) --> digit(Dg), digits(Dgs).
digits([Dg])     --> digit(Dg).
token(tkSym(Token)) --> symbol(Token). 
token(tkNum(Token)) --> digits(Digits), { number_codes(Token, Digits) }.
tokenize([Token|Tokens]) --> whitespace, token(Token), tokenize(Tokens).
tokenize([]) --> whitespace, [].  
Calling tokenize on "aa bb" gives me several answers on backtracking:
 ?- tokenize(X, "aa bb", []).
 X = [tkSym([97|97]), tkSym([98|98])] ;
 X = [tkSym([97|97]), tkSym(98), tkSym(98)] ;
 X = [tkSym(97), tkSym(97), tkSym([98|98])] ;
 X = [tkSym(97), tkSym(97), tkSym(98), tkSym(98)] ;
 false.
In this case, however, it seems appropriate to expect only one correct answer. Here's another, more deterministic approach:
whitespace --> [Space], { char_type(Space, space) }, whitespace.
whitespace --> [].
symbol([Sym|T]) --> letter(Sym), !, symbol(T).
symbol([])      --> [].
letter(Let)     --> [Let], { code_type(Let, alpha) }.
% similarly for numbers
token(tkSym(Token)) --> symbol(Token).
tokenize([Token|Tokens]) --> whitespace, token(Token), !, tokenize(Tokens).
tokenize([]) --> whitespace, [].
But there is a problem: although token called on "aa" now gives a single answer that is a proper list, the tokenize predicate ends up in infinite recursion:
 ?- token(X, "aa", []).
 X = tkSym([97, 97]).
 ?- tokenize(X, "aa", []).
 ERROR: Out of global stack
What am I missing? How is the problem usually solved in Prolog?
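For what it's worth, here is a minimal sketch of one possible way around both problems, not necessarily the canonical one. In the second attempt, symbol([]) --> [] succeeds on empty input, so the first tokenize clause can always find one more (empty) token and never reaches the base case. The sketch makes every token consume at least one code and matches runs greedily. The helper names letters_rest//1 and digits_rest//1 are invented for this sketch, and it assumes the input is a list of character codes. Digits are tried before letters so the result does not depend on whether a given system's alpha class also matches digits.

```prolog
% Skip any amount of whitespace, committing to the longest run.
whitespace --> [C], { code_type(C, space) }, !, whitespace.
whitespace --> [].

% A symbol is one letter followed by a greedy run of further letters,
% so it can never succeed without consuming input.
letters([L|Ls])      --> letter(L), letters_rest(Ls).
letters_rest([L|Ls]) --> letter(L), !, letters_rest(Ls).
letters_rest([])     --> [].
letter(L)            --> [L], { code_type(L, alpha) }.

% Same pattern for numbers: at least one digit, then a greedy run.
digits([D|Ds])      --> digit(D), digits_rest(Ds).
digits_rest([D|Ds]) --> digit(D), !, digits_rest(Ds).
digits_rest([])     --> [].
digit(D)            --> [D], { code_type(D, digit) }.

% Digits are tried first (see the note above about alpha).
token(tkNum(N))  --> digits(Ds), { number_codes(N, Ds) }.
token(tkSym(Cs)) --> letters(Cs).

% Commit to each token once it has been read; the base case only
% applies when no further token can be read.
tokenize([T|Ts]) --> whitespace, token(T), !, tokenize(Ts).
tokenize([])     --> whitespace.
```

It can be called through phrase/2 on a code list, for example:

```prolog
?- atom_codes('aa 11', Cs), phrase(tokenize(Ts), Cs).
```

The cuts make the tokenizer commit to the first (longest) reading, which is what gives a single answer here; the price is that it can no longer enumerate alternative tokenizations.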
via Chebli Mohamed
 