用lisp来让计算机学会写作

        大部分的代码、思路参考了《Ansi Common Lisp》P138~P141。

问题:给一篇英文文本,如何让计算机依据此文本而生成随机但可读的文本。如:

|Venture|

 The National Venture Capital Association estimates that wealth associated with a deal a big spending by regulations that will spend one another's main reason these projects . 

这是计算机学习了Paul Graham的一些文章后生成的随机文本。它根据Venture这个单词向两边延伸成一个句子。令人惊喜的是,文本常常是可读的。


  算法:记录每个单词后面出现的单词以及出现的次数,如I leave在原文中出现了5次,I want出现了3次,除此之外,其它地方没有出现过I,所以在生成随机文章的时候,当遇到I,有5/8的概率选择leave为下一个单词。假如选择了leave的话,则看看leave后面出现过哪些单词,重复以上过程。


现用lisp来解决问题。

lisp里的符号类型,即symbol,可以很好记录各种字符串还有标点符号,所以采用它来记录。采用内附的hashtable来建立列表:

        (defparameter *words* (make-hash-table :size 10000))

那如何建立列表呢?

 

(let ((prev '|.|))

  (defun see (sym)

    (let ((pair (assoc sym (gethash prev *words*))))

      (if pair

	  (incf (cdr pair))

	  (push (cons sym 1) (gethash prev *words*))))

    (setf prev sym)))


以当前单词为keyword,以assoc-list关系列表为该keyword下的值。

 

        如I下有( (|leave| . 5) (|want| . 3) )。没有单词word的话,则push入(word . 1)。

        如何随机选一个词呢?

 

(defun random-word (word ht)

  (let* ((choices (gethash word ht))

	 (x (random (reduce #'+ choices :key #'cdr))))

    (dolist (pair choices)

      (decf x (cdr pair))

      (if (minusp x)

	  (return (car pair))))))

 

        这里巧妙用了reduce函数。


        现在再来思考,如何将给定一个词向两侧延伸成一句话呢?

1)先将文本反向,得到一个反向的列表,也即I leave,I want变成leave I,want I。

2)将hashtable反向,得到另外一个hashtable,以后一个单词为关键字,前面可能出现的单词及次数构成assoc-list。

3)碰运气,从一个标点开始延续文章,直到出现给定单词为止。

        我用了第二个方法:

 

(defparameter *r-words* (make-hash-table :size 10000))



(defun push-words (w1 w2 n)

  (push (cons w2 n) (gethash w1 *r-words*)))



(defun get-reversed-words ();a cat -> cat a

  (maphash #'(lambda (k lst)

	       (dolist (pair lst)

		 (push-words (car pair) k (cdr pair))))

	   *words*))


         遍历原来的hashtable,再把每一对单词先后换个位置插入另外一个hashtable。

 

给出双向延伸句子的自动生成文本代码:

 

(defparameter *words* (make-hash-table :size 10000))

(defconstant maxword 100)

(defparameter nwords 0)

(defconstant debug nil)

(let ((prev '|.|))

  (defun see (sym)

    (incf nwords)

    (let ((pair (assoc sym (gethash prev *words*))))

      (if pair

	  (incf (cdr pair))

	  (push (cons sym 1) (gethash prev *words*))))

    (setf prev sym)))



(defun check-punc (c);char to symbol

  (case c

    (#\. '|.|) (#\, '|,|)

    (#\; '|;|) (#\? '|?|)

    (#\: '|:|) (#\! '|!|)))



(defun read-text (pathname)

  (with-open-file (str pathname :direction :input)

    (let ((buf (make-string maxword))

	  (pos 0))

      (do ((c (read-char str nil 'eof)

	      (read-char str nil 'eof)))

	  ((eql c 'eof))

	(if (or (alpha-char-p c)

		(eql c #\'))

	    (progn

	      (setf (char buf pos) c)

	      (incf pos))

	    (progn

	      (unless (zerop pos)

		(see (intern (subseq buf 0 pos)))

		(setf pos 0))

	      (let ((punc (check-punc c)))

		(if punc

		    (see punc)))))))))



(defun print-ht (ht)

  (maphash #'(lambda (k v)

		(format t "~A ~A~%" k v))

	     ht))



(defparameter *r-words* (make-hash-table :size 10000))



(defun push-words (w1 w2 n)

  (push (cons w2 n) (gethash w1 *r-words*)))



(defun get-reversed-words ();a cat -> cat a

  (maphash #'(lambda (k lst)

	       (dolist (pair lst)

		 (push-words (car pair) k (cdr pair))))

	   *words*))



(defun print-a-word (word ht)

  (maphash #'(lambda (k lst)

	       (if (eql k word)

		   (format t "~A ~A~%" k lst)))

	   ht))



(if debug

    (print-a-word '|leave| *r-words*))



(defun punc-p (sym);symbol to char,nil when fails.

  (check-punc (char (symbol-name sym) 0)))



(defun random-word (word ht)

  (let* ((choices (gethash word ht))

	 (x (random (reduce #'+ choices :key #'cdr))))

    (dolist (pair choices)

      (decf x (cdr pair))

      (if (minusp x)

	  (return (car pair))))))



(defun gen-former (word str)

  (let ((last (random-word word *r-words*)))

    (if (not (punc-p last))

	(progn 

	  (gen-former last str)

	  (format str "~A " last)))))



(defun gen-latter (word str)

  (let ((next (random-word word *words*)))

    (format str "~A " next)

    (if (not (punc-p next))

        (gen-latter next str))))



;(gen-latter '|leave| t)



(defun get-a-word (ht);get a random word

  (let ((x (random nwords)))

    (maphash #'(lambda (k v)

		 (dolist (pair v)

		   (decf x (cdr pair))

		   (if (minusp x)

		       (return-from get-a-word (car pair)))))

	     ht)))

;(get-a-word *words*)

(defun gen-sentence (word str)

  (gen-former word str)

  (format str "~A " word)

  (gen-latter word str))



(defun test ()

  (setf nwords 0)

  (read-text "essay.txt")

  (get-reversed-words)

  (let ((word (get-a-word *words*)))

    (print word)

    (gen-sentence word t)))

(test)


     文本语料库、lisp源代码见: Here

 

 

你可能感兴趣的:(lisp)