Contents

Common-Lisp strings

你应该知道,string 在 common lisp 中它既是arrays 也是 sequences. 也就是说,arrays 和 sequences的操作都可以应用在string上。如果你找不到某个string特有的函数,你应该去找一找arrays 和 sequences的函数。

还有一些额外的libraries 托管在 quicklisp上,这里只给出英文介绍 ASDF3, which is included with almost all Common Lisp implementations, includes Utilities for Implementation- and OS- Portability (UIOP), which defines functions to work on strings (strcat, string-prefix-p, string-enclosed-p, first-char, last-char, split-string, stripln). Some external libraries available on Quicklisp bring some more functionality or some shorter ways to do.

  • str defines trim, words, unwords, lines, unlines, concat, split, shorten, repeat, replace-all, starts-with?, ends-with?, blankp, emptyp, …
  • Serapeum is a large set of utilities with many string manipulation functions.
  • cl-change-case has functions to convert strings between camelCase, param-case, snake_case and more. They are also included into str.
  • mk-string-metrics has functions to calculate various string metrics efficiently (Damerau-Levenshtein, Hamming, Jaro, Jaro-Winkler, Levenshtein, etc),
  • and cl-ppcre can come in handy, for example ppcre:replace-regexp-all. See the regexp section.

Last but not least, when you’ll need to tackle the format construct, don’t miss the following resources:

the official CLHS documentation

  • a quick reference
  • a CLHS summary on HexstreamSoft
  • plus a Slime tip: type C-c C-d ~ plus a letter of a format directive to open up its documentation. Again more useful with ivy-mode or helm-mode.

创建 字符串

最简单的,我们可以使用双引号创建string.但是其实我们还有别的方法:

  • 使用format nil
    1
    2
    
    (defparameter person "you")
    (format nil "hello ~a" person) ;; => "hello you"
    
  • make-string count 创建指定长度的字符串。 :initial-element 字符会被重复count次
    1
    
    (make-string 3 :initial-element #\♥) ;; => "♥♥♥"
    

访问子串

string 是一个sequence,你可以使用subseq 来访问它的子串 先给出一个比较易懂的签名

1
(subseq my-string start end)

这里是调用

1
2
3
4
5
6
7
8
* (defparameter *my-string* (string "Groucho Marx"))
*MY-STRING*
* (subseq *my-string* 8)
"Marx"
* (subseq *my-string* 0 7)
"Groucho"
* (subseq *my-string* 1 5)
"rouc"

也可以像序列那样用setf 和 subseq 配合来操作字符串

1
2
3
4
5
6
7
8
* (defparameter *my-string* (string "Harpo Marx"))
*MY-STRING*
* (subseq *my-string* 0 5)
"Harpo"
* (setf (subseq *my-string* 0 5) "Chico")
"Chico"
* *my-string*
"Chico Marx"
string isn`t stretchable

字符串的长度是不可变的,如果新的子串的长度和原始子串的长度不同,短的那一个将决定多少个字符将被替换,

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
* (defparameter *my-string* (string "Karl Marx"))
*MY-STRING*
* (subseq *my-string* 0 4)
"Karl"
* (setf (subseq *my-string* 0 4) "Harpo")
"Harpo"
* *my-string*
"Harp Marx"
* (subseq *my-string* 4)
" Marx"
* (setf (subseq *my-string* 4) "o Marx")
"o Marx"
* *my-string*
"Harpo Mar"

访问单个字符

char函数专门用来访问字符串中的单个字符,char也可以和setf配合使用

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
(defparameter *my-string* (string "Groucho Marx"))
*MY-STRING*
(char *my-string* 11)
#\x
(char *my-string* 7)
#\Space
(char *my-string* 6)
#\o
(setf (char *my-string* 6) #\y)
#\y
*my-string*
"Grouchy Marx"

还有一个schar也可以做到同样的事情,但是在特定情况下,schar会更快一些 因为strings 既是 arrays 也是 sequence. 你也可以用更加通用的aref 和 elt (但是char的效率会更高)

1
2
3
4
5
6
(defparameter *my-string* (string "Groucho Marx"))
*MY-STRING*
(aref *my-string* 3)
#\u
(elt *my-string* 8)
#\M

从string中删除和替换

可以使用 sequence的函数来对string中的子串进行删除和替换操作

  • 从string中删除一个字符
1
2
3
4
5
6
7
8
(remove #\o "Harpo Marx")
"Harp Marx"
(remove #\a "Harpo Marx")
"Hrpo Mrx"
(remove #\a "Harpo Marx" :start 2)
"Harpo Mrx"
(remove-if #'upper-case-p "Harpo Marx")
"arpo arx"
  • 使用substitute(non destructive) 或者 replace (destructive) 来替换一个字符
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
(substitute #\u #\o "Groucho Marx")
"Gruuchu Marx"
(substitute-if #\_ #'upper-case-p "Groucho Marx")
"_roucho _arx"
(defparameter *my-string* (string "Zeppo Marx"))
*MY-STRING*
(replace *my-string* "Harpo" :end1 5)
"Harpo Marx"
*my-string*
"Harpo Marx"

拼接字符串 (Concatenating string)

concatenate 是sequence的通用函数,在对string进行操作时,应该指定返回值的类型

1
2
3
4
(concatenate 'string "karl" " " "Marx")
;; => "Karl Marx"
(concatenate 'list "Karl" " " "Marx")
;; => (#\K #\a #\r #\l #\Space #\M #\a #\r #\x)

使用UIOP库的话,可以用strcat:

1
(uiop:strcat "karl" " " marx")

或者是str library 使用concat:

1
(str:concat "foo" "bar")

一次操作一个字符

使用Map函数一次操作一个字符

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
(defparameter *my-string* (string "Groucho Marx"))
*MY-STRING*
(map 'string #'(lambda (c) (print c)) *my-string*)
#\G
#\r
#\o
#\u
#\c
#\h
#\o
#\Space
#\M
#\a
#\r
#\x
"Groucho Marx"

或者使用loop 函数

1
2
3
(loop for char across "Zeppo"
      collect char)
(#\Z #\e #\p #\p #\o)

根据word 或 character翻转string

使用reverse (或者destructive 版的 nreverse) 来根据character反转字符串

1
2
3
4
(defparameter *my-string* (string "DSL"))
*MY-STRING*
(reverse *my-string*)
"LSD"

在CL中 没有直接根据word反转字符串的函数,你可以使用第三方库 比如SPLIT-SEQUENCE 或者你自己实现一套解决方案 我们可以使用str库

1
2
3
4
5
6
(defparameter *singing* "singing in the rain")
*SINGING*
(str:words *SINGING*)
;; => ("singing" "in" "the" "rain")
(str:unwords (reverse (str:words *singing*)))
;; => "rain the in singing"

Breaking strings into graphenes,sentences,lines and words

These functions use SBCL’s sb-unicode: they are SBCL specific.

  • sb-unicode:sentences 将string 以段落切割,根据他默认的段落分割规则
  • sb-unicode:lines 将string 分割成行(长度不会超过:margin 指定的参数 默认80)
1
2
3
4
5
6
(sb-unicode:lines "A first sentence. A second somewhat long one." :margin 10)
;; => ("A first"
;; "sentence."
;; "A second"
;; "somewhat"
;; "long one.")
  • sb-unicode:words 和 sb-unicode:graphenes 可以自己去查看
确保运行在sbcl中
1
2
3
4
#+sbcl
(runs on sbcl)
#-sbcl
(runs on other implementations)

Controlling Case 控制大小写

Common lisp 提供了大量的函数来控制字符串的大小写

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
(string-upcase "cool")
;; => "COOL"
(string-upcase "Cool")
;; => "COOL"
(string-downcase "COOL")
;; => "cool"
(string-downcase "Cool")
;; => "cool"
(string-capitalize "cool")
;; => "Cool"
(string-capitalize "cool example")
;; => "Cool Example"

这些函数可以接受:start 和 :key 所以你可以只对字符串的指定部分进行操作。 这些函数也有destructive的版本都以n开头

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
(string-capitalize "cool example" :start 5)
;; => "cool Example"
(string-capitalize "cool example" :end 5)
;; => "Cool example"
(defparameter *my-string* (string "BIG"))
;; => *MY-STRING*
(defparameter *my-downcase-string* (nstring-downcase *my-string*))
;; => *MY-DOWNCASE-STRING*
*my-downcase-string*
;; => "big"
*my-string*
;; => "big"
warning
对于 string-upcase,string-downcase 和 string-capitalize,string 是没有被修改的。但是如果在string中没有任何字符需要转换,那么返回值有可能是源string 或者 源string的副本
tips
在CL中 n开头的函数一般是destructive的

使用format函数控制

  • To lower case:
    1
    2
    
    (format t "~(~a~)" "HELLO WORLD")
    ;; => hello world
    
  • Capitalize every word:
    1
    2
    
    (format t "~:(~a~)" "HELLO WORLD")
    ;; => Hello World
    
  • Capitalize the first word:
    1
    2
    
    (format t "~@(~a~)" "hello world")
    ;; => Hello world
    
  • To upper case
    1
    2
    
    (format t "~@:(~a~)" "hello world")
    ;; => HELLO WORLD
    

将字符串左右的空格截掉

其实不单单可以截掉空格,还可以丢弃一些不需要的字符。string-trim,string-left-trim,string-right-trim 返回一个子串,子串不包含第一个参数中的字符。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
(string-trim " " " trim me ")
;; => "trim me"
(string-trim " et" " trim me ")
;; => "rim m"
(string-left-trim " et" " trim me ")
;; => "rim me "
(string-right-trim " et" " trim me ")
;; => " trim m"
(string-right-trim '(#\Space #\e #\t) " trim me ")
;; = >" trim m"
(string-right-trim '(#\Space #\e #\t #\m) " trim me ")

在symbol 和 字符串之间转换

  • intern 将string转化成symbol
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    
    (in-package "COMMON-LISP-USER")
    ;; => #<The COMMON-LISP-USER package, 35/44 internal, 0/9 external>
    (intern "MY-SYMBOL")
    ;; => MY-SYMBOL
    (intern "MY-SYMBOL")
    ;; => MY-SYMBOL
    ;; =>:INTERNAL
    (export 'MY-SYMBOL)
    ;; => T
    (intern "MY-SYMBOL")
    ;; => MY-SYMBOL
    ;; => :EXTERNAL
    (intern "My-Symbol")
    ;; => |My-Symbol|
    ;; => NIL
    (intern "MY-SYMBOL" "KEYWORD")
    ;; => :MY-SYMBOL
    ;; => NIL
    (intern "MY-SYMBOL" "KEYWORD")
    ;; => :MY-SYMBOL
    ;; => :EXTERNAL
    
  • symbol-name 和 string 将symbol 转换成 string
    1
    2
    3
    4
    5
    6
    7
    8
    
    (symbol-name 'MY-SYMBOL)
    ;; => "MY-SYMBOL"
    (symbol-name 'my-symbol)
    ;; => "MY-SYMBOL"
    (symbol-name '|my-symbol|)
    ;; => "my-symbol"
    (string 'howdy)
    ;; => "HOWDY"
    

在string 和 character之间转换

  • coerce 将string(长度为1)转换成character.
    1
    2
    3
    4
    
    (coerce "a" 'character)
    ;; => #\a
    (coerce (subseq "cool" 2 3) 'character)
    ;; => #\o
    
  • coerce 将字符串转换中字符list
    1
    2
    
    (coerce "cool" 'list)
    ;; => (#\c #\o #\o #\l)
    
  • coerce 将字符list转换成string
    1
    2
    
    (coerce '(#\h #\e #\y) 'string)
    ;; => "hey"
    
  • coerce 将array 转换成string
    1
    2
    3
    4
    5
    6
    
    (defparameter *my-array* (make-array 5 :initial-element #\x))
    ;; => *MY-ARRAY*
    *my-array*
    ;; => #(#\x #\x #\x #\x #\x)
    (coerce *my-array* 'string)
    ;; => "xxxxx"
    

在string中寻找一个元素

使用find,position 和他们的-if后缀的函数 查找string中的character

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
(find #\t "The Hyperspec contains approximately 110,000 hyperlinks." :test #'equal)
;; => #\t
(find #\t "The Hyperspec contains approximately 110,000 hyperlinks." :test #'equalp)
;; => #\T
(find #\z "The Hyperspec contains approximately 110,000 hyperlinks." :test #'equalp)
;; => NIL
(find-if #'digit-char-p "The Hyperspec contains approximately 110,000 hyperlinks.")
;;=> #\1
(find-if #'digit-char-p "The Hyperspec contains approximately 110,000 hyperlinks." :from-end t)
;; => #\0
(position #\t "The Hyperspec contains approximately 110,000 hyperlinks." :test #'equal)
;; => 17
(position #\t "The Hyperspec contains approximately 110,000 hyperlinks." :test #'equalp)
;; => 0
(position-if #'digit-char-p "The Hyperspec contains approximately 110,000 hyperlinks.")
;; => 37
(position-if #'digit-char-p "The Hyperspec contains approximately 110,000 hyperlinks." :from-end t)
;; => 43

使用count族函数计算字符在字符串中出现的次数

1
2
3
4
5
6
7
8
(count #\t "The Hyperspec contains approximately 110,000 hyperlinks." :test #'equal)
;; => 2
(count #\t "The Hyperspec contains approximately 110,000 hyperlinks." :test #'equalp)
;; => 3
(count-if #'digit-char-p "The Hyperspec contains approximately 110,000 hyperlinks.")
;; => 6
(count-if #'digit-char-p "The Hyperspec contains approximately 110,000 hyperlinks." :start 38)
;; => 5

在字符串中查找一个子串

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
(search "we" "If we can't be free we can at least be cheap")
;; => 3
(search "we" "If we can't be free we can at least be cheap" :from-end t)
;; => 20
(search "we" "If we can't be free we can at least be cheap" :start2 4)
;; => 20
(search "we" "If we can't be free we can at least be cheap" :end2 5 :from-end t)
;; => 3
(search "FREE" "If we can't be free we can at least be cheap")
;; => NIL
(search "FREE" "If we can't be free we can at least be cheap" :test #'char-equal)
;; => 15

将string 转换成number

  • to integer 会返回两个值,一个是被转换后的值,另一个是转换停止的位置
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    
    (parse-integer "42")
    ;; => 42
    ;; => 2
    (parse-integer "42" :start 1)
    ;; => 2
    ;; => 2
    (parse-integer "42" :end 1)
    ;; => 4
    ;; => 1
    (parse-integer "42" :radix 8)
    ;; => 34
    ;; =>2
    (parse-integer " 42 ")
    ;; => 42
    ;; => 3
    (parse-integer " 42 is forty-two" :junk-allowed t)
    ;; => 42
    ;; => 3
    (parse-integer " 42 is forty-two")
    
    Error in function PARSE-INTEGER:
    There's junk in this string: " 42 is forty-two".
    
  • 转换成任意number: read-from-string
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    
    (read-from-string "#X23")
    ;; => 35,4
    (read-from-string "4.5")
    ;; => 4.5,3
    (read-from-string "6/8")
    ;; => 3/4,3
    (read-from-string "#C(6/8 1)")
    ;; => #C(3/4 1),9
    (read-from-string "1.2e2")
    ;; => 120.00001,5
    (read-from-string "symbol")
    ;; SYMBOL.6
    (defparameter *foo* 42)
    ;; => *FOO*
    (read-from-string "#.(setq *foo* \"gotcha\")")
    ;; => "gotcha",23
    *foo*
    ;; => "gotcha"
    

转换成float

parse-float 库提供转换成float的函数

1
2
3
(ql:quickload "parse-float")
(parse-float:parse-float "1.2e2")
;; => 120.00001,5

number 转 string

1
2
3
4
5
6
7
8
(write-to-string 250)
;; => "250"
(write-to-string 250.02)
;; => "250.02"
(write-to-string 250 :base 5)
;; => "2000"
(write-to-string (/ 1 3))
;; => "1/3"

字符串比较

equal 和 equalp 都可以比较两个字符串是否相同,但是equal是大小写敏感的,而equalp不是。还有一些string专用的函数。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
(string= "Marx" "Marx")
;; => T
(string= "Marx" "marx")
;; => NIL
(string-equal "Marx" "marx")
;; => T
(string< "Groucho" "Zeppo")
;; => 0
(string< "groucho" "Zeppo")
;; => NIL
(string-lessp "groucho" "Zeppo")
;; => 0
(mismatch "Harpo Marx" "Zeppo Marx" :from-end t :test #'char=)
;; => 3

String formatting

see https://lispcookbook.github.io/cl-cookbook/strings.html#string-formatting

捕获哪些东西被打印进了stream

在(with-output-to-string (mystream) …) 中任何打印进stream中的内容都会被捕获

1
2
3
4
5
6
7
8
9
(defun greet (name &key (stream t))
   ;; by default, print to standard output.
   (format stream "hello ~a" name))

(let ((output (with-output-to-string (stream)
                (greet "you" :stream stream))))
   (format t "Output is: '~a'. It is indeed a ~a, aka a string.~&" output (type-of output)))
;; Output is: 'hello you'. It is indeed a (SIMPLE-ARRAY CHARACTER (9)), aka a string.
;; NIL

删除标点符号

使用(str:remove-punctuation s) 或者 (str:no-case s)

1
2
3
4
5
(str:remove-punctuation "HEY! What's up ??")
;; "HEY What s up"

(str:no-case "HEY! What's up ??")
;; "hey what s up"