Common-Lisp strings

2022-10-04

In Common Lisp, a string is both an array and a sequence, so every operation on arrays and sequences also applies to strings. If you can't find a string-specific function for a task, look among the array and sequence functions.

ASDF3, which is included with almost all Common Lisp implementations, includes Utilities for Implementation- and OS- Portability (UIOP), which defines functions to work on strings (strcat, string-prefix-p, string-enclosed-p, first-char, last-char, split-string, stripln). Some external libraries available on Quicklisp bring more functionality or shorter ways of doing things:

str defines trim, words, unwords, lines, unlines, concat, split, shorten, repeat, replace-all, starts-with?, ends-with?, blankp, emptyp, …
Serapeum is a large set of utilities with many string manipulation functions.
cl-change-case has functions to convert strings between camelCase, param-case, snake_case and more. They are also included in str.
mk-string-metrics has functions to calculate various string metrics efficiently (Damerau-Levenshtein, Hamming, Jaro, Jaro-Winkler, Levenshtein, etc.).
cl-ppcre can also come in handy, for example ppcre:regex-replace-all. See the regexp section.

Last but not least, when you need to tackle the format construct, don’t miss the following resources:

the official CLHS documentation

a quick reference
a CLHS summary on HexstreamSoft
plus a Slime tip: type C-c C-d ~ followed by the letter of a format directive to open its documentation. This is even more useful with ivy-mode or helm-mode.

Creating strings

The simplest way to create a string is to write it between double quotes, but there are other ways:

Using format nil:

(defparameter person "you")
(format nil "hello ~a" person) ;; => "hello you"

make-string count creates a string of the given length; the :initial-element character is repeated count times:

(make-string 3 :initial-element #\♥) ;; => "♥♥♥"

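Accessing substrings

A string is a sequence, so you can access a substring with subseq. Here is its signature in simplified form (start is inclusive, end is exclusive and optional):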

(subseq my-string start end)

Here it is in use:

* (defparameter *my-string* (string "Groucho Marx"))
*MY-STRING*
* (subseq *my-string* 8)
"Marx"
* (subseq *my-string* 0 7)
"Groucho"
* (subseq *my-string* 1 5)
"rouc"

As with other sequences, you can combine setf and subseq to modify part of a string:

* (defparameter *my-string* (string "Harpo Marx"))
*MY-STRING*
* (subseq *my-string* 0 5)
"Harpo"
* (setf (subseq *my-string* 0 5) "Chico")
"Chico"
* *my-string*
"Chico Marx"

Strings aren't stretchable

The length of a string cannot change. If the new substring and the subsequence it replaces differ in length, the shorter of the two determines how many characters are replaced:

* (defparameter *my-string* (string "Karl Marx"))
*MY-STRING*
* (subseq *my-string* 0 4)
"Karl"
* (setf (subseq *my-string* 0 4) "Harpo")
"Harpo"
* *my-string*
"Harp Marx"
* (subseq *my-string* 4)
" Marx"
* (setf (subseq *my-string* 4) "o Marx")
"o Marx"
* *my-string*
"Harpo Mar"

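Since setf of subseq can never change the length of a string, a replacement of a different length has to build a new string. Here is a minimal sketch, assuming a helper named replace-span (a hypothetical name, not a standard function) built only from subseq and concatenate:

```lisp
;; REPLACE-SPAN is a hypothetical helper, not part of the standard.
;; It returns a fresh string in which the characters from START
;; (inclusive) to END (exclusive) are replaced by NEW, whatever its length.
(defun replace-span (string start end new)
  (concatenate 'string
               (subseq string 0 start)
               new
               (subseq string end)))

(replace-span "Karl Marx" 0 4 "Harpo")
;; => "Harpo Marx"
```

The original string is left untouched, so this is also safe to call on literal strings.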
Accessing individual characters

The char function is dedicated to accessing a single character of a string. char can also be combined with setf:

(defparameter *my-string* (string "Groucho Marx"))
*MY-STRING*
(char *my-string* 11)
#\x
(char *my-string* 7)
#\Space
(char *my-string* 6)
#\o
(setf (char *my-string* 6) #\y)
#\y
*my-string*
"Grouchy Marx"

schar does the same thing but only accepts simple strings, for which it may be faster. And because strings are both arrays and sequences, you can also use the more generic aref and elt (but char may be more efficient):

(defparameter *my-string* (string "Groucho Marx"))
*MY-STRING*
(aref *my-string* 3)
#\u
(elt *my-string* 8)
#\M

Removing or replacing characters in a string

You can use the sequence functions to remove or replace parts of a string.

remove deletes characters from a string:

(remove #\o "Harpo Marx")
"Harp Marx"
(remove #\a "Harpo Marx")
"Hrpo Mrx"
(remove #\a "Harpo Marx" :start 2)
"Harpo Mrx"
(remove-if #'upper-case-p "Harpo Marx")
"arpo arx"

Use substitute (non-destructive) or replace (destructive) to replace characters:

(substitute #\u #\o "Groucho Marx")
"Gruuchu Marx"
(substitute-if #\_ #'upper-case-p "Groucho Marx")
"_roucho _arx"
(defparameter *my-string* (string "Zeppo Marx"))
*MY-STRING*
(replace *my-string* "Harpo" :end1 5)
"Harpo Marx"
*my-string*
"Harpo Marx"

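Because replace modifies its first argument, it is usually safer to apply it to a fresh copy than to a literal string. The following sketch contrasts the two functions; the variable names *original*, *substituted* and *copy* are only illustrative:

```lisp
(defparameter *original* (copy-seq "Zeppo Marx"))

;; SUBSTITUTE returns a new string; *ORIGINAL* is left alone.
(defparameter *substituted* (substitute #\0 #\o *original*))

;; REPLACE overwrites the characters of its first argument in place,
;; so we hand it a copy made with COPY-SEQ.
(defparameter *copy* (copy-seq *original*))
(replace *copy* "Harpo" :end1 5)

(list *original* *substituted* *copy*)
;; => ("Zeppo Marx" "Zepp0 Marx" "Harpo Marx")
```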
Concatenating strings

concatenate is a generic sequence function, so when you apply it to strings you must specify the type of the result:

(concatenate 'string "Karl" " " "Marx")
;; => "Karl Marx"
(concatenate 'list "Karl" " " "Marx")
;; => (#\K #\a #\r #\l #\Space #\M #\a #\r #\x)

With UIOP, use strcat:

(uiop:strcat "karl" " " "marx")

Or, with the str library, use concat:

(str:concat "foo" "bar")
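When the pieces to concatenate are already in a list, concatenate can be applied to that list, and format can insert a separator between the elements. A minimal sketch using only standard functions (the variable name *names* is illustrative):

```lisp
(defparameter *names* '("Groucho" "Harpo" "Chico"))

;; APPLY spreads the list as arguments to CONCATENATE.
;; (Very long lists may exceed CALL-ARGUMENTS-LIMIT.)
(apply #'concatenate 'string *names*)
;; => "GrouchoHarpoChico"

;; ~{ ... ~} iterates over the list; ~^ stops before printing the
;; separator after the last element.
(format nil "~{~a~^, ~}" *names*)
;; => "Groucho, Harpo, Chico"
```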

Processing a string one character at a time

Use the map function to process a string one character at a time:

(defparameter *my-string* (string "Groucho Marx"))
*MY-STRING*
(map 'string #'(lambda (c) (print c)) *my-string*)
#\G
#\r
#\o
#\u
#\c
#\h
#\o
#\Space
#\M
#\a
#\r
#\x
"Groucho Marx"

Or use the loop macro:

(loop for char across "Zeppo"
      collect char)
(#\Z #\e #\p #\p #\o)
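Both constructs can also transform or summarize a string rather than just visit it. As a short sketch with standard functions only:

```lisp
;; MAP with a result type of STRING builds a new string from the
;; values returned by the function.
(map 'string #'char-upcase "Zeppo")
;; => "ZEPPO"

;; LOOP can accumulate a count while walking the characters.
(loop for char across "Groucho Marx"
      count (char= char #\o))
;; => 2
```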

Reversing a string by word or character

Use reverse (or its destructive version nreverse) to reverse a string character by character:

(defparameter *my-string* (string "DSL"))
*MY-STRING*
(reverse *my-string*)
"LSD"

CL has no built-in function to reverse a string word by word. You can use a third-party library such as SPLIT-SEQUENCE, or roll your own solution. Here we use the str library:

(defparameter *singing* "singing in the rain")
*SINGING*
(str:words *SINGING*)
;; => ("singing" "in" "the" "rain")
(str:unwords (reverse (str:words *singing*)))
;; => "rain the in singing"
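For the roll-your-own route, here is one possible sketch that relies only on standard functions. The names split-on-spaces and reverse-words are hypothetical helpers, and the splitting rule (single space characters, empty pieces dropped) is an assumption made for brevity:

```lisp
;; SPLIT-ON-SPACES is a hypothetical helper. It splits STRING on space
;; characters and drops empty pieces, so a run of spaces counts as one
;; separator.
(defun split-on-spaces (string)
  (loop with len = (length string)
        for start = 0 then (1+ stop)
        for stop = (or (position #\Space string :start start) len)
        when (< start stop)
          collect (subseq string start stop)
        while (< stop len)))

;; REVERSE-WORDS joins the reversed word list with single spaces.
(defun reverse-words (string)
  (format nil "~{~a~^ ~}" (reverse (split-on-spaces string))))

(reverse-words "singing in the rain")
;; => "rain the in singing"
```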

Breaking strings into graphemes, sentences, lines and words

These functions use SBCL’s sb-unicode: they are SBCL specific.

sb-unicode:sentences breaks a string into sentences according to its default sentence-breaking rules.
sb-unicode:lines breaks a string into lines that are no wider than the :margin keyword argument (default 80).

(sb-unicode:lines "A first sentence. A second somewhat long one." :margin 10)
;; => ("A first"
;; "sentence."
;; "A second"
;; "somewhat"
;; "long one.")

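sb-unicode:words and sb-unicode:graphemes are left for you to look up in the SBCL documentation.

To make sure such code is only read on SBCL, guard it with feature expressions:

#+sbcl
(print "this form is read only on SBCL")
#-sbcl
(print "this form is read on other implementations")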

Controlling case

Common Lisp provides a number of functions to control the case of a string:

(string-upcase "cool")
;; => "COOL"
(string-upcase "Cool")
;; => "COOL"
(string-downcase "COOL")
;; => "cool"
(string-downcase "Cool")
;; => "cool"
(string-capitalize "cool")
;; => "Cool"
(string-capitalize "cool example")
;; => "Cool Example"

These functions accept the :start and :end keyword arguments, so you can operate on just part of a string. They also have destructive counterparts whose names start with n:

(string-capitalize "cool example" :start 5)
;; => "cool Example"
(string-capitalize "cool example" :end 5)
;; => "Cool example"
(defparameter *my-string* (string "BIG"))
;; => *MY-STRING*
(defparameter *my-downcase-string* (nstring-downcase *my-string*))
;; => *MY-DOWNCASE-STRING*
*my-downcase-string*
;; => "big"
*my-string*
;; => "big"

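Warning

string-upcase, string-downcase and string-capitalize do not modify their string argument. However, if no character in the string needs converting, the result may be either the original string or a copy of it.

Tip

In CL, functions whose names start with n are generally destructive.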
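Because the n-prefixed functions may modify their argument, a common precaution is to apply them only to strings you own, for example a fresh copy. A short sketch (the variable name *shout* is illustrative):

```lisp
(defparameter *shout* (copy-seq "big"))

;; STRING-UPCASE returns a new string and leaves *SHOUT* untouched.
(string-upcase *shout*)          ; => "BIG"
*shout*                          ; => "big"

;; NSTRING-UPCASE modifies the string in place, which is fine here
;; because COPY-SEQ gave us a fresh, non-literal string.
(nstring-upcase *shout* :end 1)  ; => "Big"
*shout*                          ; => "Big"
```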

Controlling case with format

To lower case:

(format t "~(~a~)" "HELLO WORLD")
;; => hello world

Capitalize every word:

(format t "~:(~a~)" "HELLO WORLD")
;; => Hello World

Capitalize the first word:

(format t "~@(~a~)" "hello world")
;; => Hello world

To upper case:

(format t "~@:(~a~)" "hello world")
;; => HELLO WORLD

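Trimming blanks from the ends of a string

Trimming is not limited to blanks: you can discard any unwanted characters. string-trim, string-left-trim and string-right-trim return a substring of their second argument from which all characters in the first argument have been removed from the beginning and/or the end. The first argument can be any sequence of characters: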

(string-trim " " " trim me ")
;; => "trim me"
(string-trim " et" " trim me ")
;; => "rim m"
(string-left-trim " et" " trim me ")
;; => "rim me "
(string-right-trim " et" " trim me ")
;; => " trim m"
(string-right-trim '(#\Space #\e #\t) " trim me ")
;; => " trim m"
(string-right-trim '(#\Space #\e #\t #\m) " trim me ")
;; => " tri"
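A frequent use is stripping all kinds of whitespace, not only spaces. A minimal sketch, assuming the semi-standard character names #\Tab and #\Return are supported by your implementation (they are in the common ones); *whitespace* is an illustrative variable name:

```lisp
;; *WHITESPACE* is an illustrative name for the bag of characters to drop.
(defparameter *whitespace* '(#\Space #\Tab #\Newline #\Return))

;; ~% in the FORMAT control string inserts a newline.
(string-trim *whitespace* (format nil "  ~%  trim me ~%"))
;; => "trim me"
```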

Converting between symbols and strings

intern converts a string to a symbol. Its second return value tells whether the symbol already existed:

(in-package "COMMON-LISP-USER")
;; => #<The COMMON-LISP-USER package, 35/44 internal, 0/9 external>
(intern "MY-SYMBOL")
;; => MY-SYMBOL
;; => NIL
(intern "MY-SYMBOL")
;; => MY-SYMBOL
;; => :INTERNAL
(export 'MY-SYMBOL)
;; => T
(intern "MY-SYMBOL")
;; => MY-SYMBOL
;; => :EXTERNAL
(intern "My-Symbol")
;; => |My-Symbol|
;; => NIL
(intern "MY-SYMBOL" "KEYWORD")
;; => :MY-SYMBOL
;; => NIL
(intern "MY-SYMBOL" "KEYWORD")
;; => :MY-SYMBOL
;; => :EXTERNAL

symbol-name and string convert a symbol to a string:

(symbol-name 'MY-SYMBOL)
;; => "MY-SYMBOL"
(symbol-name 'my-symbol)
;; => "MY-SYMBOL"
(symbol-name '|my-symbol|)
;; => "my-symbol"
(string 'howdy)
;; => "HOWDY"
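The reader upcases symbol names by default, while intern takes the string as is. To obtain the symbol the reader would produce, upcase the string first; to look a symbol up without creating it, use find-symbol. A brief sketch, assuming the default readtable case:

```lisp
;; Upcasing first yields the same symbol as typing my-symbol at the REPL.
(eq (intern (string-upcase "my-symbol")) 'my-symbol)
;; => T

;; FIND-SYMBOL never creates a symbol; it returns NIL, NIL when no
;; symbol with that name is accessible in the package.
(find-symbol "NO-SUCH-SYMBOL-HERE")
;; => NIL
;; => NIL
```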

Converting between strings and characters

coerce converts a string of length 1 to a character:

(coerce "a" 'character)
;; => #\a
(coerce (subseq "cool" 2 3) 'character)
;; => #\o

coerce converts a string to a list of characters:

(coerce "cool" 'list)
;; => (#\c #\o #\o #\l)

coerce converts a list of characters to a string:

(coerce '(#\h #\e #\y) 'string)
;; => "hey"

coerce converts an array of characters to a string:

(defparameter *my-array* (make-array 5 :initial-element #\x))
;; => *MY-ARRAY*
*my-array*
;; => #(#\x #\x #\x #\x #\x)
(coerce *my-array* 'string)
;; => "xxxxx"
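For the opposite direction, from a single character to a string, the standard function string accepts a character. A short sketch:

```lisp
;; STRING turns a character into a one-character string.
(string #\a)
;; => "a"

;; COERCE needs a sequence as input, so wrap the character in a list.
(coerce (list #\a) 'string)
;; => "a"
```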

Finding an element of a string

Use find, position and their -if counterparts to find characters in a string:

(find #\t "The Hyperspec contains approximately 110,000 hyperlinks." :test #'equal)
;; => #\t
(find #\t "The Hyperspec contains approximately 110,000 hyperlinks." :test #'equalp)
;; => #\T
(find #\z "The Hyperspec contains approximately 110,000 hyperlinks." :test #'equalp)
;; => NIL
(find-if #'digit-char-p "The Hyperspec contains approximately 110,000 hyperlinks.")
;; => #\1
(find-if #'digit-char-p "The Hyperspec contains approximately 110,000 hyperlinks." :from-end t)
;; => #\0
(position #\t "The Hyperspec contains approximately 110,000 hyperlinks." :test #'equal)
;; => 17
(position #\t "The Hyperspec contains approximately 110,000 hyperlinks." :test #'equalp)
;; => 0
(position-if #'digit-char-p "The Hyperspec contains approximately 110,000 hyperlinks.")
;; => 37
(position-if #'digit-char-p "The Hyperspec contains approximately 110,000 hyperlinks." :from-end t)
;; => 43

Use count and its relatives to count the occurrences of characters in a string:

(count #\t "The Hyperspec contains approximately 110,000 hyperlinks." :test #'equal)
;; => 2
(count #\t "The Hyperspec contains approximately 110,000 hyperlinks." :test #'equalp)
;; => 3
(count-if #'digit-char-p "The Hyperspec contains approximately 110,000 hyperlinks.")
;; => 6
(count-if #'digit-char-p "The Hyperspec contains approximately 110,000 hyperlinks." :start 38)
;; => 5

Finding a substring of a string

The function search returns the position of a substring within a string, or NIL if it is not found:

(search "we" "If we can't be free we can at least be cheap")
;; => 3
(search "we" "If we can't be free we can at least be cheap" :from-end t)
;; => 20
(search "we" "If we can't be free we can at least be cheap" :start2 4)
;; => 20
(search "we" "If we can't be free we can at least be cheap" :end2 5 :from-end t)
;; => 3
(search "FREE" "If we can't be free we can at least be cheap")
;; => NIL
(search "FREE" "If we can't be free we can at least be cheap" :test #'char-equal)
;; => 15
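search only reports one match per call. To collect every match, you can call it repeatedly with an advancing :start2. Below is a minimal sketch; all-matches is a hypothetical helper name, and it assumes a non-empty needle:

```lisp
;; ALL-MATCHES is a hypothetical helper: it collects the start index of
;; every (possibly overlapping) occurrence of NEEDLE in HAYSTACK.
;; NEEDLE is assumed to be non-empty.
(defun all-matches (needle haystack)
  (loop for start = 0 then (1+ pos)
        for pos = (search needle haystack :start2 start)
        while pos
        collect pos))

(all-matches "we" "If we can't be free we can at least be cheap")
;; => (3 20)
```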

Converting a string to a number

To an integer: parse-integer

parse-integer returns two values: the parsed integer and the position in the string where parsing stopped:

(parse-integer "42")
;; => 42
;; => 2
(parse-integer "42" :start 1)
;; => 2
;; => 2
(parse-integer "42" :end 1)
;; => 4
;; => 1
(parse-integer "42" :radix 8)
;; => 34
;; => 2
(parse-integer " 42 ")
;; => 42
;; => 3
(parse-integer " 42 is forty-two" :junk-allowed t)
;; => 42
;; => 3
(parse-integer " 42 is forty-two")

Error in function PARSE-INTEGER:
There's junk in this string: " 42 is forty-two".
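When the input comes from a user or a file, you may prefer a default value to an error. One way to get that, sketched here under the assumption of a hypothetical wrapper named safe-parse-integer, is to handle the parse-error condition that parse-integer signals:

```lisp
;; SAFE-PARSE-INTEGER is a hypothetical wrapper: it returns the integer,
;; or DEFAULT when the string does not contain exactly one integer.
(defun safe-parse-integer (string &optional default)
  (handler-case (parse-integer string)
    (parse-error () default)))

(safe-parse-integer "42")               ; => 42
(safe-parse-integer " 42 is forty-two") ; => NIL
(safe-parse-integer "forty-two" 0)      ; => 0
```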

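To any number: read-from-string

read-from-string runs the full Lisp reader, so it understands every number syntax. It also reads symbols and, through #., can evaluate arbitrary code at read time, as the last example shows, so do not use it on untrusted input as is.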

(read-from-string "#X23")
;; => 35,4
(read-from-string "4.5")
;; => 4.5,3
(read-from-string "6/8")
;; => 3/4,3
(read-from-string "#C(6/8 1)")
;; => #C(3/4 1),9
(read-from-string "1.2e2")
;; => 120.00001,5
(read-from-string "symbol")
;; => SYMBOL,6
(defparameter *foo* 42)
;; => *FOO*
(read-from-string "#.(setq *foo* \"gotcha\")")
;; => "gotcha",23
*foo*
;; => "gotcha"
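Since #. evaluates code at read time, a common precaution for untrusted input is to bind *read-eval* to nil while reading. A minimal sketch, where read-number-safely is a hypothetical helper that additionally rejects anything that is not a number (note that the reader may still intern symbols it encounters):

```lisp
;; READ-NUMBER-SAFELY is a hypothetical helper, not a standard function.
(defun read-number-safely (string)
  ;; WITH-STANDARD-IO-SYNTAX resets the reader variables to their
  ;; standard values; *READ-EVAL* is then turned off so that #. signals
  ;; an error instead of evaluating anything.
  (with-standard-io-syntax
    (let ((*read-eval* nil))
      (let ((value (ignore-errors (read-from-string string))))
        (if (numberp value) value nil)))))

(read-number-safely "6/8")               ; => 3/4
(read-number-safely "symbol")            ; => NIL
(read-number-safely "#.(setq *foo* 1)")  ; => NIL, nothing is evaluated
```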

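To a float

The parse-float library provides a function to parse a float from a string. It only parses numbers and does not evaluate anything, so it should be safer than read-from-string on untrusted input: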

(ql:quickload "parse-float")
(parse-float:parse-float "1.2e2")
;; => 120.00001,5

Converting a number to a string

write-to-string returns the printed representation of a number as a string:

(write-to-string 250)
;; => "250"
(write-to-string 250.02)
;; => "250.02"
(write-to-string 250 :base 5)
;; => "2000"
(write-to-string (/ 1 3))
;; => "1/3"
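When you need more control over the output, format nil with a numeric directive is a common alternative. A short sketch using standard directives only:

```lisp
;; PRINC-TO-STRING gives the human-readable form of a number.
(princ-to-string 250)       ; => "250"

;; ~D prints an integer in decimal; ~:D adds group separators.
(format nil "~d" 250)       ; => "250"
(format nil "~:d" 1000000)  ; => "1,000,000"

;; ~,2F prints a float with two digits after the decimal point.
(format nil "~,2f" 3.14159) ; => "3.14"

;; ~X prints an integer in hexadecimal.
(format nil "~x" 255)       ; => "FF"
```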

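Comparing strings

The general functions equal and equalp can test whether two strings are the same: equal is case sensitive, equalp is not. There are also string-specific functions. The ordering ones, such as string<, return the index of the first mismatch rather than T when the test succeeds: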

(string= "Marx" "Marx")
;; => T
(string= "Marx" "marx")
;; => NIL
(string-equal "Marx" "marx")
;; => T
(string< "Groucho" "Zeppo")
;; => 0
(string< "groucho" "Zeppo")
;; => NIL
(string-lessp "groucho" "Zeppo")
;; => 0
(mismatch "Harpo Marx" "Zeppo Marx" :from-end t :test #'char=)
;; => 3
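The string-specific functions have two practical advantages: they can compare just part of each string, and the ordering predicates plug directly into sort. A brief sketch of both:

```lisp
;; STRING= accepts :START1/:END1 and :START2/:END2 to compare
;; only part of each string.
(string= "Groucho Marx" "Karl Marx" :start1 8 :start2 5)
;; => T

;; SORT is destructive, so we sort a freshly made list.
(sort (list "Zeppo" "groucho" "Harpo") #'string-lessp)
;; => ("groucho" "Harpo" "Zeppo")
```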

String formatting

See https://lispcookbook.github.io/cl-cookbook/strings.html#string-formatting

Capturing what is printed to a stream

Inside (with-output-to-string (mystream) …), everything printed to the stream mystream is captured and returned as a string:

(defun greet (name &key (stream t))
   ;; by default, print to standard output.
   (format stream "hello ~a" name))

(let ((output (with-output-to-string (stream)
                (greet "you" :stream stream))))
   (format t "Output is: '~a'. It is indeed a ~a, aka a string.~&" output (type-of output)))
;; Output is: 'hello you'. It is indeed a (SIMPLE-ARRAY CHARACTER (9)), aka a string.
;; NIL
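The same macro can capture output from code that offers no stream parameter, by rebinding *standard-output* itself. A minimal sketch, where noisy is an illustrative function name:

```lisp
;; NOISY is an illustrative function that prints to standard output
;; and offers no stream parameter.
(defun noisy ()
  (format t "side effect")
  42)

;; Naming the stream variable *STANDARD-OUTPUT* rebinds that special
;; variable for the duration of the body, so the output is captured.
(with-output-to-string (*standard-output*)
  (noisy))
;; => "side effect"
```

The return value of noisy (42) is discarded here; only the printed text is returned.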

Removing punctuation

Use (str:remove-punctuation s) or (str:no-case s):

(str:remove-punctuation "HEY! What's up ??")
;; => "HEY What s up"

(str:no-case "HEY! What's up ??")
;; => "hey what s up"