The Text-Processing Trio: gawk Basics

sed is powerful, but it has its limits. gawk is the tool for the problems sed cannot handle.

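gawk can do the following:

Define variables to store data;

Use arithmetic and string operators to process data;

Use structured programming concepts to add logic to data processing;

Extract data elements from a data file and format them for output.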
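The four capabilities above can be sketched in one short program. This is only an illustration: the "name price quantity" input is made up for the example, and the AWK variable is a helper that falls back to the system awk if gawk is not installed.

```shell
# Assumption: gawk is on PATH; otherwise fall back to the system awk.
AWK=$(command -v gawk || command -v awk)

# Hypothetical input: one "name price quantity" record per line.
report=$(printf 'pen 2 10\napple 3 4\n' | "$AWK" '{
    total = $2 * $3            # a variable holding an arithmetic result
    label = toupper($1)        # a string operation
    if (total >= 15)           # structured logic
        size = "large"
    else
        size = "small"
    printf "%s total=%d (%s)\n", label, total, size   # formatted output
}')
echo "$report"
```

Each input line produces one formatted line of output, so the report has the same number of lines as the input.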

The gawk command format:

gawk options program file1
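To make the three parts concrete, here is a small sketch that labels each one. The scratch file and its comma-separated contents are invented for the example; -F, is an option that sets the field separator to a comma.

```shell
# Assumption: gawk is on PATH; otherwise fall back to the system awk.
AWK=$(command -v gawk || command -v awk)

tmp=$(mktemp)                        # hypothetical scratch file
printf 'a,b,c\nd,e,f\n' > "$tmp"

#               options  program       file
second=$("$AWK" -F,      '{print $2}'  "$tmp")
echo "$second"

rm -f "$tmp"
```

The options part may be empty, and when no file is given gawk reads from standard input.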

Let's go straight to some examples.

Column operations

First, suppose we have a text file named data:
No.1 Google Gmail
No.2 Microsoft Windows
No.3 SAP ERP
No.4 Intel Core
No.5 Cisco Router

Run the command and look at the output:
$ gawk '{print $1}' data
No.1
No.2
No.3
No.4
No.5

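As you can see, this prints the first column of every line, where columns are separated by whitespace (the default). To print the second column, write: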
$ gawk '{print $2}' data
Google
Microsoft
SAP
Intel
Cisco

What happens if we use '$0' instead?
$ gawk '{print $0}' data
No.1 Google Gmail
No.2 Microsoft Windows
No.3 SAP ERP
No.4 Intel Core
No.5 Cisco Router

Clearly, '$0' stands for the whole line, so the entire file is printed.
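Field references can also be combined and reordered in a single print. A minimal sketch, using two lines of the sample data inline rather than the data file; the AWK helper variable is an assumption of this example.

```shell
# Assumption: gawk is on PATH; otherwise fall back to the system awk.
AWK=$(command -v gawk || command -v awk)

data='No.1 Google Gmail
No.2 Microsoft Windows'

# A comma between fields makes print join them with a single space.
pairs=$(printf '%s\n' "$data" | "$AWK" '{print $3, $1}')
echo "$pairs"
```

This prints the third column followed by the first, one pair per input line.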

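With this feature, analyzing system files gets much easier. The contents of /etc/passwd, for example, are hard to read directly; with gawk we can start by printing just the first column: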
$ gawk -F: '{print $1}' /etc/passwd
root
daemon
bin
sys
sync
games
man
lp
mail
news
uucp
proxy
www-data
backup
list
irc
gnats
nobody
systemd-timesync
systemd-network
systemd-resolve
systemd-bus-proxy
syslog
......

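That result is much easier on the eyes. Note the -F option used here: it specifies the field separator that splits each line into data fields. Here -F is immediately followed by ':', so the colon is the separator.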
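Since /etc/passwd differs from machine to machine, the same idea can be tried on a fixed sample. The two passwd-style lines below are illustrative, not taken from a real system.

```shell
# Assumption: gawk is on PATH; otherwise fall back to the system awk.
AWK=$(command -v gawk || command -v awk)

# Hypothetical passwd-style records: seven colon-separated fields each.
sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin'

# With -F: the first field is the user name and the seventh is the login shell.
shells=$(printf '%s\n' "$sample" | "$AWK" -F: '{print $1, $7}')
echo "$shells"
```

Without -F: each of these lines would be a single field, because they contain no whitespace.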

gawk can of course also process data coming from a pipe:
$ echo "I have a pen" | gawk '{$4="apple"; print $0}'
I have a apple

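This command replaces the fourth word with "apple" and then prints the whole line. Note: $4="apple" must use double quotes. The program itself is wrapped in single quotes for the shell, so inner single quotes would end that quoting; gawk would then see apple as an uninitialized variable whose value is the empty string, and the output would be just "I have a " (with a trailing space).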
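The quoting difference is easy to check side by side. In this sketch the brackets in the final printf are only there to make the trailing space visible; the variable names are chosen for the example.

```shell
# Assumption: gawk is on PATH; otherwise fall back to the system awk.
AWK=$(command -v gawk || command -v awk)

# Double quotes: gawk receives the string constant "apple".
good=$(echo "I have a pen" | "$AWK" '{$4="apple"; print $0}')

# Single quotes: the shell strips them, so gawk receives $4=apple,
# where apple is an uninitialized (empty) variable.
bad=$(echo "I have a pen" | "$AWK" '{$4='apple'; print $0}')

printf '[%s]\n[%s]\n' "$good" "$bad"
```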

As with sed, you can also type the program across multiple lines:
$ gawk '{
> $4="apple"
> print $0}'

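   apple

   apple

Because no input file was given, gawk reads from standard input. Each time we press Enter on an empty line, gawk sets the fourth field to apple and prints the rebuilt line: three empty fields joined by spaces, followed by apple. Press Ctrl+D to end the run.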
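The interactive session can be reproduced non-interactively by piping the lines in. This sketch feeds one ordinary line and one empty line; the inputs are chosen for the example.

```shell
# Assumption: gawk is on PATH; otherwise fall back to the system awk.
AWK=$(command -v gawk || command -v awk)

# Same multi-line program as above, fed from a pipe instead of the keyboard.
out=$(printf 'I have a pen\n\n' | "$AWK" '{
$4="apple"
print $0}')
echo "$out"
```

For the empty line, fields one to three are created empty and joined with the default single-space separator, which is why apple appears after three spaces.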

Reading the program from a file

Create a script file named script:
{print $1 "'s home directory is " $6}

Run it and look at the result:
$ gawk -F: -f script /etc/passwd
root's home directory is /root
daemon's home directory is /usr/sbin
bin's home directory is /bin
sys's home directory is /dev
sync's home directory is /bin
games's home directory is /usr/games
man's home directory is /var/cache/man
lp's home directory is /var/spool/lpd
mail's home directory is /var/mail
news's home directory is /var/spool/news
uucp's home directory is /var/spool/uucp
proxy's home directory is /bin
www-data's home directory is /var/www
backup's home directory is /var/backups
list's home directory is /var/list
irc's home directory is /var/run/ircd
gnats's home directory is /var/lib/gnats
nobody's home directory is /nonexistent
systemd-timesync's home directory is /run/systemd
systemd-network's home directory is /run/systemd/netif
systemd-resolve's home directory is /run/systemd/resolve
systemd-bus-proxy's home directory is /run/systemd
syslog's home directory is /home/syslog
_apt's home directory is /nonexistent
messagebus's home directory is /var/run/dbus
uuidd's home directory is /run/uuidd
lightdm's home directory is /var/lib/lightdm
whoopsie's home directory is /nonexistent
avahi-autoipd's home directory is /var/lib/avahi-autoipd
avahi's home directory is /var/run/avahi-daemon
dnsmasq's home directory is /var/lib/misc
colord's home directory is /var/lib/colord
speech-dispatcher's home directory is /var/run/speech-dispatcher
hplip's home directory is /var/run/hplip
kernoops's home directory is /
pulse's home directory is /var/run/pulse
rtkit's home directory is /proc
saned's home directory is /var/lib/saned
usbmux's home directory is /var/lib/usbmux
comac's home directory is /home/comac
sshd's home directory is /var/run/sshd

Of course, the script can also be written like this:
{
text="'s home directory is "
print $1 text $6
}

Note: put each command on its own line; a trailing semicolon is optional.
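Conversely, a semicolon lets two commands share one line. A minimal sketch, assuming a temporary script file and a single made-up passwd-style record so the result does not depend on the local /etc/passwd:

```shell
# Assumption: gawk is on PATH; otherwise fall back to the system awk.
AWK=$(command -v gawk || command -v awk)

script=$(mktemp)                     # hypothetical temporary script file
cat > "$script" <<'EOF'
{
text="'s home directory is "; print $1 text $6
}
EOF

# Hypothetical passwd-style record: field 1 is the user, field 6 the home directory.
line=$(echo 'root:x:0:0:root:/root:/bin/bash' | "$AWK" -F: -f "$script")
echo "$line"

rm -f "$script"
```

The output is the same as with the two-line version of the script.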

Using the BEGIN and END keywords

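These two keywords are easy to understand: a BEGIN block runs once before gawk reads any input, and an END block runs once after all input has been processed.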
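A small sketch of both blocks around an ordinary per-line block. The header text, the counter n, and the two inline data lines are choices made for this example.

```shell
# Assumption: gawk is on PATH; otherwise fall back to the system awk.
AWK=$(command -v gawk || command -v awk)

summary=$(printf 'No.1 Google Gmail\nNo.2 Microsoft Windows\n' | "$AWK" '
BEGIN { print "Products:" }       # runs once, before the first line is read
      { print "  " $3; n++ }      # runs for every input line
END   { print n " lines read" }   # runs once, after the last line
')
echo "$summary"
```

BEGIN is a natural place for headers and initial settings, and END for totals that are only known once every line has been seen.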
