Linux/grep

Print only the matching part of each line: grep -o 'PATTERN' [file]
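A quick illustration of -o: each match is printed on its own line and the rest of the line is dropped. The input string is made up for the example, and -E is used so that + works without relying on GNU's \+ extension.

```shell
# -o prints every match of the pattern on a separate line, not the whole line.
echo 'id=42 port=8080' | grep -oE '[0-9]+'
# 42
# 8080
```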

Get IP addresses: ifconfig | grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}'
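The same pattern can be tried without ifconfig by feeding it a sample line. The line below only mimics the shape of Linux ifconfig output and uses made-up addresses; the helper name is hypothetical. Note that the pattern also picks up the netmask and broadcast addresses, and it checks shape only, so something like 999.999.999.999 would match as well.

```shell
# Hypothetical helper: print every IPv4-shaped token found on stdin, one per line.
extract_ipv4_like() {
  grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}'
}

# Sample line shaped like ifconfig output (made-up addresses).
echo 'inet 192.168.1.10  netmask 255.255.255.0  broadcast 192.168.1.255' | extract_ipv4_like
# 192.168.1.10
# 255.255.255.0
# 192.168.1.255
```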

Check that a value is IP-shaped (grep's exit status is 0 on a match, 1 otherwise):
IP=$(curl -s ip.oeey.com)
echo "$IP" | grep -o '^[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}$'
echo $?  # 0 on success, 1 on fail
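Because grep sets its exit status, the check can be wrapped in a function and used directly with if or &&. This sketch swaps -o for -q (quiet, status only) and takes the value as an argument instead of calling curl; the function name is hypothetical. The anchored pattern checks the dotted-quad shape only, not that each octet is at most 255.

```shell
# Hypothetical helper: succeed (status 0) if $1 looks like a dotted-quad IPv4 address.
is_ipv4_like() {
  printf '%s\n' "$1" | grep -q '^[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}$'
}

is_ipv4_like 203.0.113.7 && echo "valid"      # valid
is_ipv4_like 'not an ip' || echo "invalid"    # invalid
is_ipv4_like 999.1.1.1 && echo "shape only"   # also succeeds: octets are not range-checked
```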

Ignore fully commented lines and empty lines: grep -v '^\s*#' [file] | grep -v '^$'
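A slightly tighter variant, sketched as a hypothetical helper: one grep with two -e patterns (a line is dropped if either matches), POSIX [[:space:]] instead of GNU's \s, and no cat. Unlike the ^$ form, it also drops lines that contain only whitespace. Lines with real content followed by a trailing comment are kept. The sample input is made up.

```shell
# Drop lines that are only a comment (optionally indented) or only whitespace.
# With no file arguments, grep reads stdin.
strip_comments() {
  grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$@"
}

printf '# comment\n\nkey=value\n    # indented comment\nport=22  # trailing comment kept\n' | strip_comments
# key=value
# port=22  # trailing comment kept
```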

Show Non-ASCII Characters
grep -a --color='auto' -P -n "[\x80-\xFF]" file.xml
grep -a --color='auto' -P -n "[^\x00-\x7F]" file.xml

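echo '소녀시대' | LC_ALL=C grep -P "[\x80-\xFF]"  # in a UTF-8 locale, -P reads \x80-\xFF as code points U+0080 to U+00FF, which Hangul lies outside of; LC_ALL=C makes it match raw bytes instead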
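Where grep -P is unavailable (some non-GNU greps lack it), a possible alternative is to run grep in the C locale, where every byte is one character, and build the byte range 0x80 to 0xFF with printf octal escapes. This is a sketch assuming GNU grep's byte-wise range handling in the C locale; the function name is hypothetical.

```shell
# Print line:content for every line containing a byte >= 0x80 (any non-ASCII byte).
# LC_ALL=C makes the bracket expression a plain byte range.
grep_non_ascii() {
  LC_ALL=C grep -n "$(printf '[\200-\377]')" "$@"
}

# Made-up input: only line 2 contains non-ASCII bytes (the UTF-8 encoding of "é").
printf 'plain\ncaf\303\251\nascii again\n' | grep_non_ascii
```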

grep -axv '.*' file.txt  # only lists lines with invalid UTF-8 (in a UTF-8 locale), so valid non-ASCII text is not shown; see below

Sample file with valid and malformed UTF-8 for testing: https://www.w3.org/2001/06/utf-8-wrong/UTF-8-test.html

Ref: https://stackoverflow.com/questions/3001177/how-do-i-grep-for-all-non-ascii-characters

Detect Corrupted Unicode Characters
awk '/[^\x00-\x7F]/{ print NR ":", $0 }' file.txt

$ awk '/[^\x00-\x7F]/{ print NR ":", $0 }' file
1: Interruptor EC nÃ£o estÃ¡ em DESLOCAR
4: è¾…åŠ©é©¾é©¶å®¤é—¨å…³é—­
5: Porte cab. aux. fermÃ©e
7: Ð”Ð²ÐµÑ€ÑŒ Ð°Ð¿Ð¿Ð°Ñ€Ð°Ñ‚Ð½Ð¾Ð¹ ÐºÐ°Ð¼ÐµÑ€Ñ‹ Ð·Ð°ÐºÑ€Ñ‹Ñ‚Ð°
13: é«˜åŽ‹ä¿æŠ¤æ‰‹æŸ„å‘ä¸‹
14: BarriÃ¨re descendue
16: ÐžÐ³Ñ€Ð°Ð½Ð¸Ñ‡. ÐŸÐ»Ð°Ð½ÐºÐ° Ð’Ð’Ðš Ð¾Ð¿ÑƒÑ‰.
19: Barra de separaÃ§Ã£o descida
22: DPæœªå¯åŠ¨
23: Puiss. rÃ©p. non activÃ©e
25: !!! Ð’Ð½ÐµÑˆÐ½ÑÑ Ð¼Ð¾Ñ‰Ð½Ð¾ÑÑ‚ÑŒ Ð½Ðµ Ð²ÐºÐ»ÑŽÑ‡ÐµÐ½Ð°
26: PotÃªncia Dist NÃ£o Ativada
28: PotÃªncia dist nÃ£o activada
31: æœºè½¦æœªç§»åŠ¨
33: Motor no se estÃ¡ moviendo
34: Ð›Ð¾ÐºÐ¾Ð¼Ð¾Ñ‚Ð¸Ð² Ð½ÐµÐ¿Ð¾Ð´Ð²Ð¸Ð¶ÐµÐ½
35: Auto NÃ£o se Movendo
37: A nÃ£o se move
40: æœºè½¦çŠ¶å†µå…è®¸è‡ªåŠ¨åœæœº
41: Conditions auto\npermettent arrÃªt auto
43: Ð£ÑÑ‚Ð°Ð½Ð¾Ð²ÐºÐ¸ Ð»Ð¾ÐºÐ¾Ð¼Ð¾Ñ‚Ð¸Ð²Ð°\nÐŸÑ€ÐµÐ´ÑƒÑÐ¼Ð°Ñ‚Ñ€Ð¸Ð²Ð°ÑŽÑ‚ Ð°Ð²Ñ‚Ð¾Ð¼Ð°Ñ‚Ð¸Ñ‡ÐµÑÐºÑƒÑŽ Ð¾ÑÑ‚Ð°Ð½Ð¾Ð²ÐºÑƒ
44: CondiÃ§Ãµes da moto\nPermitem Auto Parada
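A variant of the awk one-liner that also reports where on the line the first non-ASCII byte starts, using awk's match() and RSTART. It is a sketch with a hypothetical function name: LC_ALL=C is set so the bracket expression is a byte range and RSTART counts bytes rather than characters, and octal escapes are used because \x is not portable across awk implementations.

```shell
# Print "line:column: content" for each line containing a byte >= 0x80.
# column is the 1-based byte offset of the first such byte on that line.
first_non_ascii() {
  LC_ALL=C awk 'match($0, /[\200-\377]/) { print NR ":" RSTART ": " $0 }' "$@"
}

# Made-up input: line 2 is "nÃ£o"-style mojibake, whose first non-ASCII byte is at offset 2.
printf 'ok\nn\303\203\302\243o\n' | first_non_ascii
```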

Ref: https://stackoverflow.com/questions/30738924/detecting-corrupt-characters-in-utf-8-encoded-text-file

--

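Assuming your locale is set to UTF-8 (check the output of locale), this prints every line that contains an invalid UTF-8 sequence:

grep -axv '.*' file.txt

-a treats the file as text, -x requires the pattern to match the whole line, and -v inverts the match. In a UTF-8 locale '.' does not match a byte that is not part of a valid character, so '.*' cannot match such a line and -v selects it.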
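A minimal sketch wrapping this check, assuming a UTF-8 locale named C.UTF-8 is installed (locale -a lists what is available; set the hypothetical UTF8_LOCALE variable to e.g. en_US.UTF-8 otherwise). -n is added so the line numbers of the bad lines are shown. The function name is hypothetical.

```shell
# Print line:content for lines that are not valid UTF-8 in the chosen UTF-8 locale.
find_invalid_utf8() {
  LC_ALL="${UTF8_LOCALE:-C.UTF-8}" grep -naxv '.*' "$@"
}

# Made-up input: line 2 contains a lone 0xFF byte, which is never valid UTF-8.
printf 'valid caf\303\251\nbad \377 byte\nvalid again\n' | find_invalid_utf8 | cut -d: -f1
# 2
```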

Ref: https://stackoverflow.com/questions/29465612/how-to-detect-invalid-utf8-unicode-binary-in-a-text-file