Thursday, December 29, 2011

Using the grep command

Let's talk about a command used by many Linux users who handle large amounts of information: grep.
This command lets you search for occurrences of text or regular expressions in a file or on stdin, which makes it excellent for finding links, numbers, IPs and other elements in large log files.
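
Since grep also reads from stdin, it can sit at the end of a pipe. A minimal sketch, assuming two made-up log lines (they are not part of any real log):

```shell
# Hypothetical log lines sent to grep through a pipe instead of a file
from_pipe=$(printf '%s\n' 'GET /index.html 200' 'GET /missing 404' | grep "404")
echo "$from_pipe"
```

Only the line containing 404 should come out the other end.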


Its syntax is  

  grep [OPTION]... PATTERN [FILE]...

and the most commonly used options are
  -E (use extended regular expressions)
  -i (ignore case)
  -v (show only the lines that do not match)
  -o (show only the matching part of each line)
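
A quick way to see what each of these options does is to run them on a tiny input. This is only a sketch: the three sample lines and the temporary file are made up for the illustration.

```shell
# Hypothetical sample data, written to a temporary file
sample=$(mktemp)
printf '%s\n' 'Error: disk full' 'error: retry 3' 'all good' > "$sample"

# -i: ignore case, so both "Error" and "error" match
ignore_case=$(grep -i "error" "$sample")

# -v: print the lines that do NOT match
inverted=$(grep -v -i "error" "$sample")

# -o: print only the matched text, one match per output line
only_match=$(grep -o "[0-9]" "$sample")

# -E: extended syntax, so + and | need no backslash
extended=$(grep -E -o "(disk|retry) [a-z0-9]+" "$sample")

echo "$ignore_case"
echo "$inverted"
echo "$only_match"
echo "$extended"
rm -f "$sample"
```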

Now, as an example, we will use the following sample file, "teste.txt":

123
test
192.168.0.1
192.168.300.0
something.168.0.0
let's put some numbers too 123456 done!
http://hackedprojects.blogspot.com
mymail@provider.com
see my blog in http://hackedprojects.blogspot.com but don't send anything to mymail@provider.com because it isn't my mail address
:)
<a href="http://www.google.com/">Google</a> 

And now a set of frequently used commands (at least by me):
  • grep "[0-9]" teste.txt
Shows every line that contains a digit (lines 1, 3, 4, 5 and 6)

  • grep "[^0-9]" teste.txt
Shows every line that contains at least one non-digit character (every line except 1)

  • grep "^[^0-9]" teste.txt
Shows every line that doesn't start with a digit (every line except 1, 3 and 4)
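
These three look alike, so it helps to see them side by side, together with the two cases they are easy to confuse with ("no digit at all" and "only digits"). A small sketch, assuming a made-up four-line sample rather than teste.txt:

```shell
# Hypothetical sample data
sample=$(mktemp)
printf '%s\n' '123' 'test' '192.168.0.1' 'abc 42' > "$sample"

# At least one non-digit character somewhere in the line
some_non_digit=$(grep "[^0-9]" "$sample")

# First character is not a digit
starts_non_digit=$(grep "^[^0-9]" "$sample")

# No digit anywhere in the line (inverted match)
no_digit=$(grep -v "[0-9]" "$sample")

# Nothing but digits from start to end
only_digits=$(grep "^[0-9][0-9]*$" "$sample")

echo "$some_non_digit"
echo "$starts_non_digit"
echo "$no_digit"
echo "$only_digits"
rm -f "$sample"
```

The ^ means "start of line" outside the brackets and "anything except" as the first character inside them.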

  • grep -E -o "[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+" teste.txt
Gets every email address. Note the -E and -o options, they are important: -E is needed because the expression uses +, and -o prints only the addresses instead of the whole line. I can't guarantee that this expression will match every valid address; there are many other expressions, some more effective than others.
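
To show what -o changes here, this sketch runs the same expression with and without it. The sample lines and other@example.org are made up, and piping through sort -u to drop duplicates is my own addition, not something the expression needs.

```shell
# Hypothetical sample data
sample=$(mktemp)
printf '%s\n' 'mymail@provider.com' \
  'write to mymail@provider.com or other@example.org' > "$sample"

pattern="[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+"

# Without -o grep prints the whole matching lines
whole_lines=$(grep -E "$pattern" "$sample")

# With -o each address lands on its own line; sort -u drops duplicates
addresses=$(grep -E -o "$pattern" "$sample" | sort -u)

echo "$whole_lines"
echo "$addresses"
rm -f "$sample"
```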

  • grep "[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}" teste.txt
Gets any IP address, without number range validation (lines 3 and 4)

  • grep "\([^\.]\|^\)\([0-9]\{1,2\}\|1[0-9]\{2\}\|2[0-4][0-9]\|25[0-5]\)\.\([0-9]\{1,2\}\|1[0-9]\{2\}\|2[0-4][0-9]\|25[0-5]\)\.\([0-9]\{1,2\}\|1[0-9]\{2\}\|2[0-4][0-9]\|25[0-5]\)\.\([0-9]\{1,2\}\|1[0-9]\{2\}\|2[0-4][0-9]\|25[0-5]\)\([^\.]\|$\)" teste.txt
Gets any IP address with number range validation (line 3)
Credit goes to a user at http://superuser.com/questions/202818/match-ip-address, I only merged it into one expression
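
That expression is hard to read, so here is one possible way to write the same idea with -E and a shell variable holding a single octet. This is my own sketch, not the version from the linked answer; the variable name octet and the three sample lines are assumptions for the example.

```shell
# Hypothetical sample data
sample=$(mktemp)
printf '%s\n' '192.168.0.1' '192.168.300.0' 'something.168.0.0' > "$sample"

# One octet: 0-99, 100-199, 200-249 or 250-255
octet='([0-9]{1,2}|1[0-9]{2}|2[0-4][0-9]|25[0-5])'

# Four octets separated by dots, not glued to another dot on either side
valid=$(grep -E "(^|[^.])$octet\.$octet\.$octet\.$octet([^.]|\$)" "$sample")

echo "$valid"
rm -f "$sample"
```

The line with 300 is rejected because no alternative of the octet can be followed by a dot at that position.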

  • grep -o "\(ftp\|http\|https\)://[^\ \"'<>]*\.[^\ \"'<>]*" teste.txt
Returns only URLs starting with ftp, http or https (line 7 and the URLs inside lines 9 and 11)

  • grep -o "http://[^\ \"'<>]*" teste.txt
Same as the last expression, but simpler, and it only works with http

You can use logic too:

OR: grep "192\|123" teste.txt
   Returns lines 1, 3, 4 and 6
AND: grep "192" teste.txt | grep "300"
   Returns line 4 
NOT: grep -v "192" teste.txt
   Returns every line except 3 and 4
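
A few variations on the same logic, as a sketch on a made-up four-line sample: OR with repeated -e options, AND in a single command, and AND combined with NOT.

```shell
# Hypothetical sample data
sample=$(mktemp)
printf '%s\n' '123' '192.168.0.1' '192.168.300.0' 'test' > "$sample"

# OR with two -e patterns instead of \|
either=$(grep -e "192" -e "123" "$sample")

# AND in one command: both orders spelled out with -E
both=$(grep -E "192.*300|300.*192" "$sample")

# AND NOT: lines with 192 but without 300
only_first=$(grep "192" "$sample" | grep -v "300")

echo "$either"
echo "$both"
echo "$only_first"
rm -f "$sample"
```

The pipe version of AND stays easier to read once there are more than two patterns, since the single-command form has to list every possible order.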

That's all for now, time to continue working, see you later :)
