Grep Unique Sorted URLs From A File

Today I had to diagnose which HTTP(S) calls a service was making, and all I had to go on was a 13 MB WSDL file. Because the file was so large, digging through it by hand was difficult. Grep to the rescue!

First, I pulled out all of the HTTP(S) calls using a regular expression:

grep -oE 'https?://[^[:space:]"]*' file
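
As a quick sanity check before writing anything to disk, you can count how many matches come back (a sketch, with file standing in for whatever WSDL you're inspecting):

grep -oE 'https?://[^[:space:]"]*' file | wc -l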

This brought the 13 MB file down to 3 MB. But looking at the output, there were a ton of repeats included. So how do I limit the results to unique URLs?

grep -oE 'https?://[^[:space:]"]*' file | sort | uniq > portal_url_calls.txt
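
As an aside, sort -u collapses duplicates in a single step, so the uniq stage can be dropped if you prefer:

grep -oE 'https?://[^[:space:]"]*' file | sort -u > portal_url_calls.txt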

After running this, the file was down to 41 KB and had all of the unique URLs, sorted. Awesome!
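
If you also want to see which URLs repeat the most, uniq -c prefixes each unique line with its count, and a reverse numeric sort floats the heavy hitters to the top (another sketch against the same file):

grep -oE 'https?://[^[:space:]"]*' file | sort | uniq -c | sort -rn | head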

I hope this helps someone else down the road!