I am interested in learning how Pig works and also what Tez can do. It was really nice to see a definition for "tez" (it's Hindi for "speed"). I've often wondered why various tools got their names and what those names mean. Ok, then. Moving on.
So I started up my Sandbox and connected to it from a terminal via
ssh root@127.0.0.1 -p 2222
The tutorial tells me to start with a baseball data set that can be downloaded using the command
wget http://hortonassets.s3.amazonaws.com/pig/lahman591-csv.zip
However, my Sandbox VM could not resolve the host. I figured that was because I hadn't set up VirtualBox's network adapters properly, so I shut down the Sandbox, opened Preferences | Network | Host-only Networks and clicked the +Adapter button (on the right side of the dialog). That added 'vboxnet0' to the (previously empty) list. Then I went back to the Sandbox's settings, enabled Adapter 2, set 'Attached to' to Host-only Adapter, and set 'Name' to 'vboxnet0'.
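The same adapter setup can probably be done from the host's command line with VBoxManage instead of clicking through the GUI. This is a minimal sketch, assuming a VirtualBox version where `VBoxManage hostonlyif create` still manages host-only interfaces, that the VM is powered off, and that the VM is called "Hortonworks Sandbox" (a hypothetical name; `VBoxManage list vms` shows the real one). The function names are made up, and setting VBOXMANAGE=echo prints the commands instead of running them.

```shell
# Sketch: command-line version of the GUI steps above.
# Assumptions: VBoxManage is on PATH, the VM is powered off, and the VM name
# is correct (the default below is a hypothetical placeholder).
# Set VBOXMANAGE=echo to print the commands instead of running them.

# Create a host-only interface (on macOS/Linux hosts the first one is typically vboxnet0).
create_hostonly_if() {
  ${VBOXMANAGE:-VBoxManage} hostonlyif create
}

# Attach the VM's adapter 2 to a host-only interface.
# Usage: attach_hostonly_nic2 [VM_NAME] [INTERFACE]
attach_hostonly_nic2() {
  local vm="${1:-Hortonworks Sandbox}" iface="${2:-vboxnet0}"
  ${VBOXMANAGE:-VBoxManage} modifyvm "$vm" --nic2 hostonly --hostonlyadapter2 "$iface"
}
```

Running `create_hostonly_if` and then `attach_hostonly_nic2 "Hortonworks Sandbox" vboxnet0` should leave the VM in the same state as the GUI steps.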
Then I started up the Sandbox again and tried to run the wget command (above), but it still could not resolve the host. So I checked /etc/resolv.conf in the Sandbox and it showed
nameserver 8.8.8.8
That's Google's public DNS server, but for some reason it wasn't working for me. I could ping 8.8.8.8 from the Sandbox just fine, but I couldn't ping www.google.com. What?!
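Being able to ping an IP address but not a hostname usually means connectivity is fine and name resolution is what's broken. Here is a small sketch of that check, assuming a Linux guest with iputils `ping` (where `-W` is in seconds) and glibc's `getent`; the function names are my own.

```shell
# Sketch: tell "no network" apart from "no DNS" inside the guest.
# Assumptions: Linux guest with iputils ping and glibc getent;
# the function names are made up for this post.

# Print each nameserver listed in a resolv.conf-style file (default: the real one).
list_nameservers() {
  awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Succeed if the system resolver can turn the name into an address.
resolves() {
  getent hosts "$1" > /dev/null
}

# Usage: diagnose HOSTNAME IP
diagnose() {
  if ping -c 1 -W 2 "$2" > /dev/null 2>&1; then
    echo "ping $2: ok"
  else
    echo "ping $2: FAILED (no basic connectivity)"
  fi
  if resolves "$1"; then
    echo "resolve $1: ok"
  else
    echo "resolve $1: FAILED (nameservers: $(list_nameservers | tr '\n' ' '))"
  fi
}
```

In my situation, `diagnose www.google.com 8.8.8.8` would presumably have reported the ping as ok and the lookup as failed.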
Then I did
nslookup hortonassets.s3.amazonaws.com
on my computer and got an IP address of 54.231.2.49, but the Sandbox still could not run
wget 54.231.2.49/pig/lahman591-csv.zip
Weird. So I changed /etc/resolv.conf to look at my local network's DNS (which was what my computer's /etc/resolv.conf was pointing at).
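In hindsight, one likely reason the by-IP download failed even though plain IP traffic worked: for virtual-hosted-style URLs like hortonassets.s3.amazonaws.com, S3 takes the bucket name from the Host header, and a request addressed only to 54.231.2.49 doesn't carry it. A sketch that sends the original hostname with wget's `--header` option; the helper name is hypothetical, the IP from my nslookup is almost certainly stale by now, and WGET=echo prints the command instead of running it.

```shell
# Sketch: fetch an object by IP address while still telling S3 which bucket is meant.
# For virtual-hosted-style URLs (bucket.s3.amazonaws.com) S3 reads the bucket
# from the Host header, so it must be sent explicitly when connecting by IP.
# fetch_with_host_header is a made-up helper; set WGET=echo to print instead of run.
# Usage: fetch_with_host_header HOSTNAME IP /path/to/object
fetch_with_host_header() {
  ${WGET:-wget} --header="Host: $1" "http://$2$3"
}
```

For example, `fetch_with_host_header hortonassets.s3.amazonaws.com 54.231.2.49 /pig/lahman591-csv.zip`, using a freshly looked-up IP.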
Then I tried to wget again and...
Success!
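The fix itself was a one-line edit to /etc/resolv.conf. Here it is sketched as a helper that keeps a backup and leaves any search/options lines alone. The helper name and its optional file argument are my own, and `192.168.1.1` below is only a placeholder for whatever nameserver your computer's /etc/resolv.conf lists (`grep nameserver /etc/resolv.conf` on the host shows it).

```shell
# Sketch: point the guest's resolver at a different DNS server.
# set_nameserver is a made-up helper. It saves the old file as FILE.bak,
# drops the existing nameserver lines, keeps everything else
# (search/options lines), and appends the new server.
# Usage: set_nameserver IP [FILE]   (FILE defaults to /etc/resolv.conf; run as root)
set_nameserver() {
  local conf="${2:-/etc/resolv.conf}"
  cp "$conf" "$conf.bak" || return 1
  { grep -v '^nameserver' "$conf.bak"; echo "nameserver $1"; } > "$conf"
}
```

Usage would be something like `set_nameserver 192.168.1.1`. Note that the guest's DHCP client or network scripts may rewrite /etc/resolv.conf on reboot, so the change might not stick.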
Good, because it took a while to come up with all that (and more). I even tried to download the zip onto my laptop and scp it over to the VM with
scp ~/Downloads/lahman591-csv.zip root@127.0.0.1:~
but I got an error
ssh: connect to host 127.0.0.1 port 22: Connection refused
I didn't fully work out those problems. (Though I did notice that the Sandbox's Network settings have a Port Forwarding button showing that the ssh rule's Host Port is 2222, which is how I ssh'd into the Sandbox, while its Guest Port is 22. There's a resolution in there somewhere. But I digress...)
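That port forwarding rule likely explains the refused connection: VirtualBox forwards host port 2222 to the guest's port 22, and nothing on the laptop itself is listening on 127.0.0.1:22. So scp needs the same port that ssh was given, but spelled with a capital `-P` (lowercase `-p` in scp means "preserve modification times and modes"). A sketch, assuming the default 2222 to 22 rule and the root login; the helper name is hypothetical, and SCP=echo prints the command instead of running it.

```shell
# Sketch: copy a file into the Sandbox through VirtualBox's NAT port forward.
# Assumes the rule maps host 127.0.0.1:2222 -> guest :22 (as in the Port
# Forwarding dialog) and that you log in as root.
# Capital -P sets the port for scp; lowercase -p means "preserve times/modes".
# copy_to_sandbox is a made-up helper; set SCP=echo to print instead of run.
# Usage: copy_to_sandbox FILE [HOST_PORT]
copy_to_sandbox() {
  local file="$1" port="${2:-2222}"
  ${SCP:-scp} -P "$port" "$file" "root@127.0.0.1:~"
}
```

`copy_to_sandbox ~/Downloads/lahman591-csv.zip` should then be equivalent to running `scp -P 2222 ~/Downloads/lahman591-csv.zip root@127.0.0.1:~`.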
So now I need to return to the first step in the Faster Pig with Tez tutorial...