It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of FOSS4G, it was the season of Commercial Software, it was the spring of hope, it was the winter of despair, we had everything before us for the right price, we had nothing before us because we didn’t RTFM.
My apologies to Charles Dickens. My compliments to the guillotine.
To say that this dataset had some miles on it is an understatement. It had been a shapefile. It had become a geodatabase. It went back to shapefiles. It became 4, then 16, then 80+ shapefiles, and has now been moved to PostGIS, where I’m down to 20+ tables plus a lot of foreign keys. Why? Well – I upgraded a client from ArcView (I know it’s not called that) to QGIS/PostGIS. Why is it an upgrade to Free Software? Because we have more functionality and much much better data. I’ve talked about these guys before – and I have some posts I’ve left hanging – but now we talk about topology.
This data, in all of its changing and moving, had developed some problems. I will accept full blame for the problems developing because I’m better than this data looked. Last year I made the choice to abandon shapefiles and learn PostGIS, so the last jump was made and it has made everyone’s life much better. Don’t tell my client, but I tell them it takes a day to prep the data for delivery to their client – it really only takes an hour now (it used to take 24 hours) – and in those extra hours I clean up the data. No one but me has noticed the problems with gaps, overlaps, and zero-area polygons. I like clean data. I guess I am a dying breed that still thinks the data is the most important thing, not having all your data sitting exposed in all of its terribleness out in some online story map thing.
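Of the three problem types, zero-area polygons are the easiest to picture in code. Here is a minimal sketch, assuming each polygon is just a list of (x, y) vertices; the stand names and coordinates are made up for illustration, and this is not what QGIS or PostGIS run internally. It uses the shoelace formula to flag rings that have collapsed to nothing.

```python
def ring_area(ring):
    """Unsigned area of a ring given as [(x, y), ...], via the shoelace formula."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def zero_area_rings(rings, tolerance=1e-9):
    """Return the keys of rings whose area is at or below the tolerance."""
    return [key for key, ring in rings.items() if ring_area(ring) <= tolerance]


# Hypothetical stands: one real square and one collapsed sliver.
stands = {
    "stand_1": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "stand_2": [(0, 0), (5, 0), (10, 0)],  # all three vertices are collinear
}
print(zero_area_rings(stands))
```

Gaps and overlaps need real geometry operations between neighbouring polygons, which is why a proper topology tool is worth the trouble.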
For kicks last week I started on the topological errors and the first thing I did was read up on PostGIS topology. I started. I failed miserably due to some error that made no sense and I gave up. It’s on my list. I will “get it” eventually – and I was close this time. Really Close.
So for my next trick I pulled up QGIS and looked:
For those of you that don’t know – shapefiles aren’t a topological dataset. It’s easy to get things in a mess if you aren’t careful. QGIS can check shapefiles against topological rules (ArcView can’t, last time I checked – license issue). In this case I was checking for gaps. What I see is a lot of red: problems from GPS data, plus problems that appeared when I pulled this dataset apart two years ago and recombined it to fix another problem.
So what do I do to fix it?
Well – for fun I pulled it into ArcGIS. I dumped my data back out into a shapefile (shudder) and imported it into a file-based geodatabase. I built topology (I’ve got ArcINFO – and I know it’s not called that anymore) and the topology build removed all the errors in about 15 to 20 seconds. I know – I’m mentioning ArcGIS as a solution. Hey – it worked.
BUT – I’ve switched my client to QGIS and PostGIS – do I really want to fall back to ESRI as a fix? No – I’m going to make this work with the tools my client has available.
So I started reading up on GRASS. You get GRASS when you install QGIS. Right now I think it’s GRASS 7 (or at least on my machine it’s GRASS 7), and GRASS has had this wonderful 30-year run. Yes – the software has been around for 30+ years. GRASS is one of those things where, if you ask someone about it, you first get 1. a drug reference and 2. “I used it in college once”, with another drug reference. Of course it’s 2015, and with the number of colleges flailing with GIS you’re not even getting a 2 anymore. Just glassy-eyed stares of “Well I made a Story map”.
GRASS 7 is really – for lack of a better word – pretty now…or nicer. I’m not sure how to describe it. GRASS 6.x was always a bit confusing to me – GRASS 7 isn’t. Maybe it’s me. Maybe not. The user interface is more intuitive – the Help button does just that. It’s not as “mysterious” as it has been in the past. I used it and with a little effort (I need to RTFM more) I was working. It’s powerful. To get QGIS/GRASS/GDAL and all these things in one QGIS install – that’s huge. There’s no excuse for you as a GIS person to not have these packages installed on your computer.
GRASS makes me nostalgic for the old days of Arc Workstation. GRASS datasets are topological. The problem is that if you’re using the GUI for import there is no place to set snapping (that I could find). So I imported the data and I kept the problems. After reading the manual I ran this little command: v.in.ogr /export/data/topo/stands.shp out=stands_snap snap=0.001 and re-imported my data (the snap value is a threshold in map units). I had clean data in about 15 seconds (probably less). What happened is that GRASS pulled the data in and snapped it as it was being converted (same as ArcGIS). If you watch the command line you get a lot of information about what it’s doing.
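If snapping sounds like magic, here is a toy illustration of the idea in plain Python. To be loud about it: this is not GRASS’s actual algorithm, just a sketch of the concept, and the function name and coordinates are invented for the example. Every vertex that lands within the tolerance of a vertex already seen gets moved onto that earlier vertex, so two stands that almost share an edge end up really sharing it.

```python
import math


def snap_vertices(rings, tolerance):
    """Snap each vertex to the first earlier vertex lying within `tolerance`."""
    anchors = []  # vertices kept so far, in the order they were seen
    snapped = []
    for ring in rings:
        new_ring = []
        for x, y in ring:
            for ax, ay in anchors:
                if math.hypot(x - ax, y - ay) <= tolerance:
                    x, y = ax, ay  # move onto the existing vertex
                    break
            else:
                anchors.append((x, y))  # nothing close enough: keep as new
            new_ring.append((x, y))
        snapped.append(new_ring)
    return snapped


# Two hypothetical stands meant to share the edge at x = 10,
# digitized a fraction of a map unit apart.
stand_a = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
stand_b = [(10.0004, 0.0), (20.0, 0.0), (20.0, 10.0), (10.0003, 10.0)]

cleaned = snap_vertices([stand_a, stand_b], tolerance=0.001)
print(cleaned[1])
```

The brute-force search is fine for a toy but would crawl on a real dataset, and a tolerance that is too generous will happily merge vertices that were supposed to stay apart, so pick it with the accuracy of your source data in mind.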
I checked in ArcGIS. I checked in QGIS. After all this was done, the topology tools in both packages flagged 10 problems in the newly cleaned dataset. 10 problems that took me 10 minutes to fix. Please note I had the exact same problems in ArcGIS and QGIS – no difference. I took the GRASS snapped data and put it back into PostGIS and I’m done. Clean data now. Happy Clean Data as I invoke my inner Bob Ross.
So what did we learn from all of this:
1. If you aren’t worried about topology as a GIS Person – turn in your badge at the door. If you aren’t worried about your data you’re a terrible person.
2. If you believe you can’t create professional production data using FOSS4G, turn in your mouse and your half-completed GISP application. I took QGIS/PostGIS/GRASS and created a topologically clean production dataset in a little under an hour. My next dataset will probably take about 15 minutes.
3. I failed slightly – I could have connected GRASS to PostGIS and I didn’t. I would have avoided the whole “going back to shapefiles” thing again. That won’t happen next time. It’s going to be connected from here on out. The data will be pulled into GRASS and pushed back into PostGIS.
4. My hope is that I get my topology problems figured out in PostGIS. I’m close.
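For point 3, the round trip without shapefiles might look roughly like the sketch below. It only builds the argument lists for v.in.ogr and v.out.ogr, which both accept a “PG:” OGR connection string. The connection string, table names, and helper function names are all hypothetical, and I have not run this against the dataset in this post.

```python
def grass_import_cmd(pg_dsn, layer, output, snap):
    """Build a v.in.ogr call that reads a PostGIS table and snaps on import."""
    return [
        "v.in.ogr",
        f"input=PG:{pg_dsn}",
        f"layer={layer}",
        f"output={output}",
        f"snap={snap}",
    ]


def grass_export_cmd(grass_map, pg_dsn, table):
    """Build a v.out.ogr call that writes a GRASS vector map to a PostGIS table."""
    return [
        "v.out.ogr",
        f"input={grass_map}",
        f"output=PG:{pg_dsn}",
        "format=PostgreSQL",
        f"output_layer={table}",
    ]


# Hypothetical connection details and table names.
dsn = "host=localhost dbname=forestry user=gis"
import_cmd = grass_import_cmd(dsn, "stands", "stands_snap", 0.001)
export_cmd = grass_export_cmd("stands_snap", dsn, "stands_clean")
print(import_cmd)
print(export_cmd)
```

Inside a running GRASS session, each list could be handed to subprocess.run() as-is; because the arguments are passed as a list, the spaces in the connection string need no extra quoting. Writing to a new table (stands_clean here) rather than over the original leaves room to compare before and after.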
Anyway – excuse my slight irreverence in this post as I poked a bit. I walked out of a professional meeting the other day where I got the “you only use QGIS because you’re too cheap to buy ArcGIS” line. Well……no. So expect the next few posts to dive into a lot of technical detail about why this is a good way to work. Plus you get commercial support from a lot of companies while doing it. It’s a good choice. Granted – there are a lot of tools out in GIS land – but picking these tools doesn’t indicate anything about the user or their organization.