Administrator
|
I'm struggling with how best to present information on the website, and wonder if any of y'all have suggestions.
First, my goal with the website is to make it the premier Bullnose documentation site in the world. Certainly not for my glory or gain, but to help our fellow Bullnosians. (Or is that Bullnosers?) But in order to help our friends, they need to find the site, and for that to happen Google needs to know about it. And therein lies the problem - Google doesn't read the words in pictures, and many of our ~550 pages are pictures of words. And PDFs don't really work either. So I've been trying to convert PDFs into HTML, as I've been told that Google will find HTML - although I've yet to prove that's the case. And I'm not at all happy with the results.
But you can be the judge of that by going to Driveline/Wheel Covers and looking at these tabs, going to the right: Pin To ID # Cross-Ref: This tab has two forms of the wheel cover cross-reference table from the MPC.
|
Banned User
|
This is why I avoid text in images, and use SMN's captions so heavily. It's probably also why my SMN registries get so many hits. Google can find the plain-text captions.
But they're plain text: no rich text, formatting, fonts, bold, italic, underline, or tables (other than crude ASCII, which looks like crap in that site's font). I still think real ASCII text is the way to go for technical web pages whose text needs to be searchable. I don't know how the big boys set up tables and spreadsheets on their web pages, so I can't help you on that. But I still think that when you OCR a document, you should pull the text out and then display it on your page so it looks similar to the original's fonts & formatting (unless you find a better layout for that particular data), but NOT as an image of the original text.
I've downloaded some really good OCR software, but it's part of the install pack for a really old legal-size scanner (hp C7710A) whose drivers haven't been supported in Windows for the past 4 or 5 versions, and that pack isn't even on the hp site any more. I'd have to try to pull it off one of my old HDDs, without picking up the malware that made me replace them. So I haven't tried to load it on this machine to find out if it will work without detecting that scanner. I've wanted to dual-boot the last OS that supported that scanner, but I'm not comfortable or desperate enough to risk it yet.
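On the question of tables: one common approach is to emit a real HTML `<table>` from the extracted text, so the cells stay plain, searchable text while the browser handles the layout. The following is only a minimal sketch in Python, not how any particular site does it. The function name `rows_to_html_table`, the column names, and the description text are all made up for illustration; only the part number "D5TA 1000-BA" appears in this thread.

```python
from html import escape


def rows_to_html_table(header, rows):
    """Render a header and rows of plain text as an HTML <table> string."""

    def cells(tag, values):
        # escape() turns &, <, > and quotes into entities so OCR'd text
        # cannot break the markup.
        return "".join(f"<{tag}>{escape(str(v))}</{tag}>" for v in values)

    lines = ["<table>", f"  <tr>{cells('th', header)}</tr>"]
    for row in rows:
        lines.append(f"  <tr>{cells('td', row)}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


# Hypothetical sample data: the column names and description are placeholders.
table_html = rows_to_html_table(
    ["Part number", "Description"],
    [["D5TA 1000-BA", "placeholder description <edit me>"]],
)
print(table_html)
```

Because every cell is ordinary text inside `<td>` tags, a part number in the table is part of the page's own HTML rather than pixels in an image.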
Administrator
|
Steve - I don't have a problem OCR'ing things. In fact, at the moment I have three applications loaded that will do it: Foxit, the app I've had for several years and the one that OCR'd the MPC; ABBYY FineReader, which I have on a 30-day trial; and Adobe Acrobat DC, supposedly the king of kings, which is on a 7-day trial.
But it looks like Foxit will be the winner, given a few problems encountered with the others and the fact that they are $300 and up. Foxit, on the other hand, is already paid for and does a good, albeit not perfect, job. And it runs the scanner very nicely, creating nicely straightened and OCR'd results in one go.
Concerning the page, as you know we aren't displaying pictures of text for the TSBs, nor for much of anything that we are doing going forward. Instead we are using PDFs that have the text searchable.
Last night I was discussing this with Keith Dickson, Mr FORDification, and he told me that he's been searching for years for a way to do "this", meaning get the search engines to find things like TSBs. But he's ruled out using HTML, basically for the same reason I have just now - nothing does the conversion well and it takes way too much time to edit the results - and editing results in HTML is not my forte, nor desire.
But having the pages in HTML certainly would be nice. Those misshapen HTML pages I put up last night have already been found, and you can find "D5TA 1000-BA" on the website as of this morning. So one option would be to put both the PDF and the misshapen HTML on the page. The PDF would give the user a clean view of the TSB, and the HTML would be found by the search engines.
To test that theory I searched for "Rear Spring Squeak - Tip Liner And", which is a phrase that's in TSB 80-1-12-S REAR SPRING SQUEAK. Sure enough, even though that TSB has been in place for a week or so, the search engines haven't found it - because it is actually a file that resides elsewhere with a link to it from the page, even though it looks like it is on the page. And now I've added the Adobe version of the HTML for that TSB to the bottom of the page and have asked Google to crawl and index the page. So in a few hours we should be able to find anything on that TSB with a Google search, and later we'll be able to find it with other search engines as well.
So please take a look and see what y'all think of doing it that way. It is ugly as the formatting is all wrong, there's an image missing, and so on. But it should work. THOUGHTS?
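For anyone who wants to try the "PDF plus HTML" idea without hand-editing a converter's output, here is a rough sketch in Python of one way it could be done: keep a link to the clean PDF for readers, and put the OCR'd text underneath as plain paragraphs for the search engines. It assumes the text has already been extracted by the OCR tool. The function name `tsb_section` and the file name `tsb-80-1-12-s.pdf` are hypothetical, and the sample text is just the one phrase quoted in the post above.

```python
from html import escape


def tsb_section(title, pdf_url, ocr_text):
    """Build an HTML fragment: a link to the PDF plus the OCR'd text as <p> tags."""
    # Treat blank lines in the OCR output as paragraph breaks.
    paragraphs = [p for p in ocr_text.split("\n\n") if p.strip()]
    # Collapse stray whitespace inside each paragraph and escape special characters.
    body = "\n".join(
        f"  <p>{escape(' '.join(p.split()))}</p>" for p in paragraphs
    )
    return (
        "<section>\n"
        f"  <h2>{escape(title)}</h2>\n"
        f'  <p><a href="{escape(pdf_url)}">View the original PDF</a></p>\n'
        f"{body}\n"
        "</section>"
    )


# Hypothetical inputs: the PDF file name is made up for this example.
fragment = tsb_section(
    "TSB 80-1-12-S REAR SPRING SQUEAK",
    "tsb-80-1-12-s.pdf",
    "Rear Spring Squeak - Tip Liner And",
)
print(fragment)
```

This drops the original's fonts, layout, and images on purpose; the PDF carries the appearance and the plain paragraphs only need to carry the words.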
Gary, AKA "Gary fellow": Profile
Dad's: '81 F150 Ranger XLT 4x4: Down for restomod: Full-roller "stroked 351M" w/Trick Flow heads & intake, EEC-V SEFI/E4OD/3.50 gears w/Kevlar clutches
|
Administrator
|
And, just like clockwork you can now do a Google search for "Rear Spring Squeak - Tip Liner And" and the one and only hit you'll get is TSB 80-1-12-S REAR SPRING SQUEAK. So adding the HTML, as ugly as it is, does work. I wonder if I can get Adobe to help me since I'm testing their software.
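A quick way to check a page before waiting on a crawl is to see whether the phrase is actually in the page's own HTML text, rather than only in a linked or embedded file. The sketch below uses Python's standard `html.parser` for that. It is only a rough stand-in for what a search engine does, under the assumption (consistent with the test above) that text in the page's HTML is what gets indexed. The class name `TextExtractor`, the function name `page_contains`, and both sample pages are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text in an HTML document, skipping script and style blocks."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def page_contains(html_source, phrase):
    """Return True if the phrase appears in the page's text, ignoring case and spacing."""
    parser = TextExtractor()
    parser.feed(html_source)
    parser.close()
    text = " ".join(" ".join(parser.parts).split()).lower()
    return " ".join(phrase.split()).lower() in text


phrase = "Rear Spring Squeak - Tip Liner And"

# Hypothetical page that only embeds the PDF: the words live in another file.
pdf_only_page = '<h2>TSB 80-1-12-S</h2><iframe src="tsb-80-1-12-s.pdf"></iframe>'

# The same page with the converted text added at the bottom.
page_with_html = pdf_only_page + "<div><p>Rear Spring Squeak - Tip Liner And</p></div>"

print(page_contains(pdf_only_page, phrase))
print(page_contains(page_with_html, phrase))
```

The first page looks the same to a visitor, but the phrase is nowhere in its HTML, which matches what the search showed before the HTML version was added.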
|