Using .htaccess to redirect to (or mirror) a folder... am I doing it right?

Contributor, Nov 22, 2018

Designer cosplaying as a programmer here (and have been for 10 years), so feel free to dumb down those replies, I won't be insulted.

So here's the thing... I currently have all my landing pages in the domain.com/pages folder. However, I'd like to clean up my urls so that the redundant /pages portion of the url is removed from the user's view, by having domain.com/welcome.php forward to where the actual page is stored, domain.com/pages/welcome.php.

I've run some tests, and a simple...

redirect 301      /welcome.php     /pages/welcome.php

...will accomplish this. And since I don't plan on having more than a dozen pages on this site, I could just do this for each page, on 12 separate lines in the .htaccess file.

However, with this method, the user can see there was a redirect there, because the link they clicked isn't where they ended up.
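
For reference, the dozen per-page redirects described above could probably be collapsed into one mod_alias pattern. This is only a sketch, assuming every landing page is a .php file stored directly in /pages; the pattern is an illustration, has not been tested on this site, and still produces a visible redirect.

```apache
# Assumption: all landing pages are .php files stored directly in /pages.
# Sends /welcome.php to /pages/welcome.php with a visible 301 redirect.
# The pattern only matches paths without a further slash, so /pages/... is not redirected again.
RedirectMatch 301 "^/([^/]+\.php)$" "/pages/$1"
```

Note that this would also catch any .php file that really lives in the root (an index.php, for example), so the one-line-per-page version is the safer choice if such files exist.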

If there was a way to simply mirror the /pages folder at the root, that would probably solve everything with 1 line. But then, how would files like .htaccess, robots and favicon get read by the browser? If I did this, would I have to move everything but .htaccess (including robots + favicon) to /pages for the browser to read them? I smell unforeseen complications down the line.

So should I just give up the idea of mirroring (to mask the fact that these are redirects) and just be glad I can redirect with minimal fuss?

Or is there a better way to tell .htaccess to do something like this?

mirror     /welcome.php     /pages/welcome.php

(I know that's not a real thing, just made it up to illustrate what I'm trying to achieve.)

I welcome your advice.

Thanks!

1 Correct answer

Adobe Community Professional, Nov 27, 2018

I pay for my certs because I need more than Let's Encrypt's free certs provide.

If you don't have to pay anything, cost is no concern. Get as many certs as you need.

I think the www vs non-www question is a little like hanging toilet paper. Do you hang the flap on the inside or the outside? You decide which works best for you.

Adobe Community Professional, Nov 22, 2018

Don't duplicate content. Google will penalize you for it.

Get rid of Pages folder.

Move your files to root level.

Use a 301 permanent redirect from old URL to new URL.

301 Redirects | CSS-Tricks

While you're at it, remove file extensions.

Instead of yourdomain.com/welcome.php, visitors will see yourdomain.com/welcome

That way if you decide to change technologies later, it won't hurt your SEO.

https://tecadmin.net/remove-file-extension-from-url-using-htaccess/

Nancy O'Shea, Adobe Product User & Community Professional
Alt-Web Design & Publishing ~ Web : Print : Graphics : Media

Contributor, Nov 22, 2018

Nancy OShea wrote: "Don't duplicate content. Google will penalize you for it."

What do you mean by duplicating content? I don't think I mentioned anything about that (at least not consciously.) Mirroring urls shouldn't duplicate the content (only 1 copy of each page).

Are you suggesting that 2 functioning urls leading to the same page will be penalized by Google? (ie, if both domain.com/pages/page.html and domain.com/page.html take you to the same physical file)

Nancy OShea wrote: "Get rid of Pages folder. Move your files to root level."

Not really an option, unfortunately. (There's just no way I'm cluttering my root folder by putting all my pages there.)

The pages are stored in /pages, assembled from various parts via php includes. This is where I've always put them, and am most comfortable manipulating them. Each page is made up of 4-5 includes (the layout is modular and some parts repeat across all pages, so I include them separately).

Unless there's a better reason to clutter my root folder than "shorter URLs", I think I'd rather give up on this altogether than go that route (ie, just leave the redundant /pages portion of the url visible to all).

I was just wondering if there was a simple way for me to accomplish what I'm trying to do (simple = without completely changing the structure of my website folders).

Adobe Community Professional, Nov 22, 2018

When two or more URLs lead to the same content, it's considered duplicate content.

rel=canonical: the ultimate guide to canonical URLs • Yoast

I think what you want is a mod_rewrite rule to hide the folder name in URLs.

regex - how to remove folder name from url using htaccess - Stack Overflow

Nancy O'Shea, Adobe Product User & Community Professional
Alt-Web Design & Publishing ~ Web : Print : Graphics : Media

Contributor, Nov 22, 2018

Nancy OShea wrote: "When two or more URLs lead to the same content, it's considered duplicate content. rel=canonical: the ultimate guide to canonical URLs • Yoast"

Ah, that makes sense now that I think about it... 2 urls for 1 page means 2 slots occupied in Google's rankings. People would just flood the internet with alternate urls if they weren't depowered.

Okay, let me present this differently.

Could I tell my .htaccess file that domain.com/pages/contact can ONLY be pulled up via HTTP by going to domain.com/contact? A regular redirect makes BOTH paths to the file a possibility, I'm talking about ONLY the virtual one working. This would eliminate the duplicate content issue, since only the 'virtual' location would be the one getting indexed by Google.

Then as a follow-up, could I also tell .htaccess to make literally everything on the website non-accessible from the outside, EXCEPT for those 12 pages I specified new virtual locations for? In this hypothetical scenario, not even a direct link to a JPG on the server would work for the end user. Only one of the 12 'green-lit' pages can call and display that JPG to the end user.

Then, if I wanted to create a drop box folder that's exceptionally open to all, I could specify it in that same .htaccess file... but otherwise, I'd have Googlebot's attention focused solely on the 12 pages I want indexed, those urls would be short, we'd be avoiding the duplicate content problem AND protecting everything else at the same time.

Am I just dreaming in technicolor here, or would that be possible?

Adobe Community Professional, Nov 22, 2018

1) Redirect is not what you want. You want mod_rewrite rules with regular expressions (RegEx).

2) Content on a publicly accessible server is open to anyone who has the URL unless you password protect it.
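
To make the second point concrete: Apache can lock a folder down, but only by refusing HTTP requests outright or by asking for a password. The following is a minimal sketch, assuming Apache 2.4; the folder names (/includes, /private) and the .htpasswd path are hypothetical placeholders.

```apache
# .htaccess placed inside a hypothetical /includes folder.
# Refuses every HTTP request; PHP include() still works because it reads the files from disk.
Require all denied
```

```apache
# .htaccess placed inside a hypothetical /private folder.
# Asks for a user name and password stored in a file created with the htpasswd tool.
AuthType Basic
AuthName "Private area"
AuthUserFile "/full/path/outside/webroot/.htpasswd"
Require valid-user
```

Images, stylesheets and scripts that a page displays cannot be denied this way, because the visitor's browser has to fetch them directly over HTTP.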

Nancy O'Shea, Adobe Product User & Community Professional
Alt-Web Design & Publishing ~ Web : Print : Graphics : Media

Contributor, Nov 22, 2018

Nancy OShea wrote: "1) Redirect is not what you want. You want Mod_rewrite rules with RegEx."

I'm embarrassed to say the instructions you linked me to lost me on the very first line.

"Enable mod_rewrite and .htaccess through httpd.conf and"

I don't know what this means.

Is httpd.conf a file?

One I should already be familiar with?

(I feel like I showed up at a meeting and forgot my pants, so awkward.)

Adobe Community Professional, Nov 22, 2018

httpd.conf is the main configuration file on an Apache server. If you are managing your own server, you may need to check that mod_rewrite is enabled in httpd.conf by looking for this line:

#LoadModule rewrite_module modules/mod_rewrite.so

If the line begins with '#', remove the '#' (as in the example below) to enable mod_rewrite, then restart Apache:

LoadModule rewrite_module modules/mod_rewrite.so

If you are using a shared hosting service or managed hosting, chances are mod_rewrite is already enabled and you won't need to worry about any of the above. You could double-check with your hosting provider to make sure.
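
On shared hosting where httpd.conf is out of reach, one cautious pattern is to wrap the rewrite directives in an IfModule block, so the site keeps working (minus the rewrites) if the module turns out to be missing. This is a sketch of that pattern, not a requirement:

```apache
# Directives inside this block are skipped when mod_rewrite is not loaded,
# instead of causing a 500 Internal Server Error.
<IfModule mod_rewrite.c>
    RewriteEngine On
    # Rewrite rules go here.
</IfModule>
```

Leaving the wrapper off works as a quick test in the other direction: if a bare RewriteEngine On line in .htaccess triggers a 500 error, mod_rewrite (or permission to use it from .htaccess) is probably missing.
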
Paul-M, ACP

Adobe Community Professional, Nov 22, 2018

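If I understand correctly, this is what you need in .htaccess. It rewrites a request to the pages directory only when the requested path is not an existing file or directory and is not already under /pages/:

RewriteEngine On
# Skip requests that match a real file or directory
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Skip requests already pointing into /pages/ (prevents a rewrite loop)
RewriteCond %{REQUEST_URI} !^/pages/
# Serve everything else from /pages/ without changing the visible URL
RewriteRule ^(.*)$ /pages/$1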

Paul-M, ACP

Contributor, Nov 25, 2018

Energize wrote: "If I understand correctly, this is what you need in htaccess, it will check for the requested file in the pages directory first before rewriting:"

RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !^/pages/
RewriteRule ^(.*)$ /pages/$1

Tested, and works well! I can access each page via /pages/page.php or just /page.php. The url never gives away the true location of the files, either, so the short url will be the only one people will be using.

All of that is great, and exactly what I wanted to achieve.

But what about Nancy OShea's warning that duplicate pages can work against me in Google's algorithms? Although Joe User won't know about the /pages folder, Googlebot will, right? And will know /pages/page.php + /page.php are essentially the same page. Or does that concern only apply to Redirect, and not RewriteCond?

That's why I wanted to limit Googlebot to only one set of urls (the shorter ones) so that the longer versions of the same urls never get indexed, and the site doesn't get penalized. Any thoughts on the matter?
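
One way to keep the long /pages/... addresses out of circulation is to redirect any direct request for them to the short form, while leaving the internal rewrite alone. The following is a hedged sketch, assuming it sits above the existing rewrite rules in the same root .htaccess file; it has not been tested against this site.

```apache
RewriteEngine On

# THE_REQUEST is the raw request line sent by the browser, e.g. "GET /pages/welcome.php HTTP/1.1".
# Internal rewrites do not change it, so only visitors who literally ask for /pages/... match.
RewriteCond %{THE_REQUEST} \s/+pages/([^\s?]*) [NC]
# Send them to the short URL with a permanent redirect; any query string is carried over.
RewriteRule ^ /%1 [R=301,L]
```

With this in place, /pages/welcome.php answers with a 301 to /welcome.php, and /welcome.php is still served silently from /pages, so only the short address remains reachable as a final URL.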

Adobe Community Professional, Nov 25, 2018

Run a Nibbler report.

Nibbler - Test any website

Nancy O'Shea, Adobe Product User & Community Professional
Alt-Web Design & Publishing ~ Web : Print : Graphics : Media

Adobe Community Professional, Nov 25, 2018

You're using mod_rewrite to remove the pages folder from the URL. Only 1 page contains content.

To give an example of duplicate content, let's say you have both

yourdomain.com

and

www.yourdomain.com

Google wants you to pick one (usually non-www) and have the .htaccess file redirect all www. traffic to the non-www URL.

You can also tell Google Search Console your preferred URL. This is only necessary if search results point to both.

Nancy O'Shea, Adobe Product User & Community Professional
Alt-Web Design & Publishing ~ Web : Print : Graphics : Media

Adobe Community Professional, Nov 25, 2018

"But what about Nancy OShea's warning that duplicate pages"

You can just put the canonical tag in the head of each page:

<link rel="canonical" href="http://www.mydomain.com/mypage.php">

That will prevent duplicate-content penalties.

Paul-M, ACP

Contributor, Nov 26, 2018

Energize wrote: "You can just put the canonical tag in the head of each page:"

<link rel="canonical" href="http://www.mydomain.com/mypage.php">

"That will prevent duplicate penalties."

What does that do, and how does it prevent duplicates? I just Googled the term a couple of different ways and couldn't find much.

Thanks!

Adobe Community Professional, Nov 26, 2018

It tells Google which is the master or canonical URL: Consolidate duplicate URLs - Search Console Help

It works & Google endorses the method so don't worry.

Paul-M, ACP

Contributor, Nov 26, 2018

Energize wrote: "It tells Google which is the master or canonical URL: Consolidate duplicate URLs - Search Console Help. It works & Google endorses the method so don't worry."

Thank you for that link, I was having an unusually hard time finding info on this. If I understand what I just read correctly, each individual page would have its own "canonical" url specified in the header.

I currently have all pages assembled via includes this way:

  1. PHP variable definitions *
  2. Header
  3. Content *
  4. Footer

* data that is specific to this page.

The PHP definitions file assigns -- among other things -- a custom title, description and keywords for the page that will help populate the next php-include (the actual header). So I would only have to add one more string for canonical in the header, and one more line of code in my definitions (to define it). Correct?

Should be easy enough, assuming I got it. Thanks!
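
As an aside, Google also accepts the canonical URL as an HTTP Link header, which can be set from .htaccess rather than from the page markup. This is an alternative to the link tag discussed above, not the method suggested in this thread, and it is only a sketch: it assumes mod_headers is available, and example.com and welcome.php are placeholders.

```apache
# Assumptions: mod_headers is available; example.com and welcome.php are placeholders.
<IfModule mod_headers.c>
    # Matches the file by name, wherever it is stored (here, inside /pages).
    <Files "welcome.php">
        Header set Link "<https://www.example.com/welcome.php>; rel=\"canonical\""
    </Files>
</IfModule>
```

One Files block per page would be needed, so with a dozen pages already built from a shared header include, the extra PHP variable described above is likely less work. Use one approach or the other, not both.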

Adobe Community Professional, Nov 26, 2018

Can you post a link to the site?

Yes, each page has its own canonical tag, which goes inside the head element. If your site is accessible via HTTPS, the URLs should take this form:

https://www.yourdomain.com/pagename.php

Paul-M, ACP

Contributor, Nov 26, 2018

Energize wrote: "Can you post a link to the site?"

Unfortunately not at the moment.

Energize wrote: "Yes each page has its own canonical tag that goes in the head tag and if your site is accessible via https the URLs should be in the form of https://www.yourdomain.com/pagename.php"

Is it important to keep the superfluous 'www' there?

My hosting service offers an option for every domain to automatically remove the 'www' from the url every time it's used. I've never used the feature, but would it be considered good practice for me to enable it + make the non-www urls the canonical ones in the header?

Or does the more traditional 'www.domain.tld' format re-assure Google and help SEO?

Adobe Community Professional, Nov 26, 2018

Only use WWW if that's what your site and Google are using.

Nancy O'Shea, Adobe Product User & Community Professional
Alt-Web Design & Publishing ~ Web : Print : Graphics : Media

Adobe Community Professional, Nov 26, 2018

I would personally keep www. and make sure it's in the canonical tag, like my example:

https://www.yourdomain.com/pagename.php

Do you have HTTPS support? A lot of hosts offer Let's Encrypt; definitely set that up and use HTTPS if you can.

Paul-M, ACP

Contributor, Nov 27, 2018

Nancy OShea wrote: "Only use WWW if that's what your site and Google are using."

My site and Google are going to use whatever I tell them to, so this doesn't really help me decide which version to set everything to. On what would I be basing the decision to use WWW or not?

Energize wrote: "I would personally keep www. and make sure its in the canonical tag like my example https://www.yourdomain.com/pagename.php"

I don't know how adding 4 extra chrs to every url helps, but with nothing else to go on to make this decision, I'm more than happy to take the free expert advice. Is Google more comfortable with www-prefixed addresses?

Energize wrote: "Do you have https support? a lot of hosts offer Lets Encrypt, definitely set that up and use https if you can."

They do offer it and I was going to do it later but I'll speed that up, thanks. So it'll be "https://www.domain.tld/filename" then... just as soon as you help me remove extensions from urls. I Googled it and it's doable with another couple of lines of code, but I have no idea how to incorporate them with yours so that both work (.htaccess is a scary file to someone like me).

Here's your code:

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !^/pages/
RewriteRule ^(.*)$ /pages/$1

Works great, but would be even better if I didn't have to use extensions. However, if there's a hello.html and a hello.php file in the same folder, how does the server know which to pull? When removing those extensions, can I also specify the priority of file type to pull up?

Example : The url is https://www.domain.tld/filename (no extensions) so the server first looks for a .php match, if none then a .html match, if none then a .htm match, if none then a .jpg match, and so on. That's possible, right? I'd love to incorporate that in your RewriteCond snippet, if that's at all possible (if I need to target a secondary file type, I can always specify the extension in the href -- this would just be the default order when no extension is present).

Since "canonical" helped counter the duplicate page issue with the removal of the /pages directory from all urls, I'm assuming it will do the same for the lack of extension in the base url (even if that same page would be accessible WITH the extension).

If I'm wrong about that, feel free to correct me before I go too far down the wrong path.
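
The priority order described above (try .php, then .html, and so on) can be expressed as a chain of rules, each of which only fires if the candidate file exists. The following is an untested sketch, assuming the pages live in /pages under the document root and that %{DOCUMENT_ROOT} reports the real path on the host (on some shared hosts it does not). It would replace the earlier four-line snippet rather than sit next to it.

```apache
RewriteEngine On

# 1. Anything that already exists at the requested path is served as-is.
RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^ - [L]

# 2. Extensionless request: try /pages/<name>.php first.
RewriteCond %{DOCUMENT_ROOT}/pages/$1.php -f
RewriteRule ^([^.]+)$ /pages/$1.php [L]

# 3. Otherwise try /pages/<name>.html (copy this pair for .htm, .jpg, and so on).
RewriteCond %{DOCUMENT_ROOT}/pages/$1.html -f
RewriteRule ^([^.]+)$ /pages/$1.html [L]

# 4. Requests that still carry an extension fall back to the original rule.
RewriteCond %{REQUEST_URI} !^/pages/
RewriteRule ^(.*)$ /pages/$1 [L]
```

Rules are tried top to bottom, so the order of the pairs is the priority order. Since /filename and /filename.php would both work, the canonical tag on each page should name whichever form is meant to be indexed.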

Thanks again, to both of you. You saved me weeks of trial/error.

Adobe Community Professional, Nov 27, 2018

I don't mean to contradict Energize, but I maintain my position about the WWW prefix on URLs.

Only use WWW if that's what your site and Google are using.

Do a test. Open your browser and type in the address bar:

site:www.yourdomain.com

Repeat without the www prefix. The one with the most results is the one you should use.

Nancy O'Shea, Adobe Product User & Community Professional
Alt-Web Design & Publishing ~ Web : Print : Graphics : Media

Adobe Community Professional, Nov 27, 2018

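Is it a new website? Maybe put something like this in .htaccess:

# Redirect when the request is plain HTTP or the host name lacks www.
RewriteCond %{HTTPS} off [OR]
RewriteCond %{HTTP_HOST} !^www\. [NC]
# %1 captures the host name without any leading www.
RewriteCond %{HTTP_HOST} ^(?:www\.)?(.+)$ [NC]
RewriteRule ^ https://www.%1%{REQUEST_URI} [L,NE,R=301]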

It makes sure all pages are served on https and www, to ensure you won't get any canonical or duplicate penalty issues with www. and non www. versions of the same page. Of course you can do the reverse and make non www. URLs canonical.
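
For completeness, the reverse mentioned above (HTTPS with the non-www host as the canonical form) would look roughly like this. It is a sketch adapted from the rules above, assuming RewriteEngine On is already set earlier in the file, and only one of the two variants should be used on a given site.

```apache
# Redirect when the request is plain HTTP or the host name starts with www.
RewriteCond %{HTTPS} off [OR]
RewriteCond %{HTTP_HOST} ^www\. [NC]
# %1 captures the host name without any leading www.
RewriteCond %{HTTP_HOST} ^(?:www\.)?(.+)$ [NC]
# Rebuild the URL on https:// with the bare host name.
RewriteRule ^ https://%1%{REQUEST_URI} [L,NE,R=301]
```

Whichever variant is chosen, the canonical tags on the pages should use the same scheme and host form, so the redirect and the tags agree.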

Paul-M, ACP

Contributor, Nov 27, 2018

Energize wrote: "Is it a new website? Maybe put something like this in htaccess:"

RewriteCond %{HTTPS} off [OR]
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteCond %{HTTP_HOST} ^(?:www\.)?(.+)$ [NC]
RewriteRule ^ https://www.%1%{REQUEST_URI} [L,NE,R=301]

"It makes sure all pages are served on https and www, to ensure you won't get any canonical or duplicate penalty issues with www. and non www. versions of the same page. Of course you can do the reverse and make non www. URLs canonical."

Great, can't wait to try this code a little later.

BTW, when I was searching for ways to do this without your help, returning a "301" to the server was brought up as something very important to the proceedings. I was going to bring it up, but I see you just added it to your code. Obviously, I have no idea what the syntax even refers to, but I'm reassured to see it included after those other tutorials made such a big deal about it.

So if I understand your last sentence correctly, it doesn't matter if I use "www" or not, or even if I use extensions or not... what matters is that there is a single canonical url specified in the header for every page. Correct? As long as it's in the header and actually leads to the page somehow, we're safe. Right?

Contributor, Nov 27, 2018

Nancy OShea wrote: "I don't mean to contradict Energize but I maintain my position about WWW prefix on URLs. Only use WWW if that's what your site and Google are using. Do a test. Open your browser and type in the address bar:

site:www.yourdomain.com

repeat without the www prefix. The one with the most results is the one you should use."

Tested on both Google and Bing.

Same results with and without the www (the site's been inactive for 5-6 years)
3 or 4 results of each.

So I'm guessing whatever I populate Google with going forward will take the lead, once the redesign is up. Right?
