
Monthly Archives: October 2015

How To Install the Django Web Framework on Ubuntu

By: Justin Ellingwood

Introduction

Django is a full-featured Python web framework for developing dynamic websites and applications. Using Django, you can quickly create Python web applications and rely on the framework to do a good deal of the heavy lifting.

In this guide, we will show you how to get Django up and running on an Ubuntu 14.04 server. After installation, we’ll show you how to start a new project to use as the basis for your site.

Different Methods

There are a number of different ways in which you can install Django depending upon your needs and how you want to configure your development environment. These have different advantages and one method may lend itself better to your specific situation than others.

Some of the different methods are below:

  • Global Install from Packages: The official Ubuntu repositories contain Django packages that can be installed easily with the conventional apt package manager. This is very simple, but not as flexible as some other methods. Also, the version contained in the repositories may lag behind the official versions available from the project.
  • Global Install through pip: The pip tool is a package manager for Python packages. If you install pip, you can easily install Django on the system level for use by any user. This should always contain the latest stable release. Even so, global installations are inherently less flexible.
  • Install through pip in a Virtualenv: The Python virtualenv package allows you to create self-contained environments for various projects. Using this technology, you can install Django in a project directory without affecting the greater system. This allows you to provide per-project customizations and packages easily. Virtual environments add some slight mental and process overhead in comparison to globally accessible installation, but provide the most flexibility.
  • Development Version Install through git: If you wish to install the latest development version instead of the stable release, you will have to acquire the code from the git repo. This is necessary to get the latest features/fixes and can be done globally or locally. Development versions do not have the same stability guarantees, however.

With the above caveats and qualities in mind, select the installation method that best suits your needs from the instructions below.

Global Install from Packages

If you wish to install Django using the Ubuntu repositories, the process is very straightforward.

First, update your local package index with apt, and then install the python-django package:

sudo apt-get update
sudo apt-get install python-django

You can test that the installation was successful by typing:

django-admin --version
1.6.1

This means that the software was successfully installed. You may also notice that the Django version is not the latest stable release. To learn a bit about how to use the software, skip ahead to learn how to create a sample project.

Global Install through pip

If you wish to install the latest version of Django globally, a better option is to use pip, the Python package manager. First, we need to install the pip package manager. Refresh your apt package index:

sudo apt-get update

Now you can install pip. If you plan on using Python version 2, install using the following command:

sudo apt-get install python-pip

If, instead, you plan on using Python 3, use this command:

sudo apt-get install python3-pip

Now that you have pip, we can easily install Django. If you are using Python 2, you can type:

sudo pip install django

If you are using Python 3, use the pip3 command instead:

sudo pip3 install django

You can verify that the installation was successful by typing:

django-admin --version
1.7.5

As you can see, the version available through pip is more up-to-date than the one from the Ubuntu repositories (yours will likely be different from the above).
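If you want to compare version strings like the ones printed above from a script, a minimal Python sketch could look like this. The helper name version_tuple is hypothetical, and it only looks at the leading numeric components, so a development string such as 1.9.dev20150305171756 is treated as 1.9.

```python
def version_tuple(version):
    """Return the leading numeric components of a version string as a tuple."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stop at the first non-numeric component, e.g. "dev2015..."
        parts.append(int(piece))
    return tuple(parts)


repo_version = version_tuple("1.6.1")  # version reported by the Ubuntu package
pip_version = version_tuple("1.7.5")   # version reported after the pip install

# Tuples compare element by element, so this is True when pip's version is newer.
print(pip_version > repo_version)
```

Comparing tuples of integers avoids the pitfalls of comparing the raw strings, where "1.10" would sort before "1.9".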

Install through pip in a Virtualenv

Perhaps the most flexible way to install Django on your system is with the virtualenv tool. This tool allows you to create virtual Python environments where you can install any Python packages you want without affecting the rest of the system. This allows you to select Python packages on a per-project basis regardless of conflicts with other projects’ requirements.

We will begin by installing pip from the Ubuntu repositories. Refresh your local package index before starting:

sudo apt-get update

If you plan on using version 2 of Python, you can install pip by typing:

sudo apt-get install python-pip

If, instead, you plan on using version 3 of Python, you can install pip by typing:

sudo apt-get install python3-pip

Once pip is installed, you can use it to install the virtualenv package. If you installed the Python 2 pip, you can type:

sudo pip install virtualenv

If you installed the Python 3 version of pip, you should type this instead:

sudo pip3 install virtualenv

Now, whenever you start a new project, you can create a virtual environment for it. Start by creating and moving into a new project directory:

mkdir ~/newproject
cd ~/newproject

Now, create a virtual environment within the project directory by typing:

virtualenv newenv

This will install a standalone version of Python, as well as pip, into an isolated directory structure within your project directory. We chose to call our virtual environment newenv, but you should name it something descriptive. A directory will be created with the name you select, which will hold the file hierarchy where your packages will be installed.

To install packages into the isolated environment, you must activate it by typing:

source newenv/bin/activate

Your prompt should change to reflect that you are now in your virtual environment. It will look something like (newenv)username@hostname:~/newproject$.

In your new environment, you can use pip to install Django. Regardless of whether you are using version 2 or 3 of Python, it should be called just pip when you are in your virtual environment. Also note that you do not need to use sudo since you are installing locally:

pip install django

You can verify the installation by typing:

django-admin --version
1.7.5

To leave your virtual environment, you need to issue the deactivate command from anywhere on the system:

deactivate

Your prompt should revert to the conventional display. When you wish to work on your project again, you should re-activate your virtual environment by moving back into your project directory and activating:

cd ~/newproject
source newenv/bin/activate
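
Besides looking at the prompt, you can ask Python itself whether a virtual environment is active. The following is a small sketch, not part of the original guide; the function name in_virtual_environment is made up for illustration.

```python
import sys


def in_virtual_environment():
    """Return True when the interpreter is running inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix. Newer virtualenv releases
    # and the standard venv module make sys.prefix differ from sys.base_prefix.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base_prefix


print(in_virtual_environment())
print(sys.prefix)  # inside an environment this points at the environment directory
```

Run it once with the environment activated and once after deactivate to see the difference.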

Development Version Install through git

If you need a development version of Django, you will have to download and install Django from its git repository.

To do so, you will need to install git on your system with apt. Refresh your local package index by typing:

sudo apt-get update

Now, we can install git. We will also install the pip Python package manager. We will use this to handle the installation of Django after it has been downloaded. If you are using Python 2, you can type:

sudo apt-get install git python-pip

If you are using Python 3 instead, you should type this:

sudo apt-get install git python3-pip

Once you have git, you can clone the Django repository. Between releases, this repository will have more up-to-date features and bug fixes at the possible expense of stability. You can clone the repository to a directory called django-dev within your home directory by typing:

git clone https://github.com/django/django.git ~/django-dev

Once the repository is cloned, you can install it using pip. We will use the -e option to install in “editable” mode, which is needed when installing from version control. If you are using version 2 of Python, type:

sudo pip install -e ~/django-dev

If you are using Python 3, type:

sudo pip3 install -e ~/django-dev

You can verify that the installation was successful by typing:

django-admin --version
1.9.dev20150305171756

Note that you can also combine this strategy with the use of virtualenv above if you wish to install a development version of Django in a single environment.

Creating a Sample Project

Now that you have Django installed, we can show you briefly how to get started on a project.

You can use the django-admin command to create a project:

django-admin startproject projectname
cd projectname

This will create a directory called projectname within your current directory. Within this, a management script will be created and another directory called projectname will be created with the actual code.

Note: If you were already in a project directory that you created for use with the virtualenv command, you can tell Django to place the management script and inner directory into the current directory without the extra layer by typing this (notice the ending dot):

django-admin startproject projectname .
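
To confirm that a directory has the layout described above (a management script next to an inner package holding the code), a short Python check can help. This is only a sketch: the function name is hypothetical, and it assumes the management script is named manage.py and the inner package contains a settings.py file. The demonstration builds a throwaway imitation of the layout rather than running django-admin.

```python
import tempfile
from pathlib import Path


def looks_like_django_project(root, name):
    """Check for manage.py and for settings.py inside the inner package directory."""
    root = Path(root)
    return (root / "manage.py").is_file() and (root / name / "settings.py").is_file()


# Demonstration on a temporary directory that imitates the generated layout.
with tempfile.TemporaryDirectory() as tmp:
    empty_result = looks_like_django_project(tmp, "projectname")
    (Path(tmp) / "manage.py").touch()
    (Path(tmp) / "projectname").mkdir()
    (Path(tmp) / "projectname" / "settings.py").touch()
    filled_result = looks_like_django_project(tmp, "projectname")

print(empty_result, filled_result)
```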

To bootstrap the database (this uses SQLite by default) on more recent versions of Django, you can type:

python manage.py migrate

If the migrate command doesn’t work, you likely are using an older version of Django. Instead, you can type:

python manage.py syncdb

You will be asked to create an administrative user as part of this process. Select a username, email address, and password for the user.

If you used the migrate command above, you’ll need to create the administrative user manually. You can create an administrative user by typing:

python manage.py createsuperuser

You will be prompted for a username, an email address, and a password for the user.

Once you have a user, you can start up the Django development server to see what a fresh Django project looks like. You should only use this for development purposes. Run:

python manage.py runserver 0.0.0.0:8000

Visit your server’s IP address followed by :8000 in your web browser:

server_ip_address:8000

You should see something that looks like this:

[Image: Django public page]

Now, append /admin to the end of your URL to get to the admin login page:

server_ip_address:8000/admin

[Image: Django admin login]

If you enter the admin username and password that you just created, you should be taken to the admin section of the site:

[Image: Django admin page]

When you are finished looking through the default site, you can stop the development server by typing CTRL-C in your terminal.

The Django project you’ve created provides the structural basis for designing a more complete site. Check out the Django documentation for more information about how to build your applications and customize your site.

Conclusion

You should now have Django installed on your Ubuntu 14.04 server, providing the main tools you need to create powerful web applications. You should also know how to start a new project and launch the developer server. Leveraging a complete web framework like Django can help make development faster, allowing you to concentrate only on the unique aspects of your applications.

 

Copy from: https://www.digitalocean.com/community/tutorials/how-to-install-the-django-web-framework-on-ubuntu-14-04


Posted on October 16, 2015 in Django, Python

 

How To Set Up an Apache, MySQL, and Python (LAMP) Server

By: Alvin Wan

Introduction

This article will walk you through setting up a server with Python 3, MySQL, and Apache2, sans the help of a framework. By the end of this tutorial, you will be fully capable of launching a barebones system into production.

Django is often the one-stop shop for all things Python; it’s compatible with nearly all versions of Python, comes prepackaged with a custom server, and even features a one-click-install database. Setting up a vanilla system without this powerful tool can be tricky, but earns you invaluable insight into server structure from the ground up.

This tutorial uses only package installers, namely apt-get and Pip. Package installers are simply small programs that make code installations much more convenient and manageable. Without them, maintaining libraries, modules, and other code bits can become an extremely messy business.

Prerequisites

To follow this tutorial, you will need:

  • One Ubuntu 14.04 Droplet.
  • A sudo non-root user, which you can set up by following this tutorial.

Step 1 — Making Python 3 the Default

In this step, we will set Python 3 as the default for our python command.

First, check your current Python version.

  • python --version

On a fresh Ubuntu 14.04 server, this will output:

Python 2.7.6

We would like to have python run Python 3. So first, let’s remove the old 2.7 binary.

  • sudo rm /usr/bin/python

Next, create a symbolic link to the Python 3 binary in its place.

  • sudo ln -s /usr/bin/python3 /usr/bin/python

If you run python --version again, you will now see Python 3.4.0.
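
You can also inspect the result from inside Python. The sketch below reports which interpreter is running the script and which file the python command ultimately points to once symbolic links are followed. The function name interpreter_report is hypothetical, and /usr/bin/python is assumed to be the path used in this step.

```python
import os
import sys


def interpreter_report(command_path="/usr/bin/python"):
    """Describe the running interpreter and the real file behind a command path."""
    return {
        "running_version": "%d.%d.%d" % sys.version_info[:3],
        "running_executable": sys.executable,
        # realpath follows symbolic links such as the one created with ln -s.
        "command_target": os.path.realpath(command_path),
    }


for key, value in interpreter_report().items():
    print(key, "=", value)
```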

Step 2 — Installing Pip

In this section, we will install Pip, the recommended package installer for Python.

First, update the system’s package index. This will ensure that old or outdated packages do not interfere with the installation.

  • sudo apt-get update

Pip allows us to easily manage any Python 3 package we would like to have. To install it, simply run the following:

  • sudo apt-get install python3-pip

For an overview of Pip, you can read this tutorial.

Step 3 — Installing MySQL

In this section, we will install and configure MySQL.

Installing MySQL is simple:

  • sudo apt-get install mysql-server

Enter a strong password for the MySQL root user when prompted, and remember it, because we will need it later.

The MySQL server will start once installation completes. After installation, run:

  • mysql_secure_installation

This setup will take you through a series of self-explanatory steps. First, you’ll need to enter the root password you picked a moment ago. The first question will ask if you want to change the root password, but because you just set it, enter n. For all other questions, press ENTER to accept the default response.

Python 3 requires a way to connect with MySQL, however. There are a number of options, like MySQLclient, but for the module’s simplicity, this tutorial will use pymysql. Install it using Pip:

  • sudo pip3 install pymysql

Step 4 — Installing Apache 2

In this section, we will install Apache 2, and ensure that it recognizes Python files as executables.

Install Apache using apt-get:

  • sudo apt-get install apache2

Like MySQL, the Apache server will start once the installation completes.

Note: After installation, several ports are open to the internet. Make sure to see the conclusion of this tutorial for resources on security.

We want to place our website’s root directory in a safe location. The server is by default at /var/www/html. To keep convention, we will create a new directory for testing purposes, called test, in the same location.

  • sudo mkdir /var/www/test

Finally, we must register Python with Apache. To start, we disable multithreading processes.

  • sudo a2dismod mpm_event

Then, we give Apache explicit permission to run scripts.

  • sudo a2enmod mpm_prefork cgi

Next, we modify the actual Apache configuration to explicitly declare Python files as runnable files and allow such executables. Open the configuration file using nano or your favorite text editor.

  • sudo nano /etc/apache2/sites-enabled/000-default.conf

Add the following right after the first line, which reads <VirtualHost *:80>.

<Directory /var/www/test>
    Options +ExecCGI
    DirectoryIndex index.py
</Directory>
AddHandler cgi-script .py

Make sure that your <Directory> block is nested inside the <VirtualHost> block, like so. Make sure to indent correctly with tabs, too.

/etc/apache2/sites-enabled/000-default.conf

<VirtualHost *:80>
    <Directory /var/www/test>
        Options +ExecCGI
        DirectoryIndex index.py
    </Directory>
    AddHandler cgi-script .py

    ...

This Directory block allows us to specify how Apache treats that directory. It tells Apache that the /var/www/test directory contains executables and that index.py is the default file. The AddHandler line then defines which files count as executables.

We also want to allow executables in our website directory, so we need to change the path for DocumentRoot, too. Look for the line that reads DocumentRoot /var/www/html, a few lines below the long comment at the top of the file, and modify it to read /var/www/test instead.

DocumentRoot /var/www/test

Your file should now resemble the following.

/etc/apache2/sites-enabled/000-default.conf

<VirtualHost *:80>
        <Directory /var/www/test>
                Options +ExecCGI
                DirectoryIndex index.py
        </Directory>
        AddHandler cgi-script .py

        ...

        DocumentRoot /var/www/test

        ...

Save and exit the file. To put these changes into effect, restart Apache.

  • sudo service apache2 restart

Note: Apache 2 may print a warning about the server’s fully qualified domain name. This can be ignored, as the ServerName directive has little application at this moment. ServerName directives are ultimately used to determine subdomain hosting, after the necessary records are created.

If the last line of the output reads [ OK ], Apache has restarted successfully.

Step 5 — Testing the Final Product

In this section, we will confirm that individual components (Python, MySQL, and Apache) can interact with one another by creating an example webpage and database.

First, let’s create a database. Log in to MySQL. You’ll need to enter the MySQL root password you set earlier.

  • mysql -u root -p

Add an example database called example.

  • CREATE DATABASE example;

Switch to the new database.

  • USE example;

Add a table for some example data that we’ll have the Python app add.

  • CREATE TABLE numbers (num INT, word VARCHAR(20));

Press CTRL+D to exit. For more background on SQL, you can read this MySQL tutorial.

Now, create a new file for our simple Python app.

  • sudo nano /var/www/test/index.py

Copy and paste the following code in. The in-line comments describe what each piece of the code does. Make sure to replace the passwd value with the root MySQL password you chose earlier.

#!/usr/bin/python

# Turn on debug mode.
import cgitb
cgitb.enable()

# Print necessary headers.
print("Content-Type: text/html")
print()

# Connect to the database.
import pymysql
conn = pymysql.connect(
    db='example',
    user='root',
    passwd='your_root_mysql_password',
    host='localhost')
c = conn.cursor()

# Insert some example data.
c.execute("INSERT INTO numbers VALUES (1, 'One!')")
c.execute("INSERT INTO numbers VALUES (2, 'Two!')")
c.execute("INSERT INTO numbers VALUES (3, 'Three!')")
conn.commit()

# Print the contents of the database.
c.execute("SELECT * FROM numbers")
print([(r[0], r[1]) for r in c.fetchall()])

Save and exit.
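
If you would like to try the insert-and-select logic without a MySQL server, the following sketch swaps in Python's built-in sqlite3 module as a stand-in for pymysql. It is not the script used in this tutorial: the function name is hypothetical, the table lives in memory only, and it uses query placeholders instead of literal values. sqlite3 marks placeholders with ?, whereas pymysql uses %s.

```python
import sqlite3


def insert_and_fetch(rows):
    """Insert rows through placeholders into an in-memory table and read them back."""
    conn = sqlite3.connect(":memory:")
    try:
        c = conn.cursor()
        c.execute("CREATE TABLE numbers (num INT, word VARCHAR(20))")
        # Placeholders keep the values separate from the SQL text.
        c.executemany("INSERT INTO numbers VALUES (?, ?)", rows)
        conn.commit()
        c.execute("SELECT num, word FROM numbers ORDER BY num")
        return c.fetchall()
    finally:
        conn.close()


print(insert_and_fetch([(1, "One!"), (2, "Two!"), (3, "Three!")]))
```

Passing values as parameters rather than embedding them in the query string is the usual way to avoid quoting problems once the data comes from users.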

Next, fix permissions on the newly-created file. For more information on the three-digit permissions code, see the tutorial on Linux permissions.

  • sudo chmod 755 /var/www/test/index.py
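
To see what the three-digit code grants, Python's standard stat module can render it in the same notation that ls -l uses. This is a small illustrative sketch; the helper name describe_mode is made up.

```python
import stat


def describe_mode(mode):
    """Render an octal permission code for a regular file in ls-style notation."""
    return stat.filemode(stat.S_IFREG | mode)


# 7 = read, write, execute for the owner; 5 = read and execute for group and others.
print(describe_mode(0o755))
```

The execute bits are what allow Apache to run index.py as a CGI script.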

Now, access your server by going to http://your_server_ip using your favorite browser. You should see the following:

http://your_server_ip
[(1, 'One!'), (2, 'Two!'), (3, 'Three!')]

Congratulations! Your server is now online.

Conclusion

You now have a working server that can run Python 3 with a robust SQL database. The server is now also configured for easy maintenance, via well-documented and established package installers.

However, in its current state, the server is vulnerable to outsiders. Whereas elements like SSL encryption are not essential to your server’s function, they are indispensable resources for a reliable, safe server. Learn more by reading about how to configure Apache, how to create an Apache SSL certificate and how to secure your Linux server.

Copy from: https://www.digitalocean.com/community/tutorials/how-to-set-up-an-apache-mysql-and-python-lamp-server-without-frameworks-on-ubuntu-14-04

 

Posted on October 16, 2015 in Python

 

Installing RabbitMQ on Ubuntu

By: MONKEY HACKS

 

By default, RabbitMQ packages are included in Ubuntu and Debian Linux distributions. The problem is, they are horribly outdated. You are far better off downloading the package online and installing it yourself.

Installing RabbitMQ

First, switch to a root shell and add the RabbitMQ repository to your sources list:

sudo -i

sudo echo "deb http://www.rabbitmq.com/debian testing main" >> /etc/apt/sources.list

After the repository is added, we will add the RabbitMQ public key to our trusted key list to avoid any warnings about unsigned packages.

wget https://www.rabbitmq.com/rabbitmq-signing-key-public.asc
sudo apt-key add rabbitmq-signing-key-public.asc

Now we just need to run an update, and install the rabbitmq-server from our newly added package.

sudo apt-get update
sudo apt-get install rabbitmq-server

If everything installed correctly, you should see a confirmation message.

RabbitMQ Management

To manage your RabbitMQ server, you can use the rabbitmq-management plugin. This plugin allows you to manage and monitor your RabbitMQ server in a variety of ways, such as listing and deleting exchanges, queues, bindings and users. You can send and receive messages, and monitor activity on specific queues.

To install the plugin, use the following command:

sudo rabbitmq-plugins enable rabbitmq_management


You can manage users with rabbitmqctl. To add a user, use the command:

add_user {username} {password}

You can also edit an existing user, or set the permissions for the new user, with:

set_permissions [-p vhostpath] {user} {conf} {write} {read}

For example, use the following commands. It is important to perform all three steps even when creating a new user if you want to be able to log in to the UI console and want your programs to work without permission issues:

add_user newadmin s0m3p4ssw0rd
set_user_tags newadmin administrator
set_permissions -p / newadmin ".*" ".*" ".*"
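
If you create users from a script, the three steps can be assembled as argument lists first. The sketch below only builds and prints the rabbitmqctl invocations and does not execute anything. The function name admin_user_commands is hypothetical, and the user name and password are the sample values from above.

```python
def admin_user_commands(username, password, vhost="/"):
    """Build the three rabbitmqctl invocations as argument lists (nothing is run)."""
    return [
        ["rabbitmqctl", "add_user", username, password],
        ["rabbitmqctl", "set_user_tags", username, "administrator"],
        # The three ".*" patterns grant configure, write and read on every resource.
        ["rabbitmqctl", "set_permissions", "-p", vhost, username, ".*", ".*", ".*"],
    ]


for argv in admin_user_commands("newadmin", "s0m3p4ssw0rd"):
    print(" ".join(argv))
```

Keeping each command as a list of arguments means the ".*" patterns need no shell quoting if the lists are later handed to something like subprocess.run.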

Once the plugin is installed, you are able to access it via the following URL:

http://[your-server]:15672/

If you are running the server on your local machine, the URL will be as simple as http://localhost:15672/.

The default username and password are guest and guest. Log in, and you will be greeted with the RabbitMQ dashboard.

Congratulations! You now have your own RabbitMQ server. If you have any questions, feel free to leave them in the comments below.

Note: If the guest login fails, create the file rabbitmq.config inside the directory /etc/rabbitmq with this content:

[{rabbit, [{loopback_users, []}]}].

Then restart the server:

sudo invoke-rc.d rabbitmq-server stop

sudo invoke-rc.d rabbitmq-server start

After the restart, both the console and the Java client can log in using guest/guest.

 

 

Posted on October 14, 2015 in Application Server, RabbitMQ

 

Yii2 htaccess – How to hide frontend/web and backend/web COMPLETELY

By: mohit

 

Step 1

Create an .htaccess file in the root folder, i.e. advanced/.htaccess, and add the code below.

Options +FollowSymlinks
RewriteEngine On

# deal with admin first
RewriteCond %{REQUEST_URI} ^/(admin) <------
RewriteRule ^admin/assets/(.*)$ backend/web/assets/$1 [L]
RewriteRule ^admin/css/(.*)$ backend/web/css/$1 [L]

RewriteCond %{REQUEST_URI} !^/backend/web/(assets|css)/  <------
RewriteCond %{REQUEST_URI} ^/(admin)  <------
RewriteRule ^.*$ backend/web/index.php [L]


RewriteCond %{REQUEST_URI} ^/(assets|css)  <------
RewriteRule ^assets/(.*)$ frontend/web/assets/$1 [L]
RewriteRule ^css/(.*)$ frontend/web/css/$1 [L]

RewriteCond %{REQUEST_URI} !^/(frontend|backend)/web/(assets|css)/  <------
RewriteCond %{REQUEST_URI} !index.php
RewriteCond %{REQUEST_FILENAME} !-f [OR]
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^.*$ frontend/web/index.php

Note: If you are testing on a local server, replace ^/ with ^/project_name/ on the lines marked with an arrow. Remove the arrow markers (<------) once setup is done.
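
To get a feel for where each request ends up, the rewrite rules can be approximated in a few lines of Python. This is a simplified model, not Apache's actual evaluation: the function name route is hypothetical, the checks on existing files and directories are left out, and each request is rewritten only once.

```python
import re

# Ordered (pattern, target) pairs mirroring the RewriteRule lines.
RULES = [
    (re.compile(r"^admin/assets/(.*)$"), r"backend/web/assets/\1"),
    (re.compile(r"^admin/css/(.*)$"), r"backend/web/css/\1"),
    (re.compile(r"^admin"), "backend/web/index.php"),
    (re.compile(r"^assets/(.*)$"), r"frontend/web/assets/\1"),
    (re.compile(r"^css/(.*)$"), r"frontend/web/css/\1"),
]


def route(path):
    """Return the file a request path would be rewritten to under the simplified rules."""
    path = path.lstrip("/")  # rules in .htaccess match the path without a leading slash
    for pattern, target in RULES:
        match = pattern.match(path)
        if match:
            return match.expand(target)
    return "frontend/web/index.php"  # everything else goes to the frontend


for sample in ["/admin/assets/app.js", "/admin/site/login", "/css/site.css", "/site/about"]:
    print(sample, "->", route(sample))
```

The order matters: the admin asset rules must come before the catch-all admin rule, just as in the .htaccess file.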

Step 2

Now create a components/Request.php file in the common directory and add the code below to it.

<?php
namespace common\components;


class Request extends \yii\web\Request {
    public $web;
    public $adminUrl;

    public function getBaseUrl(){
        return str_replace($this->web, "", parent::getBaseUrl()) . $this->adminUrl;
    }


    /*
        If you don't have this function, the admin site will 404 if you leave off 
        the trailing slash.

        E.g.:

        Wouldn't work:
        site.com/admin

        Would work:
        site.com/admin/

        Using this function, both will work.
    */
    public function resolvePathInfo(){
        if($this->getUrl() === $this->adminUrl){
            return "";
        }else{
            return parent::resolvePathInfo();
        }
    }
}
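
The effect of getBaseUrl() is easiest to see with concrete values. The Python sketch below mirrors only the string handling of that method. The function name and the sample base URLs are assumptions for illustration; in a real application the parent value comes from Yii.

```python
def base_url(parent_base_url, web, admin_url=""):
    """Drop the web directory from the base URL, then append the admin prefix."""
    return parent_base_url.replace(web, "") + admin_url


# Frontend: "/frontend/web" disappears from the URL entirely.
print(repr(base_url("/frontend/web", "/frontend/web")))
# Backend: "/backend/web" is replaced by the "/admin" prefix.
print(repr(base_url("/backend/web", "/backend/web", "/admin")))
```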

Step 3

Install the component. Add the code below to the frontend/config/main.php and backend/config/main.php files, respectively.

//frontend, under components array
'request'=>[
    'class' => 'common\components\Request',
    'web'=> '/frontend/web'
],
'urlManager' => [
        'enablePrettyUrl' => true,
        'showScriptName' => false,
],

// backend, under components array
'request'=>[
    'class' => 'common\components\Request',
    'web'=> '/backend/web',
    'adminUrl' => '/admin'
],
'urlManager' => [
        'enablePrettyUrl' => true,
        'showScriptName' => false,
],

That’s it! You can try your project at
http://www.project.com/admin, http://www.project.com

or, on a local server, at
localhost/project_name/admin, localhost/project_name

 

Copy from: http://stackoverflow.com/questions/28118691/yii2-htaccess-how-to-hide-frontend-web-and-backend-web-completely

 

Posted on October 8, 2015 in PHP, Yii

 

Defining text field types in schema.xml

By: 

Overview

Solr’s world view consists of documents, where each document consists of searchable fields. The rules for searching each field are defined using field type definitions. A field type definition describes the analyzers, tokenizers and filters which control searching behaviour for all fields of that type.

 

When a document is added/updated, its fields are analyzed and tokenized, and those tokens are stored in Solr’s index. When a query is sent, the query is again analyzed, tokenized and then matched against tokens in the index. This critical function of tokenization is handled by Tokenizer components.

 

In addition to tokenizers, there are TokenFilter components, whose job is to modify the token stream.

There are also CharFilter components, whose job is to modify individual characters. For example, HTML text can be filtered to modify HTML entities like &amp; to regular &.

 

Defining text field types in schema.xml

Here’s a typical text field type definition:

<fieldType name="textgen" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

What this type definition specifies is:

  • When indexing a field of this type, use an analyzer composed of
    • a WhitespaceTokenizerFactory object
    • a StopFilterFactory
    • a WordDelimiterFilterFactory
    • a LowerCaseFilterFactory
  • When querying a field of this type, use an analyzer composed of
    • a WhitespaceTokenizerFactory object
    • a SynonymFilterFactory
    • a StopFilterFactory
    • a WordDelimiterFilterFactory
    • a LowerCaseFilterFactory

If there is only one analyzer element, then the same analyzer is used for both indexing and querying.

    It’s important to use the same or similar analyzers that process text in a compatible manner at index and query time. For example, if an indexing analyzer lowercases words, then the query analyzer should do the same to enable finding the indexed words.
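
A toy model in Python makes the point concrete. This is not Solr code: the two functions are hypothetical stand-ins for an analyzer with and without a lowercase filter, and the index is just a set of tokens.

```python
def index_analyzer(text):
    """Whitespace tokenization followed by lowercasing."""
    return [token.lower() for token in text.split()]


def raw_analyzer(text):
    """Whitespace tokenization only, with no lowercasing."""
    return text.split()


indexed = set(index_analyzer("Solr Makes Search Easy"))

# A query analyzed the same way as the indexed text finds the term.
print(all(token in indexed for token in index_analyzer("SOLR")))
# A query that skips lowercasing looks for "SOLR", which was never indexed.
print(all(token in indexed for token in raw_analyzer("SOLR")))
```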

 

Under the hood

Solr builds a TokenizerChain instance for each of these analyzers. A TokenizerChain is composed of 1 TokenizerFactory instance, 0-n TokenFilterFactory instances, and 0-n CharFilterFactory instances. These factory instances are responsible for creating their respective objects from the Lucene framework. For example, a TokenizerFactory creates a Lucene Tokenizer; its concrete implementation WhitespaceTokenizerFactory creates a Lucene WhitespaceTokenizer.

The class design diagram shows how a TokenizerChain works:

  • Raw input is provided by a Reader instance
  • CharReader (is-a CharStream) wraps the raw Reader
  • Each CharFilterFactory creates a character filter that modifies input CharStream and outputs a CharStream. So CharFilterFactories can be chained.
  • TokenizerFactory creates a Tokenizer from the CharStream.
  • Tokenizer is-a TokenStream, and can be passed to TokenFilterFactories.
  • Each TokenFilterFactory modifies the token stream and outputs another TokenStream. So these can be chained.
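
The chaining described in the list above can be sketched with plain Python functions. This is a conceptual model only, not the Solr or Lucene API: build_chain and the stopword set are hypothetical, html.unescape plays the role of a character filter, str.split plays the tokenizer, and the two lambdas play token filters.

```python
import html


def build_chain(char_filters, tokenizer, token_filters):
    """Compose character filters, one tokenizer, and token filters into one function."""
    def analyze(text):
        for char_filter in char_filters:      # character stream in, character stream out
            text = char_filter(text)
        tokens = tokenizer(text)              # characters become a token stream
        for token_filter in token_filters:    # token stream in, token stream out
            tokens = token_filter(tokens)
        return list(tokens)
    return analyze


STOPWORDS = {"the", "a", "of"}  # illustrative list, not Solr's stopwords.txt

analyze = build_chain(
    char_filters=[html.unescape],
    tokenizer=str.split,
    token_filters=[
        lambda tokens: (t for t in tokens if t.lower() not in STOPWORDS),
        lambda tokens: (t.lower() for t in tokens),
    ],
)

print(analyze("The Art &amp; Craft of Search"))
```

Because every stage consumes and produces the same kind of stream, any number of filters can be added on either side of the single tokenizer.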

Commonly used CharFilterFactories

solr.MappingCharFilterFactory Maps a set of characters to another set of characters. The mapping file is specified by the mapping attribute, and should be present under /solr/conf.

Example: <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>

The mapping file should have this format:

# Ä => A
"\u00C4" => "A"
# Å => A
"\u00C5" => "A"

solr.HTMLStripCharFilterFactory Strips HTML/XML from the input stream. The input need not be an HTML document, as only constructs that look like HTML will be removed.

  • Removes HTML/XML tags while keeping the content.
  • Attributes within tags are also removed, and attribute quoting is optional.
  • Removes XML processing instructions: <?foo bar?>
  • Removes XML comments.
  • Removes XML elements starting with <! and ending with >.
  • Removes contents of <script> and <style> elements.
  • Handles XML comments inside these elements (normal comment processing won’t always work).
  • Replaces numeric character entity references like &#65; or &#x7f;. The terminating ‘;’ is optional if the entity reference is followed by whitespace.
  • Replaces all named character entity references. The terminating ‘;’ is mandatory to avoid false matches on something like “Alpha&Omega Corp”.

Example: <charFilter class="solr.HTMLStripCharFilterFactory"/>

The text
my <a href="www.foo.bar">link</a>
becomes
my link
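
The tag-stripping idea can be tried out with Python's standard html.parser module. This is only an approximation for experimenting, not Solr's implementation, and it does not reproduce every rule listed above. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content while discarding tags and their attributes."""

    def __init__(self):
        # convert_charrefs=True decodes entity references such as &amp; in text.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(text):
    """Return the text with tags removed and character references decoded."""
    parser = TextExtractor()
    parser.feed(text)
    parser.close()
    return "".join(parser.parts)


print(strip_html('my <a href="www.foo.bar">link</a>'))
```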

 

Commonly used TokenizerFactories

solr.WhitespaceTokenizerFactory A tokenizer that divides text at whitespace, as defined by java.lang.Character.isWhitespace(). Adjacent sequences of non-whitespace characters form tokens.

Example: HELLO\t\t\tWORLD.txt is tokenized into 2 tokens: HELLO and WORLD.txt

solr.KeywordTokenizerFactory Treats the entire field as one token, regardless of its content. This is a lot like the “string” field type, in that no tokenization happens at all. Use it if a text field requires no tokenization, but does require char filters, or token filtering like LowerCaseFilter and TrimFilter.

Example: http://example.com/I-am+example?Text=-Hello is retained as http://example.com/I-am+example?Text=-Hello

solr.StandardTokenizerFactory A good general purpose tokenizer.

  • Splits words at punctuation characters, removing punctuation. However, a dot that’s not followed by whitespace is considered part of a token.
  • Not suitable for file names because the .extension is treated as part of token.
  • Splits words at hyphens, unless there’s a number in the token, in which case the whole token is interpreted as a product number and is not split.
  • Recognizes email addresses and internet hostnames as one token.

Example: This sentence\t can’t be “tokenized_Correctly” by http://www.google.com or IBM or NATO 10.1.9.5 test@email.org product-number 123-456949 file.txt is tokenized as: This sentence can’t be tokenized Correctly by http://www.google.com or IBM or NATO 10.1.9.5 test@email.org product number 123-456949 file.txt

solr.PatternTokenizerFactory Uses regex pattern matching to construct distinct tokens for the input stream. It takes two arguments: “pattern” and “group”.

  • “pattern” is the regular expression.
  • “group” says which group to extract into tokens.

group=-1 (the default) is equivalent to “split”. In this case, the tokens will be equivalent to the output of String.split(java.lang.String), without empty tokens. Using group >= 0 selects the matching group as the token. For example, if you have:

  pattern = '([^']+)'
  group = 0
  input = aaa 'bbb' 'ccc'

the output will be two tokens: 'bbb' and 'ccc' (including the ' marks). With the same input but using group=1, the output would be: bbb and ccc (no ' marks).
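
The “pattern” and “group” behaviour described above can be tried out with Python's re module. This is just a sketch of the idea with a made-up function name, and Python and Java regular expressions differ in some details.

```python
import re


def pattern_tokenize(text, pattern, group=-1):
    regex = re.compile(pattern)
    if group < 0:
        # "split" mode: tokens are the pieces between matches,
        # with empty pieces dropped.
        tokens, last = [], 0
        for m in regex.finditer(text):
            tokens.append(text[last:m.start()])
            last = m.end()
        tokens.append(text[last:])
        return [t for t in tokens if t]
    # "group" mode: each match contributes the selected group as a token.
    return [m.group(group) for m in regex.finditer(text) if m.group(group)]


if __name__ == '__main__':
    text = "aaa 'bbb' 'ccc'"
    print(pattern_tokenize(text, r"'([^']+)'", group=0))
    print(pattern_tokenize(text, r"'([^']+)'", group=1))
    print(pattern_tokenize('a, b ,c', r'\s*,\s*'))
```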

solr.NGramTokenizerFactory Splits the input into character n-grams: first 1-sized, then 2-sized, then 3-sized tokens, and so on. The “minGram” and “maxGram” arguments give the smallest and largest token size to produce. It is not clear to me when and where to use it, but it is perhaps useful for partial matching. Example: with minGram=1 and maxGram=2, email becomes e m a i l em ma ai il
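
The n-gram idea is simple enough to sketch in a few lines of Python. The function name is made up, and the order in which tokens are emitted here simply follows the example above; the real tokenizer may emit them in a different order depending on the version.

```python
def ngram_tokenize(text, min_gram=1, max_gram=2):
    tokens = []
    # Emit all n-grams of the smallest size first, then the next size, etc.
    for size in range(min_gram, max_gram + 1):
        for start in range(len(text) - size + 1):
            tokens.append(text[start:start + size])
    return tokens


if __name__ == '__main__':
    print(' '.join(ngram_tokenize('email', 1, 2)))
```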

 

Commonly used TokenFilterFactories

solr.WordDelimiterFilterFactory Splits words into subwords and performs optional transformations on subword groups. One use for WordDelimiterFilter is to help match words with different delimiters. One way of doing so is to specify generateWordParts="1" catenateWords="1" in the analyzer used for indexing, and generateWordParts="1" in the analyzer used for querying. Given that the current StandardTokenizer immediately removes many intra-word delimiters, it is recommended that this filter be used after a tokenizer that leaves them in place (such as WhitespaceTokenizer). By default, words are split into subwords with the following rules:

  • split on intra-word delimiters (all non alpha-numeric characters).
    • "Wi-Fi" -> "Wi", "Fi"
  • split on case transitions (can be turned off – see splitOnCaseChange parameter)
    • "PowerShot" -> "Power", "Shot"
  • split on letter-number transitions (can be turned off – see splitOnNumerics parameter)
    • "SD500" -> "SD", "500"
  • leading and trailing intra-word delimiters on each subword are ignored
    • "//hello---there, 'dude'" -> "hello", "there", "dude"
  • trailing "'s" are removed for each subword (can be turned off – see stemEnglishPossessive parameter)
    • "O'Neil's" -> "O", "Neil"
      • Note: this step isn’t performed in a separate filter because of possible subword combinations.
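
The default splitting rules above can be approximated in Python as follows. This is a simplified sketch for ASCII text only; split_subwords and its keyword arguments are made-up names that mirror the Solr parameters, and it is not a port of the real filter.

```python
import re


def split_subwords(token, split_on_case_change=True, split_on_numerics=True,
                   stem_english_possessive=True):
    if stem_english_possessive:
        # Remove a trailing 's from each subword ("O'Neil's" -> "O'Neil").
        token = re.sub(r"'[sS](?![A-Za-z0-9])", "", token)
    # Split on intra-word delimiters; leading/trailing ones just vanish.
    parts = [p for p in re.split(r"[^A-Za-z0-9]+", token) if p]

    boundaries = []
    if split_on_case_change:
        # lowercase -> uppercase transition
        boundaries.append(r"(?<=[a-z])(?=[A-Z])")
    if split_on_numerics:
        # letter <-> digit transition
        boundaries.append(r"(?<=[A-Za-z])(?=[0-9])|(?<=[0-9])(?=[A-Za-z])")
    if boundaries:
        marker = re.compile("|".join(boundaries))
        # Insert a space at every boundary, then split on it.
        parts = [s for p in parts for s in marker.sub(" ", p).split()]
    return parts


if __name__ == '__main__':
    for word in ["Wi-Fi", "PowerShot", "SD500", "//hello---there, 'dude'", "O'Neil's"]:
        print(word, '->', split_subwords(word))
```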

Splitting is affected by the following parameters:

  • splitOnCaseChange="1" causes lowercase => uppercase transitions to generate a new part [Solr 1.3]:
    • "PowerShot" => "Power" "Shot"
    • "TransAM" => "Trans" "AM"
    • default is true ("1"); set to 0 to turn off
  • splitOnNumerics="1" causes alphabet => number transitions to generate a new part [Solr 1.3]:
    • "j2se" => "j" "2" "se"
    • default is true ("1"); set to 0 to turn off
  • stemEnglishPossessive="1" causes trailing "'s" to be removed for each subword.
    • "Doug's" => "Doug"
    • default is true ("1"); set to 0 to turn off

There are also a number of parameters that affect what tokens are present in the final output and if subwords are combined:

  • generateWordParts="1" causes parts of words to be generated:
    • "PowerShot" => "Power" "Shot" (if splitOnCaseChange=1)
    • "Power-Shot" => "Power" "Shot"
    • default is 0
  • generateNumberParts="1" causes number subwords to be generated:
    • "500-42" => "500" "42"
    • default is 0
  • catenateWords="1" causes maximum runs of word parts to be catenated:
    • "wi-fi" => "wifi"
    • default is 0
  • catenateNumbers="1" causes maximum runs of number parts to be catenated:
    • "500-42" => "50042"
    • default is 0
  • catenateAll="1" causes all subword parts to be catenated:
    • "wi-fi-4000" => "wifi4000"
    • default is 0
  • preserveOriginal="1" causes the original token to be indexed without modifications (in addition to the tokens produced due to other options)
    • default is 0
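
As a rough illustration of the catenate options, the Python sketch below takes subwords that have already been split and returns the extra catenated tokens. The function and argument names are made up to mirror the parameters above, and token positions are ignored.

```python
from itertools import groupby


def catenate(parts, catenate_words=False, catenate_numbers=False,
             catenate_all=False):
    """Return the additional tokens produced by the catenate* options."""
    extra = []
    # Group the subwords into maximal runs of word parts / number parts.
    for is_number, run in groupby(parts, key=str.isdigit):
        run = list(run)
        wanted = catenate_numbers if is_number else catenate_words
        if wanted and len(run) > 1:
            extra.append("".join(run))
    if catenate_all and len(parts) > 1:
        joined = "".join(parts)
        if joined not in extra:
            extra.append(joined)
    return extra


if __name__ == '__main__':
    print(catenate(["wi", "fi"], catenate_words=True))
    print(catenate(["500", "42"], catenate_numbers=True))
    print(catenate(["wi", "fi", "4000"], catenate_all=True))
```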

Example of generateWordParts="1" and catenateWords="1":

  • "PowerShot" -> 0:"Power", 1:"Shot", 1:"PowerShot" (where 0,1,1 are token positions)
  • "A's+B's&C's" -> 0:"A", 1:"B", 2:"C", 2:"ABC"
  • "Super-Duper-XL500-42-AutoCoder!" -> 0:"Super", 1:"Duper", 2:"XL", 2:"SuperDuperXL", 3:"500", 4:"42", 5:"Auto", 6:"Coder", 6:"AutoCoder"

solr.SynonymFilterFactory Matches strings of tokens and replaces them with other strings of tokens.

  • The synonyms parameter names an external file defining the synonyms.
  • If ignoreCase is true, matching will lowercase before checking equality.
  • If expand is true, a synonym will be expanded to all equivalent synonyms. If it is false, all equivalent synonyms will be reduced to the first in the list.

Synonym file format:

i-pod, i pod => ipod
sea biscuit, sea biscit => seabiscuit
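
To show how the “expand” and “ignoreCase” options interact with the file format, here is a small Python sketch that turns synonym lines into a lookup table. parse_synonyms is a made-up name, and lowercasing both sides of a rule is a simplification of this sketch rather than a statement about Solr's internals.

```python
def parse_synonyms(lines, expand=True, ignore_case=True):
    """Map each source phrase to the list of phrases that replace it."""
    norm = str.lower if ignore_case else (lambda s: s)
    table = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        if '=>' in line:
            # Explicit mapping: everything on the left becomes the right side.
            left, right = line.split('=>', 1)
            sources = [norm(s.strip()) for s in left.split(',') if s.strip()]
            targets = [norm(t.strip()) for t in right.split(',') if t.strip()]
        else:
            # Equivalent synonyms: expand to all, or reduce to the first.
            sources = [norm(s.strip()) for s in line.split(',') if s.strip()]
            targets = sources if expand else sources[:1]
        for source in sources:
            replacements = table.setdefault(source, [])
            for target in targets:
                if target not in replacements:
                    replacements.append(target)
    return table


if __name__ == '__main__':
    rules = ['i-pod, i pod => ipod', 'Television, Televisions, TV, TVs']
    print(parse_synonyms(rules, expand=True)['tv'])
    print(parse_synonyms(rules, expand=False)['tv'])
```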

solr.StopFilterFactory Discards common words. Example: <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/> The stop words file should be in /solr/conf. Format:
#Standard english stop words
a
an
solr.SnowballPorterFilterFactory Uses the Tartarus Snowball stemmer framework for different languages. Set the “language” attribute. Not clear how this is different from PorterStemFilterFactory. Example: running gives run
solr.HyphenatedWordsFilterFactory Combines words split by hyphens. Use only at indexing time.
solr.KeepWordFilterFactory Retains only words specified in the “words” file.
solr.LengthFilterFactory Retains only tokens whose length falls between “min” and “max”
solr.LowerCaseFilterFactory Changes all text to lower case.
solr.PorterStemFilterFactory Transforms the token stream according to the Porter stemming algorithm. The input token stream should already be lowercase (pass it through a LowerCaseFilter first). Example: running is stemmed to run
solr.ReversedWildcardFilterFactory
solr.ReverseStringFilterFactory
Useful if leading wildcard queries like “*pache” should be supported. ReversedWildcardFilterFactory is the factory for ReversedWildcardFilter-s. When this factory is added to an analysis chain, it will be used both for filtering the tokens during indexing, and to determine the query processing of this field during search. This class supports the following init arguments:

  • withOriginal – if true, then produce both original and reversed tokens at the same positions. If false, then produce only reversed tokens.
  • maxPosAsterisk – maximum position (1-based) of the asterisk wildcard (‘*’) that triggers the reversal of query term. Asterisk that occurs at positions higher than this value will not cause the reversal of query term. Defaults to 2, meaning that asterisks on positions 1 and 2 will cause a reversal.
  • maxPosQuestion – maximum position (1-based) of the question mark wildcard (‘?’) that triggers the reversal of query term. Defaults to 1. Set this to 0, and maxPosAsterisk to 1, to reverse only pure suffix queries (i.e. ones with a single leading asterisk).
  • maxFractionAsterisk – additional parameter that triggers the reversal if asterisk (‘*’) position is less than this fraction of the query token length. Defaults to 0.0f (disabled).
  • minTrailing – minimum number of trailing characters in query token after the last wildcard character. For good performance this should be set to a value larger than 1. Defaults to 2.

Note 1: This filter always reverses input tokens during indexing. Note 2: Query tokens without wildcard characters will never be reversed.
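
The general idea behind reversing tokens can be sketched in Python: if the reversed form of every token is indexed as well, a query with a leading wildcard can be rewritten into a much cheaper prefix lookup. Everything below (the names, the marker character, the matching function) is an assumption of this sketch, and the tuning arguments listed above (maxPosAsterisk, minTrailing and so on) are ignored.

```python
# Assumption of this sketch: a marker character that never occurs in
# normal text keeps reversed tokens apart from ordinary ones.
MARKER = "\u0001"


def index_tokens(token, with_original=True):
    reversed_token = MARKER + token[::-1]
    return [token, reversed_token] if with_original else [reversed_token]


def rewrite_leading_wildcard(query):
    # "*ache" -> MARKER + "ehca*": the leading wildcard becomes a trailing one.
    if query.startswith("*") and "*" not in query[1:] and "?" not in query:
        return MARKER + query[:0:-1] + "*"
    return query


def matches(indexed_tokens, query):
    rewritten = rewrite_leading_wildcard(query)
    if rewritten.endswith("*"):
        prefix = rewritten[:-1]
        return any(t.startswith(prefix) for t in indexed_tokens)
    # Queries without wildcards are never reversed.
    return rewritten in indexed_tokens


if __name__ == '__main__':
    tokens = index_tokens("apache")
    print(matches(tokens, "*ache"), matches(tokens, "apa*"), matches(tokens, "*xyz"))
```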

 

Predefined text field types (in v1.4.x schema)

The default deployment contains a set of predefined text field types. The following table gives their tokenization details and examples.

text Indexing behaviour:

<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<!-- in this example, we will only use synonyms at query time
<filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
-->
<!-- Case insensitive stop word removal. add enablePositionIncrements=true in both the index and query analyzers to leave a ‘gap’ for more accurate phrase queries. -->
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>

– Tokenizes at whitespaces

– Stop words are removed

– Word delimiters are used to generate word tokens.

generateWordParts=1 => wi-fi will generate wi and fi

generateNumberParts = 1 => 3.5 will generate 3 and 5

catenateWords=1 => wi-fi will generate wi, fi and wifi

catenateNumbers = 1 => 3.5 will generate 3,5 and 35

catenateAll = 1 => wi-fi-35 will generate wi, fi, wifi, 35 and wifi35.

catenateAll = 0 => wi-fi-35 will generate wi, fi, wifi and 35, but not wifi35.

splitOnCaseChange=1 => camelCase will generate camel and case.

– All text is changed to lower case.

– The Snowball porter stemmer will convert running to “run”

Querying behaviour:

<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>

In querying, only the synonym filter is additional. So something like TV which is in the synonym group “Television, Televisions, TV, TVs” results in this query token stream: televis televis tv tvs (“televis” is because “television” has been stemmed by Snowball Porter).

textgen Very similar to “text” but without stemming.
Indexing behaviour:

<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<!-- Case insensitive stop word removal. add enablePositionIncrements=true in both the index and query analyzers to leave a ‘gap’ for more accurate phrase queries. -->
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>

– Tokenizes at whitespaces

– Stop words are removed

– Word delimiters are used to generate word tokens.

generateWordParts=1 => wi-fi will generate wi and fi

generateNumberParts = 1 => 3.5 will generate 3 and 5

catenateWords=1 => wi-fi will generate wi, fi and wifi

catenateNumbers = 1 => 3.5 will generate 3,5 and 35

catenateAll = 1 => wi-fi-35 will generate wi, fi, wifi, 35 and wifi35.

catenateAll = 0 => wi-fi-35 will generate wi, fi, wifi and 35, but not wifi35.

splitOnCaseChange=1 => camelCase will generate camel and case.

– All text is changed to lower case.

Note that there is no stemmer, which is what makes this different from “text” type.

Querying behaviour:

<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>

In querying, only the synonym filter is additional. So something like TV which is in the synonym group “Television, Televisions, TV, TVs” results in this query token stream: television televisions tv tvs

For file paths and filenames, “textgen” seems to give the most appropriate results.

textTight Very similar again to “text”, but differs in that WordDelimiterFilter has generateWordParts="0" and generateNumberParts="0". So “wi-fi” will give just “wifi”, “HELLO_WORLD” will give just “helloworld”, and “d:\filepath\filename.ext” will give just “dfilepathfilenameext”.
text_ws Just simple whitespace tokenization.
text_rev Similar to “textgen”, this is a general unstemmed text field type that indexes tokens normally and also reversed (via ReversedWildcardFilterFactory), to enable more efficient leading wildcard queries.

Copied from: http://www.pathbreak.com/blog/solr-text-field-types-analyzers-tokenizers-filters-explained

 

Posted on October 7, 2015 in Solr

 

Make a Shared Partition between Mac OS X and Windows (Boot Camp)

Here is a quick step by step of what I did:

Install OS X on a single partition hard drive.
Run Bootcamp Assistant.
Download drivers for Mac and burn to CD.
Make 60GB Bootcamp partition for Windows.
When prompted for install disc, STOP installation and quit Bootcamp.
Launch Disk Utility. Look at the 2 partitions.
Shrink Mac OS X partition to 100GB.
Click on + to create a 3rd partition in free space.
Split that partition into however many other partitions you want.
Quit Disk Utility.
Insert Windows 7 installer DVD and restart Mac.
After the startup chime, hold down OPTION key.
Wait a while until the Windows 7 DVD appears and select it.
Mac should start up from DVD. Start installing Windows 7.
Continue until finished. DO NOT connect to internet.
Load Bootcamp drivers CD that you burned and install.
When finished, restart and log into Windows 7.
Continue installing your applications. Do activations.
When finished, restart, holding down the OPTION key.
You are done. You should now see your OS X Lion and Win 7 partitions.

 

***Note for early or late 2011 MacBook Pro models: you must install Windows via Boot Camp from the built-in DVD drive.

If you get an error like “Drivers not found bootcamp”, enable the option in Boot Camp Assistant that allows installing via a USB drive.

Right-click Boot Camp Assistant in /Applications/Utilities -> Show Package Contents -> Contents -> open the file called Info.plist

then find the following keys and add the entries for your own Mac (the example values below are the Boot ROM version and Model Identifier of a MacBookPro8,1):

<key>DARequiredROMVersions</key>
<array>
<string>MBP81.0047.B04</string>

….

<key>USBBootSupportedModels</key>
<array>
<string>MacBookPro8,1</string>
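
If you would rather not edit the file by hand, the same change can be sketched with Python's plistlib. This is only an illustration under a few assumptions: allow_usb_install is a made-up helper, it works on the bytes of a plist (ideally a backup copy of Info.plist), and the two values are whatever applies to your own Mac. Only the two key names come from the snippet above.

```python
import plistlib


def allow_usb_install(plist_bytes, boot_rom_version, model_identifier):
    """Return new plist bytes with the two entries added if missing."""
    info = plistlib.loads(plist_bytes)
    additions = (
        ("DARequiredROMVersions", boot_rom_version),
        ("USBBootSupportedModels", model_identifier),
    )
    for key, value in additions:
        entries = info.setdefault(key, [])
        if value not in entries:
            entries.append(value)
    return plistlib.dumps(info)


if __name__ == '__main__':
    # A tiny stand-in plist; a real Info.plist has many more keys.
    sample = plistlib.dumps({"DARequiredROMVersions": [],
                             "USBBootSupportedModels": []})
    updated = allow_usb_install(sample, "MBP81.0047.B04", "MacBookPro8,1")
    print(plistlib.loads(updated))
```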

 

Posted on October 3, 2015 in Mac