Build a Search Server Using Apache Solr

Posted by: Sochinda Tith

Apache Solr
Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, near real-time indexing, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly reliable, scalable and fault tolerant, providing distributed indexing, replication and load-balanced querying, automated failover and recovery, centralized configuration and more. Solr powers the search and navigation features of many of the world’s largest internet sites.

Solr is a standalone enterprise search server with a REST-like API. You put documents in it (called “indexing”) via XML, JSON, CSV or binary over HTTP. You query it via HTTP GET and receive XML, JSON, CSV or binary results.
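As a small illustration of the "query it via HTTP GET" part, here is a minimal C# sketch that only builds the request URL for Solr's /select handler. The class name SolrQueryUrl and the example base address are my own assumptions. The q, rows and wt request parameters are standard Solr parameters.

```csharp
using System;

// Hypothetical helper (not part of Solr or any client library):
// builds the URL for an HTTP GET query against Solr's /select handler.
public static class SolrQueryUrl
{
    public static string Build(string solrBaseUrl, string query, int rows, string format)
    {
        // q    = the query text
        // rows = maximum number of documents to return
        // wt   = response writer (for example xml or json)
        return solrBaseUrl.TrimEnd('/') + "/select"
            + "?q=" + Uri.EscapeDataString(query)
            + "&rows=" + rows
            + "&wt=" + Uri.EscapeDataString(format);
    }
}
```

For example, SolrQueryUrl.Build("http://localhost:8983/solr", "title:solr", 10, "json") produces a URL that can be opened in a browser or fetched with any HTTP client.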

  • Advanced Full-Text Search Capabilities
  • Optimized for High Volume Web Traffic
  • Standards Based Open Interfaces – XML, JSON and HTTP
  • Comprehensive HTML Administration Interfaces
  • Server statistics exposed over JMX for monitoring
  • Linearly scalable, auto index replication, auto failover and recovery
  • Near Real-time indexing
  • Flexible and Adaptable with XML configuration
  • Extensible Plugin Architecture


As the image above shows, my example system for preparing indexed data is divided into four parts:

  1. Client Side: the part used by end users, such as a website, web application or mobile platform.
  2. Web Service: a RESTful API web service. It is the most important part of the whole system because it is the middleware that mediates between the end user and the indexing server and DBMS.
  3. Indexing Server: Apache Solr, which prepares all documents as index files so that data can be found quickly with full-text search.
  4. DBMS or Document: the data storage, such as Microsoft SQL Server, MySQL or any files that will be shown to the end user.

How to build


  • Apache Tomcat
  • Apache Solr
  • Solarium (framework for PHP)


MyDownloader: A Multi-thread C# Segmented Download Manager




MyDownloader is an open source application written in C# that is almost a complete download manager. MyDownloader has many features to manage downloads:

  • Segmented downloads from HTTP and FTP
    • With smart segments: when one segment ends, another segment starts to help a remaining segment finish faster
    • Automatic retry when a segment or download fails
  • Allow downloads to be paused and resumed
  • Video Downloads
    • Support to download videos from:
      • YouTube
      • Google Video
      • Break
      • PutFile
      • Meta Cafe
    • (NEW) Support to convert downloaded videos to MPEG, AVI and MP3 (using ffmpeg)
    • (NEW) Video file name suggestion based on video title
  • Speed limit, to avoid using all your bandwidth
  • Support for Auto-Downloads
    • (NEW) Limit the bandwidth at specific times
    • (NEW) Option to enable “Auto-downloads” at startup, so that downloads start automatically when the application starts
    • Download files only at allowed times
    • Limit the number of simultaneous downloads
    • When one download ends, another starts automatically
  • Support for FTP sites that require authentication
  • Support for Mirrors
  • Download from HTTPS
  • (NEW) Download from authenticated HTTP URLs
  • Download completion notification with sounds and an XP balloon
  • Anti-virus integration
  • Batch downloads (enter a generic URL such as http://server/file(*).zip and MyDownloader generates a set of URLs with numbers or letters)
  • (NEW) Move up / Move down buttons to change the order of downloads in the download queue
  • (NEW) Bug fixes and improvements
  • (NEW) Web Spider (Web Crawler)
    • (NEW) Download all files from a specific page
    • (NEW) Download all images from a specific page
    • (NEW) Filter URLs by extension or by name
  • (NEW) Clipboard Monitor
  • (NEW) Internet Explorer Integration
    • (NEW) Download links when they are clicked and the user is holding the ALT key
    • (NEW) When navigating through a video site (YouTube, Google Video, etc.), enable the video button to download the video with MyDownloader
    • (NEW) Button to launch MyDownloader
  • (NEW) Import URLs from file
    • (NEW) From a local text file
    • (NEW) From a local html file

How a Segmented Download Works

Downloads can be segmented because both HTTP and FTP protocols allow the client to specify the start position of the stream. First, MyDownloader performs a request to the server to discover the file size. After that, MyDownloader calculates the segment size as follows:

segment size = max( (file size / number of segments), 
    minimum allowed segment size )

With the segment size, MyDownloader creates another request specifying the start position of the stream. In this way, we can have multiple requests for the same file running in parallel using multi-threading techniques. This technique speeds up the transfer rate even more if you are using mirrors.
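To make the idea concrete, here is a minimal C# sketch that splits a file into byte ranges and formats the value of an HTTP Range request header for each one. It is not the MyDownloader implementation: SegmentPlanner and its methods are hypothetical names, and the sketch assumes that a segment should never be smaller than the configured minimum and that the last segment absorbs any remainder.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical planner, not taken from the MyDownloader source code.
public static class SegmentPlanner
{
    // Returns inclusive (start, end) byte offsets, one pair per segment.
    public static List<(long Start, long End)> Plan(long fileSize, int segments, long minSegmentSize)
    {
        if (fileSize <= 0 || segments <= 0 || minSegmentSize <= 0)
            throw new ArgumentOutOfRangeException();

        // Assumption: a segment is never smaller than the configured minimum,
        // so small files end up with fewer segments than requested.
        long segmentSize = Math.Max(fileSize / segments, minSegmentSize);

        var ranges = new List<(long Start, long End)>();
        for (long start = 0; start < fileSize; start += segmentSize)
        {
            // The last allowed segment takes everything up to the end of the file.
            bool isLast = ranges.Count == segments - 1;
            long end = isLast ? fileSize - 1 : Math.Min(start + segmentSize, fileSize) - 1;
            ranges.Add((start, end));
            if (isLast) break;
        }
        return ranges;
    }

    // Value of the HTTP "Range" request header for one segment.
    public static string RangeHeader((long Start, long End) segment)
    {
        return "bytes=" + segment.Start + "-" + segment.End;
    }
}
```

Each range can then be requested on its own thread. For FTP, the equivalent of the Range header is restarting the transfer at the segment's start offset.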

Using the Code: MyDownloader API

Starting a segmented download with the MyDownloader API is very simple. Check the code below, extracted from the MyDownloader source code. When the download is finished, an XP balloon is displayed near the Windows clock:


// Start listening to the 'DownloadEnded' event from DownloadManager
DownloadManager.Instance.DownloadEnded += 
    new EventHandler<DownloaderEventArgs>(Instance_DownloadEnded);

// Indicates that the download should start immediately
bool startNow = true;

Downloader download = DownloadManager.Instance.Add(
    // ... (the source and destination arguments are missing from this excerpt) ...
    3,          // Three segments 
    startNow    // Start download now
);

void Instance_DownloadEnded(object sender, DownloaderEventArgs e)
{
    if (Settings.Default.ShowBallon /* && ... (second condition missing from this excerpt) */)
    {
        // Display the XP Balloon 
        // ... (the call that shows the balloon is missing from this excerpt;
        //      its message argument is shown below) ...
        //     String.Format("Download finished: {0}", e.Downloader.LocalFile),
    }
}

Protocol Abstraction

In previous versions of MyDownloader, protocol support was implemented by classes that inherited from Downloader. This was because the previous version didn’t support Mirrors, so at the time, a single download could only come from one source. But now, with the Mirrors feature, we can have one piece of a download coming from HTTP and another piece coming from an FTP server.

For that reason, I have refactored the code and now all supported protocols (HTTP, FTP, HTTPS) are implemented by classes that implement IProtocolProvider. The concrete instance of IProtocolProvider is created by ProtocolProviderFactory. The protocol provider classes live in a different class hierarchy from the Downloader class, which removes the restriction of using a single protocol for a download.

To make it easier to retrieve the correct IProtocolProvider, the ResourceLocation class has a factory method. This method is used by the Downloader class.
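The pattern described above can be sketched in a few lines of C#. Everything in this sketch is hypothetical (the Sketch-prefixed names are mine, and the real IProtocolProvider and ProtocolProviderFactory have different members). It only shows the idea of registering one provider type per URL scheme and creating the provider from the location.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a protocol provider interface.
public interface ISketchProtocolProvider
{
    string Describe(Uri location);
}

public sealed class SketchHttpProvider : ISketchProtocolProvider
{
    public string Describe(Uri location) { return "HTTP download of " + location.AbsolutePath; }
}

public sealed class SketchFtpProvider : ISketchProtocolProvider
{
    public string Describe(Uri location) { return "FTP download of " + location.AbsolutePath; }
}

// Hypothetical factory: maps a URL scheme to the provider type registered for it.
public static class SketchProviderFactory
{
    private static readonly Dictionary<string, Type> Providers =
        new Dictionary<string, Type>(StringComparer.OrdinalIgnoreCase);

    public static void Register(string scheme, Type providerType)
    {
        Providers[scheme] = providerType;
    }

    public static ISketchProtocolProvider Create(Uri location)
    {
        Type providerType;
        if (!Providers.TryGetValue(location.Scheme, out providerType))
            throw new NotSupportedException("No provider registered for " + location.Scheme);

        // A new provider per location, so mirrors can mix protocols freely.
        return (ISketchProtocolProvider)Activator.CreateInstance(providerType);
    }
}
```

Because the provider is chosen per location rather than per download, one segment can come from an HTTP mirror and another from an FTP mirror.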


Plug-in Architecture

Many features from MyDownloader are implemented using the concept of extensibility. Because the most important classes in MyDownloader offer a lot of events, extensions can listen to those events to change the application behavior. Another nice thing is that each extension has its own settings. Therefore the Options dialog needs to be created based on extensions. If you open Options at design time, you will only see an empty Panel.

Below, you can see how we load settings from the extension to populate the tree view:

for (int i = 0; i < App.Instance.Extensions.Count; i++)
{
    IExtension extension = App.Instance.Extensions[i];
    IUIExtension uiExtension = extension.UIExtension;

    Control[] options = uiExtension.CreateSettingsView();

    TreeNode node = new TreeNode(extension.Name);
    node.Tag = extension;

    for (int j = 0; j < options.Length; j++)
    {
        TreeNode optioNd = new TreeNode(options[j].Text);
        optioNd.Tag = options[j];
        // ... (attaching optioNd to node is missing from this excerpt) ...
    }

    // ... (adding node to the Options tree view is missing from this excerpt) ...
}


The DownloadManager that I showed at the beginning of this article also doesn’t know anything about HTTP or FTP. DownloadManager accepts protocols registered on ProtocolProviderFactory, and the HTTP and FTP protocols are registered by an extension. Check the HTTP/FTP download extension:

public class HttpFtpProtocolExtension: IExtension
{
    #region IExtension Members

    public string Name
    {
        get { return "HTTP/FTP"; }
    }

    public IUIExtension UIExtension
    {
        get { return new HttpFtpProtocolUIExtension(); }
    }

    #endregion

    public HttpFtpProtocolExtension()
    {
        // ... (the registration of the HTTP and FTP providers on
        //      ProtocolProviderFactory is missing from this excerpt) ...
    }
}



When we think of an HTTP download, what are the settings that an HTTP downloader would require? Proxy is one of the answers. Many users are behind an HTTP proxy and connecting directly to an HTTP server is not allowed in most companies.

So, to expose the settings for our HttpFtpProtocolExtension, we need to create an IUIExtension and return it through UIExtension property of IExtension. On this class we implement the method CreateSettingsView, that returns all settings that will be displayed on Options dialog.

public class HttpFtpProtocolUIExtension : IUIExtension
{
    public System.Windows.Forms.Control[] CreateSettingsView()
    {
        // Create the Proxy user control and return it.
        return new Control[] { new Proxy() };
    }

    public void PersistSettings(System.Windows.Forms.Control[] settingsView)
    {
        // ... (body missing from this excerpt) ...
    }
}


The HttpFtpProtocolUIExtension class provides a factory method named CreateSettingsView. This creates an array of Controls that are the visualization of the extension settings. The Options dialog uses this array to populate the TreeView of options and display the settings on the right panel.

Web Spider

Web Spider works on top of the MyDownloader API. The only secret of the spider is that it parses the HTML pages using regular expressions. Below we can see a screenshot of Web Spider:

When the download of a file is complete (the download state changes to DownloaderState.Ended), the spider checks whether it is an HTML document (by comparing the MIME type) and then looks up all references such as hyperlinks, images, frames and iframes. The following code is executed to add all page references to the download list:

if (download.RemoteFileInfo.MimeType.IndexOf("text/html",
    StringComparison.OrdinalIgnoreCase) < 0)
{
    // Not an HTML document, so there is nothing to parse.
    return; // assumed: the exit statement is missing from this excerpt
}

using (Stream htmlStream = File.OpenRead(localFile))
{
    using (HtmlParser parser = new HtmlParser(htmlStream))
    {
        AddUrls(parser.GetHrefs(context.BaseLocation), UrlType.Href);
        AddUrls(parser.GetImages(context.BaseLocation), UrlType.Img);
        AddUrls(parser.GetFrames(context.BaseLocation), UrlType.Frame);
        AddUrls(parser.GetIFrames(context.BaseLocation), UrlType.IFrame);
    }
}
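The HtmlParser class itself is not shown in this article, so here is a rough sketch of how references can be pulled out of a page with a regular expression and resolved against the page address. LinkExtractor is a hypothetical name, and the pattern is deliberately simple: it only handles quoted href and src attributes and is not a full HTML parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; it does not reproduce MyDownloader's HtmlParser.
public static class LinkExtractor
{
    // Matches href="..." or src="..." with single or double quotes.
    private static readonly Regex Reference = new Regex(
        "(?:href|src)\\s*=\\s*[\"'](?<url>[^\"']+)[\"']",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static List<Uri> Extract(string html, Uri baseLocation)
    {
        var result = new List<Uri>();
        foreach (Match match in Reference.Matches(html))
        {
            Uri absolute;
            // Resolve relative references against the address of the page,
            // and skip references that were already collected.
            if (Uri.TryCreate(baseLocation, match.Groups["url"].Value, out absolute)
                && !result.Contains(absolute))
            {
                result.Add(absolute);
            }
        }
        return result;
    }
}
```

A spider built this way would feed each extracted address back into the download list and repeat the step for every HTML page it receives.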

Video Downloads from YouTube, Google Video (etc.) with Conversion

Like many MyDownloader features, video download is just another extension. The secret is in VideoDownloadExtension and the “New Video Download” window. All URLs in MyDownloader are represented by the ResourceLocation class. This class has the method GetProtocolProvider, which returns the appropriate instance of the IProtocolProvider interface. The only thing that we need to do (in “New Video Download”) is to force the correct protocol provider type by setting the ProtocolProviderType property of ResourceLocation.

When this property is set and ResourceLocation calls GetProtocolProvider, the created protocol provider will be of the type stored in ProtocolProviderType, and not the provider registered on ProtocolProviderFactory. In this way we can replace the default protocol provider, avoid saving the HTML content, and force the download of the video from the web site.

The first step is to register the video protocol providers in VideoDownloadExtension:

public VideoDownloadExtension()
{
   handlers = new List<VideoDownloadHandler>();
   handlers.Add(new VideoDownloadHandler(YouTubeDownloader.SiteName, 
           YouTubeDownloader.UrlPattern, typeof(YouTubeDownloader)));
   handlers.Add(new VideoDownloadHandler(GoogleVideoDownloader.SiteName, 
           GoogleVideoDownloader.UrlPattern, typeof(GoogleVideoDownloader)));
   // ... register other sites here ...
}


After registering, we need to discover which video handler to use and set the correct protocol provider on the ProtocolProviderType property of ResourceLocation. This is done in the “New Video Download” window, as shown below:


VideoDownloadExtension extension;
extension = (VideoDownloadExtension)App.Instance.GetExtensionByType(
    typeof(VideoDownloadExtension));
handler = extension.GetHandlerByURL(txtURL.Text);
ResourceLocation rl = ResourceLocation.FromURL(txtURL.Text);
rl.ProtocolProviderType = handler.Type.AssemblyQualifiedName;

Basically, all video site handlers only need to parse the HTML page and return the URL of the FLV. This process has three main steps:

  • Download the HTML page from the video site
  • Parse the HTML to discover the video URL
  • Return the video URL

All the common logic is in the BaseVideoDownloader class. This class retrieves the HTML and starts to download the video. The inherited classes (YouTubeDownloader, GoogleVideoDownloader) are responsible for parsing the HTML text and returning the video URL to the base class. Below we can see how to get the URL of an FLV file from a YouTube page:

public class YouTubeDownloader: BaseVideoDownloader
{
   public const string SiteName = "You Tube";

   public const string UrlPattern = 
      @"(?:[Yy][Oo][Uu][Tt][Uu][Bb][Ee]\.[Cc][Oo][Mm]/watch\?v=)(\w[\w|-]*)";

   protected override ResourceLocation ResolveVideoURL(string url, string pageData, 
         out string videoTitle)
   {
      videoTitle = TextUtil.JustAfter(pageData,
          "<meta name=\"title\" content=\"", "\">"); 

      return ResourceLocation.FromURL(String.Format("{0}/get_video?video_id={1}&t={2}", 
       TextUtil.GetDomain(url), TextUtil.JustAfter(url, "v=", "&"), 
       TextUtil.JustAfter(pageData, "&t=", "&hl=")));
   }
}

After downloading, the video can be converted to MPEG, AVI or MP3 (audio only). This process is done using an external open source tool: ffmpeg. This command line tool is called by MyDownloader with the FLV filename and conversion arguments. If you want to see details about the arguments that are sent to ffmpeg, I suggest you download the code / demo project of this article.
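As a rough idea of what calling an external converter looks like, the sketch below prepares a process description for the simplest ffmpeg invocation, "ffmpeg -i input output", where ffmpeg picks the output format from the file extension. The class FfmpegSketch is hypothetical, the sketch assumes ffmpeg is on the PATH, and the arguments MyDownloader really passes are in its source code and are not reproduced here.

```csharp
using System.Diagnostics;
using System.IO;

// Hypothetical helper; the real conversion arguments used by MyDownloader differ.
public static class FfmpegSketch
{
    // Builds (but does not start) a process description for "ffmpeg -i input output".
    public static ProcessStartInfo BuildConversion(string flvPath, string targetExtension)
    {
        // The output file sits next to the input, with the new extension.
        string output = Path.ChangeExtension(flvPath, targetExtension);

        return new ProcessStartInfo
        {
            FileName = "ffmpeg", // assumption: ffmpeg is available on the PATH
            Arguments = "-i \"" + flvPath + "\" \"" + output + "\"",
            UseShellExecute = false,
            CreateNoWindow = true
        };
    }
}
```

Passing the result to Process.Start and waiting for the process to exit would run the conversion.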

Selecting Files inside a Remote ZIP File

This is another very cool feature of MyDownloader. Sometimes you need to download a big ZIP file just because you want a single file inside it. In the New Download window, if the user checks the option “Choose files inside ZIP”, MyDownloader will enumerate the files inside the ZIP and allow the user to select only the files to download.

The feature is based on the article Extracting files from a remote ZIP archive and the updated version by Unruled Boy (see the comments at the end of the article). Below we can see how the New Download window displays the ZIP file and allows the user to choose the files inside it:



The Auto-Downloads feature is activated (or deactivated) through the “two arrows” button in the MyDownloader toolbar. When this feature is enabled, MyDownloader starts to work as a batch downloader, completing each download in the download queue.

The maximum number of downloads is configured in the “Options” dialog. Another nice thing is that the user is able to choose at which times “Auto-Downloads” will work, and it is also possible to limit the bandwidth usage at specific times. This is done easily by selecting the “time grid”:

Auto-Downloads works using events (DownloadAdded, DownloadEnded) from DownloadManager. When one of these events is raised, the extension starts downloads while respecting the maximum number of simultaneous downloads:

using (DownloadManager.Instance.LockDownloadList(false))
{
   int count = GetActiveJobsCount();
   int maxJobs = Settings.Default.MaxJobs;

   if (count < maxJobs)
   {
      for (int i = 0; 
         i < DownloadManager.Instance.Downloads.Count && (count < maxJobs); 
         i++)
      {
         if (DownloadManager.Instance.Downloads[i].State != 
            DownloaderState.Ended &&
            ! DownloadManager.Instance.Downloads[i].IsWorking())
         {
            // ... (the call that starts this download is missing from this excerpt) ...
            count ++;
         }
      }
   }
}
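The same scheduling rule can be tried out in isolation with a self-contained sketch. The types below (SketchState, SketchDownload, AutoDownloadSketch) are hypothetical stand-ins rather than the MyDownloader classes, and changing the state to Working stands in for actually starting a transfer.

```csharp
using System.Collections.Generic;

// Hypothetical stand-ins for the download state and download item.
public enum SketchState { Queued, Working, Ended }

public sealed class SketchDownload
{
    public string Name;
    public SketchState State = SketchState.Queued;
}

public static class AutoDownloadSketch
{
    // Starts queued downloads, in queue order, until maxJobs are working.
    // Returns the downloads that were started by this call.
    public static List<SketchDownload> StartPending(IList<SketchDownload> queue, int maxJobs)
    {
        int active = 0;
        foreach (SketchDownload download in queue)
        {
            if (download.State == SketchState.Working) active++;
        }

        var started = new List<SketchDownload>();
        for (int i = 0; i < queue.Count && active < maxJobs; i++)
        {
            if (queue[i].State == SketchState.Queued)
            {
                queue[i].State = SketchState.Working; // stands in for starting the transfer
                started.Add(queue[i]);
                active++;
            }
        }
        return started;
    }
}
```

Calling StartPending again whenever a download is added or ends keeps the number of working downloads at the configured maximum.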

Internet Explorer Integration

Browser integration is a critical feature for any download manager. This new version of MyDownloader introduces a very simple Internet Explorer (IE) integration. The IE integration is an IE toolbar, built on top of BandObjectLib, that has three main features:

  • A shortcut button that is enabled when the user is navigating a video site, allowing the user to download the video
  • Replacing the IE download window when the user is holding the Alt key
  • A shortcut to launch MyDownloader

Below we can see IE displaying an empty page, with the download button disabled. The second image shows IE displaying a YouTube video, with the download button enabled:

To enable the video download button, we need to listen to the AfterNavigate event from IE and then check whether the LocationURL property is a URL from a video site:

void AfterNavigate(object iDisp, ref object URL)
{
   SHDocVw.WebBrowser IEDocument = GetIEDocument();            

   btnDownload.Enabled = videoSites.IsVideoSite(IEDocument.LocationURL);
}

To replace the IE download window (only when Alt is pressed), the FileDownload event is used:

void FileDownload(bool ActiveDocument, ref bool Cancel)
{
   if (!ActiveDocument)
   {
      if ((Control.ModifierKeys & Keys.Alt) == Keys.Alt)
      {
         Cancel = true;

         if ((DateTime.Now - lastDownload).TotalSeconds >= 1.9)
         {
            // ... (missing from this excerpt: a call that receives an anonymous
            //      "delegate(object state) { ... }", whose body is also missing) ...
         }

         lastDownload = DateTime.Now;
      }
   }
}

Import URLs from Files

Another new feature of MyDownloader is the “Import URLs from files” window, which allows the user to import URLs from a text file or from an HTML file. Text files must have one URL per line. For HTML files, the URLs are extracted using the same HTML parser used by Web Spider.

All URLs found in the file are added to the download list. The “Import URLs from files” window also has a shortcut to enable “Auto-downloads” and to set the maximum number of simultaneous downloads.
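For the text file case, the "one URL per line" rule can be sketched as follows. UrlImportSketch is a hypothetical name, and the sketch assumes that blank lines and lines that are not absolute URLs are simply skipped. The article does not say how MyDownloader treats such lines.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper; not taken from the MyDownloader source code.
public static class UrlImportSketch
{
    // Reads one URL per line and keeps only well-formed absolute URLs.
    public static List<Uri> ReadUrls(TextReader reader)
    {
        var urls = new List<Uri>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            Uri url;
            // Assumption: lines that do not parse as absolute URLs are ignored.
            if (Uri.TryCreate(line.Trim(), UriKind.Absolute, out url))
            {
                urls.Add(url);
            }
        }
        return urls;
    }
}
```

Passing a StreamReader opened on the text file gives the list of addresses to add to the download list.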


Future Ideas

This kind of project is “infinite,” so below I have listed some ideas for future implementations. As with any open source project, contributions are very welcome.

  • Add and remove segments while downloading
  • Option to disable the speed limit while the screen saver is running
  • Integrate with Firefox and improve Internet Explorer integration
  • Improve the mirrors feature by choosing the fastest mirror sites
  • Support the MMS protocol
  • Create download categories and allow downloads to be labeled
  • XY graph to show the bandwidth usage
  • Auto shutdown after downloads end
  • Hang up the internet connection after downloads end
  • Support metalink
  • Video downloads:
    • Create a media monitor integrated with IE and FF that allows the user to download videos from any site

I hope you enjoyed the code! If you have any questions or feedback, feel free to contact me.


This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Copy from:

IDM (Internet download manager) API using C#?

By: Simon Mourier

You should

1) download to your disk

2) extract the IDManTypeInfo.tlb type library from this zip file, somewhere on your disk

3) run the Type Library Importer tool (TlbImp.exe) like this:

[C:\Temp]"c:\Program Files\Microsoft SDKs\Windows\\v7.1\Bin\TlbImp.exe" IDManTypeInfo.tlb
Microsoft (R) .NET Framework Type Library to Assembly Converter 3.5.30729.1
Copyright (C) Microsoft Corporation.  All rights reserved.

Type library imported to IDManLib.dll

This will create an IDManLib.dll

4) Now you can reference IDManLib.dll in your project. I have not tested it, but I also suppose you want to ensure your program is compiled with the proper bitness (it depends on how the IDM COM server works). So, I suggest you compile as x86 (not AnyCPU, not x64).

Copy from :

Using Solr for Search with .NET(C#)

By , 19 Oct 2012


Solr is an advanced search server from Apache’s Lucene project. Thanks to SolrNet, a .NET library for Solr, it is quite convenient to use Solr for search in ASP.NET.

Install Apache Tomcat and Solr

First of all, make sure you get the latest version of Apache Tomcat and Solr. (I installed Tomcat 7 and Solr 1.4.1 (zip version) as of September 2010.)

Tomcat installation

When installing Tomcat, make sure to remember the port you specify (normal for Solr is 8983). After installation, the Apache Tomcat Properties window should pop up. If not, find Configure Tomcat in the start menu and make sure the web server is started. If it is, you should see the default Tomcat start page when you browse to http://localhost:8983.


Solr installation

Before you install Solr, stop the Tomcat web server (through the Configure Tomcat window).

  • Download Solr 1.4 from here
  • Download the latest Java JDK

When you’ve downloaded the Solr zip file (make sure it’s the zip version!), unzip the archive and find the dist folder. In the dist folder, find the apache-solr-1.4.1.war file and copy it to C:\Program Files\Apache Software Foundation\Tomcat 7.0\webapps, renaming it to solr.war.

Now, we also need to create the Solr folder, which will host our Solr configuration files, indexes and so on. I created C:\Solr. You’ll also need to copy the contents of the apache-solr-1.4.1\example\solr folder to your newly created Solr folder. When you’re done, you should at least have the bin and conf folders.

Finally, we need to tell Tomcat where our Solr folder is located. Open up the Configure Tomcat window, navigate to the Java tab and add this row to Java Options:



-Dsolr.solr.home=c:\solr


It should look like this:

Now, you should first start the web server and then be able to navigate to http://localhost:8983/solr. Installation and basic configuration done!

Quick look at the configuration files

The Solr configuration files are important: you will use them to tell Solr what should and should not be indexed. The most important config files are schema.xml and solrconfig.xml. These are located in the C:\Solr\conf folder.

Easier use of Solr with the SolrNet library

SolrNet is a great .NET library for Solr, making it all easier. Download assemblies (and samples) on the SolrNet Google Code page.

I’d like to thank A. Friedman for his contribution to the Solr and ASP.NET world. Here’s his great blog post on Solr and SolrNet.

Code snippets using SolrNet

Here are some code snippets from my Solr app. You can find them in the source code, but I thought it would be a good idea to post code for a couple of common actions using Solr and SolrNet for search.

Search the index and bind to Repeater:

var search = new DefaultSearcher()
	.Search(query, 10, 1);

rptResults.DataSource = search.Result;

Re-index all data:

new SolrBaseRepository.Instance<Player>().Start();

var solr = ServiceLocator.Current.GetInstance<ISolrOperations<Player>>();
var players = new PlayerRepository().GetPlayers();
// ... (the calls that add the players to the index and commit are missing from this excerpt) ...


Remove from index:

new SolrBaseRepository.Instance<Player>().Start();

var solr = ServiceLocator.Current.GetInstance<ISolrOperations<Player>>();
var specificPlayer = new PlayerRepository().GetPlayer(id);
// ... (the calls that delete the player from the index and commit are missing from this excerpt) ...


Adding multiple fields to the search index

By default, Solr searches only one field, the one named by defaultSearchField in schema.xml. It’s easy to make several fields searchable at once, though, using copyField and an additional field that takes multiple values.

What you have to do is to edit schema.xml a bit:

  1. Setup the fields you want to get indexed, using field.
  2. Create an additional field called “text”, setting its multiValued property to true.
  3. Use copyField to copy data to this additional field.
  4. Use this additional field, “text”, as the defaultSearchField.

Here’s how:

<fields>
	<field name="id" type="int" indexed="true" stored="true" required="true" />
	<field name="firstname" type="text" indexed="true" stored="false" required="false" />
	<field name="lastname" type="text" indexed="true" stored="false" required="false" />
	<field name="text" type="text" indexed="true" stored="false" multiValued="true" />
</fields>
<copyField source="firstname" dest="text" />
<copyField source="lastname" dest="text" />
<defaultSearchField>text</defaultSearchField>
<solrQueryParser defaultOperator="AND" />

Debugging Solr

If you encounter any problems with Solr, try this to get it working:

  • Turn off elevate.xml handler (comment appropriate lines in solrconfig.xml).
  • Case sensitive configuration files – make sure you spell copyField, multiValued etc correctly.
  • In solrconfig.xml, make sure you use matching data types to those you’ve defined in your ASP.NET app.

More Solr

Solr is really powerful and gives you a lot of options. I recommend the Solr Wiki for more information on what actually is possible.

This link might also help you.


This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
