Hi
I've been building some simple applications to test performance before I integrate the library into my main application. One of them visits a list of URLs and extracts basic information, using multiple threads.
As I don't need to crawl the site, just visit each URL, I took the SimpleTask example and modified it slightly for my test: it loops through a set of URLs and does nothing else. This was mainly to get a speed figure to compare against.
I'm getting around 15-20 URLs a second on average, which is actually slower than my current multithreaded model (which also processes the data) when I run a comparison.
I've checked the license and it is correct, and it allows the task number etc. to be adjusted (I have a full license).
I've varied the max tasks setting, which makes little difference.
I've also tested against several URL lists of varying sizes (1000-100000).
I've tried on my dev machine and also on a server which has a 500mb+ connection.
Any suggestions? The only alterations made to the example are below.
thanks
kev
SimpleTaskSample() alterations
    License.Lock();
    var engineConfig = new CrawlerEngineConfig();
    engineConfig.MaxWorkingTasks = 5000;
    // engineConfig.MaxTasksPerMinute = 900000000;
    // engineConfig.MaxFinishedTasks = 100000000;
    var engine = new CrawlerEngine(engineConfig);
    Console.WriteLine();
    Console.WriteLine("Start Task");
    var ofd = new OpenFileDialog();
    if (ofd.ShowDialog() != DialogResult.OK) return;
    string filepath = ofd.FileName;
    List<string> list = new List<string>();
    using (var sr = new StreamReader(File.Open(filepath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite)))
    {
        while (sr.Peek() >= 0)
        {
            string input = sr.ReadLine();
            if (!string.IsNullOrEmpty(input))
            {
                list.Add(input);
            }
        }
    }
    list = list.Distinct().ToList();
    engine.Start();
    foreach (string stin in list)
    {
        engine.AddTask(new SimpleTaskRequest { Url = new Uri(stin) });
    }
StartWork() alterations
    base.TaskResult = new SimpleTaskResult();
    var request = await new HttpRequest(new HttpRequestConfig
    {
        AwaitProcessing = AwaitProcessingEnum.Success,
        Url = TaskRequest.Url,
        UserAgentHeader = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0",
        Quota = new HttpRequestQuota
        {
            MaxDownloadSize = 5000000,
            OperationTimeoutMilliseconds = 60000,
            ResponseTimeoutMilliseconds = 15000
        }
    });
    this.TaskResult.Links = new List<string>();
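
To make the URLs-per-second figure concrete, here is a minimal sketch of how such a rate could be measured outside the library. It is not the code from my application and not part of Crawler-Lib: it assumes plain HttpClient with a SemaphoreSlim cap on concurrent requests, and the names ThroughputBaseline and MeasureAsync are hypothetical. The cap of 100 and the 15 second timeout are arbitrary illustration values.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

static class ThroughputBaseline
{
    // Hypothetical helper: runs fetch for every URL with at most
    // maxConcurrency calls in flight and returns processed URLs per second.
    public static async Task<double> MeasureAsync(
        IReadOnlyCollection<Uri> urls, int maxConcurrency, Func<Uri, Task> fetch)
    {
        if (urls.Count == 0) return 0;

        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var watch = Stopwatch.StartNew();
            var tasks = urls.Select(async url =>
            {
                await gate.WaitAsync();
                try { await fetch(url); }
                catch (Exception) { /* a failed URL still counts as processed */ }
                finally { gate.Release(); }
            }).ToList();

            await Task.WhenAll(tasks);
            watch.Stop();
            return urls.Count / watch.Elapsed.TotalSeconds;
        }
    }

    // Usage: pass absolute URLs as command-line arguments.
    static async Task Main(string[] args)
    {
        var urls = args.Select(a => new Uri(a)).Distinct().ToList();
        using (var client = new HttpClient { Timeout = TimeSpan.FromSeconds(15) })
        {
            double rate = await MeasureAsync(urls, 100, async url =>
            {
                // Only wait for the response headers; the body is not read.
                using (await client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead)) { }
            });
            Console.WriteLine($"{rate:F1} urls/second");
        }
    }
}
```

Because the fetch step is passed in as a delegate, the same timing loop can wrap different ways of requesting a URL, so the numbers being compared are measured the same way.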